In this project, I used Python to explore data related to bike share systems for three
major cities in the United States: Chicago, New York City, and Washington.
I wrote code to import the data and answer interesting questions about it by computing
descriptive statistics. I also wrote a script that takes raw input and creates
an interactive experience in the terminal to present these statistics.
Randomly selected data for the first six months of 2017 are provided for all three cities.
All three of the data files contain the same core six (6) columns:
- Start Time (e.g., 2017-01-01 00:07:57)
- End Time (e.g., 2017-01-01 00:20:53)
- Trip Duration (in seconds, e.g., 776)
- Start Station (e.g., Broadway & Barry Ave)
- End Station (e.g., Sedgwick St & North Ave)
- User Type (Subscriber or Customer)
The Chicago and New York City files also have the following two columns:
- Gender
- Birth Year
Data for the first 10 rides in the new_york_city.csv file
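To illustrate how a file with these columns can be loaded, here is a minimal sketch using pandas. It is not the project's actual code: the function name `load_city_data` and the derived column names are my own assumptions, and the inline sample stands in for a real file such as new_york_city.csv. The first sample row reuses the example values listed above; the second row is invented.

```python
import io

import pandas as pd

# Made-up sample in the shape of the city files. The first row reuses the
# example values from the column list; the second row is invented.
SAMPLE_CSV = """Start Time,End Time,Trip Duration,Start Station,End Station,User Type
2017-01-01 00:07:57,2017-01-01 00:20:53,776,Broadway & Barry Ave,Sedgwick St & North Ave,Subscriber
2017-02-03 08:15:00,2017-02-03 08:25:00,600,Sedgwick St & North Ave,Broadway & Barry Ave,Customer
"""


def load_city_data(source):
    """Read one city's CSV and add month, weekday and hour columns."""
    # parse_dates turns the two timestamp columns into datetime64 values.
    df = pd.read_csv(source, parse_dates=["Start Time", "End Time"])
    df["month"] = df["Start Time"].dt.month_name()
    df["day_of_week"] = df["Start Time"].dt.day_name()
    df["hour"] = df["Start Time"].dt.hour
    return df


# A path such as "new_york_city.csv" could be passed instead of the buffer.
df = load_city_data(io.StringIO(SAMPLE_CSV))
print(df.head(10))
```

Deriving the month, weekday, and hour once at load time keeps the later statistics simple, since each becomes a plain column lookup.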
Statistics needed to be computed:
In this project, I wrote code to provide the following information:
#1 Popular times of travel (i.e., the values that occur most often in the start time)
- most common month
- most common day of the week
- most common hour of the day
#2 Popular stations and trips
- most common start station
- most common end station
- most common trip from start to end (i.e., most frequent combination of start station and end station)
#3 Trip duration
- total travel time
- average travel time
#4 User info
- counts of each user type
- counts of each gender (only available for NYC and Chicago)
- earliest, most recent, and most common year of birth (only available for NYC and Chicago)
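The first three groups of statistics (#1 to #3) can be sketched with pandas roughly as follows. This is an illustrative sketch under my own assumptions, not the project's actual code: the function names are hypothetical, and the small `rides` table is invented so the example runs on its own. When several values tie for most common, `mode()[0]` simply takes the first one.

```python
import pandas as pd

# Invented rides, for illustration only.
rides = pd.DataFrame({
    "Start Time": pd.to_datetime([
        "2017-01-01 00:07:57", "2017-01-02 08:10:00",
        "2017-01-09 08:30:00", "2017-02-06 17:45:00",
    ]),
    "Trip Duration": [776, 300, 450, 600],  # seconds
    "Start Station": ["A St", "B St", "B St", "B St"],
    "End Station": ["B St", "C St", "C St", "A St"],
})


def time_stats(df):
    """Most common month, weekday and hour, based on the start time."""
    start = df["Start Time"]
    return {
        "month": start.dt.month_name().mode()[0],
        "day": start.dt.day_name().mode()[0],
        "hour": int(start.dt.hour.mode()[0]),
    }


def station_stats(df):
    """Most common start station, end station and start-to-end pair."""
    # Counting rows per (start, end) pair gives the frequency of each trip.
    trips = df.groupby(["Start Station", "End Station"]).size()
    return {
        "start": df["Start Station"].mode()[0],
        "end": df["End Station"].mode()[0],
        "trip": trips.idxmax(),  # a (start, end) tuple
    }


def duration_stats(df):
    """Total and average travel time in seconds."""
    duration = df["Trip Duration"]
    return {"total": int(duration.sum()), "mean": float(duration.mean())}


print(time_stats(rides))
print(station_stats(rides))
print(duration_stats(rides))
```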
Tools used:
- Python 3, NumPy, and pandas installed using Anaconda
- A text editor (Atom)
- A terminal application (Git Bash)
Results:
In this project, I wrote code to provide the required information, and I handled the absence of the Gender and Birth Year columns in the Washington data appropriately.
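One possible way to handle the missing columns is to check for them before computing anything that depends on them. The sketch below is an assumption about how this could be done, not the project's actual code. The function name `user_stats` and both sample tables are hypothetical.

```python
import pandas as pd


def user_stats(df):
    """Count user types, plus gender and birth-year stats when available."""
    stats = {"user_types": df["User Type"].value_counts().to_dict()}

    # The Washington file has no Gender or Birth Year columns, so each
    # optional statistic is only computed when its column exists.
    if "Gender" in df.columns:
        stats["genders"] = df["Gender"].value_counts().to_dict()

    if "Birth Year" in df.columns:
        years = df["Birth Year"].dropna()
        if not years.empty:
            stats["birth_year"] = {
                "earliest": int(years.min()),
                "most_recent": int(years.max()),
                "most_common": int(years.mode()[0]),
            }
    return stats


# Invented rows, for illustration only.
washington_like = pd.DataFrame({"User Type": ["Subscriber", "Customer", "Subscriber"]})
chicago_like = washington_like.assign(
    Gender=["Male", "Female", None],
    **{"Birth Year": [1985.0, 1992.0, 1985.0]},
)

print(user_stats(washington_like))
print(user_stats(chicago_like))
```

Checking `df.columns` keeps a single code path for all three cities, rather than special-casing Washington by name.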
I used descriptive statistics to answer the questions posed about the data. Raw data is displayed upon request: the script asks the user whether they want to see 5 lines of raw data, displays those lines if the answer is "yes", and repeats the prompt and display until the user says "no".
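The raw-data prompt loop can be sketched as below. This is a hedged illustration rather than the project's actual script: `show_raw_data` and its `ask` and `show` parameters are names I made up, and the prompt wording is an assumption. By default `ask` is the built-in `input`, but here it is replaced with scripted answers so the example runs without a terminal.

```python
import pandas as pd


def show_raw_data(df, ask=input, show=print, chunk=5):
    """Show `chunk` rows at a time for as long as the user answers 'yes'.

    Returns the number of rows that were displayed.
    """
    shown = 0
    while shown < len(df):
        answer = ask(f"Would you like to see {chunk} lines of raw data? (yes/no): ")
        if answer.strip().lower() != "yes":
            break
        rows = df.iloc[shown:shown + chunk]
        show(rows)
        shown += len(rows)
    return shown


# Scripted answers stand in for a user typing at the terminal.
answers = iter(["yes", "yes", "no"])
demo = pd.DataFrame({"Trip Duration": range(12)})  # invented data
shown = show_raw_data(demo, ask=lambda prompt: next(answers))
print(f"{shown} rows shown")
```

The loop also stops on its own once every row has been displayed, so the user is not prompted again when no data is left.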
You can find the full results of this project here.