Saturday, June 5, 2021

Explore US Bikeshare Data - Project 1 in Data Analysis

In this project, I used Python to explore data related to bike share systems for three major cities in the United States: Chicago, New York City, and Washington. I wrote code to import the data and answer interesting questions about it by computing descriptive statistics. I also wrote a script that takes raw input from the user to create an interactive experience in the terminal that presents these statistics.

 

Divvy is a bicycle sharing system in the City of Chicago and two adjacent suburbs (image: Wikipedia)

The Datasets

Randomly selected data for the first six months of 2017 are provided for all three cities. All three data files contain the same six core columns:

  • Start Time (e.g., 2017-01-01 00:07:57)
  • End Time (e.g., 2017-01-01 00:20:53)
  • Trip Duration (in seconds - e.g., 776)
  • Start Station (e.g., Broadway & Barry Ave)
  • End Station (e.g., Sedgwick St & North Ave)
  • User Type (Subscriber or Customer)

The Chicago and New York City files also have the following two columns:

  • Gender
  • Birth Year
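A minimal loading sketch, assuming pandas and the column names listed above. The `CITY_DATA` mapping and the `chicago.csv` and `washington.csv` filenames are assumptions (only `new_york_city.csv` is named in this post), and `load_data` is a hypothetical helper, not necessarily the project's own code. It parses the two timestamp columns and derives month, weekday, and hour columns from Start Time:

```python
import io

import pandas as pd

# Assumed filenames: only new_york_city.csv is named in the post.
CITY_DATA = {
    "chicago": "chicago.csv",
    "new york city": "new_york_city.csv",
    "washington": "washington.csv",
}


def load_data(source):
    """Read one city's CSV (path or file-like object) and add time columns."""
    df = pd.read_csv(source)
    # Parse the timestamps so the .dt accessor is available
    df["Start Time"] = pd.to_datetime(df["Start Time"])
    df["End Time"] = pd.to_datetime(df["End Time"])
    # Derived columns used by the "popular times" statistics
    df["month"] = df["Start Time"].dt.month_name()
    df["day_of_week"] = df["Start Time"].dt.day_name()
    df["hour"] = df["Start Time"].dt.hour
    return df


# Example with an in-memory sample row built from the example values above
sample = io.StringIO(
    "Start Time,End Time,Trip Duration,Start Station,End Station,User Type\n"
    "2017-01-01 00:07:57,2017-01-01 00:20:53,776,"
    "Broadway & Barry Ave,Sedgwick St & North Ave,Subscriber\n"
)
df = load_data(sample)
print(df[["month", "day_of_week", "hour"]])
```

Passing an `io.StringIO` sample keeps the example runnable without the real CSV files; with the real data you would pass a path such as `CITY_DATA["chicago"]` instead.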

Data for the first 10 rides in the new_york_city.csv file

Statistics to compute:

In this project, I wrote code to provide the following information:

#1 Popular times of travel (i.e., the values that occur most often in the start times)

  • most common month
  • most common day of the week
  • most common hour of the day
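Each of these "most common" values is the mode of a column derived from Start Time. A minimal sketch, assuming pandas and the Start Time format shown above; `time_stats` is a hypothetical helper name and the three sample trips are made up:

```python
import pandas as pd


def time_stats(df):
    """Most common month, day of week, and start hour (the modes)."""
    start = pd.to_datetime(df["Start Time"])
    return {
        "month": start.dt.month_name().mode().iloc[0],
        "day_of_week": start.dt.day_name().mode().iloc[0],
        "hour": int(start.dt.hour.mode().iloc[0]),
    }


# Made-up sample trips
trips = pd.DataFrame({"Start Time": [
    "2017-01-01 00:07:57",  # January, Sunday, hour 0
    "2017-01-02 08:15:00",  # January, Monday, hour 8
    "2017-02-06 08:45:00",  # February, Monday, hour 8
]})
print(time_stats(trips))
# → {'month': 'January', 'day_of_week': 'Monday', 'hour': 8}
```

If there is a tie, `mode()` returns all tied values in sorted order and this sketch simply takes the first one.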

#2 Popular stations and trip

  • most common start station
  • most common end station
  • most common trip from start to end (i.e., most frequent combination of start station and end station)
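The start and end stations are again simple modes, but the most common trip needs the two columns counted together. One way to sketch this with pandas, assuming the column names above (`station_stats` and the single-letter station names are hypothetical), is to group by both stations and take the largest group:

```python
import pandas as pd


def station_stats(df):
    """Most common start station, end station, and (start, end) trip."""
    return {
        "start": df["Start Station"].mode().iloc[0],
        "end": df["End Station"].mode().iloc[0],
        # Count every (start, end) pair; idxmax returns the pair with the largest count
        "trip": df.groupby(["Start Station", "End Station"]).size().idxmax(),
    }


# Made-up sample with placeholder station names
trips = pd.DataFrame({
    "Start Station": ["A", "A", "A", "D", "E"],
    "End Station": ["B", "B", "C", "C", "C"],
})
print(station_stats(trips))
# → {'start': 'A', 'end': 'C', 'trip': ('A', 'B')}
```

In this sample the most common trip (A to B) does not end at the most common end station (C), which is why the pair has to be counted jointly rather than assembled from the two separate modes.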

#3 Trip duration

  • total travel time
  • average travel time
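Because Trip Duration is stored in seconds, total and average travel time reduce to a sum and a mean over that column. A sketch assuming pandas; `trip_duration_stats` and `format_seconds` are hypothetical helpers, the latter converting seconds to hours, minutes, and seconds with `divmod`:

```python
import pandas as pd


def trip_duration_stats(df):
    """Total and mean of Trip Duration, which is measured in seconds."""
    return df["Trip Duration"].sum(), df["Trip Duration"].mean()


def format_seconds(seconds):
    """Render a number of seconds as hours, minutes, and seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes}m {secs}s"


trips = pd.DataFrame({"Trip Duration": [776, 1200, 424]})  # made-up values
total, mean = trip_duration_stats(trips)
print("Total travel time:", format_seconds(total))   # 0h 40m 0s
print("Average travel time:", format_seconds(mean))  # 0h 13m 20s
```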

#4 User info

  • counts of each user type
  • counts of each gender (only available for NYC and Chicago)
  • earliest, most recent, most common year of birth (only available for NYC and Chicago)
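Since the Washington file lacks the Gender and Birth Year columns, each of those statistics can be guarded by a column check. A minimal sketch assuming pandas; `user_stats` is a hypothetical helper, and returning `None` for a missing column is an assumed convention, not necessarily how the project reports it. Blank birth years are dropped first, and because pandas reads an integer column containing blanks as floats, the years are cast back to `int`:

```python
import pandas as pd


def user_stats(df):
    """User-type counts, plus gender and birth-year stats when those columns exist."""
    stats = {"user_types": df["User Type"].value_counts().to_dict()}
    # Gender is absent from the Washington data
    if "Gender" in df.columns:
        stats["genders"] = df["Gender"].value_counts().to_dict()
    else:
        stats["genders"] = None
    # Birth Year is absent from the Washington data
    if "Birth Year" in df.columns:
        years = df["Birth Year"].dropna()
        stats["birth_year"] = {
            "earliest": int(years.min()),
            "most_recent": int(years.max()),
            "most_common": int(years.mode().iloc[0]),
        }
    else:
        stats["birth_year"] = None
    return stats


# Made-up samples: one with the extra columns, one without (like Washington)
with_extras = pd.DataFrame({
    "User Type": ["Subscriber", "Customer", "Subscriber", "Subscriber"],
    "Gender": ["Male", "Female", "Male", None],
    "Birth Year": [1985.0, 1992.0, 1985.0, None],
})
without_extras = with_extras[["User Type"]]
print(user_stats(with_extras))
print(user_stats(without_extras))
```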

Tools used:

  • Python 3, NumPy, and pandas installed using Anaconda
  • A text editor (Atom)
  • A terminal application (Git Bash)

Results:

In this project, I wrote code that provides all of the required information, and it handles the absence of the Gender and Birth Year columns in the Washington data.

I used descriptive statistics to answer the questions posed about the data. Raw data is displayed on request: the script asks the user whether they want to see 5 lines of raw data, displays those lines if the answer is "yes", and keeps prompting and displaying until the user answers "no".
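The raw-data prompt loop can be sketched as follows, assuming the data is already in a pandas DataFrame. `show_raw_data` and its `ask`/`show` parameters are hypothetical; they default to `input` and `print` but can be swapped for scripted functions, as in the example run. This sketch treats any answer other than "yes" as "no" and also stops once every row has been shown:

```python
import pandas as pd


def show_raw_data(df, ask=input, show=print, step=5):
    """Show `step` rows at a time for as long as the user answers "yes"."""
    start = 0
    while start < len(df):
        answer = ask(f"Would you like to see {step} lines of raw data? (yes/no) ")
        # Any answer other than "yes" ends the loop
        if answer.strip().lower() != "yes":
            break
        show(df.iloc[start:start + step])
        start += step


# Example run with scripted answers instead of keyboard input
answers = iter(["yes", "yes", "no"])
shown = []
show_raw_data(pd.DataFrame({"row": range(12)}),
              ask=lambda prompt: next(answers), show=shown.append)
print(len(shown))  # 2 chunks of 5 rows were shown before "no"
```

In an interactive script the call would simply be `show_raw_data(df)`, reading answers from the terminal.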

You can find the full results of this project here.

   A sample of the results
