Saturday, January 15, 2022

Soccer Dataset Analysis & Visualization


The soccer dataset is stored as a SQL database with 7 tables. The tables hold data on matches, teams and their attributes, and players and their attributes. I started by listing the table names along with their numbers of rows and columns.
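As a rough illustration of that first step, here is a minimal sketch that lists every table with its row and column counts. It assumes the database is a SQLite file; the in-memory database, the table names, and the columns below are made-up stand-ins, not the real schema.

```python
import sqlite3

# Hypothetical stand-in for the real database file: an in-memory SQLite
# database with two small tables (names, columns, and rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Player (id INTEGER, player_name TEXT, height REAL, weight INTEGER);
    CREATE TABLE Team (id INTEGER, team_long_name TEXT);
    INSERT INTO Player VALUES (1, 'A', 180.3, 165), (2, 'B', 175.2, 150);
    INSERT INTO Team VALUES (1, 'X');
""")


def table_shapes(conn):
    """Return {table_name: (n_rows, n_columns)} for every user table."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    shapes = {}
    for name in names:
        n_rows = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        n_cols = len(conn.execute(f'PRAGMA table_info("{name}")').fetchall())
        shapes[name] = (n_rows, n_cols)
    return shapes


shapes = table_shapes(conn)
for name, (n_rows, n_cols) in shapes.items():
    print(f"{name}: {n_rows} rows x {n_cols} columns")
```

With a real file, only the `sqlite3.connect(...)` argument would change to the path of the database.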



I'll explore these tables, select the columns that matter to me, and start my analysis journey.

Questions:
1- Is there any relationship between the height & the weight of the players?
2- Who is the best finisher and who is the fastest player?
3- Is there any relationship between the finishing score & the penalties score of the players?
4- What is the preferred foot of the players?
5- What is the relationship between a player's age and his overall rating?
6- What is the percentage of each attack & defense work rate?
7- What is the distribution of players' ages, taking the preferred foot into consideration?
8- Which league has the maximum & minimum goals?
9- Which team scored the most goals at home during our timeframe?
10- Which teams improved their defense over the time period in the Switzerland Super League?


Solution:
I used Pandas & NumPy to wrangle the dataset first and came up with about 20 points to be cleaned, which I put in a cleaning summary. Then I started my exploratory analysis to find some statistics about the numerical columns of the data. Finally, I answered the pre-specified questions and used Seaborn & Matplotlib to plot charts that convey my messages.

Insights & Conclusions:
We have a player who is 49 years old. The shortest player is 157.48 cm. Weight ranges between 117 & 243 lbs. Height, weight, and age are each normally distributed. Most of the players' ages are between 25 & 30.

There is a clear, strong positive trend between the height & weight of the players, with a correlation coefficient of 0.77. We can neglect the relationship of age with weight & height; it is positive but really weak.

Lionel Messi is the best finisher with a score of 97. David Odonkor is the fastest player with a score of 97. There is a strong positive relationship between the finishing & penalties scores of the players. A few players have very low finishing scores and significantly high penalties scores, and vice versa.

76% of the players prefer the right foot. There may be high demand for left-footed players.

Players with an overall rating of 90+ fall between 22 & 35 years. I can see two trends above a rating of 70: an increasing one from age 16 to 28, then a decreasing one from 28 to 44.

The medium work rate has the largest percentage in both attack & defense. The high work rate has a bigger percentage than the low work rate in both attack & defense. The high attacking work rate is more common than the high defensive work rate. I can see that the two feet have the same age distribution.

It makes sense that most of the matches have home and away goals <= 2. The max is significantly high; it must be a powerful team. Spain LIGA BBVA has the maximum goals with 8412 goals and Switzerland Super League has the minimum goals with 4166. It is clear that each league's teams score more goals at home.

All team attribute minimums are around the 20s and no maximum exceeds 80. All teams that had defense scores above 53 went down over the years. The progress made by FC Sion was remarkable, from 40 to 50. FC Basel can be considered the best at improving because it kept its place.
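The height and weight correlation reported above is the kind of number pandas gives with a single call. The sketch below shows the idea on a handful of made-up rows; the column names `height` and `weight` are assumptions, and the values are not taken from the real dataset, so the printed coefficient will not be 0.77.

```python
import pandas as pd

# Made-up sample rows for illustration only (not values from the dataset).
players = pd.DataFrame({
    "height": [170.18, 175.26, 180.34, 185.42, 190.50, 193.04],  # cm
    "weight": [150, 159, 168, 181, 190, 203],                    # lbs
})

# Pearson correlation coefficient between the two columns.
r = players["height"].corr(players["weight"])
print(f"height vs. weight correlation: {r:.2f}")
```

A value close to 1 means taller players in the sample tend to be heavier; a value near 0 would mean no linear relationship.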












Saturday, June 5, 2021

Explore US Bikeshare Data - Project 1 in Data Analysis

In this project, I used Python to explore data related to bike share systems for three major cities in the United States: Chicago, New York City, and Washington. I wrote code to import the data and answer interesting questions about it by computing descriptive statistics. I also wrote a script that takes in raw input to create an interactive experience in the terminal to present these statistics.

 

Divvy is a bicycle sharing system in the City of Chicago and two adjacent suburbs (image: Wikipedia)

The Datasets

Randomly selected data for the first six months of 2017 are provided for all three cities. All three of the data files contain the same core six (6) columns:

  • Start Time (e.g., 2017-01-01 00:07:57)
  • End Time (e.g., 2017-01-01 00:20:53)
  • Trip Duration (in seconds - e.g., 776)
  • Start Station (e.g., Broadway & Barry Ave)
  • End Station (e.g., Sedgwick St & North Ave)
  • User Type (Subscriber or Customer)

The Chicago and New York City files also have the following two columns:

  • Gender
  • Birth Year

Data for the first 10 rides in the new_york_city.csv file

Statistics needed to be computed:

In this project, I wrote code to provide the following information:

#1 Popular times of travel (i.e., the values that occur most often in the start time)

  • most common month
  • most common day of the week
  • most common hour of the day
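These three statistics can be sketched with pandas datetime accessors as follows. The rows are made up for illustration; only the `Start Time` column name comes from the dataset description.

```python
import pandas as pd

# Made-up rides; only the "Start Time" column name matches the dataset.
rides = pd.DataFrame({"Start Time": [
    "2017-01-01 00:07:57",
    "2017-01-02 08:15:00",
    "2017-01-09 08:45:10",
    "2017-02-06 17:30:00",
]})
start = pd.to_datetime(rides["Start Time"])

# mode() returns the most frequent value(s); [0] takes the first one.
common_month = start.dt.month_name().mode()[0]
common_day = start.dt.day_name().mode()[0]
common_hour = start.dt.hour.mode()[0]
print(common_month, common_day, common_hour)
```

When several values tie for most frequent, `mode()` returns all of them in sorted order, so `[0]` picks the first of the tied values.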

#2 Popular stations and trip

  • most common start station
  • most common end station
  • most common trip from start to end (i.e., most frequent combination of start station and end station)
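A minimal sketch of the station statistics, assuming the `Start Station` and `End Station` columns described earlier. The station names and rows here are made up.

```python
import pandas as pd

# Made-up rides with hypothetical station names.
rides = pd.DataFrame({
    "Start Station": ["A St", "A St", "B Ave", "A St"],
    "End Station":   ["B Ave", "B Ave", "A St", "C Rd"],
})

common_start = rides["Start Station"].mode()[0]
common_end = rides["End Station"].mode()[0]

# Count each (start, end) pair and pick the most frequent combination.
trip_counts = rides.groupby(["Start Station", "End Station"]).size()
common_trip = trip_counts.idxmax()  # a (start, end) tuple
print(common_start, common_end, common_trip)
```

Grouping on both columns keeps the direction of the trip, so A to B and B to A are counted as different trips.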

#3 Trip duration

  • total travel time
  • average travel time

#4 User info

  • counts of each user type
  • counts of each gender (only available for NYC and Chicago)
  • earliest, most recent, most common year of birth (only available for NYC and Chicago)

Tools used:

  • Python 3, NumPy, and pandas installed using Anaconda
  • A text editor (Atom).
  • A terminal application (Git Bash).

Results:

In this project, I wrote code to provide the required information and appropriately handled the unavailability of the gender and birth year columns in the Washington data.
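One way to handle the missing columns is to check for them before computing anything. This is a sketch under that assumption, not necessarily how the project script does it; `user_stats` is a hypothetical helper and the rows are made up.

```python
import pandas as pd


def user_stats(df):
    """Summarise user columns, skipping any that the city file lacks."""
    stats = {"user_types": df["User Type"].value_counts().to_dict()}
    if "Gender" in df.columns:
        stats["genders"] = df["Gender"].value_counts().to_dict()
    if "Birth Year" in df.columns:
        years = df["Birth Year"].dropna()  # ignore rides with no birth year
        stats["birth_year"] = {
            "earliest": int(years.min()),
            "most_recent": int(years.max()),
            "most_common": int(years.mode()[0]),
        }
    return stats


# Made-up rows: one frame with all columns, one without the optional
# columns (like the Washington file).
chicago_like = pd.DataFrame({
    "User Type": ["Subscriber", "Customer", "Subscriber", "Subscriber"],
    "Gender": ["Male", None, "Female", "Female"],
    "Birth Year": [1985.0, None, 1992.0, 1992.0],
})
washington_like = chicago_like[["User Type"]]

print(user_stats(chicago_like))
print(user_stats(washington_like))
```

The second call returns only the user type counts, because the other two columns are absent.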

I used descriptive statistics to answer the questions posed about the data. Raw data is displayed upon request: the script asks the user whether they want to see 5 lines of raw data, displays them if the answer is "yes", and repeats the prompt and display until the user says "no".
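The prompt loop can be sketched like this. `show_raw_data` is a hypothetical helper, not the project's actual function; it takes the prompt and display functions as parameters so the demo can run with scripted answers instead of waiting on the keyboard.

```python
import pandas as pd


def show_raw_data(df, ask=input, show=print, step=5):
    """Show `step` rows at a time while the user keeps answering 'yes'."""
    start = 0
    while start < len(df):
        answer = ask(f"Show {step} lines of raw data? (yes/no): ")
        if answer.strip().lower() != "yes":
            break
        show(df.iloc[start:start + step])
        start += step
    return min(start, len(df))  # number of rows displayed


# Demo on made-up data with scripted answers in place of keyboard input.
demo = pd.DataFrame({"Trip Duration": range(12)})
answers = iter(["yes", "YES", "no"])
shown = []
rows_seen = show_raw_data(demo, ask=lambda prompt: next(answers), show=shown.append)
print(f"displayed {rows_seen} rows in {len(shown)} chunks")
```

In the terminal, calling `show_raw_data(df)` with the defaults would prompt with `input` and display with `print`. The loop also stops on its own once the data runs out.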

You can find the full results of this project here.

   A sample of the results

Saturday, April 3, 2021

The golden triangle to improve your business


I used to work as a sales engineer in the heavy equipment field; I enjoyed this career and really loved it. As you all know, the main goal of a salesperson is to find a customer who is interested in the product or service he sells, then go through the customer journey with him.

The first approach I learned for finding customers was cold calls 📱, which are very boring for you and maybe annoying for the prospective customer. After that I learned about exhibitions, which are much better than cold calls, but showing your products at an exhibition still costs time and money.

And here I started asking myself: there must be an easier way to find customers 😏…

Day by day, tactics for finding customers improved to save money and time and to connect businesses with their real customers, and that was digital marketing 📣. At this point I decided to learn more about this field to advance my career, which progressed to business development manager. That was a perfect decision for my career and my personal life as well; I started feeling like a new version of me who can think outside the box and reach special customers in a timely, effective way and with better results.

Digital marketing is an amazing field to learn about even if you are not working as a salesperson; it can help you market yourself, your own business, your services as a freelancer, etc. To improve your marketing work, you have to analyze the insights and then retarget the audience, change your content, or restart from the beginning.

At that moment, I realized that data analysis would complete the triangle of success from my point of view (Digital Marketing, Sales and Data Analysis). I also started taking data analysis courses with Udacity and really found my passion in working with data and numbers. I finished the three tracks of data analysis and got familiar with new techniques that enabled me to understand the insights I got from sales and marketing, and to create professional reports with proper graphs and charts that tell a story and lead decision makers to decide based on numbers, not just suggestions. Then I took a look at the digital marketing content in the FWD scholarship powered by Udacity and found lessons that were new to me and had not been available in my last digital marketing course. I saw that as a perfect opportunity to master these skills (E-mail Marketing, SEO and SEM).

 

Now I’m going through the professional track of digital marketing and hope to reach the advanced track soon.

My final words: the golden triangle for developing a business consists of Digital Marketing, Sales, and Data Analysis.

Good Luck 😉😉
