malmusfer
Experienced engineer fascinated by technical transformation and data analysis. Skilled in project management and business development.
Riyadh
Pinned Repositories
Animated-plots
This is a mini project in which I illustrate the use of the animation functionality in Matplotlib to communicate findings. The project uses COVID-19 data for the demonstration.
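A minimal sketch of the kind of animation the project builds, using Matplotlib's `FuncAnimation` with a made-up cumulative series standing in for the real COVID-19 data:

```python
# Sketch only: the case counts below are invented for illustration.
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs headlessly
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

cases = [0, 5, 12, 30, 55, 90]  # hypothetical cumulative case counts

fig, ax = plt.subplots()
line, = ax.plot([], [])
ax.set_xlim(0, len(cases) - 1)
ax.set_ylim(0, max(cases))

def update(frame):
    # Reveal one more data point per frame.
    line.set_data(range(frame + 1), cases[:frame + 1])
    return line,

anim = FuncAnimation(fig, update, frames=len(cases), interval=200)
# anim.save("cases.gif", writer="pillow")  # one way to export, if Pillow is installed
```

Each frame redraws the line with one additional point, which is the usual pattern for "growing" time-series animations.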
Arabic-labeling
This is a mini project in which I illustrate the use of the arabic-reshaper library for labeling plots, to communicate findings to a specific audience. The project uses randomly generated data for the demonstration.
Bike_sharing_data_analysis
# 2019 Metro Bike Share Data Exploration and Visualization

## Dataset

[Metro Bike Share](https://en.wikipedia.org/wiki/Metro_Bike_Share) is a bicycle-sharing system in the Los Angeles, California metropolitan area. The service was launched on July 7, 2016. It is administered by the Los Angeles County Metropolitan Transportation Authority (Metro) and is operated by Bicycle Transit Systems. The system uses a fleet of about 1,400 bikes and includes 93 stations in Downtown Los Angeles, Venice, and the Port of Los Angeles. The dataset used for this exploratory analysis consists of [monthly individual trip data](https://bikeshare.metro.net/about/data/) from January 2019 to December 2019 in CSV format.

##### Data wrangling process:

- Fix multiple fields that are not the correct dtype, e.g. `start_time` and `end_time` should be datetime, `passholder_type` should be categorical, etc.
- Add new columns for the day of the week and the month.
- Filter out outlier trip records with very long durations.

## Summary of Findings

The riders can be divided into two groups: loyal customers (annual and monthly pass holders) and regular customers (single-pass holders). In general, there were more trips on workdays (Mon-Fri) than on weekends. Summer was the most popular season of the year, likely due to the weather. Trips tend to be shorter Monday through Friday than on weekends, indicating stable, efficient use of the bike-sharing system on normal workdays and more casual, flexible use on weekends. Monthly pass holders are the main end users, with the highest number of trips per year, and more than half of all trips were made on standard bikes. Indeed, more than two-thirds of trips were one-way trips rather than round trips.

## Key Insights for Presentation

Different usage patterns and habits emerge between the rider types. Loyal customers (annual and monthly pass holders) use the system heavily on workdays, i.e. Monday through Friday, whereas regular customers ride mostly on weekends. The short, efficient trips of loyal customers are concentrated in weekday rush hours, indicating that their use is primarily for the work commute. The more relaxed, flexible pattern of regular customers, concentrated on weekends, suggests they take advantage of the bike-sharing system quite differently, probably for city tours or leisure.
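The wrangling steps listed above can be sketched in pandas; the column names follow the Metro Bike Share CSVs, while the two-row frame and the 120-minute outlier threshold are invented for illustration:

```python
import pandas as pd

# Toy frame standing in for one month of Metro Bike Share trip records.
trips = pd.DataFrame({
    "start_time": ["2019-01-07 08:05:00", "2019-06-15 14:30:00"],
    "end_time":   ["2019-01-07 08:20:00", "2019-06-15 20:30:00"],
    "passholder_type": ["Monthly Pass", "Walk-up"],
    "duration": [15, 360],  # minutes
})

# Fix dtypes: timestamps to datetime, passholder_type to categorical.
trips["start_time"] = pd.to_datetime(trips["start_time"])
trips["end_time"] = pd.to_datetime(trips["end_time"])
trips["passholder_type"] = trips["passholder_type"].astype("category")

# Add day-of-week and month columns derived from the start time.
trips["day_of_week"] = trips["start_time"].dt.day_name()
trips["month"] = trips["start_time"].dt.month_name()

# Filter out outlier trips with very long durations (threshold is illustrative).
trips = trips[trips["duration"] <= 120]
```

The categorical dtype keeps `passholder_type` memory-efficient and gives plots a fixed category order; the derived columns make the weekday/weekend and seasonal comparisons straightforward `groupby` operations.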
Capstone-Project-Notebook
Enron_fraud_regression
This is a mini-project from Udacity's Intro to Machine Learning course: an exploratory analysis of the Enron fraud data.
finding_donors
fuel_consumption_Regression_Modeling
About this Notebook: In this notebook, we learn how to use scikit-learn to implement simple linear regression. We download a dataset on the fuel consumption and carbon dioxide emissions of cars. We then split the data into training and test sets, create a model using the training set, evaluate the model on the test set, and finally use the model to predict an unknown value.
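The split/fit/evaluate workflow the notebook describes looks roughly like this, with a synthetic engine-size vs. CO2 relationship standing in for the downloaded dataset:

```python
# Sketch only: the data and the 20*x + 100 relationship are invented.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
engine_size = rng.uniform(1.0, 5.0, size=(100, 1))               # single feature
co2 = 20.0 * engine_size[:, 0] + 100.0 + rng.normal(0, 2, 100)   # noisy target

# Hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    engine_size, co2, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
score = r2_score(y_test, model.predict(X_test))  # evaluate on unseen data
```

Evaluating on the held-out test set, rather than the training set, is what makes the R² score an honest estimate of how the model will do on unknown values.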
Project-Submission-Project-2
UDACITY Data Analysis Nanodegree Program - Project 1 (Project: Investigate a Dataset - No Show Appointments)
Project3_A_B_test
For this project, we will be working to understand the results of an A/B test run by an e-commerce website. The company has developed a new web page in order to try and increase the number of users who "convert," meaning the number of users who decide to pay for the company's product. Our goal is to work through this notebook to help the company understand if they should implement this new page, keep the old page, or perhaps run the experiment longer to make their decision.
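The decision described above typically comes down to a two-proportion z-test on conversion rates. A stdlib-only sketch, with made-up counts in place of the project's real data:

```python
# Sketch: compare conversion on the old page (a) vs. the new page (b).
from math import erf, sqrt

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: the conversion rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 12.0% vs. 12.6% conversion over 10,000 users each.
z, p = two_proportion_z(conv_a=1200, n_a=10000, conv_b=1260, n_b=10000)
```

If `p` stays above the chosen significance level (commonly 0.05), the data does not justify switching pages, which is exactly the "keep the old page or run the experiment longer" branch of the decision.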
Twitter_Data_Wrangling
I completed this project as part of Udacity's Data Analyst Nanodegree. The project is based around the "WeRateDogs" Twitter page, a page which kindly rates pictures and videos of dogs out of ten. Since dogs are all-round fantastic creatures, all of WeRateDogs' ratings are above ten. They also tag each dog with a category out of "doggo", "floofer", "pupper", or "puppo". An archive of this Twitter data for WeRateDogs' tweets was provided for this project as a CSV file. Two more sources of data were also gathered as part of this project: predictions of which type of dog is present in each picture (carried out previously, not by myself, by passing the images through an image classification algorithm) and additional tweet information acquired from Twitter. I approached this project using the three steps of data wrangling: gather, assess, clean. In the gather phase, the image prediction data was downloaded using Python's Requests library, and the additional Twitter information (i.e. retweet and favorite counts) was downloaded using the Twitter API. In the following assess step, I inspected the generated data frames to find any quality or tidiness issues. The cleaning step then involved implementing steps to fix the quality and tidiness issues that were identified. Following the data wrangling process, some exploration and analysis of the (now clean and tidy) data was carried out, and numerous interesting results were observed.
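One classic tidiness issue in this archive is that the four dog-stage categories are spread across four columns. A pandas sketch of collapsing them into one column (the toy rows are invented; the column names follow the archive):

```python
import pandas as pd

# Toy frame mimicking the archive's four stage columns: each column repeats
# its own name for matching rows and holds the string "None" otherwise.
tweets = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "doggo":   ["doggo", "None", "None"],
    "floofer": ["None", "None", "None"],
    "pupper":  ["None", "pupper", "None"],
    "puppo":   ["None", "None", "None"],
})

stage_cols = ["doggo", "floofer", "pupper", "puppo"]

# Collapse the four one-hot-style columns into a single 'stage' column.
tweets["stage"] = (
    tweets[stage_cols]
    .replace("None", pd.NA)
    .apply(lambda row: row.dropna().iloc[0] if row.notna().any() else pd.NA,
           axis=1)
)
tweets = tweets.drop(columns=stage_cols)
```

This is the "each variable forms a column" tidiness rule: one `stage` variable, one column, with missing stages as real NA values instead of the string "None".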
malmusfer's Repositories
malmusfer/fuel_consumption_Regression_Modeling
malmusfer/Arabic-labeling
malmusfer/Animated-plots
malmusfer/Bike_sharing_data_analysis
malmusfer/Capstone-Project-Notebook
malmusfer/Enron_fraud_regression
malmusfer/finding_donors
malmusfer/Project-Submission-Project-2
malmusfer/Project3_A_B_test
malmusfer/Twitter_Data_Wrangling
malmusfer/github-example
malmusfer/Identify_Customer_Segments
malmusfer/Image_classifer
Intro to Machine Learning - TensorFlow Project. Project code for Udacity's Intro to Machine Learning with TensorFlow Nanodegree program. In this project, you will first develop code for an image classifier built with TensorFlow, then convert it into a command-line application. To complete this project, you will need to use the GPU-enabled workspaces within the classroom. The files are all available here for your convenience, but running on your local CPU will likely not work well. You should also only enable the GPU when you need it; if you are not using the GPU, please disable it so you do not run out of time. Data: The data for this project is quite large - in fact, so large that it cannot be uploaded to GitHub. If you would like the data for this project, you will want to download it from the workspace in the classroom, though actually completing the project is likely not possible on your local machine unless you have a GPU. You will be training on 102 different types of flowers, with roughly 20 images per flower to train on. You will then use your trained classifier to see if you can predict the type of new flower images.
malmusfer/Project-Submission--Project-1
UDACITY Data Analysis Nanodegree Program - Project 1 (Exploring Weather Trends)
malmusfer/scraping-wiki--Toronto_Postcode
For this assignment, you will be required to explore and cluster the neighborhoods in Toronto. Start by creating a new Notebook for this assignment. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data in the table of postal codes and transform it into a pandas dataframe like the one shown below. To create the dataframe:

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.
- Only process the cells that have an assigned borough; ignore cells with a borough that is Not assigned.
- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated by a comma, as shown in row 11 of the above table.
- If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making. In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe. Submit a link to your Notebook on your GitHub repository. (10 marks)

Note: There are different web-scraping libraries and packages in Python. For scraping the above table, you can simply use pandas to read the table into a pandas dataframe. Another way, worth learning for more complicated cases of web scraping, is the BeautifulSoup package. Here is the package's main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/ The package is so popular that there is a plethora of tutorials and examples on how to use it. Here is a very good YouTube video on how to use the BeautifulSoup package: https://www.youtube.com/watch?v=ng2o98k983k Use pandas, the BeautifulSoup package, or any other way you are comfortable with to transform the data in the table on the Wikipedia page into the above pandas dataframe.
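Once the table is scraped, the three cleaning rules can be sketched in pandas; the toy rows below stand in for the real scraped data:

```python
import pandas as pd

# Toy rows mimicking the scraped Wikipedia table (invented for illustration).
raw = pd.DataFrame({
    "PostalCode":   ["M5A", "M5A", "M7A", "M9Z"],
    "Borough":      ["Downtown Toronto", "Downtown Toronto",
                     "Queen's Park", "Not assigned"],
    "Neighborhood": ["Harbourfront", "Regent Park",
                     "Not assigned", "Not assigned"],
})

# 1. Ignore cells whose borough is "Not assigned".
df = raw[raw["Borough"] != "Not assigned"].copy()

# 2. A "Not assigned" neighborhood takes the borough's name.
mask = df["Neighborhood"] == "Not assigned"
df.loc[mask, "Neighborhood"] = df.loc[mask, "Borough"]

# 3. Combine neighborhoods sharing a postal code into one comma-separated row.
df = (df.groupby(["PostalCode", "Borough"], as_index=False)["Neighborhood"]
        .agg(", ".join))
```

The `groupby(...).agg(", ".join)` step is what merges the two M5A rows into a single "Harbourfront, Regent Park" row, as the assignment requires.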
malmusfer/scraping-wiki--Toronto_Postcode-
malmusfer/SQL-Query-a-Digital-Music-Store
For this mini-project, we will help the Chinook team understand the media in their store, their customers and employees, and their invoice information.
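A sketch of the kind of invoice query the project runs, using an in-memory SQLite database with a pared-down, Chinook-style schema (the table names echo Chinook, but the rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE invoices  (invoice_id INTEGER PRIMARY KEY,
                            customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'USA'), (2, 'Canada'), (3, 'USA');
    INSERT INTO invoices  VALUES (1, 1, 9.90), (2, 2, 5.94), (3, 3, 8.91),
                                 (4, 1, 1.98);
""")

# Total invoice value per country, highest first.
rows = conn.execute("""
    SELECT c.country, ROUND(SUM(i.total), 2) AS revenue
    FROM invoices i
    JOIN customers c ON c.customer_id = i.customer_id
    GROUP BY c.country
    ORDER BY revenue DESC
""").fetchall()
```

Swapping in the real `chinook.db` file for `":memory:"` and the full table names is all that changes for the actual project queries.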