Notes:

## FinalCabData.csv is a merged version of all the data; it is not posted here because it is too large. ##

----------------------------------------------------------------------------------------------------------------

The Client

XYZ is a private firm in the US. Due to remarkable growth in the cab industry in the last few years and multiple key players in the market, it is planning an investment in the cab industry. As per its Go-to-Market (G2M) strategy, it wants to understand the market before making a final decision.

Project delivery:

You have been provided with multiple data sets that contain information on 2 cab companies. Each file (data set) represents a different aspect of the customer profile. XYZ is interested in using your actionable insights to help identify the right company for its investment.

The outcome of your delivery will be a presentation to XYZ’s Executive team. This presentation will be judged on the visuals provided, the quality of your analysis, and the value of your recommendations and insights.

Data Set:

You have been provided 4 individual data sets. The time period of the data is from 31/01/2016 to 31/12/2018.

Below is the list of data sets provided for the analysis:

- Cab_Data.csv: details of transactions for the 2 cab companies
- Customer_ID.csv: a mapping table that contains a unique identifier which links the customer’s demographic details
- Transaction_ID.csv: a mapping table that contains the transaction-to-customer mapping and payment mode
- City.csv: a list of US cities, their population, and number of cab users

You should fully investigate and understand each data set.
- Review the source documentation
- Understand the field names and data types
- Identify relationships across the files
- Field/feature transformations
- Determine which files should be joined versus which ones should be appended
- Create master data and explain the relationship
- Identify and remove duplicates
- Perform other analysis like NA value and outlier detection

However many slides you prepare, be creative and come up with meaningful insights:

The idea is to create a hypothesis, engage with the data, think critically, and use various analytical approaches to produce unique insights.

You are not limited to only utilizing the data you have been provided. We encourage you to find third-party data sets which correspond to the overall theme and geographical properties of the data provided. For example, you can leverage US holiday data or weather data. Also, research the overall cab industry in the US and try to relate it to the trends in the data.

Analysis

Create multiple hypotheses and investigate:

You will need to generate 5-7 hypotheses initially to investigate, as some will not prove what you are expecting. For example: “Is there any seasonality in the number of customers using the cab service?”

Areas to investigate:

- Which company has the maximum number of cab users in a particular time period?
- Does margin increase proportionally with the number of customers?
- What are the attributes of these customer segments?

Although not required, we encourage you to document the process and findings:

- What is the business problem?
- What are the properties of the data provided (data intake report)?
- What steps did you take in order to create an applicable data set?
- How did you prepare and perform your analysis?
- What type of analysis did you perform?
- Why did you choose to use certain analytical techniques over others?
- What were the results?
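The data preparation steps listed earlier (joining the files, removing duplicates, checking for NA values) and the seasonality question could be sketched roughly as below. This is a minimal illustration, not the notebook's actual code: the small in-memory tables stand in for the four CSV files, and every column name, company name, city name, and value is an assumption made for the example.

```python
import pandas as pd

# Tiny stand-ins for the four CSV files. All column names and values are
# illustrative assumptions, not the real schema.
cab = pd.DataFrame({
    "Transaction ID": [1, 2, 2, 3],  # transaction 2 is duplicated on purpose
    "Date of Travel": ["2016-01-31", "2016-02-15", "2016-02-15", "2017-02-03"],
    "Company": ["Company A", "Company B", "Company B", "Company B"],
    "City": ["City X", "City X", "City X", "City Y"],
    "Price Charged": [300.0, 450.0, 450.0, None],  # one missing value on purpose
    "Cost of Trip": [250.0, 320.0, 320.0, 200.0],
})
transactions = pd.DataFrame({
    "Transaction ID": [1, 2, 3],
    "Customer ID": [10, 11, 10],
    "Payment_Mode": ["Card", "Cash", "Card"],
})
customers = pd.DataFrame({
    "Customer ID": [10, 11],
    "Gender": ["Female", "Male"],
    "Age": [34, 52],
})
cities = pd.DataFrame({
    "City": ["City X", "City Y"],
    "Population": [1_000_000, 250_000],
    "Users": [120_000, 20_000],
})

# The files describe different aspects of the same rides, so they are joined
# on shared keys rather than appended row-wise. `validate` makes pandas raise
# an error if a key relationship is not what we expect.
master = (
    cab.drop_duplicates()  # remove exact duplicate transaction rows
       .merge(transactions, on="Transaction ID", how="left", validate="one_to_one")
       .merge(customers, on="Customer ID", how="left", validate="many_to_one")
       .merge(cities, on="City", how="left", validate="many_to_one")
)

# NA check: number of missing values per column.
na_counts = master.isna().sum()

# Seasonality check: number of rides per calendar month.
master["Date of Travel"] = pd.to_datetime(master["Date of Travel"])
rides_per_month = master.groupby(master["Date of Travel"].dt.month).size()
```

Left joins keep every transaction even when a lookup table has no match, so unmatched rows show up as NA values in the check instead of silently disappearing.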
Prepare a presentation that summarizes your analysis and recommendations, and identify which company is performing better and is the better investment opportunity for XYZ.

Deliverables of Week 2 are:

1. EDA Notebook (ipynb file)
2. Data Intake report (pdf file)
3. EDA recommendation and hypothesis results (these should be in the ipynb notebook; you do not need to present a separate document)

You can use either EDA alone or both modeling and EDA to deliver the result. Remember, there are no wrong answers as long as the data supports them.

Note: A sample presentation from a previous batch intern is added for your reference.
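One possible way to compare the two companies for the recommendation is a per-company, per-year margin summary. The sketch below is only illustrative: the data is made up, the column names are assumptions, and defining margin as price charged minus cost of trip is an assumption as well.

```python
import pandas as pd

# Hypothetical ride-level data; names and numbers are made up for illustration.
rides = pd.DataFrame({
    "Company": ["Company A", "Company A", "Company B", "Company B", "Company B"],
    "Year": [2016, 2017, 2016, 2017, 2017],
    "Price Charged": [300.0, 280.0, 450.0, 500.0, 420.0],
    "Cost of Trip": [250.0, 260.0, 320.0, 330.0, 300.0],
})

# Assumed definition: margin = price charged - cost of trip.
rides["Margin"] = rides["Price Charged"] - rides["Cost of Trip"]

# Ride count, total margin, and average margin per ride for each company and year.
summary = (
    rides.groupby(["Company", "Year"])
         .agg(
             rides=("Margin", "size"),
             total_margin=("Margin", "sum"),
             margin_per_ride=("Margin", "mean"),
         )
         .reset_index()
)
print(summary)
```

Looking at total margin and margin per ride side by side helps separate a company that earns more simply because it runs more rides from one that earns more on each ride.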
Gallo13/Insight-For-Cab-Investment
Data Glacier Week 2 Data Analysis and Business Analysis for cab companies for investor
Jupyter Notebook