Analyzing startup data helps us better understand which factors contribute to a startup's success or failure.
Analyzing a startup means evaluating various aspects of the company to determine its potential for success and its investment value. Several factors influence a company's success, such as its business model, market opportunity, management team, leadership, and how well it implements startup principles.
The goal of this analysis is to help investors, entrepreneurs, and other stakeholders make informed decisions about a startup's potential for success and return on investment.
- data: contains the raw dataset
- doc: contains the group presentation and the project dependency requirements
- notebook: contains the Jupyter Notebooks with each team member's contribution
- src: contains the Python scripts corresponding to the Jupyter Notebooks
Startup Analysis Dataset (https://www.kaggle.com/datasets/ajaygorkar/startup-analysis).
The dataset consists of a .csv file containing information about 472 startups, such as success status, industry, country of origin, and the founders' experience.
We perform our analysis on this dataset of 472 companies. The 115 attributes recorded for each company let us draw conclusions about what makes a startup successful.
In the presentation, we illustrate the analysis of startup success based on:
- Geography
- Business Strategy and Product Management
- Founders' Previous Experience
- Exposure Across the Globe
- Time to Market for Service or Product
- Google Page Rank of Company Website
- Number of Direct Competitors
- Last Round of Funding Received
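Each factor above comes down to comparing the share of successful startups across the groups of one attribute. A minimal pandas sketch of that comparison, assuming a status column whose successful rows are labeled "Success"; the function `success_rate_by` and the column names in the usage example are hypothetical placeholders, not the dataset's actual headers:

```python
import pandas as pd


def success_rate_by(df: pd.DataFrame, attribute: str, target: str,
                    success_label: str = "Success") -> pd.DataFrame:
    """For each value of `attribute`, count startups and the share labeled successful."""
    # Boolean Series: True where the (assumed) status column equals the success label.
    is_success = df[target].eq(success_label)
    grouped = is_success.groupby(df[attribute])
    summary = pd.DataFrame({
        "count": grouped.size(),         # number of startups in each group
        "success_rate": grouped.mean(),  # mean of booleans = fraction successful
    })
    return summary.sort_values("success_rate", ascending=False)
```

For example, `success_rate_by(df, "Country", "Status")` would rank countries by their fraction of successful startups, with group sizes alongside so that small groups are not over-interpreted.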
Because of the presentation's time limit, the slides do not cover every attribute. The full attribute analysis can be found in the Jupyter Notebooks.
We count the missing values for each attribute. If more than 30% of an attribute's values are missing, we drop that attribute. Otherwise, we replace the missing values with the mode for categorical attributes and with the mean for numerical attributes.
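The rule above can be sketched in pandas as follows. This is a minimal illustration, not the notebooks' actual code: it assumes the data is already loaded into a DataFrame and treats every non-numeric column as categorical. The function name `handle_missing_values` is hypothetical.

```python
import pandas as pd


def handle_missing_values(df: pd.DataFrame, threshold: float = 0.3) -> pd.DataFrame:
    """Drop sparse columns, then impute the rest (mode if categorical, mean if numeric)."""
    # Fraction of missing values in each column.
    missing_ratio = df.isna().mean()
    # Drop columns where more than `threshold` of the values are missing.
    kept = df.loc[:, missing_ratio <= threshold].copy()

    for col in kept.columns:
        if not kept[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(kept[col]):
            # Numerical attribute: fill gaps with the column mean.
            kept[col] = kept[col].fillna(kept[col].mean())
        else:
            # Assumption: any non-numeric column is categorical; fill with its mode.
            kept[col] = kept[col].fillna(kept[col].mode().iloc[0])
    return kept
```

Computing the missing ratio once, before any imputation, keeps the drop decision based on the raw data rather than on partially filled columns.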
We use box plots to visualize skew and compute the skewness of each attribute. For attributes with high skew, we handle outliers using quartiles: the interquartile range (IQR = Q3 - Q1) defines the lower and upper bounds of valid data, and values outside these bounds are filtered out.
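A sketch of the skewness check and IQR filter in pandas. The text does not state the exact bounds or skew cutoff, so this assumes the conventional fences Q1 - 1.5·IQR and Q3 + 1.5·IQR and treats |skewness| > 1 as "high"; both values are illustrative choices, and the function names are hypothetical.

```python
import pandas as pd


def highly_skewed_columns(df: pd.DataFrame, threshold: float = 1.0) -> list[str]:
    """Return numeric columns whose absolute sample skewness exceeds `threshold` (assumed cutoff)."""
    skewness = df.select_dtypes(include="number").skew()
    return skewness[skewness.abs() > threshold].index.tolist()


def remove_iqr_outliers(df: pd.DataFrame, columns: list[str], k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose values lie in [Q1 - k*IQR, Q3 + k*IQR] for every given column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        # k = 1.5 is the common box-plot whisker convention, assumed here.
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # Series.between() is inclusive on both ends by default.
        mask &= df[col].between(lower, upper)
    return df[mask]
```

Typical use is `remove_iqr_outliers(df, highly_skewed_columns(df))`, so that only the skewed attributes are filtered.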
We visualize the data with pie charts, bar plots, density distribution plots, and a multivariate heatmap.
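One way to build a multivariate heatmap is to plot the pairwise correlation matrix of the numeric attributes. The sketch below assumes Pearson correlation and uses matplotlib's `imshow`; the notebooks may use a different library or correlation measure, and `correlation_heatmap` is a hypothetical name.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def correlation_heatmap(df: pd.DataFrame):
    """Plot the Pearson correlation matrix of the numeric columns as a heatmap."""
    # Non-numeric columns are excluded before computing correlations.
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    # Fix the color scale to [-1, 1] so colors are comparable across plots.
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(im, ax=ax, label="Pearson correlation")
    fig.tight_layout()
    return fig, corr
```

Returning the matrix along with the figure lets the strongest correlations be inspected numerically as well as visually.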