This dataset provides comprehensive sales data for electronics products on Amazon. It includes user ratings, product categories, and the time of sale. This data is crucial for understanding consumer behavior and market trends in the electronics segment on Amazon. The dataset can be downloaded here.
Our aim is to use Python libraries such as Pandas, Numpy, Matplotlib, and Seaborn for an in-depth analysis of the sales data. The dataset contains extensive information on electronics store purchases, including monthly breakdowns, product types, costs, purchase addresses, and more. We will conduct this analysis in a Jupyter Notebook environment.
EDA helps us to understand the dataset by identifying patterns, anomalies, testing hypotheses, and validating assumptions using summary statistics and graphical representations.
- Import necessary libraries (Pandas, Numpy, Matplotlib, Seaborn).
- Load the dataset.
- Use
.head()
and.tail()
functions to view the first and last five rows. - Use
.shape
to understand the structure of the dataset. Our dataset has 1,292,954 observations and 10 features. - Examine the data types of each column and check for null values using
.info()
.
- User ID (int64)
- Product ID (object)
- Rating (int64)
- Timestamp (int64)
- Category (object)
Necessary conversions for accurate data representation (e.g., converting Timestamp from int64 to a proper timestamp format).
Using .describe()
to get a statistical overview of the dataset, including measures like mean, median, standard deviation, and percentile values.
Identifying the count of unique users and items.
Understanding why data might be missing and addressing these gaps is crucial for a comprehensive analysis.
Utilizing Matplotlib and Seaborn for graphical representation of data. Analyzing trends such as best year for sales, best month for sales, top-selling brands and products, category-wise sales performance, and product ratings.
Key findings like the peak sales year, top-selling categories and brands, sales trends over the years, and consumer ratings.
Data Science: Python, Pandas, Numpy. Data Visualization: Matplotlib, Seaborn. Environment: Jupyter Notebook.
Note: Replace [link-to-data-set]
with the actual URL where the dataset can be downloaded. Ensure that all analysis steps and conclusions are based on the actual data available in the dataset. This README should be adapted as per the findings and specifics of the dataset.