Walmart Sales Insights

Welcome to the Walmart Sales Insights repository! This project involves analyzing and visualizing Walmart sales data to derive meaningful insights and answer key business questions. The analysis is performed using Python with libraries such as pandas, matplotlib, and seaborn.

Table of Contents

Introduction

Walmart Sales Insights is a project aimed at exploring Walmart sales data to identify trends, patterns, and insights that can help in decision-making processes. The project involves data preprocessing, analysis, and visualization to provide a comprehensive overview of sales performance across various dimensions such as time, store locations, and external factors.

Features

  • Data preprocessing and cleaning
  • Trend analysis of total weekly sales over time
  • Comparison of sales during holiday and non-holiday weeks
  • Identification of top-performing and underperforming stores
  • Analysis of the impact of external factors (temperature, fuel price, CPI, unemployment) on sales
  • Visualization of sales distribution and correlation between variables
  • Heatmap of sales by store and month
  • Monthly and yearly sales trends
  • Interactive and static visualizations

Data

The dataset used in this project includes the following columns:

  • Store: The store number
  • Date: The week end date
  • Weekly_Sales: Sales for the given store and week
  • Holiday_Flag: Indicates whether the week is a special holiday week (1) or not (0)
  • Temperature: Temperature in the region
  • Fuel_Price: Cost of fuel in the region
  • CPI: Consumer Price Index
  • Unemployment: Unemployment rate

Installation

To run this project, you need to have Python installed along with the necessary libraries. You can install the required:

pip install pandas matplotlib seaborn psycopg2

Usage

  1. Load Data: Load the Walmart sales data from the CSV file.

    import pandas as pd
    
    # Load data from CSV file into a DataFrame
    csv_file_path = 'path_to_your_csv_file.csv'
    df = pd.read_csv(csv_file_path)
    
    # Convert 'Date' column to datetime format
    df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')
  2. Data Preprocessing: Perform necessary data preprocessing steps such as converting date formats and extracting year and month.

    # Extract Year and Month from Date
    df['Year'] = df['Date'].dt.year
    df['Month'] = df['Date'].dt.month
  3. Visualization: Create various visualizations to analyze the data.

    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Overall Sales Trends
    plt.figure(figsize=(12, 6))
    sns.lineplot(data=df, x='Date', y='Weekly_Sales', estimator='sum')
    plt.title('Total Weekly Sales Over Time')
    plt.xlabel('Date')
    plt.ylabel('Total Weekly Sales')
    plt.show()
    
    # Add more visualizations as needed...
  4. Run Analysis: Execute the Python script to generate insights and visualizations.

Visualizations

The project includes the following visualizations:

  1. Total Weekly Sales Over Time
  2. Weekly Sales: Holiday vs Non-Holiday Weeks
  3. Top Performing Stores
  4. Sales vs. Temperature
  5. Sales vs. Fuel Price
  6. Sales vs. CPI
  7. Sales vs. Unemployment
  8. Total Monthly Sales
  9. Heatmap of Sales by Store and Month
  10. Distribution of Weekly Sales
  11. Sales Trends by Store
  12. Total Yearly Sales
  13. Correlation Matrix Heatmap
  14. Sales Trends on Holidays vs. Non-Holidays

Contributing

Contributions are welcome! If you have any suggestions or improvements, feel free to open an issue or submit a pull request.

  1. Fork the repository.
  2. Create a new branch: git checkout -b my-feature-branch.
  3. Make your changes and commit them: git commit -m 'Add some feature'.
  4. Push to the branch: git push origin my-feature-branch.
  5. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more details.