This repository contains data visualization programs, written in Python, for various datasets.
--> Data visualization is the representation of information and data in a pictorial or graphical format (for example: charts, graphs, and maps).
--> Data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
--> Data visualization tools and technologies are essential to analyzing massive amounts of information and making data-driven decisions.
--> The concept of using pictures to understand data has been around for centuries. General types of data visualization are charts, tables, graphs, maps, and dashboards.
--> Python is a high-level, general-purpose, and very popular programming language.
--> The Python programming language (currently Python 3) is used in web development, machine learning applications, and many other areas of the software industry.
--> Python is available across widely used platforms like Windows, Linux, and macOS.
--> One of Python's biggest strengths is its large standard library.
--> Colaboratory, or "Colab" for short, is a product from Google Research that allows anybody to write and execute Python code in a Jupyter notebook through the browser.
--> Visit Colab at:
--> Sign in using a Google account.
--> Once signed in, we can start coding in Colab right away.
--> It supports Python and R.
--> Files are directly saved in Google Drive.
-
Download the House Pricing dataset from Kaggle and map the values to Aesthetics.
-
Use different Color scales on the Rainfall Prediction dataset.
-
Create different Bar plots for variables in any dataset.
-
Show an example of skewed data and removal of skewness.
-
For a sales dataset do a Time Series Visualization.
-
Build a Scatterplot and suggest dimension reduction.
-
Use Geospatial Data-Projections on datasets.
-
Create a trend line with a confidence band in any suitable dataset.
-
Illustrate Partial Transparency and Jittering.
-
Illustrate usage of different color codes.
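
As a rough illustration of the skewness task in the list above, the sketch below measures skewness before and after a log transform. It uses only the standard library and a synthetic log-normal sample, so the data, the `skewness` helper, and the choice of a log transform are assumptions for illustration, not the notebooks' actual code.

```python
import math
import random


def skewness(values):
    """Return the population (Fisher-Pearson) skewness coefficient of a sequence."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Synthetic right-skewed sample (log-normal); a stand-in for a real column.
rng = random.Random(0)
skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# A log transform compresses the long right tail. It needs strictly positive
# values; math.log1p is a common alternative when zeros are present.
transformed = [math.log(v) for v in skewed]

print(f"skewness before: {skewness(skewed):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")
```

A skewness near zero after the transform suggests a roughly symmetric distribution. Plotting a histogram of both lists (for example with Matplotlib) shows the same change visually.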
To install a Python library, use this command:
pip install library_name
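
Before running the programs, it can help to check which libraries are already installed. The sketch below is one possible way to do that with the standard library. The `REQUIRED` list and the `missing_libraries` helper are illustrative names, and the list assumes the import names of the libraries described in this README.

```python
import importlib.util

# Import names of the libraries described in this README.
# Note: scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "scipy", "sklearn", "geopandas"]


def missing_libraries(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_libraries(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All libraries are available.")
```

Any name reported as missing can then be installed with the pip command above, using the package name rather than the import name where the two differ.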
--> Dataset is taken from:
--> CSV file which contains house pricing data.
--> Price of a house with respect to area and other basic amenities.
--> Dataset is taken from:
--> CSV file which contains the rainfall data.
--> Sub-division wise monthly data for 115 years from 1901-2015.
--> Dataset is taken from:
--> Business financial data provides sales, purchases, salaries and wages, and operating profit estimates for most market industries in New Zealand, and information on stocks for selected industries.
--> This collection uses a combination of survey, tax, and other administrative data.
--> Dataset is taken from:
--> CSV file which contains the sales data.
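
Time series plots of sales data are often easier to read after smoothing. The sketch below shows a trailing moving average, one common smoothing step. The `moving_average` helper and the monthly figures are made up for illustration and do not come from the dataset.

```python
from datetime import date


def moving_average(values, window):
    """Trailing moving average; returns one value per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Placeholder monthly sales figures (hypothetical, not from the dataset).
months = [date(2023, m, 1) for m in range(1, 7)]
sales = [120.0, 135.0, 150.0, 110.0, 160.0, 175.0]

smoothed = moving_average(sales, window=3)

# Each smoothed value lines up with the last month of its window.
for month, value in zip(months[2:], smoothed):
    print(month.isoformat(), round(value, 1))
```

Plotting `sales` and `smoothed` against the dates on the same axes gives the raw series with its trend overlaid.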
--> Dataset is taken from:
--> Dataset of minerals found around the world.
--> Dataset is taken from:
--> This contains data about various automobiles in Comma Separated Value (CSV) format.
--> The CSV file contains automobile details such as mileage, length, and body style, among other attributes.
--> Its dimensions are [60 rows x 6 columns].
--> The CSV file is already preprocessed, so there is no need for data cleaning.
--> Dataset is taken from:
--> This contains data about various NBA players in Comma Separated Value (CSV) format.
--> The CSV file contains player details such as height, weight, team, and position, among other attributes.
--> Its dimensions are [457 rows x 9 columns].
--> The CSV file is already preprocessed, so there is no need for data cleaning.
Short description of the libraries used:
- NumPy (Numerical Python) - Provides a collection of mathematical functions that operate on arrays and matrices.
- Pandas (Panel Data / Python Data Analysis) - Mostly used for analyzing, cleaning, exploring, and manipulating data.
- Matplotlib - A data visualization and graphical plotting library.
- Seaborn - Built on top of Matplotlib and used to create more attractive and informative statistical graphics.
- SciPy (Scientific Python) - Used for scientific computation. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing.
- Scikit-learn - A machine learning library that provides tools for tasks such as classification and prediction.
- GeoPandas - As the name suggests, it extends the popular data science library pandas by adding support for geospatial data.
Drop a star if you find this repository useful.
If you have any questions or suggestions, feel free to reach out to me.
How to reach me: