This repository is a basic introduction to data science in Python. I feel it will be a great help for those starting out with data science.
A data scientist needs to have these skills:
- Basic Tools: Such as Python, R, or SQL. You do not need to know everything; you only need to learn how to use Python.
- Basic Statistics: Such as mean, median, or standard deviation. If you know basic statistics, you can apply them easily in Python.
- Data Munging: Working with messy and difficult data, such as inconsistent date and string formatting. As you might guess, Python helps here.
- Data Visualization: The title is self-explanatory. We will visualize data in Python with libraries such as Matplotlib and Seaborn.
- Machine Learning: You do not need to understand the math behind every machine learning technique. You only need to understand the basics of machine learning and learn how to implement them in Python.
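To make the data munging skill above more concrete, here is a minimal sketch that uses only the standard library. The records, the `KNOWN_FORMATS` tuple, and the helper names `clean_name` and `parse_date` are all made up for illustration; they are not taken from the tutorials. The code is written so that it should run on both Python 2.7 and Python 3.

```python
from datetime import datetime

# Hypothetical date formats that a messy column might mix together.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y")


def clean_name(raw):
    """Collapse repeated whitespace and normalize capitalization."""
    return " ".join(raw.split()).title()


def parse_date(raw):
    """Try each known format in turn; return None if nothing matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


# Made-up records, for illustration only.
records = [("  alice   SMITH ", "2017-03-05"),
           ("bob jones", "05/03/2017"),
           ("CAROL  white", " Mar 05, 2017 ")]

cleaned = [(clean_name(name), parse_date(day)) for name, day in records]
for name, day in cleaned:
    print("%s %s" % (name, day))
```

Returning `None` for an unparseable date, rather than raising, is one possible design choice: it lets you count and inspect the bad rows afterwards instead of stopping at the first one.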
- Basics
- Tutorial 1: Introduction to Python
- Tutorial 2: Python Data Science Toolbox
- Tutorial 3: Data Cleaning Methods
- Tutorial 4: Introduction to Pandas
All the code is written in Jupyter Notebook (Python 2.7.x).
After installing Jupyter Notebook, run it from the terminal:
jupyter notebook
NumPy is the fundamental package for scientific computing with Python.
sudo apt-get install python-pip
sudo pip install numpy scipy
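Once NumPy is installed, a quick way to check that it works is to compute the basic statistics mentioned earlier. This is only a small sketch with made-up sample values; note that `np.std` computes the population standard deviation by default (`ddof=0`).

```python
import numpy as np

# Made-up sample values, for illustration only.
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = np.mean(values)      # arithmetic average
median = np.median(values)  # middle value of the sorted data
std = np.std(values)        # population standard deviation (ddof=0)

print("mean=%.1f median=%.1f std=%.1f" % (mean, median, std))
# prints: mean=5.0 median=4.5 std=2.0
```

If you want the sample standard deviation instead, pass `ddof=1` to `np.std`.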
pandas provides easy-to-use data structures and data analysis tools for the Python programming language.
sudo pip install pandas
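As a rough sketch of what pandas looks like in practice, the example below builds a small `DataFrame`, cleans a text column, and averages a numeric column per group. The column names `city` and `temp_c` and all of the values are invented for illustration.

```python
import pandas as pd

# Made-up data, for illustration only.
df = pd.DataFrame({
    "city": [" ankara", "Izmir ", "ANKARA", "izmir"],
    "temp_c": [21.0, 25.0, 23.0, 27.0],
})

# Normalize the text column so the same city always compares equal.
df["city"] = df["city"].str.strip().str.title()

# Average temperature per city.
means = df.groupby("city")["temp_c"].mean()
print(means)
```

Without the cleaning step, `groupby` would treat `" ankara"` and `"ANKARA"` as different cities, which is why the normalization comes first.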
Similarly, install Matplotlib, Seaborn, etc.