A short Introduction to Data Science in Python

Primary language: Jupyter Notebook. License: Apache License 2.0.

introToDataScience

This repository is a basic introduction to data science in Python. It should be a useful starting point for anyone new to data science.

A data scientist needs these skills:

  1. Basic Tools: Python, R, or SQL. You do not need to know all of them; learning how to use Python is enough to get started.
  2. Basic Statistics: Concepts such as mean, median, and standard deviation. Once you know basic statistics, applying them in Python is straightforward.
  3. Data Munging: Working with messy and difficult data, such as inconsistent date and string formatting. Python helps here as well.
  4. Data Visualization: As the name suggests, presenting data graphically. We will visualize data in Python with libraries such as Matplotlib and Seaborn.
  5. Machine Learning: You do not need to understand all the math behind each machine learning technique. You only need to understand the basics and learn how to implement them in Python.
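
To make the data munging skill concrete, here is a minimal sketch that normalizes inconsistent date and string formatting using only the standard library. The sample values, the `DATE_FORMATS` list, and the `parse_date` helper are hypothetical and chosen for illustration; they are not taken from the tutorials.

```python
from datetime import datetime

# Hypothetical raw values with inconsistent date and string formatting.
raw_dates = ["2017-03-05", "05/03/2017", "5.3.2017"]
raw_names = ["  alice ", "BOB", "Carol  "]

# Candidate formats to try, in order (an assumption about the data:
# day comes before month in the slash and dot formats).
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y"]


def parse_date(text):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


# Bring every value into one consistent representation.
clean_dates = [parse_date(d) for d in raw_dates]
clean_names = [n.strip().title() for n in raw_names]

print([d.isoformat() for d in clean_dates])
print(clean_names)
```

Returning `None` for unparseable input, rather than raising, keeps one bad row from stopping the whole cleaning pass; whether that is the right choice depends on the data set.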

Content:

  1. Basics
  2. Tutorial 1: Introduction to Python
  3. Tutorial 2: Python Data Science Toolbox
  4. Tutorial 3: Data Cleaning Methods
  5. Tutorial 4: Introduction to Pandas

All the code is written in Jupyter Notebook (Python 2.7.x).

After installing Jupyter Notebook, start it from the terminal:

jupyter notebook

To install the libraries mentioned above:

NumPy:

NumPy is the fundamental package for scientific computing with Python.

sudo apt-get install python-pip  
sudo pip install numpy scipy
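
Once NumPy is installed, the basic statistics listed among the skills above can be computed in a few lines. This is a minimal sketch; the `values` array is made-up sample data.

```python
import numpy as np

# Hypothetical sample data.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(values)      # arithmetic mean: 5.0
median = np.median(values)  # middle of the sorted values: 4.5
std = np.std(values)        # population standard deviation (ddof=0): 2.0

# Sample standard deviation divides by n - 1 instead of n.
sample_std = np.std(values, ddof=1)

print("mean=%.2f median=%.2f std=%.2f sample_std=%.3f"
      % (mean, median, std, sample_std))
```

Note that `np.std` computes the population standard deviation by default; pass `ddof=1` when the data is a sample from a larger population.
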
Pandas:

Easy-to-use data structures and data analysis tools for the Python programming language.

sudo pip install pandas
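
As a quick illustration of the data structures pandas provides, the sketch below builds a small `DataFrame`, fills a missing value, and computes a per-group mean. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing temperature reading.
df = pd.DataFrame({
    "city": ["Ankara", "Izmir", "Ankara", "Izmir"],
    "temperature": [21.0, 25.0, 19.0, None],
})

# Replace the missing reading with the mean of the observed values.
df["temperature"] = df["temperature"].fillna(df["temperature"].mean())

# Average temperature per city.
mean_by_city = df.groupby("city")["temperature"].mean()

print(df)
print(mean_by_city)
```

Filling with the column mean is only one possible strategy; dropping the row with `dropna()` or filling per group may suit a real data set better.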

Install Matplotlib, Seaborn, and any other libraries you need in the same way.
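
After installing, it can help to confirm that each library imports correctly before opening the notebooks. The following is a minimal sketch; `check_libraries` is a hypothetical helper written for this README, not part of any of the libraries.

```python
import importlib


def check_libraries(names):
    """Map each module name to its version string, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every module defines __version__, so fall back to a marker.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


report = check_libraries(["numpy", "scipy", "pandas", "matplotlib", "seaborn"])
for name, version in sorted(report.items()):
    print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

Any library reported as `NOT INSTALLED` can be installed with `pip` as shown above.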