Python beginner course on Data Science

Python for Data Science

NASA NCCS

Python is a free, general-purpose, and portable programming language. Its simple syntax and readability make code easy to write, understand, and maintain. Python can be extended through libraries that tackle problems in machine learning, data analysis, and beyond. Its vast ecosystem and active user community make Python accessible to everyone.

According to Wikipedia, data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. Because of its readability and rich library ecosystem, Python is one of the preferred programming languages for data scientists to explore and analyze their datasets. The growth of Python in data science has gone hand in hand with that of Pandas, which opened the use of Python for data analysis to a broader audience by enabling it to deal with row-and-column datasets, import CSV files, and much more.

This course introduces the fundamental concepts of the Python programming language. In addition, it presents Python packages (Numpy, Matplotlib, Seaborn, and Pandas) used for data manipulation and visualization. This course provides the necessary foundations for Exploratory Data Analysis and Machine Learning.

Objectives

You will learn the following topics:

  • Python basic syntax, variables, and types
  • Conditional statements and loops
  • Data structures: list, tuple, dictionary, set
  • Functions and modules
  • I/O with text files
  • Numpy arrays
  • Basic visualization with Matplotlib
  • Data manipulation (reading data files, performing statistical analysis, visualization, handling time series data) with Pandas

At the end of this course, participants will be able to write their own Python scripts to read CSV files, perform data wrangling, perform basic statistical analysis, and visualize data.
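As a rough illustration of the kind of script this refers to, here is a minimal sketch using Pandas. The sample data, the column names (`date`, `station`, `temperature_c`), and the helper `load_and_summarize` are hypothetical and chosen only for this example; they are not taken from the course notebooks.

```python
import io

import pandas as pd

# Hypothetical sample data; in practice this would be a CSV file on disk.
CSV_TEXT = """date,station,temperature_c
2021-01-01,A,3.5
2021-01-01,B,
2021-01-02,A,4.0
2021-01-02,B,6.0
2021-01-03,A,5.5
2021-01-03,B,7.0
"""


def load_and_summarize(source):
    """Read a CSV, drop rows with missing readings, and compute per-station statistics."""
    # Read the data and parse the date column into datetime values.
    df = pd.read_csv(source, parse_dates=["date"])
    # Basic data wrangling: remove rows where the temperature is missing.
    df = df.dropna(subset=["temperature_c"])
    # Basic statistics: count, mean, and maximum per station.
    summary = df.groupby("station")["temperature_c"].agg(["count", "mean", "max"])
    return df, summary


# io.StringIO stands in for a file path so the sketch runs without external files.
df, summary = load_and_summarize(io.StringIO(CSV_TEXT))
print(summary)
```

With Matplotlib installed, a call such as `summary["mean"].plot(kind="bar")` would turn the summary into a simple bar chart.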

You do not need a Python distribution installed on your local machine to follow this course. However, we believe it is important to have one so that you can write and run your own Python applications. We recommend that you install the Anaconda Python distribution by following the instructions at: Anaconda Installation Guide

To install Git on your local machine, follow the installation instructions: Getting Started - Installing Git

To fully follow all the topics below, you need a Gmail account in order to access Google Colaboratory. Each course will be taught through Google's cloud-based Jupyter notebook environment.

Starting Point

| Lecture Topic | Interactive Link |
| --- | --- |
| Introduction to Jupyter Notebook | Open In Colab |
| Introduction to Git | Open In Colab |

Introduction to Python

If you have never been exposed to Python, you need to take this Introduction to Python course. If you have done some Python programming in the past and want to assess your Python knowledge, take the following test (in less than 15 minutes and without using any help):

Python Assessment Test

If you score at least 80%, take only the I/O on Text Files topic. Otherwise, take the entire course.

| Lecture Topic | Interactive Link |
| --- | --- |
| Running Python | Open In Colab |
| Data Types | Open In Colab |
| Conditional Statements | Open In Colab |
| Loops | Open In Colab |
| Advanced Data Types | Open In Colab |
| Functions | Open In Colab |
| Modules | Open In Colab |
| I/O on Text Files | Open In Colab |

| Lecture Topic | Interactive Link |
| --- | --- |
| Introduction to Turtle | Open In Colab |
| A place to run the code | https://repl.it/ |

Data Science Tools

| Lecture Topic | Interactive Link |
| --- | --- |
| Introduction to Numpy | Open In Colab |
| Basic Visualization with Matplotlib | Open In Colab |
| Introduction to Pandas | Open In Colab |