/PythonIT

Data analysis with Python

This course gives an overview of the different phases of the data analysis pipeline using Python and its data analysis ecosystem. What is typically done in data analysis? We assume that the data is already available, so it only needs to be downloaded. After downloading, the data must be cleaned to enable further analysis. In the cleaning phase, the data is converted into a uniform and consistent format. After that, the data can, for instance, be

    combined or divided into smaller chunks,
    grouped or sorted,
    condensed into a small number of summary statistics, or
    transformed using numerical or string operations.
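
As an illustration of these operations, here is a minimal sketch using Pandas. It is not taken from the course material: the two small tables, their column names ("city", "amount", "product", "region") and their values are hypothetical toy data, chosen only to show one step of each kind listed above.

```python
import pandas as pd

# Hypothetical toy data: two small tables sharing a "city" column.
sales = pd.DataFrame({
    "city": ["Helsinki", "Espoo", "Helsinki", "Tampere"],
    "amount": [120.0, 80.0, 40.0, 60.0],
    "product": [" apple", "Pear ", "APPLE", "pear"],
})
regions = pd.DataFrame({
    "city": ["Helsinki", "Espoo", "Tampere"],
    "region": ["Uusimaa", "Uusimaa", "Pirkanmaa"],
})

# String operation (cleaning): bring product names to one consistent format.
sales["product"] = sales["product"].str.strip().str.lower()

# Combine: merge the two tables on their shared "city" column.
combined = sales.merge(regions, on="city")

# Divide: split the combined data into smaller chunks, one per region.
chunks = {region: df for region, df in combined.groupby("region")}

# Group and condense: total and mean amount per region.
summary = combined.groupby("region")["amount"].agg(["sum", "mean"])

# Sort: order the regions by total amount, largest first.
summary = summary.sort_values("sum", ascending=False)
print(summary)
```

Running it prints a two-row summary in which Uusimaa (sum 240.0, mean 80.0) comes before Pirkanmaa (sum 60.0, mean 60.0).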

The point is to manipulate the data into a form that enables the discovery of relationships and regularities among its elements. Visualizing the data often helps to understand it better. Another useful tool for data analysis is machine learning, where a mathematical or statistical model is fitted to the data. These models can then be used to make predictions on new data, or to explain or describe the current data.
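
The idea of fitting a model and then using it both to describe data and to predict new values can be sketched with NumPy alone. This is a minimal, hypothetical example rather than the course's own method: it fits a straight line by least squares with np.polyfit to made-up, noise-free data that follows y = 2x + 1.

```python
import numpy as np

# Hypothetical, noise-free data that follows y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a straight-line model (degree-1 polynomial) by least squares.
# np.polyfit returns the coefficients with the highest degree first.
slope, intercept = np.polyfit(x, y, deg=1)

# Describe the current data: report the fitted parameters.
print(f"slope={slope:.2f}, intercept={intercept:.2f}")

# Predict new data: evaluate the fitted line at previously unseen x values.
x_new = np.array([5.0, 10.0])
y_pred = slope * x_new + intercept
print(y_pred)
```

Here the fit recovers a slope of about 2 and an intercept of about 1, so y_pred is approximately [11, 21]. With real, noisy data the fitted parameters would only approximate the underlying relationship.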

Python is a popular, easy-to-learn programming language. It is commonly used for data analysis because efficient libraries are available for processing large amounts of data. This so-called data analysis stack includes libraries such as NumPy, Pandas, Matplotlib and SciPy.

No previous knowledge of Python is needed, as the course starts with a quick introduction to Python. It is, however, assumed that you have good programming skills in some language.

Contents:

Python

    Basic concepts
    String handling
    Modules
    Regular expressions
    Exceptions

NumPy

    Creation of arrays
    Array types and attributes
    Indexing, slicing and reshaping
    Array concatenation, splitting and stacking
    Fast computation using universal functions
    Aggregations: max, min, sum, mean, standard deviation…
    Broadcasting
    Comparisons and masking
    Fancy indexing
    Sorting arrays

Matplotlib

    Simple figure
    Subfigures
    Other data visualization libraries for Python

Image processing

    Finding clusters in an image

Pandas

    Creation and indexing of series
    Creation of dataframes
    Accessing columns and rows of a dataframe
    Alternative indexing and data selection
    Summary statistics
    Missing data
    Converting columns from one type to another
    String processing
    Additional information
    Catenating datasets
    Merging dataframes
    Aggregates and groupings
    Time series
    Additional information