The course site for Data Processing in Python from IES. See the information on SIS. The course is taught by Martin Hronec and Vítek Macháček.
The aim of the course is to provide hands-on experience with data-manipulation techniques in Python. Special emphasis is put on standard libraries such as Pandas, Numpy and Matplotlib, as well as on collecting web data with requests and BeautifulSoup. The students will also be guided through modern social-coding and open-source technologies such as GitHub, Jupyter and Open Data.
The students will gain their experience using the data from the IES website and subject evaluation protocols.
The course makes use of DataCamp online resources to provide the students with reliable yet simple material for learning Python programming.
After passing the course, the students will be able to download the data from APIs or directly from the web, pre-process it, analyze it and visualize it.
Econometrics II. (JEB110) is an explicit prerequisite for bachelor students.
The course is designed for students who have at least some basic coding experience. It does not need to be very advanced, but they should be aware of concepts such as a for loop, if and else, a variable or a function.
No knowledge of Python is required for entering the course.
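As a rough guide to the level expected, the concepts named above (variable, for loop, if and else, function) fit into a few lines of Python. The function name `label_numbers` is made up for this illustration and is not part of the course materials.

```python
def label_numbers(numbers):
    """Label each number in the input list as 'even' or 'odd'."""
    labels = []                      # a variable holding an empty list
    for number in numbers:           # a for loop over the input
        if number % 2 == 0:          # if: the number is divisible by two
            labels.append("even")
        else:                        # else: every other case
            labels.append("odd")
    return labels                    # the function hands back its result


print(label_numbers([1, 2, 3]))  # ['odd', 'even', 'odd']
```

Students who can read this snippet in any language they already know should have enough background to follow the first lectures.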
Pro Git book, Atlassian Git tutorials, GitHub resources for learning Git
Python, Pandas, Numpy, requests, BeautifulSoup and Matplotlib.
Introduction to Git for Data Science
Intermediate Python for Data Science
Manipulating DataFrames with pandas
Merging DataFrames with pandas
Importing Data in Python (Part 1)
Importing Data in Python (Part 2)
Introduction to Data Visualization
Interactive Data Visualization in Bokeh
Introduction to SQL for Data Science
Introduction to Databases in Python
Practical Introduction to Web Scraping in Python
Passing the course is rewarded with 5 ECTS credits.
The requirements for passing the course are the DataCamp assignments (6 x 5 pts) and the final project (70 pts).
Assignment 1 - Submission on 27/2 (Introduction to Python Course)
- Python Lists
- Fundamental Data Types
- Functions and Packages
Assignment 2 - Submission on 6/3 (Manipulating DataFrames with pandas)
- Exploratory Data Analysis
- Extracting and Transforming Data
- Advanced Indexing
Assignment 3 - Submission on 13/3 (OOP)
- TBD
Assignment 4 - Submission on 20/3 (Web Scraping in Python Course)
- Introduction to HTML
- XPaths and Selectors
- CSS Locators, Chaining, and Responses
Assignment 5 - Submission on 27/3 (Importing Data in Python (Part 2) Course)
- Importing data from the Internet
- Interacting with APIs to import data from the web
- Diving deep into the Twitter API
Assignment 6 - Submission on 4/4 (Merging DataFrames with pandas Course)
- Concatenating and merging data
- Rearranging and reshaping data
- Grouping data
Students choose their own topic and data source, which must be approved by the TAs. The data should not be the data used in the seminars.
The final project should fulfill the following criteria:
- The project should be submitted as a GitHub repository. Only the GitHub link is submitted to the course TAs.
- The main results should be summarized in a Jupyter notebook in the root of the repository.
- The project should use raw data obtained either from a public API or scraped directly from the web.
- The data should be pre-processed into an analysis-ready format.
- The project should contain appropriate analysis and visualization.
- The project should contain a commented, ready-to-run data download method.
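To make the last criterion more concrete, here is a minimal sketch of what a commented, ready-to-run download step could look like. It is only an illustration under stated assumptions: the URL, the function names `download_records` and `clean_records`, and the record fields `name` and `value` are all hypothetical, and a real project would adapt them to its own data source.

```python
# Hypothetical endpoint, used only for illustration; replace it with the real data source.
API_URL = "https://example.com/api/records"


def download_records(url=API_URL, timeout=10):
    """Download raw records from a JSON API and return the parsed response."""
    # Imported inside the function so the offline helper below also runs
    # on machines where requests is not installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return response.json()


def clean_records(records):
    """Drop records without a value and convert the remaining values to float."""
    cleaned = []
    for record in records:
        if record.get("value") in (None, ""):
            continue  # skip incomplete rows
        cleaned.append(
            {"name": record["name"].strip(), "value": float(record["value"])}
        )
    return cleaned
```

With a real endpoint in place of the placeholder, `clean_records(download_records())` would give the analysis-ready records in one call. Keeping the download and the cleaning in separate functions lets the cleaning step be checked on a small hand-written sample without touching the network.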
- A: above 90 (not inclusive)
- B: between 80 (not inclusive) and 90 (inclusive)
- C: between 70 (not inclusive) and 80 (inclusive)
- D: between 60 (not inclusive) and 70 (inclusive)
- E: between 50 (not inclusive) and 60 (inclusive)
- F: below 50 (inclusive)
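The grading scale above can be written as a small function, which makes the boundary cases explicit. This is a sketch for illustration only; the name `letter_grade` is made up here and the function is not an official grading tool.

```python
def letter_grade(points):
    """Map total course points (0 to 100) to a letter grade using the scale above."""
    if points > 90:
        return "A"
    if points > 80:
        return "B"
    if points > 70:
        return "C"
    if points > 60:
        return "D"
    if points > 50:
        return "E"
    return "F"


# Full marks: six assignments worth 5 points each plus the 70-point project.
print(letter_grade(6 * 5 + 70))  # A
print(letter_grade(90))          # B, because the lower bound of A is not inclusive
```

Each check uses a strict comparison, so a score sitting exactly on a boundary (90, 80, 70, 60 or 50) falls into the lower grade, as the scale specifies.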
Jupyter and GitHub intro here
The Jupyter notebook with IES web parser
Date | Topic | Who | Project | HW |
---|---|---|---|---|
20-21/2 | Intro + GitHub, Jupyter, DataCamp | both | | |
27-28/2 | Strings, Floats, Lists, Dictionaries, Functions | Vítek | | HW 1 |
6-7/3 | Pandas, Matplotlib, Numpy | Martin | | HW 2 |
13-14/3 | Object-Oriented Programming | Martin | | HW 3 |
20-21/3 | HTML, XML, JSON | Vítek | | HW 4 |
27-28/3 | API + Scraping | Vítek | Project Topic Proposal | HW 5 |
3-4/4 | SQLite | Vítek | | HW 6 |
10-11/4 | Advanced Pandas | Martin | Project Topic Approval | |
17-18/4 | Bokeh + GitHub Pages | Vítek | | |
24-25/4 | Project Work 1 | | | |
1-2/5 | Project Work 2 | | | |
8-9/5 | Efficient Computing | Martin | | |