As taught at the Vrije Universiteit Amsterdam in the Humanities Research Master: Linguistics (track Human Language Technology) and the Minor Digital Humanities and Social Analytics (BA).
This is a practical course in Python, geared towards those who want to get some hands-on experience working with language data. No knowledge of programming is required or presupposed. We will work with Python 3. If you haven't worked with Python before, we recommend that you install Anaconda.
(If you have worked with Python 3 before, be sure to check if Jupyter Notebook is installed on your machine. We will work extensively with notebooks.)
This course is based on the material used in previous years and in this course.
This course is meant to introduce you to the basics of the Python programming language. There is a lot to discover about Python and programming in general, and you will probably learn something new every day if you continue programming after this course. Our goal for you is to become an independent programmer, who is able to find solutions to new problems. You will:
- Learn how to work with the standard library of Python
- Learn how to deal with different file types (e.g. plain text, CSV/TSV, JSON, XML)
- Learn how to use some external libraries (e.g. to analyse texts)
- Learn how to document and share your code and results
We will focus on readability and understandability, so that you will be able to share your code and results with others, and re-use your code in the future. This is a practical course, in which you will get a lot of hands-on experience. Due to the nature of this course, active participation is required.
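To give a first impression of what working with different file types looks like, here is a minimal sketch that uses only the standard library (`csv`, `io` and `json`). The sample data and the function names `read_tsv` and `read_json` are made up for illustration and are not part of the course material; in practice you would read from real files.

```python
import csv
import io
import json

# Hypothetical sample data; in the course you would read real files instead.
TSV_TEXT = "word\tcount\nlanguage\t3\npython\t5\n"
JSON_TEXT = '{"title": "Example", "tokens": ["language", "python"]}'


def read_tsv(text):
    """Parse tab-separated text into a list of dictionaries (one per row)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def read_json(text):
    """Parse a JSON string into Python objects (dicts, lists, strings)."""
    return json.loads(text)


rows = read_tsv(TSV_TEXT)
document = read_json(JSON_TEXT)

# Values read by the csv module are strings, so convert counts explicitly.
total = sum(int(row["count"]) for row in rows)
print(total)
print(document["tokens"])
```

Note that the `csv` module returns every value as a string, whereas `json` already gives you lists and dictionaries. That difference is a common source of bugs when you switch between formats.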
Every course has a set of core principles that its teachers adhere to. We strongly believe in the principles outlined by Mike Bostock in his article What makes software good? Here they are:
- Good software is approachable. It can be understood completely in independent, easy pieces. You don’t need to understand everything before you can understand anything.
- Good software is consistent. It lets you take what you’ve learned about one part and extrapolate it to the rest. It doesn’t self-contradict. It is parsimonious, avoiding superfluous elements.
- Good software explains itself. It has affordances for learning and discovery. It is role-expressive and minimizes hidden magic.
- Good software teaches. It doesn’t just automate an existing task, but provides insight or imparts knowledge, such as a best practice or a new perspective on a problem.
- Good software is for humans. It is cognizant of people and the reality in which they live. It does not expect elaborate and arbitrary rules to be memorized. It anticipates the need for learning and debugging.
When you are just learning how to program, it sometimes happens that you get stuck and you don't know what to do next. This is normal. There are many fantastic resources online that we encourage you to use to solve your problem on your own. But we don't want this to be a frustrating experience for you. So if you're stuck for more than 15 minutes: please contact us! No matter how small the problem. If you're stuck, you're stuck.
In our experience it does help to solve the exercises together with a classmate. (See pair programming and rubber duck debugging.) If either one of you gets stuck, try to explain the thought process behind your program, and go through the lines step by step. Making your thought process explicit is a great way to find bugs in your code!
Our materials are structured as follows:
- The `Chapters` folder contains our primary teaching material. Every week, you will work through a subset of these interactive notebooks.
- The `Class_Notes` folder contains the examples and additional remarks that we discussed during class.
- The `Assignments` folder contains the assignments that you will be asked to submit during the course.
- The `Exam` folder contains sample exams from previous years.
- The `Final_Assignment` folder contains the instructions and data needed for the final assignment (only for the MA students).
- The `Extra_Material` folder contains some extra reading about the Python theory which you may use for future reference. It also contains some information specifically related to natural language processing, and examples on how to organize your code and how to create a Flask website.
- The `Data` folder contains all data used in this course and more, as well as the scripts used to obtain this data. (So you can see what techniques we used.)
This file serves as the syllabus and a general reference for this course.
For the ReMa students taking the 9 ECTS Python for Text Analysis course (L_AAMPLIN017), there will be bi-weekly assignments, an exam, and a final assignment. They are weighted as follows.
Part | Weight (%) | Part | Weight (%) |
---|---|---|---|
Assignment 1 | 5 | Total assignments | 35 |
Assignment 2 | 10 | Exam | 20 |
Assignment 3 | 10 | Final assignment | 45 |
Assignment 4 | 10 | | |
Total assignments | 35 | Total | 100 |
For the BA students taking the 6 ECTS Programming for Humanities and Social Sciences course (L_AABAALG069), there will be bi-weekly assignments and an exam. These students will not do a final assignment. The assignments and exam are weighted as follows.
Part | Weight (%) | Part | Weight (%) |
---|---|---|---|
Assignment 1 | 9 | Total assignments | 60 |
Assignment 2 | 17 | Exam | 40 |
Assignment 3 | 17 | | |
Assignment 4 | 17 | | |
Total assignments | 60 | Total | 100 |
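The weighting in the tables above amounts to a weighted average. The sketch below shows the arithmetic, with the weights copied from the tables. The function name `weighted_grade` and the example grades are hypothetical, and the sketch deliberately leaves out the other rules in this syllabus (late penalties and the requirement to pass the exam), so treat it as an illustration rather than the official calculation.

```python
# Weights in percent, copied from the two tables above.
MA_WEIGHTS = {
    "Assignment 1": 5,
    "Assignment 2": 10,
    "Assignment 3": 10,
    "Assignment 4": 10,
    "Exam": 20,
    "Final assignment": 45,
}
BA_WEIGHTS = {
    "Assignment 1": 9,
    "Assignment 2": 17,
    "Assignment 3": 17,
    "Assignment 4": 17,
    "Exam": 40,
}


def weighted_grade(grades, weights):
    """Return the weighted average of grades, given weights in percent."""
    if sum(weights.values()) != 100:
        raise ValueError("The weights should add up to 100.")
    if set(grades) != set(weights):
        raise ValueError("Provide exactly one grade for every part.")
    return sum(grades[part] * weights[part] for part in weights) / 100


# Hypothetical grades for a BA student, for illustration only.
ba_grades = {
    "Assignment 1": 8,
    "Assignment 2": 7,
    "Assignment 3": 7,
    "Assignment 4": 6,
    "Exam": 7,
}
print(weighted_grade(ba_grades, BA_WEIGHTS))  # 6.92
```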
You are asked to hand in 4 assignments in total. The deadlines are either on Friday before 23:59 or on Tuesday before 20:00. Submission is done through Google Drive (see submission forms below in the schedule). Submitting up to one day after the deadline results in a deduction of two points from your grade. Any later than that, and the resulting grade is a 1. We have to be strict about this, because we will discuss the assignments in class and we need time to look at your submissions.
The exam on the 18th of December tests your knowledge of the syntax of Python, and your knowledge of the standard library. For the BA students, it is the final test to show what you have learnt. For the MA students, it serves as a check to ensure that your knowledge of the language is sufficient to tackle the final assignment. You cannot pass the course without a passing grade on the exam. But don't worry: if you are able to finish the assignments, you will be fine on the exam.
The final part of the MA course consists of a final assignment, for which you will work on your own code project. The exact details of the final assignment will be announced later.
You can expect a project in which you are asked to:
- process a number of files;
- extract relevant information from those files;
- present that information to the user;
- store the information in a useful format.
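The four steps above can be sketched in a few lines of Python. This is not the final assignment, whose details will be announced later; it is only a toy illustration under our own assumptions. The function names (`count_words`, `present`, `store`), the two tiny text files and the choice of word counts as the "relevant information" are all made up for this example.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_words(paths):
    """Process each file and extract word frequencies across all files."""
    counts = Counter()
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        counts.update(text.lower().split())
    return counts


def present(counts, n=3):
    """Return a plain-text report of the n most frequent words."""
    lines = [f"{word}\t{freq}" for word, freq in counts.most_common(n)]
    return "\n".join(lines)


def store(counts, path):
    """Store the counts as JSON so that they can be re-used later."""
    Path(path).write_text(json.dumps(counts, indent=2), encoding="utf-8")


# Demo with two tiny, made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    folder = Path(folder)
    (folder / "a.txt").write_text("the cat sat", encoding="utf-8")
    (folder / "b.txt").write_text("the dog sat down", encoding="utf-8")

    counts = count_words(sorted(folder.glob("*.txt")))
    print(present(counts))

    output_path = folder / "counts.json"
    store(counts, output_path)
    stored = json.loads(output_path.read_text(encoding="utf-8"))
```

Keeping each step in its own small function makes the program easier to test and to extend, which is also what the evaluation questions below ask for.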
We will consider the following questions (along with the core principles) to evaluate your final assignment:
- Does the code work?
- Does the code fulfill the requirements?
- Is the code well-documented?
- Is the code clear and understandable?
- Is the code modular?
- Is the code easily extensible?
- How scalable is the solution?
- Is the code written in accordance with the community standards? (That is: PEP 8)
- Are there tests to ensure that the code works?
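As a small illustration of several of these questions at once (documentation, clarity, PEP 8 style and tests), consider the sketch below. The function `type_token_ratio` is a hypothetical example, not a required part of any assignment. The examples in its docstring double as tests through the standard `doctest` module.

```python
import doctest


def type_token_ratio(tokens):
    """Return the ratio of distinct tokens to all tokens.

    An empty list has a ratio of 0.0 by convention.

    >>> type_token_ratio(["a", "b", "a", "c"])
    0.75
    >>> type_token_ratio([])
    0.0
    """
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    # Run the docstring examples as tests; prints nothing if all pass.
    doctest.testmod()
```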
The schedule for the entire course follows the same structure, illustrated below. Our philosophy is that programming should be taught in a hands-on manner, so we have tried to keep 'PowerPoint time' to a minimum. Most theory is taught through the notebooks, but we'll also address the major topics in class.
Each block will consist of three lectures. In the first lecture, we introduce some of the new topics. After this lecture, you are expected to work through the Chapters belonging to this block and start on the assignment. In the second lecture, we will further highlight some of the theory and you will have time to work on the assignment. You will finish the assignment between the second and third lecture, and hand it in on either Friday or Tuesday. Finally, the third lecture is a feedback session where we will discuss some of the main problems that were encountered in the assignments. We will repeat this cycle 4 times (once for each assignment).
New material will be added to this GitHub repository during the course. For downloading the material, please right click on the links in the schedule below and save the file in your course materials. Save the `Chapter X` notebooks in your `Chapters` folder, and the `Assignment X` notebooks in your `Assignments` folder. If additional `data` is provided, save it in the `Data` folder.
week | what | when | description | downloads/uploads |
---|---|---|---|---|
44 | lecture | Monday 30-10-2017 15:30 - 17:15 | BLOCK 1: Introduction | Chapter 1, Chapter 2, Chapter 3, Chapter 4, Assignment 1 |
44 | lecture | Thursday 2-11-2017 13:30 - 15:15 | BLOCK 1: Discussion and work time | Class Notes |
44 | DEADLINE | Friday 3-11-2017 before 23:59 | SUBMIT ASSIGNMENT 1 | submission form |
45 | lecture | Monday 6-11-2017 15:30 - 17:15 | BLOCK 1: Feedback | |
45 | lecture | Thursday 9-11-2017 13:30 - 15:15 | BLOCK 2: Introduction | Chapter 5, Chapter 6, Chapter 7, Chapter 8, Chapter 9, Chapter 10, Assignment 2, Class Notes |
46 | lecture | Monday 13-11-2017 15:30 - 17:15 | BLOCK 2: Discussion and work time | Class notes |
46 | DEADLINE | Tuesday 14-11-2017 before 20:00 | SUBMIT ASSIGNMENT 2 | submission form |
46 | lecture | Thursday 16-11-2017 13:30 - 15:15 | BLOCK 2: Feedback | |
47 | lecture | Monday 20-11-2017 15:30 - 17:15 | BLOCK 3: Introduction | Preparation, Chapter 11, Chapter 12, Chapter 13, Chapter 14, Chapter 15, Assignment |
47 | lecture | Thursday 23-11-2017 13:30 - 15:15 | BLOCK 3: Discussion and work time | Class notes |
47 | DEADLINE | Friday 24-11-2017 before 23:59 | SUBMIT ASSIGNMENT 3 | submission form |
48 | lecture | Monday 27-11-2017 15:30 - 17:15 | BLOCK 3: Feedback | |
48 | lecture | Thursday 30-11-2017 13:30 - 15:15 | BLOCK 4: Introduction | Chapter 16, Chapter 17, hello_world.py, the_program.py, the_program_v2.py, utils.py, course.xml (store in Data/xml_data/), naf.xml (store in Data/xml_data/), Assignment-4a, Assignment-4b, Class notes |
49 | lecture | Monday 4-12-2017 15:30 - 17:15 | BLOCK 4: Discussion and work time | Class notes |
49 | DEADLINE | Tuesday 5-12-2017 before 20:00 | SUBMIT ASSIGNMENT 4 | submission form |
49 | lecture | Thursday 7-12-2017 13:30 - 15:15 | BLOCK 4: Feedback | |
50 | lecture | Monday 11-12-2017 15:30 - 17:15 | CANCELLED | Dummy exam |
50 | lecture | Thursday 14-12-2017 13:30 - 15:15 | Practice EXAM + Introduction Final Assignment | |
51 | EXAM | Monday 18-12-2017 08:45 - 11:30 | EXAM | |
51 | lecture | Thursday 21-12-2017 13:30 - 15:15 | Start with FINAL ASSIGNMENT | Final Assignment |
51 | DEADLINE | Friday 22-12-2017 before 13:00 | Decision on Team + Task + Dataset | Please inform us by e-mail |
2 | lecture | Thursday 11-01-2018 13:20 - 14:10 | Lecture on spaCy | Chapter 18 |
2 | consultation | Monday, Wednesday, Thursday | Individual feedback | Sign up |
3 | lecture | Monday 15-01-2018 15:30 - 17:15 | Lecture on Visualization and Code organization | Chapter 19, Chapter 20, Tips on Code Documentation/Organization, chart chooser image, Class Notes |
3 | consultation | Monday, Wednesday, Thursday | Individual feedback | Sign up |
4 | consultation | Monday, Wednesday, Thursday | Individual feedback | Sign up |
5 | PRESENTATIONS | Monday 29-01-2018 15:30 - 17:15 | Presentations Final Assignment | |
5 | consultation | Monday, Wednesday, Thursday | Individual feedback | Sign up |
5 | DEADLINE | Sunday 04-02-2018 before 23:59 | SUBMIT FINAL ASSIGNMENT | |
You are expected to attend all lectures. You are allowed to miss two lectures at most. After more than two absences, you are no longer able to obtain credits for the course.
Basically, please don't cheat. For the weekly assignments, let us know in the comments if you have worked together with someone or if you used code from online sources, such as Stack Overflow. If you found some useful code online, do try to understand what that piece of code does. If it looks 'complicated', we expect you to provide comments in the code explaining what it does.