This is the GitHub repository for the course "Summer Programming Practice 2020" for RUC Turing Class 2019. Please fork this repository and submit your project to the main repository via a pull request.
The main goal of this course is to write project code that solves real-world tasks. Specifically, we focus on building a tiny search engine that can analyze, index, retrieve, and mine a collected pool of webpages. Along the way, you will learn the basics of information retrieval and natural language processing. The course is intended as a starting point for these exciting directions.
We encourage students to implement the project in C++, Java, or Python. We also emphasize that the code should be written in an object-oriented manner.
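To give a rough idea of what "index" and "retrieve" can mean in an object-oriented design, here is a minimal sketch of an inverted index in Python. It is only an illustration, not the required project design: the class name `InvertedIndex`, its methods, the regular-expression tokenizer, and the sample pages are all assumptions made for this example.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each term to the set of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids
        self._docs = {}                    # document id -> original text

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase ASCII alphanumeric tokens are enough for a demo.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        """Store the document and record each of its terms in the postings."""
        self._docs[doc_id] = text
        for term in self.tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term (boolean AND)."""
        terms = self.tokenize(query)
        if not terms:
            return []
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("page1", "Information retrieval finds relevant documents.")
    index.add("page2", "A crawler collects web documents for indexing.")
    print(index.search("documents"))
    print(index.search("crawler documents"))
```

A real project would likely add ranking (for example TF-IDF), proper word segmentation for Chinese text, and persistent storage; this sketch only shows how indexing and boolean retrieval can be wrapped in one class.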
- 📅 July 13 to July 20
- 🕒 Full-day course (08:30 - 11:30, break, 14:00 - 17:00)
- 🌐 Zoom or Tencent Meeting (online)
- 💬 Note: the course meeting link will be sent through the course WeChat group.
Name | Email
---|---
Xin Zhao | batmanfly@gmail.com

Name | Email
---|---
Haorui Huang | huanghaorui301@gmail.com
Shuqing Bian | bianshuqing@ruc.edu.cn
Yupeng Hou | houyupeng@ruc.edu.cn
Tianyi Tang | steven_tang@ruc.edu.cn
Date | Session | Time | Topic
---|---|---|---
Monday, July 13 | Morning | 08:30 - 11:30 | Course Introduction, Hypertext Markup Language (HTML), Crawler
Monday, July 13 | Afternoon | 14:00 - 17:00 | Programming time
Tuesday, July 14 | Morning | 08:30 - 11:30 | Introduction to Natural Language Processing
Tuesday, July 14 | Afternoon | 14:00 - 17:00 | Programming time
Wednesday, July 15 | Morning | 08:30 - 11:30 | Introduction to Information Retrieval
Wednesday, July 15 | Afternoon | 14:00 - 17:00 | Programming time
Thursday, July 16 | Morning | 08:30 - 11:30 | Evaluation for IR
Thursday, July 16 | Afternoon | 14:00 - 17:00 | Programming time
Friday, July 17 | Morning | 08:30 - 11:30 | More Topics in Natural Language Processing (word2vec, BERT)
Friday, July 17 | Afternoon | 14:00 - 17:00 | Programming time
Saturday, July 18 | Morning | 08:30 - 11:30 | Server-client web API
Saturday, July 18 | Afternoon | 14:00 - 17:00 | Programming time
Sunday, July 19 | Morning | 08:30 - 11:30 | Introduction to LaTeX
Sunday, July 19 | Afternoon | 14:00 - 17:00 | Programming time
Monday, July 20 | Morning | 08:30 - 11:30 | Review and Pre-Presentation
Monday, July 20 | Afternoon | 14:00 - 17:00 | Selected Project Presentation
The grading policy is given as follows:
- 🕊️ Online attendance (30%)
Students are required to attend the daytime sessions from July 13 to July 20. Attendance will be checked regularly each day through Tencent Meeting. If you are absent for more than an hour in a day without a reasonable explanation, 5% will be deducted from your final score. (For example, if you miss one day, you will receive at most 25 points for this part.) Deductions accumulate by day if you are absent on multiple days.
- 👨‍💻 Project code (55%)
We will release a checklist for scoring the project code.
- 📝 Course report (5%)
The course report will be scored on whether it sufficiently covers the project code, including its design, core components, functions, and usage. You can think of it as an extended version of a README file. Note that the course report must be written in English using LaTeX. We will share the LaTeX template through the course group.
- 🙋‍♂️ Selected presentation (10%)
We will select a subset of students to give the final presentation. The selection will be based on the quality of the pre-presentation and the students' willingness to present.
- IR Book
- CS 276 / LING 286: Information Retrieval and Web Search
- CS224n: Natural Language Processing with Deep Learning
Note: We have reused public resources from the above links and other channels, including but not limited to slides, text, and figures. They are used solely for teaching this class. We thank the authors of all the valuable materials that support our course. If you are the owner of any of these materials and find them unsuitable for distribution on our website, please kindly inform me via email at batmanfly@gmail.com. I will remove any unsuitable content upon request.