
MVA Course - Algorithms for Speech and Language Processing - with B. Sagot & N. Zeghidour

GNU General Public License v3.0 (GPL-3.0)

Algorithms for speech and natural language processing (MVA 2021)

This is the course for the academic year 2020-2021. For the current year (2022-2023), please follow this link: https://github.com/edupoux/MVA_2023_SL/

Contact information

For any question/request related to this course, please send an email to this address: mva.speech.language@gmail.com

Course materials

Course Objectives

Speech and natural language processing is a subfield of artificial intelligence used in an increasing number of applications; yet, while some aspects are on par with human performance, others are lagging behind. This course will present the full stack of speech and language technology, from automatic speech recognition to parsing and semantic processing. The course will present, at each level, the key principles, algorithms and mathematical foundations behind the state of the art, and confront them with what is known about human speech and language processing. Students will acquire detailed knowledge of the scientific issues and computational techniques in automatic speech and language processing and will have hands-on experience in implementing and evaluating the important algorithms.

Topics:

  • speech features & signal processing
  • hidden Markov & finite state modeling
  • probabilistic parsing
  • continuous embeddings
  • deep learning for language-related tasks (DNNs, RNNs)
  • linguistics and psycholinguistics
  • comparing human and machine performance

Prerequisites

Basic linear algebra, calculus, probability theory

Organization

9 courses

The courses take place on Mondays, from 9am to 12pm, remotely, between Jan 18 and March 22, 2021.

A typical course contains three parts:

  • 9:00am-11:00am: watch the course content. It will be distributed through a YouTube link a few days before (you can watch it in advance if you don't have a good connection).
  • 11:00am-11:30am: on-line QUIZZ. Attention!! You must absolutely be on-line during this time period to answer the QUIZZ questions on a Google form.
  • 11:30am-12:00pm: Q&A session. Live session where the answers to the QUIZZ will be revealed and questions about the course can be asked.

Validation

The validation is in two parts:

  • On-line QUIZZ (40% of the total grade). This is the only part of the course where you are absolutely required to be connected on-line. You'll be given a link to a Google form which will be activated exactly at 11:00am and closed at 11:30am. Any form submitted after the deadline will be automatically rejected and graded as zero. The QUIZZES will contain comprehension questions, and the best 5 grades out of the 6 QUIZZES will be used for the average. Between 11:30 and 12:00 there will be a Q&A period where you'll be able to ask questions about the course and the QUIZZ using an on-line connection.

  • Project (60% of the total grade). You'll work in small groups of 2-4 around a recent paper in speech or language processing which already has existing code. Your task will be 1. to replicate the main result of the paper 2. to run an experiment testing a new question not tested in the paper. You'll present your plan in a one-page document in week #3, and your results in a 4-page document and a 10-minute oral presentation + 5 minutes of questions in week #10.

The list of possible projects is here: https://docs.google.com/spreadsheets/d/115ZIe9V0Y-bbaf40KHEjRobgTuqPk1-VNEPC88fleKg/edit#gid=0

ATTENTION: since there is no "exam", there is no possibility of "rattrapage" (i.e., of compensating for a bad mark by taking another exam). So, if the overall grade obtained in this course is less than 10/20, this course will not be considered validated by the MVA Master.
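
As an illustration only, the grading rule described above (40% QUIZZ, 60% project, best 5 of 6 QUIZZ grades, validation at 10/20) could be sketched as follows. The function names are hypothetical, and the sketch assumes that both QUIZZ and project grades are on a 0-20 scale and that the QUIZZ average is a plain mean of the 5 best grades; the actual computation used by the instructors may differ.

```python
def quiz_average(quiz_grades, keep=5):
    """Mean of the `keep` best QUIZZ grades (assumed to be on a 0-20 scale)."""
    if len(quiz_grades) < keep:
        raise ValueError("need at least %d QUIZZ grades" % keep)
    best = sorted(quiz_grades, reverse=True)[:keep]
    return sum(best) / keep


def final_grade(quiz_grades, project_grade):
    """Weighted total: 40% QUIZZ average + 60% project grade (assumed /20)."""
    return 0.4 * quiz_average(quiz_grades) + 0.6 * project_grade


def is_validated(grade):
    """The course is not validated if the overall grade is below 10/20."""
    return grade >= 10.0


if __name__ == "__main__":
    # A missed QUIZZ counts as 0, but the lowest of the 6 grades is dropped.
    grades = [12, 8, 15, 0, 10, 14]
    total = final_grade(grades, project_grade=13)
    print(round(total, 2), is_validated(total))
```

Under these assumptions, a single missed QUIZZ does not lower the average, since only the 5 best grades are kept.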

Schedule and links

Attention: deadline for project proposal (one page): FEB 7, 2021, midnight

Submit through email HERE. One email per group, CC all members of the group. You will receive an acknowledgment within 24 h.

Details for the QUIZZ

The QUIZZ will be composed of comprehension questions regarding the course you've just watched. You will have 30 minutes to complete the Google form, which will be activated at 11:00am and closed at 11:30am each week that has a QUIZZ session.

Details for the PROJECT

You will be given a list of papers to choose from.

Your first task (week #1-#3) is to select a paper of interest, and make up a group of 2-4 people to work on this paper.

Your second task (week #3) will be to decide on a plan for the experiment you'd like to run and write a 1p document describing what you want to do and who will do what. Attention! Any delay in submitting your 1p document will cost you points (1/24th of a point for each hour of delay after Sunday midnight before the Monday of week #4).
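
The late penalty above (1/24th of a point per hour of delay) can be sketched as below. This is a hedged illustration, not the instructors' actual procedure: the function name `late_penalty` is hypothetical, the sketch assumes that every started hour counts as a full hour, and it reads "Sunday midnight" as 00:00 on the Monday of week #4 (February 8, 2021, given the first session on January 18).

```python
import math
from datetime import datetime


def late_penalty(submitted_at, deadline):
    """Points lost for a late proposal: 1/24 of a point per hour of delay.

    Assumption: each started hour counts as a full hour (rounded up).
    """
    delay_seconds = (submitted_at - deadline).total_seconds()
    if delay_seconds <= 0:
        return 0.0  # submitted on time
    hours_late = math.ceil(delay_seconds / 3600)
    return hours_late / 24


if __name__ == "__main__":
    # Assumed reading of "Sunday midnight before the Monday of week #4".
    deadline = datetime(2021, 2, 8, 0, 0)
    print(late_penalty(datetime(2021, 2, 9, 0, 0), deadline))
```

With this reading, a proposal submitted exactly one day late loses one full point.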

Your third task will be to conduct the work and prepare (1) a written document (4 p max) describing what you've done and the main results, and (2) a 10-minute oral presentation (followed by 5 minutes of questions). You may deviate from the 1p plan, but will have to explain how and why. The 4 p document should also contain a statement of contribution (who did what).

Where:

All the courses will be delivered remotely.

The course materials (PDFs, etc.) are listed in the subdirectories numbered #1 .. #9.

Q&A

_What happens if I get less than 10/20 on average? Can I take another exam?_

No, there is no possibility of 'rattrapage'; any obtained grade is final.

_What happens if I cannot connect to the internet for the on-line QUIZZ on Mondays between 11:00 and 11:30?_

This is the only part of the course where on-line presence is mandatory; only a low-bandwidth connection is necessary, since it is a text-based Google form. Of course, if you have low bandwidth, it is perhaps more prudent to start watching the course before 9am on the Monday of the QUIZZ, to avoid missing part of it.

Failure to submit the QUIZZ on time will result in a zero/20 for that QUIZZ, unless you can demonstrate that it was materially impossible for you to connect in that timeframe. Such documented requests should be sent to mva.speech.language@gmail.com together with the name and date of the missed QUIZZ.