- Recap of Deep Learning & basic NLP (slides / lab session)
- Tokenization (slides / lab session)
- Language Modeling (slides / lab session)
- NLP without 2048 GPUs (slides / lab session)
- Handling the Risks of Language Models (slides / lab session)
- Advanced NLP tasks (slides / lab session)
- Domain-specific NLP (slides / lab session)
- Multilingual NLP (slides / lab session)
- Multimodal NLP (slides / lab session)
The evaluation consists of a team project (3-5 people). There are two options:
- Demo: Use a well-known approach to produce an MVP for an original use case and present it in a demo.
- Example: An online platform that detects AI-generated text (a minimal sketch follows this list).
- R&D: Based on a research article, conduct original experiments and produce a report (see the list of potential articles below).
- Example: Do we need Next Sentence Prediction in BERT? (Answer: No)
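For the demo option above, here is a minimal sketch of one possible starting point, assuming the Hugging Face transformers library: score text with a small causal language model and flag unusually low perplexity as a sign of machine-generated text. The model choice (gpt2) and the threshold are illustrative placeholders, not part of the course material.

```python
# Minimal perplexity-based detector sketch (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any small causal LM works for a first prototype

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the LM; LM-generated text tends to score lower."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def looks_ai_generated(text: str, threshold: float = 30.0) -> bool:
    # The threshold is a placeholder; a real MVP should calibrate it on
    # held-out human-written vs. model-generated samples.
    return perplexity(text) < threshold

print(looks_ai_generated("The quick brown fox jumps over the lazy dog."))
```

A real demo would also need a front-end and a calibrated decision rule, but this captures the core scoring loop.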
The project will consist of three steps:
- Team announcement (before 15/12/23): send an email to nathan.godey@inria.fr, with matthieu.futeral@inria.fr and francis.kulumba@inria.fr in cc, explaining:
  - The team members (also cc'ed)
  - The type of project and a rough description (can change afterwards)
- Project plan (30% of final grade, before 07/01/24): following this template, produce a project plan explaining your first attempts (e.g. an alpha version), how they failed or succeeded, and what you plan to do before the delivery.
- Project delivery (70% of final grade, before mid-February): deliver a project report of `nb_team_members * 2` pages and a GitHub repo (more details coming soon)

Potential articles:
- A Vocabulary-Free Multilingual Neural Tokenizer for End-to-End Task Learning (https://arxiv.org/abs/2204.10815)
- BPE-Dropout: Simple and Effective Subword Regularization (https://aclanthology.org/2020.acl-main.170/)
- Efficient Streaming Language Models with Attention Sinks (https://arxiv.org/abs/2309.17453)
- Lookahead decoding (https://lmsys.org/blog/2023-11-21-lookahead-decoding/)
- Efficient Memory Management for Large Language Model Serving with PagedAttention (https://arxiv.org/abs/2309.06180)
- Detecting Pretraining Data from Large Language Models (https://arxiv.org/abs/2310.16789)
- Proving Test Set Contamination in Black Box Language Models (https://arxiv.org/abs/2310.17623)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (https://arxiv.org/abs/2312.00752)
- Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection (https://aclanthology.org/2020.acl-main.647/)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (https://arxiv.org/abs/2305.18290)
- Text Embeddings Reveal (Almost) As Much As Text (https://arxiv.org/abs/2310.06816)