/cl1f19umd

Computational Linguistics 1, Fall 2019, University of Maryland


Computational Linguistics 1 (CMSC723, INST735, LING723), Fall 2019, University of Maryland

Computational linguistics (CL) is the science of doing what linguists do with language, but using computers. Natural language processing (NLP) is the engineering discipline of doing what people do with language, but using computers. We'll cover both, though the emphasis is on NLP. We will largely focus on machine learning-based approaches to a wide variety of challenging problems in NLP, with an emphasis on recent deep learning-based techniques. Class time and readings will focus on techniques; homeworks will largely focus on using NLP techniques to address socially relevant problems. A focus throughout the course will be on bias and fairness in machine learning systems.


Basic Course Information

Instructor Hal Daumé III (he/him)
When T/R 3:30pm-4:45pm
Where IRB 1116
TAs Kianté Brantley (he/him), Trista Cao (she/her), Amr Sharaf (he/him)
Discussion & Homework ELMS
Office Hours
Hal: Thr 1:45p-2:30p, IRB 4150
Trista: Mon 4:00p-5:00p, IRB 4th floor, in front of 4105
Amr: Wed 10:00a-11:00a, IRB 4th floor, in front of 4105

Prerequisites

The required prerequisite for this course is an undergraduate AI course, though a machine learning course, an algorithms course, or LING 689/889 (Computational Psycholinguistics) should be sufficient. In particular, you should be able to:

  • Program in Python
  • Use core Unix commands (background)
  • Function with foundational probability and statistics (background)
  • Apply essential linear algebra (background)
  • Implement and understand central machine learning techniques (e.g., CIML chapters 1-5 and 7)

If you cannot handle all of these things (and cannot pick them up quickly), you should expect to run into challenges in the course.


Coursework and Grading

The components of grading are:

  • Homework assignments (7% each, 35% total)
  • Course project (35% total)
  • Early exam (10%) and late exam (15%)
  • In-class/ELMS participation (5%)

Final class grades will be assigned based on the following mapping, possibly with thresholds adjusted down (but never adjusted up):

Score Grade        Score Grade        Score Grade
                   >=94  A            >=90  A-
>=87  B+           >=84  B            >=80  B-
>=77  C+           >=74  C            >=70  C-
>=67  D+           >=64  D            >=60  D-
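
The weighting and the score-to-grade mapping above amount to simple arithmetic, which the following minimal sketch illustrates. It is not official course software: the names `WEIGHTS`, `THRESHOLDS`, `course_score`, `letter_grade`, and `shift` are hypothetical, and it assumes that any downward adjustment lowers every threshold by the same amount, which the syllabus does not specify.

```python
# Component weights from the grading list above (they sum to 100).
WEIGHTS = {
    "homework": 35.0,       # five assignments at 7% each
    "project": 35.0,        # course project
    "early_exam": 10.0,
    "late_exam": 15.0,
    "participation": 5.0,
}

# Published lower bounds, highest first.
THRESHOLDS = [
    (94, "A"), (90, "A-"),
    (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"),
    (67, "D+"), (64, "D"), (60, "D-"),
]


def course_score(fractions):
    """Weighted total out of 100, given the fraction (0-1) earned per component."""
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


def letter_grade(score, shift=0.0):
    """Map a score to a letter grade.

    `shift` (assumption: one uniform amount) lowers every threshold;
    negative values are rejected because thresholds never move up.
    """
    if shift < 0:
        raise ValueError("thresholds may be adjusted down, never up")
    for bound, letter in THRESHOLDS:
        if score >= bound - shift:
            return letter
    return None  # the table above lists no grade below 60


# Hypothetical student: fractions earned in each component.
example = {
    "homework": 0.90,
    "project": 0.85,
    "early_exam": 0.80,
    "late_exam": 0.90,
    "participation": 1.00,
}
total = course_score(example)
print(round(total, 2), letter_grade(total))
```

With these made-up fractions the weighted total is 31.5 + 29.75 + 8 + 13.5 + 5 = 87.75, which falls in the B+ band. Calling `letter_grade(93.5, shift=1.0)` shows how a lowered threshold could turn an A- into an A.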

During this course, you will have five homework assignments that include both programming and written aspects. The written aspects are largely designed to help you do the programming more efficiently, by working through some of the details of what you will implement. These assignments are to be completed individually, and will be graded individually (see "collaboration policy" below). The goal of these assignments is to ensure that you learn and can implement standard NLP techniques, and understand and process language data effectively.

You will also complete one large course project, in teams of 4-8 students (exceptions are possible). The goal of this project is to enable you to work on a more significant, potentially impactful project dealing with natural language. See the course project description for more information.

Participation: You are to participate actively in class or in the online discussions. If you participate online, every question you answer well will get you 1% credit (marked by "instructor approved answer"); every question you ask will get you 0.5% credit (marked as "good question" by the instructor or a TA). Asking/answering questions in class counts the same.

Lateness: In general, nothing may be handed in late without prior approval. However, every student may use one "stuff happens" card for one homework deadline, and every team may use one "stuff happens" card for one project deadline. These cards give you an additional 48 hours at no penalty in grade.

Score adjustments: Everyone makes mistakes, including us on grading. If you handed something in and do not get a score for an assignment, or if you believe there is an error in grading (either a homework or exam or project), you may raise this issue with us within one week of when we hand back grades.


Course Project

A substantial portion of your coursework is a team-based project. We highly recommend interdisciplinary teams; and because diverse teams often produce better outcomes than homogeneous teams, we encourage you to reach out and work with people who aren't (yet) your friends. As a team, you will complete a project of your choosing throughout the semester. The topic of the course project is open-ended, though it must fulfill certain requirements (most notably, relevance to natural language processing or computational linguistics). This is your opportunity to put your NLP/CL knowledge to use in a project of your choosing.

There are several deliverables for the course project, with associated grade percentages:

  • P1: Project brainstorming, pitch, and feedback (5%)
  • P2: Survey of related work, and plans for data (5%)
  • P3: Description of proposed approach and measures of success (5%)
  • P4: Prototype/baseline implementation and initial results (5%)
  • P5: Final write-up and presentation (15%)

Each team will be assigned one of the TAs with whom you should meet once before Thanksgiving break. You should also meet with Hal once before Thanksgiving break. We will create a signup sheet; use your own judgment for when would be most useful for you to meet with us.

Please see the course project pages for more details!

Credit: Some ideas for course project implementation are from Walter Lasecki's course on Social Computing Systems and/or Chris Callison-Burch's course on Crowdsourcing.


Class Policies

Disability Support: Any student eligible for and requesting reasonable academic accommodations due to a disability is requested to provide, to the instructor in office hours, a letter of accommodation from the Office of Disability Support Services (DSS) within the first TWO weeks of the semester.

Laptops in Class: Many studies have documented that, if you can, you are likely better off not using a laptop in class (example study; h/t Jacob Eisenstein). You can make your own decision, but if your laptop use is distracting others, an instructor may ask you to cease using it (in particular, please avoid using websites with popup videos and the like). Please reach out to any instructor if we can help.

Academic Integrity: Any assignment or exam that is handed in must be your own work. However, talking with one another to understand the material better is strongly encouraged. Recognizing the distinction between cheating and cooperation is very important. If you copy someone else's solution, you are cheating. If you let someone else copy your solution, you are cheating. If someone dictates a solution to you, you are cheating. Everything you hand in must be in your own words, and based on your own understanding of the solution. If someone helps you understand the problem during a high-level discussion, you are not cheating. We strongly encourage students to help one another understand the material presented in class, in the book, and general issues relevant to the assignments. When taking an exam, you must work independently. Any collaboration during an exam will be considered cheating. Any student who is caught cheating will be given an E in the course and referred to the University Student Behavior Committee. Please don't take that chance---if you're having trouble understanding the material, please let us know and we will be more than happy to help.

Anti-Harassment: The open exchange of ideas, the freedom of thought and expression, and respectful scientific debate are central to the aims and goals of this course. These require a community and an environment that recognizes the inherent worth of every person and group, that fosters dignity, understanding, and mutual respect, and that embraces diversity. Harassment and hostile behavior are unwelcome in any part of this course. This includes: speech or behavior that intimidates, creates discomfort, or interferes with a person’s participation or opportunity for participation in the course. We aim for this course to be an environment where harassment in any form does not happen, including but not limited to: harassment based on race, gender, religion, age, color, national origin, ancestry, disability, sexual orientation, or gender identity. Harassment includes degrading verbal comments, deliberate intimidation, stalking, harassing photography or recording, inappropriate physical contact, and unwelcome sexual attention. Please contact an instructor or CS staff member if you have questions or if you feel you are the victim of harassment (or otherwise witness harassment of others). (Adapted from the ACL Anti-Harassment Policy.)

Web Accessibility: The University of Maryland is committed to equal access to Web content. If you need to request Web content in an alternative format or have comments or suggestions on accessibility, contact itaccessibility@umd.edu.


Course Schedule

Note that readings and homeworks are to be completed before the class period on which they are marked. For instance, you should have completed reading SLP3 6.2-6.5 before class on 29 Aug, and you must hand in HW1 before class on 12 Sep.

Readings may be from:

Date | Topic | Reading | Deadline
T 27 Aug | Introduction to computational linguistics | |
R 29 Aug | Distributional semantics | SLP3 6.2-6.5 |
T 03 Sep | Review: linear models and loss functions | CIML 7 | OH Poll
R 05 Sep | Text categorization: linguistic features and evaluation | SLP3 4.7, and Stylometry §2,5 |
T 10 Sep | Bias and fairness in NLP systems | Webinar* |
R 12 Sep | Computation graphs and backpropagation | NLP 3.1-3.3 | HW1
T 17 Sep | Word meaning as classification | SLP3 6.8-6.9, and RacistAI |
R 19 Sep | Data collection and annotation | DataInNLP, and AnnCaseStudy |
T 24 Sep | Measurement and validity | Measurement, and MeasurementCaseStudy, Sec "Reliability, Validity, ..." |
R 26 Sep | Crowdsourcing annotations | CrowdsourcingNLP, and AnnMyths | HW2
T 01 Oct | CLASS CANCELLED (Hal sick) Multilinguality and linguistic variety | TheBenderRule, and Elicitation, Sec 3, and optional: ActiveElicitation |
R 03 Oct | Early exam | |
T 08 Oct | N-gram language models | SLP3 3 |
R 10 Oct | Recurrent neural language models | SLP3 9 |
T 15 Oct | Sequence labeling | CIML 17 |
R 17 Oct | Encoder-decoder models | Neu 7-7.3.1 Neu 8 | HW3
T 22 Oct | Project Pitches | | P1
R 24 Oct | Machine translation and evaluation (guest lecturer: Marine Carpuat) | Bleu |
T 29 Oct | Dependency parsing | SLP3 15-15.4 |
R 31 Oct | Imitation learning; notes1 | CIML 18 |
T 05 Nov | Imitation learning II (same slides); notes2 | DepParse | P2
R 07 Nov | Reinforcement learning; notes1 | RL4IE |
T 12 Nov | Reinforcement learning II (same slides); notes2 | RL4IE |
R 14 Nov | Late exam | |
T 19 Nov | Semantic parsing | Artzi+Zettlemoyer'13 background |
R 21 Nov | Language grounding | Matuszek'18 Regier+Carlson'01, to skim | P3
T 26 Nov | Language to action | Branavan+al'09 + Khanh+D'19 |
R 28 Nov | Thanksgiving Break | |
T 03 Dec | Reading comprehension and question answering | Chen+al'17 Jia+Liang'17 | P4
R 05 Dec | Interpretation of neural models | TBA | HW5
T 17 Dec | Project Poster Session (10:30a-12:30p) | | P5

* The webinar link requires you to "register"; if this is an issue for you for any reason, please let any instructor know at least three days ahead of time so we can find a work-around.