awesome-legal-nlp

📖 A curated list of LegalNLP resources from all around the web.

MIT License


Legal Natural Language Processing

🗂 Datasets

Legal Judgment Prediction (LJP)

Dataset | Links | Domain | Language | Size
FSCS (Niklaus et al., 2021) | 📄 🤗 💻 | Swiss court judgments | 🇩🇪 🇫🇷 🇮🇹 | 85K cases w/ 2 outcomes
ECtHR (Chalkidis et al., 2021) | 📄 🤗 | European Court of Human Rights judgments | 🇬🇧 | 11K cases w/ 11 outcomes
ECHR (Aletras et al., 2019) | 📄 💾 | European Court of Human Rights judgments | 🇬🇧 | 11.5K cases w/ 11 outcomes
CAIL (Xiao et al., 2018) | 📄 💻 | Chinese court judgments | 🇨🇳 | 2.6M cases w/ 6 outcomes

Legal Text Classification (LTC)

Dataset | Links | Domain | Language | Size
GLC (Papaloukas et al., 2021) | 📄 🤗 💻 | Greek legislation | 🇬🇷 | 47.5K laws w/ 2.7K labels
CUAD (Hendrycks et al., 2021) | 📄 🤗 💻 | Contracts | 🇬🇧 | 510 contracts w/ 41 classes
MultiEURLEX (Chalkidis et al., 2021) | 📄 🤗 💻 | EU legislation | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇪🇸 (+18 more) | 65K laws w/ 4.5K labels
LEDGAR (Tuggener et al., 2020) | 📄 💾 | Contracts | 🇬🇧 | 60.5K contracts w/ 12.6K labels
Contract Discovery (Borchmann et al., 2020) | 📄 💻 | Contracts | 🇬🇧 | 2.6K clauses w/ 21 classes
EURLEX-57K (Chalkidis et al., 2019) | 📄 💾 | EU legislation | 🇬🇧 | 57K laws w/ 4.3K labels
Unfair-ToS (Lippi et al., 2018) | 📄 💾 | Contracts | 🇬🇧 | 9.4K sentences w/ 9 classes
Contract Elements (Chalkidis et al., 2017) | 📄 💾 | Contracts | 🇬🇧 | 2.4K contracts w/ 10 classes
OPP-115 (Wilson et al., 2016) | 📄 💾 | Privacy policies | 🇬🇧 | 115 policies w/ 23K labels

Legal Information Retrieval (LIR)

Dataset | Links | Domain | Language | Size
BSARD (Louis et al., 2022) | 📄 🤗 💻 | Belgian legislation | 🇫🇷 | 1.1K questions w/ 22.6K candidate statutory articles
EU2UK (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2K query documents w/ 52.5K candidate documents
UK2EU (Chalkidis et al., 2021) | 📄 💾 | EU & UK legislation | 🇬🇧 | 2.1K query documents w/ 3.9K candidate documents
COLIEE-Case-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 650 query cases w/ 128K candidate cases
COLIEE-Statute-Law-Retrieval (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ 768 candidate statutory articles
CAIL2019-SCM (Xiao et al., 2019) | 📄 💻 | Chinese court judgments | 🇨🇳 | 8.9K triplets of cases

Legal Question Answering (LQA)

Dataset | Links | Domain | Language | Size
CaseHOLD (Zheng et al., 2021) | 📄 💻 | US case holdings | 🇬🇧 | 53.1K multiple-choice questions
JEC-QA (Zhong et al., 2019) | 📄 💾 | Chinese law | 🇨🇳 | 26.3K multiple-choice questions
CJRC (Duan et al., 2019) | 📄 💻 | Chinese court judgments | 🇨🇳 | 50K question-answers from 10K documents
PrivacyQA (Ravichander et al., 2019) | 📄 💻 | Privacy policies | 🇬🇧 | 1.7K question-answers from 35 documents

Legal Textual Entailment (LTE)

Dataset | Links | Domain | Language | Size
COLIEE-Case-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Canadian precedents | 🇬🇧 | 425 cases w/ related case
COLIEE-Statute-Law-Entailment (Rabelo et al., 2020) | 📄 💾 | Japanese legislation | 🇬🇧 🇯🇵 | 808 questions w/ related statutory article

Legal Text Summarization (LTS)

Dataset | Links | Domain | Language | Size
UK-Abs (Shukla et al., 2022) | 📄 💻 💾 | UK court cases | 🇬🇧 | 793 pairs of (case, abstractive summary) from the UK Supreme Court
IN-Abs (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 7.1K pairs of (case, abstractive summary) from the Indian Supreme Court
IN-Ext (Shukla et al., 2022) | 📄 💻 💾 | Indian court cases | 🇬🇧 | 50 pairs of (case, extractive summary) from the Indian Supreme Court
TOS;DR (Keymanesh et al., 2020) | 📄 💻 | Terms of service | 🇬🇧 | 1.6K pairs of (agreement text, summary) from data privacy policies
BillSum (Kornilova et al., 2019) | 📄 💻 💾 | US Congressional bills | 🇬🇧 | 22.2K pairs of (bill, summary)
TL;DRLegal (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 84 pairs of (agreement text, summary) from software licenses
TOS;DR (Manor et al., 2019) | 📄 💻 | Terms of service | 🇬🇧 | 421 pairs of (agreement text, summary) from data privacy policies
BVA Cases (Zhong et al., 2019) | 📄 💻 | US court cases | 🇬🇧 | 92 pairs of (case, summary) from the US Board of Veterans' Appeals
LCR (Galgani et al., 2012) | 📄 💾 | Australian court cases | 🇬🇧 | 3.9K pairs of (case, catchphrases)

Legal Language Modeling (LLM)

Dataset | Links | Language | Size
Pile of Law (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | ~256GB of legal and administrative text

Benchmarks

Benchmark | Links | Language | Tasks
FairLex (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 🇩🇪 🇫🇷 🇮🇹 🇨🇳 | Classification (x1), legal judgment prediction (x3)
LexGLUE (Chalkidis et al., 2022) | 📄 🤗 💻 | 🇬🇧 | Classification (x6), multiple-choice QA (x1)

🔥 Models

Model | Links | Language | Size
Legal-HeBERT (Chriqui et al., 2022) | 📄 🤗 💻 | 🇮🇱 | 110M
PoL-BERT-Large (Henderson et al., 2022) | 📄 🤗 💻 | 🇬🇧 | 336M
Italian-LEGAL-BERT (Licari and Comandè, 2022) | 📄 🤗 | 🇮🇹 | 110M
JuriBERT (Douka et al., 2021) | 📄 💾 | 🇫🇷 | {6M, 15M, 42M, 110M}
Custom-LEGAL-BERT (Zheng et al., 2021) | 📄 🤗 💻 | 🇬🇧 | 110M
LEGAL-BERT (Chalkidis et al., 2020) | 📄 🤗 | 🇬🇧 | {35M, 110M}
LEGAL-GPT-{1,2} (Borchmann et al., 2020) | 📄 💻 | 🇬🇧 | {117M, 1.5B}

📚 Books

  • [2017] Artificial Intelligence and Legal Analytics: New Tools for Law Practice in the Digital Age, K. Ashley. [link]

📄 Surveys

  • [2020-05] How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence, H. Zhong et al. [pdf]
  • [2019-09] A Brief History of the Changing Roles of Case Prediction in AI and Law, K. Ashley. [pdf]
  • [2018-12] Deep learning in law: early adaptation and legal word embeddings trained on large corpora, I. Chalkidis et al. [pdf]

🎙 Talks

  • [2019-06] Law as Data: The Promise and Challenges of Natural Language Processing for Legal Research, A. Dyevre. [slides]
  • [2019-04] Artificial Intelligence and Law – An Overview and History, H. Surden. [video]

🗓 Conferences & Workshops

  • The Natural Legal Language Processing (NLLP) Workshop [website]
  • The International Conference on Artificial Intelligence and Law (ICAIL) [website]
  • The International Conference on Legal Knowledge and Information Systems (JURIX) [website]
  • The EXplainable AI in Law (XAILA) Workshop [website]
  • The International Workshop on Juris-informatics (JURISIN) [website]
  • The Competition on Legal Information Extraction/Entailment (COLIEE) [website]
  • The International Workshop on Legal Information Retrieval [website]