Apart from the fact that they're all legal documents:
- They tend to have complex sentence structure and vocabulary choices that aren't accessible to people only familiar with conversational English
- They are difficult to comprehend for non-native speakers of the language they are written in
- They can run to several tens of pages (if not more)
- QnA over legal documents: Paste in your document and ask it questions. Useful whether you have questions about the document as a whole or about a specific clause.
- Document summarization: Generate a summary of the document. Options include the length of the summary (`small`, `medium`, or `large`) and a choice between `paragraphs` or `bullets`.
- Multi & Cross-lingual document search: Perform cross-lingual semantic search over a collection of legal documents. This is currently a showcase feature that lets the user run keyword as well as semantic search over a collection of COVID-19 pandemic legislative documents and returns the top-3 document matches. It also offers the option to translate into other languages [currently English-only].
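The top-3 retrieval step behind semantic search can be illustrated with a minimal sketch. This is not the project's code (Legal-ease delegates the search to Qdrant); it assumes cosine similarity as the ranking metric, and the document IDs and 3-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def top_k_matches(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings", hypothetical and for illustration only.
docs = {
    "act_en": [0.9, 0.1, 0.0],
    "decree_fr": [0.8, 0.2, 0.1],
    "order_de": [0.1, 0.9, 0.2],
    "notice_it": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
results = top_k_matches(query, docs)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

With multilingual embeddings, documents in different languages share one vector space, which is what lets a query in one language rank documents written in another.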
- Create a free-tier Cohere account and set the COHERE_API_KEY environment variable.
- Create a free-tier Qdrant cluster and set the following environment variables: QDRANT_API_KEY and QDRANT_HOST.
- Install requirements.
cd <project_dir>
conda create -n legal-ease --file requirements.txt
conda activate legal-ease
In the project dir, run:
python gradio_demo.py
To run the app in reload mode:
gradio gradio_demo.py
The app should typically be available at the URL: http://localhost:7860
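If the app fails on startup, a missing environment variable is a likely cause. A small check like the following (a sketch, not part of the repository; the helper name `missing_env_vars` is hypothetical) confirms that the three variables named in the setup steps are present:

```python
import os

# The variable names come from the setup steps above.
REQUIRED_VARS = ("COHERE_API_KEY", "QDRANT_API_KEY", "QDRANT_HOST")


def missing_env_vars(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set.")
```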
- Cohere: Cohere trains large language models and offers API access to them, making it possible to add language processing to any system. Legal-ease uses Cohere's `multilingual-22-12` model to obtain multilingual embeddings, the `summarize-xlarge` model for summarization, and `command-xlarge-nightly` for question answering.
- Qdrant: Qdrant is a vector similarity engine and vector database that comes with an API service for semantic search, that is, searching for the nearest high-dimensional vectors.
- Langchain: An open-source library that provides abstractions for building LLM-based applications.
- Gradio: The frontend of the application is built using Gradio.
- HF Spaces: Hugging Face Spaces offers deployment support for ML applications. Here is the link to our space
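As a rough sketch of how the Cohere and Qdrant pieces could fit together for search (this is not taken from the repository), the query is embedded with the multilingual model and the resulting vector is sent to Qdrant for a nearest-neighbour lookup. Several things here are assumptions: the collection name `legal_docs` is hypothetical, QDRANT_HOST is assumed to hold the full cluster URL, and the `embed` and `search` call shapes, as well as model availability, depend on the installed SDK versions.

```python
import os


def build_clients():
    """Create Cohere and Qdrant clients from the environment variables."""
    # Imported lazily so search_documents can be tried with stand-in objects.
    import cohere
    from qdrant_client import QdrantClient

    co = cohere.Client(api_key=os.environ["COHERE_API_KEY"])
    # Assumption: QDRANT_HOST contains the full cluster URL.
    qdrant = QdrantClient(
        url=os.environ["QDRANT_HOST"],
        api_key=os.environ["QDRANT_API_KEY"],
    )
    return co, qdrant


def search_documents(co, qdrant, query, collection="legal_docs", top_k=3):
    """Embed the query with Cohere and return the top_k nearest points from Qdrant.

    `collection` is a hypothetical name; use whatever collection holds
    the document embeddings.
    """
    embedding = co.embed(texts=[query], model="multilingual-22-12").embeddings[0]
    hits = qdrant.search(
        collection_name=collection,
        query_vector=embedding,
        limit=top_k,
    )
    return [(hit.payload, hit.score) for hit in hits]
```

A call would look like `search_documents(*build_clients(), "quarantine rules for travellers")`. Passing the clients in as arguments keeps the function easy to exercise with fakes, without network access.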
- We'd like to thank Joel Niklaus for open-sourcing so many datasets and models related to the legal domain. We particularly found the english_contracts_summarization and covid19_emergency_event datasets to be very useful for our project.