A suite of NLP tools to simplify legal documents

Legal documents share more than their subject matter. They also:

Use complex sentence structures and vocabulary that are hard to follow for readers familiar only with conversational English
Are difficult to comprehend for non-native speakers of the language they are written in
Can run to tens of pages or more

Legal-ease addresses these issues using three tools:

QnA over legal documents: Copy your document and ask it questions. Useful whether you have questions about the document as a whole or a specific clause.
Document summarization: Generate a summary of the document. Options include changing the length of the summary (small, medium or large) and a choice between paragraphs or bullets.
Multilingual and cross-lingual document search: Perform cross-lingual semantic search over a collection of legal documents. This is currently a showcase feature: it runs keyword and semantic search over a collection of COVID-19 pandemic legislative documents and returns the top 3 matching documents. It also offers an option to translate into other languages [currently English-only].
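
The top-3 retrieval step of semantic search can be illustrated with a small sketch. This is not the project's code: Legal-ease gets its embeddings from Cohere and delegates the nearest-vector search to Qdrant, whereas the hand-written three-dimensional vectors, the document names and the `top_k` helper below are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (name, score) pairs with the highest cosine similarity."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for document embeddings (hypothetical values).
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
    "doc_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.05, 0.0]

results = top_k(query, docs, k=3)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With real multilingual embeddings, a query in one language can land close to documents written in another, which is what makes the search cross-lingual.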

Create a free-tier Cohere account and set the COHERE_API_KEY environment variable.
Create a free-tier Qdrant cluster and set the QDRANT_API_KEY and QDRANT_HOST environment variables.
Install the requirements:

cd <project_dir>

conda create -n legal-ease --file requirements.txt

conda activate legal-ease
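
Before launching the app, it can help to confirm that the three environment variables named in the setup steps are present. The following is a minimal sketch, not part of the repository; the `missing_env_vars` helper is a hypothetical name introduced here.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_VARS = ("COHERE_API_KEY", "QDRANT_API_KEY", "QDRANT_HOST")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```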

In the project dir, run:

python gradio_demo.py

To run the app in reload mode:

gradio gradio_demo.py

The app is usually served at http://localhost:7860

Cohere: Cohere trains large language models and exposes them through an API, so language processing can be added to any system. Legal-ease uses Cohere's multilingual-22-12 model for multilingual embeddings, summarize-xlarge for summarization, and command-xlarge-nightly for question answering.
Qdrant: Qdrant is a vector similarity engine and vector database with an API service for semantic search, that is, finding the nearest high-dimensional vectors.
LangChain: An open-source library that provides abstractions for building LLM-based applications.
Gradio: The frontend of the application is built using Gradio.
HF Spaces: Hugging Face Spaces offers deployment support for ML applications. Legal-ease is deployed as a Space.
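
The Cohere entry above names one model per task. One way an app might keep that mapping in a single place is sketched below; the task keys and the `model_for` helper are assumptions for illustration, and only the model names come from this README.

```python
# Model names as listed in this README; the task keys are hypothetical.
MODELS = {
    "embed": "multilingual-22-12",
    "summarize": "summarize-xlarge",
    "qa": "command-xlarge-nightly",
}


def model_for(task):
    """Look up the model name for a task, failing loudly on unknown tasks."""
    try:
        return MODELS[task]
    except KeyError:
        known = ", ".join(sorted(MODELS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None


print(model_for("summarize"))
```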

We'd like to thank Joel Niklaus for open-sourcing so many datasets and models related to the legal domain. We particularly found the english_contracts_summarization and covid19_emergency_event datasets to be very useful for our project.