NLP project for MTS AI
This system, designed for efficient airline ticket booking, consists of three main components: the Q4 quantized Mistral-7B-Instruct-v0.1 language model, the ChromaDB database system, and a user name extraction module powered by bert-large-NER. The language model processes and responds to user requests, while the user name extraction module, utilizing a fine-tuned BERT model, accurately identifies user names from inputs. The ChromaDB system stores and retrieves user ticket data, initially held in a pandas dataframe with flights information for efficient manipulation. These components work together to automate ticket booking, providing a personalized user experience.
- bert_ner.py - A fine-tuned BERT model for entity recognition
- chat.py - Runs the chat
- embedder.py - An embedding sup-simcse-roberta-large model to operate with text in vector db
- evaluator.py - Evaluates the model answer correctness
- flights_db_filler.py - Fills the database with synthetic data
- flights_db.py - A class to operate on the pandas flights dataframe
- llm.py - A class to interact with the language model
- tickets_db.py - A class to operate on the ChromaDB database
- utils.py - Utility functions for features extraction from text
This system uses the Q4 version of LLM through the LLAMA_cpp_python binding. Other Language Models work through the transformers library.
- Install requirements.txt
- Download the Mistral-7B-Instruct-v0.1 Q4 version
- Specify variables in .env
- Run flight_db_filler.py to fill the database with synthetic data
- Run chat.py