Bio-Medical-NER Project
This project is designed to extract biomedical entities such as symptoms, duration, diseases, medications, and medical tests from text inputs. It utilizes the Langchain library to interact with OpenAI's GPT-3.5 Turbo model to perform the extraction and build an API for biomedical entity extraction.
Project Structure
The project consists of the following files:
bio_medical_ner_parser.py
app.py
Dockerfile
parser.py
This file contains the core logic for extracting biomedical entities from text. Here's an overview of its components:
Initialization and Model Selection
- The
BioMedicalParser
class is initialized, which selects the appropriate OpenAI model based on the current date. - It sets up the ChatOpenAI instance with the chosen model.
Output Schema Definition
- The parser defines output schemas for different biomedical entities such as symptoms, duration, diseases, medications, and medical tests.
Chat Prompt Template
- A chat prompt template is defined, instructing the GPT-3.5 Turbo model to extract specific biomedical information and format the output as JSON.
Entity Extraction
- The
extract_biomedical_entities
method takes an input text, formats it into chat messages, sends it to the GPT-3.5 Turbo model, and then parses the model's response to extract biomedical entities.
Message Formatting
- The
format_messages
method prepares input text for the chat by incorporating the chat prompt template.
app.py
This file contains a Flask web application that exposes an API for interacting with the biomedical entity extraction functionality defined in parser.py
. Here's an overview:
Health Check Endpoint
- The
/health
endpoint checks the status of the API.
Bio-Medical-NER Query Endpoint
- The
/bio-medical-ner/query
endpoint expects POST requests with aninput_text
parameter. - It calls the
extract_biomedical_entities
method fromparser.py
to extract biomedical entities from the input text. - The extracted entities are returned as JSON in the response.
Error Logging
- Errors are logged to a file named
bio_medical_ner.log
.
Dockerfile
This Dockerfile is used to containerize the application. It sets up the necessary environment and dependencies for running the Flask web application.
Usage
- Build the Docker container using the provided Dockerfile.
- Run the container to start the Flask web application.
- Send POST requests to the
/bio-medical-ner/query
endpoint with theinput_text
parameter to extract biomedical entities from text.
Example API Request:
curl -X POST -F "input_text=Patient has a headache and fever" http://localhost:5000/bio-medical-ner/query
Dependencies
- Python 3.8
- Flask
- Langchain
- OpenAI GPT-3.5 Turbo model
- dotenv
Environment Configuration
- The environment variables can be configured in a
.env
file.
Running the Application
To run the application, execute the following command:
docker build -t bio-medical-ner .
docker run -p 5000:5000 bio-medical-ner
The API will be accessible at http://localhost:5000
.
Logging
Error logs are written to the bio_medical_ner.log
file within the container.
Health Check
You can check the health of the API by sending a GET request to http://localhost:5000/health
.
Contributors
- Rafi
Inspiration
This project is based on the knowledge acquired from this course: https://www.deeplearning.ai/short-courses/langchain-for-llm-application-development/