Copy the required documents to their appropriate folders,
Resume
files in.pdf
format todocuments/resume
Report
files in.docx
format todocuments/reports
- main.py #Runs the dev server
- document_summarizer.py #loads documents, splits, initialises vectordb, and queries for documents.
- documents
- resume/*.pdf #For prototype 1, query for files based on criteria
- reports/*.docx #For prototype 2, summarising huge documents
Ideally, create a virtual environment by running
python -m venv .venv
python -m venv .venv
pip install -r requirements.txt
python main.py
Runs on http://127.0.0.1:8095
METHOD: POST
HOST: http://127.0.0.1:8095
Content-Type: application/json
RAW-BODY:
{
"query": String,
"doc_type": "resume" | "document"
}
This structure works for 1st one, for summarising resume. The Second one, for summarising huge documents, needs more work to output the required data structure for consumption/generating a file in the requird format.
{
"Match result criteria": String,
"Query": String,
"File": String, #Filename
"Reference": {
"Page Number": Number,
"Content": String
}
}
{
"query": "which resume would be a good fit for a marketing job?",
"doc_type": "resume"
}
{
"File": "documents/resume/....-Resume.pdf",
"Match result criteria": "Results -driven and motivated marketing professional with one year of experience and a recent Bachelor of Commerce degree",
"Query": "which resume would be a good fit for a marketing job?",
"Reference": {
"Content": ".....",
"Page Number": 0
}
}
- Play around with the
pipeline
parameters (temperature, top_p, etc) to tweak the results huggingface_pipelines
- Internally build a function to query the required fields for the output.
- Return the required datastructure.