/Document-processing-Pdf-Structured-Data-Extractor

This project demonstrates how to extract structured information from PDF documents using a combination of LangChain, OpenAI models, and the Docling library. It provides a framework for parsing PDFs and using LLMs to identify key data points and return them in a structured format.

Primary Language: Jupyter Notebook

pdf-structured-data-extractor

Key Technologies:

  • LangChain: Orchestrates the extraction process and handles the interaction with the LLM.
  • OpenAI Models: Provide the large language model capabilities for identifying and structuring information.
  • Docling: A document-processing library, used here to parse PDFs into text the LLM can read.
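
A minimal sketch of how these three pieces could fit together is shown below. It is not taken from the project's notebook: the `INVOICE_SCHEMA` fields, the helper names (`pdf_to_markdown`, `build_prompt`, `extract_fields`), the `max_chars` cutoff, and the `gpt-4o-mini` model name are all assumptions chosen for illustration. It assumes the `docling` and `langchain-openai` packages are installed and that `OPENAI_API_KEY` is set in the environment.

```python
from typing import Any

# Hypothetical target schema (JSON Schema). Swap in the fields you need.
INVOICE_SCHEMA: dict[str, Any] = {
    "title": "Invoice",
    "description": "Key fields to pull out of an invoice PDF.",
    "type": "object",
    "properties": {
        "vendor": {"type": "string", "description": "Name of the issuing company"},
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total": {"type": "number", "description": "Total amount due"},
    },
    "required": ["vendor", "invoice_number", "total"],
}


def pdf_to_markdown(pdf_path: str) -> str:
    """Parse a PDF with Docling and return its content as Markdown."""
    # Imported lazily so the rest of the module loads without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(pdf_path)
    return result.document.export_to_markdown()


def build_prompt(markdown: str, max_chars: int = 12000) -> str:
    """Wrap the parsed document in extraction instructions.

    max_chars is an assumed, crude guard against over-long inputs;
    a real pipeline would more likely chunk the document.
    """
    excerpt = markdown[:max_chars]
    return (
        "Extract the requested fields from the document below. "
        "Use only information that appears in the document.\n\n"
        f"{excerpt}"
    )


def extract_fields(pdf_path: str, model_name: str = "gpt-4o-mini") -> Any:
    """Run the full PDF -> Markdown -> LLM -> structured output pipeline."""
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model=model_name, temperature=0)
    # Ask the model to return output that matches the schema.
    structured_llm = llm.with_structured_output(INVOICE_SCHEMA)
    return structured_llm.invoke(build_prompt(pdf_to_markdown(pdf_path)))
```

Calling `extract_fields("invoice.pdf")` (a placeholder path) would be expected to return a dictionary keyed by the schema's property names. Converting to Markdown first keeps headings and tables in a form the model can read, which is one reason to parse the PDF before prompting rather than sending raw extracted text.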