Tabula PDF Extraction

This branch develops a feature that extracts tabular data from PDF files using the Tabula library. The goal is to turn tables embedded in PDF documents into structured data.

Overview

Tabula detects tables in PDF documents and returns their contents as rows and columns, which is easier to work with than raw PDF text. This branch builds the extraction feature on top of that capability.

Features

  • Utilizes the Tabula library for PDF table extraction.
  • Extracts tabular information from PDF files.
  • Outputs structured data in a tabular format.

Getting Started

To get started with the tabular PDF extraction feature, follow the steps below:

  1. Clone this repository to your local machine.

    git clone git@github.com:shekolla/tabula_pdf_extraction.git
    
  2. Enter the cloned directory and switch to the tabula_pdf_extraction branch.

    cd tabula_pdf_extraction
    git checkout tabula_pdf_extraction
    
  3. Install the necessary dependencies. Tabula runs on the JVM, so make sure a Java runtime (JRE) is installed on your system.

  4. Run the application and provide the PDF file path as input.

  5. The application will use Tabula to extract tabular data from the PDF file and output the results in a tabular format.
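
The README does not show the application's entry point, so the following is only a minimal sketch of what steps 3 to 5 might look like. It assumes the tabula-py wrapper (`pip install tabula-py`) is used to call Tabula from Python. The helper names `java_available`, `csv_name_for`, `extract_tables`, and `save_tables`, as well as the CSV naming scheme, are hypothetical and not taken from this repository.

```python
import shutil
from pathlib import Path


def java_available():
    """Return True if a `java` executable is on PATH (Tabula needs a Java runtime)."""
    return shutil.which("java") is not None


def csv_name_for(pdf_path, index):
    """Hypothetical naming scheme: report.pdf -> report_table_1.csv."""
    pdf = Path(pdf_path)
    return pdf.with_name(f"{pdf.stem}_table_{index}.csv")


def extract_tables(pdf_path, pages="all"):
    """Return one pandas DataFrame per table that Tabula detects."""
    # Assumption: the tabula-py wrapper is installed.
    # It is imported lazily so the helpers above work without it.
    import tabula

    return tabula.read_pdf(str(pdf_path), pages=pages, multiple_tables=True)


def save_tables(pdf_path):
    """Extract every table from the PDF and write each one to its own CSV file."""
    if not java_available():
        raise RuntimeError("Java runtime not found on PATH")
    written = []
    for index, table in enumerate(extract_tables(pdf_path), start=1):
        out_path = csv_name_for(pdf_path, index)
        table.to_csv(out_path, index=False)  # one CSV per detected table
        written.append(out_path)
    return written
```

Under these assumptions, calling `save_tables("report.pdf")` would write one CSV file per detected table next to the input PDF and return the list of paths written.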

Contributing

Contributions to the tabula_pdf_extraction branch are welcome. If you have any improvements, bug fixes, or new features to propose, feel free to open a pull request. Please ensure that your changes align with the purpose of the branch and maintain code quality by following the repository's guidelines.

License

This project is licensed under the MIT License. You may modify and distribute the code under the terms of that license.

Contact

If you have any questions or need further assistance, please feel free to contact the project maintainer.

Happy PDF table extraction!