Document-Layout-Analysis

Object Detection Model for Scanned Documents

Primary language: Jupyter Notebook · License: MIT


Layout Analysis of Scanned Documents

Document Layout Analysis using YOLOv8

Table of Contents
  1. Updates
  2. About The Project
  3. Getting Started
  4. Works Cited
  5. Contact

Updates

In this project, I provide one object detection model, fine-tuned from pretrained YOLOv8 weights. It is available in the project's Hugging Face Space. If you use or fine-tune the model in your work, please cite this repository. Thank you, and don't forget to give this repo a 🌟!
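A minimal inference sketch, assuming you have downloaded the trained weights from the Hugging Face Space to a local file. The file names and the helper names detect_layout and count_by_label are hypothetical, not part of this repository. It uses the Ultralytics YOLO class (imported only when detect_layout is called) plus a small pure-Python helper that tallies detections per layout class.

```python
from collections import Counter
from typing import List, Tuple

# One detection: (class label, confidence, [x1, y1, x2, y2] in pixels)
Detection = Tuple[str, float, List[float]]


def detect_layout(weights_path: str, image_path: str, conf: float = 0.25) -> List[Detection]:
    """Run a YOLOv8 model on one page image and return its layout boxes."""
    # Imported lazily so count_by_label works even without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    result = model.predict(image_path, conf=conf)[0]  # one image -> one Results object
    detections: List[Detection] = []
    for cls_id, score, box in zip(result.boxes.cls, result.boxes.conf, result.boxes.xyxy):
        detections.append((result.names[int(cls_id)], float(score), box.tolist()))
    return detections


def count_by_label(detections: List[Detection], min_conf: float = 0.0) -> Counter:
    """Count detections per layout class, ignoring those below min_conf."""
    return Counter(label for label, score, _ in detections if score >= min_conf)
```

For example, count_by_label(detect_layout("doclaynet_yolov8s.pt", "page.png"), min_conf=0.5) would report how many Text, Table, Picture, etc. regions were found on a page, where both file names are placeholders for your own files.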

About The Project

Due to limited computational resources, I trained the model only on the DocLayNet-base dataset, which contains 6,910 training, 648 validation, and 499 test images. Even with this limited training data, the model performs relatively well, which speaks to the strength of YOLOv8 for this task.
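The fine-tuning step can be sketched with the Ultralytics Python API as follows. This is a minimal sketch under assumptions: the dataset YAML name doclaynet.yaml and the epoch count, image size, and batch size defaults are illustrative placeholders, not the settings used for the released model. Starting from yolov8s.pt corresponds to fine-tuning the pretrained YOLOv8s weights mentioned in the installation steps.

```python
from typing import Any, Dict


def build_train_args(data_yaml: str = "doclaynet.yaml",
                     epochs: int = 50,
                     imgsz: int = 1024,
                     batch: int = 8) -> Dict[str, Any]:
    """Collect keyword arguments for ultralytics' YOLO.train().

    All default values are illustrative assumptions, not the author's settings.
    """
    if epochs <= 0 or imgsz <= 0 or batch <= 0:
        raise ValueError("epochs, imgsz and batch must be positive")
    return {"data": data_yaml, "epochs": epochs, "imgsz": imgsz, "batch": batch}


def fine_tune(weights: str = "yolov8s.pt", **kwargs: Any):
    """Fine-tune pretrained YOLOv8 weights on a YOLO-format dataset."""
    # Imported lazily; requires `pip install ultralytics`.
    from ultralytics import YOLO

    model = YOLO(weights)  # a known name such as yolov8s.pt is downloaded if missing
    return model.train(**build_train_args(**kwargs))
```

Calling fine_tune(epochs=100) would start a training run; by default Ultralytics writes its outputs, including the best checkpoint, under runs/detect/.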


Built With

  1. Python
  2. Ultralytics YOLOv8


Getting Started

Prerequisites

  1. Python 3
  2. ultralytics
  3. numpy
  4. opencv-python
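A quick way to confirm the prerequisites above are importable before opening the notebook. This is a hypothetical helper, not part of the repository; it relies only on the standard library. Note that the opencv-python package is imported under the module name cv2.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name used in `import`
REQUIRED: Dict[str, str] = {
    "ultralytics": "ultralytics",
    "numpy": "numpy",
    "opencv-python": "cv2",
}


def missing_prerequisites(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip package names whose top-level modules cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]
```

Any names it returns can be passed straight to pip install.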

Installation

  1. Clone the repo
    git clone https://github.com/LynnHaDo/Document-Layout-Analysis.git
  2. Install packages
    pip install ultralytics
    pip install numpy
    pip install opencv-python
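  3. Download the DocLayNet-base dataset and save it as datasets/doclaynet-base
  4. (Optional) Download the pretrained YOLOv8s weights (yolov8s.pt). If you skip this step, Ultralytics downloads them automatically the first time the model is loaded by that name.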
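Ultralytics trains detectors from a dataset YAML plus one label file per image, where each line holds class x_center y_center width height normalized to [0, 1]. The sketch below assumes the DocLayNet boxes are COCO-style [x_min, y_min, width, height] in pixels and that images sit in hypothetical images/train, images/val, and images/test subfolders of datasets/doclaynet-base; adjust both to match how you saved the data. The 11 names are DocLayNet's layout classes, listed alphabetically; check the order against your annotation files, and if they use 1-based category ids, subtract 1 to get the YOLO class index.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

# DocLayNet's 11 layout classes (alphabetical; verify against your annotations).
DOCLAYNET_CLASSES: List[str] = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer", "Page-header",
    "Picture", "Section-header", "Table", "Text", "Title",
]


def dataset_yaml(root: str = "datasets/doclaynet-base",
                 names: Sequence[str] = DOCLAYNET_CLASSES) -> str:
    """Build an Ultralytics dataset YAML (the images/<split> layout is assumed)."""
    # An absolute path avoids ambiguity about what a relative path is resolved against.
    lines = [f"path: {Path(root).resolve()}",
             "train: images/train", "val: images/val", "test: images/test",
             "names:"]
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"


def coco_to_yolo(bbox: Sequence[float], img_w: float,
                 img_h: float) -> Tuple[float, float, float, float]:
    """Convert a COCO [x_min, y_min, w, h] pixel box to normalized (xc, yc, w, h)."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def label_line(class_id: int, bbox: Sequence[float], img_w: float, img_h: float) -> str:
    """Format one line of a YOLO label file for a single box."""
    xc, yc, w, h = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

Writing the config is then a single call such as Path("doclaynet.yaml").write_text(dataset_yaml()), and the label lines for each image go into a .txt file with the same stem under a matching labels/<split> folder.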


Works Cited

  1. Ultralytics YOLOv8

    authors:
     - family-names: Jocher
       given-names: Glenn
       orcid: "https://orcid.org/0000-0001-5950-6979"
     - family-names: Chaurasia
       given-names: Ayush
       orcid: "https://orcid.org/0000-0002-7603-6750"
     - family-names: Qiu
       given-names: Jing
       orcid: "https://orcid.org/0000-0003-3783-7069"
    title: "YOLO by Ultralytics"
    version: 8.0.0
    date-released: 2023-01-10
    license: AGPL-3.0
    url: "https://github.com/ultralytics/ultralytics"
  2. DocLayNet-base dataset

    @inproceedings{doclaynet2022,
     title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
     doi = {10.1145/3534678.3539043},
     url = {https://doi.org/10.1145/3534678.3539043},
     author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
     year = {2022},
     isbn = {9781450393850},
     publisher = {Association for Computing Machinery},
     address = {New York, NY, USA},
     booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
     pages = {3743–3751},
     numpages = {9},
     location = {Washington DC, USA},
     series = {KDD '22}
     }

Contact

Linh Do: do24l@mtholyoke.edu or dohalinh2303@gmail.com (personal)

Project Link: https://github.com/LynnHaDo/Document-Layout-Analysis

LinkedIn: https://linkedin.com/in/Linh Do
