/vkit

a Visionary toolKIT made with peace & love


Overview

Introduction


NOTICE: Documentation is out-of-date. Will be updated in version 22.3.1.

vkit is a toolkit designed for CV (Computer Vision) developers, especially targeting document image analysis and optical character recognition workloads:

  • Supporting rich data augmentation strategies:
    • Common photometric distortion strategies such as various colorspace manipulation methods and image noise related techniques
    • ⭐ Common geometric distortion strategies such as various affine transformations and non-linear transformations (e.g. similarity MLS, camera-model based 3D surface curving, folding effect, etc.)
    • ⭐ Transforming labeled data together with the image during geometric distortion. For example, when an image is rotated, vkit rotates the corresponding positional labels (e.g. image mask, polygons) at the same time, without manual intervention.
  • Supporting comprehensive data type encapsulation and the corresponding visualization:
    • Image type (encapsulation based on PIL, supporting reading/writing various image file types)
    • Labeled data type: mask, score map, box, polygon and so on
  • Industrial-grade code quality:
    • Auto-completion and type hint friendly, making it practical to use in production
    • Mature package and dependency management
    • Automated code style enforcement (based on flake8) and static type checking (based on pyright)

Remarks: ⭐ Highlights (features that similar projects either lack or do not support elegantly)
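The idea of transforming labels together with the image can be illustrated with a minimal sketch. It uses only NumPy and does not call vkit's API; the function names (rotation_matrix, transform_points, warp_mask) are hypothetical and chosen for illustration. The point is that a single affine matrix drives both the polygon and the mask, so the labels stay aligned with the pixels.

```python
import numpy as np


def rotation_matrix(angle_deg, center):
    """Build a 2x3 affine matrix rotating by angle_deg around center (x, y)."""
    theta = np.deg2rad(angle_deg)
    cos, sin = np.cos(theta), np.sin(theta)
    cx, cy = center
    return np.array([
        [cos, -sin, cx - cos * cx + sin * cy],
        [sin, cos, cy - sin * cx - cos * cy],
    ])


def transform_points(matrix, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return homogeneous @ matrix.T


def warp_mask(matrix, mask):
    """Warp a 2D mask with the same matrix (nearest-neighbour inverse mapping)."""
    height, width = mask.shape
    # Invert the affine matrix so every destination pixel can look up its source.
    inverse = np.linalg.inv(np.vstack([matrix, [0.0, 0.0, 1.0]]))[:2]
    ys, xs = np.mgrid[0:height, 0:width]
    dst = np.stack([xs.ravel(), ys.ravel()], axis=1)
    src = np.rint(transform_points(inverse, dst)).astype(int)
    valid = (
        (src[:, 0] >= 0) & (src[:, 0] < width)
        & (src[:, 1] >= 0) & (src[:, 1] < height)
    )
    flat = np.zeros(height * width, dtype=mask.dtype)
    flat[valid] = mask[src[valid, 1], src[valid, 0]]
    return flat.reshape(height, width)


# One matrix, applied to both kinds of label.
matrix = rotation_matrix(90, center=(2, 2))

polygon = np.array([[3, 2], [4, 2], [4, 3]])
rotated_polygon = transform_points(matrix, polygon)

mask = np.zeros((5, 5), dtype=np.uint8)
mask[2, 3] = 1  # the pixel at x=3, y=2
rotated_mask = warp_mask(matrix, mask)
```

Because the polygon and the mask are derived from the same matrix, the marked pixel at (x=3, y=2) and the polygon vertex at the same position end up at the same place after rotation. A toolkit that couples image and label transforms removes the need to repeat this bookkeeping for every label type.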

Demo!

camera_cubic_curve:

home_page_camera_cubic_curve.gif

rotate:

home_page_camera_cubic_curve.gif

Objectives

The author, a CV/NLP engineer, hopes this project will make work easier for developers in these fields:

  • To free developers from tedious data governance tasks, so that more time can be spent on high-value work such as data governance strategy, model design and fine-tuning
  • To consolidate common data augmentation techniques in support of document image analysis and recognition research and its industrial practice. The author wishes to make the "secret sauce", i.e. industrial-grade data augmentation methods, available to the public
  • To construct open-source industrial document image analysis and recognition solutions powered by vkit:
    • Distortion correction
    • Super resolution
    • OCR
    • Layout Analysis
    • ...

Installation

CPython version requirement: 3.8 or above

To install the stable release:

pip install vkit

To install the nightly version (tracking the latest commit on the main branch):

pip install vkit-nightly

(click here to visit the nightly documentation)

Recent release plans

  • 22.3.0
    • Support dataset pipeline for OCR text detection
    • Support dataset pipeline for OCR text recognition
  • 22.3.1
    • Refactoring
    • Improve documentation
    • Release resources for pipelines

Recent stable releases

  • 22.2.0
    • Improve element classes design.
    • Improve element visualization.
    • Support dataset pipeline for OCR text detection (adaptive_scaling)
    • Support CPython 3.10
  • 22.1.0
    • Use the CalVer versioning convention
    • Complete CI testing pipeline
    • Redesign project structure
    • Support font rendering
    • Add more data augmentation methods
    • Support data augmentation policy
  • 0.1.2
    • Remove strict dependency versioning
  • 0.1.1
    • User manual (English version)
    • GitHub Page for serving user manual
  • 0.1.0
    • Support CPython 3.9
    • Support CPython 3.8
    • Image type encapsulation
    • Labeled data type encapsulation
    • Common photometric distortion strategies
    • Common geometric distortion strategies
    • User manual

Communication

Your understanding is greatly appreciated if responses on these forums are slow: the author is busy with his job and cannot devote his full time to this project.