/vltk

A toolkit for vision-language processing to support the growing popularity of multi-modal transformer-based models

Installation

To install (use editable mode, -e, for personal customization):

git clone https://github.com/eltoto1219/vltk.git && cd vltk && pip install -e .

Alternatively:

pip install vltk

Documentation

The documentation is now available at vltk documentation.

It is fairly bare-bones for now; the first additions on the agenda are:

  1. Usage of adapters to rapidly create datasets.
  2. An overview of all the config options for automatically instantiating PyTorch dataloaders from one or many datasets at once (a plain-PyTorch sketch of the pattern this replaces appears after this list).
  3. An overview of how dataset metadata is automatically and deterministically collected from multiple datasets.
  4. Usage of modality processors for language, vision, and joint language-vision inputs, which make it possible to universally load any vision, language, or vision-language dataset.
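The config-driven dataloader creation mentioned in item 2 is meant to replace the manual merging of datasets you would otherwise write by hand. As a point of reference, here is a minimal plain-PyTorch sketch of that manual pattern; the toy dataset classes and field names ("image", "text") are illustrative placeholders and are not part of vltk's API.

import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset

# Two toy datasets standing in for, e.g., a VQA set and a captioning set.
class ToyVQADataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return {"image": torch.zeros(3, 224, 224), "text": f"question {idx}"}

class ToyCaptionDataset(Dataset):
    def __len__(self):
        return 4

    def __getitem__(self, idx):
        return {"image": torch.zeros(3, 224, 224), "text": f"caption {idx}"}

# Manually merge both datasets and build a single dataloader over them;
# vltk's config options aim to generate this kind of setup automatically.
merged = ConcatDataset([ToyVQADataset(), ToyCaptionDataset()])
loader = DataLoader(merged, batch_size=4, shuffle=True)

for batch in loader:
    print(batch["image"].shape, len(batch["text"]))

The intent, per the documentation agenda above, is that a single config describing the datasets replaces this kind of boilerplate.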

Collaboration

I have many exciting directions and improvements in mind for vltk. While this is the "official" beginning of the project, please email me with any suggestions or collaboration ideas: antonio36764@gmail.com