Open Model Initiative - Data Pipeline

Welcome to the Open Model Initiative Data Pipeline repository!

This project aims to provide open-source, community-driven training pipelines and code for developing baseline AI models for image generation. Other modalities may be released in future updates, whether in this repo or others.

About the Project

The Open Model Initiative Data Pipeline is a collaborative effort to create and maintain high-quality, openly licensed baseline AI models, and to provide tooling for maintaining and curating datasets for large training projects. Our goal is to empower individuals and organizations to build on these models for their own solutions, and to give creatives the means to use these emerging tools in their own creative work.

Key goals:

Effective captioning management and curation tools for large datasets
Standardized metadata format for utilizing datasets across applications
Comprehensive documentation and examples

Contributing

We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or proposing new features, your input is valuable. Please read our Contribution Guidelines and our Getting Started Guide for more information on how to get started.

License

This project and its artifacts are planned to be released under the following permissive licenses:

Software source code: Apache License, Version 2.0

Model parameters, weights, and metadata: CDLA-Permissive 2.0 License

Please see the respective license files for full details.

Contact

For questions, suggestions, or discussions, please:

Open an issue in this repository
Join our community Discord server