Welcome to the Open Model Initiative Data Pipeline repository!
This project aims to provide open-source, community-driven training pipelines and code for developing baseline AI models for image generation. Other modalities may be released in future updates, whether in this repo or others.
The Open Model Initiative Data Pipeline is a collaborative effort to create and maintain high-quality, openly licensed baseline AI models, and to provide tooling for maintaining and curating datasets for large training projects. Our goal is to empower individuals and organizations to build on these models for their own solutions, and to let creatives use these emerging tools in their own creative work.
- Effective caption management and curation tools for large datasets
- Standardized metadata format for using datasets across applications
- Comprehensive documentation and examples
We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or proposing new features, your input is valuable. Please read our Contribution Guidelines and our Getting Started Guide for more information on how to get started.
This project and its artifacts are planned to be licensed under various permissive licenses:
- Software source code: Apache License, Version 2.0
- Model parameters, weights, and metadata: CDLA-Permissive 2.0 License
Please see the respective license files for full details.
For questions, suggestions, or discussions, please:
- Open an issue in this repository
- Join our community Discord server
We look forward to your participation in the Open Model Initiative!