multiLangSpeechToImage

Multi-language speech-to-image generation, running locally using OpenAI's whisper and Stability AI's stable-diffusion


Open-source multi-lingual speech-to-image project

Objective:

Given that:

  • Visual art is a foundational form of human self-expression
  • Speech is the foundational form of human communication
  • Not everyone is literate
  • Not everyone is sufficiently skilled or confident to generate visual art through traditional or digital media

And in particular:

  • Not everyone is a native English speaker
  • The most powerful AI text-to-image generation models are built exclusively around English-language prompts

Therefore:

  • This project intends to provide a means for anyone to generate visual art directly through their speech, without presumption or prejudice with regard to their native language or level of literacy.

Pipeline:
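A minimal sketch of such a pipeline, assuming whisper's translate task for speech input and Hugging Face diffusers for image generation; the function name and file paths here are illustrative, not part of the project's API:

```python
def speech_to_image(audio_path, output_path="out.png"):
    """Illustrative sketch: translate speech in any supported language to
    English text with whisper, then render that text with stable-diffusion."""
    # Heavy imports kept local so importing this module stays cheap.
    import whisper
    from diffusers import StableDiffusionPipeline

    # whisper's task="translate" produces English text regardless of the
    # spoken language, which bridges to the English-only image model.
    model = whisper.load_model("base")
    prompt = model.transcribe(audio_path, task="translate")["text"]

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    image = pipe(prompt).images[0]
    image.save(output_path)
    return prompt
```

Calling `speech_to_image("speech.wav")` would save the generated image and return the English prompt that was rendered.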

Basic requirements:

Installation:

  • Create and activate a fresh Python v3.10.6 venv
  • git clone this repository
  • Install the dependencies with pip install -r requirements.txt
  • Download the stable-diffusion weights
    • git lfs install
    • git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
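Once the weights are cloned, they can be loaded from the local directory instead of being re-downloaded from the Hub; a small sketch, assuming the clone landed in the working directory (the helper name is illustrative):

```python
def load_local_pipeline(weights_dir="./stable-diffusion-v1-5"):
    """Load stable-diffusion from the locally cloned weights directory,
    so no network fetch from the Hugging Face Hub is needed."""
    # Import kept local so this sketch imports cheaply without diffusers.
    from diffusers import StableDiffusionPipeline
    return StableDiffusionPipeline.from_pretrained(weights_dir)
```

Pointing `from_pretrained` at a local path is the standard diffusers way to use weights fetched via `git lfs`.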