ChatSD is designed to make image generation tasks easy.
ChatSD is built on an LLM (Large Language Model) and a Stable Diffusion model. When you communicate with ChatSD, it understands your intentions, interprets them into appropriate prompts, and passes those prompts to the Stable Diffusion model for image generation.
At this point, ChatSD uses ChatGLM-6B and Openjourney; it may support more LLMs and diffusion models in the future. (Note: this is a project to help me understand LLMs, diffusion models, and LangChain better.)
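The intent-to-image flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual ChatSD code: `interpret_intent`, `generate_image`, and `chatsd` are hypothetical placeholder functions standing in for the LLM and diffusion stages.

```python
# Hypothetical sketch of the ChatSD pipeline: the LLM turns a
# natural-language request into a diffusion prompt, which is then
# handed to the image model.

def interpret_intent(user_message: str) -> str:
    """Stand-in for the LLM stage (e.g. ChatGLM-6B): map intent to a prompt."""
    # A real implementation would query the LLM; here we simply
    # decorate the request with style keywords as an illustration.
    return f"{user_message}, highly detailed, digital art"

def generate_image(prompt: str) -> str:
    """Stand-in for the diffusion stage (e.g. Openjourney)."""
    # A real implementation would run Stable Diffusion and return pixels;
    # here we return a description of what would be generated.
    return f"<image generated from prompt: {prompt!r}>"

def chatsd(user_message: str) -> str:
    prompt = interpret_intent(user_message)
    return generate_image(prompt)

result = chatsd("a cat logo")
print(result)
```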
- Clone the project and go to the project workspace:

  ```shell
  # clone the project
  git clone ....
  # go to the project directory
  cd ChatSD
  ```
- Create a conda environment named `chatsd` and activate it:

  ```shell
  # create an environment named `chatsd` and activate it
  conda env create -f environment.yaml
  conda activate chatsd
  ```
  Note: if you want to remove the environment, execute:

  ```shell
  conda deactivate
  conda remove -n chatsd --all
  ```
- Install the CUDA version of `torch` (refer to https://pytorch.org/) and execute:

  ```shell
  # CUDA version 11.8
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  ```
- Run the `main.py` script:

  ```shell
  python main.py
  ```
  Note: running the script downloads pretrained models from Hugging Face. If the download is interrupted by an unstable network, you can re-run the script multiple times; already-downloaded files are cached, so each run continues the download.
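The re-run-until-complete approach can also be automated with a small retry loop. This is a sketch, not part of ChatSD; `download_fn` is a hypothetical callable standing in for the actual model-download step, assumed to raise `OSError` on network failure.

```python
import time

def download_with_retries(download_fn, max_attempts=5, delay_seconds=2.0):
    """Call download_fn until it succeeds or max_attempts is exhausted.

    Hugging Face downloads are cached on disk, so each retry effectively
    continues from the files that were already fetched.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return download_fn()
        except OSError as exc:  # network errors typically surface as OSError
            if attempt == max_attempts:
                raise
            print(f"attempt {attempt} failed ({exc}); retrying...")
            time.sleep(delay_seconds)

# Demo with a download that succeeds immediately.
print(download_with_retries(lambda: "models ready", max_attempts=3))
```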
  If you want to pass your own instructions to ChatSD, execute:

  ```shell
  python main.py --input "Generate an image of cat for me" --grid_rows 2 --grid_cols 2 --image_output_dir "images"
  ```
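The flags in the command above suggest a command-line interface along these lines. This is a sketch of what such a parser might look like, not the actual `main.py`; the defaults shown are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flags mirror the invocation shown above; defaults are assumptions.
    parser = argparse.ArgumentParser(description="ChatSD-style CLI sketch")
    parser.add_argument("--input", required=True,
                        help="natural-language instruction for the model")
    parser.add_argument("--grid_rows", type=int, default=2,
                        help="rows in the output image grid")
    parser.add_argument("--grid_cols", type=int, default=2,
                        help="columns in the output image grid")
    parser.add_argument("--image_output_dir", default="images",
                        help="directory where generated images are saved")
    return parser

args = build_parser().parse_args(
    ["--input", "Generate an image of cat for me", "--grid_rows", "2"]
)
print(args.input, args.grid_rows, args.grid_cols, args.image_output_dir)
```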
I wanted to generate a logo for this project, so I executed the following command 4 times:

```shell
python main.py --input "logo of cat, cute, happy, smile" --grid_rows=3 --grid_cols=3
```

and the results are:
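A `rows x cols` grid tiles `rows * cols` generated images into one canvas. The helper below computes the pixel offset at which each tile would be pasted; it is an illustrative sketch using pure arithmetic, independent of any image library or of how ChatSD actually composes its grids.

```python
def grid_offsets(rows: int, cols: int, tile_w: int, tile_h: int):
    """Return (x, y) pixel offsets for pasting rows*cols tiles, row by row."""
    return [(c * tile_w, r * tile_h) for r in range(rows) for c in range(cols)]

# For a 3x3 grid of 512x512 tiles there are 9 offsets,
# and the last tile starts at (1024, 1024).
offsets = grid_offsets(3, 3, 512, 512)
print(len(offsets), offsets[-1])  # 9 (1024, 1024)
```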
I appreciate the open-source work of the following projects. Thanks to all the developers; your efforts make the world a better place:
- visual-chatgpt
- Hugging Face
- LangChain
- Stable Diffusion
- ChatGLM-6B
- clip-interrogator
- text2image-prompt-generator
- prompt-generator
- openjourney