Pinned Repositories
mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.
mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
mini-omni's Repositories
mini-omni/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.
mini-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.