longcw's Stars
chenfei-wu/TaskMatrix
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Mintplex-Labs/anything-llm
The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.
Rudrabha/Wav2Lip
This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For HD commercial model, please try out Sync Labs
mlfoundations/open_clip
An open source implementation of CLIP.
leptonai/search_with_lepton
Building a quick conversation-based search demo with Lepton AI.
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
OpenTalker/video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
IDEA-Research/GroundingDINO
Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
OpenBMB/MiniCPM-V
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
tencent-ailab/IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.
OpenBMB/MiniCPM
MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
ali-vilab/AnyDoor
Official implementations for paper: Anydoor: zero-shot object-level image customization
rom1504/img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
TencentARC/T2I-Adapter
T2I-Adapter
sail-sg/EditAnything
Edit anything in images powered by segment-anything, ControlNet, StableDiffusion, etc. (ACM MM)
promptfoo/promptfoo
Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.
omerbt/TokenFlow
Official Pytorch Implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" presenting "TokenFlow" (ICLR 2024)
Computer-Vision-in-the-Wild/CVinW_Readings
A collection of papers on the topic of ``Computer Vision in the Wild (CVinW)''
CircleRadon/Osprey
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
bytedance/MVDream
Multi-view Diffusion for 3D Generation
LLaVA-VL/LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
lkeab/gaussian-grouping
Gaussian Grouping for open-world Anything reconstruction, segmentation and editing.
bytedance/MVDream-threestudio
3D generation code for MVDream
aimagelab/multimodal-garment-designer
This is the official repository for the paper "Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing". ICCV 2023
LAION-AI/laion-datasets
Description and pointers of laion datasets
xuanandsix/GFPGAN-onnxruntime-demo
This is the onnxruntime inference code for GFP-GAN: Towards Real-World Blind Face Restoration with Generative Facial Prior (CVPR 2021). Official code: https://github.com/TencentARC/GFPGAN
bychen7/Face-Restoration-TensorRT
A simple face restoration TensorRT deployment solution.