tworuler's Stars
xtekky/gpt4free
The official gpt4free repository | a varied collection of powerful language models
XingangPan/DragGAN
Official Code for DragGAN (SIGGRAPH 2023)
huggingface/pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
mli/paper-reading
Paragraph-by-paragraph close readings of classic and recent deep learning papers
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LlamaFamily/Llama-Chinese
Llama Chinese community: the Llama3 online demo and fine-tuned models are now available, the latest Llama3 learning resources are aggregated in real time, and all code has been updated for Llama3; building the best Chinese Llama large model, fully open source and available for commercial use
KwaiVGI/LivePortrait
Bring portraits to life!
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
openai/shap-e
Generate 3D objects conditioned on text or images
ggerganov/ggml
Tensor library for machine learning
wolfpld/tracy
Frame profiler
modelscope/facechain
FaceChain is a deep-learning toolchain for generating your digital twin.
voxel51/fiftyone
Refine high-quality datasets and visual AI models
01-ai/Yi
A series of large language models trained from scratch by developers @01-ai
THUDM/CogVLM
A state-of-the-art open visual language model | multimodal pretrained model
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
OpenGVLab/InternGPT
InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, GPT-4-style multimodal chat, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)
InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
VainF/Awesome-Anything
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
IDEA-Research/awesome-detection-transformer
A collection of papers on transformers for detection and segmentation. Awesome Detection Transformer for Computer Vision (CV)
Ma-Lab-Berkeley/CRATE
Code for CRATE (Coding RAte reduction TransformEr).
cvdfoundation/open-images-dataset
Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.
mbzuai-oryx/groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
ziqihuangg/Collaborative-Diffusion
[CVPR 2023] Collaborative Diffusion
mayuelala/FollowYourEmoji
[SIGGRAPH Asia 2024] The official implementation of "Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation"
cosmicman-cvpr2024/CosmicMan
CosmicMan: A Text-to-Image Foundation Model for Humans (CVPR 2024)
FacePerceiver/LAION-Face
The human face subset of LAION-400M for large-scale face pretraining.
diffusion-facex/FaceX
kyegomez/Kosmos2.5
My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model"