rexnxiaobai's Stars
meta-llama/llama
Inference code for Llama models
xai-org/grok-1
Grok open release
facebookresearch/segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
HqWu-HITCS/Awesome-Chinese-LLM
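A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are inexpensive to train, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.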
state-spaces/mamba
BradyFU/Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
QwenLM/Qwen1.5
Qwen1.5 is the improved version of Qwen, the large language model series developed by Qwen team, Alibaba Cloud.
PKU-YuanGroup/Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
X-PLUG/MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
alibaba/EasyCV
An all-in-one toolkit for computer vision
invictus717/MetaTransformer
Meta-Transformer for Unified Multimodal Learning
lichao-sun/Mora
Mora: More like Sora for Generalist Video Generation
GaParmar/img2img-turbo
One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
microsoft/LLaVA-Med
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
PKU-YuanGroup/LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Yuliang-Liu/MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
HongguLiu/Deepfake-Detection
The PyTorch implementation of Deepfake Detection based on FaceForensics++
hendrycks/imagenet-r
ImageNet-R(endition) and DeepAugment (ICCV 2021)
Haiyang-W/GiT
Official Implementation of "GiT: Towards Generalist Vision Transformer through Universal Language Interface"
fdbtrs/ElasticFace
Official repository for ElasticFace: Elastic Margin Loss for Deep Face Recognition
large-ocr-model/large-ocr-model.github.io
layumi/U_turn
IJCV22: Attack your retrieval model via Query! They are not as robust as you expected!
MonsterZhZh/HRN
Implementation for Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification
JaySon-Huang/pyjpegtbx
A Python library that extracts raw data from JPEG images to make image data manipulation easier
juan-csv/eye_blink_detection
eye blink detection