SikaStar
I am now a fifth-year PhD student at the National Engineering Lab for Video Technology at Peking University, Beijing, China.
Peking University, Beijing, China
SikaStar's Stars
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
microsoft/autogen
A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
tloen/alpaca-lora
Instruct-tune LLaMA on consumer hardware
HqWu-HITCS/Awesome-Chinese-LLM
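A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheaper to train, including base models, vertical-domain fine-tunes and applications, datasets, and tutorials.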
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
FlagOpen/FlagEmbedding
Retrieval and Retrieval-augmented LLMs
PKU-YuanGroup/ChatLaw
ChatLaw: A Powerful LLM Tailored for Chinese Legal. A Chinese legal large language model
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance
THUDM/CogVLM
a state-of-the-art-level open visual language model | multimodal pre-trained model
pengxiao-song/LaWGPT
🎉 Repo for LaWGPT, Chinese-Llama tuned with Chinese Legal knowledge. A large language model based on Chinese legal knowledge
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
AILab-CVC/YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Tencent/HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
LLaVA-VL/LLaVA-NeXT
jeinlee1991/chinese-llm-benchmark
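Chinese LLM capability benchmark leaderboard: currently covers 128 large models, including commercial models such as chatgpt, gpt-4o, Google gemini, Baidu ERNIE Bot, Alibaba Tongyi Qianwen, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax, as well as open-source models such as qwen2.5, llama3.1, glm4, InternLM2.5, openbuddy, and AquilaChat. It provides not only a capability score leaderboard but also the raw outputs of all models!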
xinyu1205/recognize-anything
Open-source and strong foundation image recognition models.
FoundationVision/GLEE
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models
A collection of resources on controllable generation with text-to-image diffusion models.
beichenzbc/Long-CLIP
[ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
jianzhnie/awesome-text-to-video
A Survey on Text-to-Video Generation/Synthesis.
THU-KEG/EvaluationPapers4ChatGPT
Resource, Evaluation and Detection Papers for ChatGPT
Skytliang/Multi-Agents-Debate
MAD: The first work to explore Multi-Agent Debate with Large Language Models :D
yuweihao/MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
Traffic-X/ViT-CoMer
Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.
OatmealLiu/FineR
[ICLR'24] Democratizing Fine-grained Visual Recognition with Large Language Models
PKU-YuanGroup/ProLLaMA
A Protein Large Language Model for Multi-Task Protein Language Processing
WangWenhao0716/VidProM
[NeurIPS 2024] VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models
ByZ0e/AI2Thor_keyboard_player
AI2-THOR Data Collection Tool Based On Keyboard Interaction
MM-LLMs/mm-llms.github.io