xhyandwyy's Stars
All-Hands-AI/OpenHands
🙌 OpenHands: Code Less, Make More
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
microsoft/autogen
A programming framework for agentic AI 🤖 (PyPI: autogen-agentchat; Discord: https://aka.ms/autogen-discord; Office Hour: https://aka.ms/autogen-officehour)
meta-llama/llama3
The official Meta Llama 3 GitHub site
stanford-oval/storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
e2b-dev/awesome-ai-agents
A list of AI autonomous agents
axolotl-ai-cloud/axolotl
Go ahead and axolotl questions
apple/corenet
CoreNet: A library for training deep neural networks
Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
pywinauto/pywinauto
Windows GUI Automation with Python (based on text properties)
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 100+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
AILab-CVC/VideoCrafter
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
google-research/big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
ytongbai/LVM
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
landing-ai/vision-agent
Vision agent
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
PKU-YuanGroup/MagicTime
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
mini-sora/minisora
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
Vchitect/SEINE
[ICLR 2024] SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
kyegomez/ScreenAI
Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"
TIGER-AI-Lab/Mantis
Official code for the paper "Mantis: Multi-Image Instruction Tuning" (TMLR 2024)
AILab-CVC/Make-Your-Video
[IEEE TVCG 2024] Customized Video Generation Using Textual and Structural Guidance
zjunlp/KnowAgent
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
OpenGVLab/GUI-Odyssey
GUI Odyssey is a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos.
bytarnish/AGILE
X-PLUG/MM_StoryAgent