HYLcool's Stars
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
yt-dlp/yt-dlp
A feature-rich command-line audio/video downloader
Stirling-Tools/Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
ray-project/ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
karpathy/llm.c
LLM training in simple, raw C/CUDA
KindXiaoming/pykan
Kolmogorov-Arnold Networks
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
XavierXiao/Dreambooth-Stable-Diffusion
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
trekhleb/state-of-the-art-shitcode
💩State-of-the-art shitcode principles your project should follow to call it a proper shitcode
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ 🍸 🍹 🍷
LLaVA-VL/LLaVA-NeXT
xinyu1205/recognize-anything
Open-source and strong foundation image recognition models.
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Vchitect/Latte
Latte: Latent Diffusion Transformer for Video Generation.
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Unity-Technologies/com.unity.perception
Perception toolkit for sim2real training and validation in Unity
Vchitect/VBench
[CVPR 2024 Highlight] VBench - We Evaluate Video Generation
snap-research/Panda-70M
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
mira-space/MiraData
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
vis-nlp/ChartQA
evalcrafter/EvalCrafter
[CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
modelscope/lite-sora
An initiative to replicate Sora
OpenGVLab/MMT-Bench
[ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
sayakpaul/single-video-curation-svd
Educational repository for applying the main video data curation techniques presented in the Stable Video Diffusion paper.
locuslab/scaling_laws_data_filtering
zwq2018/Multi-modal-Self-instruct
The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
wenhanwu95/FreqMixFormer
[ACM MM 2024] Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer