shiyuzh2007's Stars
AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
CompVis/stable-diffusion
A latent text-to-image diffusion model
meta-llama/llama
Inference code for Llama models
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
dennybritz/reinforcement-learning
Implementations of reinforcement learning algorithms in Python, OpenAI Gym, and TensorFlow. Exercises and solutions to accompany Sutton's book and David Silver's course.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
google-deepmind/deepmind-research
This repository contains implementations and illustrative code to accompany DeepMind publications
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
LargeWorldModel/LWM
Large World Model With 1M Context
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
baichuan-inc/Baichuan-7B
A large-scale 7B pretraining language model developed by BaiChuan-Inc.
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
modelscope/ms-swift
Use PEFT or full-parameter training to fine-tune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
pengzhiliang/MAE-pytorch
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners
microsoft/i-Code
opendilab/DI-star
An artificial intelligence platform for StarCraft II with large-scale distributed training and grand-master agents.
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
yeyupiaoling/Whisper-Finetune
Fine-tune the Whisper speech recognition model, with support for training without timestamp data, training with timestamp data, and training without speech data. Also accelerates inference and supports Web, Windows desktop, and Android deployment.
Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
VinF/deer
DEEp Reinforcement learning framework
ReinholdM/Offline-Pre-trained-Multi-Agent-Decision-Transformer
pengzhendong/welm
One command to build TLG.fst for WeNet.
shiyuzh2007/jaxrl
JAX (Flax) implementation of algorithms for Deep Reinforcement Learning with continuous action spaces.