shiyuzh2007's Stars

AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
Language: Python 142k 1.1k 7.7k 26.8k
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python 83.4k 1.7k 46k 22.5k
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Language: Python 70.4k 574 08.3k
CompVis/stable-diffusion
A latent text-to-image diffusion model
Language: Jupyter Notebook 68.1k 558 71310.1k
meta-llama/llama
Inference code for Llama models
Language: Python 56.2k 526 9769.6k
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
Language: Jupyter Notebook 35.9k 329 4414.2k
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
Language: Python 22.1k 186 4902.2k
dennybritz/reinforcement-learning
Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
Language: Jupyter Notebook 20.5k 860 1556k
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Language: Python 20k 157 1.5k 2.2k
google-deepmind/deepmind-research
This repository contains implementations and illustrative code to accompany DeepMind publications
Language: Jupyter Notebook 13.2k 324 3212.6k
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
12.4k 274 116791
Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Language: HTML 8.9k 57 1.1k 733
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Language: Python 8.8k 134 1.1k 1.4k
LargeWorldModel/LWM
Large World Model With 1M Context
Language: Python 7.1k 66 71551
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python 5.8k 65 166450
baichuan-inc/Baichuan-7B
A large-scale 7B pretraining language model developed by BaiChuan-Inc.
Language: Python 5.7k 67 128506
facebookresearch/mmf
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Language: Python 5.5k 114 657938
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Language: Python 5k 49 447375
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
Language: Python 4k 22 1.2k 357
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Language: Python 3.3k 59 102323
pengzhiliang/MAE-pytorch
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners
Language: Python 2.6k 24 96341
microsoft/i-Code
Language: Jupyter Notebook 1.7k 40 74161
opendilab/DI-star
An artificial intelligence platform for StarCraft II with large-scale distributed training and grand-master agents.
Language: Python 1.2k 18 26115
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
Language: Python 1k 26 5780
yeyupiaoling/Whisper-Finetune
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment
Language: C 863 8 90139
Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
Language: Python 748 71 14110
VinF/deer
DEEp Reinforcement learning framework
Language: Python 485 50 32126
ReinholdM/Offline-Pre-trained-Multi-Agent-Decision-Transformer
Language: Python 105 2 1216
pengzhendong/welm
One command to build TLG.fst for WeNet.
Language: C++ 29 3 21
shiyuzh2007/jaxrl
JAX (Flax) implementation of algorithms for Deep Reinforcement Learning with continuous action spaces.
Language: Jupyter Notebook 1 0 00