JerryWei1985's Stars
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
adobe-fonts/source-code-pro
Monospaced font family for user interface and coding environments
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Stability-AI/generative-models
Generative Models by Stability AI
Stability-AI/stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models
bmaltais/kohya_ss
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
THUDM/CogVideo
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
guoyww/AnimateDiff
Official implementation of AnimateDiff.
TachibanaYoshino/AnimeGAN
A TensorFlow implementation of AnimeGAN for fast photo animation. This is the open-source code for the paper "AnimeGAN: a novel lightweight GAN for photo animation", which uses the GAN framework to transform real-world photos into anime images.
kanasimi/work_crawler
Comic and novel downloader. Supported sites: 腾讯漫画 大角虫漫画 有妖气 咪咕 SF漫画 哦漫画 看漫画 漫画柜 汗汗酷漫 動漫伊甸園 快看漫画 微博动漫 733动漫网 大古漫画网 漫画DB 無限動漫 動漫狂 卡推漫画 动漫之家 动漫屋 古风漫画网 36漫画网 亲亲漫画网 乙女漫画 webtoons 咚漫 ニコニコ静画 ComicWalker ヤングエースUP モアイ pixivコミック サイコミ; アルファポリス カクヨム ハーメルン 小説家になろう 起点中文网 八一中文网 顶点小说 落霞小说网 努努书坊 笔趣阁 → epub.
gpakosz/.tmux
🇫🇷 Oh my tmux! My self-contained, pretty & versatile tmux configuration made with ❤️
instantX-research/InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
thuanz123/enhancing-transformers
An unofficial implementation of both ViT-VQGAN and RQ-VAE in PyTorch
lucidrains/magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
google-research/magvit
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
microsoft/LLaVA-Med
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
microsoft/ML-For-Beginners
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
THUDM/CogVLM
A state-of-the-art open visual language model | multimodal pretrained model
microsoft/generative-ai-for-beginners
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
avaneev/r8brain-free-src
High-quality pro audio resampler / sample rate conversion C++ library. Very fast, for both audio resampling and time-series interpolation.
mct10/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
mli/autocut
Cut videos with a text editor
microsoft/table-transformer
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS evaluation metric.