alfredplpl
Research Scientist. Interests: data science, machine learning, robotics, neuroscience
CyberAgent, Inc., Japan
alfredplpl's Stars
kohya-ss/musubi-tuner
iejMac/video2dataset
Easily create large video dataset from video urls
microsoft/TRELLIS
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation".
CelebV-HQ/CelebV-HQ
[ECCV 2022] CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
PKU-YuanGroup/LLaVA-CoT
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
NVIDIA/Cosmos-Tokenizer
A suite of image and video neural tokenizers
kijai/ComfyUI-MochiWrapper
genmoai/mochi
The best OSS video generation models
lucidrains/rectified-flow-pytorch
Implementation of rectified flow and some of its follow-up research / improvements in PyTorch
yukara-ikemiya/friendly-stable-audio-tools
Refactored / updated version of `stable-audio-tools`, the open-source code for audio/music generative models originally by Stability AI.
a-r-r-o-w/finetrainers
Memory-optimized training scripts for video models based on Diffusers
TheDenk/cogvideox-controlnet
Simple Controlnet module for CogvideoX model.
SerialLain3170/AwesomeAnimeResearch
Papers, repository and other data about anime or manga research. Please let me know if you have information that the list does not include.
aigc-apps/CogVideoX-Fun
📹 A more flexible CogVideoX that can generate videos at any resolution and creates videos from images.
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
dailenson/One-DM
Official Code for ECCV 2024 paper — One-Shot Diffusion Mimicker for Handwritten Text Generation
zhenyuw16/GenArtist
Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing"
baaivision/Emu3
Next-Token Prediction is All You Need
willisma/SiT
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
shiml20/FlowTurbo
Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner"
IsaacGuan/3D-VAE
A variational autoencoder for volumetric shape generation
openai/simple-evals
Taited/clip-score
Quick scripts to calculate CLIP text-image similarity
LuChengTHU/dpm-solver
Official code for "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps" (NeurIPS 2022 Oral)
reppy4620/diffusion
My implementation of diffusion (like) models
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
discus0434/metrics-utils
EricGuo5513/momask-codes
Official implementation of "MoMask: Generative Masked Modeling of 3D Human Motions (CVPR2024)"
Vchitect/Vchitect-2.0
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).