lawrence-cj
Research Intern @ NVIDIA Research. Research Assistant @ HKU. Ph.D. Candidate @ DLUT.
Dalian University of Technology · Beijing, China
lawrence-cj's Stars
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
Fanghua-Yu/SUPIR
SUPIR aims to develop practical algorithms for photo-realistic image restoration in the wild. Our new online demo is available at suppixel.ai.
FoundationVision/VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
XPixelGroup/DiffBIR
Official codes of DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Alpha-VLLM/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
Alpha-VLLM/Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
YangLing0818/RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
PixArt-alpha/PixArt-sigma
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
dvlab-research/ControlNeXt
Controllable video and image generation, supporting SVD, Animate Anyone, ControlNet, ControlNeXt, and LoRA
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
aigc-apps/EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Vchitect/LaVie
LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
zhijian-liu/torchprofile
A general and accurate MACs / FLOPs profiler for PyTorch models
NVlabs/edm2
Analyzing and Improving the Training Dynamics of Diffusion Models (EDM2)
tianweiy/DMD2
cloneofsimo/minRF
Minimal implementation of scalable rectified flow transformers, based on SD3's approach
bojone/papers.cool
Cool Papers - Immersive Paper Discovery
HaozheLiu-ST/T-GATE
T-GATE: Temporally Gating Attention to Accelerate Diffusion Model for Free!
IceClear/CLIP-IQA
[AAAI 2023] Exploring CLIP for Assessing the Look and Feel of Images
OpenGVLab/ControlLLM
ControlLLM: Augment Language Models with Tools by Searching on Graphs
daooshee/HD-VG-130M
The HD-VG-130M Dataset
djghosh13/geneval
GenEval: An object-focused framework for evaluating text-to-image alignment
sayakpaul/cmmd-pytorch
PyTorch implementation of CLIP Maximum Mean Discrepancy (CMMD) for evaluating image generation models.
sayakpaul/single-video-curation-svd
Educational repository for applying the main video data curation techniques presented in the Stable Video Diffusion paper.