pixeli99

🤖

DUT IIAU, Dalian, China

pixeli99's Stars

academicpages/academicpages.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
Language:JavaScript12.4k 92 36743.9k
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
Language:Python5.3k 34 564434
kohya-ss/sd-scripts
Language:Python5.3k 54 1.1k876
fundamentalvision/BEVFormer
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
Language:Python3.4k 72 268546
facebookresearch/ijepa
Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive architecture."
Language:Python2.8k 59 58356
mistralai/mistral-finetune
Language:Python2.7k 37 39228
apple/ml-4m
4M: Massively Multimodal Masked Modeling
Language:Python1.6k 33 2495
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Language:Python1.3k 22 6056
megvii-research/PETR
[ECCV2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection & [ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
Language:Python871 13 162131
MyNiuuu/MOFA-Video
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
Language:Python627 23 4836
AIGText/Glyph-ByT5
[ECCV2024] This is an official inference code of the paper "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"
Language:Jupyter Notebook510 18 1722
LPengYang/MotionClone
Official implementation of MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Language:Python402 18 2031
exx8/differential-diffusion
Language:Python396 10 3023
YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Language:HTML350 13 420
OpenDriveLab/OpenScene
3D Occupancy Prediction Benchmark in Autonomous Driving
Language:Python310 10 921
GigaAI-research/General-World-Models-Survey
291 11 014
genforce/ctrl-x
Official implementation of "Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance" (NeurIPS 2024)
Language:Python252 22 79
YangLing0818/VideoTetris
[NeurIPS 2024] VideoTetris: Towards Compositional Text-To-Video Generation
Language:Python206 19 76
RockeyCoss/SPO
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
Language:Python157 7 213
sterzhang/image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
Language:Python144 4 58
Bin-ze/BEVFormer_segmentation_detection
Implemented BEVFormer support for BEV segmentation
Language:Python104 1 339
OpenDriveLab/MPI
[RSS 2024] Learning Manipulation by Predicting Interaction
Language:Python90 3 31
OpenRobotLab/Grounded_3D-LLM
Code&Data for Grounded 3D-LLM with Referent Tokens
Language:Python89 6 82
BraveGroup/LAW
Enhancing End-to-End Autonomous Driving with Latent World Model
86 9 82
sramshetty/ShortGPT
Unofficial implementations of block/layer-wise pruning methods for LLMs.
Language:Jupyter Notebook51 2 67
buxiangzhiren/VD-IT
Language:Python32 2 71
pixeli99/OwLore
Official PyTorch Implementation of "OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning" by Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu
Language:Python27 2 36
weihaox/UMBRAE
[ECCV 2024] UMBRAE: Unified Multimodal Brain Decoding | Unveiling the 'Dark Side' of Brain Modality
Language:Jupyter Notebook26 4 72
sooyeon-go/eye_for_an_eye
Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models
Language:Jupyter Notebook24 6 11
pixeli99/W-CODA2024-Track2
This repository is dedicated to Track 2 of the W-CODA 2024 Workshop, "Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving," held at ECCV 2024.
Language:Python7 3 10
