pixeli99's Stars
academicpages/academicpages.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
THUDM/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型
kohya-ss/sd-scripts
fundamentalvision/BEVFormer
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
facebookresearch/ijepa
Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive architecture."
mistralai/mistral-finetune
apple/ml-4m
4M: Massively Multimodal Masked Modeling
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
megvii-research/PETR
[ECCV2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection & [ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
MyNiuuu/MOFA-Video
[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
AIGText/Glyph-ByT5
[ECCV2024] This is an official inference code of the paper "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering""
LPengYang/MotionClone
Official implementation of MotionClone: Training-Free Motion Cloning for Controllable Video Generation
exx8/differential-diffusion
YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
OpenDriveLab/OpenScene
3D Occupancy Prediction Benchmark in Autonomous Driving
GigaAI-research/General-World-Models-Survey
genforce/ctrl-x
Official implementation of "Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance" (NeurIPS 2024)
YangLing0818/VideoTetris
[NeurIPS 2024] VideoTetris: Towards Compositional Text-To-Video Generation
RockeyCoss/SPO
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step
sterzhang/image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
Bin-ze/BEVFormer_segmentation_detection
Implemented BEVFormer support for BEV segmentation
OpenDriveLab/MPI
[RSS 2024] Learning Manipulation by Predicting Interaction
OpenRobotLab/Grounded_3D-LLM
Code&Data for Grounded 3D-LLM with Referent Tokens
BraveGroup/LAW
Enhancing End-to-End Autonomous Driving with Latent World Model
sramshetty/ShortGPT
Unofficial implementations of block/layer-wise pruning methods for LLMs.
buxiangzhiren/VD-IT
pixeli99/OwLore
Official Pytorch Implementation of "OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning" by Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu
weihaox/UMBRAE
[ECCV 2024] UMBRAE: Unified Multimodal Brain Decoding | Unveiling the 'Dark Side' of Brain Modality
sooyeon-go/eye_for_an_eye
Eye-for-an-eye: Appearance Transfer with Semantic Correspondence in Diffusion Models
pixeli99/W-CODA2024-Track2
This repository is dedicated to Track 2 of the W-CODA 2024 Workshop, "Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving," held at ECCV 2024.