Pinned Repositories
M2PT
[CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
bidiff
[CVPR'24] Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
BiDiff.github.io
GeMap
Online Vectorized HD Map Construction using Geometry
OneLLM
[CVPR'24] OneLLM: One Framework to Align All Modalities with Language
InteractiveVideo
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
MetaTransformer
Meta-Transformer for Unified Multimodal Learning
PointLanguage
UniDG
Towards Unified and Effective Domain Generalization
invictus717's Repositories
invictus717/MetaTransformer
Meta-Transformer for Unified Multimodal Learning
invictus717/InteractiveVideo
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
invictus717/UniDG
Towards Unified and Effective Domain Generalization
invictus717/PointLanguage