Einstone-rose
Ph.D & BIGAI@BIT || Research Intern@Ant Group || VQA, VideoLLM, 3D Understanding
Beijing Institute of Technology, Beijing
Einstone-rose's Stars
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
gradio-app/gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
meta-llama/llama3
The official Meta Llama 3 GitHub site
openai/CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
fengdu78/lihang-code
Code implementations for the book "Statistical Learning Methods" (《统计学习方法》)
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
dragen1860/Deep-Learning-with-TensorFlow-book
An open-source introductory Deep Learning book with hands-on case studies, based on the TensorFlow 2.0 framework.
facebookresearch/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
zhaoxin94/awesome-domain-adaptation
A collection of AWESOME things about domain adaptation
dk-liang/Awesome-Visual-Transformer
A collection of papers on Transformers for vision. Awesome Transformer with Computer Vision (CV)
pengzhiliang/MAE-pytorch
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners
luca-medeiros/lang-segment-anything
SAM with text prompt
Eurus-Holmes/Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
ttengwang/Awesome_Long_Form_Video_Understanding
Awesome papers & datasets specifically focused on long-term videos.
BeierZhu/Prompt-align
[ICCV 2023] Prompt-aligned Gradient for Prompt Tuning
snap-research/discoscene
CVPR 2023 Highlight: DiscoScene
seba-1511/lstms.pth
PyTorch implementations of LSTM Variants (Dropout + Layer Norm)
Yimin-Liu/Awesome-Unsupervised-Person-Re-identification
Awesome-Unsupervised-Person-Re-identification
rentainhe/TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
kkahatapitiya/Coarse-Fine-Networks
Code for our CVPR 2021 paper "Coarse-Fine Networks for Temporal Activity Detection in Videos"
luogen1996/SimREC
A lightweight codebase for referring expression comprehension and segmentation
PhoebusSi/VQA-VS
Code for our EMNLP-2022 paper: "Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA"
scottyih/Slides
Huntersxsx/TSGV-Learning-List
Related work on Temporal Sentence Grounding in Videos / Natural Language Video Localization / Video Moment Retrieval
ttharden/Keyframe-Extraction-for-video-summarization
kophy/py4db
Python with SQLite/MySQL/LMDB/LevelDB.
Trunpm/PMT-AAAI23
Efficient End-to-End Video-Question Answering with Pyramidal Multimodal Transformer - AAAI23
Einstone-rose/Awesome-TSGV
Related work on Temporal Sentence Grounding in Videos / Natural Language Video Localization / Video Moment Retrieval