ziwei-cui's Stars
labuladong/fucking-algorithm
Solving algorithm problems is all about patterns, and labuladong is all you need! English version supported! Crack LeetCode, not only how, but also why.
geekxh/hello-algorithm
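🌍 Algorithm training for beginners | Four parts: ① interview write-ups from major tech companies ② illustrated LeetCode solutions ③ a thousand open-source e-books ④ a hundred technical mind maps (the project took over a hundred hours, so please consider starring it to show support, 🌹 thanks~). Also recommends free ChatGPT sites.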
facebookresearch/sam2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
jacobgil/pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
QwenLM/Qwen2.5
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
facebookresearch/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
LLaVA-VL/LLaVA-NeXT
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
yunlong10/Awesome-LLMs-for-Video-Understanding
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
WeThinkIn/Interview-for-Algorithm-Engineer
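[Three Years of Interviews, Five Years of Mock Exams] An interview handbook for AI algorithm engineers. Covers interview and written-test experience and practical knowledge across the AI industry, including AIGC, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, SLAM, embodied intelligence, the metaverse, and AGI.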
dvlab-research/LLaMA-VID
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
microsoft/XPretrain
Multi-modality pre-training
mbzuai-oryx/Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
mbzuai-oryx/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
google-deepmind/perception_test
aimansnigdha/Ambiguous-Medical-Image-Segmentation-using-Diffusion-Models
Accepted in CVPR 2023
hustvl/GaussianDreamerPro
GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality
TencentARC/ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
ruiwang2021/mvd
[CVPR2023] Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning (https://arxiv.org/abs/2212.04500)
Berry-Wu/Visualization
Visualization: filter, feature map, attention map, image mask, Grad-CAM, human keypoint, guided backprop
wdndev/mllm_interview_note
Notes covering mainly the multimodal knowledge relevant to large language model (LLM) algorithm (application) engineers
TencentARC/mllm-npu
mllm-npu: training multimodal large language models on Ascend NPUs
Yxxxb/VoCo-LLaMA
VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
Ziyang412/VideoTree
Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
hustvl/LKCell
[arXiv '24] Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels
cjf8899/FeatureMap_Visualize_Pytorch
👀 Feature-map visualization, implemented in PyTorch
shyoulala/LMSYS_BlackPearl
liu673/LeetCode_Interviewer
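LeetCode Interviewer: this project collects and explains classic LeetCode algorithm problems that come up often in interviews, helping anyone who needs it prepare efficiently for technical interviews.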
miteshkothari259/Video-Highlights-Generator
Video Highlight generation using short time analysis and keyframe algorithm