XDUwt's Stars
xai-org/grok-1
Grok open release
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o
HRNet/HRNet-Semantic-Segmentation
The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919
OpenGVLab/InternImage
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Kedreamix/Linly-Talker
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies such as Whisper, Linly, Microsoft Speech Services, and the SadTalker talking-head generation system. 🌟🔬
lucidrains/reformer-pytorch
Reformer, the efficient Transformer, in Pytorch
facebookresearch/ConvNeXt-V2
Code release for ConvNeXt V2 model
chengdazhi/Deformable-Convolution-V2-PyTorch
Deformable ConvNets V2 (DCNv2) in PyTorch
google/neuroglancer
WebGL-based viewer for volumetric data
PantoMatrix/PantoMatrix
PantoMatrix: Generating Face and Body Animation from Speech
THU-MIG/RepViT
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
HuguesTHOMAS/KPConv
Kernel Point Convolutions
GistNoesis/FourierKAN
zjp-shadow/CharacterGen
[SIGGRAPH'24] CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization
shenyunhang/APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
jianghaojun/Awesome-Parameter-Efficient-Transfer-Learning
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
Traffic-X/ViT-CoMer
Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.
lucidrains/flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
sming256/OpenTAD
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
ViTAE-Transformer/MTP
The official repo for [JSTARS'24] "MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining"
WHU-Sigma/HyperSIGMA
The official repo for the paper "HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model"
Linwei-Chen/FADC
CVPR 2024 Highlight: Frequency-Adaptive Dilated Convolution for Semantic Segmentation
karttikeya/minREV
A simple minimal implementation of Reversible Vision Transformers
MzeroMiko/vHeat
vHeat: Building Vision Models upon Heat Conduction
wufeim/DST3D
Official implementation of "Generating images with 3D annotations using diffusion models".
SMU-MedicalVision/ECG-MoCo-Classfication
Practical cardiac events intelligent diagnostic algorithm for wearable 12-lead ECG via self-supervised learning on large-scale dataset
IIGROUP/SCL
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
coolbay/Dr2Net
Source code of the CVPR24 work Dr2Net
TangXu-Group/FDLdet
FDLdet: A Change Detector Based on Forward Dictionary Learning for Remote Sensing Images