XDUwt

XDUwt's Stars

xai-org/grok-1
Grok open release
Language: Python 49.8k 592 2148.3k
facebookresearch/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
Language: Jupyter Notebook 11.1k 144 3701.1k
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.
Language: Python 6.7k 57 720521
HRNet/HRNet-Semantic-Segmentation
The OCR approach is rephrased as Segmentation Transformer: https://arxiv.org/abs/1909.11065. This is an official implementation of semantic segmentation for HRNet. https://arxiv.org/abs/1908.07919
Language: Python 3.2k 57 266693
OpenGVLab/InternImage
[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Language: Python 2.6k 35 268240
Kedreamix/Linly-Talker
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬
Language: Python 2.2k 28 117370
lucidrains/reformer-pytorch
Reformer, the efficient Transformer, in PyTorch
Language: Python 2.1k 54 121256
facebookresearch/ConvNeXt-V2
Code release for ConvNeXt V2 model
Language: Python 1.6k 7 75121
chengdazhi/Deformable-Convolution-V2-PyTorch
Deformable ConvNets V2 (DCNv2) in PyTorch
Language: Cuda 1.5k 22 88230
google/neuroglancer
WebGL-based viewer for volumetric data
Language: TypeScript 1.1k 49 356300
PantoMatrix/PantoMatrix
PantoMatrix: Generating Face and Body Animation from Speech
Language: Python 1k 54 178180
THU-MIG/RepViT
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
Language: Jupyter Notebook 864 10 8061
HuguesTHOMAS/KPConv
Kernel Point Convolutions
Language: Python 722 22 169132
GistNoesis/FourierKAN
Language: Python 716 8 759
zjp-shadow/CharacterGen
[SIGGRAPH'24] CharacterGen: Efficient 3D Character Generation from Single Images with Multi-View Pose Canonicalization
Language: JavaScript 618 19 2749
shenyunhang/APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Language: Python 501 8 6531
jianghaojun/Awesome-Parameter-Efficient-Transfer-Learning
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
393 21 425
Traffic-X/ViT-CoMer
Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.
Language: Python 252 3 2418
lucidrains/flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
Language: Cuda 210 12 1011
sming256/OpenTAD
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
Language: Python 204 5 4014
ViTAE-Transformer/MTP
The official repo for [JSTARS'24] "MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining"
Language: Python 185 3 2911
WHU-Sigma/HyperSIGMA
The official repo for the paper "HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model"
Language: Python 175 4 1716
Linwei-Chen/FADC
CVPR 2024 Highlight: Frequency-Adaptive Dilated Convolution for Semantic Segmentation
Language: Python 120 1 239
karttikeya/minREV
A simple minimal implementation of Reversible Vision Transformers
Language: Python 117 3 108
MzeroMiko/vHeat
vHeat: Building Vision Models upon Heat Conduction
Language: Python 102 3 66
wufeim/DST3D
Official implementation of "Generating images with 3D annotations using diffusion models".
Language: Python 60 12 110
SMU-MedicalVision/ECG-MoCo-Classfication
Practical cardiac events intelligent diagnostic algorithm for wearable 12-lead ECG via self-supervised learning on large-scale dataset
Language: Python 21 2 16
IIGROUP/SCL
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
Language: Python 20 1 22
coolbay/Dr2Net
Source code of the CVPR24 work Dr2Net
5 1 10
TangXu-Group/FDLdet
FDLdet: A Change Detector Based on Forward Dictionary Learning for Remote Sensing Images
Language: Python 5 2 01