JasonZhang156

Speech Recognition

Shanghai University, Shanghai

JasonZhang156's Stars

openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Language:Python70.5k 574 08.3k
azl397985856/leetcode
LeetCode Solutions: A Record of My Problem Solving Journey.
Language:JavaScript54.7k 1.3k 2559.5k
huggingface/pytorch-image-models
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
Language:Python32.1k 313 9214.8k
Chanzhaoyu/chatgpt-web
A ChatGPT demo web page built with Express and Vue3
Language:Vue31.5k 206 1.7k11.2k
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Language:Python20.1k 309 1.4k2.5k
HqWu-HITCS/Awesome-Chinese-LLM
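A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are cheaper to train, including base models, vertical-domain fine-tuning and applications, datasets, and tutorials.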
15.7k 203 261.5k
kaldi-asr/kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
Language:Shell14.2k 693 1.6k5.3k
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Language:Python14k 117 1.1k1.3k
BradyFU/Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
12.4k 271 117793
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language:Python12.2k 123 7161k
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Language:Python12k 206 2.3k2.5k
google-research/vision_transformer
Language:Jupyter Notebook10.3k 106 2071.3k
google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
Language:C++10.2k 127 7491.2k
TheR1D/shell_gpt
A command-line productivity tool powered by AI large language models like GPT-4, will help you accomplish your tasks faster and more efficiently.
Language:Python9.6k 90 322760
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
Language:Python8.6k 76 547609
espnet/espnet
End-to-End Speech Processing Toolkit
Language:Python8.4k 180 2.4k2.2k
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
Language:Python5.9k 52 593459
ChanChiChoi/awesome-Face_Recognition
Papers about Face Detection; Face Alignment; Face Recognition, Face Identification, Face Verification, and Face Representation; Face Reconstruction; Face Tracking; Face Super-Resolution and Face Deblurring; Face Generation and Face Synthesis; Face Transfer; Face Anti-Spoofing; Face Retrieval
4.5k 208 7962
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Language:Python4.5k 35 334462
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
Language:Python4.1k 41 394293
rom1504/img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
Language:Python3.7k 31 256338
onnx/onnx-tensorrt
ONNX-TensorRT: TensorRT backend for ONNX
Language:C++2.9k 68 664544
LLaVA-VL/LLaVA-NeXT
Language:Python2.8k 32 287222
wolfcw/libfaketime
libfaketime modifies the system time for a single application
Language:C2.7k 62 341326
kpu/kenlm
KenLM: Faster and Smaller Language Model Queries
Language:C++2.5k 70 370512
jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
2.5k 120 8214
gengyanlei/fire-smoke-detect-yolov4
fire-smoke-detect-yolov4-yolov5 and fire-smoke-detection-dataset (fire detection, smoke detection)
Language:Jupyter Notebook1.3k 17 53299
christophschuhmann/improved-aesthetic-predictor
CLIP+MLP Aesthetic Score Predictor
Language:Python896 6 1088
LAION-AI/CLIP_benchmark
CLIP-like model evaluation
Language:Jupyter Notebook605 12 6478
Beckschen/ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
Language:Python165 6 115
