ethanyhzhang

Pinned Repositories

3d-ken-burns
An implementation of 3D Ken Burns Effect from a Single Image using PyTorch
Language: Python
3DUnetCNN
Keras 3D U-Net Convolutional Neural Network (CNN) designed for medical image segmentation
Language: Python
6d-object-pose-estimation
This repository summarizes papers and code for 6D Object Pose Estimation.
A-Light-and-Fast-Face-Detector-for-Edge-Devices
A light and fast one-class detection framework for edge devices. We provide a face detector, head detector, pedestrian detector, vehicle detector, and more.
Language: Python
Abnormal-Behavior-Detection-Based-On-Optical-Flow-Features
CSC2515, University of Toronto. This project applies computer vision and machine learning methods to detect abnormally behaving objects in a crowd, by Hanwen Liang and Haohan Li.
Language: Python
abnormal-spatiotemporal-ae
Code for "Abnormal Event Detection in Videos using Spatiotemporal Autoencoder".
Language: Python
Abnormal_Event_Detection
Abnormal Event Detection in Videos using SpatioTemporal AutoEncoder
Language: Python
AbnormarCrowdDetection
Abnormal Crowd Detection Implementation with Python
Language: Python
AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
Language: Python
AudioClassification-Pytorch
A PyTorch implementation of sound classification that supports EcapaTdnn, PANNS, TDNN, Res2Net, ResNetSE, and other models, as well as a variety of preprocessing methods.
Language: Python

ethanyhzhang's Repositories

ethanyhzhang/FinGLM
ethanyhzhang/NaturalSpeech2
ethanyhzhang/DAIL-SQL
An efficient and effective few-shot NL2SQL method on GPT-4.
ethanyhzhang/AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
ethanyhzhang/Pix2Text
Pix In, LaTeX & Text Out. Recognizes Chinese and English text and math formulas from images.
ethanyhzhang/Pointnet_Pointnet2_pytorch
PointNet and PointNet++ implemented in PyTorch (pure Python), on ModelNet, ShapeNet, and S3DIS.
ethanyhzhang/StyleTTS
Official Implementation of StyleTTS
ethanyhzhang/DecryptPrompt
A summary of Prompt and LLM papers, open-source data and models, and AIGC applications
ethanyhzhang/audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
ethanyhzhang/LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Model for All.
ethanyhzhang/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
ethanyhzhang/Font_Recognition-DeepFont
An implementation of DeepFont: Identify Your Font from an Image, using Keras
ethanyhzhang/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
ethanyhzhang/bark
🔊 Text-prompted Generative Audio Model
ethanyhzhang/AutoX
AutoX is an efficient AutoML tool aimed mainly at data mining tasks on tabular data.
ethanyhzhang/actionformer_release
Code release for ActionFormer (ECCV 2022)
ethanyhzhang/AMR-Benchmark
A Unified Implementation of Several Baseline Deep Learning Models for Automatic Modulation Recognition
ethanyhzhang/SpectralCluster
Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.
ethanyhzhang/asr
ASR (automatic speech recognition) model for Shanghainese (the Shanghai dialect)
ethanyhzhang/wav2letter
Facebook AI Research's Automatic Speech Recognition Toolkit
ethanyhzhang/FastASR
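An efficient C++ implementation of model inference, based on the Conformer model used by PaddleSpeech. It also runs smoothly on ARM platforms such as the Raspberry Pi 4B.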
ethanyhzhang/HowToLiveLonger
A programmer's guide to living longer
ethanyhzhang/kws
An End-to-End Architecture for Keyword Spotting and Voice Activity Detection
ethanyhzhang/ctc_decoder
A CTC decoder for both online and offline ASR models
ethanyhzhang/simple_ddp_test
Toy code for a DDP test
Language: Python
ethanyhzhang/wise-ft
Robust fine-tuning of zero-shot models
ethanyhzhang/phkit
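Phoneme toolkit. A handy phoneme-processing toolbox with modules for Chinese phonemes, English phonemes, text-to-pinyin conversion, text normalization, and more.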
ethanyhzhang/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector, Language Classifier and Spoken Number Detector
ethanyhzhang/Chinese-BERT-wwm
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
ethanyhzhang/ECAPA-TDNN
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)