Pinned Repositories
Automatic_Speech_Recognition
End-to-end automatic speech recognition from scratch in TensorFlow.
AutomaticSpeechChunker
From a large speech audio file and its corresponding body of text, automatically chunk the audio and text into (phrase, audio_snippet) pairs. For use with the Connectionist Temporal Classification (CTC) cost algorithm.
cocktailparty
Multi-Modal Multi-Channel System and Corpus For Cocktail Party Problem
CTC-Connectionist-Temporal-Classification
Theano implementation of CTC.
deep-limits
Repo for a paper about constructing priors on very deep models.
Deep-Speech
Deep Learning for Speech Recognition based on Theano
jihanki
Jihanki is a lightweight WFST toolkit implemented entirely in Python
Speech_Recognition
A simple speech recognizer using HMMs (Python)
tfkaldi
Speech recognition software where the neural net is trained with TensorFlow and GMM training and decoding is done in Kaldi
timit_tools
Tools for preparing TIMIT for HMM (with HTK) and deep learning (with Theano) methods
ZhangAustin's Repositories
ZhangAustin/cocktailparty
Multi-Modal Multi-Channel System and Corpus For Cocktail Party Problem
ZhangAustin/algo
Set up a personal VPN in the cloud
ZhangAustin/ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
ZhangAustin/asteroid
The PyTorch-based audio source separation toolkit for researchers || Pretrained models available
ZhangAustin/AugLy
A data augmentations library for audio, image, text, and video.
ZhangAustin/BigCiDian
Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.
ZhangAustin/chinese_text_normalization
Chinese text normalization for speech processing
ZhangAustin/edex-ui
A cross-platform, customizable science fiction terminal emulator with advanced monitoring & touchscreen support.
ZhangAustin/examples
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
ZhangAustin/g1
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
ZhangAustin/GiantMIDI-Piano
ZhangAustin/google-research
Google Research
ZhangAustin/Halide
a language for fast, portable data-parallel computation
ZhangAustin/Hey-Jetson
Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.
ZhangAustin/jetson-inference
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
ZhangAustin/lhotse
ZhangAustin/LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
ZhangAustin/local-llms-analyse-finance
In this project, I explored how local LLMs can be used to label data and support analyses. Specifically, I used the Llama 2 model to automatically categorise my bank transaction data.
ZhangAustin/Megatron-LLM
distributed trainer for LLMs
ZhangAustin/MMdnn
MMdnn is a set of tools to help users interoperate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX, and CoreML.
ZhangAustin/netron
Visualizer for deep learning and machine learning models
ZhangAustin/Nuklear
A single-header ANSI C immediate mode cross-platform GUI library
ZhangAustin/odas
ODAS: Open embeddeD Audition System
ZhangAustin/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding
ZhangAustin/pyctcdecode
A fast and lightweight python-based CTC beam search decoder for speech recognition.
ZhangAustin/pyflow
Fast, accurate and easy to run dense optical flow with python wrapper
ZhangAustin/pytorch-struct
Fast, general, and tested differentiable structured prediction in PyTorch
ZhangAustin/rendezvous
Next generation videoconference system
ZhangAustin/rnnt_decoder_cuda
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
ZhangAustin/vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node