ZhangAustin

Cambridge University --> Microsoft --> Tencent AI Lab

Microsoft, Redmond

Pinned Repositories

Automatic_Speech_Recognition
End-to-end automatic speech recognition from scratch in TensorFlow.
Language: Python | 5 2 00
AutomaticSpeechChunker
From a large speech audio file and its corresponding body of text, automatically chunk the audio and text into (phrase, audio_snippet) pairs. For use with the Connectionist Temporal Classification (CTC) cost algorithm.
Language: Python | 6 3 00
cocktailparty
Multi-Modal Multi-Channel System and Corpus For Cocktail Party Problem
2 3 00
CTC-Connectionist-Temporal-Classification
Theano implementation of CTC.
Language: Python | 1 2 00
deep-limits
Repo for a paper about constructing priors on very deep models.
Language: TeX | 1 2 00
Deep-Speech
Deep Learning for Speech Recognition based on Theano
Language: Python | 15 4 35
jihanki
Jihanki is a lightweight WFST toolkit implemented entirely in Python
Language: Python | 6 3 00
Speech_Recognition
A simple speech recognizer using HMMs (Python)
Language: Python | 4 2 00
tfkaldi
Speech recognition software in which the neural net is trained with TensorFlow, while GMM training and decoding are done in Kaldi
Language: Python | 2 2 00
timit_tools
Tools for preparing TIMIT for HMM (with HTK) and deep learning (with Theano) methods
Language: Python | 2 2 00

ZhangAustin's Repositories

ZhangAustin/cocktailparty
Multi-Modal Multi-Channel System and Corpus For Cocktail Party Problem
2 3 00
ZhangAustin/algo
Set up a personal VPN in the cloud
Language: Python | 1 0
ZhangAustin/ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
Language: Python | 1 0
ZhangAustin/asteroid
The PyTorch-based audio source separation toolkit for researchers || Pretrained models available
Language: Python | 1 0
ZhangAustin/AugLy
A data augmentations library for audio, image, text, and video.
Language: Python | 1 0
ZhangAustin/BigCiDian
Pronunciation lexicon covering both English and Chinese languages for Automatic Speech Recognition.
Language: Python | 1 0
ZhangAustin/chinese_text_normalization
Chinese text normalization for speech processing
Language: Python | 1 0
ZhangAustin/edex-ui
A cross-platform, customizable science fiction terminal emulator with advanced monitoring & touchscreen support.
Language: JavaScript | 1 0
ZhangAustin/examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
Language: Python | 1 0
ZhangAustin/g1
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
Language: Python | 0 0
ZhangAustin/GiantMIDI-Piano
Language: Python | 1 0
ZhangAustin/google-research
Google Research
Language: Jupyter Notebook | 1 0
ZhangAustin/Halide
A language for fast, portable data-parallel computation
Language: C++ | 2 0
ZhangAustin/Hey-Jetson
Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.
Language: Jupyter Notebook | 1 0
ZhangAustin/jetson-inference
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
Language: C++ | 1 0
ZhangAustin/lhotse
Language: Python | 1 0
ZhangAustin/LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
ZhangAustin/local-llms-analyse-finance
In this project, I explored how local LLMs can be used to label data and support analyses. Specifically, I used the Llama 2 model to automatically categorise my bank transaction data.
Language: Jupyter Notebook | 0 0
ZhangAustin/Megatron-LLM
Distributed trainer for LLMs
Language: Python | 0 0
ZhangAustin/MMdnn
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX, and CoreML.
Language: Python | 1 0
ZhangAustin/netron
Visualizer for deep learning and machine learning models
Language: JavaScript | 1 0
ZhangAustin/Nuklear
A single-header ANSI C immediate mode cross-platform GUI library
Language: C | 1 0
ZhangAustin/odas
ODAS: Open embeddeD Audition System
Language: C | 1 0
ZhangAustin/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding
Language: Python | 1 0
ZhangAustin/pyctcdecode
A fast and lightweight Python-based CTC beam search decoder for speech recognition.
Language: Python | 0 0
ZhangAustin/pyflow
Fast, accurate, and easy-to-run dense optical flow with a Python wrapper
Language: C++ | 1 0
ZhangAustin/pytorch-struct
Fast, general, and tested differentiable structured prediction in PyTorch
Language: Jupyter Notebook | 2 0
ZhangAustin/rendezvous
Next generation videoconference system
Language: C | 1 0
ZhangAustin/rnnt_decoder_cuda
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
Language: Cuda | 1 0
ZhangAustin/vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Language: C++ | 1 0