Speech and Natural Language Processing

This page lists open source projects, toolkits, and websites that are useful for research and applications in speech and language processing.

Other lists for SNLP: [List1-Github-SNLP] [List2-Zhihu-SNLP] [List3-Github-Speech]

Jump to: [Speech] [NLP] [Machine Learning and Neural Net] [Research] [Courses]

Speech related:

CMUSphinx [link]
Open source speech recognition toolkit
Festival [link] [documentation]
Speech Synthesis System by CSTR
Speech-Corpus-Collection [link]
Kaldi ASR [link]
Praat [link]
LibriSpeech ASR corpus [link]
Large-scale (1000 hours) corpus of read English speech
Common Voice [Github] [website]
Common Voice is Mozilla's initiative to help teach machines how real people speak.
https://github.com/mozilla/voice-web/tree/master/server/data/zh-HK
TED-LIUM Release 3 [link]
452 hours of audio
VoxForge [link]
VoxForge was set up to collect transcribed speech for use in open source speech recognition engines ("SREs") such as ISIP, HTK, Julius, and Sphinx.
Tatoeba [link]
Tatoeba is a collection of sentences and translations.
EMIME Project [link]
CMU_ARCTIC speech synthesis databases [link]
The World English Bible [link]
Nancy Corpus [link]
Google uis-rnn [Github] [paper]
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
A simple interface for the CMU pronouncing dictionary [link]
E-Guide dog [link]
PyTorch implementation of GAN-based text-to-speech synthesis and voice conversion (VC) [link]
Something useful for speech and natural language processing [link]
Saarbruecken Voice Database [link]
A British National Corpus Spoken Audio Sampler [link]
This site presents a selection of audio files from the spoken part of the British National Corpus, digitized from the analogue audio cassette tapes deposited at the British Library Sound Archive, together with associated transcription and annotation files created during the Mining a Year of Speech project.
pronouncingpy [link]
A simple interface for the CMU pronouncing dictionary
Sound & MIDI Software For Linux [link]
Linguistic Variation in Chinese Speech Communities 泛華語地區漢語共時語料庫 [link]
Translatotron [link]
Translatotron: An End-to-End Speech-to-Speech Translation Model
Pyworld [link]
A Python wrapper for the high-quality vocoder "World"

NLP related:

BBC news corpus [link]
GEO query database [link]
FreeBase for QA [link]
Google BERT [link]
TensorFlow code and pre-trained models for BERT https://arxiv.org/abs/1810.04805
Google Dialogflow (previously api.ai) [link]
DouBan DuShu [link]
DouBan DuShu is a Chinese website where users can share their reviews about various kinds of books.
Chinese-Forum-Corpus [link]
Chinese-Forum-Corpus is a corpus of informal Chinese text
CLAMP [link]
Clinical Language Annotation, Modeling, and Processing Toolkit
Bytecup2018 [link]
Bytecup Dataset
Facebook fastText [link]
fastText is a library for efficient learning of word representations and sentence classification.
Open Chinese Convert (開放中文轉換), pure Python [link]
Open Chinese convert (OpenCC) in pure Python.
pkuseg-python [link]
pkuseg is a new Chinese word segmentation toolkit developed by the Language Computing and Machine Learning Group at Peking University.
OpenAI GPT-2 [link]
Code for the paper "Language Models are Unsupervised Multitask Learners"
XLNet [link]
XLNet: Generalized Autoregressive Pretraining for Language Understanding

Machine Learning / Neural Network related:

Deep Learning, an MIT Press book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville [link]
DeepMind trfl [link]
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms [link]
Neural Networks and Deep Learning, a free online book [link]
TensorSpace.js [link]
A neural network 3D visualization framework built with TensorFlow.js, Three.js, and Tween.js
Deep Reinforcement Learning for Keras [link]
keras-rl implements some state-of-the-art deep reinforcement learning algorithms in Python and seamlessly integrates with the deep learning library Keras.
Implementation of the attention mechanism from "Attention Is All You Need" [link]
pytorch-beginner [link]
Toy code for PyTorch beginners

Research paper:

One Model To Learn Them All [link]
#2017 #MultiTasking
Attention Is All You Need [link]
#2017 #NeuralNetwork #Attention
Low-Resource Speech-to-Text Translation [link]
speech-to-speech translation [link]
Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents [link]
Multimodal Machine Translation with Reinforcement Learning [link]
Exploiting Linguistic Resources for Neural Machine Translation Using Multi-task Learning [link]
Self-managed Speech Therapy [link]
#2018 #SpeechTherapy
Voice-based determination of physical and emotional characteristics of users [link]
Systems, methods, and computer-readable media are disclosed for voice-based determination of physical and emotional characteristics of users. Example methods may include determining first voice data, wherein the first voice data is generated by a user, determining a first real-time user status of the user using the first voice data, generating a first data tag indicative of the first real-time user status, determining first audio content for presentation at a speaker device using the first data tag and the first voice data, and causing presentation of the first audio content via a speaker of the speaker device.
Hong Kong Adult Spoken Cantonese Corpus (香港成人粵語口語語料庫) [link]
A new resource for Cantonese research: the Corpus of Mid-20th Century Hong Kong Cantonese (香港二十世紀中期粵語語料庫) [link]
Mentions: (1) the Hong Kong Cantonese Child Language Corpus, CANCORP (Lee and Wong 1998); (2) the Hong Kong Bilingual Child Language Corpus (Yip and Matthews 2007); (3) the Hong Kong University Cantonese Corpus (Wong 2006); (4) the Hong Kong Cantonese Adult Corpus (Leung and Law 2001).
Large-Scale Study of Curiosity-Driven Learning [link]
Curiosity-Driven RL
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient [link]
#2017
Recent Trends in Deep Learning Based Natural Language Processing [link]
#2018
Sequence-to-Sequence ASR Optimization via Reinforcement Learning [link]
Listen, Attend and Spell [link]
Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters.
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models [link]
Attention-based encoder-decoder architectures such as Listen, Attend, and Spell (LAS) subsume the acoustic, pronunciation, and language model components of a traditional automatic speech recognition (ASR) system into a single neural network. On a 12,500-hour voice search task, the changes proposed in the paper improve the WER from 9.2% to 5.6%, while the best conventional system achieves 6.7%; on a dictation task the model achieves a WER of 4.1% compared to 5% for the conventional system.
PyText [link]
Open-sourcing PyText for faster NLP development
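Several of the speech recognition papers listed in this section report results as word error rate (WER). As a rough illustration, WER is commonly computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below assumes whitespace tokenization and no text normalization; the function names are made up for this example and do not come from any of the listed toolkits.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance to empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate with naive whitespace tokenization (an assumption)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative error reduction, e.g. going from 9.2% to 5.6% WER."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    print(relative_reduction(9.2, 5.6))
```

Under this definition, the move from 9.2% to 5.6% WER quoted for the sequence-to-sequence model is a relative reduction of about 39%. Real evaluations usually normalize case and punctuation before scoring, which this sketch leaves out.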

Courses

Oxford Deep NLP 2017 course [link]
DeepMind UCL Deep RL [link]
Step by step - learn Computer Science and Artificial Intelligence [link]
AI For Everyone - Coursera [link]
Neural Networks and Deep Learning [link]
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you many of the core concepts behind neural networks and deep learning.
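Introduction to Digital Speech Processing (數位語音處理概論), Prof. Lin-shan Lee [link]
This course is designed for undergraduates. The main prerequisites are mathematical modeling (probability, linear algebra) and programming: every problem is analyzed with a mathematical model and solved in code. Most of the core concepts relate to machine learning…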
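Merlin: Chinese statistical parametric speech synthesis in practice [link]
This article aims to explain in detail how to build a Chinese statistical parametric speech synthesis system on top of the open source Merlin project. The author has not yet achieved Chinese speech synthesis; the article records progress so far and will be updated until it works.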
Reinforcement Learning: An Introduction [pdf-link] [main-link]
Second Edition, in progress - MIT Press, Cambridge, MA, 2017

kennethli319/SNLP-resources
