Suggestions for students.
Audio and acoustics students sometimes ask "How do I get started learning machine learning?" Not everyone gets their start in a major research environment.
This page began after @drscotthawley felt sufficiently embarrassed about not having a coherent answer. Until someone creates a "ML for Audio" online course -- update 1/7/20: See Valerio Velardo's "Deep Learning for Audio"! -- this page may prove helpful.
Notes:
- This is a collaborative page. Please suggest additions, re-organizations, edits, updates, etc., either via Issues or Pull Requests. (In addition, @drscotthawley may gladly cede control of this content to whichever student or group wants to Wiki-fy it!)
"Read all the tutorials and papers you can, watch videos of all the talks you can, try out and modify whatever code you can get your hands on, take whatever courses you can find, go to whatever conferences you can. Try to build your own system, and spend all your nights and weekends improving it."
This was the best advice some of us could give, because it was the path we took. Some such stories are shared below. This page is an attempt to offer something more "direct" for newcomers.
Many practitioners took very different interdisciplinary paths, learning from a hodgepodge of information, in order to complement their existing strengths and fill in gaps in their knowledge. Here are some stories.
(For submissions: Either link to elsewhere on the web, or add a file to the repo via PR. Try to make submissions conclude with a section on what you would say to new students.)
- How __[someone]__ got started
- __[a young person]'s__ story
- ...your name(s) here!...Chris Donahue, Christian Steinmetz, Jordi Pons, Keunwoo Choi, Faro, Justin Salamon,...?
Many of us learn about and contribute to news of new developments, papers, conferences, grants, and networking opportunities via Twitter.
- Audio ML Twitter list by Fabian-Robert Stöter (@faroit). <-- Follow these people!
- Justin Salamon: "Anyone working in ML, anyone, should be obliged to curate a dataset before they're allowed to train a single model. The lessons learnt in the process are invaluable, and the dangers of skipping said lessons are manifold (see what I did there?)"
- Machine Learning Glossary - A reference resource for common ML math topics, definitions, concepts, etc.
- Notes on Music Information Retrieval
- Valerio Velardo's "Deep Learning for Audio"
- Andrew Ng's ML Course on Coursera (Good all-around ML course)
- Fast.ai (Can get you up and running fast)
- Rebecca Fiebrink's Machine Learning for Musicians and Artists on Kadenze (No math!)
- Neural Network Programming - Deep Learning with PyTorch. Learn how to code an image predictor neural network in PyTorch. Provides practical NN fundamentals.
- Advanced Digital Signal Processing series taught by Dr.-Ing Gerald Schuller of Fraunhofer IDMT, with videos and accompanying Jupyter notebooks by Renato Profeta
- Foundations of Machine Learning taught by David Rosenberg
(I'm often underwhelmed with audio-specific tutorials, actually. No offense! Feel free to suggest some. Here are a couple on related topics that I've found inspiring)
- Andrew Trask's "Anyone Can Learn To Code an LSTM-RNN in Python"
- Machine Learning & Deep Learning Fundamentals (Good high level intro to ML concepts and how neural networks operate)
(Talks that we found helpful/inspiring, and that are hopefully still relevant)
- Paris Smaragdis at SANE 2015: "NMF? Neural Nets? It’s all the same..."
- Ron Weiss at SANE 2015: "Training neural network acoustic models on waveforms"
- Jordi Pons at DLBCN 2018: "Training neural audio classifiers with few data"
- Sander Dieleman at ISMIR 2019: "Generating Music in the Waveform Domain"
(Let's try to list "representative" or "landmark" papers, not just our latest tweak, unless it includes a really good intro/review section. ;-) )
- Keunwoo Choi et al, "Automatic tagging using deep convolutional neural networks" (ISMIR 2016 Best Paper)
- SampleRNN
- WaveNet
- WaveRNN, i.e. "Efficient Neural Audio Synthesis"
- GANSynth
- Wave-U-Net
(Not sure if this only means "deployed models you can play with in your browser," or if other things should count as demos)
- Chris Donahue's WaveGAN Demo
- Scott Hawley's SignalTrain Demo
- Neil Zeghidour and David Grangier's Wavesplit
- David Samuel, Aditya Ganeshan, and Jason Naradowsky's Meta-TasNet
- awesome-python-scientific-audio: curated list of Python software and packages related to scientific research in audio
- Librosa: great package for various kinds of audio analysis and manipulation
- Audiomentations, data augmentation for audio
- tf.signal: signal processing for TensorFlow
- fastai_audio (and fastai2_audio), audio libraries for the Fast.ai library/MOOC. Fast.ai is primarily for image, text & tabular data processing, but there are efforts to add audio. (Work in progress.)
- Jesse Engel's gist to plot "rainbowgrams"
- Neural Networks and Deep Learning online book. How drscotthawley first started reading.
Python:
- learnpython.org
- Python notebooks for fundamentals of music processing
- Advanced Digital Signal Processing series taught by Dr.-Ing Gerald Schuller of Fraunhofer IDMT, with videos and accompanying Jupyter notebooks by Renato Profeta
- Yuge Shi's "Gaussian Processes, Not Quite for Dummies"
- Gradient Descent
- Principal Component Analysis: "PCA From Scratch" by @drscotthawley
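For readers who have not yet met the gradient descent topic listed above, the core idea fits in a few lines. The following is a minimal sketch and is not taken from any of the linked resources; the function name `gradient_descent`, the learning rate, the step count, and the toy function are all assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a 1-D function by repeatedly stepping against its gradient.

    `grad` is a callable returning the derivative at x. All names and
    default values here are illustrative assumptions.
    """
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # move a small step downhill
    return x


# Toy example: f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
def grad_f(x):
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, x0=0.0)
print(f"estimated minimum at x = {x_min:.4f}")
```

Training a neural network applies the same update to many parameters at once, with the gradient computed by automatic differentiation rather than by hand.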
One finds that many supposed "audio datasets" are really only features or even just metadata! Here are some "raw audio" datasets:
- NSynth Musical Instruments
- GTZAN Genre Collection (Note critique by Bob Sturm)
- Fraunhofer IDMT Guitar/Bass Effects
- Urban Sound Dataset
- FreeSound Annotator (formerly FreeSound Datasets)
- Birdvox-Full-Night
- SignalTrain LA2A
- Kaggle Heartbeat Sounds
- Search for other audio datasets at Kaggle (list)
- A collated list of MIR datasets can be found here, which is the source for audiocontentanalysis.org, but only some are raw audio
- Another list of "audio datasets" by Christopher Dossman
- ...your dataset here...
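To make the distinction between "raw audio" and "features" concrete, here is a small sketch using only the Python standard library. It synthesizes a test tone instead of loading any of the datasets above; the sample rate, the helper names (`sine_wave`, `rms`, `zero_crossing_rate`), and the choice of features are assumptions for illustration only.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this illustration)


def sine_wave(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return 'raw audio': one amplitude value per sample."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(samples):
    """Root-mean-square level: a single-number summary of loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(samples) - 1)


raw = sine_wave(440.0, 1.0)  # one second of a 440 Hz tone
features = {"rms": rms(raw), "zcr": zero_crossing_rate(raw)}
print(len(raw), "raw samples vs", len(features), "feature values")
```

A features-only dataset would ship just the small dictionary. The waveform cannot be recovered from it, which matters if you want to train models that operate directly on audio.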
(or, "Where should I apply for grad school?")
- QMUL (London)
- UPF (Barcelona)
- CCRMA (Stanford, California)
- IRCAM (Paris)
- NYU (New York)
("Where can I get an internship/job?")
- Google Magenta
- Google Perception (speech publications)
- Adobe
- Spotify
- Increasingly, everywhere. ;-)
("Which conference(s) should I go to?" -- asked by student on the day this doc began)
Long list of Music Technology specific conferences: https://conferences.smcnetwork.org/, which is referenced from https://github.com/MTG/conferences
- Audio Engineering Society (AES)
- ASA
- Digital Audio Effects (DAFx)
- ICASSP
- ISMIR
- SANE
- Web Audio Conference (WAC)
- SMC
- LVA/ICA
- Audio Mostly
- WIMP
- DCASE
- CSMC
- MuMe
- ICMC
- CMMR
- IBAC
- MLSP
- Interspeech
- FMA
- ICLR
- ICML
- NeurIPS
- IJCNN
("Where can I get published?")
In machine learning specifically, conference papers tend to be peer-reviewed and to "count" as journal publications.
Some are yearly, some may be defunct but still interesting.
- MIREX
- SiSEC (Signal Separation Evaluation Campaign)
- Kaggle Heartbeat Sounds
If you want your name listed here, you may. ;-)