-
How Does Batch Normalization Help Optimization?: https://arxiv.org/pdf/1805.11604.pdf
-
Generative models for discovering sparse distributed representations (Hinton 1997) https://royalsocietypublishing.org/doi/pdf/10.1098/rstb.1997.0101
-
A Theoretical Analysis of Contrastive Unsupervised Representation Learning https://arxiv.org/pdf/1902.09229.pdf
-
On the Measure of Intelligence https://arxiv.org/pdf/1911.01547.pdf
-
Second Order Properties of Error Surfaces: Learning Time and Generalization https://papers.nips.cc/paper/314-second-order-properties-of-error-surfaces-learning-time-and-generalization.pdf
-
Scaling Learning Algorithms towards AI http://yann.lecun.com/exdb/publis/pdf/bengio-lecun-07.pdf
-
Learning Deep Architectures for AI, https://www.iro.umontreal.ca/~lisa/pointeurs/TR1312.pdf
-
Representation Learning: A Review and New Perspectives https://arxiv.org/abs/1206.5538
-
Tutorial on EBMs http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf
-
Self-Supervision blog post https://lilianweng.github.io/lil-log/2019/11/10/self-supervised-learning.html
-
Recent Advances in Autoencoder-Based Representation Learning https://arxiv.org/pdf/1812.05069.pdf
-
Autoencoders blog post https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html
-
Gradient, Divergence, Curl and Related Formulae: http://bolvan.ph.utexas.edu/~vadim/Classes/2018f/diffop.pdf
-
Vector/Matrix Derivatives and Integrals http://mason.gmu.edu/~jgentle/csi771/13f/matrixcalculus.pdf
-
Taylor expansion theory: http://pathfinder.scar.utoronto.ca/~dyer/csca57/book_P/node26.html
-
CCA https://www.cs.cmu.edu/~tom/10701_sp11/slides/CCA_tutorial.pdf
-
Statistics Resources https://www.ics.uci.edu/~smyth/courses/cs274/notes.html
-
RKHS http://mlss.tuebingen.mpg.de/2015/slides/gretton/part_1.pdf
-
Optimal Transport and Wasserstein Distance http://www.stat.cmu.edu/~larry/=sml/Opt.pdf, Mini Course https://lchizat.github.io/ot2020orsay.html
-
Integral probability metrics https://arxiv.org/pdf/0901.2698.pdf, https://sci-hub.tw/10.2307/1428011
-
Computational Optimal Transport https://arxiv.org/pdf/1803.00567.pdf
-
Notes on Optimal Transport https://michielstock.github.io/OptimalTransport/
-
Principles of Riemannian Geometry in Neural Networks https://www.youtube.com/watch?v=IPrNIjA4AWE
-
Linear algebra (2020 vision) https://ocw.mit.edu/resources/res-18-010-a-2020-vision-of-linear-algebra-spring-2020/index.htm
-
MIT lecture notes on information theory http://people.lids.mit.edu/yp/homepage/data/itlectures_v5.pdf
-
The information bottleneck method https://arxiv.org/pdf/physics/0004057.pdf
-
Deep Learning and the Information Bottleneck Principle https://arxiv.org/pdf/1503.02406.pdf
-
Mutual Information Neural Estimation https://arxiv.org/pdf/1801.04062.pdf
-
Compression https://www.cs.cmu.edu/~guyb/realworld/compression.pdf
-
KL vs Reverse-KL https://wiseodd.github.io/techblog/2016/12/21/forward-reverse-kl/
-
Mutual Information Estimation https://arxiv.org/pdf/cond-mat/0305641.pdf
-
Visual Information Theory http://colah.github.io/posts/2015-09-Visual-Information/ , https://www.blackhc.net/blog/2019/better-intuition-for-information-theory/
-
Steepest descent and Natural Gradients https://ipvs.informatik.uni-stuttgart.de/mlr/marc/notes/gradientDescent.pdf
-
Topologies and neural networks https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
-
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization https://papers.nips.cc/paper/5486-identifying-and-attacking-the-saddle-point-problem-in-high-dimensional-non-convex-optimization.pdf
-
Stein Variational Gradient Descent (SVGD) http://www.cs.utexas.edu/~lqiang/PDF/svgd_aabi2016.pdf, https://arxiv.org/abs/1608.04471
-
Conjugate Gradient method https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
-
Generalization Bounds https://mostafa-samir.github.io/ml-theory-pt2/
-
Stein Methods https://www.cs.dartmouth.edu/~qliu/PDF/steinslides16.pdf
-
Gaussian Processes https://distill.pub/2019/visual-exploration-gaussian-processes/, http://www.gaussianprocess.org/gpml/chapters/RW2.pdf, Code: https://github.com/cornellius-gp/gpytorch/blob/master/examples/01_Simple_GP_Regression/Simple_GP_Regression.ipynb
-
GP lecture UBC https://www.youtube.com/watch?v=4vGiHC35j9s
-
Neural Processes https://kasparmartens.rbind.io/post/np/ , starter code https://github.com/deepmind/neural-processes
-
List of Michael Jordan Tutorials https://people.eecs.berkeley.edu/~jordan/tutorials.html
-
MMD http://www.jmlr.org/papers/volume13/gretton12a/gretton12a.pdf
-
Neural Tangent Kernel https://rajatvd.github.io/NTK/
-
Neural ODEs https://arxiv.org/pdf/1806.07366.pdf, https://blog.acolyer.org/2019/01/09/neural-ordinary-differential-equations/
-
Information Bottleneck blog post https://lilianweng.github.io/lil-log/2017/09/28/anatomize-deep-learning-with-information-theory.html#references
-
Advances in Variational Inference https://arxiv.org/pdf/1711.05597.pdf
-
Graph Conv Neural Nets Blog post https://tkipf.github.io/graph-convolutional-networks/
-
Geometric Deep Learning https://arxiv.org/pdf/1611.08097.pdf
-
Variational Inference Tutorial by Shakir Mohamed https://www.shakirm.com/papers/VITutorial.pdf
-
VI NIPS talk https://www.youtube.com/watch?v=ogdv_6dbvVQ
-
Gradient Based MCMC http://www.cs.toronto.edu/~jessebett/CSC412/content/week8/grad_mcmc.pdf
-
Yee Whye Teh Course (SC4/SM8 Advanced Topics in Statistical Machine Learning) https://github.com/ywteh/advml2020
-
MRFs/CRFs https://ermongroup.github.io/cs228-notes/representation/undirected/
-
PGM course notes https://ermongroup.github.io/cs228-notes/
- Few-shot learning and meta-learning blog posts https://www.borealisai.com/en/blog/tutorial-2-few-shot-learning-and-meta-learning-i/ , https://www.borealisai.com/en/blog/tutorial-3-few-shot-learning-and-meta-learning-ii/
-
Optical Flow: https://blog.nanonets.com/optical-flow/, https://reader.elsevier.com/reader/sd/pii/S0923596518302479?token=8E5DDBE77C9294FB10D4B64081DA1F40947D46C1331F6AEE20C052A2587D5ABC770E3663B8632E7122D2EF1CF4595401
-
Spectral Clustering (Graph Cut Segmentation) https://towardsdatascience.com/spectral-clustering-aba2640c0d5b
- Intro to RL https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html#deep-q-network
- Policy gradient algorithms https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html
- OpenAI Spinning Up https://spinningup.openai.com/en/latest/user/introduction.html
-
NIPS 2016 Workshop on Adversarial Training - Yann LeCun - Energy Based Adversarial Training https://www.youtube.com/watch?v=88nKI-qqWEo&list=PL80I41oVxglK--is17UhoHVosOLFEJzKQ&index=17&t=0s
-
AAAI Turing award winners talks https://www.youtube.com/watch?v=UX8OubxsY8w
-
Hinton's "What is wrong with conv nets?" talk https://www.youtube.com/watch?v=rTawFwUvnLE&feature=emb_title
-
Mixed precision (Apex) https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9998-automatic-mixed-precision-in-pytorch.pdf
-
Autograd lecture http://videolectures.net/deeplearning2017_johnson_automatic_differentiation/
-
Transformer Family https://lilianweng.github.io/lil-log/2020/04/07/the-transformer-family.html
-
CUDA resources (University Courses links) https://developer.nvidia.com/educators/existing-courses , CUDA Crash Course https://www.youtube.com/playlist?list=PLxNPSjHT5qvtYRVdNN1yDcdSl39uHV_sU
-
Parallel Computing Architecture and Programming (CMU Course) https://scs.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=66c4b4cc-5dbd-425c-87ed-5d0d217c20b3
-
http://web.engr.oregonstate.edu/~tgd/talks/new-in-ml-2019.pdf
-
Reading a paper https://towardsdatascience.com/guide-to-reading-academic-research-papers-c69c21619de6
- Student's Guide https://github.com/lintool/guide