aelnouby/Readings-Summaries

Resources

Insightful papers

How Does Batch Normalization Help Optimization?: https://arxiv.org/pdf/1805.11604.pdf
Generative models for discovering sparse distributed representations (Hinton 1997) https://royalsocietypublishing.org/doi/pdf/10.1098/rstb.1997.0101
A Theoretical Analysis of Contrastive Unsupervised Representation Learning https://arxiv.org/pdf/1902.09229.pdf
On the Measure of Intelligence https://arxiv.org/pdf/1911.01547.pdf
Second Order Properties of Error Surfaces: Learning Time and Generalization https://papers.nips.cc/paper/314-second-order-properties-of-error-surfaces-learning-time-and-generalization.pdf

Representation Learning

Scaling Learning Algorithms towards AI http://yann.lecun.com/exdb/publis/pdf/bengio-lecun-07.pdf
Learning Deep Architectures for AI https://www.iro.umontreal.ca/~lisa/pointeurs/TR1312.pdf
Representation Learning: A Review and New Perspectives https://arxiv.org/abs/1206.5538
Tutorial on EBMs http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf
Self-Supervision blog post https://lilianweng.github.io/lil-log/2019/11/10/self-supervised-learning.html
Recent Advances in Autoencoder-Based Representation Learning https://arxiv.org/pdf/1812.05069.pdf
Autoencoders blog post https://lilianweng.github.io/lil-log/2018/08/12/from-autoencoder-to-beta-vae.html

Math ideas

Gradient, Divergence, Curl and Related Formulae: http://bolvan.ph.utexas.edu/~vadim/Classes/2018f/diffop.pdf
Vector/Matrix Derivatives and Integrals http://mason.gmu.edu/~jgentle/csi771/13f/matrixcalculus.pdf
Taylor expansion theory: http://pathfinder.scar.utoronto.ca/~dyer/csca57/book_P/node26.html
ICA https://arxiv.org/pdf/1404.2986.pdf
PCA https://arxiv.org/pdf/1404.1100.pdf
CCA https://www.cs.cmu.edu/~tom/10701_sp11/slides/CCA_tutorial.pdf
Statistics Resources https://www.ics.uci.edu/~smyth/courses/cs274/notes.html
RKHS http://mlss.tuebingen.mpg.de/2015/slides/gretton/part_1.pdf
Optimal Transport and Wasserstein Distance http://www.stat.cmu.edu/~larry/=sml/Opt.pdf, Mini Course https://lchizat.github.io/ot2020orsay.html
Integral probability metrics https://arxiv.org/pdf/0901.2698.pdf, https://sci-hub.tw/10.2307/1428011
Computational Optimal Transport https://arxiv.org/pdf/1803.00567.pdf
Notes on Optimal Transport https://michielstock.github.io/OptimalTransport/
Principles of Riemannian Geometry in Neural Networks https://www.youtube.com/watch?v=IPrNIjA4AWE
Linear algebra (2020 vision) https://ocw.mit.edu/resources/res-18-010-a-2020-vision-of-linear-algebra-spring-2020/index.htm

Information Theory

MIT Lecture notes http://people.lids.mit.edu/yp/homepage/data/itlectures_v5.pdf
The information bottleneck method https://arxiv.org/pdf/physics/0004057.pdf
Deep Learning and the Information Bottleneck Principle https://arxiv.org/pdf/1503.02406.pdf
Mutual Information Neural Estimation https://arxiv.org/pdf/1801.04062.pdf
Compression https://www.cs.cmu.edu/~guyb/realworld/compression.pdf
KL vs Reverse-KL https://wiseodd.github.io/techblog/2016/12/21/forward-reverse-kl/
Mutual Information Estimation https://arxiv.org/pdf/cond-mat/0305641.pdf
Visual Information Theory http://colah.github.io/posts/2015-09-Visual-Information/ , https://www.blackhc.net/blog/2019/better-intuition-for-information-theory/

Optimization

Steepest descent and Natural Gradients https://ipvs.informatik.uni-stuttgart.de/mlr/marc/notes/gradientDescent.pdf
Topologies and neural networks https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization https://papers.nips.cc/paper/5486-identifying-and-attacking-the-saddle-point-problem-in-high-dimensional-non-convex-optimization.pdf
KFAC https://arxiv.org/pdf/1503.05671.pdf
Stein Variational Gradient Descent (SVGD) http://www.cs.utexas.edu/~lqiang/PDF/svgd_aabi2016.pdf, https://arxiv.org/abs/1608.04471
Conjugate Gradient method https://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf

Machine Learning

Generalization Bounds https://mostafa-samir.github.io/ml-theory-pt2/
SVM http://cs229.stanford.edu/notes/cs229-notes3.pdf
Stein Methods https://www.cs.dartmouth.edu/~qliu/PDF/steinslides16.pdf
Gaussian Processes https://distill.pub/2019/visual-exploration-gaussian-processes/, http://www.gaussianprocess.org/gpml/chapters/RW2.pdf, Code: https://github.com/cornellius-gp/gpytorch/blob/master/examples/01_Simple_GP_Regression/Simple_GP_Regression.ipynb
GP lecture UBC https://www.youtube.com/watch?v=4vGiHC35j9s
Neural Processes https://kasparmartens.rbind.io/post/np/ , starter code https://github.com/deepmind/neural-processes
List of Michael Jordan Tutorials https://people.eecs.berkeley.edu/~jordan/tutorials.html
MMD http://www.jmlr.org/papers/volume13/gretton12a/gretton12a.pdf
Neural Tangent Kernel https://rajatvd.github.io/NTK/
Neural ODEs https://arxiv.org/pdf/1806.07366.pdf, https://blog.acolyer.org/2019/01/09/neural-ordinary-differential-equations/
Information Bottleneck blog post https://lilianweng.github.io/lil-log/2017/09/28/anatomize-deep-learning-with-information-theory.html#references
Advances in Variational Inference https://arxiv.org/pdf/1711.05597.pdf
Graph Conv Neural Nets Blog post https://tkipf.github.io/graph-convolutional-networks/
Geometric Deep Learning https://arxiv.org/pdf/1611.08097.pdf
Variational Inference Tutorial by Shakir Mohamed https://www.shakirm.com/papers/VITutorial.pdf
VI NIPS talk https://www.youtube.com/watch?v=ogdv_6dbvVQ
Gradient Based MCMC http://www.cs.toronto.edu/~jessebett/CSC412/content/week8/grad_mcmc.pdf
Yee Whye Teh Course (SC4/SM8 Advanced Topics in Statistical Machine Learning) https://github.com/ywteh/advml2020
MRFs/CRFs https://ermongroup.github.io/cs228-notes/representation/undirected/
PGM course notes https://ermongroup.github.io/cs228-notes/

MetaLearning

Blog post https://www.borealisai.com/en/blog/tutorial-2-few-shot-learning-and-meta-learning-i/ , https://www.borealisai.com/en/blog/tutorial-3-few-shot-learning-and-meta-learning-ii/

Computer Vision

Optical Flow: https://blog.nanonets.com/optical-flow/, https://reader.elsevier.com/reader/sd/pii/S0923596518302479?token=8E5DDBE77C9294FB10D4B64081DA1F40947D46C1331F6AEE20C052A2587D5ABC770E3663B8632E7122D2EF1CF4595401
Spectral Clustering (Graph Cut Segmentation) https://towardsdatascience.com/spectral-clustering-aba2640c0d5b

Reinforcement Learning

Intro to RL https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html#deep-q-network
Policy gradient algorithms https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html
Spinning-up OpenAI https://spinningup.openai.com/en/latest/user/introduction.html

Talks

NIPS 2016 Workshop on Adversarial Training - Yann LeCun - Energy Based Adversarial Training https://www.youtube.com/watch?v=88nKI-qqWEo&list=PL80I41oVxglK--is17UhoHVosOLFEJzKQ&index=17&t=0s
AAAI Turing award winners talks https://www.youtube.com/watch?v=UX8OubxsY8w
Hinton's "What is wrong with conv nets?" talk https://www.youtube.com/watch?v=rTawFwUvnLE&feature=emb_title

Practical

Mixed precision (Apex) https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9998-automatic-mixed-precision-in-pytorch.pdf
Autograd lecture http://videolectures.net/deeplearning2017_johnson_automatic_differentiation/
Transformer Family https://lilianweng.github.io/lil-log/2020/04/07/the-transformer-family.html
CUDA resources (University Courses links) https://developer.nvidia.com/educators/existing-courses , CUDA Crash Course https://www.youtube.com/playlist?list=PLxNPSjHT5qvtYRVdNN1yDcdSl39uHV_sU
Parallel Computing Architecture and Programming (CMU Course) https://scs.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=66c4b4cc-5dbd-425c-87ed-5d0d217c20b3

Technical Writing

General

Student's Guide https://github.com/lintool/guide