AML - Advanced Machine Learning | MIC - Medical Image Computing | Prog - Programming
NB: the dates below indicate when I studied the material, not when it was published.
Main focus: preparing for the CVPR (and IPMI) deadlines.
- 一堂課讓你認識肺癌(Basic Concepts of Lung Cancer: Diagnosis and Treatment)(Coursera)
- Computational Neuroscience 计算神经科学 (Xuetangx)
- week3 - Signal propagation in neurons
- week4 - Neural Network simulators
- 3D vision is still my main reading focus
- FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation (CVPR2018)
- PU-Net: Point Cloud Upsampling Network (CVPR2018)
- Pointwise Convolutional Neural Networks (CVPR2018)
- 3D Graph Neural Networks for RGBD Semantic Segmentation (ICCV2017)
- Recurrent Slice Networks for 3D Segmentation of Point Clouds (CVPR2018)
- Learning Representations and Generative Models for 3D Point Clouds (ICML2018): apart from the GAN for point clouds, the metrics for measuring the distance between two point clouds are also useful (together with the code). A paper solid enough to win acceptance (8 pages of supplementary experiments).
- PointGrid: A Deep Network for 3D Shape Understanding (CVPR2018): a simple yet effective and efficient solution for point clouds. I do like this paper (though it is not very well written). However, it seems like a purified, methodology-only version of VoxelNet (also CVPR2018). No cross-citation between these two papers.
- VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection (CVPR2018): a "PointGrid" (also CVPR2018) on KITTI, better written. No cross-citation between these two papers.
- PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention (OpenReview, ICLR2019 under review): not very clearly written. Some evaluations lack better metrics, and some baselines are missing. I'd guess a 60% probability of acceptance. Maybe written by an intern of the DGCNN authors.
- Group Equivariance
- Attention / Graph
- Hyperbolic Attention Networks (OpenReview, arXiv, ICLR2019 under review): a paper I do not fully understand due to my missing background in hyperbolic geometry. A hyperbolic space embedding seems appealing; however, the paper does not explain well why it suits attention in particular rather than general neural network representations. Worth more exploration.
- Relational Graph Attention Networks (OpenReview, ICLR2019 under review)
- Hierarchical Graph Representation Learning with Differentiable Pooling (arXiv)
- Learning Visual Question Answering by Bootstrapping Hard Attention (ECCV2018)
- A New Angle on L2 Regularization (blog)
- Efficient Annotation of Segmentation Datasets with PolygonRNN++ (CVPR2018): very interesting application of existing algorithms (segmentation + RL + GNN), but some details are missing in this conference paper (maybe better in its journal version?). Many engineering details.
- Taskonomy: Disentangling Task Transfer Learning (CVPR2018 best): hard to understand.
- A Low Power, Fully Event-Based Gesture Recognition System (CVPR2017)
- Hand PointNet: 3D Hand Pose Estimation using Point Sets (CVPR2018): not my area actually. A PointNet application for hand pose regression.
- Visible Machine Learning for Biomedicine (Cell Commentary)
Our paper 3D Deep Learning from CT Scans Predicts Tumor Invasiveness of Subcentimeter Pulmonary Adenocarcinomas has been accepted by Cancer Research (DOI: 10.1158/0008-5472.CAN-18-0696).
- 3D Vision
- Spherical CNNs (ICLR2018 best): this paper is too complicated for me to understand :(
- Spherical convolutions and their application in molecular modelling (NIPS2017): difficult reading for a non-native English speaker. A good illustration of the cubed-sphere grid (link).
- Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models (ICCV2017)
- SO-Net: Self-Organizing Network for Point Cloud Analysis (CVPR2018): basically a PointNet++ with "SOM" clustering.
- SPLATNet: Sparse Lattice Networks for Point Cloud Processing (CVPR2018 oral): it uses differentiable projection onto regular grids (permutohedral lattices), together with sparse convolutions for efficiency. However, I have not understood its advantage over set-based networks (e.g., PointNet++), partly because I have not understood the advantage of the bilateral convolution layer (BCL).
- Neural 3D Mesh Renderer (CVPR2018) (project page)
- NB: very fancy, very useful, but I have not fully understood this graphics-heavy work; I will revisit it. tl;dr: 3 kinds of parameters can be optimized: `vertices` [n_vertices, 3 (XYZ)], `textures` [n_faces, texture_size, texture_size, texture_size, 3 (RGB)], and `camera_position` [3]. Besides, `faces` [n_faces, 3 (triangle)] indicates the links between vertices (3 vertices make a face), which lets the mesh be processed like a graph (it is a graph, indeed; see the shape sketch below). `faces` does not seem differentiable. The paper uses a straight-through estimator to provide the gradients (for the vertices only, I'm not sure at present; the others should have gradients naturally).
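A tiny NumPy sketch of the shapes described in the note (sizes and values are mine, purely illustrative; this is not the renderer's actual API):

```python
import numpy as np

n_vertices, n_faces, texture_size = 4, 2, 2  # illustrative sizes

# The three optimizable parameter groups described above:
vertices = np.random.randn(n_vertices, 3)                 # [n_vertices, 3 (XYZ)]
textures = np.random.rand(n_faces, texture_size,
                          texture_size, texture_size, 3)  # [..., 3 (RGB)]
camera_position = np.array([0.0, 0.0, -2.0])              # [3]

# Non-differentiable topology: each face indexes 3 vertices.
faces = np.array([[0, 1, 2], [0, 2, 3]])                  # [n_faces, 3 (triangle)]

# `faces` induces a graph over vertices: every triangle contributes 3 edges.
edges = {tuple(sorted((int(f[i]), int(f[(i + 1) % 3]))))
         for f in faces for i in range(3)}
print(sorted(edges))  # [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```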
- Generating 3D Adversarial Point Clouds (arXiv): poorly written.
- Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling (arXiv): not very insightful. Too much language spent on a trivial modification, with limited empirical improvements. However, learning visible (point) kernels is a good idea for interpretability in deep point cloud learning (which deserves further exploration).
- Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images (ECCV2018): though appealing, it needs 3D supervision, which is very different from N3MR.
- Self-Attention Generative Adversarial Networks (arXiv): simple yet effective.
- Spectral Normalization Explained (paper: ICLR2018) (blog): greatly explained. SN replaces $W$ with $W/\sigma(W)$, where $\sigma(W)$ denotes the largest singular value of $W$. It then provides a simple way (power iteration) to lower the computational cost (see the sketch below).
- Generative adversarial interpolative autoencoding: adversarial training on latent space interpolations encourage convex latent distributions (arXiv): an interesting paper, though intuitive.
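A minimal NumPy sketch of the power-iteration estimate behind SN (variable names are mine; real implementations persist `u` across training steps so a single iteration suffices):

```python
import numpy as np

def spectral_normalize(W, u, n_iters=1):
    """Estimate sigma(W), the largest singular value, by power iteration,
    then return W / sigma(W) and the updated singular-vector estimate."""
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # u^T W v approximates the spectral norm
    return W / sigma, u

W = np.random.randn(64, 32)
W_sn, u = spectral_normalize(W, np.random.randn(64), n_iters=5)
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # ~1.0 after normalization
```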
- Analyzing Inverse Problems with Invertible Neural Networks (arXiv): poorly written, hard to read.
- Image Transformer (ICML2018): self-attention application to autoregressive models.
- Self-Attention with Relative Position Representations (NAACL2018): a short paper, but provides good insight: instead of using pre-defined absolute postion-encoding, it uses learnable relative position embeddings.
- Universal Transformers (arXiv): just a simple modification: add recurrence to the Transformer (i.e., share weights across multiple layers), plus a trivial ACT (just like my setting in the code ...)
- A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study (LANCET Oncology): an excellent angle on the use of radiomics; though the methodology is simple, the study is very meaningful and promising.
- Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples (ICML2018 best paper)
- NB: a brilliant conference paper that reads like a journal paper (a research article, a review and a comment paper at once). Well-written, comprehensive, well-performing, and very insightful; all 8 pages are worth reading. However, a journal version could be even better (if it ever exists), since some of the proposed techniques do not seem well matched in the Case Study section, e.g., Reparameterization for solving Vanishing & Exploding Gradients does not appear there; instead, that case was handled by BPDA (see the sketch below). Besides, I don't understand why LID appears in the "Gradient Shattering" section, let alone that LID was not circumvented by the 3 main attack techniques proposed. Overall, the paper develops a good story about 3 shields (Shattered Gradients, Stochastic Gradients and Exploding & Vanishing Gradients) and 3 swords (Backward Pass Differentiable Approximation BPDA, Expectation over Transformation EOT and Reparameterization), though one should be very careful that the shields & swords are not the whole of the paper. The comments in the discussion are also valuable for future studies.
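A minimal PyTorch sketch of the BPDA idea: run the non-differentiable defense on the forward pass, and approximate it (here by the identity) on the backward pass. The `round()`-based quantization "defense" is a made-up stand-in:

```python
import torch

class BPDAIdentity(torch.autograd.Function):
    """Forward: a non-differentiable transform g(x).
    Backward: pretend g(x) = x and pass the gradient straight through."""

    @staticmethod
    def forward(ctx, x):
        return (x * 255).round() / 255  # hypothetical quantization "defense"

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # identity approximation of dg/dx

x = torch.rand(1, 3, 8, 8, requires_grad=True)
BPDAIdentity.apply(x).sum().backward()
print(x.grad.abs().sum())  # non-zero: gradients flow despite round()
```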
- Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning (ICLR2016 -> T-PAMI):
- NB: good empirical results, elegant solutions. 3 hyper-parameters: `epsilon` (default: 8.0, norm length for (virtual) adversarial training), `num_power_iterations` or `K` (default: 1, the number of power iterations) and `xi` (default: 1e-6, a small constant for finite differences). A clean PyTorch implementation exists here, but it seems to ship wrong default hyper-parameters. For the VAT loss alone, it needs `K+2` forwards and `K+1` backwards (my calculation differs slightly from the paper's); see the sketch below.
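A minimal PyTorch sketch of the VAT loss under the hyper-parameters above (my own code, not the reference implementation). Counting gives the `K+2` forwards / `K+1` backwards from the note, the last backward being the caller's backprop through the returned loss:

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, eps=8.0, xi=1e-6, K=1):
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)           # clean prediction (1 forward)

    d = torch.randn_like(x)                      # random perturbation direction
    for _ in range(K):                           # power iteration: K forwards + K backwards
        d = xi * F.normalize(d.view(x.size(0), -1), dim=1).view_as(x)
        d.requires_grad_()
        kl = F.kl_div(F.log_softmax(model(x + d), dim=1), p, reduction='batchmean')
        d = torch.autograd.grad(kl, d)[0].detach()

    r_adv = eps * F.normalize(d.view(x.size(0), -1), dim=1).view_as(x)
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p,
                    reduction='batchmean')       # 1 more forward
```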
My Bayesian month :)
- Variational Inference and Discrete Distributions
- A Tutorial on Variational Bayesian Inference (pdf)
- NB: a very clear description of mean field iteration, though the VMP framework part seems non-trivial. Mean field approximates the full joint distribution with a product of independent distributions (in symbols below).
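In symbols (a standard statement of mean field, in my notation rather than the tutorial's): restrict $q$ to a fully factorized family and coordinate-ascend the ELBO,

$$q(\mathbf{z}) = \prod_i q_i(z_i), \qquad \log q_i^*(z_i) = \mathbb{E}_{q_{-i}}\big[\log p(\mathbf{x}, \mathbf{z})\big] + \mathrm{const}.$$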
- Tutorial on Variational Autoencoders (arXiv)
- Categorical Reparameterization with Gumbel-Softmax (ICLR2017)
- Learning Latent Permutations with Gumbel-Sinkhorn Networks (ICLR2018)
- The Humble Gumbel Distribution (blog)
- Generative Flow
- i-RevNet: Deep Invertible Networks (ICLR2018): a very interesting paper with numerous potential applications, but not that novel at this point, e.g., highly related to RealNVP and NICE.
- Glow: Generative Flow with Invertible 1x1 Convolutions (arXiv): introduces an invertible 1x1 convolution trick (built on RealNVP; Glow : RealNVP :: DCGAN : GAN).
- Density estimation using Real NVP (ICLR2017): a fantastic paper (a minimal coupling-layer sketch follows below).
- Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models (AAAI2018)
- Normalizing Flows Tutorial (Part 1) (Part 2)
- Improving Variational Inference with Inverse Autoregressive Flow (arXiv, plus a good blog)
- NICE: Non-linear Independent Components Estimation (ICLR2015)
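A minimal PyTorch sketch of the affine coupling layer shared by NICE / RealNVP / Glow (toy shapes and names are mine; NICE's additive coupling drops the scale term):

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """y1 = x1; y2 = x2 * exp(s(x1)) + t(x1). Trivially invertible,
    with log|det J| = sum(s(x1))."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))  # outputs [s, t]

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        s, t = self.net(x1).chunk(2, dim=1)
        return torch.cat([x1, x2 * torch.exp(s) + t], dim=1), s.sum(dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        s, t = self.net(y1).chunk(2, dim=1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=1)

layer = AffineCoupling(dim=4)
x = torch.randn(8, 4)
y, log_det = layer(x)
print(torch.allclose(layer.inverse(y), x, atol=1e-5))  # True
```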
- The Building Blocks of Interpretability (Distill)
- Sampling Generative Networks (NIPS2016): a good paper with bad writing.
- Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer (arXiv)
- Instance Noise: A trick for stabilising GAN training (blog)
A busy month reproducing DeepLabv3+ and writing the NIPS rebuttal.
- Learning Deep Matrix Representations (arXiv)
- Graph Memory Networks for Molecular Activity Prediction (arXiv)
- Weighted Transformer Network for Machine Translation (OpenReview)
- Introduction to Biomedical Imaging 生物医学成像学导论 (MIC)
- Module 2 CT
- Module 3 Ultrasounds
- Module 4 MRI
- Module 5 PET
- Computational Neuroscience 计算神经科学 (MIC)
- week1 - Basic neuronal models
- week2 - Synapse and channel dynamics
- A mixed-scale dense convolutional neural network for image analysis (PNAS)
- Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation (MICCAI2017)
- In Silico Labeling: Predicting Fluorescent Labels in Unlabeled Images (Cell)
- A Tutorial on Variational Bayesian Inference (pdf)
- Tutorial on Variational Autoencoders (arXiv)
- World Models (website) (arXiv)
- The Building Blocks of Interpretability (Distill)
- Using Artificial Intelligence to Augment Human Intelligence (Distill)
- Memory-Efficient Implementation of DenseNets (arXiv)
- A Comparison of MCC and CEN Error Measures in Multi-Class Prediction (PLOS ONE)
- Fully Convolutional Networks for Semantic Segmentation (CVPR2015)
- Pyramid Scene Parsing Network (CVPR2017)
- Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (a.k.a DeepLab v3+) (arXiv)
- Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks (a.k.a ODIN) (ICLR2018)
- Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery (IPMI2017)
- Anomaly Detection using One-Class Neural Networks (KDD2018)
- Thoughts on "mixup: Data-Dependent Data Augmentation" (blog)
- 3D Vision & Point clouds
- Few Shot
Still a very busy month for preparing papers.
- EGFR
- Somatic mutations drive distinct imaging phenotypes in lung cancer (Cancer Research)
- Defining a Radiomic Response Phenotype: A Pilot Study using targeted therapy in NSCLC (Scientific Reports)
- Non–Small Cell Lung Cancer Radiogenomics Map Identifies Relationships between Molecular and Imaging Phenotypes with Prognostic Implications (Radiology)
- Radiomic Features Are Associated With EGFR Mutation Status in Lung Adenocarcinomas (Clinical Lung Cancer)
- World Models (website) (arXiv)
- The Building Blocks of Interpretability (Distill)
- Using Artificial Intelligence to Augment Human Intelligence (Distill)
- Dynamic Graph CNN for Learning on Point Clouds (arXiv)
A very busy month for preparing papers.
- Interpretability
- PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (NIPS)
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (CVPR)
- MIC
- Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs (JAMA)
- Dermatologist-level classification of skin cancer with deep neural networks (Nature)
- Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning (Cell)
- Scalable and accurate deep learning for electronic health records (arXiv)
- CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning (arXiv)
- Automated Pulmonary Nodule Detection via 3D ConvNets with Online Sample Filtering and Hybrid-Loss Residual Learning (arXiv)
- DeepLung: 3D Deep Convolutional Nets for Automated Pulmonary Nodule Detection and Classification (arXiv)
- A Survey on Deep Learning in Medical Image Analysis (arXiv)
- Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach (Nature Comm)
- Computational Radiomics System to Decode the Radiographic Phenotype (Cancer Research)
- Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation (arXiv)
- MIL
- Graph Attention Networks (arXiv)
- Deep learning of feature representation with multiple instance learning for medical image analysis (ICASSP): bad writing, trivial idea, not worth reading.
- Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification (arXiv) (code)
- Multi-Instance Deep Learning: Discover Discriminative Local Anatomies for Bodypart Recognition: trivial writing, same idea as the DSB2017 THU solution.
- Learning from Experts: Developing Transferable Deep Features for Patient-Level Lung Cancer Prediction (MICCAI): very, very bad writing; hard to find the details of the model / training / data.
- Attention-based Deep Multiple Instance Learning (ICML2018!)
- Attention Solves Your TSP (arXiv)
- Multiple-Instance Learning for Medical Image and Video Analysis (IEEE)
- Revisiting Multiple Instance Neural Networks (arXiv)
- An introduction to ROC analysis (ScienceDirect)
- Adaptive Computation Time for Recurrent Neural Networks (arXiv)
- Spatially Adaptive Computation Time for Residual Networks (arXiv)
- A mixed-scale dense convolutional neural network for image analysis (PNAS)
- mixup: Beyond Empirical Risk Minimization (arXiv)
- Hyper Networks (blog)
- Set / Points
- Interpretability
- Learning Deep Features for Discriminative Localization (a.k.a CAM) (arXiv)
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (arXiv)
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (arXiv)
- A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks (arXiv)
- Fraternal Dropout (arXiv)
- A Tutorial on Variational Bayesian Inference (pdf)
- Tutorial on Variational Autoencoders (arXiv)
- ADLxMLDS (AML)
- RL (Deep RL + Deep RL2 + Imitation Learning)
- 5 Attention
- 6 Special Networks
- 7 Tips
- 10 GAN
- 11 GAN for Seq
- 12 More GAN
- CS231n 2017 (AML)
- Atari Game Playing (AML)
- Fundamentals of Medical Imaging (MIC)
- Chapter 1
- Chapter 2 X-rays
- Chapter 3 CT
- Introduction to Biomedical Imaging 生物医学成像学导论 (MIC)
- Module 2 CT
- lunglab-keras (MIC)
- Convolutional Invasion and Expansion Networks for Tumor Growth Prediction (IEEE): a very simple algorithm, with even some bad design... but good biomedical explanation and good feature engineering.
- Runtime Neural Pruning (NIPS)
- Attention Is All You Need (arXiv):
- NB: very impressive work. Seems to be inspired by Conv Seq2Seq, but more general. 3 key points (see the sketch below):
- variable-length inputs can also be processed by "attention": `softmax(K.T.dot(Q)).dot(V)` => fixed size `d`
- seq2seq is indeed `encoder output` + `decoder scoring per step`! It can be implemented in parallel with masking (an idea that seems to come from Conv Seq2Seq?)
- RNN / Conv architectures can still be used, especially in the decoder
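A minimal NumPy sketch of the first point (the scaled dot-product form from the paper; the `mask` argument illustrates how the parallel decoder implementation works):

```python
import numpy as np

def attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d)) V: any number of key/value pairs is reduced
    to one fixed-size output per query."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # [n_queries, n_keys]
    if mask is not None:                             # e.g., causal mask in the decoder
        scores = np.where(mask, scores, -1e9)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # stable softmax over keys
    return w @ V                                     # [n_queries, d_v]

Q = np.random.randn(5, 16)        # 5 queries
K = np.random.randn(9, 16)        # 9 keys: variable-length input...
V = np.random.randn(9, 32)
print(attention(Q, K, V).shape)   # (5, 32): ...fixed-size output
```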
- One Model To Learn Them All (arXiv): Xception + Transformer + Sparse MoE in one network. The title oversells it.
- Convolutional Sequence to Sequence Learning (arXiv): the Transformer paper seems to have absorbed all of its goodness...
- Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network (arXiv): the DSB2017 1st-place paper.
- On Bayesian Deep Learning and Deep Bayesian Learning (YouTube)
- Pointer Networks (arXiv)
- Dive into Deep Learning (动手学深度学习), Lecture 13: Forward Propagation, Backpropagation and Backpropagation Through Time (YouTube)
- Dive into Deep Learning (动手学深度学习), Lecture 16: Word Vectors (word2vec) (YouTube)
- CS231n 2017 (AML)
- Fundamentals of Medical Imaging (MIC)
- Chapter 1
- Chapter 2 X-rays
- Introduction to Biomedical Imaging 生物医学成像学导论 (MIC)
- Module 1 X-rays
- An Overview of Multi-Task Learning in Deep Neural Networks (blog): hard to call it a very good review, but it mentions a lot. "Recent approaches have thus looked towards learning what to share".
- Deep Learning: Practice and Trends (NIPS 2017 Tutorial) (YouTube) (slide)
- Towards automatic pulmonary nodule management in lung cancer screening with deep learning (Scientific Reports)
- Special Deep Learning Structure of MLDS
- Time-series Extreme Event Forecasting with Neural Networks at Uber (ICML Time Series Workshop)
- Deep Forecast: Deep Learning-based Spatio-Temporal Forecasting (arXiv): from the ICML Time Series Workshop, yet very simple work.
- Neural Turing Machines (arXiv): very inspirational
- Dynamic Routing Between Capsules (arXiv)
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (arXiv)
- Computerized detection of lung nodules through radiomics (Medical Physics)
- MyWeekly
- "天池医疗AI大赛[第一季]:肺部结节智能诊断" (Team:
LAB518-CreedAI
, rank: 3 / 2887) - Identity Mappings in Deep Residual Networks (a.k.a. ResNet-1001 / ResNet200) (arXiv)
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (arXiv)
- Squeeze-and-Excitation Networks (arXiv)
- Deep Convolutional Neural Networks with Merge-and-Run Mappings (arXiv)
- Interleaved Group Convolutions for Deep Neural Networks (arXiv)
- WSISA: Making Survival Prediction from Whole Slide Histopathological Images (CVPR)
- NB: it suits cases with few patients but a large data size per case. It uses KMeans for patch-level clustering, trains separate models, and uses their predictions as features. Not a beautiful solution.
- Feature Pyramid Networks for Object Detection (arXiv)
- NB: clean top results with a single, beautiful model. Some so-called bells and whistles that have not been tried in the paper: iterative regression [9], hard negative mining [35], context modeling [16], strong data augmentation [22], etc.
- Xception: Deep Learning with Depthwise Separable Convolutions (arXiv)
- Aggregated Residual Transformations for Deep Neural Networks (a.k.a. ResNeXt) (arXiv)
- NB: Group Conv, interpreted as "Network in Neuron".
- Internship in Tencent Social Ads Team @Shenzhen
- Show and Tell: A Neural Image Caption Generator (arXiv)
- Dual Path Networks (arXiv)
- Densely Connected Convolutional Networks (arXiv)
- VoxResNet: Deep Voxelwise Residual Networks for Volumetric Brain Segmentation (arXiv) (project page)
- Busy month for "天池医疗AI大赛[第一季]:肺部结节智能诊断" (Tianchi Medical AI Competition, Season 1: intelligent diagnosis of pulmonary nodules; Team: `LAB518-CreedAI`, Season 1 rank: 5 / 2887)
- Conditional Random Fields as Recurrent Neural Networks (arXiv)
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (arXiv)
- Accurate Pulmonary Nodule Detection in Computed Tomography Images Using Deep Convolutional Neural Networks (arXiv)
- Dilated Residual Networks (arXiv)
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (arXiv)
- Multilevel Contextual 3-D CNNs for False Positive Reduction in Pulmonary Nodule Detection (IEEE)
- Wide Residual Networks (arXiv)
- D. Silver Lecture 1: Introduction to Reinforcement Learning (UCL)
- CS231n (Stanford): Lecture 2 (Linear classification I), Lecture 3 (Linear classification II), Lecture 4 (Backpropagation), Lecture 5 (Training 1), Lecture 6 (Training 2), Lecture 7 (ConvNets), Lecture 10 (RNN).
- Till now, I have finished all CS231n lectures (videos), notes and some readings.
- DL book RNN chapter (link)
- SSD: Single Shot MultiBox Detector (arXiv)
- Attention and Augmented Recurrent Neural Networks (Distill)
- R-FCN: Object Detection via Region-based Fully Convolutional Networks (arXiv)
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (arXiv)
- Training Region-based Object Detectors with Online Hard Example Mining (arXiv)
- Deep Learning, NLP, and Representations (blog)
- CS231n (Stanford): Lecture 12 (Deep Learning Tools), 14 (Videos, Unsupervised learning), 15 (Invited Talk: Jeff Dean)
- MyWeekly
- Recurrent Dropout without Memory Loss (arXiv)
- NB: simple, implemented in TensorFlow. It achieves similar (if not better) results in LSTMs compared to Gal 2015. In short, it drops the recurrent updates but not the recurrent connections, which allows per-step dropout (see the sketch below). Moon et al. 2015 drop both the recurrent updates and connections with per-sequence dropout, which allows long-term learning but forgets the long-term memory at inference.
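A minimal PyTorch sketch of that distinction (my own code, reusing `nn.LSTMCell` parameters; gate order i, f, g, o as in PyTorch). Dropout hits only the candidate update `g`, never the `f * c` path, so a fresh mask per step cannot erase long-term memory:

```python
import torch
import torch.nn.functional as F

def lstm_step_recurrent_dropout(x, h, c, cell, p=0.25):
    gates = cell.weight_ih @ x + cell.weight_hh @ h + cell.bias_ih + cell.bias_hh
    i, f, g, o = gates.chunk(4)
    g = F.dropout(torch.tanh(g), p=p, training=True)   # update dropped (per step)...
    c = torch.sigmoid(f) * c + torch.sigmoid(i) * g    # ...recurrent connection kept
    h = torch.sigmoid(o) * torch.tanh(c)
    return h, c

cell = torch.nn.LSTMCell(10, 20)
h, c = torch.zeros(20), torch.zeros(20)
h, c = lstm_step_recurrent_dropout(torch.randn(10), h, c, cell)
print(h.shape)  # torch.Size([20])
```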
- CS231n (Stanford): Lecture 8 (Detection), 13 (Segmentation and Attention)
- The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (arXiv): there are some typos in Table 2.
- Multi-Scale Context Aggregation by Dilated Convolutions (arXiv), a.k.a "Dilated-8"
- A Simple Way to Initialize Recurrent Networks of Rectified Linear Units (arXiv), a.k.a "IRNN"
- Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting (arXiv)
- Understanding Convolutions (blog)
- Groups & Group Convolutions (blog)
- Deconvolution and Checkerboard Artifacts (Distill)
- Calculus on Computational Graphs: Backpropagation (blog)
- Neural Networks, Manifolds, and Topology (blog)
- MyWeekly
- STL: A Seasonal-Trend Decomposition Procedure Based on Loess (link)
- Temporal-Kernel Recurrent Neural Networks (ScienceDirect)
- [REVISIT] A Clockwork RNN (arXiv) (non-official code)
- Visualizing and Understanding Recurrent Networks (arXiv)
- Neural Networks for Time Series Prediction (CMU): a super old lecture, not even covering LSTM. Still useful, though, especially since it presents many time series concepts through engineers' eyes (rather than statisticians'), even if some parts are so "Digital Signal Processing" that they revived my undergraduate "Signals & Systems" memories :)
- Dynamic Time Warping
- NB: yet another example of dynamic programming in sequence modeling; I think CTC's idea benefits from DTW (and absolutely from HMM).
- K Nearest Neighbors & Dynamic Time Warping (code): clean code using DTW and kNN for Human Activity Recognition. It clearly shows the essential idea of DTW, and the code is well factored. One funny thing: not all the imports in this code are valid; you have to import some things manually before running it.
- Everything you know about Dynamic Time Warping is Wrong (link): gives some highlights on using and researching DTW (from about 10 years ago 😐). The wording of this paper is very sharp. 3 claims: 1) fixed length doesn't hurt; 2) a narrow band doesn't hurt; 3) speeding up DTW with tight lower bounds is pointless. (A minimal DTW sketch follows.)
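A minimal sketch of the DP recurrence behind DTW (claim 2 above amounts to restricting the inner loop to a narrow band, e.g., Sakoe-Chiba):

```python
import numpy as np

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Classic O(len(a) * len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = dist(a[i - 1], b[j - 1]) + min(D[i - 1, j],      # insertion
                                                     D[i, j - 1],      # deletion
                                                     D[i - 1, j - 1])  # match
    return D[n, m]

print(dtw([1, 2, 3, 4], [1, 1, 2, 3, 4]))  # 0.0: warping absorbs the repeated 1
```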
- MC and MCMC from Probabilistic Graphical Models Eric Xing (CMU): Lecture 16-18.
- NB: a great review of sampling-based inference. MC: naive sampling, rejection sampling, importance sampling. MCMC: Metropolis-Hastings (sketched below), Gibbs, collapsed (Rao-Blackwellised) Gibbs, slice sampling, Reversible Jump MCMC (RJMCMC). RJMCMC is really non-trivial, and I didn't fully understand it: it is an MCMC that jumps among model spaces, designed without detailed balance, yet stationary.
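A minimal sketch of random-walk Metropolis, the simplest of the MCMC methods listed (a symmetric Gaussian proposal, so the Hastings correction cancels and the accept ratio is just p(x')/p(x)):

```python
import numpy as np

def metropolis(log_p, x0, n_samples=10000, step=0.5):
    x, samples = x0, []
    for _ in range(n_samples):
        x_new = x + step * np.random.randn()             # symmetric proposal
        if np.log(np.random.rand()) < log_p(x_new) - log_p(x):
            x = x_new                                    # accept; else keep x
        samples.append(x)                                # rejected steps also count
    return np.array(samples)

# Target: an unnormalized standard normal; the normalizer is never needed.
samples = metropolis(lambda x: -0.5 * x ** 2, x0=0.0)
print(samples.mean(), samples.std())  # ~0, ~1
```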
- Probabilistic Programming & Bayesian Methods for Hackers (link) (code)
- Probabilistic Graphical Models 3: Learning (Coursera)
- Forecasting at Scale (Prophet)
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (arXiv): simple math but full of brilliant ideas and tricks. (code: a good demonstration of using "global" moments and `ewa`; see the sketch below)
- Layer Normalization (arXiv): an even simpler principle; good for RNNs, but worse than BN for CNNs.
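A minimal NumPy sketch of the two BN modes the note refers to: batch moments during training, with an exponentially weighted average of them (the "global" moments) used at inference. Conventions here are mine, not the linked code's:

```python
import numpy as np

class BatchNorm1d:
    def __init__(self, dim, momentum=0.9, eps=1e-5):
        self.gamma, self.beta = np.ones(dim), np.zeros(dim)
        self.running_mean, self.running_var = np.zeros(dim), np.ones(dim)
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, training=True):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # track "global" moments with an exponential moving average
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mean
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        else:  # inference: use the accumulated global moments
            mean, var = self.running_mean, self.running_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta

bn = BatchNorm1d(4)
for _ in range(100):
    bn(np.random.randn(32, 4) * 3 + 5)               # training updates running stats
print(bn(np.full((1, 4), 5.0), training=False))      # ~0 at the data mean
```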
- Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences (NIPS) (TensorFlow implement, Keras implement, both good and clear.)
- Using Fast Weights to Attend to the Recent Past (arXiv) (TensorFlow implement): very simple math, easy enough to implement, but it seems to assume a lot of physiology background. This paper is another attempt to beat LSTM. Fast weights (`FW`), built on IRNN, work well on the tasks mentioned. The `FW` can be regarded as something "memorised" during the step update. I find Hinton's papers usually recondite (maybe Canadian English?).
- Neural Networks for Machine Learning by Geoffrey Hinton (Coursera): finally finished. A good review of neural network approaches. Absolutely not a first course: it can inspire you a lot if you already know the material, but if something mentioned in the course is new to you, it can be very hard to fully understand without other materials.
- MLaPP: Chapter 27.7 Restricted Boltzman machines (RBMs)
Resuming after the Spring Festival 😐
- A Critical Review of Recurrent Neural Networks for Sequence Learning (arXiv)
- NB: no new insights, but good for refreshing ideas; it covers vanilla RNN, LSTM, BRNN and a little bit of NTM, and introduces some applications, with emphasis on NLP.
- Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks (ICML)
- Bootstrap Methods for Time Series (link)
- NB: though the article presents itself as a simple intro, it seems too theoretical for me. It provides a good review of bootstrap methods for time series; remember: 1) generating series from AR-like models; 2) block bootstrap (sketched below); 3) Markov chain bootstrap; 4) frequency domain via DFT; 5) other mixtures.
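A minimal sketch of item 2, the (moving-)block bootstrap: resample overlapping blocks with replacement so short-range dependence survives inside each block (the block length is a tuning choice):

```python
import numpy as np

def block_bootstrap(series, block_len=20, n_boot=1):
    n = len(series)
    starts = np.arange(n - block_len + 1)                # overlapping block starts
    out = []
    for _ in range(n_boot):
        picks = np.random.choice(starts, size=int(np.ceil(n / block_len)))
        out.append(np.concatenate([series[s:s + block_len] for s in picks])[:n])
    return np.array(out)

x = np.cumsum(np.random.randn(200))                      # a dependent (random-walk) series
print(block_bootstrap(x, block_len=25, n_boot=3).shape)  # (3, 200)
```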
- Hidden Markov Model
- Conditional Random Field
- Monte Carlo by mathematicalmonk (YouTube): covering importance sampling, Smirnov transform and rejection sampling.
- MCMC by mathematicalmonk (YouTube): covering ergodic theorem and Metropolis, very gentle and intuitive.
- Probabilistic Programming & Bayesian Methods for Hackers (code)
- NB: Great book, about practical Bayesian modeling and PyMC3.
- Variational Inference, from Probabilistic Graphical Models by Eric Xing (CMU): Lectures 13-15.
- NB: a really good (somewhat advanced) introduction to VI: loopy belief propagation, mean field approximation, and general variational principles (solving problems in an optimization fashion with the dual form of a function). The principles part is really abstract, but at least I got the idea. A brief introduction to LDA is also presented in Lecture 15.
- Bayesian optimization by Nando de Freitas (YouTube)
- NB: Great intro to Bayesian opt, in 10 minutes you get the whole picture, and the rest tells some details.
- MyWeekly
- Online Kernel Ridge
- Online regression with kernel (link)
- NB: indeed a good review of this family of approaches, which really matches my own ideas (how have they "copied" my idea T_T). However, this paper puts a strong emphasis on signal processing, which does not suit a machine learning mindset.
- Local online kernel ridge regression for forecasting of urban travel times (ScienceDirect)
- Gaussian Processes: A Quick Introduction (arXiv)
- CS229 Lecture note of Gaussian Process (Stanford)
- Bayesian Linear Regression (YouTube): a very good derivation, though it uses uncommon notation.
- Recursive Least Square (RLS) (OTexts)
- Time Series Blogs by QuantStart
- NB: these blogs are quite good, covering lots of concepts in time series analysis. They focus on the representation level rather than the learning level, though, so there is little content about parameter estimation. Anyway, a good introduction to ARIMA, GARCH, the Kalman Filter and HMMs for time series.
- Understanding the Basis of the Kalman Filter Via a Simple and Intuitive Derivation (IEEE): Lecture note. It's quite intuitive to understand the "basis" of Kalman Filter, as titled.
- MLaPP: Chapter 18 State Space Model
- Probabilistic Graphical Models 2: Inference (Coursera)
- Probabilistic Graphical Models: Principles and Techniques: Chapter 9, 10, 11, 13 (selected sections)
- Spectral Clustering
- 统计学习方法 (Statistical Learning Methods), Chapter 9: The EM Algorithm and Its Extensions
- MyWeekly
- Optimization on Neural Networks
- [REVISIT] Convolution for Neural Networks
- Probabilistic Graphical Models 1: Representation (Coursera)
- Probabilistic Graphical Models: Principles and Techniques: Chapter 1, 2, 3, 5, 6
- NB: I read the Chinese version (概率图模型:原理与技术), which is quite good if you are taking the course (and are Chinese, of course); if not, some parts of the translation will be confusing. Anyway, great thanks to the efforts of Prof. Wang and Prof. Han.
- Implement the Gaussian Mixture Models (code) (notebooks)
- Visual Information Theory (blog)
- Towards my research
- MyWeekly
- Topics on RNN
- [REVISIT] Understanding LSTM Networks (blog) (code)
- NB: The code from Udacity Deep Learning course is exactly the same as described in the blog.
- LSTM: A Search Space Odyssey (arXiv)
- An Empirical Exploration of Recurrent Network Architectures (JMLR)
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (arXiv)
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (arXiv)
- Sequence to Sequence Learning with Neural Networks (arXiv)
- A Clockwork RNN (arXiv)
- Recurrent Neural Networks for Multivariate Time Series with Missing Values (arXiv)
- Deep Learning Lecture 13: Alex Graves on Hallucination with RNNs (YouTube)
- RNNs in TensorFlow, a Practical Guide and Undocumented Features (blog)
- Recurrent Neural Network Regularization (arXiv)
- Multi-task Sequence to Sequence Learning (arXiv)
- [REVISIT] Understanding LSTM Networks (blog) (code)
- Topics on Variational Autoencoders
- WaveNet: A Generative Model for Raw Audio (arXiv) (blog) (code)
- NB: the code repo is a very simple version, good for reading.
- t-SNE
- Neural Style and Deep Dream
- Deep Learning for ChatBots (blog: Part1 Part2)
- Courses
- Towards my research
- A Novel Empirical Mode Decomposition With Support Vector Regression for Wind Speed Forecasting (IEEE)
- Spatial Transformer Networks (arXiv) (blog) (code)
- Attention and Memory in Deep Learning and NLP (blog)
- Deep Residual Learning for Image Recognition (arXiv)
- Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning (arXiv)
- Bilinear CNN Models for Fine-grained Visual Recognition (arXiv)