AML - Advanced Machine Learning | MIC - Medical Image Computing | Prog - Programming
NB: the dates below indicate when I studied the material, not when it was published.
Main focus: preparing for the CVPR (and IPMI) deadlines.
- 一堂課讓你認識肺癌(Basic Concepts of Lung Cancer: Diagnosis and Treatment)(Coursera)
- Computational Neuroscience 计算神经科学 (Xuetangx)
- week3 - Signal propagation in neurons
- week4 - Neural Network simulators
- 3D vision is still my main reading focus
- FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation (CVPR2018)
- PU-Net: Point Cloud Upsampling Network (CVPR2018)
- Pointwise Convolutional Neural Networks (CVPR2018)
- 3D Graph Neural Networks for RGBD Semantic Segmentation (ICCV2017)
- Recurrent Slice Networks for 3D Segmentation of Point Clouds (CVPR2018)
- Learning Representations and Generative Models for 3D Point Clouds (ICML2018): apart from the GAN for point clouds, the metrics for measuring the distance between two point clouds are also useful (together with the code). A paper solid enough to win acceptance (8 pages of supplementary experiments).
- PointGrid: A Deep Network for 3D Shape Understanding (CVPR2018): a simple yet effective and efficient solution for point clouds. I do like this paper (though it is not very well written). However, it seems like a purified, methodology-only version of VoxelNet (also CVPR2018). No cross-citation between these two papers.
- VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection (CVPR2018): a "PointGrid" (also CVPR2018) on KITTI, better written. No cross-citation between these two papers.
- PointGrow: Autoregressively Learned Point Cloud Generation with Self-Attention (OpenReview, ICLR2019 under review): not very clearly written. Some evaluations lack better metrics, and some baselines are missing. I'd guess a 60% probability of acceptance. Maybe written by an intern of the DGCNN authors.
- Group Equivariance
- Attention / Graph
- Hyperbolic Attention Networks (OpenReview, arXiv, ICLR2019 under review): a paper I do not fully understand due to my missing background in hyperbolic geometry. A hyperbolic space embedding seems appealing; however, the paper does not explain well why it suits attention in particular rather than general neural network representations. Worth more exploration.
- Relational Graph Attention Networks (OpenReview, ICLR2019 under review)
- Hierarchical Graph Representation Learning with Differentiable Pooling (arXiv)
- Learning Visual Question Answering by Bootstrapping Hard Attention (ECCV2018)
- A New Angle on L2 Regularization (blog)
- Efficient Annotation of Segmentation Datasets with PolygonRNN++ (CVPR2018): very interesting application of existing algorithms (segmentation + RL + GNN), but some details are missing in this conference paper (maybe better in its journal version?). Many engineering details.
- Taskonomy: Disentangling Task Transfer Learning (CVPR2018 best): hard to understand.
- A Low Power, Fully Event-Based Gesture Recognition System (CVPR2017)
- Hand PointNet: 3D Hand Pose Estimation using Point Sets (CVPR2018): not my area actually. A PointNet application for hand pose regression.
- Visible Machine Learning for Biomedicine (Cell Commentary)
Our paper 3D Deep Learning from CT Scans Predicts Tumor Invasiveness of Subcentimeter Pulmonary Adenocarcinomas has been accepted by Cancer Research (DOI: 10.1158/0008-5472.CAN-18-0696).
- 3D Vision
- Spherical CNNs (ICLR2018 best): this paper is too complicated for me to understand :(
- Spherical convolutions and their application in molecular modelling (NIPS2017): difficult reading for a non-native English speaker. A good illustration of the cubed-sphere grid (link).
- Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models (ICCV2017)
- SO-Net: Self-Organizing Network for Point Cloud Analysis (CVPR2018): basically a PointNet++ with "SOM" clustering.
- SPLATNet: Sparse Lattice Networks for Point Cloud Processing (CVPR2018 oral): it uses differentiable projection onto regular grids (permutohedral lattices), together with sparse convolutions for efficiency. However, I have not understood its advantage over set-based networks (e.g., PointNet++), partly because I have not understood the advantage of the bilateral convolution layer (BCL).
- Neural 3D Mesh Renderer (CVPR2018) (project page)
- NB: very fancy, very useful, but I have not fully understood this graphics-heavy work; I will revisit it. tl;dr: 3 kinds of parameters can be optimized: `vertices` [n_vertices, 3 (XYZ)], `textures` [n_faces, texture_size, texture_size, texture_size, 3 (RGB)], and `camera_position` [3]. Besides, `faces` [n_faces, 3 (triangle)] indicates the links between vertices (3 vertices make a face), which lets the mesh be processed like a graph (it is a graph, indeed; see the shape sketch below). `faces` does not seem differentiable. The paper uses a straight-through estimator to provide the gradients (for the vertices only, I'm not sure at present; the others should have gradients naturally).
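A tiny NumPy sketch of the shapes described in the note (sizes and values are mine, purely illustrative; this is not the renderer's actual API):

```python
import numpy as np

n_vertices, n_faces, texture_size = 4, 2, 2  # illustrative sizes

# The three optimizable parameter groups described above:
vertices = np.random.randn(n_vertices, 3)                 # [n_vertices, 3 (XYZ)]
textures = np.random.rand(n_faces, texture_size,
                          texture_size, texture_size, 3)  # [..., 3 (RGB)]
camera_position = np.array([0.0, 0.0, -2.0])              # [3]

# Non-differentiable topology: each face indexes 3 vertices.
faces = np.array([[0, 1, 2], [0, 2, 3]])                  # [n_faces, 3 (triangle)]

# `faces` induces a graph over vertices: every triangle contributes 3 edges.
edges = {tuple(sorted((int(f[i]), int(f[(i + 1) % 3]))))
         for f in faces for i in range(3)}
print(sorted(edges))  # [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```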
- Generating 3D Adversarial Point Clouds (arXiv): poorly written.
- Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling (arXiv): not very insightful. Too much language spent on a trivial modification, with limited empirical improvements. However, learning visible (point) kernels is a good idea for interpretability in deep point cloud learning (which deserves further exploration).
- Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images (ECCV2018): though appealing, it needs 3D supervision, which is very different from N3MR.
- Self-Attention Generative Adversarial Networks (arXiv): simple yet effective.
- Spectral Normalization Explained (paper: ICLR2018) (blog): greatly explained. SN replaces $W$ with $W/\sigma(W)$, where $\sigma(W)$ denotes the largest singular value of $W$. It then provides a simple way (power iteration) to lower the computational cost (see the sketch below).
- Generative adversarial interpolative autoencoding: adversarial training on latent space interpolations encourage convex latent distributions (arXiv): an interesting paper, though intuitive.
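A minimal NumPy sketch of the power-iteration estimate behind SN (variable names are mine; real implementations persist `u` across training steps so a single iteration suffices):

```python
import numpy as np

def spectral_normalize(W, u, n_iters=1):
    """Estimate sigma(W), the largest singular value, by power iteration,
    then return W / sigma(W) and the updated singular-vector estimate."""
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # u^T W v approximates the spectral norm
    return W / sigma, u

W = np.random.randn(64, 32)
W_sn, u = spectral_normalize(W, np.random.randn(64), n_iters=5)
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # ~1.0 after normalization
```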
- Analyzing Inverse Problems with Invertible Neural Networks (arXiv): poorly written, hard to read.
- Image Transformer (ICML2018): self-attention application to autoregressive models.
- Self-Attention with Relative Position Representations (NAACL2018): a short paper, but provides good insight: instead of using pre-defined absolute postion-encoding, it uses learnable relative position embeddings.
- Universal Transformers (arXiv): just a simple modification: add recurrence to the Transformer (i.e., share weights across multiple layers), plus a trivial ACT (just like my setting in the code ...)
- A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study (LANCET Oncology): an excellent angle on the use of radiomics; though the methodology is simple, the study is very meaningful and promising.
- Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples (ICML2018 best paper)
- NB: a brilliant conference paper that reads like a journal paper (a research article, a review and a comment paper at once). Well-written, comprehensive, well-performing, and very insightful; all 8 pages are worth reading. However, a journal version could be even better (if it ever exists), since some of the proposed techniques do not seem well matched in the Case Study section, e.g., Reparameterization for solving Vanishing & Exploding Gradients does not appear there; instead, that case was handled by BPDA (see the sketch below). Besides, I don't understand why LID appears in the "Gradient Shattering" section, let alone that LID was not circumvented by the 3 main attack techniques proposed. Overall, the paper develops a good story about 3 shields (Shattered Gradients, Stochastic Gradients and Exploding & Vanishing Gradients) and 3 swords (Backward Pass Differentiable Approximation BPDA, Expectation over Transformation EOT and Reparameterization), though one should be very careful that the shields & swords are not the whole of the paper. The comments in the discussion are also valuable for future studies.
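A minimal PyTorch sketch of the BPDA idea: run the non-differentiable defense on the forward pass, and approximate it (here by the identity) on the backward pass. The `round()`-based quantization "defense" is a made-up stand-in:

```python
import torch

class BPDAIdentity(torch.autograd.Function):
    """Forward: a non-differentiable transform g(x).
    Backward: pretend g(x) = x and pass the gradient straight through."""

    @staticmethod
    def forward(ctx, x):
        return (x * 255).round() / 255  # hypothetical quantization "defense"

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # identity approximation of dg/dx

x = torch.rand(1, 3, 8, 8, requires_grad=True)
BPDAIdentity.apply(x).sum().backward()
print(x.grad.abs().sum())  # non-zero: gradients flow despite round()
```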
- Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning (ICLR2016 -> T-PAMI):
- NB: good empirical results, elegant solutions. 3 hyper-parameters: `epsilon` (default: 8.0, norm length for (virtual) adversarial training), `num_power_iterations` or `K` (default: 1, the number of power iterations) and `xi` (default: 1e-6, a small constant for finite differences). A clean PyTorch implementation exists here, but it seems to ship wrong default hyper-parameters. For the VAT loss alone, it needs `K+2` forwards and `K+1` backwards (my calculation differs slightly from the paper's); see the sketch below.
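A minimal PyTorch sketch of the VAT loss under the hyper-parameters above (my own code, not the reference implementation). Counting gives the `K+2` forwards / `K+1` backwards from the note, the last backward being the caller's backprop through the returned loss:

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, eps=8.0, xi=1e-6, K=1):
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)           # clean prediction (1 forward)

    d = torch.randn_like(x)                      # random perturbation direction
    for _ in range(K):                           # power iteration: K forwards + K backwards
        d = xi * F.normalize(d.view(x.size(0), -1), dim=1).view_as(x)
        d.requires_grad_()
        kl = F.kl_div(F.log_softmax(model(x + d), dim=1), p, reduction='batchmean')
        d = torch.autograd.grad(kl, d)[0].detach()

    r_adv = eps * F.normalize(d.view(x.size(0), -1), dim=1).view_as(x)
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=1), p,
                    reduction='batchmean')       # 1 more forward
```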
My Bayesian month :)
- Variational Inference and Discrete Distributions
- A Tutorial on Variational Bayesian Inference (pdf)
- NB: a very clear description of mean field iteration, though the VMP framework part seems non-trivial. Mean field approximates the full joint distribution with a product of independent distributions (in symbols below).
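In symbols (a standard statement of mean field, in my notation rather than the tutorial's): restrict $q$ to a fully factorized family and coordinate-ascend the ELBO,

$$q(\mathbf{z}) = \prod_i q_i(z_i), \qquad \log q_i^*(z_i) = \mathbb{E}_{q_{-i}}\big[\log p(\mathbf{x}, \mathbf{z})\big] + \mathrm{const}.$$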
- Tutorial on Variational Autoencoders (arXiv)
- Categorical Reparameterization with Gumbel-Softmax (ICLR2017)
- Learning Latent Permutations with Gumbel-Sinkhorn Networks (ICLR2018)
- The Humble Gumbel Distribution (blog)
- Generative Flow
- i-RevNet: Deep Invertible Networks (ICLR2018): a very interesting paper with numerous potential applications, but not that novel at this point, e.g., highly related to RealNVP and NICE.
- Glow: Generative Flow with Invertible 1x1 Convolutions (arXiv): introduces an invertible 1x1 convolution trick (built on RealNVP; Glow : RealNVP :: DCGAN : GAN).
- Density estimation using Real NVP (ICLR2017): a fantastic paper (a minimal coupling-layer sketch follows below).
- Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models (AAAI2018)
- Normalizing Flows Tutorial (Part 1) (Part 2)
- Improving Variational Inference with Inverse Autoregressive Flow (arXiv, plus a good blog)
- NICE: Non-linear Independent Components Estimation (ICLR2015)
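A minimal PyTorch sketch of the affine coupling layer shared by NICE / RealNVP / Glow (toy shapes and names are mine; NICE's additive coupling drops the scale term):

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """y1 = x1; y2 = x2 * exp(s(x1)) + t(x1). Trivially invertible,
    with log|det J| = sum(s(x1))."""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))  # outputs [s, t]

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        s, t = self.net(x1).chunk(2, dim=1)
        return torch.cat([x1, x2 * torch.exp(s) + t], dim=1), s.sum(dim=1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=1)
        s, t = self.net(y1).chunk(2, dim=1)
        return torch.cat([y1, (y2 - t) * torch.exp(-s)], dim=1)

layer = AffineCoupling(dim=4)
x = torch.randn(8, 4)
y, log_det = layer(x)
print(torch.allclose(layer.inverse(y), x, atol=1e-5))  # True
```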
- The Building Blocks of Interpretability (Distill)
- Sampling Generative Networks (NIPS2016): a good paper with bad writing.
- Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer (arXiv)
- Instance Noise: A trick for stabilising GAN training (blog)
A busy month reproducing DeepLabv3+ and writing the NIPS rebuttal.
- Learning Deep Matrix Representations (arXiv)
- Graph Memory Networks for Molecular Activity Prediction (arXiv)
- Weighted Transformer Network for Machine Translation (OpenReview)
- Introduction to Biomedical Imaging 生物医学成像学导论 (MIC)
- Module 2 CT
- Module 3 Ultrasounds
- Module 4 MRI
- Module 5 PET
- Computational Neuroscience 计算神经科学 (MIC)
- week1 - Basic neuronal models
- week2 - Synapse and channel dynamics
- A mixed-scale dense convolutional neural network for image analysis (PNAS)
- Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation (MICCAI2017)
- In Silico Labeling: Predicting Fluorescent Labels in Unlabeled Images (Cell)
- A Tutorial on Variational Bayesian Inference (pdf)
- Tutorial on Variational Autoencoders (arXiv)
- World Models (website) (arXiv)
- The Building Blocks of Interpretability (Distill)
- Using Artificial Intelligence to Augment Human Intelligence (Distill)
- Memory-Efficient Implementation of DenseNets (arXiv)
- A Comparison of MCC and CEN Error Measures in Multi-Class Prediction (PLOS ONE)
- Fully Convolutional Networks for Semantic Segmentation (CVPR2015)
- Pyramid Scene Parsing Network (CVPR2017)
- Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (a.k.a DeepLab v3+) (arXiv)
- Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks (a.k.a ODIN) (ICLR2018)
- Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery (IPMI2017)
- Anomaly Detection using One-Class Neural Networks (KDD2018)
- Thoughts on "mixup: Data-Dependent Data Augmentation" (blog)
- 3D Vision & Point clouds
- Few Shot
Still a very busy month for preparing papers.
- EGFR
- Somatic mutations drive distinct imaging phenotypes in lung cancer (Cancer Research)
- Defining a Radiomic Response Phenotype: A Pilot Study using targeted therapy in NSCLC (Scientific Reports)
- Non–Small Cell Lung Cancer Radiogenomics Map Identifies Relationships between Molecular and Imaging Phenotypes with Prognostic Implications (Radiology)
- Radiomic Features Are Associated With EGFR Mutation Status in Lung Adenocarcinomas (Clinical Lung Cancer)
- World Models (website) (arXiv)
- The Building Blocks of Interpretability (Distill)
- Using Artificial Intelligence to Augment Human Intelligence (Distill)
- Dynamic Graph CNN for Learning on Point Clouds (arXiv)
A very busy month for preparing papers.
- Interpretability
- PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (NIPS)
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation (CVPR)
- MIC
- Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs (JAMA)
- Dermatologist-level classification of skin cancer with deep neural networks (Nature)
- Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning (Cell)
- Scalable and accurate deep learning for electronic health records (arXiv)
- CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning (arXiv)
- Automated Pulmonary Nodule Detection via 3D ConvNets with Online Sample Filtering and Hybrid-Loss Residual Learning (arXiv)
- DeepLung: 3D Deep Convolutional Nets for Automated Pulmonary Nodule Detection and Classification (arXiv)
- A Survey on Deep Learning in Medical Image Analysis (arXiv)
- Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach (Nature Comm)
- Computational Radiomics System to Decode the Radiographic Phenotype (Cancer Research)
- Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation (arXiv)
- MIL
- Graph Attention Networks (arXiv)
- Deep learning of feature representation with multiple instance learning for medical image analysis (ICASSP): bad writing, trivial idea, not worth reading.
- Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification (arXiv) (code)
- Multi-Instance Deep Learning: Discover Discriminative Local Anatomies for Bodypart Recognition: trivial writing, same idea as the DSB2017 THU solution.
- Learning from Experts: Developing Transferable Deep Features for Patient-Level Lung Cancer Prediction (MICCAI): very, very bad writing; hard to find the details of the model / training / data.
- Attention-based Deep Multiple Instance Learning (ICML2018!)
- Attention Solves Your TSP (arXiv)
- Multiple-Instance Learning for Medical Image and Video Analysis (IEEE)
- Revisiting Multiple Instance Neural Networks (arXiv)
- An introduction to ROC analysis (ScienceDirect)
- Adaptive Computation Time for Recurrent Neural Networks (arXiv)
- Spatially Adaptive Computation Time for Residual Networks (arXiv)
- A mixed-scale dense convolutional neural network for image analysis (PNAS)
- mixup: Beyond Empirical Risk Minimization (arXiv)
- Hyper Networks (blog)
- Set / Points
- Interpretability
- Learning Deep Features for Discriminative Localization (a.k.a CAM) (arXiv)
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (arXiv)
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning (arXiv)
- A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks (arXiv)
- Fraternal Dropout (arXiv)
- A Tutorial on Variational Bayesian Inference (pdf)
- Tutorial on Variational Autoencoders (arXiv)
- ADLxMLDS (AML)
- RL (Deep RL + Deep RL2 + Imitation Learning)
- 5 Attention
- 6 Special Networks
- 7 Tips
- 10 GAN
- 11 GAN for Seq
- 12 More GAN
- CS231n 2017 (AML)
- Atari Game Playing (AML)
- Fundamentals of Medical Imaging (MIC)
- Chapter 1
- Chapter 2 X-rays
- Chapter 3 CT
- Introduction to Biomedical Imaging 生物医学成像学导论 (MIC)
- Module 2 CT
- lunglab-keras (MIC)
- Convolutional Invasion and Expansion Networks for Tumor Growth Prediction (IEEE): a very simple algorithm, with even some bad design... but good biomedical explanation and good feature engineering.
- Runtime Neural Pruning (NIPS)
- Attention Is All You Need (arXiv):
- NB: very impressive work. Seems to be inspired by Conv Seq2Seq, but more general. 3 key points (see the sketch below):
- variable-length inputs can also be processed by "attention": `softmax(K.T.dot(Q)).dot(V)` => fixed size `d`
- seq2seq is indeed `encoder output` + `decoder scoring per step`! It can be implemented in parallel with masking (an idea that seems to come from Conv Seq2Seq?)
- RNN / Conv architectures can still be used, especially in the decoder
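A minimal NumPy sketch of the first point (the scaled dot-product form from the paper; the `mask` argument illustrates how the parallel decoder implementation works):

```python
import numpy as np

def attention(Q, K, V, mask=None):
    """softmax(Q K^T / sqrt(d)) V: any number of key/value pairs is reduced
    to one fixed-size output per query."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # [n_queries, n_keys]
    if mask is not None:                             # e.g., causal mask in the decoder
        scores = np.where(mask, scores, -1e9)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # stable softmax over keys
    return w @ V                                     # [n_queries, d_v]

Q = np.random.randn(5, 16)        # 5 queries
K = np.random.randn(9, 16)        # 9 keys: variable-length input...
V = np.random.randn(9, 32)
print(attention(Q, K, V).shape)   # (5, 32): ...fixed-size output
```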
- One Model To Learn Them All (arXiv): Xception + Transformer + Sparse MoE in one network. The title oversells it.
- Convolutional Sequence to Sequence Learning (arXiv): the Transformer paper seems to have absorbed all of its goodness...
- Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network (arXiv): the DSB2017 1st-place paper.
- On Bayesian Deep Learning and Deep Bayesian Learning (YouTube)
- Pointer Networks (arXiv)
- Dive into Deep Learning (动手学深度学习), Lecture 13: Forward Propagation, Backpropagation and Backpropagation Through Time (YouTube)
- Dive into Deep Learning (动手学深度学习), Lecture 16: Word Vectors (word2vec) (YouTube)
- CS231n 2017 (AML)
- Fundamentals of Medical Imaging (MIC)
- Chapter 1
- Chapter 2 X-rays
- Introduction to Biomedical Imaging 生物医学成像学导论 (MIC)
- Module 1 X-rays
- An Overview of Multi-Task Learning in Deep Neural Networks (blog): hard to call it a very good review, but it mentions a lot. "Recent approaches have thus looked towards learning what to share".
- Deep Learning: Practice and Trends (NIPS 2017 Tutorial) (YouTube) (slide)
- Towards automatic pulmonary nodule management in lung cancer screening with deep learning (Scientific Reports)
- Special Deep Learning Structure of MLDS
- Time-series Extreme Event Forecasting with Neural Networks at Uber (ICML Time Series Workshop)
- Deep Forecast: Deep Learning-based Spatio-Temporal Forecasting (arXiv): from the ICML Time Series Workshop, yet very simple work.
- Neural Turing Machines (arXiv): very inspirational
- Dynamic Routing Between Capsules (arXiv)
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (arXiv)
- Computerized detection of lung nodules through radiomics (Medical Physics)
- MyWeekly
- "天池医疗AI大赛[第一季]:肺部结节智能诊断" (Team:
LAB518-CreedAI
, rank: 3 / 2887) - Identity Mappings in Deep Residual Networks (a.k.a. ResNet-1001 / ResNet200) (arXiv)
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (arXiv)
- Squeeze-and-Excitation Networks (arXiv)
- Deep Convolutional Neural Networks with Merge-and-Run Mappings (arXiv)
- Interleaved Group Convolutions for Deep Neural Networks (arXiv)
- WSISA: Making Survival Prediction from Whole Slide Histopathological Images (CVPR)
- NB: it suits cases with few patients but a large data size per case. It uses KMeans for patch-level clustering, trains separate models, and uses their predictions as features. Not a beautiful solution.
- Feature Pyramid Networks for Object Detection (arXiv)
- NB: clean top results with a single, beautiful model. Some so-called bells and whistles that have not been tried in the paper: iterative regression [9], hard negative mining [35], context modeling [16], strong data augmentation [22], etc.
- Xception: Deep Learning with Depthwise Separable Convolutions (arXiv)
- Aggregated Residual Transformations for Deep Neural Networks (a.k.a. ResNeXt) (arXiv)
- NB: Group Conv, interpreted as "Network in Neuron".
- Internship in Tencent Social Ads Team @Shenzhen
- Show and Tell: A Neural Image Caption Generator (arXiv)
- Dual Path Networks (arXiv)
- Densely Connected Convolutional Networks (arXiv)
- VoxResNet: Deep Voxelwise Residual Networks for Volumetric Brain Segmentation (arXiv) (project page)
- Busy month for "天池医疗AI大赛[第一季]:肺部结节智能诊断" (Tianchi Medical AI Competition, Season 1: intelligent diagnosis of pulmonary nodules; Team: `LAB518-CreedAI`, Season 1 rank: 5 / 2887)
- Conditional Random Fields as Recurrent Neural Networks (arXiv)
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (arXiv)
- Accurate Pulmonary Nodule Detection in Computed Tomography Images Using Deep Convolutional Neural Networks (arXiv)
- Dilated Residual Networks (arXiv)
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (arXiv)
- Multilevel Contextual 3-D CNNs for False Positive Reduction in Pulmonary Nodule Detection (IEEE)
- Wide Residual Networks (arXiv)
- D. Silver Lecture 1: Introduction to Reinforcement Learning (UCL)
- CS231n (Stanford): Lecture 2 (Linear classification I), Lecture 3 (Linear classification II), Lecture 4 (Backpropagation), Lecture 5 (Training 1), Lecture 6 (Training 2), Lecture 7 (ConvNets), Lecture 10 (RNN).
- Till now, I have finished all CS231n lectures (videos), notes and some readings.
- DL book RNN chapter (link)
- SSD: Single Shot MultiBox Detector (arXiv)
- Attention and Augmented Recurrent Neural Networks (Distill)
- R-FCN: Object Detection via Region-based Fully Convolutional Networks (arXiv)
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (arXiv)
- Training Region-based Object Detectors with Online Hard Example Mining (arXiv)
- Deep Learning, NLP, and Representations (blog)
- CS231n (Stanford): Lecture 12 (Deep Learning Tools), 14 (Videos, Unsupervised learning), 15 (Invited Talk: Jeff Dean)
- MyWeekly
- Recurrent Dropout without Memory Loss (arXiv)
- NB: simple, implemented in TensorFlow. It achieves similar (if not better) results in LSTMs compared to Gal 2015. In short, it drops the recurrent updates but not the recurrent connections, which allows per-step dropout (see the sketch below). Moon et al. 2015 drop both the recurrent updates and connections with per-sequence dropout, which allows long-term learning but forgets the long-term memory at inference.
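A minimal PyTorch sketch of that distinction (my own code, reusing `nn.LSTMCell` parameters; gate order i, f, g, o as in PyTorch). Dropout hits only the candidate update `g`, never the `f * c` path, so a fresh mask per step cannot erase long-term memory:

```python
import torch
import torch.nn.functional as F

def lstm_step_recurrent_dropout(x, h, c, cell, p=0.25):
    gates = cell.weight_ih @ x + cell.weight_hh @ h + cell.bias_ih + cell.bias_hh
    i, f, g, o = gates.chunk(4)
    g = F.dropout(torch.tanh(g), p=p, training=True)   # update dropped (per step)...
    c = torch.sigmoid(f) * c + torch.sigmoid(i) * g    # ...recurrent connection kept
    h = torch.sigmoid(o) * torch.tanh(c)
    return h, c

cell = torch.nn.LSTMCell(10, 20)
h, c = torch.zeros(20), torch.zeros(20)
h, c = lstm_step_recurrent_dropout(torch.randn(10), h, c, cell)
print(h.shape)  # torch.Size([20])
```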
- CS231n (Stanford): Lecture 8 (Detection), 13 (Segmentation and Attention)
- The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (arXiv): there are some typos in Table 2.
- Multi-Scale Context Aggregation by Dilated Convolutions (arXiv), a.k.a "Dilated-8"
- A Simple Way to Initialize Recurrent Networks of Rectified Linear Units (arXiv), a.k.a "IRNN"
- Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting (arXiv)
- Understanding Convolutions (blog)
- Groups & Group Convolutions (blog)
- Deconvolution and Checkerboard Artifacts (Distill)
- Calculus on Computational Graphs: Backpropagation (blog)
- Neural Networks, Manifolds, and Topology (blog)
- MyWeekly
- STL: A Seasonal-Trend Decomposition Procedure Based on Loess (link)
- Temporal-Kernel Recurrent Neural Networks (ScienceDirect)
- [REVISIT] A Clockwork RNN (arXiv) (non-official code)
- Visualizing and Understanding Recurrent Networks (arXiv)
- Neural Networks for Time Series Prediction (CMU): a super old lecture, not even covering LSTM. Still useful, though, especially since it presents many time series concepts through engineers' eyes (rather than statisticians'), even if some parts are so "Digital Signal Processing" that they revived my undergraduate "Signals & Systems" memories :)
- Dynamic Time Warping
- NB: yet another example of dynamic programming in sequence modeling; I think CTC's idea benefits from DTW (and absolutely from HMM).
- K Nearest Neighbors & Dynamic Time Warping (code): clean code using DTW and kNN for Human Activity Recognition. It clearly shows the essential idea of DTW, and the code is well factored. One funny thing: not all the imports in this code are valid; you have to import some things manually before running it.
- Everything you know about Dynamic Time Warping is Wrong (link): gives some highlights on using and researching DTW (from about 10 years ago 😐). The wording of this paper is very sharp. 3 claims: 1) fixed length doesn't hurt; 2) a narrow band doesn't hurt; 3) speeding up DTW with tight lower bounds is pointless. (A minimal DTW sketch follows.)
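A minimal sketch of the DP recurrence behind DTW (claim 2 above amounts to restricting the inner loop to a narrow band, e.g., Sakoe-Chiba):

```python
import numpy as np

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Classic O(len(a) * len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = dist(a[i - 1], b[j - 1]) + min(D[i - 1, j],      # insertion
                                                     D[i, j - 1],      # deletion
                                                     D[i - 1, j - 1])  # match
    return D[n, m]

print(dtw([1, 2, 3, 4], [1, 1, 2, 3, 4]))  # 0.0: warping absorbs the repeated 1
```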
- MC and MCMC from Probabilistic Graphical Models Eric Xing (CMU): Lecture 16-18.
- NB: a great review of sampling-based inference. MC: naive sampling, rejection sampling, importance sampling. MCMC: Metropolis-Hastings (sketched below), Gibbs, collapsed (Rao-Blackwellised) Gibbs, slice sampling, Reversible Jump MCMC (RJMCMC). RJMCMC is really non-trivial, and I didn't fully understand it: it is an MCMC that jumps among model spaces, designed without detailed balance, yet stationary.
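A minimal sketch of random-walk Metropolis, the simplest of the MCMC methods listed (a symmetric Gaussian proposal, so the Hastings correction cancels and the accept ratio is just p(x')/p(x)):

```python
import numpy as np

def metropolis(log_p, x0, n_samples=10000, step=0.5):
    x, samples = x0, []
    for _ in range(n_samples):
        x_new = x + step * np.random.randn()             # symmetric proposal
        if np.log(np.random.rand()) < log_p(x_new) - log_p(x):
            x = x_new                                    # accept; else keep x
        samples.append(x)                                # rejected steps also count
    return np.array(samples)

# Target: an unnormalized standard normal; the normalizer is never needed.
samples = metropolis(lambda x: -0.5 * x ** 2, x0=0.0)
print(samples.mean(), samples.std())  # ~0, ~1
```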
- Probabilistic Programming & Bayesian Methods for Hackers (link) (code)
- Probabilistic Graphical Models 3: Learning (Coursera)
- Forecasting at Scale (Prophet)
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (arXiv): simple math but full of brilliant ideas and tricks. (code: a good demonstration of using "global" moments and `ewa`; see the sketch below)
- Layer Normalization (arXiv): an even simpler principle; good for RNNs, but worse than BN for CNNs.
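A minimal NumPy sketch of the two BN modes the note refers to: batch moments during training, with an exponentially weighted average of them (the "global" moments) used at inference. Conventions here are mine, not the linked code's:

```python
import numpy as np

class BatchNorm1d:
    def __init__(self, dim, momentum=0.9, eps=1e-5):
        self.gamma, self.beta = np.ones(dim), np.zeros(dim)
        self.running_mean, self.running_var = np.zeros(dim), np.ones(dim)
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, training=True):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # track "global" moments with an exponential moving average
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mean
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        else:  # inference: use the accumulated global moments
            mean, var = self.running_mean, self.running_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta

bn = BatchNorm1d(4)
for _ in range(100):
    bn(np.random.randn(32, 4) * 3 + 5)               # training updates running stats
print(bn(np.full((1, 4), 5.0), training=False))      # ~0 at the data mean
```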
- Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences (NIPS) (TensorFlow implement, Keras implement, both good and clear.)
- Using Fast Weights to Attend to the Recent Past (arXiv) (TensorFlow implement): very simple math, easy enough to implement, but it seems to assume a lot of physiology background. This paper is another attempt to beat LSTM. Fast weights (`FW`), built on IRNN, work well on the tasks mentioned. The `FW` can be regarded as something "memorised" during the step update. I find Hinton's papers usually recondite (maybe Canadian English?).
- Neural Networks for Machine Learning by Geoffrey Hinton (Coursera): finally finished. A good review of neural network approaches. Absolutely not a first course: it can inspire you a lot if you already know the material, but if something mentioned in the course is new to you, it can be very hard to fully understand without other materials.
- MLaPP: Chapter 27.7 Restricted Boltzman machines (RBMs)
Resuming after the Spring Festival 😐
- A Critical Review of Recurrent Neural Networks for Sequence Learning (arXiv)
- NB: no new insights, but good for refreshing ideas; it covers vanilla RNN, LSTM, BRNN and a little bit of NTM, and introduces some applications, with emphasis on NLP.
- Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks (ICML)
- Bootstrap Methods for Time Series (link)
- NB: though the article presents itself as a simple intro, it seems too theoretical for me. It provides a good review of bootstrap methods for time series; remember: 1) generating series from AR-like models; 2) block bootstrap (sketched below); 3) Markov chain bootstrap; 4) frequency domain via DFT; 5) other mixtures.
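A minimal sketch of item 2, the (moving-)block bootstrap: resample overlapping blocks with replacement so short-range dependence survives inside each block (the block length is a tuning choice):

```python
import numpy as np

def block_bootstrap(series, block_len=20, n_boot=1):
    n = len(series)
    starts = np.arange(n - block_len + 1)                # overlapping block starts
    out = []
    for _ in range(n_boot):
        picks = np.random.choice(starts, size=int(np.ceil(n / block_len)))
        out.append(np.concatenate([series[s:s + block_len] for s in picks])[:n])
    return np.array(out)

x = np.cumsum(np.random.randn(200))                      # a dependent (random-walk) series
print(block_bootstrap(x, block_len=25, n_boot=3).shape)  # (3, 200)
```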
- Hidden Markov Model
- Conditional Random Field
- Monte Carlo by mathematicalmonk (YouTube): covering importance sampling, Smirnov transform and rejection sampling.
- MCMC by mathematicalmonk (YouTube): covering ergodic theorem and Metropolis, very gentle and intuitive.
- Probabilistic Programming & Bayesian Methods for Hackers (code)
- NB: Great book, about practical Bayesian modeling and PyMC3.
- Variational Inference, from Probabilistic Graphical Models by Eric Xing (CMU): Lectures 13-15.
- NB: a really good (somewhat advanced) introduction to VI: loopy belief propagation, mean field approximation, and general variational principles (solving problems in an optimization fashion with the dual form of a function). The principles part is really abstract, but at least I got the idea. A brief introduction to LDA is also presented in Lecture 15.
- Bayesian optimization by Nando de Freitas (YouTube)
- NB: Great intro to Bayesian opt, in 10 minutes you get the whole picture, and the rest tells some details.
- MyWeekly
- Online Kernel Ridge
- Online regression with kernel (link)
- NB: indeed a good review of this family of approaches, which really matches my own ideas (how have they "copied" my idea T_T). However, this paper puts a strong emphasis on signal processing, which does not suit a machine learning mindset.
- Local online kernel ridge regression for forecasting of urban travel times (ScienceDirect)
- Gaussian Processes: A Quick Introduction (arXiv)
- CS229 Lecture note of Gaussian Process (Stanford)
- Bayesian Linear Regression (YouTube): a very good derivation, though it uses uncommon notation.
- Recursive Least Square (RLS) (OTexts)
- Time Series Blogs by QuantStart
- NB: these blogs are quite good, covering lots of concepts in time series analysis. They focus on the representation level rather than the learning level, though, so there is little content about parameter estimation. Anyway, a good introduction to ARIMA, GARCH, the Kalman Filter and HMMs for time series.
- Understanding the Basis of the Kalman Filter Via a Simple and Intuitive Derivation (IEEE): Lecture note. It's quite intuitive to understand the "basis" of Kalman Filter, as titled.
- MLaPP: Chapter 18 State Space Model
- Probabilistic Graphical Models 2: Inference (Coursera)
- Probabilistic Graphical Models: Principles and Techniques: Chapter 9, 10, 11, 13 (selected sections)
- Spectral Clustering
- 统计学习方法 (Statistical Learning Methods), Chapter 9: The EM Algorithm and Its Extensions
- MyWeekly
- Optimization on Neural Networks
- [REVISIT] Convolution for Neural Networks
- Probabilistic Graphical Models 1: Representation (Coursera)
- Probabilistic Graphical Models: Principles and Techniques: Chapter 1, 2, 3, 5, 6
- NB: I read the Chinese version (概率图模型:原理与技术), which is quite good if you are taking the course (and are Chinese, of course); if not, some parts of the translation will be confusing. Anyway, great thanks to the efforts of Prof. Wang and Prof. Han.
- Implement the Gaussian Mixture Models (code) (notebooks)
- Visual Information Theory (blog)
- Towards my research
- MyWeekly
- Topics on RNN
- [REVISIT] Understanding LSTM Networks (blog) (code)
- NB: The code from Udacity Deep Learning course is exactly the same as described in the blog.
- LSTM: A Search Space Odyssey (arXiv)
- An Empirical Exploration of Recurrent Network Architectures (JMLR)
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (arXiv)
- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks (arXiv)
- Sequence to Sequence Learning with Neural Networks (arXiv)
- A Clockwork RNN (arXiv)
- Recurrent Neural Networks for Multivariate Time Series with Missing Values (arXiv)
- Deep Learning Lecture 13: Alex Graves on Hallucination with RNNs (YouTube)
- RNNs in TensorFlow, a Practical Guide and Undocumented Features (blog)
- Recurrent Neural Network Regularization (arXiv)
- Multi-task Sequence to Sequence Learning (arXiv)
- [REVISIT] Understanding LSTM Networks (blog) (code)
- Topics on Variational Autoencoders
- WaveNet: A Generative Model for Raw Audio (arXiv) (blog) (code)
- NB: the code repo is a very simple version, good for reading.
- t-SNE
- Neural Style and Deep Dream
- Deep Learning for ChatBots (blog: Part1 Part2)
- Courses
- Towards my research
- A Novel Empirical Mode Decomposition With Support Vector Regression for Wind Speed Forecasting (IEEE)
- Spatial Transformer Networks (arXiv) (blog) (code)
- Attention and Memory in Deep Learning and NLP (blog)
- Deep Residual Learning for Image Recognition (arXiv)
- Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning (arXiv)
- Bilinear CNN Models for Fine-grained Visual Recognition (arXiv)