SnowSelected Paper List (WIP)

Here is an academic paper list which contains the papers that SnowCloud.ai AI Research Lab considered to be very important, must read.

The reason of any paper to be selected in this list may be any of the following:

The paper had brought a paradigm shift in its own domain.
The paper contained vital parts which lead the appearance of papers in 1.
The paper may cause a paradigm shift within 5 years.

After each subdomain, we proposed several ideas that may inspire your work that might be qualified to appear in this list.

SnowSelected is all you need.

Natual Language Processing

Long and Short-Term Memory : An original idea for long sentences processing, inspired by human neural information processing mechanism.
Recurrent neural network based language model : An original idea of introducing RNN-like structure into the language model(LM).
GRU : A simple yet effective model for RNN-like structure. A large number of effective, high-precision models based on this architecture.
Connectionist temporal classification : Inspired by dynamic processing and dynamic time warping(DTW) when dealing with time-warped sequences like audio data.
Learning Longer Memory in RNN : Formulated Recursive Neural Network which can be applied on sequences recursively by only using a single compact model.
Learning phrase representations using RNN encoder-decoder for statistical machine translation : "Cho Model" for NMT.
Seq2Seq: "Sutskever Model" for NMT, an advanced version.
A Convolutional Neural Network for Modelling Sentences : Conv model for NLP.. More efficient on AI chips.
CNN on Sentence Classification : Conv model for NLP.
Very Deep Convolutional Networks for Text Classification : Conv model foor NLP.
Neural Machine Translation by Jointly Learning to Align and Translate : Attention mechanism first introduced in NLP field.
Soft And Hard Attention : Introduced the choice of soft and hard attention along features.
Global And Local Attention : Introduced attention along data.
Character-Aware Neural Language Models: Character level Conv model for NLP.
Attention is All You Need. : First transduction model relying entirely on self-attention to compute representations of its input and output without using RNNs or convolution, but global FC. Introduced positional encoding, 15% mask sampling and multihead (plus, minus, eltwise product) additive attention.
Universial Transformer
BERT: Bidirectional. Optimized for downstream tasks.
Attentive Neural Processes
Transformer-XL: Introduced relative positional encoding. State reuse resolved the problem may caused by excessive long sentence.
Focused Attention Networks
XLNet : Combined AR and AE models. Introduced DAG while learning AR parameters in sentence segments.
Unsupervised Question Answering by Cloze Translation
Generating Long Sequences with Sparse Transformers : Simplified structure of XLNet AR part. And BERT for CV.(ADDRESS OUR #3 in [what is NEXT])

So what is NEXT?

Better sampling to keep locally complete information of data.
Better relative positional encoding beyond "learned from position".
Simplified structure of XLNet AR part.

Computer Vision

Invertible 1x1

Architecture

AlexNet : The Beginning of Deep Learning for CV. Achieve new high rcoord in imagenet classification
VGG : Deeper (19 layers at most) Conv3x3 models.

Google Series
- GoogLeNet : Combinations of different kernel sizes.
- Inception v3
- Inception v4

First Attention Solution :
Convolutional Implementation of Sliding Windows : The core idea is CNN must recognize things invariant to position shifts.
1 x 1 Convolution : Introduced inplace inter-channel information exchange.
Triplet Loss : Combined differential learning and hard example mining.
Highway Networks : Must read before ResNet. Introduced branching schemes to accelerate deep learning training process.
Dilate Convolution: Introduced more effective method for enlarging receptive field.
ResNet : Branching scheme with standardized implementation (18/34/50/101), combinations of Conv3x3 and Conv1x1
DenseNet : Introduced distillation idea in Conv Neural Networks.
ResNeXt : A tradeoff between a sparse MobileNet and a dense ResNet.
MobileNets : Efficient on some mobile devices. Introduced Depthwise Separable Conv which is very sparse. Save space for model parameters to the extreme. No saving for infer-time feature map.
SqueezeNet : Introduced attention mechanism vertical to image.
Wide ResNet : Ablation study for changing channel sizes.
R-FCN : Introduced 3x3 pixel shuffler.
Deformable Convolutional Networks
Deep Neural Networks for Object Detection
Glow : Introduced Invertible 1x1 Convolutions to save parameters in Encoder/Decoder , relying on PixelShuffler.

Detection

R-CNN Original
- R-CNN
- Fast R-CNN
Kaiming He Series
- SPP Net : Introduced Pyramid like conventional SIFT.
- Improved R-CNN
  - Faster R-CNN
  - Mask R-CNN : Introduced segmentation after ROI-Align. Not efficient on AI chip.
  - TensorMask
Jia Deng Series
- Stacked Hourglass Networks : Recombination of ResNet. Achieved SOTA using hourglass104.
- CornerNet
YOLO Series
- YOLO : Deal Classification problem using coarse segmentation.
- YOLO9000 : Yolov2. Better, Stronger, Faster. Introduced Darknet architecture using less Conv1x1. Introduced label tricks.
- YOLOv3 : Introduced unsupervised clustering in RPN/NMS stage.
Segmentation is All You Need : Introduced Segmentation methodology for detection task.

Segmentation

Fully Convolutional Networks : Pixelwise classification as Segmentation.
UNet : Introduced spatial features extraction and restorations. Backbone of many works like image compression/imputations/segmentation. Ideas might be inspired by MPEG4 rev.11 i.e. H264.
Pixel Shuffler
DeepLab ,DeepLab v2 and DeepLab v3
FPN
STS and STS++

Optical Flow

FlowNet and FlowNet2.0 Introduced temporal features extraction. Backbone of many works based on video understanding. Ideas might be inspired by MPEG4 rev.11 i.e. H264.
SelFlow

Unsupervised Methods

Loss Function

ArcFace : A final human face recognition paper combines sphereface idea and different order loss margins (Order 0,1,2 are hyper parameters)

Pose Estimation

Convolution Pose Machines :
OpenPose + PAF : The core idea is to predict directed vectors in between keypoints to form a feature map (PAF) thus one can join KP to different instances in a bottom-up way.

So what is NEXT?

A much more robust way to deal with larger/smaller object.
Beyond the invariance to shift/mirroring, a much more decent way to implement invariance to rotation.
A "1-for-all" attention mechanism.

Optimization

GAN

Transfer Learning

Deep Representations

Audio Processing

Quantization

Tricks

Dropout
Batch Normalization : Deal with large scale dynamic range of features.
No More Pesky Learning Rates
Bag of Tricks for CV : Tricks about how to train a model with higher cost performance.
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
LARS : How to train a model with very large batch size.
SNIPER: Efficient Multi-Scale Training
Learning Data Augmentation Strategies for Object Detection

bytetok/SnowSelect