

Pytorch Zoo

A collection of useful modules and utilities (especially helpful for Kaggle) not available in PyTorch.

pytorch_zoo can be installed from pip

pip install pytorch_zoo



Sending yourself notifications when your models finish training

IFTTT makes this easy. Follow https://medium.com/datadriveninvestor/monitor-progress-of-your-training-remotely-f9404d71b720 to set up an IFTTT webhook and get a secret key.

Once you have a key, you can send yourself a notification with:

from pytorch_zoo.utils import notify

message = f'Validation loss: {val_loss}'  # val_loss: your final validation loss
obj = {'value1': 'Training Finished', 'value2': message}

notify(obj, key='YOUR_SECRET_KEY_HERE')

Viewing training progress with TensorBoard in a Kaggle kernel

Make sure TensorBoard is installed in the kernel, then run the following in a code cell near the beginning of your kernel:

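!mkdir logs
# Download and unpack ngrok, which exposes the local TensorBoard server publicly
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip -o ngrok-stable-linux-amd64.zip
LOG_DIR = './logs'
# Start TensorBoard in the background on port 6006
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)
# Open an HTTP tunnel to port 6006
get_ipython().system_raw('./ngrok http 6006 &')

# Print the public URL reported by ngrok's local API
!curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

# Capture the same URL in a variable
temp = !curl -s http://localhost:4040/api/tunnels | python3 -c "import sys,json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

from pytorch_zoo.utils import notify

# Send the URL to your phone
obj = {'value1': 'Tensorboard URL', 'value2': temp[0]}
notify(obj, key='YOUR_SECRET_KEY_HERE')

# Clean up the downloaded ngrok files
!rm ngrok
!rm ngrok-stable-linux-amd64.zip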

This starts TensorBoard, sets up an HTTP tunnel with ngrok, and sends you a notification with a URL where you can access TensorBoard.


A dynamic batch length data sampler. To be used with trim_tensors.

Implementation adapted from https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/94779 and https://github.com/pytorch/pytorch/blob/master/torch/utils/data/sampler.py

from torch.utils import data

# inputs: a padded tensor of token ids, shape (num_samples, seq_len)
train_dataset = data.TensorDataset(inputs)
sampler = data.RandomSampler(train_dataset)
sampler = DynamicSampler(sampler, batch_size=32, drop_last=False)
train_loader = data.DataLoader(train_dataset, batch_sampler=sampler)

for epoch in range(10):
    for batch in train_loader:
        batch = trim_tensors(batch)

sampler (torch.utils.data.Sampler): Base sampler.
batch_size (int): Size of minibatch.
drop_last (bool): If True, the sampler will drop the last batch if its size would be less than batch_size.
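The sampler's internals are not described above. One common way to get batches whose padded length varies from batch to batch is length bucketing: group samples that carry a similar amount of padding, so that trimming each batch afterwards removes most of it. The following is a minimal sketch of that idea, not pytorch_zoo's implementation; the class name `BucketBatchSampler`, the `bucket_width` and `pad_id` parameters, and the assumption that padding uses token id 0 are all hypothetical.

```python
import torch
from torch.utils.data import DataLoader, Sampler, SequentialSampler, TensorDataset


class BucketBatchSampler(Sampler):
    """Hypothetical length-bucketing batch sampler (a sketch, not pytorch_zoo's code).

    Groups indices whose first tensor has a similar number of padding tokens.
    """

    def __init__(self, sampler, batch_size, drop_last=False, bucket_width=64, pad_id=0):
        self.sampler = sampler            # base sampler; must expose .data_source
        self.batch_size = batch_size
        self.drop_last = drop_last
        self.bucket_width = bucket_width  # assumed: padding counts covered by one bucket
        self.pad_id = pad_id              # assumed padding token id

    def __iter__(self):
        buckets = {}
        for idx in self.sampler:
            tokens = self.sampler.data_source[idx][0]
            key = int((tokens == self.pad_id).sum()) // self.bucket_width
            bucket = buckets.setdefault(key, [])
            bucket.append(idx)
            if len(bucket) == self.batch_size:
                yield bucket
                buckets[key] = []  # start a fresh bucket; the yielded list is left untouched
        # Indices left in partially filled buckets are batched together at the end.
        leftover = [idx for bucket in buckets.values() for idx in bucket]
        for start in range(0, len(leftover), self.batch_size):
            batch = leftover[start:start + self.batch_size]
            if len(batch) == self.batch_size or not self.drop_last:
                yield batch

    def __len__(self):
        n = len(self.sampler)
        if self.drop_last:
            return n // self.batch_size
        return (n + self.batch_size - 1) // self.batch_size


# Usage: rows 0-4 are short (2 real tokens), rows 5-9 are long (8 real tokens).
inputs = torch.zeros(10, 8, dtype=torch.long)
inputs[:5, :2] = 1
inputs[5:] = 1
dataset = TensorDataset(inputs)
batch_sampler = BucketBatchSampler(SequentialSampler(dataset), batch_size=5, bucket_width=4)
loader = DataLoader(dataset, batch_sampler=batch_sampler)
```

Short and long rows end up in separate batches, so a trimming step can cut the short batch down to 2 columns.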

Trim padding off of a batch of tensors to the smallest possible length. To be used with DynamicSampler.

Implementation adapted from https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/94779

from torch.utils import data

# inputs: a padded tensor of token ids, shape (num_samples, seq_len)
train_dataset = data.TensorDataset(inputs)
sampler = data.RandomSampler(train_dataset)
sampler = DynamicSampler(sampler, batch_size=32, drop_last=False)
train_loader = data.DataLoader(train_dataset, batch_sampler=sampler)

for epoch in range(10):
    for batch in train_loader:
        batch = trim_tensors(batch)

tensors ([torch.tensor]): list of tensors to trim.

([torch.tensor]): list of trimmed tensors.
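A minimal sketch of the trimming step, assuming right-padded inputs with padding id 0 and that every tensor in the list shares the first tensor's sequence dimension (for example token ids plus an attention mask). `trim_padding` is a hypothetical name, not the library's code.

```python
import torch


def trim_padding(tensors, pad_id=0):
    """Cut the trailing padding columns that every row in the batch shares.

    Assumes tensors[0] holds right-padded token ids of shape (batch, seq_len).
    """
    lengths = (tensors[0] != pad_id).sum(dim=1)   # real tokens per row
    max_len = max(int(lengths.max()), 1)          # keep at least one column
    return [t[:, :max_len] for t in tensors]


ids = torch.tensor([[5, 6, 0, 0], [7, 0, 0, 0]])
mask = (ids != 0).long()
trimmed = trim_padding([ids, mask])
```

Because the cut is set by the longest sequence in the batch, batches that group sequences of similar length lose the most padding.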


The binary Lovasz Hinge loss for semantic segmentation.

Implementation adapted from https://github.com/bermanmaxim/LovaszSoftmax

loss = lovasz_hinge(logits, labels)

logits (torch.tensor): Logits at each pixel (between -\infty and +\infty).
labels (torch.tensor): Binary ground truth masks (0 or 1).
per_image (bool, optional): Compute the loss per image instead of per batch. Defaults to True.


  • Input:
    • logits: (batch, height, width)
    • labels: (batch, height, width)
  • Output: (batch)

(torch.tensor): The lovasz hinge loss
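The core of the binary Lovász hinge, following the algorithm in the referenced LovaszSoftmax repository: compute per-pixel hinge errors, sort them in decreasing order, and weight them by the gradient of the Lovász extension of the Jaccard loss. This is a minimal sketch, assuming one loss per image is returned as a `(batch,)` tensor; the function names are hypothetical and this is not pytorch_zoo's exact code.

```python
import torch
import torch.nn.functional as F


def lovasz_grad(gt_sorted):
    """Gradient of the Lovasz extension of the Jaccard loss w.r.t. sorted errors."""
    gts = gt_sorted.sum()
    intersection = gts - gt_sorted.cumsum(0)
    union = gts + (1.0 - gt_sorted).cumsum(0)
    jaccard = 1.0 - intersection / union
    # Turn cumulative Jaccard losses into per-position increments.
    return torch.cat([jaccard[:1], jaccard[1:] - jaccard[:-1]])


def lovasz_hinge_flat(logits, labels):
    """Binary Lovasz hinge for one image; logits and labels are flat (P,) tensors."""
    labels = labels.float()
    signs = 2.0 * labels - 1.0                 # {0, 1} -> {-1, +1}
    errors = 1.0 - logits * signs              # hinge errors
    errors_sorted, perm = torch.sort(errors, descending=True)
    grad = lovasz_grad(labels[perm])
    return torch.dot(F.relu(errors_sorted), grad)


def lovasz_hinge_per_image(logits, labels):
    """logits, labels: (batch, height, width) -> one loss per image, shape (batch,)."""
    return torch.stack([
        lovasz_hinge_flat(lg.reshape(-1), lb.reshape(-1))
        for lg, lb in zip(logits, labels)
    ])


labels = torch.tensor([[[1, 0], [0, 1]]])
perfect = 3.0 * (2 * labels.float() - 1)   # correct sign with margin > 1
```

Calling `.mean()` on the per-image losses gives a single scalar for backpropagation.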

The dice loss for semantic segmentation

Implementation adapted from https://www.kaggle.com/soulmachine/siim-deeplabv3

criterion = DiceLoss()
loss = criterion(logits, targets)


  • Input:
    • logits: (batch, *)
    • targets: (batch, *) same as logits
  • Output: (1)

(torch.tensor): The dice loss
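A minimal sketch of a soft Dice loss in the same spirit, assuming binary masks, a sigmoid on the logits, and a smoothing constant of 1 to avoid dividing by zero on empty masks. `SoftDiceLoss` and `smooth` are hypothetical names; this is not necessarily how the referenced kernel or pytorch_zoo computes it.

```python
import torch
import torch.nn as nn


class SoftDiceLoss(nn.Module):
    """Soft Dice loss for binary masks, computed per sample and averaged."""

    def __init__(self, smooth=1.0):
        super().__init__()
        self.smooth = smooth  # assumed smoothing term; avoids 0/0 on empty masks

    def forward(self, logits, targets):
        batch = logits.size(0)
        probs = torch.sigmoid(logits).reshape(batch, -1)   # (batch, N)
        targets = targets.reshape(batch, -1).float()
        intersection = (probs * targets).sum(dim=1)
        dice = (2.0 * intersection + self.smooth) / (
            probs.sum(dim=1) + targets.sum(dim=1) + self.smooth
        )
        return 1.0 - dice.mean()


criterion = SoftDiceLoss()
targets = torch.tensor([[1.0, 0.0, 1.0, 0.0]])
good_logits = 20.0 * (2 * targets - 1)   # confidently correct predictions
```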



The channel-wise SE (Squeeze and Excitation) block from the Squeeze-and-Excitation Networks paper.

Implementation adapted from https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/65939 and https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/66178

# in __init__()
self.SE = SqueezeAndExcitation(in_ch, r=16)

# in forward()
x = self.SE(x)

in_ch (int): The number of channels in the feature map of the input.
r (int): The reduction ratio of the intermediate channels. Default: 16.


  • Input: (batch, channels, height, width)
  • Output: (batch, channels, height, width) (same shape as input)
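A minimal sketch of the squeeze-and-excitation idea under the shapes listed above: global average pooling squeezes each channel to one number, a bottleneck MLP with reduction ratio `r` produces per-channel weights in (0, 1), and the input is rescaled channel by channel. `SEBlockSketch` is a hypothetical name, not pytorch_zoo's class.

```python
import torch
import torch.nn as nn


class SEBlockSketch(nn.Module):
    def __init__(self, in_ch, r=16):
        super().__init__()
        hidden = max(in_ch // r, 1)     # guard against in_ch < r
        self.excite = nn.Sequential(
            nn.Linear(in_ch, hidden),   # bottleneck with reduction ratio r
            nn.ReLU(inplace=True),
            nn.Linear(hidden, in_ch),
            nn.Sigmoid(),               # per-channel weights in (0, 1)
        )

    def forward(self, x):
        squeezed = x.mean(dim=(2, 3))          # squeeze: (batch, channels)
        weights = self.excite(squeezed)        # excitation: (batch, channels)
        return x * weights[:, :, None, None]   # rescale each channel


se = SEBlockSketch(in_ch=32, r=16)
x = torch.randn(2, 32, 8, 8)
y = se(x)
```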

The sSE (Channel Squeeze and Spatial Excitation) block from the Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks paper.

Implementation adapted from https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/66178

# in __init__()
self.sSE = ChannelSqueezeAndSpatialExcitation(in_ch)

# in forward()
x = self.sSE(x)

in_ch (int): The number of channels in the feature map of the input.


  • Input: (batch, channels, height, width)
  • Output: (batch, channels, height, width) (same shape as input)

The scSE (Concurrent Spatial and Channel Squeeze and Channel Excitation) block from the Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks paper.

Implementation adapted from https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/66178

# in __init__()
self.scSE = ConcurrentSpatialAndChannelSqueezeAndChannelExcitation(in_ch, r=16)

# in forward()
x = self.scSE(x)

in_ch (int): The number of channels in the feature map of the input.
r (int): The reduction ratio of the intermediate channels. Default: 16.


  • Input: (batch, channels, height, width)
  • Output: (batch, channels, height, width) (same shape as input)
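A compact sketch of the channel (cSE), spatial (sSE) and concurrent (scSE) blocks under the shapes listed above. It follows the paper's description of scSE, which adds the channel-recalibrated and spatially recalibrated maps; the class names are hypothetical and this is not pytorch_zoo's code.

```python
import torch
import torch.nn as nn


class ChannelSE(nn.Module):
    """cSE: squeeze space with global average pooling, excite channels."""

    def __init__(self, in_ch, r=16):
        super().__init__()
        hidden = max(in_ch // r, 1)  # guard against in_ch < r
        self.fc = nn.Sequential(
            nn.Linear(in_ch, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, in_ch), nn.Sigmoid(),
        )

    def forward(self, x):
        weights = self.fc(x.mean(dim=(2, 3)))  # (batch, channels)
        return x * weights[:, :, None, None]


class SpatialSE(nn.Module):
    """sSE: squeeze channels with a 1x1 convolution, excite each pixel."""

    def __init__(self, in_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 1, kernel_size=1)

    def forward(self, x):
        return x * torch.sigmoid(self.conv(x))  # (batch, 1, H, W) broadcasts over channels


class ConcurrentSE(nn.Module):
    """scSE: apply cSE and sSE in parallel and add the two outputs."""

    def __init__(self, in_ch, r=16):
        super().__init__()
        self.cse = ChannelSE(in_ch, r)
        self.sse = SpatialSE(in_ch)

    def forward(self, x):
        return self.cse(x) + self.sse(x)


block = ConcurrentSE(in_ch=32, r=16)
x = torch.randn(2, 32, 8, 8)
out = block(x)
```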

A Gaussian noise module.

# in __init__()
self.gaussian_noise = GaussianNoise(0.1)

# in forward()
if self.training:
    x = self.gaussian_noise(x)

stddev (float): The standard deviation of the normal distribution. Default: 0.1.


  • Input: (batch, *)
  • Output: (batch, *) (same shape as input)
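A minimal sketch of additive Gaussian noise, assuming zero-mean noise with standard deviation `stddev`. Unlike the usage above, this hypothetical `AdditiveGaussianNoise` checks `self.training` itself, so it is automatically disabled after `model.eval()`; that is a design choice of the sketch, not necessarily of pytorch_zoo's module.

```python
import torch
import torch.nn as nn


class AdditiveGaussianNoise(nn.Module):
    def __init__(self, stddev=0.1):
        super().__init__()
        self.stddev = stddev

    def forward(self, x):
        if self.training and self.stddev > 0:
            # Zero-mean noise with the same shape, dtype and device as x.
            return x + torch.randn_like(x) * self.stddev
        return x  # identity at evaluation time


torch.manual_seed(0)
noise = AdditiveGaussianNoise(0.1)
x = torch.zeros(4, 3)
```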


PyTorch's cyclical learning rates, but for momentum. Cycling momentum alongside a cyclical learning rate can lead to better results, as shown in A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay.

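optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = CyclicMomentum(optimizer)  # CyclicMomentum comes from pytorch_zoo, not torch.optim
data_loader = torch.utils.data.DataLoader(...)
for epoch in range(10):
    for batch in data_loader:
        ...  # train on the batch and advance the scheduler once per iteration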

optimizer (Optimizer): Wrapped optimizer.
base_momentum (float or list): Initial momentum, which is the lower boundary in the cycle for each parameter group. Default: 0.8
max_momentum (float or list): Upper boundary in the cycle for each parameter group. Default: 0.9
step_size (int): Number of training iterations per half cycle. The authors suggest setting step_size to 2-8 times the number of training iterations in an epoch. Default: 2000
mode (str): One of {triangular, triangular2, exp_range}. Default: 'triangular'
gamma (float): Constant in the 'exp_range' scaling function. Default: 1.0
scale_fn (function): Custom scaling policy defined by a single-argument lambda function. If specified, mode is ignored. Default: None
scale_mode (str): {'cycle', 'iterations'}. Defines whether scale_fn is evaluated on cycle number or cycle iterations (training iterations since start of cycle). Default: 'cycle'
last_batch_iteration (int): The index of the last batch. Default: -1
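A minimal sketch of the 'triangular' mode, assuming the standard cyclical-learning-rate formula applied to momentum and, as the parameter descriptions above state, starting each cycle at base_momentum and peaking at max_momentum after step_size iterations. `triangular_momentum` and `set_momentum` are hypothetical helpers, not pytorch_zoo's scheduler.

```python
import math

import torch


def triangular_momentum(iteration, base_momentum=0.8, max_momentum=0.9, step_size=2000):
    """Momentum at a given training iteration under a 'triangular' cycle."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)  # 1 at cycle edges, 0 at the peak
    return base_momentum + (max_momentum - base_momentum) * max(0.0, 1 - x)


def set_momentum(optimizer, momentum):
    """Write the momentum into every parameter group (SGD-style optimizers)."""
    for group in optimizer.param_groups:
        group["momentum"] = momentum


model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
for iteration in range(3):
    set_momentum(optimizer, triangular_momentum(iteration))
    # ... forward pass, loss.backward(), optimizer.step() ...
```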


Send a notification to your phone with IFTTT

Set up an IFTTT webhook by following https://medium.com/datadriveninvestor/monitor-progress-of-your-training-remotely-f9404d71b720

notify({'value1': 'Notification title', 'value2': 'Notification body'}, key='YOUR_PRIVATE_KEY_HERE')

obj (Object): Object to send to IFTTT
key (str): IFTTT webhook key
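IFTTT's Webhooks service accepts a POST to `https://maker.ifttt.com/trigger/{event}/with/key/{key}` with up to three values (`value1`, `value2`, `value3`) in a JSON body, which matches the `obj` dictionaries used above. A minimal standard-library sketch, assuming an event name you configured in your IFTTT applet; `build_ifttt_request`, `send_notification` and the event name are hypothetical, not pytorch_zoo's API.

```python
import json
import urllib.request


def build_ifttt_request(event, key, payload):
    """Build (but do not send) a POST request to an IFTTT Webhooks trigger."""
    url = f"https://maker.ifttt.com/trigger/{event}/with/key/{key}"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


def send_notification(event, key, payload, timeout=10):
    """Send the request and return the HTTP status code."""
    request = build_ifttt_request(event, key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


req = build_ifttt_request(
    "training_finished", "SECRET", {"value1": "Training Finished", "value2": "loss 0.1"}
)
```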

Set random seeds for Python, NumPy, and PyTorch to help make experiments reproducible.


seed (int): The random seed to set.
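A minimal sketch of what such a helper typically does; the name `seed_everything` is hypothetical. Seeding alone does not guarantee bit-identical results on GPU, which is why this sketch also switches cuDNN to deterministic algorithms.

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed):
    """Seed the Python, NumPy and PyTorch random number generators."""
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses started later
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU and all CUDA devices
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


def draw():
    return random.random(), float(np.random.rand()), torch.rand(1).item()
```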

Prints the amount of GPU memory currently allocated in GB.

gpu_usage(device, digits=4)

device (torch.device, optional): The device you want to check. Defaults to device.
digits (int, optional): The number of digits of precision. Defaults to 4.

Return the number of parameters in a pytorch model.


model (nn.Module): The model to analyze.

(int): The number of parameters in the model.
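A minimal sketch of counting parameters; `count_parameters` and its `trainable_only` flag are hypothetical. Each parameter tensor contributes `numel()` values, so an `nn.Linear(3, 2)` has 3 x 2 weights plus 2 biases.

```python
import torch.nn as nn


def count_parameters(model, trainable_only=False):
    """Total number of scalar values in the model's parameters."""
    return sum(
        p.numel() for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


model = nn.Linear(3, 2)
```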

Save a trained pytorch model on a particular cross-validation fold to disk.

Implementation adapted from https://github.com/floydhub/save-and-resume.

save_model(model, fold=0)

model (nn.Module): The model to save.
fold (int): The cross-validation fold the model was trained on.

Load a trained pytorch model saved to disk using save_model.

model = load_model(model, fold=0)

model (nn.Module): The model to load the saved weights into.
fold (int): Which saved model fold to load.

(nn.Module): The same model that was passed in, but with the pretrained weights loaded.
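A minimal sketch of per-fold checkpointing with `state_dict`, which is the usual PyTorch save-and-resume pattern. The helper names `save_fold` and `load_fold`, the `directory` argument, and the `model-fold{fold}.pt` file naming are hypothetical; pytorch_zoo's file layout may differ.

```python
import os

import torch
import torch.nn as nn


def fold_path(fold, directory="."):
    return os.path.join(directory, f"model-fold{fold}.pt")  # hypothetical naming scheme


def save_fold(model, fold, directory="."):
    """Save only the weights, so the file does not depend on the class definition."""
    torch.save(model.state_dict(), fold_path(fold, directory))


def load_fold(model, fold, directory=".", map_location="cpu"):
    """Load the weights for `fold` into an already-constructed model and return it."""
    state = torch.load(fold_path(fold, directory), map_location=map_location)
    model.load_state_dict(state)
    return model
```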

Save an object to disk.

save(tokenizer, 'tokenizer.pkl')

obj (Object): The object to save.
filename (String): The name of the file to save the object to.

Load an object saved to disk with save.

tokenizer = load('tokenizer.pkl')

path (String): The path to the saved object.

(Object): The loaded object.
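The `.pkl` extension in the examples suggests pickle. A minimal sketch under that assumption; `save_object` and `load_object` are hypothetical names. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle


def save_object(obj, filename):
    """Serialize any picklable object (e.g. a tokenizer) to disk."""
    with open(filename, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Load an object previously written by save_object."""
    with open(path, "rb") as f:
        return pickle.load(f)
```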

A masked softmax function for correctly implementing attention in PyTorch.

Implementation adapted from: https://github.com/allenai/allennlp/blob/master/allennlp/nn/util.py

out = masked_softmax(logits, mask, dim=-1)

vector (torch.tensor): The tensor to softmax.
mask (torch.tensor): The tensor to indicate which indices are to be masked and not included in the softmax operation.
dim (int, optional): The dimension to softmax over. Defaults to -1.
memory_efficient (bool, optional): Whether to use a less precise, but more memory efficient implementation of masked softmax. Defaults to False.
mask_fill_value (float, optional): The value to fill masked values with if memory_efficient is True. Defaults to -1e32.

(torch.tensor): The masked softmaxed output
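A minimal sketch of the fill-based strategy described by `memory_efficient` and `mask_fill_value` above: masked positions are replaced with a very large negative number before the softmax, so they receive (numerically) zero probability. Multiplying by the mask afterwards also turns fully masked rows into zeros instead of a uniform distribution. `masked_softmax_sketch` is a hypothetical name, not the AllenNLP or pytorch_zoo code.

```python
import torch


def masked_softmax_sketch(vector, mask, dim=-1, mask_fill_value=-1e32):
    """Softmax over `dim` that gives zero probability where mask == 0."""
    mask = mask.bool()
    while mask.dim() < vector.dim():
        mask = mask.unsqueeze(1)  # e.g. (batch, seq) -> (batch, 1, seq)
    filled = vector.masked_fill(~mask, mask_fill_value)
    # Multiplying by the mask zeroes fully masked rows, which would otherwise be uniform.
    return torch.softmax(filled, dim=dim) * mask


scores = torch.zeros(1, 3)
mask = torch.tensor([[1, 1, 0]])
probs = masked_softmax_sketch(scores, mask)
```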

A masked log-softmax function for correctly implementing attention in PyTorch.

Implementation adapted from: https://github.com/allenai/allennlp/blob/master/allennlp/nn/util.py

out = masked_log_softmax(logits, mask, dim=-1)

vector (torch.tensor): The tensor to log-softmax.
mask (torch.tensor): The tensor to indicate which indices are to be masked and not included in the log-softmax operation.
dim (int, optional): The dimension to log-softmax over. Defaults to -1.

(torch.tensor): The masked log-softmaxed output
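A minimal sketch of a masked log-softmax, assuming the same fill-based approach: masked positions get a very large negative score before `log_softmax`, so the unmasked positions form a proper distribution while masked ones end up with a huge negative (but finite) log-probability. Such outputs are typically paired with a loss that ignores masked targets. `masked_log_softmax_sketch` is a hypothetical name, not the AllenNLP or pytorch_zoo code.

```python
import torch


def masked_log_softmax_sketch(vector, mask, dim=-1, mask_fill_value=-1e32):
    """Log-softmax over `dim` where positions with mask == 0 are excluded."""
    mask = mask.bool()
    while mask.dim() < vector.dim():
        mask = mask.unsqueeze(1)  # e.g. (batch, seq) -> (batch, 1, seq)
    return torch.log_softmax(vector.masked_fill(~mask, mask_fill_value), dim=dim)


scores = torch.zeros(1, 3)
mask = torch.tensor([[1, 1, 0]])
log_probs = masked_log_softmax_sketch(scores, mask)
```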


This repository is still a work in progress, so if you find a bug, think there is something missing, or have any suggestions for new features or modules, feel free to open an issue or a pull request.


  • Bilal Khan - Initial work


This project is licensed under the MIT License - see the license file for details


This project contains code adapted from:

This README is based on: