
68th place solution in Kaggle Humpback Whale Identification.

Humpback Whale Identification

Some of the best solutions

| rank | solution | github | author | keyword |
|------|----------|--------|--------|---------|
| 1st | 1st Place Solution | Github code | earhian | classification |
| 3rd | 3rd Place Solution | Github | pudae | ArcFace |
| 4th | 4th Place Solution | Github code | David | SIFT+Siamese |
| 7th | 7th Place Solution | Github code | old-ufo | classification |
| 9th | 9th Place Solution | Github code | Ivan Sosin | GapNet |
| 25th | 25th Place Solution | Github code | Bartek | CosFace+ProtoNets |
| 31st | 31st Place Solution | Github code | Khoi Nguyen | RGB |
| 57th | 57th Place Solution | Github code | Miguel Pinto | SoftTripletLoss |

My solution

Heavily based on the kernel "Whale Recognition Model with score 0.78563".


  • Framework: Keras(backend: tensorflow)
  • Model: Siamese(CNN+Metric Learning)
  • Augmentation: slight (rotation, shear, height_zoom, width_zoom, height_shift, width_shift)
  • Preprocess: rotate some special images, convert to grayscale, get bounding boxes, affine transformation
  • Optimizer: Adam
  • Learning rate: starts at 64e-5 and is reduced by a factor of 4 for each successive group of epochs
  • Image size: 512*512
  • Epochs: 400 or more
  • Batch size: 32
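
The learning-rate rule in the list above can be read as a step schedule. Here is a minimal sketch, assuming the rate is divided by 4 for each successive group of epochs. The name `lr_for_group` is hypothetical, and the number of epochs per group is not stated in this README.

```python
def lr_for_group(group_index, base_lr=64e-5, factor=4.0):
    """Learning rate for a 0-based epoch group, assuming a divide-by-4 step schedule."""
    return base_lr / (factor ** group_index)


# Rates for the first four groups: roughly 64e-5, 16e-5, 4e-5 and 1e-5.
schedule = [lr_for_group(g) for g in range(4)]
```

In Keras such a value would typically be written to the optimizer before each group of epochs is trained.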


  • Threshold: 0.99 and 0.94 with bootstrapping
  • TTA number: 4
  • TTA augmentation: random slight (rotation, shear, height_zoom, width_zoom, height_shift, width_shift)
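
The sketch below shows one way the TTA passes and the threshold could fit together. It assumes that the threshold decides where `new_whale` is inserted in the ranking, as in the kernel this solution is based on. The helper names and the whale ids are hypothetical.

```python
def average_scores(score_runs):
    """Element-wise mean of several score lists, one list per TTA pass."""
    return [sum(vals) / len(vals) for vals in zip(*score_runs)]


def rank_with_new_whale(labels, scores, threshold, top_k=5):
    """Rank labels by score and insert 'new_whale' before the first score below the threshold."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = []
    for i in order:
        if scores[i] < threshold and "new_whale" not in ranked:
            ranked.append("new_whale")
        if labels[i] not in ranked:
            ranked.append(labels[i])
    if "new_whale" not in ranked:
        ranked.append("new_whale")
    return ranked[:top_k]


labels = ["w_a", "w_b", "w_c"]                  # made-up whale ids
runs = [[0.99, 0.95, 0.40], [1.00, 0.97, 0.60]]  # two made-up TTA passes
prediction = rank_with_new_whale(labels, average_scores(runs), threshold=0.99)
```

A lower threshold pushes `new_whale` further down the ranking, which is why the value has to be tuned per model.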


  • Training takes more than 80 hours on a GTX 1080 Ti, without using a pretrained state-of-the-art model
  • Public LB: 0.92248
  • Private LB: 0.92761

Model result ensemble:

  • An ensemble of ensembles did not work for us, but a single level of ensembling is very effective
  • Fusing the best single models works well, and the more the models differ, the better the fusion. Fusing models trained for a similar number of epochs works less well than fusing models that differ more
  • The ensemble of the 4 TTA results plus the original result is effective

Ensemble code

# coding:utf-8
# filename:ensemble.py
# function: model prediction fusion script; fuses the 4 best results

import csv
sub_files = [
    # paths of the submission CSV files to fuse, one per entry in sub_weight
]
# Weights of the individual subs
sub_weight = [
            0.883 ** 2,
            0.884 ** 2,
            0.901 ** 2,
            0.905 ** 2,
            0.908 ** 2,
            0.912 ** 2]
Hlabel = 'Image'
Htarget = 'Id'
npt = 5 # number of places in target
place_weights = {}
for i in range(npt):
    place_weights[i] = (1 / (i + 1))
lg = len(sub_files)
sub = [None] * lg
for i, file in enumerate(sub_files):
    ## input files ##
    print("Reading {}: w={} - {}".format(i, sub_weight[i], file))
    with open(file, "r") as f:  # read the CSV rows into dictionaries
        sub[i] = sorted(csv.DictReader(f), key=lambda d: str(d[Hlabel]))
## output file ##
with open("./submissions/submission_ensemble_zh.csv", "w", newline='') as out:
    writer = csv.writer(out)
    writer.writerow([Hlabel, Htarget])
    for p, row in enumerate(sub[0]):
        # accumulate a weighted vote for every label proposed for this image
        target_weight = {}
        for s in range(lg):
            row1 = sub[s][p]
            for ind, trgt in enumerate(row1[Htarget].split(' ')):
                target_weight[trgt] = target_weight.get(trgt, 0) + (place_weights[ind] * sub_weight[s])
        tops_trgt = sorted(target_weight, key=target_weight.get, reverse=True)[:npt]
        writer.writerow([row[Hlabel], " ".join(tops_trgt)])
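
To make the voting rule in the script easier to follow, here is a small in-memory sketch of the same idea for a single image: each label receives the submission weight divided by its 1-based rank, and the labels with the highest totals win. The function name `fuse_predictions`, the whale ids and the two weights are made up for illustration.

```python
def fuse_predictions(predictions, weights, top_k=5):
    """Fuse space-separated ranked label strings into one ranked list of labels."""
    totals = {}
    for labels, weight in zip(predictions, weights):
        for rank, label in enumerate(labels.split(" ")):
            # A label at rank 0 gets the full weight, rank 1 gets half, and so on.
            totals[label] = totals.get(label, 0.0) + weight / (rank + 1)
    return sorted(totals, key=totals.get, reverse=True)[:top_k]


fused = fuse_predictions(
    ["w_a w_b new_whale", "w_b w_a w_c"],  # predictions of two hypothetical models
    [0.9 ** 2, 0.8 ** 2],                  # squared scores used as weights
)
```

Here `w_a` wins because the stronger model ranks it first, even though the weaker model prefers `w_b`.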

My conclusion


  • A large image size helps a lot
  • Ensembling is useful, but choosing the right ensemble strategy matters even more
  • TTA may help, and the ensemble of the TTA results definitely helps
  • Putting all images on an SSD makes training faster than reading them from an HDD
  • Training for more epochs helps a lot
  • Bootstrapping helps, but it needs more training time

What didn't work

  • Pure classification doesn't work, but with some extra work classification may be very useful, as in the 1st place solution
  • n-fold CV: my partners tried 5-fold CV, but it didn't work. Our approach may have had a problem, but I did not see n-fold CV used as a solution in the Kaggle discussions either


  • Grayscale images are not necessarily more effective than RGB



Hardware requirements
  • GTX 1060 (a GTX 1080 Ti is better)
  • 32GB Memory
  • SSD

Software requirements

  • Ubuntu 18.04
  • Anaconda3/Python3
  • Keras (backend: tensorflow)

Steps for usage

  • 1. Clone the repository
git clone https://github.com/HarleysZhang/kaggle_humpback_whale_identification.git
cd kaggle_humpback_whale_identification
  • 2. Install the requirements
pip3 install -r requirements.txt
  • 3. Download the data and copy it to the data folder
kaggle competitions download -c humpback-whale-identification
cp -r train ./data/
cp -r test ./data/
cp train.csv ./data/
cp sample_submission.csv ./data/
  • 4. Train your model without bootstrapping
python3 main_all.py

or with bootstrapping

python3 main_with_bootstrapping.py
  • 5. Generate the submission file
python3 test.py
# python3 test_tta.py    # with TTA

Some Code Interpretation

Build a transformation matrix with the specified characteristics.

import numpy as np

def build_transform(rotation, shear, height_zoom, width_zoom, height_shift, width_shift):
    """Build a transformation matrix with the specified characteristics."""
    # angles are given in degrees
    rotation = np.deg2rad(rotation)
    shear = np.deg2rad(shear)
    rotation_matrix = np.array(
        [[np.cos(rotation), np.sin(rotation), 0], [-np.sin(rotation), np.cos(rotation), 0], [0, 0, 1]])
    shear_matrix = np.array(
        [[1, np.sin(shear), 0], [0, np.cos(shear), 0], [0, 0, 1]])
    zoom_matrix = np.array(
        [[1.0 / height_zoom, 0, 0], [0, 1.0 / width_zoom, 0], [0, 0, 1]])
    # only the negated shift is used; an earlier positive shift_matrix was overwritten before use
    shift_matrix = np.array(
        [[1, 0, -height_shift], [0, 1, -width_shift], [0, 0, 1]])
    return np.dot(np.dot(rotation_matrix, shear_matrix), np.dot(zoom_matrix, shift_matrix))
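
To see what the composed matrix does to a coordinate, the dependency-free sketch below rebuilds the same product with nested lists and applies it to a point. The names `build_transform_plain`, `matmul` and `apply_transform` are hypothetical, and treating the first coordinate as the one along the image height is an assumption based on the parameter order.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def build_transform_plain(rotation, shear, height_zoom, width_zoom, height_shift, width_shift):
    """Same composition as build_transform: (rotation * shear) * (zoom * shift)."""
    r, s = math.radians(rotation), math.radians(shear)
    rotation_m = [[math.cos(r), math.sin(r), 0], [-math.sin(r), math.cos(r), 0], [0, 0, 1]]
    shear_m = [[1, math.sin(s), 0], [0, math.cos(s), 0], [0, 0, 1]]
    zoom_m = [[1.0 / height_zoom, 0, 0], [0, 1.0 / width_zoom, 0], [0, 0, 1]]
    shift_m = [[1, 0, -height_shift], [0, 1, -width_shift], [0, 0, 1]]
    return matmul(matmul(rotation_m, shear_m), matmul(zoom_m, shift_m))


def apply_transform(matrix, y, x):
    """Map the homogeneous point (y, x, 1) through the matrix and drop the last coordinate."""
    return tuple(matrix[i][0] * y + matrix[i][1] * x + matrix[i][2] for i in range(2))


identity = build_transform_plain(0, 0, 1, 1, 0, 0)
zoomed = apply_transform(build_transform_plain(0, 0, 2, 2, 0, 0), 10, 20)
shifted = apply_transform(build_transform_plain(0, 0, 1, 1, 3, -2), 0, 0)
```

With neutral parameters the result is the identity, a zoom of 2 halves the coordinates, and a shift is applied with its sign flipped.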

Compute the score matrix by scoring every picture in the training set against every other picture (O(n^2)), using multiple threads.

def compute_score(verbose=1):
    """Compute the score matrix by scoring every picture from the training set against every other picture, O(n^2)."""
    features = branch_model.predict_generator(
        FeatureGen(train, batch_size=64, verbose=verbose),
        max_queue_size=12, workers=6, verbose=0)
    num_threads = 6
    batch = features.shape[0] // (num_threads - 1)
    if features.shape[0] % batch <= 3:
        num_threads = 5
        if features.shape[0] % batch != 0:
            batch += 1
    all_score = []
    for start in range(0, features.shape[0], batch):
        end = min(features.shape[0], start + batch)
        temp_features = features[start:end, :]
        temp_score = head_model.predict_generator(
            ScoreGen(temp_features, batch_size=2048, verbose=verbose),
            max_queue_size=12, workers=6, verbose=0)
        temp_score = score_reshape(temp_score, temp_features)
        all_score.append(temp_score)  # keep the block so it can be copied into the full matrix
    score = np.zeros((features.shape[0], features.shape[0]), dtype=K.floatx())
    for i, start in enumerate(range(0, features.shape[0], batch)):
        end = min(features.shape[0], start + batch)
        score[start:end, start:end] = all_score[i]
    return features, score
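
`compute_score` and `my_lapjv` split the n rows into contiguous chunks with the same arithmetic. The sketch below isolates that arithmetic so the chunk boundaries can be inspected. `chunk_bounds` is a hypothetical helper name, not a function from the repository, and it assumes n is at least `num_threads - 1`.

```python
def chunk_bounds(n, num_threads=6):
    """Return the (start, end) row ranges used to split n rows into chunks."""
    batch = n // (num_threads - 1)
    if n % batch <= 3 and n % batch != 0:
        # A remainder of 1 to 3 rows is absorbed by making every chunk one row longer.
        batch += 1
    return [(start, min(n, start + batch)) for start in range(0, n, batch)]


even = chunk_bounds(100)      # five chunks of 20 rows
leftover = chunk_bounds(104)  # five chunks of 20 rows plus one chunk of 4 rows
```

Since only `score[start:end, start:end]` is filled in, pairs that fall in different chunks keep a score of zero, so the matrix is block-diagonal rather than a full n by n comparison.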

Solve the linear assignment problem with multiple threads

def my_lapjv(score):
    num_threads = 6
    batch = score.shape[0] // (num_threads - 1)
    if score.shape[0] % batch <= 3:
        num_threads = 5
        if score.shape[0] % batch != 0:
            batch += 1
    # print(batch)
    tmp = num_threads * [None]
    threads = []
    thread_input = num_threads * [None]
    thread_idx = 0
    for start in range(0, score.shape[0], batch):
        end = min(score.shape[0], start + batch)
        # print('%d %d' % (start, end))
        thread_input[thread_idx] = score[start:end, start:end]
        thread_idx += 1

    def worker(data_idx):
        x, _, _ = lapjv(thread_input[data_idx])
        tmp[data_idx] = x + data_idx * batch

    # print("Start worker threads")
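    for i in range(num_threads):
        t = threading.Thread(target=worker, args=(i,), daemon=True)
        t.start()
        threads.append(t)
    # wait for every block to be solved before merging the results
    for t in threads:
        if t is not None:
            t.join()
    x = np.concatenate(tmp)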
    # print("LAP completed")
    return x
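
The block-wise idea can be tried without `lapjv` installed. The sketch below swaps in a brute-force solver that is only usable for tiny blocks, and assumes the solver returns the assigned column for each row. `solve_block` and `blockwise_assignment` are hypothetical names, and the cost matrix is made up.

```python
import threading
from itertools import permutations


def solve_block(cost):
    """Brute-force stand-in for lapjv: the column assigned to each row at minimal total cost."""
    n = len(cost)
    best = min(permutations(range(n)), key=lambda p: sum(cost[r][p[r]] for r in range(n)))
    return list(best)


def blockwise_assignment(cost, batch):
    """Solve each diagonal block in its own thread and shift the columns back to global indices."""
    n = len(cost)
    starts = list(range(0, n, batch))
    results = [None] * len(starts)

    def worker(idx):
        start = starts[idx]
        end = min(n, start + batch)
        block = [row[start:end] for row in cost[start:end]]
        results[idx] = [start + col for col in solve_block(block)]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(starts))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [col for part in results for col in part]


cost = [
    [5, 1, 9, 9],
    [1, 5, 9, 9],
    [9, 9, 1, 5],
    [9, 9, 5, 1],
]
assignment = blockwise_assignment(cost, batch=2)
```

Solving the blocks independently is much cheaper than solving the full matrix, at the price of never matching rows and columns that fall in different blocks.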


