comic/evalutils

Docker container killed when given a huge image, only on the grand-challenge website.

hjoonjang opened this issue · 3 comments

  • evalutils version: 0.1.16 (the latest for now)
  • Python version: 3.6-slim (the error occurred when grand-challenge ran the docker image)
  • Operating System: Not sure (the evaluation ran on the grand-challenge website)

Description

I was trying to generate a docker image for the 'Segmentation' task, following the default procedure almost exactly. I expected my docker image to work normally no matter how big the given images (both ground truth and prediction) were.

What I Did

When I ran sudo ./test.sh on my machine, without any modification to test.sh, it worked normally and I could see the result below (I set up the evaluation with 1 CSV file and 8 uint8 binary TIFF images):

...
Successfully built 1f6e4b4b6cac
Successfully tagged test02:latest
test02-output
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Could not find an int in the string 'reference_test'.
Could not find an int in the string 'submission_test'.
{
    "csv_aggregates": {
        "mean_absolute_error": 0.09999999999999981
    },
    "segmentation_aggregates": {
        "DiceCoefficient": {
            "25pc": 0.9531821766435399,
            "50pc": 0.9660170086759565,
            "75pc": 0.969093750441764,
            "max": 0.9796634396591047,
            "mean": 0.9426168819059304,
            "min": 0.786203060473665,
            "std": 0.06388317667108237
        },
        "JaccardCoefficient": {
            "25pc": 0.9105583729300138,
            "50pc": 0.9342738968892539,
            "75pc": 0.9400412422060231,
            "max": 0.9601375445488285,
            "mean": 0.8968605986169458,
            "min": 0.6477220652578578,
            "std": 0.10215224255890767
        }
    }
}
test02-output

I thought it would then work on the grand-challenge platform, so I uploaded my docker image tarball to a challenge test page. The upload went through normally, and I could see the 'Ready: True' sign. When I submitted my sample submission zip file, everything was the same as before; the evaluation started normally.

However, the evaluation soon failed, and the only output message was 'Killed'.
[screenshot: result_killed_sample — the result page showing the failed evaluation with output 'Killed']

(Before this upload, I had uploaded another version of the docker image for testing, with exactly the same code as the later, failing version; the only difference was the size of the ground-truth images. The container with the smaller GT images worked fine both in docker on my machine and in grand-challenge's docker. The smaller version's images have 64 times fewer pixels than those of the bigger, failing version, which go up to shape=(61168, 79495).)

I guess this was caused by the memory limit settings passed to docker run. But I couldn't understand why it worked on my machine, which executed docker run with the same memory limit (4G, since I just ran the given test.sh) as the default memory limit of grand-challenge's docker run.
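
To make the numbers concrete, here is a quick back-of-the-envelope for one mask at the failing size (a sketch; the shape is the one quoted above):

nbytes = 61168 * 79495  # one uint8 mask, one byte per pixel
print(nbytes / 1024 ** 3)  # ~4.53 GiB

So a single mask at that size already exceeds the 4G limit on its own, and the evaluation loads at least two of them (ground truth and prediction) before any intermediate arrays are created.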

In the digital pathology field, I guess there can be much bigger images, such as those in the Camelyon dataset, which you may know better than I do. Therefore, I wonder how I can work around this problem. Is there any settled convention for challenge organizers who need to handle large images? And at what exact pixel count should I expect a failure? My first idea was to resize the input images, but I don't think that is the best way, because resizing itself requires quite a bit of extra memory.

The problem itself is simple, but I'm afraid my write-up is not very concise :( Anyway, I appreciate any advice.

Thank you,
Hyungjoon

Sorry that you've had problems with this; the memory is indeed limited. In another challenge we work around this by using memory-mapped files from numpy. Here is an example:

from tempfile import TemporaryFile

import numpy as np
from evalutils import ClassificationEvaluation
from evalutils.io import ImageLoader
from evalutils.validators import (
    NumberOfCasesValidator, UniquePathIndicesValidator, UniqueImagesValidator
)
from imageio.plugins._tifffile import TiffFile


class TiffLoader(ImageLoader):
    @staticmethod
    def load_image(fname):
        return TiffFile(fname)

    @staticmethod
    def hash_image(image):
        return hash(image.filehandle.path)


def dice(im1, im2):
    """
    Calculates the dice coefficient between 2 images, using memory maps for
    intermediate storage.
    """
    if im1.shape != im2.shape:
        raise RuntimeError(
            f"Images do not have the same shape, you submitted an image with "
            f"shape {im2.shape} where we expected {im1.shape}."
        )

    if im1.dtype != np.bool or im2.dtype != np.bool:
        raise RuntimeError(
            f"Images must have boolean type, you submitted images with type "
            f"{im2.dtype}."
        )

    with TemporaryFile() as f:
        intersection = np.memmap(f, dtype=np.bool, mode='w+', shape=im1.shape)
        np.logical_and(im1, im2, out=intersection)

        dice_coeff = 2.0 * intersection.sum() / (im1.sum() + im2.sum())

    return dice_coeff


class Acdclunghp_evaluation(ClassificationEvaluation):
    def __init__(self):
        super().__init__(
            file_loader=TiffLoader(),
            validators=(
                NumberOfCasesValidator(num_cases=1),
                UniquePathIndicesValidator(),
                UniqueImagesValidator(),
            ),
        )

    def score_case(self, *, idx, case):
        gt_path = case["path_ground_truth"]
        pred_path = case["path_prediction"]

        # Load the images for this case
        gt = self._file_loader.load_image(gt_path)
        pred = self._file_loader.load_image(pred_path)

        # Check that they're the right images
        assert self._file_loader.hash_image(gt) == case["hash_ground_truth"]
        assert self._file_loader.hash_image(pred) == case["hash_prediction"]

        dice_coeff = dice(
            gt.asarray(out='memmap'),
            pred.asarray(out='memmap')
        )

        return {
            'DiceCoefficient': dice_coeff,
        }


if __name__ == "__main__":
    Acdclunghp_evaluation().evaluate()
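
If the masks ever become too large even for the memmap intermediate, one could go a step further and accumulate the Dice terms tile by tile, so no full-size array is ever materialised. A rough sketch along those lines (dice_tiled and block_rows are illustrative, not part of evalutils; it assumes asarray(out='memmap') works on the files, i.e. the TIFFs are stored uncompressed):

import numpy as np
from imageio.plugins._tifffile import TiffFile


def dice_tiled(gt_path, pred_path, block_rows=4096):
    """Accumulate the Dice terms over row blocks of two binary masks."""
    with TiffFile(gt_path) as gt_tif, TiffFile(pred_path) as pred_tif:
        gt = gt_tif.asarray(out='memmap')
        pred = pred_tif.asarray(out='memmap')

        if gt.shape != pred.shape:
            raise RuntimeError("Images do not have the same shape.")

        intersection = 0
        total = 0
        for start in range(0, gt.shape[0], block_rows):
            # Only one block of each mask is resident in RAM at a time
            g = gt[start:start + block_rows] != 0
            p = pred[start:start + block_rows] != 0
            intersection += np.logical_and(g, p).sum()
            total += g.sum() + p.sum()

    return 2.0 * intersection / total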

Thank you for kindly providing an example. I've just modified my implementation following the suggested workaround, and it works as expected. Then, isn't there any storage limit per container (as opposed to the memory limit per container)?
[screenshot: now working sample2 — the submission now evaluates successfully]

Also, what I had missed before was that swap memory kicked in on top of the 4G physical-memory limit when I ran docker on my machine. That is why it had been fine on my side. I've just found that my run had 4G of physical memory plus 22G of swap. (How funny though..)
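
For anyone who hits the same confusion, here is a minimal sketch, assuming Docker's cgroup-v1 layout, for checking from inside the container which limits actually apply. When the kernel lacks swap accounting (as the WARNING in the log above says), memory.memsw.limit_in_bytes is absent and swap is effectively unlimited:

def read_cgroup_limit(name):
    # cgroup v1: memory.limit_in_bytes is RAM only,
    # memory.memsw.limit_in_bytes is RAM + swap
    try:
        with open('/sys/fs/cgroup/memory/' + name) as f:
            return int(f.read())
    except OSError:
        return None


print('memory limit:', read_cgroup_limit('memory.limit_in_bytes'))
print('memory+swap limit:', read_cgroup_limit('memory.memsw.limit_in_bytes'))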

Anyway, the issue is solved. I deeply appreciate your help again.