VGG16 model scoring taking up > 500 GB of RAM
Hi,
I was trying to score a VGG16 model on our own cluster, i.e. by running a local instance.
When I use the following high-level layer names
['avgpool', 'features', 'classifier']
the RAM consumption seems to be OK and scoring works fine.
However, when I use the detailed layer names
['features.1', 'features.2', 'features.3', 'features.4', 'features.5', 'features.6', 'features.7', 'features.8', 'features.9', 'features.10', 'features.11', 'features.12', 'features.13', 'features.14', 'features.15', 'features.16', 'features.17', 'features.18', 'features.19', 'features.20', 'features.21', 'features.22', 'features.23', 'features.24', 'features.25', 'features.26', 'features.27', 'features.28', 'features.29', 'features.30', 'classifier.0', 'classifier.1', 'classifier.2', 'classifier.3', 'classifier.4', 'classifier.5', 'classifier.6']
it takes > 500 GB of RAM and runs out of memory (OOM).
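For reference, the granular names above are torchvision's submodule names. A quick sketch (plain torchvision, no Brain-Score dependencies) to check which name maps to which layer type:

# List VGG16 submodule names to see what each granular identifier refers to,
# e.g. 'features.0' is a Conv2d while 'features.1' is the ReLU after it.
from torchvision.models import vgg16

model = vgg16()  # weights are irrelevant for inspecting the structure
for name, module in model.named_modules():
    if name:  # skip the root module itself
        print(name, type(module).__name__)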
The other thing is that with the high-level layers, my scores are
"imagenet_trained": {
"V4": "0.3685106454430227",
"IT": "0.5185169380743393",
"V1": "0.09256884647589658",
"V2": "0.2600441204932774"
}
For V1, the no-training score is higher than the ImageNet-trained score, which is a strange effect since the weights are random. I know a random initialization can sometimes match well just due to a statistical artefact, but this occurs in both iterations:
"no_training": {
"V4": "0.3413502290434312",
"IT": "0.2947047868783302",
"V1": "0.2026004427555423",
"V2": "0.1448800686541028"
}
"no_training_2": {
"V4": "0.33954039465787034",
"IT": "0.29491768114613165",
"V1": "0.1974565275931902",
"V2": "0.15089219267469867"
}
I am using the following public benchmarks to score my model:
benchmark_identifiers = ['MajajHong2015public.V4-pls', 'MajajHong2015public.IT-pls',
'FreemanZiemba2013public.V1-pls', 'FreemanZiemba2013public.V2-pls']
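For reference, a minimal sketch of the local scoring path I mean, assuming the model_tools PytorchWrapper/ModelCommitment API (exact names and signatures may differ between Brain-Score versions):

# Sketch of a local scoring run; assumes the model_tools wrapper API.
import functools
from torchvision.models import vgg16
from model_tools.activations.pytorch import PytorchWrapper, load_preprocess_images
from model_tools.brain_transformation import ModelCommitment
from brainscore import score_model

model = vgg16(pretrained=True)
preprocessing = functools.partial(load_preprocess_images, image_size=224)
activations_model = PytorchWrapper(identifier='vgg16', model=model,
                                   preprocessing=preprocessing)
layers = ['avgpool', 'features', 'classifier']  # the coarse set that fits in RAM
commitment = ModelCommitment(identifier='vgg16',
                             activations_model=activations_model,
                             layers=layers)

benchmark_identifiers = ['MajajHong2015public.V4-pls', 'MajajHong2015public.IT-pls',
                         'FreemanZiemba2013public.V1-pls', 'FreemanZiemba2013public.V2-pls']
for benchmark_identifier in benchmark_identifiers:
    score = score_model(model_identifier='vgg16', model=commitment,
                        benchmark_identifier=benchmark_identifier)
    print(benchmark_identifier, score)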
Any help would be greatly appreciated.
Best regards,
Shreya
Hi Shreya, thanks for opening an issue! Usually in Brain-Score, earlier layers of the model are more computationally expensive (RAM-wise) to score, as they tend to be much bigger than later model layers. It could also be that for VGG16 the more granular layers are bigger themselves, or are full convolutional layers as opposed to, say, pooling or ReLU layers (I am not entirely sure here, as I would need a refresher on the VGG16 architecture). As for the issue of random weights scoring higher, I am looping in @mschrimpf, who may be able to offer more scientific insight as to what might be occurring.
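To make the size difference concrete, here is a small illustration (plain PyTorch, not Brain-Score internals): features.1, the ReLU after the first conv, emits a 64x224x224 tensor, roughly 12.8 MB in float32 per image, versus about 16 KB for classifier.1; multiplied across thousands of benchmark stimuli and dozens of layers, that is the kind of difference that blows up RAM.

# Measure the per-image activation size of each VGG16 submodule to see
# why early layers dominate RAM once their features are stored.
import torch
from torchvision.models import vgg16

model = vgg16()  # weights do not matter for shapes
sizes = {}

def make_hook(name):
    def hook(module, inputs, output):
        sizes[name] = output.numel() * output.element_size()
    return hook

for name, module in model.named_modules():
    if name:  # skip the root module
        module.register_forward_hook(make_hook(name))

with torch.no_grad():
    model(torch.zeros(1, 3, 224, 224))

for name, nbytes in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f'{name}: {nbytes / 1e6:.1f} MB per image')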
Dear Mike,
Thanks a lot for your answer, it is really helpful! Yes, more layers indeed lead to higher computational cost. I mainly wanted to understand how an untrained network could match so well; hopefully the additional input will fill that gap.
Best regards,
Shreya
Hi @mike-ferguson
In such a case, do people usually compute the Brain-Score on the convolutional layers only?
Best regards,
Shreya
Hi @ShreyaKapoor18, sorry for the late reply, I was out of office for a couple of days. As to your question, I think it depends! For the most part, people do tend to use the conv layers when hand-selecting/passing in layers to score, just because they align nicely with the conceptual framework. However, I have also seen pooling layers and even ReLU/activation-function layers passed in manually as well. I do not know off the top of my head which tend to have better alignment, and I am sure Martin/others have looked into this, but I think passing in all conv layers is a reasonable thing to do!
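If it helps, collecting just the conv layer names programmatically is straightforward (a sketch using plain torchvision):

# Collect only the convolutional layer names of VGG16.
import torch.nn as nn
from torchvision.models import vgg16

model = vgg16()
conv_layers = [name for name, module in model.named_modules()
               if isinstance(module, nn.Conv2d)]
print(conv_layers)  # ['features.0', 'features.2', 'features.5', ...]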
Hi Mike,
Thanks for your reply. I did just that, passing only the conv layers, but I am still running OOM.
It is just a bit unclear to me what the state of the art is for such comparisons.
The other methods I am using to align networks to the brain are usually not this computationally expensive, so I am not able to compare the Brain-Score results against those networks. I guess it is an open question.
Best regards,
Shreya Kapoor
@ShreyaKapoor18 Gotcha! Have you tried submitting recently on our website? If you do that, I should be able to see the logs, troubleshoot, and find out what exactly is eating up so much memory. We also have a new procedure to map layers that should drastically cut down on RAM usage, but at the moment it is only available through submitting on our site (or via a PR); we are still working on deploying that fix for local scoring.
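In the meantime, a generic workaround (my own suggestion, not an official Brain-Score recipe) is to commit and score a single layer per run, so peak memory is bounded by the largest individual layer's features. A hypothetical sketch, continuing from the snippets above (conv_layers, activations_model, and benchmark_identifiers as defined earlier; exact API names may differ by version):

# Hypothetical per-layer scoring loop to cap peak RAM.
for layer in conv_layers:
    commitment = ModelCommitment(identifier=f'vgg16-{layer}',
                                 activations_model=activations_model,
                                 layers=[layer])
    for benchmark_identifier in benchmark_identifiers:
        score = score_model(model_identifier=f'vgg16-{layer}',
                            model=commitment,
                            benchmark_identifier=benchmark_identifier)
        print(layer, benchmark_identifier, score)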