numerai/example-scripts

The example_model.py file grinds my laptop to a halt on 16 GB of RAM

Raynos opened this issue · 1 comment

I have a reasonably recent laptop.

The example_model maxed out all 16 GB of my RAM and used 9 GB of swap, and my whole laptop was unusable for anything else.

Is there a way to run the script while limiting its RAM usage to 8 GB or so, so that I can keep using my laptop for browsing or code editing while the model trains?

Or should the minimum system requirements be bumped to 32 GB of RAM?

Could the parquet files be read from and written to a key-value database format like LMDB or RocksDB, to avoid having to upgrade my laptop from 16 GB of RAM to 32 GB?

Alternatively, should we add instructions on how to SSH into an EC2 instance provisioned with 32 GB of RAM for the purpose of running the example scripts?

Laptop overview: (6-core i7 @ 2.6 GHz, 16 GB RAM, 256 GB SSD)


There are some things you can do to optimize memory usage:

  • Only read the features / targets you actually want to work with. If you don't use all of them, it can be worth storing the list of features / targets you are actually using and reading just those columns.
  • Consider downcasting the data types or working with the integer dataset. You can usually downcast the dtypes to float16 or int8 without losing noticeable precision.
  • Take a closer look at the parquet functions. You can read the data in smaller batches and process each batch as you go (e.g. downcast the dtypes, or train your model iteratively). There is also a filters parameter you can use (e.g. to read only every Xth era, or to apply other conditions).
  • As mentioned, train your model iteratively. If your type of model allows it, you can do the training in iterations. This not only helps with memory usage: if you build your pipeline that way, you can easily feed additional training data to your model as more data is released every week.
  • Try to optimize garbage collection and avoid copies you don't need. Always think about which objects you really need and when / for how long you need them.

These can definitely help with memory usage, but some of the strategies trade off time (reading in batches) or prediction performance (downcasting dtypes).