Loading Platform/Scenario from large local database uses too much RAM/CPU/time
SiddharthJoshi-Git opened this issue · 7 comments
Hi,
I am experiencing slowdowns in the following tasks when using ixmp:
1.) When loading the ixmp platform, the load process gets slower with each new scenario/sensitivity run. The DB size on my system is currently 20 GB, with platform load times of >15 minutes and up to 40 minutes. RAM usage goes up to 18 GB, with the CPU pinned at 80-100% during the load. The problem disappears with a fresh ixmp DB, where load times are <1 minute and RAM usage is <5 GB.
2.) When loading a model from the ixmp platform and making changes to it, e.g. adding new technologies. Here the processor is again pinned at 80-100%, RAM usage goes up to 40 GB (manually allocated so the global model can be manipulated without an OOM error), and the task takes around 20 minutes. The same task on a fresh DB takes <7 minutes with RAM usage <15 GB.
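For reference, a minimal sketch of the kind of workflow being timed here, assuming a configured platform named "local_db" (a hypothetical name) that points at the 20 GB database:

```python
import time

import ixmp

# Time the platform load; "local_db" is a hypothetical platform name
# pointing at the large local HyperSQL database.
start = time.perf_counter()
mp = ixmp.Platform("local_db")
print(f"Platform load took {time.perf_counter() - start:.0f} s")

# ... scenario work ...

mp.close_db()  # tidy up, as noted below
```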
Additional note: I always use mp.close_db() to tidy up the ixmp DB after each run. These tests were run on a high-speed SSD with 64 GB RAM and a 12-core i9 CPU (all cores at 4.2 GHz).
Best,
Sid
Thanks for the detailed report!
I also experience similar issues, though not as extreme: on the order of 10s of seconds waiting for a local platform to load.
These performance issues arise from the Java code underlying ixmp's JDBCBackend: either the code itself (which is in a private repository called ixmp_source) or the libraries it uses, e.g. HyperSQL for local databases. This code is not actively maintained. Work has started to add a new, pure-Python backend to ixmp (addressing #78), but I am not aware of a roadmap or target timing for its release.
One of the goals for that new backend should be to provide "good" or "acceptable" performance, including for the items you mention. These should be quantified, e.g. “Loading a platform with A data points in B scenarios should use less than C GB of RAM and take less than D seconds on such-and-such hardware.” See also #215, which is about specifying and confirming performance metrics.
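Purely as an illustration of what such a quantified metric could look like as a test (the path and the 300-second threshold below are placeholder values, not agreed targets; the database at that path is assumed to be pre-populated with A data points in B scenarios):

```python
import time

import ixmp

def test_platform_load_time():
    # Hypothetical metric: a pre-populated local HyperSQL platform
    # should load in under D seconds; 300 s is a placeholder value.
    start = time.perf_counter()
    mp = ixmp.Platform(backend="jdbc", driver="hsqldb",
                       path="benchmarks/populated-db")
    elapsed = time.perf_counter() - start
    mp.close_db()
    assert elapsed < 300
```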
For both items (1) and (2), what would you consider acceptable performance for your usage of ixmp?
Acceptable Performance philosophy:
The current setup works great if we start with a fresh DB and only run the final scenarios from it. That would entail doing our tests in a separate DB and committing only the final changes to the main DB. I think this is a bit tough to manage, so here is what I think acceptable usage looks like:
Item 1: Initialising the ixmp platform should take 2-5 minutes, since this task is performed every time I launch a notebook and whenever I hit errors during scenario development. If I want to work on 10 different items, e.g. sensitivity on variables, history calibration, and technology updating, then 5 minutes of ixmp load time × 10 notebook launches = 50 minutes spent waiting for the platform to load, which is OK, versus 200+ minutes currently. Monitoring energy draw, I am also consuming 200+ watts for those 20 minutes every time I load ixmp.
Short answer: 2-5 minutes instead of 20+ minutes.
Item 2: When I load prebuilt data via an Excel sheet it takes 2-3 minutes, but if I read a parameter from the base scenario and manipulate it on the fly, it takes 20-25 minutes. This example entails updating and loading new techs, creating a new grid structure, and doing historical calibration in a single go; a sketch of the two workflows follows below. My deduction is that reading a parameter, manipulating it on the fly, and committing it incurs a high penalty compared to just loading prebuilt structures from Excel. Given that the total data being manipulated is <50 MB, such a penalty should not occur (I could be wrong here, as I have not used Java-based workflows before).
Short answer: 5-7 minutes instead of 20+ minutes.
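To make the contrast concrete, here is a sketch of the two workflows being compared; the platform, model, scenario, and parameter names are all hypothetical:

```python
import ixmp

mp = ixmp.Platform("local_db")  # hypothetical platform name
scen = ixmp.Scenario(mp, model="global_model", scenario="baseline")

# Fast path (~2-3 min): load prebuilt structure and data from Excel.
scen.check_out()
scen.read_excel("prebuilt_data.xlsx", add_units=True)
scen.commit("Load prebuilt data")

# Slow path (~20-25 min): read a parameter, change it, write it back.
scen.check_out()
df = scen.par("inv_cost")        # hypothetical parameter name
df["value"] *= 0.9               # on-the-fly manipulation in pandas
scen.add_par("inv_cost", df)     # each call crosses the Python-Java bridge
scen.commit("Update investment costs")

mp.close_db()
```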
My experience with databases is limited to Teradata and GIS (SQL/SQLite/GPKG) datasets, which do not incur such high penalties as the DB grows. Maybe there is a tweak present inside ixmp that I am missing? [https://github.com//issues/41]
Excellent, thanks for sharing these expectations as a user, which are useful to developers. I agree that, given the performance of other software that works with much larger data, it should be possible for our new Python backend to meet or exceed these metrics. Again, the Java code will not be further developed, so we won't look too closely at why it's performing poorly.
Maybe there is a tweak present inside IXMP that I am missing?
There's none, no.
Thanks for sharing this feedback. All the best with the Python porting.
Let's keep it open until we've verified that the new backend meets these requirements.
iiasa/message_ix#617 reports a different error message that may have a similar cause:
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Python Reference Queue"
That occurs on a macOS system with 8 GB RAM.
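As a possible stopgap for that case (an untested assumption, not a confirmed fix), the JDBC backend accepts extra JVM arguments, which can be used to raise the Java heap limit:

```python
import ixmp

# Untested assumption: raising the JVM heap limit via the JDBC backend's
# jvmargs option may postpone this OutOfMemoryError. "-Xmx6G" is an
# example value, not a recommendation.
mp = ixmp.Platform(backend="jdbc", driver="hsqldb",
                   path="path/to/db", jvmargs="-Xmx6G")
```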
Thanks, Sid, for raising this issue, and thank you, Paul, for your confidence in the pure-Python implementation.
Maybe some comparative benchmarks, and my findings while profiling both architectures, can give some context here.
I found that the database overhead is very small; most time is spent in Python or Java (and significant performance penalties occur when context-switching between the two).
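The profiling itself was nothing exotic; something like this sketch with the standard-library profiler gives the breakdown (the platform name is hypothetical):

```python
import cProfile
import pstats

# Profile a Platform initialization; the top cumulative entries show
# whether time is spent in Python, in the Java bridge, or in the DB.
cProfile.run('import ixmp; ixmp.Platform("local_db")', "init.prof")
pstats.Stats("init.prof").sort_stats("cumulative").print_stats(20)
```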
The pure-Python implementation takes 0.007 seconds to initialize a platform, where the current version takes ~10 seconds (on my laptop).
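Anyone wanting to reproduce that number can time a single initialization, e.g. (platform name again hypothetical):

```python
import timeit

# Time one Platform initialization against each backend in turn.
t = timeit.timeit('ixmp.Platform("local_db")', setup="import ixmp",
                  number=1)
print(f"Platform init: {t:.3f} s")
```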
Here is a graph of a performance comparison I did a few months ago on a data set with 113k data points (the current version shown in blue, pure Python in green):
Since the pure-Python version is still in development, many features are not implemented yet (the most significant, performance-wise, being scenario version control), so the two are not directly comparable; but I suspect that those features will incur only small, linear performance impacts.
hth