maxmind/mmdbwriter

MMDB writer consuming a lot of memory


Hi,

I have been using the mmdbwriter package in Go to insert records into an MMDB file. I noticed that the script inserting the records consumes a lot of memory. I did some memory profiling with pprof and found that a couple of mmdbwriter functions account for most of the usage. I have attached screenshots below for reference.

[pprof screenshots (2024-07-22) showing memory usage by function]

These functions consume around 600MB each, whereas the final MMDB file is 146MB. Overall, the program used around 3.8GB to produce that 146MB file. I think these functions, especially Map.Copy(), keep the records in memory, and they are not garbage collected because references to them are still in use. The profiling was done just before the writer writes the MMDB file to disk.
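
For reference, the heap profile is captured just before calling writer.WriteTo, roughly like this (the output path is just an example):

f, err := os.Create("heap.pprof") // example output path
if err != nil {
	log.Fatal(err)
}
defer f.Close()

// runtime/pprof: dump the heap while the in-memory tree and the
// inserter allocations are still reachable.
if err := pprof.WriteHeapProfile(f); err != nil {
	log.Fatal(err)
}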

Here's how I have defined the MMDB writer:

writer, err := mmdbwriter.New(
	mmdbwriter.Options{
		DatabaseType:            "V1",
		IncludeReservedNetworks: true,
		RecordSize:              32,
	},
)

I am using the DeepMergeWith inserter to insert the records.
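
The inserts look roughly like this (the network and record contents below are simplified examples; the real records are larger):

_, network, err := net.ParseCIDR("203.0.113.0/24") // example network
if err != nil {
	log.Fatal(err)
}

record := mmdbtype.Map{
	// example fields only; the real records contain more nested data
	"asn":          mmdbtype.Uint32(64496),
	"organization": mmdbtype.String("Example Org"),
}

if err := writer.InsertFunc(network, inserter.DeepMergeWith(record)); err != nil {
	log.Fatal(err)
}

(Here inserter is github.com/maxmind/mmdbwriter/inserter and mmdbtype is github.com/maxmind/mmdbwriter/mmdbtype.)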

It is expected that the writer will use a fair bit of memory. You haven't provided any information on how you are using the writer, but based on what you have shared:

  • Map.Copy - this suggests you are using one of the merging inserter functions. You can likely reduce your memory usage either by using inserter.ReplaceWith or by writing your own inserter function (see the sketch after this list). Implementing your own function allows much more efficient merging, since you know what your records look like and how they might change. We only use the pre-defined functions internally for the simplest cases. The default functions probably do have room for improvement, but that has not been a priority as we rarely use them.
  • insert - excluding the inserter function, the remainder of this is the in-memory representation of the tree. This will be much larger than the on-disk representation: on disk a record takes between 24 and 32 bits, whereas the in-memory representation is over an order of magnitude larger. Although this could be improved somewhat, it would likely come at the cost of slowing down reading and writing to the tree and making the code more complex.
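
For example, if your records are flat maps and only a couple of fields change per insert, a custom inserter can shallow-copy the top-level map rather than deep-copying every nested value. A rough, untested sketch, assuming your existing values are mmdbtype.Map at the top level (the field name is just for illustration):

// mergeASN returns an inserter.Func that updates a single field. It
// shallow-copies the top-level map so the existing value, which may be
// shared with other networks, is never mutated.
func mergeASN(newASN mmdbtype.Uint32) inserter.Func {
	return func(existing mmdbtype.DataType) (mmdbtype.DataType, error) {
		existingMap, ok := existing.(mmdbtype.Map)
		if !ok {
			// No existing record (or an unexpected type): start fresh.
			return mmdbtype.Map{"asn": newASN}, nil
		}
		merged := make(mmdbtype.Map, len(existingMap)+1)
		for k, v := range existingMap {
			merged[k] = v
		}
		merged["asn"] = newASN
		return merged, nil
	}
}

You would then pass it to InsertFunc, e.g. writer.InsertFunc(network, mergeASN(mmdbtype.Uint32(64496))).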

Thanks. Yes, you are right, I am using the DeepMergeWith inserter. I've updated the comment and added details about how I have defined the writer.

Looking at the code, I think it would be possible to get rid of the Copy in DeepMergeWith and more carefully allocate a new map only when needed. It is hard to know if this would significantly impact your memory usage as it would largely depend on the structure of your data and how it is modified on insert.
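
Roughly, the merge would only allocate a new map when a key is added or a nested value actually changes; otherwise it would return the existing value untouched. An untested sketch of that idea (not the current implementation, and ignoring slices and equality checks on scalar values):

// deepMerge returns the merged value and whether anything changed.
// Unchanged nested maps are returned as-is instead of being copied.
func deepMerge(existing, update mmdbtype.DataType) (mmdbtype.DataType, bool) {
	existingMap, okExisting := existing.(mmdbtype.Map)
	updateMap, okUpdate := update.(mmdbtype.Map)
	if !okExisting || !okUpdate {
		// Non-map values are simply replaced in this sketch.
		return update, true
	}

	var merged mmdbtype.Map
	for k, v := range updateMap {
		mergedVal, changed := deepMerge(existingMap[k], v)
		if !changed {
			continue
		}
		if merged == nil {
			// First change: copy the existing top-level entries once.
			merged = make(mmdbtype.Map, len(existingMap)+len(updateMap))
			for ek, ev := range existingMap {
				merged[ek] = ev
			}
		}
		merged[k] = mergedVal
	}
	if merged == nil {
		return existing, false
	}
	return merged, true
}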

We don't have a single internal use of DeepMergeWith, so I don't know whether this is a change we are likely to work on.