simonpoole/mapsplit

Inform user when hashmap is full

Closed this issue · 3 comments

Running mapsplit on a 971MB file (pbf of The Netherlands) doesn't seem to finish:

$ ulimit -n
4096
$ mapsplit/mapsplit -v --fd-max=4096 `pwd`/netherlands-latest.osm.pbf `pwd`/out
No datefile given. Writing all available tiles.
Reading: ~/t/netherlands-latest.osm.pbf
Writing: ~/t/out
3000000 nodes processed
6000000 nodes processed
9000000 nodes processed
12000000 nodes processed
15000000 nodes processed
18000000 nodes processed
21000000 nodes processed
24000000 nodes processed
27000000 nodes processed
30000000 nodes processed
33000000 nodes processed
36000000 nodes processed
39000000 nodes processed
42000000 nodes processed
45000000 nodes processed
48000000 nodes processed
51000000 nodes processed
54000000 nodes processed
57000000 nodes processed
^C

Whatever I try, after 57M nodes, mapsplit seems to hang (I waited for 5+ hours).
Increasing max open files to 65536 didn't seem to make a difference.

Is there anything I can do to get this to work?

PedaB commented

Hi, sorry for the late reply. Because I used Java but wanted to stay memory efficient, I wrote my own small hashmap implementation. It resolves collisions by using the next empty bucket when the intended bucket is already in use, so unlike a normal HashMap there is no per-bucket list of entries. As a result, my hashmap can fill up to 100% of its preallocated space, and I have no way to grow it afterwards. That also explains your problem: the hashmap is full, so the insert loops forever looking for an empty bucket. That is a bug; you should be notified instead.
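To illustrate the failure mode described above, here is a minimal sketch of a fixed-size, next-empty-bucket (linear probing) map that reports a full table instead of looping forever. It is not mapsplit's actual code: the class name `FixedProbingMap`, the sentinel value, the hash function and the exception type are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical fixed-capacity long-to-int map; not taken from mapsplit. */
final class FixedProbingMap {
    // Sentinel marking an unused bucket (assumes this key is never stored).
    private static final long EMPTY = Long.MIN_VALUE;

    private final long[] keys;
    private final int[] values;
    private int size;

    FixedProbingMap(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        keys = new long[capacity];
        values = new int[capacity];
        Arrays.fill(keys, EMPTY);
    }

    private int bucketOf(long key) {
        return Math.floorMod(Long.hashCode(key), keys.length);
    }

    /** Inserts or overwrites; throws once every bucket has been probed. */
    void put(long key, int value) {
        if (key == EMPTY) {
            throw new IllegalArgumentException("reserved key");
        }
        int start = bucketOf(key);
        int i = start;
        do {
            if (keys[i] == EMPTY) {          // free bucket: claim it
                keys[i] = key;
                values[i] = value;
                size++;
                return;
            }
            if (keys[i] == key) {            // existing key: overwrite
                values[i] = value;
                return;
            }
            i = (i + 1) % keys.length;       // collision: try the next bucket
        } while (i != start);                // back at the start: table is full
        throw new IllegalStateException(
            "hashmap is full (" + size + " entries); restart with a larger size");
    }

    /** Returns the stored value, or `missing` if the key is absent. */
    int get(long key, int missing) {
        int start = bucketOf(key);
        int i = start;
        do {
            if (keys[i] == EMPTY) return missing;
            if (keys[i] == key) return values[i];
            i = (i + 1) % keys.length;
        } while (i != start);
        return missing;
    }
}
```

The `while (i != start)` condition is the important part: without it, an insert into a completely full table never finds an empty bucket and never terminates, which matches the hang reported here.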

The workaround is easy: you can set the hashmap sizes at startup with the '-s' switch, which takes three numbers: the sizes of the nodes', the ways' and the relations' hashmaps. For good performance, the node hashmap should be about 1.5 to 2 times the number of nodes in your extract. I don't have the numbers at hand, but -s=200000000,20000000,2000000 might work.
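The sizing rule can be turned into a small calculation. The sketch below is only an illustration: the helper names are hypothetical, the element counts are made-up round numbers rather than measurements of any extract, and applying the same factor to ways and relations is an assumption (the rule above is stated for nodes only).

```java
/** Hypothetical helper that builds a '-s' argument from element counts. */
final class HashmapSizing {
    /** Scales an element count by the chosen headroom factor, rounding up. */
    static long sized(long count, double factor) {
        return (long) Math.ceil(count * factor);
    }

    /** Formats the three sizes as nodes,ways,relations for the '-s' switch. */
    static String sizeArgument(long nodes, long ways, long relations, double factor) {
        return "-s=" + sized(nodes, factor)
             + "," + sized(ways, factor)
             + "," + sized(relations, factor);
    }

    public static void main(String[] args) {
        // Made-up counts: 100M nodes, 10M ways, 1M relations, factor 2.0.
        System.out.println(sizeArgument(100_000_000L, 10_000_000L, 1_000_000L, 2.0));
        // prints "-s=200000000,20000000,2000000"
    }
}
```

A factor of 1.5 to 2 keeps the table between about half and two thirds full, which leaves enough empty buckets for probing to stay short.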

I'll leave this issue open, as I should add a way to detect a full hashmap and inform the user about it :-)

Thanks for the tip, I'll try again with the argument. Good idea to detect this and give a hint 👍

Fixed in 9eb9879