onthegomap/planetiler

[BUG] Proces killed after ~ 3 hours of running

Closed this issue · 7 comments

Describe the bug
Hi, I have clean install of Ubuntu 22.04, and latest .jar from repo. When I run
java -Xmx120g -jar planetiler.jar --download --area=planet in screen, it is killed after approx. 3 hours of runtime. I have tried it twice but with the same result. First run killed after 3:10, second run killed after 2:45.

To Reproduce
Steps to reproduce the behavior:
java -Xmx120g -jar planetiler.jar --download --area=planet

Screenshots
2:45:08 INF [mbtiles] - features: [ 325M 11% 1.7M/s ] 212G tiles: [ 10M 134k/s ] 9.1G cpus: 29.5 gc: 9% heap: 78G/128G direct: 18M postGC: 36G read(60% 54%) -> merge(53%) -> (4.6k/6.4k) -> encode(69% 84% 71% 85% 86% 83% 75% 73% 90% 67% 76% 72% 67% 84% 73% 85% 82% 73% 81% 67% 75% 87% 86% 84% 73% 65% 89% 72% 75% 89%) -> (6.4k/6.4k) -> write(69%) last tile: 12/1924/1875 (z12 46%) https://www.openstreetmap.org/#map=12/15.02969/-10.89844 2:45:18 INF [mbtiles] - features: [ 341M 11% 1.5M/s ] 212G tiles: [ 11M 45k/s ] 9.6G cpus: 29.9 gc: 8% heap: 27G/128G direct: 18M postGC: 16G read(51% 62%) -> merge(72%) -> (4.9k/6.4k) -> encode(74% 86% 85% 78% 79% 77% 81% 82% 83% 85% 77% 85% 76% 88% 75% 85% 79% 84% 81% 79% 82% 78% 81% 81% 80% 82% 83% 75% 75% 81%) -> (6k/6.4k) -> write(79%) last tile: 12/2069/1728 (z12 50%) https://www.openstreetmap.org/#map=12/27.05913/1.84570 2:45:28 INF [mbtiles] - features: [ 356M 12% 1.5M/s ] 212G tiles: [ 11M 25k/s ] 10G cpus: 29.6 gc: 11% heap: 27G/128G direct: 18M postGC: 19G read(71% 63%) -> merge(60%) -> (4.9k/6.4k) -> encode(86% 83% 86% 84% 78% 86% 84% 79% 82% 83% 87% 84% 85% 80% 86% 78% 80% 76% 81% 82% 83% 80% 88% 78% 71% 78% 79% 82% 79% 79%) -> (5.3k/6.4k) -> write(66%) last tile: 12/2150/1306 (z12 52%) https://www.openstreetmap.org/#map=12/54.47004/8.96484 2:45:38 INF [mbtiles] - features: [ 369M 12% 1.3M/s ] 212G tiles: [ 11M 17k/s ] 10G cpus: 29.6 gc: 13% heap: 38G/128G direct: 18M postGC: 21G read(52% 65%) -> merge(72%) -> (5.6k/6.4k) -> encode(83% 80% 80% 86% 84% 75% 83% 84% 84% 82% 82% 81% 84% 77% 77% 75% 82% 81% 84% 78% 79% 80% 82% 82% 76% 76% 79% 77% 84% 78%) -> (5.8k/6.4k) -> write(65%) last tile: 12/2206/1254 (z12 53%) https://www.openstreetmap.org/#map=12/57.04073/13.88672 2:45:48 INF [mbtiles] - features: [ 381M 12% 1M/s ] 212G tiles: [ 12M 19k/s ] 11G cpus: 29.8 gc: 12% heap: 26G/128G direct: 18M postGC: 24G read(60% 57%) -> merge(51%) -> (6.3k/6.4k) -> encode(86% 83% 83% 86% 80% 80% 86% 80% 81% 85% 83% 79% 86% 81% 79% 78% 83% 86% 84% 84% 81% 76% 84% 87% 82% 82% 81% 78% 83% 83%) -> (6.4k/6.4k) -> write(65%) last tile: 12/2274/2 (z12 55%) https://www.openstreetmap.org/#map=12/85.03594/19.86328 2:45:59 INF [mbtiles] - features: [ 392M 13% 1.1M/s ] 212G tiles: [ 12M 16k/s ] 11G cpus: 29.7 gc: 13% heap: 31G/128G direct: 18M postGC: 26G read(53% 61%) -> merge(21%) -> (5.9k/6.4k) -> encode(84% 75% 85% 86% 84% 84% 85% 85% 85% 85% 80% 83% 83% 83% 82% 83% 79% 77% 84% 81% 85% 85% 84% 84% 85% 86% 85% 85% 85% 83%) -> (6.4k/6.4k) -> write(55%) last tile: 12/2331/3150 (z12 56%) https://www.openstreetmap.org/#map=12/-69.09994/24.87305 Killed

Environment:

  • Hardware: AMD Ryzen™ 9 5950X, 16 Core Vermeer (Zen3) Simultaneous Multithreading Virtualization (AMD-V), 128 GB DDR4 ECC, 2 x 3.84 TB NVMe SSD
  • OS: Ubuntu 22.04
  • Java version and distribution: openjdk version "18.0.2-ea" 2022-07-19
    OpenJDK Runtime Environment (build 18.0.2-ea+9-Ubuntu-222.04)
    OpenJDK 64-Bit Server VM (build 18.0.2-ea+9-Ubuntu-222.04, mixed mode, sharing)
  • Maven version: [e.g. 3.8.1]

Additional context
With --area=planet it works normally.

wipfli commented

Can you try this once:

java -Xmx20g \
  -jar planetiler.jar \
  `# Download the latest planet.osm.pbf from s3://osm-pds bucket` \
  --area=planet --bounds=planet --download \
  `# Accelerate the download by fetching the 10 1GB chunks at a time in parallel` \
  --download-threads=10 --download-chunk-size-mb=1000 \
  `# Also download name translations from wikidata` \
  --fetch-wikidata \
  --mbtiles=output.mbtiles \
  `# Store temporary node locations at fixed positions in a memory-mapped file` \
  --nodemap-type=array --storage=mmap
wipfli commented

Planet.md says this is the right thing to do if you do not have 1.5x the memory of the input file

@wipfli is correct, you definitely need some different options when running on the planet. quickstart.sh sets up the recommended settings for you as well.

My guess is it used too much memory and the OS killed it... it's not obvious from the logs because the logs say it's only using 31G, but what can happen is that the JVM is allowed to use up to 120G because you set -Xmx120g and even though the heap shrunk to 31G, java hasn't released that extra back to the OS yet and during the mbtiles writing phase, sqlite requests quite a bit of memory outside the JVM heap.

Adding --storage=mmap and a smaller heap between -Xmx20g and -Xmx30g like the command @wipfli is the best way to fix it. That will let planetiler scale up to much larger input files without needing a bigger machine.

Another option is to tell the JVM to more aggressively release unused memory by using an option like -XX:MaxHeapFreeRatio=40 but I think the better route would be to just run the command Oliver suggested from planet.md.

Hi @wipfli and @msbarry. Thank you for your super fast response. I have tried your command @wipfli. This is new error from it:

0:51:37 INF [wikidata] - blocks: [ 39k 0/s ] nodes: [ 8.1B 0/s ] ways: [ 915M 0/s ] rels: [ 10M 0/s ] wiki: [ 1.6M 3.4k/s ] 351M
cpus: 0.1 gc: 0% heap: 10G/21G direct: 81k postGC: 2.6G
parse() -> pbf( -%) -> (0/94) -> filter( -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -% -%) -> (0/1M) -> fetch( -%)
0:51:37 INF [wikidata] - DONE fetched:1605286
0:51:37 INF [wikidata] - Finished in 9m47s cpu:2h28m37s gc:26s avg:15.2
0:51:37 INF [wikidata] - pbf 1x(40% 3m52s sys:1m40s wait:54s done:4m23s)
0:51:37 INF [wikidata] - filter 31x(44% 4m16s sys:1s wait:43s done:4m19s)
0:51:37 INF [wikidata] - fetch 1x(14% 1m25s sys:4s wait:8m9s)
0:51:40 INF - loaded from 1603876 mappings from /root/planetiler/data/sources/wikidata_names.json in 4s cpu:13s avg:3.7
0:51:40 DEB - ✓ 334G storage on / (/dev/md2) requested for read phase disk, 2T available
0:51:40 DEB - - 82G used for temporary node location cache
0:51:40 DEB - - 11G used for temporary multipolygon geometry cache
0:51:40 DEB - - 240G used for temporary feature storage
0:51:40 DEB - ✓ 361G storage on / (/dev/md2) requested for write phase disk, 2T available
0:51:40 DEB - - 240G used for temporary feature storage
0:51:40 DEB - - 120G used for mbtiles output
0:51:40 WAR - ❌️ 21G JVM heap requested for read phase, 21G available (increase JVM -Xmx setting)
0:51:40 WAR - - 21G used for temporary profile storage
0:51:40 DEB - ✓ 93G storage on / (/dev/md2) requested for read phase, 2T available
0:51:40 DEB - - 82G used for array node location cache (switch to sparsearray to reduce size)
0:51:40 DEB - - 11G used for multipolygon way geometries
Exception in thread "main" java.lang.IllegalStateException: Insufficient resources for read phase, use the --force argument to continue anyway:
21G JVM heap requested for read phase, 21G available
at com.onthegomap.planetiler.util.ResourceUsage.checkAgainstLimits(ResourceUsage.java:162)
at com.onthegomap.planetiler.Planetiler.checkMemory(Planetiler.java:601)
at com.onthegomap.planetiler.Planetiler.run(Planetiler.java:520)
at com.onthegomap.planetiler.basemap.BasemapMain.run(BasemapMain.java:51)
at com.onthegomap.planetiler.basemap.BasemapMain.main(BasemapMain.java:14)
at com.onthegomap.planetiler.Main.main(Main.java:50)

Try -Xmx25g

Tried -Xmx30g. Still running for 1 hour now. Will let you know:)

Finished! Thank you for your help:)

2:07:33 INF [mbtiles] - Finished in 55m36s cpu:26h4m54s gc:5m29s avg:28.1
2:07:33 INF [mbtiles] - read 2x(13% 7m27s sys:2m39s wait:43m28s done:6s)
2:07:33 INF [mbtiles] - merge 1x(16% 8m52s sys:5s wait:42m17s done:1s)
2:07:33 INF [mbtiles] - encode 30x(82% 45m20s sys:3s wait:4m2s done:1s)
2:07:33 INF [mbtiles] - write 1x(28% 15m27s sys:3m38s wait:35m45s)
2:07:33 INF - Finished in 2h7m34s cpu:59h38m16s gc:12m33s avg:28.1
2:07:33 INF - FINISHED!
2:07:33 INF -
2:07:33 INF - ----------------------------------------
2:07:33 INF - overall 2h7m34s cpu:59h38m16s gc:12m33s avg:28.1
2:07:33 INF - wikidata 4m59s cpu:2h25m2s gc:19s avg:29.2
2:07:33 INF - pbf 1x(83% 4m8s sys:1m56s wait:26s done:8s)
2:07:33 INF - filter 31x(86% 4m17s wait:15s done:7s)
2:07:33 INF - fetch 1x(1% 4s wait:4m49s done:7s)
2:07:33 INF - lake_centerlines 0.9s cpu:6s avg:7.1
2:07:33 INF - water_polygons 46s cpu:18m46s gc:4s avg:24.7
2:07:33 INF - read 1x(47% 21s wait:18s done:2s)
2:07:33 INF - process 31x(69% 32s wait:10s)
2:07:33 INF - write 1x(53% 24s sys:9s wait:19s)
2:07:33 INF - natural_earth 4s cpu:15s avg:3.6
2:07:33 INF - read 1x(84% 4s)
2:07:33 INF - process 31x(5% 0.2s wait:4s)
2:07:33 INF - write 1x(0% 0s wait:4s)
2:07:33 INF - osm_pass1 5m47s cpu:2h29m10s gc:24s avg:25.8
2:07:33 INF - read 1x(66% 3m51s sys:2m1s wait:1m17s done:2s)
2:07:33 INF - process 31x(73% 4m12s sys:5s block:7s wait:1m3s)
2:07:33 INF - osm_pass2 54m2s cpu:26h23m25s gc:5m3s avg:29.3
2:07:33 INF - read 1x(14% 7m40s sys:5m24s wait:44m27s done:32s)
2:07:33 INF - process 31x(82% 44m27s sys:31s block:8s wait:1m28s)
2:07:33 INF - write 1x(62% 33m44s sys:8m51s wait:14m13s)
2:07:33 INF - boundaries 8s cpu:10s avg:1.3
2:07:33 INF - sort 5m56s cpu:1h55m56s gc:1m14s avg:19.5
2:07:33 INF - worker 8x(31% 1m49s sys:26s wait:1m30s done:3s)
2:07:33 INF - mbtiles 55m36s cpu:26h4m54s gc:5m29s avg:28.1
2:07:33 INF - read 2x(13% 7m27s sys:2m39s wait:43m28s done:6s)
2:07:33 INF - merge 1x(16% 8m52s sys:5s wait:42m17s done:1s)
2:07:33 INF - encode 30x(82% 45m20s sys:3s wait:4m2s done:1s)
2:07:33 INF - write 1x(28% 15m27s sys:3m38s wait:35m45s)
2:07:33 INF - ----------------------------------------
2:07:33 INF - features 217GB
2:07:33 INF - mbtiles 84GB