minetest-go/mapcleaner

RAM overflow, probably bug

Closed this issue · 15 comments

For some reason, RAM usage keeps growing until the program fails with an error and the process ends.

The map was ~400 MB and 26 GB of RAM was available. All 26 GB filled up within 7 minutes.

Settings of "mapcleaner.json": {"chunk_x":-60,"chunk_y":-60,"chunk_z":-60,"removed_chunks":0,"retained_chunks":0,"processed_chunks":0,"from_x":-60,"from_z":-60,"to_x":60,"to_y":60,"to_z":60,"delay":0}

mapcleaner_protect.txt is empty

Not sure where it consumes that much memory, the only cache currently is here:

mapcleaner/protected.go

Lines 17 to 24 in b62fb1c

// caches
var protected_chunks = make(map[string]*bool)
var emerged_chunks = make(map[string]*bool)
func ClearCache() {
protected_chunks = make(map[string]*bool)
emerged_chunks = make(map[string]*bool)
}

But this gets reset after every y-layer here:

mapcleaner/process.go

Lines 32 to 33 in b62fb1c

// purge cache after each layer
ClearCache()

Does it even progress to the next y-layer in your case?

Anyway: why are you using it without declaring any protected nodename? This will remove every chunk in your map eventually 🤔 (removing the map.sqlite is faster btw 😉)

>Does it even progress to the next y-layer in your case?
Yes, the process continues until all RAM memory is exhausted, then the process is killed

>Anyway: why are you using it without declaring any protected nodename?
I'm using the areas mod instead of protector redo (with nodes). I also tested with some nodes in "mapcleaner_protect.txt" and the same thing happens: after some time the memory fills up as well.

>This will remove every chunk in your map eventually 🤔 (removing the map.sqlite is faster btw 😉)
Not all chunks: those protected by the areas mod (from "areas.dat") should not be removed, so the other issue is very relevant for me too.

I had the same issue after 30 minutes:

INFO[1343] Removing chunk                                chunk_x=-137 chunk_y=-387 chunk_z=-57
INFO[1524] Removing chunk                                chunk_x=-136 chunk_y=-387 chunk_z=-57
INFO[1683] Removing chunk                                chunk_x=-135 chunk_y=-387 chunk_z=-57
Killed
Oct 21 22:02:34 joes-kiste2 kernel: [24997.515923] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/vte-spawn-8c56399b-0b27-4c4a-9701-26205df4aef3.scope,task=mapcleaner,pid=23864,uid=1000
Oct 21 22:02:34 joes-kiste2 kernel: [24997.515950] Out of memory: Killed process 23864 (mapcleaner) total-vm:16236276kB, anon-rss:5919744kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:34428kB oom_score_adj:0

I have a mapcleaner_protect.txt with the following text:

protector:protect
techage:power_pole2
techage:power_pole
techage:power_lineS
techage:oil_drillbit
techage:oil_source
hyperloop:shaft
hyperloop:tubeS
autobahn:node1
carts:powerrail
carts:rail
default:torch
homedecor:torch_wall

This might be fixed by e65b8e3, which introduces smarter caching.

But there is no binary available so far, right?

>But there is no binary available so far, right?

I created a release, the binaries should arrive there any minute now: https://github.com/minetest-go/mapcleaner/releases/tag/v0.0.5 :)

Please test and let me know if the issue persists

@init-lua about your other problem (the areas-export): I fixed that in dbe3d38, so it should also be fixed in that release 👍

The syntax is the same:

mapcleaner -mode export_protected

Again: ping me or open a new issue if something comes up. Also: thanks for testing 👍

Thanks for your work. Unfortunately, the result is still the same:
[screenshot: 2023-11-18_11-20]

I'm not sure this is the cause, but, similar to what happened with the PostgreSQL backend, there needs to be a call to rows.Close(); otherwise the rows are retained by the driver/database transaction.

The missing call is possibly in this method:
https://github.com/minetest-go/mtdb/blob/master/block/block_sqlite.go#L60

It may also affect other methods that keep db.Sql.Rows references. I'll take a look at the new Iterator() implementation, which could have the same bug.

The program has been running for a few days now, but the database size "map.sqlite" remains unchanged (~50 GB)

INFO[257114] Processing next z-stride chunk_y=400 chunk_z=401
INFO[257114] Processing next y-layer chunk_y=401
INFO[257114] mapcleaner exiting

>The program has been running for a few days now, but the database size "map.sqlite" remains unchanged (~50 GB)
>
>INFO[257114] Processing next z-stride chunk_y=400 chunk_z=401
>INFO[257114] Processing next y-layer chunk_y=401
>INFO[257114] mapcleaner exiting

@joe7575 that is unfortunate :(

The cleanup currently works by querying every possible mapblock position, even those that do not exist (are not emerged yet). This sometimes makes the total processing time prohibitive. It should still be able to clean up the map, but judging by your tests I did something wrong in my changes and broke that part.
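As a rough illustration of why visiting every possible position gets expensive, the sketch below counts the positions in the range from the mapcleaner.json posted at the top of this issue (-60 to 60 per axis). Two things are assumptions, not taken from the thread: that the bounds are inclusive on all three axes (the posted JSON has no from_y, so -60 is assumed), and that one chunk covers 5x5x5 mapblocks. The span and chunkCount helpers are hypothetical.

```go
package main

import "fmt"

// span counts the positions in the inclusive range [from, to].
func span(from, to int) int { return to - from + 1 }

// chunkCount returns the number of chunk positions in a cubic range that
// uses the same inclusive bounds on all three axes.
func chunkCount(from, to int) int {
	s := span(from, to)
	return s * s * s
}

func main() {
	// Assumption: one chunk covers 5x5x5 mapblocks.
	const blocksPerChunk = 5 * 5 * 5

	chunks := chunkCount(-60, 60)
	fmt.Println("chunk positions:   ", chunks)
	fmt.Println("mapblock positions:", chunks*blocksPerChunk)
}
```

Under those assumptions the range holds 1771561 chunk positions, or 221445125 mapblock positions, all of which get visited whether or not they were ever emerged.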

A few months ago I started implementing an "iterator" that walks over only the emerged areas. I tested this iterator by adding a new flag to --mode export_protected, called --export-all. It iterates over all emerged mapblocks and exports the equivalent of what would be left after a clean.

I tested this on a backup of the server I host. The PostgreSQL source takes 80G of physical storage and the exported sqlite file is around 4G, so it was a major reduction. We tested the exported map locally and it looks fine: protected areas were properly exported with a safe margin of 1 chunk in all directions around them.

If you have the opportunity to test this version on your map file and report here whether it works for you too, I'll implement the same algorithm in the cleanup version of the command; it should run faster and give the expected results. Alternatively, you could use this more conservative approach for the map cleanup: export protected, validate, then replace the big map with the new, smaller file.

As for execution time: the server is hosted on an ARM64 VM with 4 vCPU and 24G of RAM, and it took 24hr to parse the whole map (~125.000.000 mapblocks) and export it. The same map on my local AMD64 machine with 24 CPU and 32G of RAM took 10hr.
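For a sense of scale, those timings can be turned into average throughput. This is only back-of-the-envelope arithmetic on the rounded figures above; the blocksPerSecond helper is hypothetical, and the durations include the export step, not just the parsing.

```go
package main

import "fmt"

// blocksPerSecond converts a total mapblock count and a wall-clock duration
// in hours into an average throughput.
func blocksPerSecond(blocks, hours float64) float64 {
	return blocks / (hours * 3600)
}

func main() {
	const blocks = 125_000_000 // "~125.000.000 mapblocks" from the comment

	fmt.Printf("ARM64, 24 h: %.0f mapblocks/s\n", blocksPerSecond(blocks, 24))
	fmt.Printf("AMD64, 10 h: %.0f mapblocks/s\n", blocksPerSecond(blocks, 10))
}
```

That works out to roughly 1447 mapblocks per second on the ARM64 VM and roughly 3472 on the AMD64 machine.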

I have not tested a long-running/large-scale export from sqlite to sqlite, but according to the unit test cases I have in place, both exporting algorithms are compatible with each other.

Kindly let me know what you think about this and as soon as the PR is merged and the new version is released it should be ready to test.

If you have a working Go installation you can try compiling this branch with go build: https://github.com/ronoaldo/mapcleaner/tree/ronoaldo/issue184

>The program has been running for a few days now, but the database size "map.sqlite" remains unchanged (~50 GB)

You might have to do a manual vacuum in the sqlite CLI, but I was under the impression that mapcleaner should do this after it has completed 🤔

EDIT: yeah, it doesn't do that (yet) 🙄

Just a note: the batch_size parameter can speed things up - but that may require some testing. I managed to do the whole map clean on my use case with a batch size of 100.000. The script I used with the export-all flag is here:

https://github.com/ronoaldo/mercurio/blob/main/scripts/export.sh#L15

Thanks for the "vacuum" hint.
With sqlite3 ./map.sqlite "VACUUM INTO './map2.sqlite';" the database shrinks from 48.6 GB to 13.7 GB

Closing this; if the issue persists, please reopen.