blixt/py-starbound

Support for reading bookmarks, etc, from world data (types/layers 3 and 4)

apocalyptech opened this issue · 12 comments

I'm putting this one in here more for documentation and brainstorming than anything else, in case someone else feels like puzzling this out. I've got it Good Enough for my own purposes, and was planning on just implementing what I've got in my own app. It'd be nice to have a "real" API for this, though whether it's worth the work is another question.

Anyway, I wanted to find out how to find user bookmarks, though this seems to also be a method to (probably) find where any Entity is, based on UUID (though I haven't really looked into that too far). The starting point for bookmarks, at least, is in the player JSON data, underneath universeMap.(uuid).teleportBookmarks, and includes the following data:

  1. The user-defined name for the bookmark
  2. The UUID of the object it links to
  3. A colon-separated string describing the world it links to

The colon-separated string contains one of:

CelestialWorld:<coord[0]>:<coord[1]>:<coord[2]>:<planet_num>[:<moon_num]
InstanceWorld:<filename_part>:<instance_uuid>:<suffix_num>
ClientShipWorld:<player_uuid>

CelestialWorld entries point to the universe/*.world files, as you'd expect, InstanceWorld entries point to universe/unique-<filename_part>-<instance_uuid>-<suffix_num>.world files, and ClientShipWorld points to player/<player_uuid>.shipworld.

Notably absent from that info is the actual world coordinates for the bookmark - for that we need to look at the map data itself. One brute-force way would be to loop through all regions, get_entities on them, and look for entities whose uniqueId attribute matches the bookmark UUID (and then use that entity's tilePosition).

To avoid having to loop through all entities in the map, it looks like the map data types/layers 3 and 4 could be used. Layer 4 is the simplest, and seems to be a simple map of all entity UUIDs in a region, so you can get those with just world.get(4, rx, ry), though of course finding that would require looping through all keys, since you don't know the rx, ry yet. The data itself in there starts out with a single unsigned byte which is the number of IDs included in the data, and then a length-prefixed string for each ID (the length is another single unsigned byte - ordinarily 0x20/32 for UUIDs).

Layer 3 is a bit more complex, and I suspect that this is the one which is intended to be used for this sort of lookup. My first big unknown is that I have no idea what the key actually represents. The layer/type is 4, of course, but I've yet to figure out what in the world the other eight bytes are meant to describe. It's definitely not just rx, ry, and may not even be considered two shorts.

As for the data inside layer 3 nodes, it starts off with an unsigned short specifying the number of entries. Each entry starts out with a length-prefixed string, as with the layer 4 entry. Then there's two shorts which give you the region X, Y, and a further eight bytes I don't have a clue how to interpret.

So right now, for both of those layers, the best I can manage is looping through all the keys in the DB, matching on the UUID, and then using the region X,Y to grab entities in that region, and match on the UUID in there as well. I assume there's got to be a way to generate the keys for region 3 data using what exists in the Player bookmark data, but I've not figured that out yet. Among the things I tried was checking to see if the key happened to be a crc32 of the UUID, but no such luck there.

I know this is already super long-winded, but for thoroughness's sake, here's various bookmarks and their layer 3 keys/unknown data:

Processing Diana
 * Found bookmark "Space Flag" in unique-playerstation-ada79096eb153ccfd6768413dd6edeb8-1.world, at region (2, 8), coords (91, 279)
  - Node Key: 03 1C FD B1 24
    * As bytes: 28 253 177 36
    * As shorts: 7421 45348
    * As int: 486388004
  - Unknown Data: 42 B6 00 00 43 8B 80 00
    * As bytes: 66 182 0 0 | 67 139 128 0
    * As shorts: 17078 0 | 17291 32768
    * As ints: 1119223808 | 1133215744

 * Found bookmark "Station Porter" in unique-playerstation-ada79096eb153ccfd6768413dd6edeb8-1.world, at region (2, 8), coords (73, 267)
  - Node Key: 03 ED 42 3F 40
    * As bytes: 237 66 63 64
    * As shorts: 60738 16192
    * As int: 3980541760
  - Unknown Data: 42 92 00 00 43 85 80 00
    * As bytes: 66 146 0 0 | 67 133 128 0
    * As shorts: 17042 0 | 17285 32768
    * As ints: 1116864512 | 1132822528

Processing Destructicus
 * Found bookmark "Human Campground" in -452761967_-908966405_-81331760_3.world, at region (146, 31), coords (4679, 1000)
  - Node Key: 03 60 52 71 34
    * As bytes: 96 82 113 52
    * As shorts: 24658 28980
    * As int: 1616015668
  - Unknown Data: 45 92 38 00 44 7A 00 00
    * As bytes: 69 146 56 0 | 68 122 0 0
    * As shorts: 17810 14336 | 17530 0
    * As ints: 1167210496 | 1148846080

 * Found bookmark "Apex Camp" in -452761929_-908966412_-25356524_11.world, at region (121, 32), coords (3900, 1026)
  - Node Key: 03 01 1D 69 97
    * As bytes: 1 29 105 151
    * As shorts: 285 27031
    * As int: 18704791
  - Unknown Data: 45 73 C0 00 44 80 40 00
    * As bytes: 69 115 192 0 | 68 128 64 0
    * As shorts: 17779 49152 | 17536 16384
    * As ints: 1165213696 | 1149255680

 * Found bookmark "Home Base" in -452761926_-908966428_-252630074_5.world, at region (95, 31), coords (3055, 1022)
  - Node Key: 03 43 AE F5 FF
    * As bytes: 67 174 245 255
    * As shorts: 17326 62975
    * As int: 1135539711
  - Unknown Data: 45 3E F0 00 44 7F 80 00
    * As bytes: 69 62 240 0 | 68 127 128 0
    * As shorts: 17726 61440 | 17535 32768
    * As ints: 1161752576 | 1149206528

 * Found bookmark "Home Base Backup" in -452761926_-908966428_-252630074_5.world, at region (95, 31), coords (3070, 1022)
  - Node Key: 03 7E E0 50 73
    * As bytes: 126 224 80 115
    * As shorts: 32480 20595
    * As int: 2128629875
  - Unknown Data: 45 3F E0 00 44 7F 80 00
    * As bytes: 69 63 224 0 | 68 127 128 0
    * As shorts: 17727 57344 | 17535 32768
    * As ints: 1161814016 | 1149206528

 * Found bookmark "On A Moon" in -452761882_-908966422_-48613115_10_2.world, at region (47, 21), coords (1504, 679)
  - Node Key: 03 69 FD 4C 2B
    * As bytes: 105 253 76 43
    * As shorts: 27133 19499
    * As int: 1778207787
  - Unknown Data: 44 BC 00 00 44 29 C0 00
    * As bytes: 68 188 0 0 | 68 41 192 0
    * As shorts: 17596 0 | 17449 49152
    * As ints: 1153171456 | 1143586816

 * Found bookmark "Exploring" in -452761882_-908966422_-48613115_11.world, at region (25, 16), coords (802, 529)
  - Node Key: 03 0C 1F 56 16
    * As bytes: 12 31 86 22
    * As shorts: 3103 22038
    * As int: 203380246
  - Unknown Data: 44 48 80 00 44 04 40 00
    * As bytes: 68 72 128 0 | 68 4 64 0
    * As shorts: 17480 32768 | 17412 16384
    * As ints: 1145602048 | 1141129216

 * Found bookmark "On the ship!" in 1d6a362efdf17303b77e33c75f73114f.shipworld, at region (31, 32), coords (1003, 1024)
  - Node Key: 03 B1 23 15 04
    * As bytes: 177 35 21 4
    * As shorts: 45347 5380
    * As int: 2971866372
  - Unknown Data: 44 7A C0 00 44 80 00 00
    * As bytes: 68 122 192 0 | 68 128 0 0
    * As shorts: 17530 49152 | 17536 0
    * As ints: 1148895232 | 1149239296

Actually, I suppose the following in the World class might be somewhat useful. It's still rather inefficient to have to loop through all the keys in the world, if you're just looking for a single entity, but something like this should do:

    @property
    def uuid_to_region_map(self):
        """
        A dict whose keys are the UUIDs (or just IDs, in some
        cases) of entities, and whose values are the `(rx, ry)`
        coordinates in which that entity can be found.
        """
        if not self._uuid_to_region_map:
            self._uuid_to_region_map = {}
            for key in self.get_all_keys():
                (layer, rx, ry) = struct.unpack('>BHH', key)
                if layer == 4:
                    stream = io.BytesIO(self.get(layer, rx, ry))
                    num_entities = struct.unpack('B', stream.read(1))[0]
                    for i in range(num_entities):
                        strlen = struct.unpack('B', stream.read(1))[0]
                        uuid = stream.read(strlen).decode('utf-8')
                        if uuid in self._uuid_to_region_map:
                            # TODO: err?  Should we do this?
                            raise Exception('UUID {} found in two regions'.format(uuid))
                        else:
                            self._uuid_to_region_map[uuid] = (rx, ry)
        return self._uuid_to_region_map

Two things I'm unsure of with that:

  1. Should utf-8 be used for decoding? I suspect that they never get beyond latin1. If they're not latin1, does the length value indicate the number of bytes or the number of characters? (I'd assume the former, but I'd also assume that the question's moot 'cause probably the IDs never get beyond ASCII.)
  2. What should be done if the same UUID's found in more that one place?

Anyway, something like that would at least enable apps to do something like:

for mark in bookmarks:
    if mark.uuid in world.uuid_to_region_map:
        entities = world.get_entities(*world.uuid_to_region_map[mark.uuid])
        for entity in entities:
            if ('uniqueId' in entity.data and entity.data['uniqueId'] == mark.uuid):
                (ex, ey) = entity.data['tilePosition']
                print(' * Found bookmark "{}" in {}, at coords ({}, {})'.format(
                    mark.name,
                    filename,
                    ex, 
                    ey  
                    ))  
    else:
        print(' * Bookmark "{}" not found in {}'.format(
            mark.name,
            filename,
            ))  

I could PR that function if it's something you think you might want in the project, though I admit that even better would be a way to figure out those layer-3 lookups so we're not looping through all keys.

blixt commented

This is great! I’ll read through this properly later. Could I ask you to take a glance at the FORMATS.md file to see if it needs some freshening up? It seems things have changed and you have a lot more up to date knowledge than I do and I would like there to be a central source of truth for this stuff to help others be able to work on Starbound files easier.

blixt commented

Regarding the unknown data that looks like floats to me. I tried one of them ("Home Base" in -452761926_-908966428_-252630074_5.world) and the coordinates appear to match up:

>>> struct.unpack('>ff', '\x45\x3E\xF0\x00\x44\x7F\x80\x00')
(3055.0, 1022.0)

Oh, hah! Yeah, of course they're floats. I've even seen plenty of coordinates as floats in the data elsewhere, too.

And sure, I've been thinking I should probably update the README as well with a couple of the new methods, though I'd wanted to make sure those settle down a bit first. I don't think there's anything in FORMATS that's wrong, though there's probably some stuff that could be expanded. I'll take a look in a bit!

So, I'm finally getting back to this stuff a bit. The above uuid_to_region_map property has been working quite well for my own purposes; still not sure if it's something that necessarily belongs in the base class or not.

I've got some other info that I'm aggregating into my own World class. Perhaps some of this stuff might be useful to have in the real class? Let me know if so and I can put together a PR for it:

    class World(starbound.World):
        """
        Overloaded World class, originally to provide some data-access methods
        I'm not sure really belong in the main py-starbound World class,
        but now also to provide some of our own data correlation stuff which
        almost certainly *doesn't* belong in the py-starbound World class.
        """

        def __init__(self, df, filename):
            super().__init__(df)
            self._uuid_to_region_map = None
            self.filename = filename
            self.base_filename = os.path.basename(filename)
            self.types = set()
            self.biomes = set()
            self.dungeons = set()
            self.coords = (0, 0)

        def read_metadata(self):
            """
            Reads metadata, and collates some information in the metadata
            structure, for ease of use later.
            """
            super().read_metadata()
            if 'worldTemplate' in self.metadata:
                wt = self.metadata['worldTemplate']
                if wt:
                    if 'celestialParameters' in wt:
                        cp = wt['celestialParameters']
                        if cp and 'parameters' in cp and 'terrestrialType' in cp['parameters']:
                            self.types = set(cp['parameters']['terrestrialType'])
                        if cp and 'coordinate' in cp and 'location' in cp['coordinate']:
                            self.coords = (
                                    cp['coordinate']['location'][0],
                                    cp['coordinate']['location'][1],
                                    )
                    if 'worldParameters' in wt:
                        wp = wt['worldParameters']
                        if wp:
                            for key, layer in wp.items():
                                if layer and (key.endswith('Layer') or key.endswith('Layers')):
                                    if key.endswith('Layer'):
                                        layerlist = [layer]
                                    else:
                                        layerlist = layer
                                    for layer in layerlist:
                                        for dungeon in layer['dungeons']:
                                            self.dungeons.add(dungeon)
                                        for label in ['primaryRegion', 'primarySubRegion']:
                                            region = layer[label]
                                            self.biomes.add(region['biome'])
                                        for label in ['secondaryRegions', 'secondarySubRegions']:
                                            for inner_region in layer[label]:
                                                self.biomes.add(inner_region['biome'])
blixt commented

Cool! I would say that the O(1) things are fine to add if they're useful in a general sense.

However, anything else I would prefer if it was only calculated on an as-needed basis. Basically as a rule of thumb to avoid a tool that e.g., displays a list of worlds, from slowing down as we add more and more conveniences. Same goes for individual attributes. One tool might only care about the types, and another may only care about dungeons, and in those cases if we don't have a sane separation of concerns, things will get slower as more useful data fields get added.

It might also be worth considering putting some of these into a WorldInspector class instead of using inheritance. And then try to split out reading for specific values, but still grouping relevant loops to avoid unnecessary overhead when you do need all values:

class WorldInspector(object):
    def __init__(self, world):
        self.world = world
        self._biomes = None
        self._coords = None
        self._dungeons = None
        self._types = None

    @property
    def biomes(self):
        if self._biomes is not None:
            return self._biomes
        self._read_world_parameters()
        return self._biomes

    @property
    def coords(self):
        # Same but self._read_celestial_parameters()

    # ...

    def _read_world_parameters(self):
        self.world.read_metadata()
        if 'worldTemplate' in self.world.metadata:
            self._biomes = set()
            self._dungeons = set()
            # Update _biomes and _dungeons here since they're in the same pass...

    # And same for _read_celestial_parameters...

I prefer this since you're not really expanding/modifying the behavior of World, only creating a new type of utility.

Sure thing, I'll get that wrapped into a PR in a bit, then!

So this feels a bit like overkill, really; I feel vaguely like nearly all of this would be perfectly fine happening automatically instead of on-demand, but here's a version where literally everything's on-demand, regardless: apocalyptech@d3cf7bd

Look okay? As I say, seems a bit overkill, but it works well. IMO it makes sense to shove in the introspection class to the world object, rather than forcing users to keep track of it themselves.

If you're okay with this approach, I'll add in some documentation for all that and get it PR'd properly! :)

blixt commented

Hey @apocalyptech I agree it looks a bit overkill, but I actually think this is a great start that can be done very similarly without the overkill part :) I took a stab at it here: https://github.com/blixt/py-starbound/compare/world_info (I don't have Starbound installed so I couldn't test that it all works fine, sorry!)

Let me know what you think. Some notes:

First, to avoid circular references I made WorldInfo only care about the metadata dict, so it no longer keeps a reference to the World. This also showed that get_entity_uuid_coords doesn't belong in the separate WorldInfo class (because it's actively inspecting the BTree), so I moved that back into World.

Overall, since I foresee more things getting added to WorldInfo, I created a simple lazyproperty descriptor to avoid maintenance cost of keeping __init__ and the_property in sync (this also makes the code more readable by reducing if nesting).

Then I made the celestialParameters and worldParameters private lazy properties that return a named tuple with all the information they represent. This makes extending them and also accessing them from other properties much easier.

Finally I did some PEP8 cleanup and potential bug fixes. I noticed you were reading a single byte, and then a stream of bytes and decoding it as UTF-8. I presume this is actually a standard "SBON" string, which means that first byte is actually a variable length integer, which would mean it could break in some cases if we assume it's only a single byte. I also assumed that the number of entities is also a variable length integer instead of a single byte. Let me know if this is actually wrong and breaks in your testing!

Hello! Yeah, that looks good, though requires a bit of tweaking. As-written, the lazyproperty class can't handle having more than one World object -- or rather, as soon as a property's been read from a single World's metadata, it'll continue returning that data for any other World as well. To solve that, the lazyproperty class would either have to store its info using an internal dict of some sort, or just load attributes into the proper object itself. I opted for the latter since that way the data for a World gets kept inside the object itself. Feels a bit hacky but should be fine; used the attribute name _lazyproperty_<funcname> to store 'em.

The other problem I encountered was that _celestialParameters and _worldParameters were attempting to alter the namedtuples, which was failing 'cause of tuple immutability. Reworked those very slightly to compensate.

Both those changes can be seen here: https://github.com/blixt/py-starbound/compare/world_info...apocalyptech:world_info?expand=1

Good catch re: the varint stuff, I'm sure you're right about that. No errors reading any of my own world files, so I think we're good there.

blixt commented

Ah of course, how silly of me :/ That's what I get for not testing my code and you can tell I haven't touched Python for a while now! Thanks :)

If you're happy with this version then go ahead and squash it and PR it, otherwise I'm happy to do it too.

BTW for future commits, a few notes on code style that I'd like to try and follow in this repo:

  • For commit messages, first line should be <= 50 chars and separated from any other text by a blank line (this helps GitHub format the commits nicely without the ... button)
  • PEP8 rules (in general double newlines between types, and trying to keep lines relatively short although I think instead of 80 chars, 100 or 120 are ok max line widths in this 16:9+ age)

Sounds good! I'll get some docs added as well, and bundle that into a PR once that's ready.

Also sounds good re: style -- I admit I tend to be a bit laissez-faire with my own commits. :)