Hierarchal Batch Table in 3D Tiles Next?
pjcozzi opened this issue · 2 comments
In 3D Tiles 1.0 3DTILES_batch_table_hierarchy enables storage-efficient representations of hierarchal relationships such as
- a door is in a room is on a floor is in a building.
This is used for tiling CityGML in Cesium ion, and for Cesium OSM Buildings. CC @kring
AFAIK this is not currently possible in 3D Tiles Next; instead, duplicate metadata needs to be stored. I think it is OK to solidify the current specs, #556, before starting anything new, but can we:
- Communicate to the developer community why 3D Tiles Next doesn't support this
- Add a comment here on how 3D Tiles Next may approach hierarchal metadata representation
I found some detailed notes I had about allowing a hierarchy of classes. I had a few ideas, though at the time it was low priority so never came up for discussion.
NOTE: examples below were based on the older 3DTILES_feature_metadata
schema.
NOTE: Towards the end I have some diagrams in the section on Space Complexity
Option 1: Inherit Columns
{
"classes": {
"building": {
"properties": {
"estimatedHeight": {
"type": "STRING"
},
"longitude": {
"type": "FLOAT64"
},
"latitude": {
"type": "FLOAT64"
}
}
},
"buildingPart": {
// point to parent class
"parent": "building",
"properties": {
"elementType": {
"type": "STRING"
},
"elementID": {
"type": "FLOAT64"
},
// Let's make a name collision with
// geometry.color with different types. I'm proposing that
// the subclass overrides this. Other options would be to
// require that the types match, or just disallow collisions.
"color": {
"type": "ARRAY",
"componentType": "UINT8",
"componentCount": 4
}
}
},
"geometry": {
// point to parent class
"parent": "buildingPart",
"properties": {
"color": {
"type": "STRING"
}
}
}
},
"instanceTables": {
"buildingTable": {
"class": "building",
"properties": {
"estimatedHeight": {
"bufferView": 1,
"offsetBufferViews": [2]
},
"longitude": {"bufferView": 3},
"latitude": {"bufferView": 4}
}
},
"buildingPartTable": {
"class": "buildingPart",
"properties": {
// Separate copy of columns from parents
"estimatedHeight": {
"bufferView": 1,
"offsetBufferViews": [2]
},
"longitude": {"bufferView": 3},
"latitude": {"bufferView": 4},
// plus subclass-only columns
"elementID": {
"bufferView": 5,
"offsetBufferViews": [6]
},
// name collision example.
// This is the parent table, so this has type ARRAY[UINT8, 4]
"color": {"bufferView": 7}
}
},
"geometryTable": {
// Separate copy of columns from parents
"estimatedHeight": {
"bufferView": 1,
"offsetBufferViews": [2]
},
"longitude": {"bufferView": 3},
"latitude": {"bufferView": 4},
"elementID": {
"bufferView": 5,
"offsetBufferViews": [6]
},
// plus subclass-only columns
// name collision example. This is the subclass, so it
// shadows the parent variable. Thus this has type STRING
"color": {
"bufferView": 7,
"offsetBufferViews": 8
}
}
}
}
Pros:
- no extra auxillary buffers needed
Cons:
- more bufferviews to keep track of
Example
buildingTable
:
instanceId | estimatedHeight | longitude | latitude |
---|---|---|---|
0 | 100.0 | -75.164031 | 39.953264 |
1 | 400.0 | -74.013334 | 40.712270 |
buildingPartTable
:
Notice that we inherit columns from the parent, adding elementType, elementId and color.
The values are independent of the parent though.
instanceId | estimatedHeight | longitude | latitude | elementType | elementId | color |
---|---|---|---|---|---|---|
0 | 200.0 | -122.477886 | 37.813934 | "elementType1" | 0 | [255, 0, 0] |
1 | 200.0 | -122.479090 | 37.825700 | "elementType2" | 1 | [255, 0, 0] |
geometryTable
:
Notice that there was a name collision. Here I suggest that the subclass' property shadows the parent. See notes about name collisions below.
instanceId | estimatedHeight | longitude | latitude | elementType | elementId | color |
---|---|---|---|---|---|---|
0 | 180.0 | 38.624548 | -90.184862 | "elementType1" | 0 | "GREY" |
1 | 50.0 | 38.624548 | -90.184862 | "elementType2" | 1 | "GREY" |
Resolving Name Collisions:
- Option 1.1: Subclass properties shadow their parents. This is like the example above, where
buildingPart.color: ARRAY[UINT8, 4]
is shadowed bygeometry.color: STRING
. This could be problematic though in cases where geometry needs to be used as a buildingPart - Option 1.2: like 1.1, but require that the types match (e.g. the above example would be invalid)
- Option 1.3: Disallow name conflicts between parent and child.
Option 2: Include a parentIds
column
In this option, each child table includes an extra property array called parentId
which stores instanceIds that index into the parent table.
{
"classes": {
"building": {
"properties": {
"estimatedHeight": {
"type": "STRING"
},
"longitude": {
"type": "FLOAT64"
},
"latitude": {
"type": "FLOAT64"
}
}
},
"buildingPart": {
// point to parent class
"parent": "building",
"properties": {
"elementType": {
"type": "STRING"
},
"elementID": {
"type": "FLOAT64"
},
// Let's make a name collision with
// geometry.color with different types. Unlike the
// "inherit columns" approach, we can store values at both
// the parent and child scope. However, then it becomes a question
// of how to access each version of color from the API.
// we could always make it explicit: buildingPart.color vs geometry.color.
// Or we could disallow name conflicts.
"color": {
"type": "ARRAY",
"componentType": "UINT8",
"componentCount": 4
}
}
},
"geometry": {
// point to parent class
"parent": "buildingPart",
"properties": {
"color": {
"type": "STRING"
}
}
}
},
"instanceTables": {
"buildingTable": {
"class": "building",
"properties": {
// These bufferViews must store values for both the parents
// and _all children classes_ that inherit from building
// so we have instances for building, buildingPart, and geometry
"estimatedHeight": {
"bufferView": 1,
"offsetBufferViews": [2]
},
"longitude": {"bufferView": 3},
"latitude": {"bufferView": 4}
}
},
"buildingPartTable": {
"class": "buildingPart",
// parentIds is like a UINT (what size?) property
// that is an instance ID in the _parent_ table.
// in this case, it indexes into buildings
"parentIds": {
"bufferView": 4
},
"properties": {
// In this case, we have instances from buildingPart and geometry
"elementID": {
"bufferView": 5,
"offsetBufferViews": [6]
},
// name collision example. This property array can be defined
// separately from geometryTable.properties.color. See note
// in the class definition.
"color": {"bufferView": 7}
}
},
"geometryTable": {
"class": "geometry",
"parentIds": {
"bufferView": 8
},
"properties": {
"color": {
"bufferView": 7,
"offsetBufferViews": 8
}
}
}
}
}
Pros:
- Data for each table is stored by the relevant class
- Fewer bufferView declarations
- Naming collisions allow parent and class version of variable, even with different types)
Cons:
- Need additional
parentId
property arrays - since parentId is essentially a pointer, need to be careful about dangling references
Example
buildingTable
:
Notice that the parent table now has many more entries for all the subclass instances
instanceId | estimatedHeight | longitude | latitude |
---|---|---|---|
(building instances) |
|||
0 | 100.0 | -75.164031 | 39.953264 |
1 | 400.0 | -74.013334 | 40.712270 |
(buildingPart instances) |
|||
2 | 200.0 | -122.477886 | 37.813934 |
3 | 200.0 | -122.479090 | 37.825700 |
(geometry instances) |
|||
4 | 180.0 | 38.624548 | -90.184862 |
5 | 50.0 | 38.624548 | -90.184862 |
buildingPartTable
:
Notice the new column parentId
, and again, we have additional rows from the geometry subclass. Furthermore, due to the name collision, there are two different "color" properties, one in the parent table and one in the child table. No collisions in the data, but you'd need to allow a way to distinguish them, perhaps with a fully qualified name (e.g. geometry.color
)
instanceId | elementType | elementId | color | parentId |
---|---|---|---|---|
(buildingPart instances) |
||||
0 | "elementType1" | 0 | [255, 0, 0] | 2 |
1 | "elementType2" | 1 | [255, 0, 0] | 3 |
(geometry instances) |
||||
2 | "elementType1" | 0 | [128, 128, 128] | 4 |
3 | "elementType2" | 1 | [128, 128, 128] | 5 |
geometryTable
:
instanceId | color | parentId |
---|---|---|
0 | "GREY" | 2 |
1 | "GREY" | 3 |
Resolving Name Collisions
Option 2.1: Consider each version of the name as a separate variable, identified by a fully-qualified name like buildingPart.color
vs. geometry.color
to resolve conflicts.
Option 2.2: Disallow naming conflicts between parent and child class.
Option 3: Virtual parentId
Option 2 can be tweaked to remove the overhead of storing pointers to parents. However, there's a caveat about generalizing to trees of classes.
If we enforced a rule that the parent table is listed in order from base class down the inheritance chain, then we can (at runtime) compute the offsets:
classOffsets = {A: 0, B: hA, C: hA + hB]
Then parentInstanceId = classOffsets[parentClass] + child.instanceId
.
Thus no pointers need to be stored!
EDIT: One catch: supporting inheritance trees may be slightly trickier. See the section on "Inheritance Trees" below.
{
"classes": {
"building": {
"properties": {
"estimatedHeight": {
"type": "STRING"
},
"longitude": {
"type": "FLOAT64"
},
"latitude": {
"type": "FLOAT64"
}
}
},
"buildingPart": {
// point to parent class
"parent": "building",
"properties": {
"elementType": {
"type": "STRING"
},
"elementID": {
"type": "FLOAT64"
},
// Let's make a name collision with
// geometry.color with different types. Unlike the
// "inherit columns" approach, we can store values at both
// the parent and child scope. However, then it becomes a question
// of how to access each version of color from the API.
// we could always make it explicit: buildingPart.color vs geometry.color.
// Or we could disallow name conflicts.
"color": {
"type": "ARRAY",
"componentType": "UINT8",
"componentCount": 4
}
}
},
"geometry": {
// point to parent class
"parent": "buildingPart",
"properties": {
"color": {
"type": "STRING"
}
}
}
},
"instanceTables": {
"buildingTable": {
"class": "building",
"properties": {
// These bufferViews must store values for both the parents
// and _all children classes_ that inherit from building
// so we have instances for building, buildingPart, and geometry
"estimatedHeight": {
"bufferView": 1,
"offsetBufferViews": [2]
},
"longitude": {"bufferView": 3},
"latitude": {"bufferView": 4}
}
},
"buildingPartTable": {
"class": "buildingPart",
// No parentIds needed!
"properties": {
"elementID": {
"bufferView": 5,
"offsetBufferViews": [6]
},
"color": {"bufferView": 7}
}
},
"geometryTable": {
"class": "geometry",
// No parentIds needed!
"properties": {
"color": {
"bufferView": 7,
"offsetBufferViews": 8
}
}
}
}
}
Pros:
- No need to store pointers in the file
- strict storage layout could be a good thing, only one correct way to store data.
- Since the pointers are implicit, validation is easier (just check that the length of the parent array is correct). No worries about dangling pointers!
Cons:
- Small amount of runtime overhead to compute offsets at each access
- This method could be more tricky to explain clearly.
Inheritance Trees
This method has one caveat: It was derived using an example with a single inheritance chain. What changes when the user specifies an inheritance tree?
- How do we define the ordering of classes for storage in the parent class table? A preorder of the tree would be fine, except siblings are unordered... we could require an explicit listing of subclasses of the direct children. Here's one idea, though it feels a bit weird:
"classes": {
"parent": {
// ordering the parents' direct children
childrenOrder: ["childA", "childB"]
},
"childA": {
"parent": "parent"
},
"childB": {
"parent": "parent"
}
}
- how does calculating the
classOffsets
change? It would be similar to before; a cumulative sum of instance counts, but you'd need to do this on a preordered list of classes.
Comparison with Batch Table Hierarchy
The 3D Tiles Batch Table Hierarchy spec works a little bit differently in the following way:
Batch Table Hierarchy:
- All instances of all classes are assumed to be listed in one big array, so each has a unique
batchId
. - Each class has a unique id, numbered in order that they appear in the instance table
- up to 3 auxillary arrays of length
numberOfInstances
are needed to mention parents:- classIds: each instance explicitly labels a class
- parentCounts: how many levels up the inheritance tree are valid for this instance. This enables partial inheritance chains (why??)
- parentIds: the batch Id of the parent instance. Since this indexes into the global array of indices, this allows some corner cases like an instance with a parent in the same class (behavior of this explicitly undefined in the spec)
3D Tiles Next:
- Each class has a different set of instanceIds
- Classes are stored as a dictionary, not an array, so they don't have a defined order
- If we make the following assumptions, you can get rid of the
classIds
andparentCounts
buffer, and make other simplifications:- inheritance is only defined between classes, not between arbitrary instances
- A subclass must inherit from all its ancestors, no partial inheritance chains
- if columns are inherited, then no
parentIds
buffer is needed.
Space Complexity Analysis
Let's consider a case like above where we have class C extends B extends A
. I want to compute the number of cells in all three tables and any auxillary buffers needed.
Note: In the analysis below, I assume that there are no name collisions for simplicity.
To do this, let's define some helper variables:
Variable | Description |
---|---|
wA | "width" of instance table A, i.e. the number of properties (columns) unique to class A |
hA | "height" of instance table A, i.e. the number of instances (rows) unique to class A |
wB | "width" of instance table B |
hB | "height" of instance table B |
wC | "width" of instance table C |
hC | "height" of instance table C |
Option 1: Inherit columns
Overhead needed: 0 (we only store the values we need, no pointers)
bufferViews needed:
- class A:
wA
- class B:
wA + wB
(need a separate bufferView for each parent column. This is not shown in the diagram) - class C:
wA + wB + wC
- total:
3wA + 2wB + wC
Option 2: parentId
overhead: hB + 2hC
(parentId pointers)
the generalized pattern is 1hB + 2hC + 3hD + ... (n-1)hX
where n
is the length of the inheritance chain. this grows quadratically with n
Option 3: Virtual parentId
Overhead: 0
bufferVIews needed:
- class A:
wA
- class B:
wB
(parent columns are stored in the table for classA, no bufferViews needed) - class C:
wC
Batch Table Hierarchy for Comparison:
Overhead: 3hA + 2 * 3hB + 3 * 3hC
General pattern: 1 * (3hA) + 2 * (3hB) + 3 * (3hC) + ... + n * (3hX)
. Again, n
stands for the length of the inheritance chain. Again, this is quadratic in n
, but has strictly greater storage requirements than Option 2 because even the base class stores parent pointers to itself, plus there are 3 arrays rather than just a single pointer.
I'm reading this again after having opened #641. It seems like the most important decision is this one.
inheritance is only defined between classes, not between arbitrary instances
I'm not sure if class inheritance alone provides enough composability, especially for city and BIM/CAD use cases where you might have a "door" entity that could belong to many types of parent entities.
One concept I like in option 2 is the separate parentIds
column. Originally I thought PARENT_IDS
could be a semantic but it doesn't really make sense to include a property like that in the class definition. A separate property next to the property table makes more sense.