mapbox/vtquery

when to use std::int32_t and std::int64_t


Description

Looks like there have been some conversations around this, and I thought we should discuss all the places where we need to be explicit about the bit size of an int and where we shouldn't. I'll start us off with refs to previous conversations and lists of our current int32_t and int64_t usage. It would be great to get some follow-up comments about the reasoning behind this.

It looks like the idea is that we want to use int32_t for x, y, z, and extent until we cast them to double or int64_t when we store them in a mapbox::geometry::point. It also looks like we use int64_t for the id and for property values, referred to as v in the code.
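For concreteness, here is a minimal sketch of that convention, assuming the point type from mapbox/geometry.hpp; the helper name is made up for illustration:

```cpp
#include <cstdint>
#include <mapbox/geometry/point.hpp>

// Hypothetical helper, not vtquery's actual code: tile-local coordinates
// fit in 32 bits (extent is typically 4096), but the stored point uses doubles.
mapbox::geometry::point<double> to_point(std::int32_t x, std::int32_t y) {
    // Widening int32_t -> double is always exact, so no range check is needed.
    return {static_cast<double>(x), static_cast<double>(y)};
}
```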

References

int32_t / uint32_t usage

int64_t / uint64_t usage

cc @mapbox/core-tech

joto commented

Here is my thinking around this:

Generally it is best to use specific fixed-width types (like uint32_t) instead of the classical types (like int), because the classical types tend to break when switching platforms (Windows vs. Linux/macOS, 32-bit vs. 64-bit OS). Fixed widths also make it easier to do checks ("will this value fit into my type?") correctly, and they help with understanding how structs are laid out and with making sure data is packed well in them. (clang-tidy now has a check based on the Google style guide that flags uses of classical types and recommends changes.)
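As a small illustration of the layout point, a sketch with a hypothetical struct:

```cpp
#include <cstdint>

// With fixed-width members the layout is the same on every platform:
// four 4-byte fields, 16 bytes total, no padding surprises.
struct TileCoord {
    std::int32_t x;
    std::int32_t y;
    std::int32_t z;
    std::int32_t extent;
};
static_assert(sizeof(TileCoord) == 16, "expected a tightly packed 16-byte struct");
```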

The exception is code that needs to interface with, for instance, C libraries that use the classical types; take extra care in those cases when converting between types.
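A sketch of what that extra care can look like; the C function here is made up, and note that long is 32 bits on 64-bit Windows but 64 bits on 64-bit Linux/macOS:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

extern "C" void c_library_set_id(long id);  // hypothetical C API

void set_id(std::int64_t id) {
    // long may be only 32 bits, so this check is not redundant on every platform.
    if (id < std::numeric_limits<long>::min() ||
        id > std::numeric_limits<long>::max()) {
        throw std::overflow_error("id does not fit into the C library's long");
    }
    c_library_set_id(static_cast<long>(id));
}
```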

Then there is the question of the size you want. From a data modelling standpoint you want the type that best fits the bill. OSM coordinates or tile coordinates fit in 32 bits because of their limited resolution, but coordinates in general need a double or a 64-bit integer. Also, smaller is better, because it takes less space in memory and might be faster. This is most important when there is a lot of data, say an array of thousands of coordinates. From a CPU performance point of view anything smaller than 32 bits is probably not worth it, because registers are 32 or 64 bits anyway, but it still might help with memory/cache performance.
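To make the size argument concrete, a sketch of the memory cost of bulk coordinate data (the struct names are made up; the static_asserts verify the assumed layout):

```cpp
#include <cstdint>

struct P32 { std::int32_t x, y; };  //  8 bytes per point
struct P64 { std::int64_t x, y; };  // 16 bytes per point

static_assert(sizeof(P32) == 8,  "no padding expected");
static_assert(sizeof(P64) == 16, "no padding expected");

// An array of one million points costs ~8 MB as P32 but ~16 MB as P64;
// the smaller type also fits twice as many points per cache line.
```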

signed vs. unsigned: I think it is general wisdom these days that using signed ints is better than unsigned (see http://soundsoftware.ac.uk/c-pitfall-unsigned.html for instance). I have heard several C++ ISO committee members argue that making size_t unsigned was a mistake. You have to use unsigned types for anything where you want to do bit manipulation, because some bit manipulations on signed types are undefined behavior (clang-tidy warns you about this). I struggle with this often myself, because so many (std) library calls use unsigned ints that it is hard to get this right.
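The classic way the unsigned pitfall bites, as a sketch:

```cpp
#include <cstddef>
#include <vector>

void walk_backwards(const std::vector<int>& v) {
    // BUG: std::size_t is unsigned, so i >= 0 is always true, and
    // v.size() - 1 wraps to a huge value when v is empty:
    // for (std::size_t i = v.size() - 1; i >= 0; --i) { /* use v[i] */ }

    // A signed index avoids the wraparound entirely:
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(v.size()) - 1; i >= 0; --i) {
        // use v[i]
    }
}
```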

There are some gotchas with 8-bit types: the classical types are signed char and unsigned char, and the plain char type is defined to be either signed char or unsigned char depending on the platform. uint8_t is an unsigned char under the hood, so std::cout << uint8_t(123); will output a character, not a number.
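For example:

```cpp
#include <cstdint>
#include <iostream>

int main() {
    std::uint8_t v = 123;
    std::cout << v << "\n";                    // prints '{', the character with code 123
    std::cout << static_cast<int>(v) << "\n";  // prints 123
}
```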

Consistency is really important: casting between types is always a point where things could break, so minimizing the number of types and the conversions between them is good.

So there is no perfect solution here. One other thing I like to think about when deciding: What happens if something overflows? Is that actually a problem? And: What happens if I check for an overflow (say, on a downcast) and the check fails? How can I handle or report the error? When I don't have a good way of handling this, I must use a larger type.
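A sketch of the "use a larger type" answer, with a made-up delta-summing helper: overflow in the middle of such a loop has no good reporting path, so widening the accumulator sidesteps the question.

```cpp
#include <cstdint>
#include <vector>

std::int64_t sum_deltas(const std::vector<std::int32_t>& deltas) {
    // Checking each addition for int32_t overflow would be awkward and
    // leaves no sensible way to report an error from here; accumulating
    // in int64_t means the sum of any realistic number of int32_t values fits.
    std::int64_t sum = 0;
    for (std::int32_t d : deltas) {
        sum += d;
    }
    return sum;
}
```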