nyxxxie/serenity

Create typing system

nyxxxie opened this issue · 5 comments

We need a way to define types to be used with the template system. This feature should be implemented in two parts:

  • Type: Provides to_string/from_string transforms and an indication of size in memory. Uniquely identified by a string (e.g. "int32_t").
  • Typesystem: Manages and stores types. Ensures there are no duplicates and allows for easy access via a for loop or type name lookup.
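A minimal sketch of the two parts described above, to make the split concrete. The class names `Type` and `TypeSystem`, the method names `add` and `get`, and the hex formatting chosen for `byte` are assumptions made for illustration, not the project's actual interface.

```python
from typing import Callable, Dict, Iterator


class Type:
    """Hypothetical Type: a unique name, a size in bytes, and string transforms."""

    def __init__(self, name: str, size: int,
                 to_string: Callable[[bytes], str],
                 from_string: Callable[[str], bytes]) -> None:
        self.name = name            # unique identifier, e.g. "int32_t"
        self.size = size            # size in memory, in bytes
        self.to_string = to_string
        self.from_string = from_string


class TypeSystem:
    """Hypothetical registry that stores types by name and rejects duplicates."""

    def __init__(self) -> None:
        self._types: Dict[str, Type] = {}

    def add(self, new_type: Type) -> None:
        if new_type.name in self._types:
            raise ValueError(f"duplicate type name: {new_type.name!r}")
        self._types[new_type.name] = new_type

    def get(self, name: str) -> Type:
        # Lookup by type name; raises KeyError for unknown names.
        return self._types[name]

    def __iter__(self) -> Iterator[Type]:
        # Allows `for t in typesystem: ...`
        return iter(self._types.values())


# Assumed representation: a byte is shown as two hex digits.
byte = Type("byte", 1,
            to_string=lambda data: f"0x{data[0]:02x}",
            from_string=lambda text: bytes([int(text, 16)]))

ts = TypeSystem()
ts.add(byte)
for t in ts:
    print(t.name, t.size)
```

Keeping the registry separate from the types means the duplicate check and the lookup live in one place, and a type itself stays a small, testable unit.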

In addition, we should provide certain builtin types off the bat (byte, char, etc). These should be placed in their own directory in the project called types.

Template system should have at least these types defined:
i8, u8, i16le, i16be, u16le, u16be, i32le, i32be, u32le, i64le, i64be, u64le, u64be, ieee754f32le, ieee754f32be, ieee754f64le, ieee754f64be, ieee754f128le, ieee754f128be and intelf80
Other types should alias to the types above, depending on settings:
i8, int8_t to i8
i16, int16_t to i16le or i16be
i32, int32_t to i32le or i32be
i64, int64_t to i64le or i64be
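As a rough illustration of the naming scheme above, most of the fixed-width names map directly onto format strings of Python's standard `struct` module, and the endian-neutral aliases can be resolved with a default byte order. The `resolve` and `decode` helpers and the `endian` parameter are hypothetical; `ieee754f128*` and `intelf80` are left out because `struct` has no format code for them, and `u32be` is included only on the assumption that its absence from the list is an oversight.

```python
import struct

# struct format strings for a subset of the fixed-width type names.
FIXED_FORMATS = {
    "i8": "<b", "u8": "<B",
    "i16le": "<h", "i16be": ">h", "u16le": "<H", "u16be": ">H",
    "i32le": "<i", "i32be": ">i", "u32le": "<I", "u32be": ">I",
    "i64le": "<q", "i64be": ">q", "u64le": "<Q", "u64be": ">Q",
    "ieee754f32le": "<f", "ieee754f32be": ">f",
    "ieee754f64le": "<d", "ieee754f64be": ">d",
}


def resolve(name, endian="le"):
    """Map an alias such as 'int32_t' or 'i32' to a fixed-width type name."""
    base = name[:-2] if name.endswith("_t") else name    # int32_t -> int32
    if base.startswith("uint"):
        base = "u" + base[4:]                            # uint32 -> u32
    elif base.startswith("int"):
        base = "i" + base[3:]                            # int32 -> i32
    if base in FIXED_FORMATS:                            # i8, or already suffixed
        return base
    candidate = base + endian                            # i32 -> i32le / i32be
    if candidate in FIXED_FORMATS:
        return candidate
    raise KeyError(f"unknown type: {name!r}")


def decode(name, data, endian="le"):
    """Decode `data` as the type that `name` resolves to."""
    return struct.unpack(FIXED_FORMATS[resolve(name, endian)], data)[0]


print(resolve("int16_t", endian="be"))
print(decode("u16", b"\x01\x02", endian="be"))
```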

u-prefixed types should alias respectively. C types must follow C specification. That means char aliases to signed char or unsigned char depending on project settings.
char may alias to signed char or unsigned char
signed char may alias to i8, i16, i32, i64
unsigned char may alias to u8, u16, u32, u64
short and int may alias to i16 or bigger
long may alias to i32 or bigger
long long may alias to i64 or bigger

The unsigned variants must alias to their unsigned counterparts respectively. Don't forget that short, short int, signed short and signed short int are all the same type. The same holds for long and long long. Additionally, floating point number types must follow these rules:
float usually is an alias to ieee754f32le but any floating point type may be set
double usually is an alias to ieee754f64le but any floating point type may be set
long double may alias to ieee754f64le, ieee754f128le or intelf80 or other

Additionally, a non-standard type wchar_t may alias to i16, u16 or larger.

Spade should probably include alias definitions for various platforms out of the box, named platform-arch-compiler (windows-x86-msvc, windows-x86_64-msvc, windows-x86-gcc, windows-x86_64-gcc, linux-x86, linux-x86_64, m68k, etc.), so that users can quickly select the one they want instead of configuring every alias manually (though manual configuration should remain an option).
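A sketch of how such platform presets could be stored and resolved. The table layout, the `SYNONYMS` map, and the `resolve` helper with its `overrides` parameter are hypothetical. The two presets follow the commonly documented data models (LLP64 for 64-bit Windows with MSVC, LP64 for 64-bit Linux) and are illustrative rather than exhaustive.

```python
# Spellings that name the same C type.
SYNONYMS = {
    "short int": "short", "signed short": "short", "signed short int": "short",
    "long int": "long", "signed long": "long", "signed long int": "long",
    "long long int": "long long", "signed long long": "long long",
}

# Hypothetical presets keyed by platform-arch-compiler.
PLATFORM_ALIASES = {
    "windows-x86_64-msvc": {
        "char": "signed char", "signed char": "i8", "unsigned char": "u8",
        "short": "i16le", "int": "i32le", "long": "i32le", "long long": "i64le",
        "float": "ieee754f32le", "double": "ieee754f64le",
        "long double": "ieee754f64le", "wchar_t": "u16le",
    },
    "linux-x86_64": {
        "char": "signed char", "signed char": "i8", "unsigned char": "u8",
        "short": "i16le", "int": "i32le", "long": "i64le", "long long": "i64le",
        "float": "ieee754f32le", "double": "ieee754f64le",
        "long double": "intelf80", "wchar_t": "i32le",
    },
}


def resolve(name, platform, overrides=None):
    """Follow aliases until a non-aliased (builtin) type name is reached."""
    table = dict(PLATFORM_ALIASES[platform])
    table.update(overrides or {})        # manual configuration wins over the preset
    seen = set()
    name = SYNONYMS.get(name, name)
    while name in table:
        if name in seen:
            raise ValueError(f"alias cycle at {name!r}")
        seen.add(name)
        name = SYNONYMS.get(table[name], table[name])
    return name


print(resolve("long", "windows-x86_64-msvc"))
print(resolve("long", "linux-x86_64"))
print(resolve("char", "linux-x86_64", overrides={"char": "unsigned char"}))
```

Resolving through a chain (char to signed char to i8) keeps the "char is signed or unsigned" project setting as a single override instead of a second copy of each preset.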

I think some of your proposed types can be added as a separate feature (for example, the le and be versions of types). Let's define the default set of types required to complete this feature as byte and char, and add the additional builtins in a new feature. I do agree the typesystem should support aliasing (creating a type that refers to another type).

This comment is more of a note to myself. Since #34 calls for dynamically sized types, the design currently being developed may be inadequate for the task. An alternative design for this typesystem feature would be to have the typemanager store the Python class of a type itself (as opposed to an object of that class, as we currently do), with each object then serving as a representation of a series of bytes. Each type would take in the bytes through its constructor or through from_string() or from_bytes() (to allow reuse), and would allow querying the stored data through to_string(), to_bytes(), and size(). Usage would then look like this:

int_bytes = b"\xde\xad\xbe\xef" # Some bytes we get from some random source

# We can get the Int32 type parser from a typesystem...
from spade.typesystem.typemanager import TypeManager
tm = TypeManager() # Initializes typesystem and adds default types
Int32 = tm.get_type("int32") # Returns usable class

# ...or we can just import it directly, since it's a default type
from spade.typesystem.types.int32 import Int32

# Usage 1 (The standard way to use it)
int_parse = Int32(int_bytes)
print(int_parse.to_string())
print(str(int_parse)) # Make str call to_string because why not

# Usage 2 (more verbose)
int_parse = Int32()
int_parse.from_bytes(int_bytes)
print(int_parse.to_string())

# Usage 3 (similar to typecasting)
print(Int32(int_bytes).to_string())
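For reference, a type class compatible with the usage above might look roughly like the sketch below. This is not the merged implementation: the little-endian signed layout, the `_FORMAT` attribute, and the choice to make `size()` a classmethod are all assumptions, and a dynamically sized type would need `size()` to depend on the instance instead.

```python
import struct


class Int32:
    """Hypothetical signed 32-bit little-endian integer type."""

    _FORMAT = "<i"   # assumed layout; the issue does not fix a byte order

    def __init__(self, data=None):
        self._data = None
        if data is not None:
            self.from_bytes(data)

    def from_bytes(self, data):
        if len(data) != self.size():
            raise ValueError(f"expected {self.size()} bytes, got {len(data)}")
        self._data = bytes(data)

    def from_string(self, text):
        # Base 0 accepts decimal as well as 0x-prefixed input.
        self._data = struct.pack(self._FORMAT, int(text, 0))

    def to_bytes(self):
        return self._data

    def to_string(self):
        return str(struct.unpack(self._FORMAT, self._data)[0])

    @classmethod
    def size(cls):
        return struct.calcsize(cls._FORMAT)

    def __str__(self):
        return self.to_string()


int_parse = Int32(b"\xde\xad\xbe\xef")
print(int_parse.to_string())
print(str(int_parse))
```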

The design above also allows us to write various utility functions, such as one that takes in a byte stream and extracts as many instances of a type as the stream holds. For instance, if we fed this function 9 bytes and asked it to parse out all int32s, it would return 2 Int32 objects, one per complete integer, and the trailing byte would be left over.
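The utility described here could be sketched as follows. `parse_all` is a hypothetical name, and the stand-in `Int32` class (with a `SIZE` attribute and little-endian decoding) exists only to keep the example self-contained.

```python
import struct


class Int32:
    """Minimal stand-in type: 4 bytes, decoded as little-endian signed."""

    SIZE = 4

    def __init__(self, data):
        self.value = struct.unpack("<i", data)[0]


def parse_all(type_cls, data):
    """Parse as many complete `type_cls` values as `data` contains.

    Trailing bytes that do not fill a whole value are ignored.
    """
    size = type_cls.SIZE
    count = len(data) // size
    return [type_cls(data[i * size:(i + 1) * size]) for i in range(count)]


stream = bytes(range(1, 10))          # 9 bytes: 0x01 .. 0x09
values = parse_all(Int32, stream)
print(len(values))                    # 2, the ninth byte is left over
print([v.value for v in values])
```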

I merged the final interface for implementing types as well as implementations for char, byte, int32le, and uint32le. I'll keep this issue open until I've implemented all (or most) types mentioned above.