/Gimpo

Provides robust and extensible types for data manipulation. Includes Apache Arrow compatible dataframe

Primary LanguageC#MIT LicenseMIT

Gimpo

Gimpo project was inspired by both Microsoft Data Analysis and Apache Arrow. It is intended to provide a set of data structures and building blocks for data analytics.

Mostly I am going to use it for experimenting with alternative implementation and ideas before proposing change requests into original libraries.

Gimpo DataFrame

Provides a DataFrame - tabular data structure common to many data processing libraries. Gimpo DataFrame is fully compatible with Apache Arrow format.

Gimpo Data Primitives

The array size in C# is limited to maximum index of 0x7FFFFFC7(2,147,483,591), Array.MaxLength. Framework classes Span<T> and Memory<T> both have the same restriction. In terms of bytes[], it is about 2GB, which is very low in the modern world. Classes provided by Gimpo Data Primitives use native memory and allow to go beyond this limit.

Currenly it contains:

  • NativeMemoryBuffer - the simplest abstraction over a contiguous memory block. Provides method to allocate/dispose, indexing, resizing and cloning.
  • NativeMemoryVector - vector (or List in term of C#) for storing structs.
  • NativeMemoryNullableVector - the same as NativeMemoryVector, but allows to store nullable structs.