datafusion-orc

Implementation of ORC file format

Primary language: Rust. License: Apache License 2.0 (Apache-2.0).

Read Apache ORC in Rust.

  • Read ORC files
  • Read stripes (converting protobuf metadata into memory regions)
  • Decode stripes (the math of decoding stripe data into, e.g., booleans, RLE runs, etc.)
  • Decode ORC data into Arrow data types (async and sync)
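
The "decode stripes" step above can be illustrated with ORC's byte run-length encoding, which also underlies boolean columns. What follows is a minimal standalone sketch based on the encoding rules in the ORC specification, not this crate's actual decoder; the function names `decode_byte_rle` and `decode_booleans` are hypothetical.

```rust
/// Decode an ORC byte-RLE stream into raw bytes.
/// Returns `None` if the input ends in the middle of a group.
/// (Hypothetical helper for illustration, not part of the crate's API.)
fn decode_byte_rle(input: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        // The control byte is interpreted as a signed value.
        let control = input[pos] as i8;
        pos += 1;
        if control >= 0 {
            // Run: control is (run length - 3), followed by the repeated value.
            let value = *input.get(pos)?;
            pos += 1;
            let run_len = control as usize + 3;
            out.extend(std::iter::repeat(value).take(run_len));
        } else {
            // Literal list: control is the negated count of bytes that follow.
            let len = -(control as i16) as usize;
            let literals = input.get(pos..pos + len)?;
            pos += len;
            out.extend_from_slice(literals);
        }
    }
    Some(out)
}

/// Decode `count` booleans: the byte-RLE payload holds bits packed
/// from the most significant bit to the least significant bit.
fn decode_booleans(input: &[u8], count: usize) -> Option<Vec<bool>> {
    let bytes = decode_byte_rle(input)?;
    if bytes.len() * 8 < count {
        return None;
    }
    let bits = (0..count)
        .map(|i| ((bytes[i / 8] >> (7 - i % 8)) & 1) == 1)
        .collect();
    Some(bits)
}
```

Using the ORC specification's own examples, `decode_byte_rle(&[0x61, 0x00])` yields one hundred zero bytes, and `decode_booleans(&[0xff, 0x80], 8)` yields one `true` followed by seven `false` values.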

Current Support

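| Column Encoding          | Read | Write | Rust Type             | Arrow DataType           |
| ------------------------ | ---- | ----- | --------------------- | ------------------------ |
| SmallInt, Int, BigInt    |      |       | i16, i32, i64         | Int16, Int32, Int64      |
| Float, Double            |      |       | f32, f64              | Float32, Float64         |
| String, Char, and VarChar |     |       | string                | Utf8                     |
| Boolean                  |      |       | bool                  | Boolean                  |
| TinyInt                  |      |       | i8                    | Int8                     |
| Binary                   |      |       | Vec<u8>               | Binary                   |
| Decimal                  |      |       |                       |                          |
| Date                     |      |       | chrono::NaiveDate     | Date32                   |
| Timestamp                |      |       | chrono::NaiveDateTime | Timestamp(Nanosecond,_)  |
| Timestamp instant        |      |       |                       |                          |
| Struct                   |      |       |                       | Struct                   |
| List                     |      |       |                       |                          |
| Map                      |      |       |                       |                          |
| Union                    |      |       |                       |                          |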
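
The type mapping listed above can be expressed as a simple lookup. This is an illustrative sketch only: the `OrcType` enum and `arrow_type_name` function are hypothetical names rather than the crate's API, the returned strings are just the Arrow DataType names from the table, and it assumes the lone `Struct` entry belongs to the Arrow DataType column.

```rust
/// ORC column types named in the support table (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum OrcType {
    SmallInt, Int, BigInt, Float, Double, String, Char, VarChar,
    Boolean, TinyInt, Binary, Decimal, Date, Timestamp,
    TimestampInstant, Struct, List, Map, Union,
}

/// Return the Arrow DataType name the table lists for an ORC type,
/// or `None` when the table lists no Arrow type for it.
fn arrow_type_name(orc: OrcType) -> Option<&'static str> {
    match orc {
        OrcType::SmallInt => Some("Int16"),
        OrcType::Int => Some("Int32"),
        OrcType::BigInt => Some("Int64"),
        OrcType::Float => Some("Float32"),
        OrcType::Double => Some("Float64"),
        OrcType::String | OrcType::Char | OrcType::VarChar => Some("Utf8"),
        OrcType::Boolean => Some("Boolean"),
        OrcType::TinyInt => Some("Int8"),
        OrcType::Binary => Some("Binary"),
        OrcType::Date => Some("Date32"),
        OrcType::Timestamp => Some("Timestamp(Nanosecond,_)"),
        // Assumption: "Struct" in the table is the Arrow DataType.
        OrcType::Struct => Some("Struct"),
        // No Arrow type is listed for these in the table.
        OrcType::Decimal
        | OrcType::TimestampInstant
        | OrcType::List
        | OrcType::Map
        | OrcType::Union => None,
    }
}
```

For example, `arrow_type_name(OrcType::Date)` returns `Some("Date32")`, while `arrow_type_name(OrcType::Map)` returns `None`.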

Compression Support

| Compression | Read | Write |
| ----------- | ---- | ----- |
| None        |      |       |
| ZLIB        |      |       |
| SNAPPY      |      |       |
| LZO         |      |       |
| LZ4         |      |       |
| ZSTD        |      |       |
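
Whichever codec is used, the ORC specification frames compressed streams as a series of chunks, each prefixed by a 3-byte little-endian header that stores `compressedLength * 2 + isOriginal`. A minimal sketch of parsing that header is shown here; `ChunkHeader` and `parse_chunk_header` are hypothetical names for illustration, not this crate's API.

```rust
/// Parsed 3-byte ORC compression chunk header (hypothetical type).
#[derive(Debug, PartialEq)]
struct ChunkHeader {
    /// Number of bytes in the chunk body that follows the header.
    length: usize,
    /// True when the body was stored as-is because compression did not help.
    is_original: bool,
}

/// Parse the header: a 24-bit little-endian value whose lowest bit is
/// the "original" flag and whose remaining bits are the chunk length.
fn parse_chunk_header(bytes: [u8; 3]) -> ChunkHeader {
    let raw = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], 0]);
    ChunkHeader {
        length: (raw >> 1) as usize,
        is_original: (raw & 1) == 1,
    }
}
```

With the examples given in the ORC specification, `[0x40, 0x0d, 0x03]` describes a chunk compressed to 100,000 bytes, and `[0x0b, 0x00, 0x00]` describes 5 bytes stored uncompressed.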