WASB ORC to CSV is a data engineering utility for converting Windows Azure Storage Blob-based Apache ORC files (typically from Hive) into CSV.
The process is the following:
- Authenticate in Azure
- Download ORC file from Blob store
- Write contents of ORC file a tempfile
- Using ORC C++ library to deserialize ORC
- Print ORC columns as CSV using custom "Flat class"
The utility is currently limited to struct
ORC types, but others should be straightforward to add.