DFIR-ORC/dfir-orc-config

Missing Parquet documentation?

Hi,
I would be very interested in producing Parquet files with ORC. I compiled ORC with Parquet format support, but so far I have only been able to produce CSV files.
Moreover, I have not seen any mention of the Parquet format in the documentation. Maybe I missed it.

Any help would be really appreciated. Which embedded tools can produce Parquet files, and how?

Thanks,

Regards,
Pierre

Good day Pierre!

Apache Parquet support in dfir-orc is experimental and not used in production yet.
That being said, it is functional, and we encourage you to test and use modern file formats like Apache ORC and/or Apache Parquet.

To build dfir-orc with Parquet file format support:

  • make sure you use the latest code in the dev branch
  • configure your build directory with -DORC_BUILD_PARQUET=ON

Important: please note that Apache Parquet has not been ported to x86 yet, so it can only be used on the AMD64 architecture. Apache ORC does not have this limitation. This is not a dfir-orc limitation but a limitation of Apache Arrow, which we use to generate Apache Parquet files.

To use dfir-orc to produce parquet files:

  • simply use an output option such as /out=c:\temp\test.parquet and a Parquet file will be generated
    (tools that generate massive CSV files, like USNInfo and NTFSInfo, are the primary go-to tools for Parquet testing; however, potentially "any" tool could generate Parquet files)
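
Since the original problem was ending up with CSV output, a quick way to confirm that a generated file really is Parquet is to check its magic bytes: an unencrypted Parquet file starts and ends with the four bytes `PAR1`. The following is a minimal sketch and not part of dfir-orc; the helper name `looks_like_parquet` is hypothetical, and the check only inspects the magic bytes, not the file's schema or contents.

```python
from pathlib import Path

# Plain (unencrypted) Parquet files begin and end with this 4-byte marker.
PARQUET_MAGIC = b"PAR1"

# Smallest plausible size: header magic (4) + footer length (4) + footer magic (4).
MIN_PARQUET_SIZE = 12


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes.

    Hypothetical helper for illustration; it does not parse the footer
    metadata, so it is a sanity check rather than a full validation.
    """
    file_path = Path(path)
    if file_path.stat().st_size < MIN_PARQUET_SIZE:
        return False
    with file_path.open("rb") as handle:
        head = handle.read(4)
        handle.seek(-4, 2)  # 2 = seek relative to the end of the file
        tail = handle.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

For example, `looks_like_parquet(r"c:\temp\test.parquet")` can be run against the output path used above. A CSV file written under a `.parquet` name would fail the check, because it starts with header text rather than `PAR1`.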

Please keep the feedback coming!
We very much welcome your input on this and, as I said, strongly encourage the use of these modern file formats.

Best regards,
Jean
PS: please submit dfir-orc related issues to the dfir-orc repository; dfir-orc-config should be limited to configuration issues.