hangxie/parquet-tools

INT96 import issue

hangxie opened this issue · 3 comments

INT96 values do not match the original values after import:

    "Int96": "1717-12-28T19:20:10.805069776Z",		      |	    "Int96": "2022-01-01T09:09:09.009009Z",

I feel like this is unresolvable because parquet-go does not store arbitrary bytes properly: INT96 should be treated as []byte, but parquet-go uses string (more details at xitongsys/parquet-go#434). This means there is literally no way to import an INT96 value from JSON to parquet. Given that INT96 is deprecated, I believe we are good.

I will do a couple of tests to confirm this.

There may be a workaround: parse the timestamp string, convert it to INT96, then let parquet-go handle the new value. However, this again requires recursively iterating over the value nodes to decide which ones to convert, which is pretty complex (see the code for the cat command). I tend not to do this as INT96 is deprecated, but if parquet-go can fix this problem, I can pick it up.

I'm going to close this issue as INT96 is barely supported nowadays, especially for writes.