Cinchoo/ChoETL

Writing dates as parquet datetime types


I have a program that writes a list of objects of a specific type to Parquet. The problem is that when it writes date properties to the Parquet file, they are saved as strings rather than as a datetime type.

This is how I have my writer configured:

using (var parser = new ChoParquetWriter(outStream)
    .Configure(c => c.Culture = CultureInfo.InvariantCulture)
    .Configure(c => c.TypeConverterFormatSpec = new ChoTypeConverterFormatSpec { DateTimeFormat = "o" }))
{
    parser.Write(records); // records: the list of objects being exported
}

This is the definition of the property in the class:

public DateTime? date_reported { get; set; }

And this is an example of what the date looks like in the database I am reading from:

2023-01-09 00:00:00.000

And this is how it is stored in the object:
[screenshot]

Well, the underlying Parquet driver doesn't support the DateTime type, hence it is stored as text.

Is there a datetime-like type that it does support, such as DateTimeOffset?

Yes, there is a way to use DateTimeOffset. Let me add it; I will update this issue.

Did you push this update, and if so, how is it used?

Yes, here is how you can control the output:

// Store DateTime values as DateTimeOffset instead of text
using (var w = new ChoParquetWriter(filePath)
    .Configure(c => c.TreatDateTimeAsDateTimeOffset = true))
{
    w.Write(recs);
}
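For reference, here is a minimal end-to-end sketch that combines the property from the question with the option from the answer. It assumes a ChoETL.Parquet package version that already includes `TreatDateTimeAsDateTimeOffset`; the class name `Report`, the sample value, and the file name `reports.parquet` are hypothetical, and it uses the generic `ChoParquetWriter<T>` rather than the non-generic writer shown above.

```csharp
using System;
using System.Collections.Generic;
using ChoETL;

// Hypothetical record type; only the date_reported property comes from the issue.
public class Report
{
    public DateTime? date_reported { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var recs = new List<Report>
        {
            new Report { date_reported = new DateTime(2023, 1, 9) }
        };

        // Ask the writer to treat DateTime values as DateTimeOffset, the type
        // the maintainer says the underlying Parquet driver supports, so the
        // column is not written as text.
        using (var w = new ChoParquetWriter<Report>("reports.parquet")
            .Configure(c => c.TreatDateTimeAsDateTimeOffset = true))
        {
            w.Write(recs);
        }
    }
}
```

Inspecting the resulting file's schema with any Parquet viewer should show whether `date_reported` was written as a timestamp rather than a string.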