Cinchoo/ChoETL

Convert from CSV to object without JSON

WolfAlvein opened this issue · 9 comments

Hi Cinchoo, I wanted to ask whether there is any way to convert the CSV to a specific class that has been passed as a generic type parameter. I have managed to make it work, but only if I transform it to JSON first and then to the object, as shown below:

           using (var csvParser = ChoCSVReader<T>.LoadText(header + Environment.NewLine + _csvData)
                                           .QuoteAllFields()
                                           .WithFirstLineHeader()
                                           .ThrowAndStopOnMissingField(false)
                                           .Configure(c => c.LiteParsing = true)
                                           .Configure(c => c.NestedColumnSeparator = '/')
                                           .Configure(c => c.FieldValueTrimOption = ChoFieldValueTrimOption.None))
            {
                if (readLargeStream)
                {
                    foreach (var row in csvParser)
                    {
                        var _jsonReturn = JsonConvert.DeserializeObject<T>(row.DumpAsJson().Replace("^MISSING_VALUE$", null));
                        returnObject.Add(_jsonReturn);
                    }
                }
                else
                {
                    returnObject = JsonConvert.DeserializeObject<List<T>>(csvParser.DumpAsJson().Replace("^MISSING_VALUE$", null));
                }
            }

I'm also getting this error with a file and class that I think shouldn't be causing any problems. Other than having quite a big header, the amount of data shouldn't be causing too many problems:

Error:

"Invalid cast from 'System.String' to 'System.Nullable`1[[System.Int32, System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]'."

Class generates error.txt
CPRRVN.zip

Thank you.

You need to turn off LiteParsing in the CSV parser, i.e. .Configure(c => c.LiteParsing = false), to handle it.
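For what it's worth, the message in the error above has the same shape as the one plain .NET raises when Convert.ChangeType is asked to convert a string straight to a nullable type such as int?, whatever the text contains. Whether the lite-parsing path does exactly this internally is an assumption; the sketch below only reproduces the failure with the base class library and shows the usual workaround of unwrapping the nullable type first. The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class NullableConvertSketch
{
    // Converts text to targetType, unwrapping Nullable<T> first because
    // Convert.ChangeType does not accept a nullable type as its target.
    public static object ChangeTypeNullableAware(string text, Type targetType)
    {
        var underlying = Nullable.GetUnderlyingType(targetType);
        if (underlying != null)
        {
            // Treat empty text as "no value" for nullable targets.
            if (string.IsNullOrWhiteSpace(text)) return null;
            targetType = underlying;
        }
        return Convert.ChangeType(text, targetType, CultureInfo.InvariantCulture);
    }

    // Returns true when the direct conversion fails with InvalidCastException.
    public static bool DirectConversionFails(string text, Type targetType)
    {
        try
        {
            Convert.ChangeType(text, targetType, CultureInfo.InvariantCulture);
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

With this helper, numeric text converts for an int? target and empty text becomes null. Text that is not a valid integer still fails (with a FormatException), so it only helps when the column really holds integers.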

Thank you. I guess this is to avoid the invalid cast issue, but is there any way to convert the CSV to an object directly, without dumping it as JSON?

Found out that the fields below are defined as int, but the CSV has them as date values in text. After changing their types to string, the CSV loads successfully.

        [JsonProperty(NullValueHandling = NullValueHandling.Ignore)]
        public string VehicleMonth { get; set; } //original type: int
        [JsonProperty(NullValueHandling = NullValueHandling.Ignore)]
        public string AnyoMes { get; set; } //original type: int

Sample CSV load code snippet:

using (var r = new ChoCSVReader<CapVehicleFutureValuePrice>(@"*** CSV file path ***")
    .WithFirstLineHeader(true)
    .NotifyAfter(100)
    .Setup(s => s.RowsLoaded += (o, e) =>
    {
        Console.WriteLine(e.RowsLoaded.ToString("#,##0") + " rows loaded.");
    })
    )
{
    r.First().Print();
}

Is there any way to allow the values to remain an int? It is done like this at the moment because in the database we store those values as int.

Also, I'm getting this message with a null DateTime value in the CSVs: "DateTime values that are greater than DateTime.MaxValue or smaller than DateTime.MinValue when converted to UTC cannot be serialized to JSON." Is there any way to configure it so that it handles the null value correctly?

Well, after looking closely at your CSV file, I found out that it comes with 306 fields, but your POCO is defined with 3 extra fields.

In order to process your CSV successfully, you will have to tell the parser to ignore those 3 fields, and use the after-record-load event to calculate and derive those 3 field values.

I wrote a sample fiddle to show how:

https://dotnetfiddle.net/wk2ClU

FYI, take the latest package.

Thank you Cinchoo. Yes, these fields are extra data that we manage ourselves and are not used in the CSV extraction. The issue I have is that I used longer and more complex code before and this was not a problem. I can't add the ignore fields because not all the POCO classes that we process in this method have them, and the idea is for this general method to process all of the files that we capture. There are about 28 files, and that number could continue to grow over the years to come.
Here is the original method, which does everything correctly but takes too much time: OriginalMethod.txt
And here is the optimized method I'm trying to build for this process: OptimizedMethod.txt

What I really wish could be done is to deserialize the CSV directly to the POCO class, if possible.

After looking at OptimizedMethod.txt: removing the line .Configure(c => c.LiteParsing = true) will load the CSV into the POCO object. Let me know.
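A minimal sketch of what loading straight into a list could look like, assuming the typed reader can be enumerated as records (the r.First() call in the earlier sample suggests it can) and that LiteParsing is left off. CsvLoader, LoadRecords and SampleRow are hypothetical names; the reader calls are the ones already used in this thread, and the System.Linq ToList() call is the only addition.

```csharp
using System.Collections.Generic;
using System.Linq;
using ChoETL;

// Hypothetical POCO used only to illustrate the helper below.
public class SampleRow
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class CsvLoader
{
    // Hypothetical helper: reads CSV text (header line included) into a
    // List<T> without a JSON round trip.
    public static List<T> LoadRecords<T>(string csvText) where T : class, new()
    {
        using (var reader = ChoCSVReader<T>.LoadText(csvText)
                                           .WithFirstLineHeader()
                                           .ThrowAndStopOnMissingField(false))
        {
            // Enumerating the reader yields T instances; ToList() materializes
            // them before the reader is disposed.
            return reader.ToList();
        }
    }
}
```

Calling CsvLoader.LoadRecords<SampleRow>(header + Environment.NewLine + csvData) would then return the rows as a List<SampleRow>. Other options from the original snippet, such as QuoteAllFields() or NestedColumnSeparator, could be chained in the same place if they are needed.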

Thank you Cinchoo, but what I want is not for the CSV parser to be loaded into the POCO objects; I want the CSV parser to load the data into a List of the POCO, like:

           List<T> csvParser = ChoCSVReader<T>.LoadText(header + Environment.NewLine + _csvData)
                                           .QuoteAllFields()
                                           .WithFirstLineHeader()
                                           .ThrowAndStopOnMissingField(false)
                                           .Configure(c => c.LiteParsing = true)
                                           .Configure(c => c.NestedColumnSeparator = '/')
                                           .Configure(c => c.FieldValueTrimOption = ChoFieldValueTrimOption.None)

The .Configure(c => c.LiteParsing = true) setting works great and is allowing the csvParser to load into its POCO object.

I'm also getting an error when calling DumpAsJson(): if a DateTime field in the CSV is empty, it gives me the following error:
"DateTime values that are greater than DateTime.MaxValue or smaller than DateTime.MinValue when converted to UTC cannot be serialized to JSON." Any idea how to make it set the value to null, or simply ignore it?
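As background on that message: its wording describes a range check applied when a non-UTC DateTime is shifted to UTC before being written as JSON. An empty CSV field bound to a non-nullable DateTime would typically be left at default(DateTime), which is DateTime.MinValue, and in a time zone ahead of UTC the shift pushes it below the minimum. Which serializer DumpAsJson() uses internally is an assumption here; the sketch below only mirrors the arithmetic, and all names in it are hypothetical.

```csharp
using System;

public static class JsonDateRangeSketch
{
    // Mirrors the check described by the error text: shift the value to UTC
    // using the given offset and see whether it still fits in DateTime's range.
    public static bool FitsAfterUtcShift(DateTime value, TimeSpan utcOffset)
    {
        if (value.Kind == DateTimeKind.Utc) return true; // already UTC, no shift
        long shiftedTicks = value.Ticks - utcOffset.Ticks;
        return shiftedTicks >= DateTime.MinValue.Ticks
            && shiftedTicks <= DateTime.MaxValue.Ticks;
    }

    // Maps the "empty" sentinel to null so a DateTime? property can carry it.
    public static DateTime? EmptyToNull(DateTime value)
    {
        return value == default(DateTime) ? (DateTime?)null : value;
    }
}
```

Under these assumptions, declaring the affected properties as DateTime? (or mapping DateTime.MinValue to null before serializing) would sidestep the check, since a null value has nothing to shift.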

Hi Cinchoo, thank you for all your help. In the end the issue was a little bit more with the POCO objects than with the CSV reader. I got it to work by doing:

        using (var csvParser = ChoCSVReader.LoadText(header + Environment.NewLine + _csvData)
                                           .QuoteAllFields()
                                           .WithFirstLineHeader()
                                           .ThrowAndStopOnMissingField(false)
                                           .Configure(c => c.NestedColumnSeparator = '/')
                                           .Configure(c => c.FieldValueTrimOption = ChoFieldValueTrimOption.None))
        {
            if (readLargeStream)
            {
                foreach (var row in csvParser)
                {
                    var _jsonReturn = JsonConvert.DeserializeObject<T>(row.DumpAsJson().Replace("^MISSING_VALUE$", null));
                    returnObject.Add(_jsonReturn);
                }
            }
            else
            {
                if (t == typeof(CapVehicleUsedValuePrice))
                {
                    System.IO.File.WriteAllText(@"C:\Users\ascar\Desktop", csvParser.DumpAsJson().Replace("^MISSING_VALUE$", null));
                }
                
                returnObject = JsonConvert.DeserializeObject<List<T>>(csvParser.DumpAsJson().Replace("^MISSING_VALUE$", null));
            }
        }