MarkPflug/Sylvan.Data.Excel

Unable to get column names of Excel worksheets, except for the first one

imdnir opened this issue · 2 comments

imdnir commented

Hi,

When running the following code on an Excel file containing multiple worksheets, I should receive a Dataset containing data tables where each data table represents an Excel worksheet. It should include the datasheet name as data table name, column names and data rows.
My problem - all data tables contain worksheet name and data rows, only the first data table also includes the column names. The other ones does not include column names.
When debugging the code, we can see the ExcelDataReader (edr) has empty column names for the second and third sheet when the current worksheet is not the first one.
What am I doing wrong here? Is this a bug?

Thanks,
Nir

public DataSet GetXlsAsDataSet(string fullFileName)
        {
            var resultDataSet = new DataSet();

            using (ExcelDataReader edr = ExcelDataReader.Create(fullFileName))
            {
                do
                {
                    string sheetName = edr.WorksheetName;
                    //var firstRow = true;
                    DataTable dt = new DataTable(sheetName);
                    bool firstRow = true;
                    // enumerate rows in current sheet.
                    while (edr.Read())
                    {
                        //for the row titles
                        if (firstRow)
                        {


                            ReadOnlyCollection<DbColumn> columns = edr.GetColumnSchema();


                            for (var columnIndex = 0; columnIndex < columns.Count; columnIndex++)
                            {
                                dt.Columns.Add(columns.ElementAt(columnIndex).ColumnName);
                            }

                            firstRow = false;
                        }


                        // iterate cells in row.
                        // can use edr.RowFieldCount when sheet contains jagged, non-rectangular data
                        //Add rows to DataTable.
                        dt.Rows.Add();
                        int index = 0;

                        for (var i = 0; i < edr.FieldCount; i++)
                        {
                            var value = edr.GetString(i);
                            if (string.IsNullOrWhiteSpace(value))
                            {
                                index++;
                                continue;
                            }
                            dt.Rows[dt.Rows.Count - 1][index++] = value;
                            if (index >= dt.Columns.Count)
                                break;
                        }

                    }
                    resultDataSet.Tables.Add(dt);
                } while (edr.NextResult());

            }
            return resultDataSet;
        }

This was indeed a bug. It is fixed now with 0.4.19 which I just pushed to nuget. It might take a while to show up in the index.

You can simplify your DataSet loading code a bit, by using the DataTable.Load method:

static DataSet GetDS(string filename)
{
    var ds = new DataSet();
    using var edr = ExcelDataReader.Create(filename);

    while (!edr.IsClosed) { 
        var dt = new DataTable();
        dt.TableName = edr.WorksheetName;
        // DataTable.Load internall calls NextResult,
        // and will close the reader when the last result is read.
        dt.Load(edr, LoadOption.Upsert);
        ds.Tables.Add(dt);
    }

    return ds;
}
imdnir commented

Thank you very much!
Nir