fsprojects/ExcelProvider

When a sheetname isn't found, the type provider defaults to a non-deterministic first sheet with a different name

Closed this issue · 9 comments

Description

In an Excel file with Sheet1, Sheet2, Sheet3, assuming each has a different type, and assuming the sheetname is given as an argument to the constructor: if the given sheetname is not found at runtime, it is silently ignored and the internally first sheet is used instead.

Note that the "first sheet" is the first you've created. When you move sheets to the right in Excel, the display order changes, but not the internal index of the sheet. The type provider will select the first sheet by index when the sheetname is not found.

Repro steps

  1. Create an Excel sheet with three different types.
  2. Create some code to read these and to specify the sheetname dynamically
  3. Delete a sheet that isn't the first sheet

Expected behavior

Either an error or an empty set.

Actual behavior

The type provider takes the first sheet that's programmatically (i.e., inside the XML) the first sheet by index. If this type doesn't match, an error will occur. If the type does match, this can lead to unpredictable behavior, like the same sheet being read twice.

Known workarounds

I've researched this with my colleagues but couldn't find a workaround.

Some semblance of a workaround can be created by using the cell-gettors (the dynamic properties from the type provider) and inspect whether the result gives null, in which case either the sheet is empty, the column is absent, or the wrong sheet is loaded.

Related information

Observed with the most recent version of this repo.

When I debug this and inspect the internal fields, you can see that, upon instantiation of the type provider, the internal sheetname is set to the one provided in the constructor. However, the actual data can be something totally different and from a different sheet, because it can silently take the first-sheet-by-index.

ingted commented

I bumped into similar issue, guessing the code should be like this...

        let _range =
            printfn "String.IsNullOrWhiteSpace %s: %b" range (String.IsNullOrWhiteSpace range)
            if String.IsNullOrWhiteSpace range
            then sheetname //workbook.Tables.[0].TableName <= caused this issue!?
            else range
ingted commented

#91

This PR might fix this issue... (PS. In my build I updated FSharp.Core to 6.0.3 since I am unable to build this project with .net 5.0...)

I see there were a bunch of PRs: #91, #93 and #94.

Ultimately, it was fixed through #95 (at least, I think that PR fixed it?). I haven't needed this library for a while in my current line of F# work, but I hope someone else can verify that I can close this issue (i.e @ingted)?

After the fix, code with the sheet not exist would not be buildable... Maybe yes?

Yep, that's how it should work. Perfect @ingted!

@abelbraaksma and @ingted I have now published new Nuget Version 2.1.0-rc-1 that includes this fix. It would be highly appreciated if you can have a look at using that version to see if it works as expected.

I just tried my original repo (that is: create an XLSX file with three sheets, having different column names to prevent accidental silent success) and load the 2nd sheet by name.

I then renamed the 2nd sheet by name and now, instead of silently loading a different sheet, it now failed with an error "Sheet [Xyz] does not exist." (which is expected and more useful than a type error, or accidental success with wrong data).

To be sure I was testing the right thing, I downgraded to an earlier version (2.0.0), and using that version:

  • I statically compiled an instance of ExcelFile type using three sheets, loading sheet "Ages"
  • I then ran the tests using that data type, but the runtime excel file being one with a different name for that sheet.
  • This indeed failed with an error "ExcelProvider: Cannot cast '"1022XY"' a 'System.String' to 'System.Double'.".

Then, doing the same with the latest 2.1.0 version, everything went peachy again (that is, the error was "System.Exception: Sheet [Ages] does not exist.")!

One tiny suggestion: it appears that most exceptions here contain ExcelProvider in it, perhaps this exception can also be prefixed with that?

Either way, works like a charm, great work @quintusm and @ingted!

Thanks for the feedback @abelbraaksma. @ingted was the one that did the fix, I just facilitated infrastructure.

Hi @quintusm,

I added the prefix here https://github.com/ingted/ExcelProvider2
Maybe you could merge it with release 2.1.0?