Scenario.read_excel() fails on sheets containing 'phantom' cells
OFR-IIASA opened this issue · 1 comments
The error occurs when executing read_excel().
When reading an excel file, some rows of data can be added which actually do not contain any data. These extra rows will contain "NaN"s, therefore resulting in two errors.
The first error which occurs, if the argument add_unit=True
is passed:
TypeError: No matching overloads found for at.ac.iiasa.ixmp.Platform.addUnitToDB(float,str), options are: public int at.ac.iiasa.ixmp.Platform.addUnitToDB(java.lang.String,java.lang.String) throws at.ac.iiasa.ixmp.exceptions.IxException
The error message results from the fact that a unit "nan" is trying to be added.
The second error occurs when reading data and determining the column type. Should "phantom" data be included in the dataframe, then columns which should be integers are read as floats. The following error is given:
RuntimeError: unhandled Java exception: The key '2001.0' must not contain a '.'!
The current workaround requires cleaning of the xlsx which is being read-in. In the various sheets, columns can be filtered for "Blanks" in excel. The resulting selection will show the respective rows, indicated by having blue row numbers. Deleting these and saving the xlsx file should resolve the issue.
In order to see which parameter sheet contains the phantom data, add print(name)
in line 274 of the file ixmp/backend/io.py
. The parameter sheet currently being processed will be printed on screen. The last parameter printed before the error message will guide the user to the sheet containing the phantom-data.
pandas has problems with:
-
rows which can be cleaned used
df.dropna(how="all")
-
columns that have “left-over” empty columns, naming them
Unnamed: {v}
where v is a counter…This is how this is caught and treated in pyam: https://github.com/IAMconsortium/pyam/blob/c62ee8c368a9ee48447fce8a838410112298a3b2/pyam/utils.py#L178