Fastexcel fails to load an excel file with polars
Closed this issue · 3 comments
durgeksh commented
dfs = pl.read_excel('input.xlsx', engine='calamine', sheet_id=0)
for key in dfs.keys():
print(dfs[key].head())
Error:
Traceback (most recent call last):
File "/Users/Desktop/workspace/pocs/demo.py", line 27, in <module>
dfs = pl.read_excel('compass_input.xlsx', engine='calamine', sheet_id=0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/utils/deprecation.py", line 136, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/utils/deprecation.py", line 136, in wrapper
return function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 259, in read_excel
return _read_spreadsheet(
^^^^^^^^^^^^^^^^^^
File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 487, in _read_spreadsheet
parsed_sheets = {
^
File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 488, in <dictcomp>
name: reader_fn(
^^^^^^^^^^
File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 834, in _read_spreadsheet_calamine
df = ws.to_polars()
^^^^^^^^^^^^^^
File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/fastexcel/__init__.py", line 64, in to_polars
df = pl.from_arrow(data=self.to_arrow())
^^^^^^^^^^^^^^^
File "/Users/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/fastexcel/__init__.py", line 47, in to_arrow
return self._sheet.to_arrow()
^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Could not create RecordBatch from sheet Hybrid_Productivity
Caused by:
0: Could not build schema for sheet Hybrid_Productivity
1: Error in calamine cell: NA
Installed versions:
--------Version info---------
Polars: 0.20.7
Index type: UInt32
Platform: macOS-12.7.3-arm64-arm-64bit
Python: 3.11.2 (v3.11.2:878ead1ac1, Feb 7 2023, 10:02:41) [Clang 13.0.0 (clang-1300.0.29.30)]
----Optional dependencies----
adbc_driver_manager: <not installed>
cloudpickle: <not installed>
connectorx: <not installed>
deltalake: <not installed>
fsspec: <not installed>
gevent: <not installed>
hvplot: <not installed>
matplotlib: 3.8.2
numpy: 1.26.4
openpyxl: <not installed>
pandas: 2.2.0
pyarrow: 15.0.0
pydantic: <not installed>
pyiceberg: <not installed>
pyxlsb: <not installed>
sqlalchemy: <not installed>
xlsx2csv: <not installed>
xlsxwriter: <not installed>
None
Process finished with exit code 0
lukapeschke commented
Hello, could you please provide an excel file allowing to reproduce the bug ? Without this, it will be hard to investigate the issue
durgeksh commented
Hi @lukapeschke, I found that if a cell contains the value "#N/A" then it is failing to load the data. I am attaching a sample below.
Thank you.
sample_data.xlsx
df = pl.read_excel('sample_data.xlsx', engine='calamine')
durgeksh commented
Thank you so much for so quick fix @PrettyWood @lukapeschke.