ValueError when a column contains many ints followed by a float
adereth opened this issue · 1 comment
adereth commented
I'm using this as sample data: http://spatialkeydocs.s3.amazonaws.com/FL_insurance_sample.csv.zip
There's a column with a bunch of 0 values, and on line 902 it contains 7096.5. When I'm paging through the data using ngrid, everything is fine until it hits this line. At that point it dies with:
Traceback (most recent call last):
  File "/usr/local/bin/ngrid", line 9, in <module>
    load_entry_point('ngrid==0.1.0', 'console_scripts', 'ngrid')()
  File "/usr/local/lib/python2.7/dist-packages/ngrid/main.py", line 124, in main
    grid.show_model(model, num_frozen=options.frozenCols)
  File "/usr/local/lib/python2.7/dist-packages/ngrid/grid.py", line 1000, in show_model
    view.show()
  File "/usr/local/lib/python2.7/dist-packages/ngrid/grid.py", line 686, in show
    self.__print()
  File "/usr/local/lib/python2.7/dist-packages/ngrid/grid.py", line 831, in __print
    if idx < self.__model.num_rows
  File "/usr/local/lib/python2.7/dist-packages/ngrid/grid.py", line 378, in get_row
    row = [ c(v) for c, v in zip(self.converts, row) ]
ValueError: invalid literal for int() with base 10: '7096.5'
alexhsamuel commented
You can use the --buffer_size option to use a larger number of rows for guessing column types, or --dataframe to load the entire dataset into memory up front.
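To illustrate why a larger sample helps, here is a minimal sketch of sample-based type guessing. It is not ngrid's actual code: `guess_converter` and the sample sizes are hypothetical, chosen only to mirror the data in the report. When the sample covers only the leading 0 values, the column looks like an int column, and the later 7096.5 cannot be converted.

```python
def guess_converter(sample):
    """Hypothetical helper: pick the narrowest of int, float, str
    that can parse every string in the sample."""
    for convert in (int, float):
        try:
            for value in sample:
                convert(value)
        except ValueError:
            continue  # this type cannot parse the sample; try the next one
        return convert
    return str


# A column of 0 values followed by a float, as in the report.
column = ["0"] * 901 + ["7096.5"]

small = guess_converter(column[:100])  # sample misses the float
large = guess_converter(column)        # sample includes the float

print(small.__name__, large.__name__)  # int float
```

With the small sample, applying the guessed converter to the last value raises the same ValueError as in the traceback above.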
It's on the todo list to adjust types dynamically in cases like this, but it's somewhat tricky to implement.
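For illustration only, one possible shape for dynamic adjustment is a per-column converter that widens its type when a value fails to parse. This is a sketch under assumptions, not ngrid's design: `AdaptiveColumn` and the `WIDENING` order are hypothetical names introduced here.

```python
# Hypothetical widening order: each type can represent the values before it.
WIDENING = [int, float, str]


class AdaptiveColumn:
    """Hypothetical per-column converter that widens on conversion failure."""

    def __init__(self):
        self.level = 0  # index into WIDENING; start with the narrowest type

    def convert(self, text):
        while True:
            try:
                return WIDENING[self.level](text)
            except ValueError:
                # Widen and retry. str() accepts any string, so this terminates.
                self.level += 1


col = AdaptiveColumn()
values = [col.convert(v) for v in ["0", "0", "7096.5", "n/a"]]
print(values)  # [0, 0, 7096.5, 'n/a']
```

Note that values converted before the widening keep their old type (the leading entries stay ints), so a viewer would presumably need to re-convert or re-render earlier rows. That may be part of what makes this tricky to implement.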