Error using clean_lat_long
arfriedman opened this issue · 4 comments
Unfortunately, clean_lat_long raises an error in dataprep 0.4.3a1 with Python 3.10.4.
The problem occurs for me using the documentation example:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "lat_long": [
        (41.5, -81.0), "41.5;-81.0", "41.5,-81.0", "41.5 -81.0",
        "41.5° N, 81.0° W", "41.5 S;81.0 E", "-41.5 S;81.0 E",
        "23 26m 22s N 23 27m 30s E", "23 26' 22\" N 23 27' 30\" E",
        "UT: N 39°20' 0'' / W 74°35' 0''", "hello", np.nan, "NULL",
    ]
})
from dataprep.clean import clean_lat_long
clean_lat_long(df, "lat_long")
It returns this error:
File ~/miniconda3/envs/AQUATIC/lib/python3.10/site-packages/dataprep/clean/clean_lat_long.py:172, in clean_lat_long(df, lat_long, lat_col, long_col, output_format, split, inplace, errors, report, progress)
167 raise ValueError(
168 f'output_format {output_format} is invalid, it must be "dd", "ddh", "dm", or "dms"'
169 )
171 # convert to dask
--> 172 df = to_dask(df)
174 # To clean, create a new column "clean_code_tup" which contains
175 # the cleaned values and code indicating how the initial value was
176 # changed in a tuple. Then split the column of tuples and count the
177 # amount of different codes to produce the report
178 if lat_long:
179 # clean a latitude and longitude column
File ~/miniconda3/envs/AQUATIC/lib/python3.10/site-packages/dataprep/clean/utils.py:73, in to_dask(df)
71 df_size = df.memory_usage(deep=True).sum()
72 npartitions = np.ceil(df_size / 128 / 1024 / 1024) # 128 MB partition size
---> 73 return dd.from_pandas(df, npartitions=npartitions)
File ~/miniconda3/envs/AQUATIC/lib/python3.10/site-packages/dask/dataframe/io/io.py:236, in from_pandas(data, npartitions, chunksize, sort, name)
234 if none_chunksize:
235 if not isinstance(npartitions, int):
--> 236 raise TypeError(
237 "Please provide npartitions as an int, or possibly as None if you specify chunksize."
238 )
239 chunksize = int(ceil(nrows / npartitions))
240 elif not isinstance(chunksize, int):
TypeError: Please provide npartitions as an int, or possibly as None if you specify chunksize.
I encounter the problem with both the conda-forge version on Linux and the pip version on Windows.
Thanks much,
Andrew
Hi @arfriedman. Thank you for using our library and reporting the issue.
Others encountered a similar issue in #903 and posted a solution on Stack Overflow (https://stackoverflow.com/questions/72453608/dataprep-eda-typeerror-please-provide-npartitions-as-an-int-or-possibly-as-non), and we have already fixed it in the current develop branch. You can install the develop branch version with:
pip install -U git+https://github.com/sfu-db/dataprep.git@develop
Either approach should solve the issue you encountered.
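For context, the traceback points at `to_dask`, where the result of `np.ceil` is passed to `dd.from_pandas` as `npartitions`. `np.ceil` returns a NumPy float, and dask rejects it because it is not an `int`. Below is a minimal sketch of that mechanism. It uses a made-up `df_size` and calls neither dataprep nor dask, and it is an illustration only, not necessarily the exact change made on the develop branch:

```python
import numpy as np

# Hypothetical DataFrame size in bytes (stand-in for df.memory_usage(deep=True).sum()).
df_size = 1_000

# Same arithmetic as in the traceback: np.ceil returns a NumPy float, not a Python int.
npartitions = np.ceil(df_size / 128 / 1024 / 1024)  # 128 MB partition size

# This is the check that fails inside dask's from_pandas.
is_int_before = isinstance(npartitions, int)

# Casting to a built-in int makes the value acceptable to that check.
npartitions_fixed = int(npartitions)
is_int_after = isinstance(npartitions_fixed, int)

print(type(npartitions).__name__, is_int_before)
print(type(npartitions_fixed).__name__, is_int_after)
```

The cast leaves the partition count unchanged and only converts its type, so the `isinstance(npartitions, int)` check shown in the traceback passes.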
Thank you @qidanrui
I installed dataprep 0.4.5, which solves the problem above -- thank you!
However, it now returns the following warning:
import pandas as pd
from dataprep.clean import clean_lat_long
df = pd.DataFrame({'coord': ['51° 29′ 36.24″ N, 0° 0′ 35.28″ E', '51.4934° N, 0.0098° E']})
clean_lat_long(df, 'coord', split=True)
/home/andrew/miniconda3/envs/AQUATIC/lib/python3.10/site-packages/dask/dataframe/core.py:6604: FutureWarning: Meta is not valid, `map_partitions` and `map_overlap` expects output to be a pandas object. Try passing a pandas object as meta or a dict or tuple representing the (name, dtype) of the columns. In the future the meta you passed will not work.
warnings.warn(
Latitude and Longitude Cleaning Report:
2 values cleaned (100.0%)
Result contains 2 (100.0%) values in the correct format and 0 null values (0.0%)
Out[5]:
coord latitude longitude
0 51° 29′ 36.24″ N, 0° 0′ 35.28″ E 51.4934 0.0098
1 51.4934° N, 0.0098° E 51.4934 0.0098
Do you know how to address this warning?
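A possible stopgap, sketched under the assumption that the warning is only cosmetic here (the cleaned output above looks correct, but the maintainers have not confirmed this), is to filter it with Python's standard `warnings` module. In the sketch below, `noisy_call` is a hypothetical stand-in for the `clean_lat_long` call so that the example runs without dataprep or dask:

```python
import warnings


def noisy_call():
    # Hypothetical stand-in for clean_lat_long(df, 'coord', split=True):
    # emits a FutureWarning starting with the same text as the dask warning.
    warnings.warn(
        "Meta is not valid, `map_partitions` and `map_overlap` expects "
        "output to be a pandas object.",
        FutureWarning,
    )
    return "cleaned"


# record=True collects any warnings that get past the filters,
# so we can check that the one we target was suppressed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only FutureWarnings whose message starts with this text.
    warnings.filterwarnings(
        "ignore", message="Meta is not valid", category=FutureWarning
    )
    result = noisy_call()

print(result, len(caught))
```

Using `catch_warnings` as a context manager keeps the filter local to the wrapped call, so other FutureWarnings elsewhere in the session still show up. This only hides the message and does not change what dask does with the `meta` argument.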