Failure on pandas commands
datapythonista opened this issue · 7 comments
I've got this script:
import pandas
df = pandas.DataFrame({'col': ['foo bar']})
df['col'].map(lambda x: len(x.split(' ')))
When I run it with the Python interpreter, it works without problems.
But when I run it with PYTHON_RECORD_API_TO_MODULES="pandas" python -m record_api, I get the following error:
Traceback (most recent call last):
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/__main__.py", line 12, in <module>
tracer.calls_from_modules[0], run_name="__main__", alter_sys=True
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 205, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/mgarcia/quansight/dataframe_tools/kaggle/mutable/scripts/9996822.py", line 4, in <module>
df['col'].map(lambda x: len(x.split(' ')))
File "/home/mgarcia/quansight/dataframe_tools/kaggle/mutable/scripts/9996822.py", line 4, in <module>
df['col'].map(lambda x: len(x.split(' ')))
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 564, in __call__
Stack(self, frame)()
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 372, in __call__
getattr(self, method_name)()
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 477, in op_CALL_METHOD
self.process((function,), function, args)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 354, in process
log_call(f"{filename}:{line}", fn, *args, **kwargs)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 262, in log_call
bound = Bound.create(fn, args, kwargs)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 239, in create
sig = signature(fn)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/pandas/core/generic.py", line 1799, in __hash__
f"{repr(type(self).__name__)} objects are mutable, "
TypeError: 'Series' objects are mutable, thus they cannot be hashed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/__main__.py", line 15, in <module>
raise Exception(f"Error running {tracer.calls_from_modules}")
Exception: Error running ['9996822']
I'm not sure what the exact pattern is, but I get an error like this in almost every script that uses pandas. Let me know if you need more information. I can find other examples, but I expect the cause will be obvious to you.
Can you try with the new version? This works for me locally:
$ cat tmp.py
import pandas
df = pandas.DataFrame({'col': ['foo bar']})
df['col'].map(lambda x: len(x.split(' ')))
$ PYTHON_RECORD_API_TO_MODULES="pandas" PYTHON_RECORD_API_OUTPUT_FILE=tmp.jsonl PYTHON_RECORD_API_FROM_MODULES=tmp python -m record_api
$ cat tmp.jsonl
{"location":"/Users/saul/p/python-record-api/tmp.py:3","function":{"t":"type","v":{"module":"pandas.core.frame","name":"DataFrame"}},"bound_params":{"pos_or_kw":[["data",{"t":"dict","v":[["col",["foo bar"]]]}]]}}
{"location":"/Users/saul/p/python-record-api/tmp.py:4","function":{"t":"builtin_function_or_method","v":{"module":"_operator","name":"getitem"}},"bound_params":{"pos_only":[["a",{"t":{"module":"pandas.core.frame","name":"DataFrame"}}],["b","col"]]}}
That's weird, I'm on the latest version... Here are the rest of the relevant versions:
#!/bin/sh
python --version
python -c "import pandas; print('pandas', pandas.__version__)"
python -c "import record_api; print('record_api', record_api.__version__)"
export PYTHON_RECORD_API_TO_MODULES="pandas"
export PYTHON_RECORD_API_FROM_MODULES=mutable_error
export PYTHON_RECORD_API_OUTPUT_FILE=output.jsonl
python -m record_api
Python 3.7.6
pandas 1.0.2
record_api 1.1.0
Traceback (most recent call last):
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/__main__.py", line 12, in <module>
tracer.calls_from_modules[0], run_name="__main__", alter_sys=True
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 205, in run_module
return _run_module_code(code, init_globals, run_name, mod_spec)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/mgarcia/quansight/dataframe_tools/kaggle/mutable/scripts/mutable_error.py", line 4, in <module>
df['col'].map(lambda x: len(x.split(' ')))
File "/home/mgarcia/quansight/dataframe_tools/kaggle/mutable/scripts/mutable_error.py", line 4, in <module>
df['col'].map(lambda x: len(x.split(' ')))
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 564, in __call__
Stack(self, frame)()
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 372, in __call__
getattr(self, method_name)()
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 477, in op_CALL_METHOD
self.process((function,), function, args)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 354, in process
log_call(f"{filename}:{line}", fn, *args, **kwargs)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 262, in log_call
bound = Bound.create(fn, args, kwargs)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/core.py", line 239, in create
sig = signature(fn)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/pandas/core/generic.py", line 1799, in __hash__
f"{repr(type(self).__name__)} objects are mutable, "
TypeError: 'Series' objects are mutable, thus they cannot be hashed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/mgarcia/miniconda3/envs/pydata/lib/python3.7/site-packages/record_api/__main__.py", line 15, in <module>
raise Exception(f"Error running {tracer.calls_from_modules}")
Exception: Error running ['mutable_error']
It works for me in a clean environment, but with newer versions of the other dependencies:
$ conda create -n tmp -c conda-forge python=3.8 pandas
$ conda activate tmp
$ pip install python-record-api
$ python --version
Python 3.8.2
$ python -c "import pandas; print('pandas', pandas.__version__)"
pandas 1.1.0.dev0+1519.gd09f20e29
$ python -c "import record_api; print('record_api', record_api.__version__)"
record_api 1.1.0
I will try with your versions.
I see the issue though: I am caching calls to signature to speed things up, and I guess Series objects cannot be hashed...
I will add a fix so that, when something cannot be hashed, the signature is computed without the cache.
I just published 1.1.1, could you try that?
Yes, all good now, thanks!
Yay!