scikit-learn-contrib/category_encoders

RecursionError: maximum recursion depth exceeded while calling a Python object

TobiasSackmannDacoso opened this issue · 1 comments

Expected Behavior

No errors during hashing

Actual Behavior

The following error is thrown:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/user/.local/lib/python3.8/site-packages/category_encoders/hashing.py", line 162, in require_data
self.require_data(self, data_lock, new_start, done_index, hashing_parts, cols=cols, process_index=process_index)
File "/home/user/.local/lib/python3.8/site-packages/category_encoders/hashing.py", line 162, in require_data
self.require_data(self, data_lock, new_start, done_index, hashing_parts, cols=cols, process_index=process_index)
File "/home/user/.local/lib/python3.8/site-packages/category_encoders/hashing.py", line 162, in require_data
self.require_data(self, data_lock, new_start, done_index, hashing_parts, cols=cols, process_index=process_index)
[Previous line repeated 954 more times]
File "/home/user/.local/lib/python3.8/site-packages/category_encoders/hashing.py", line 157, in require_data
hashing_parts.put({part_index: data_part})
File "", line 2, in put
File "/usr/lib/python3.8/multiprocessing/managers.py", line 834, in _callmethod
conn.send((self._id, methodname, args, kwds))
File "/usr/lib/python3.8/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
RecursionError: maximum recursion depth exceeded while calling a Python object

Steps to Reproduce the Problem

  1. Load a 350-million-row dataset
  2. Leave all parameters at their defaults; only set max_sample=2000 (also tried 200 and 10000)
  3. Try to encode the dataset: 6 out of 10 features have to be encoded

Specifications

  • Version: 2.5.0
  • Platform: Ubuntu 20.04
  • Subsystem:

The problem is that the multiprocessing hashing iterates through the data recursively, so the maximum recursion depth is reached whenever the dataset holds more rows than max_recursion_depth * max_sample * n_processors. I've fixed this by using a while loop instead of recursion.
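A minimal sketch of the idea (not the actual `require_data` implementation; the function names and chunking logic here are simplified stand-ins): the recursive variant calls itself once per chunk, so the stack depth grows with the number of chunks, while the while-loop variant keeps constant stack depth no matter how large the data is.

```python
import sys

def process_parts_recursive(data, start, chunk_size, parts):
    # Simplified analogue of the recursive require_data: each chunk
    # triggers one more stack frame, so depth ~ len(data) / chunk_size.
    if start >= len(data):
        return
    parts.append(data[start:start + chunk_size])
    process_parts_recursive(data, start + chunk_size, chunk_size, parts)

def process_parts_iterative(data, start, chunk_size, parts):
    # Same chunking expressed as a while loop: constant stack depth,
    # so it works for arbitrarily large inputs.
    while start < len(data):
        parts.append(data[start:start + chunk_size])
        start += chunk_size

data = list(range(10_000))

# 5000 chunks far exceeds the default recursion limit (~1000):
try:
    process_parts_recursive(data, 0, 2, [])
    print("recursive version succeeded")
except RecursionError:
    print("recursive version hits RecursionError")

parts = []
process_parts_iterative(data, 0, 2, parts)
print(f"iterative version produced {len(parts)} chunks")
```

The iterative version processes all 5000 chunks where the recursive one overflows the default stack limit, which matches the failure mode in the traceback above.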