MLforHealth/MIMIC_Extract

tables.exceptions.HDF5ExtError: Problems creating the Array

2533245542 opened this issue · 2 comments

Hi,

I installed MIMIC_Extract and was able to do a test run with population=100 (by calling mimic_direct_extract.py). However, when I tried to run it on the complete population, I got an error.

Here is the traceback of the error. Any idea how to fix this? I installed the MIMIC database with Docker, so MIMIC_Extract was running inside a Docker container, but storage space (>100GB left on the device) and available memory (>50GB) are not an issue here.

No known ranges for Basophils
No known ranges for pH urine
Glucose had 528 / 863595 rows cleaned:
  8 rows were strict outliers, set to np.nan
  520 rows were low valid outliers, set to 33.00
  0 rows were high valid outliers, set to 2000.00

No known ranges for Systemic Vascular Resistance
Height had 12 / 15182 rows cleaned:
  8 rows were strict outliers, set to np.nan
  0 rows were low valid outliers, set to 0.00
  4 rows were high valid outliers, set to 240.00

Sodium had 22 / 425997 rows cleaned:
  0 rows were strict outliers, set to np.nan
  20 rows were low valid outliers, set to 50.00
  2 rows were high valid outliers, set to 225.00

No known ranges for Lymphocytes ascites
Anion gap had 130 / 208219 rows cleaned:
  9 rows were strict outliers, set to np.nan
  108 rows were low valid outliers, set to 5.00
  13 rows were high valid outliers, set to 50.00

Shape of X :  (2200954, 312)
mimic_direct_extract.py:303: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  np.save(os.path.join(outPath, subjects_filename), data['subject_id'].as_matrix())
mimic_direct_extract.py:305: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  np.save(os.path.join(outPath, times_filename), data['max_hours'].as_matrix())
mimic_direct_extract.py:324: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  if dynamic_filename is not None: np.save(os.path.join(outPath, dynamic_filename), X.as_matrix())
/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/tables/attributeset.py:475: NaturalNameWarning: object name is not a valid Python identifier: 'axis0_nameAggregation Function'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  check_attribute_name(name)
Traceback (most recent call last):
  File "mimic_direct_extract.py", line 922, in <module>
    min_percent=args['min_percent']
  File "mimic_direct_extract.py", line 325, in save_numerics
    if dynamic_hd5_filename is not None: X.to_hdf(os.path.join(outPath, dynamic_hd5_filename), 'X')
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/pandas/core/generic.py", line 2377, in to_hdf
    return pytables.to_hdf(path_or_buf, key, self, **kwargs)
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/pandas/io/pytables.py", line 274, in to_hdf
    f(store)
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/pandas/io/pytables.py", line 268, in <lambda>
    f = lambda store: store.put(key, value, **kwargs)
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/pandas/io/pytables.py", line 889, in put
    self._write_to_group(key, value, append=append, **kwargs)
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/pandas/io/pytables.py", line 1415, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/pandas/io/pytables.py", line 3022, in write
    blk.values, items=blk_items)
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/pandas/io/pytables.py", line 2812, in write_array
    self._handle.create_array(self.group, key, value)
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/tables/file.py", line 1168, in create_array
    track_times=track_times)
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/tables/array.py", line 197, in __init__
    byteorder, _log, track_times)
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/tables/leaf.py", line 290, in __init__
    super(Leaf, self).__init__(parentnode, name, _log)
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/tables/node.py", line 266, in __init__
    self._v_objectid = self._g_create()
  File "/root/miniconda3/envs/mimic_data_extraction/lib/python3.6/site-packages/tables/array.py", line 229, in _g_create
    nparr, self._v_new_title, self.atom)
  File "tables/hdf5extension.pyx", line 1297, in tables.hdf5extension.Array._create_array
tables.exceptions.HDF5ExtError: Problems creating the Array.
Job 'python mimic_direct_extract.py ...' terminated by signal SIGSEGV (Address boundary error)

I think I've had some trouble with pyarrow/pandas versions before, as well as Python 2 vs. 3. Does your environment match the requirements?

Yes, the environment matches the requirements. I think the problem is specific to Docker: the exact same steps worked outside the Docker container.
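
For anyone hitting the same error inside a container, it may help to confirm what the container itself can see before the full run. The sketch below is not part of MIMIC_Extract. It estimates the size of a dense array with the logged shape (2200954, 312), assuming 8-byte float64 values (the dtype is not shown in the log), which comes to roughly 5.5 GB, and compares it with the free disk space and the cgroup memory limit visible from inside the container. The helper names `dense_array_bytes` and `cgroup_memory_limit` are hypothetical, the cgroup file paths are assumptions about a typical Linux Docker setup, and the current directory stands in for the real output path.

```python
import os
import shutil


def dense_array_bytes(shape, itemsize=8):
    """Bytes needed for a dense array of `shape` with `itemsize`-byte elements."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize


def cgroup_memory_limit():
    """Return the container memory limit in bytes, or None if unlimited or unknown."""
    # Assumed locations: cgroup v2 first, then cgroup v1.
    candidates = (
        "/sys/fs/cgroup/memory.max",
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",
    )
    for path in candidates:
        try:
            with open(path) as handle:
                raw = handle.read().strip()
        except OSError:
            continue  # file missing or unreadable; try the next location
        if raw.isdigit():
            # Note: cgroup v1 reports a very large number when no limit is set.
            return int(raw)
        return None  # cgroup v2 reports "max" when no limit is set
    return None


# Shape taken from the "Shape of X" line in the log; float64 is an assumption.
needed = dense_array_bytes((2200954, 312))
# "." is a placeholder for the actual output directory.
free_disk = shutil.disk_usage(".").free
limit = cgroup_memory_limit()

print(f"dense array size: {needed / 2**30:.2f} GiB")
print(f"free disk here:   {free_disk / 2**30:.2f} GiB")
if limit is None:
    print("memory limit:     none found (unlimited or not a cgroup host)")
else:
    print(f"memory limit:     {limit / 2**30:.2f} GiB")
```

Run it from the output directory inside the container. If the reported limit or free space is much lower than what the host shows, the container configuration is a likely suspect. This only checks resources and does not confirm the actual cause of the `HDF5ExtError`.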