som-shahlab/femr

Running piton extract flowsheet tool directly on BigQuery download fails

Miking98 opened this issue · 0 comments

Describe the bug

Running the flowsheet cleaner on the raw BigQuery download throws an error:

multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f3d311519f0>'. Reason: 'PicklingError("Can't pickle <class 'zstd.ZstdError'>: import of module 'zstd' failed")'

Steps to reproduce the bug

export OMOP_SOURCE="./ignore"
export EXTRACT_DESTINATION="./ignore_extract"
python3 download_bigquery.py som-nero-nigam-starr som-nero-nigam-starr.mimic_omop /local-scratch/nigam/projects/clmbr_text_assets/data/
python3 tools/stanford/flowsheet_cleaner.py --num_threads 5 $OMOP_SOURCE "${EXTRACT_DESTINATION}_flowsheets"

Expected results

The extract code should natively handle the gunzip conversion into zstd.

Actual results

The BigQuery download is a set of .csv.gz files, which trips up the extract code.

Workaround

gunzip $OMOP_SOURCE/**/*.csv.gz
zstd -1 --rm $OMOP_SOURCE/**/*.csv
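
For anyone who would rather script the first half of the workaround, here is a minimal sketch of the gunzip step in Python, using only the standard library. The function name `gunzip_tree` is hypothetical and not part of femr/piton. It only covers the `.csv.gz` to `.csv` step; the `zstd -1 --rm` command above is still needed afterwards, and a native fix in the extract code may well look different.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_tree(root):
    """Decompress every *.csv.gz under `root` to a sibling .csv file.

    Hypothetical helper (not a femr/piton API). Mirrors `gunzip`: the
    original .csv.gz is deleted once the .csv has been written.
    Returns the list of .csv paths that were created.
    """
    converted = []
    # sorted() materializes the matches before any file is deleted
    for gz_path in sorted(Path(root).rglob("*.csv.gz")):
        csv_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(csv_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams, so large tables are fine
        gz_path.unlink()
        converted.append(csv_path)
    return converted
```

Calling `gunzip_tree("./ignore")` (the `OMOP_SOURCE` value from the steps above) would leave plain `.csv` files ready for the `zstd` command.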