Running piton extract flowsheet tool directly on BigQuery download fails
Miking98 opened this issue · 0 comments
Describe the bug
Running flowsheet_cleaner.py on the raw BigQuery download throws an error:
multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f3d311519f0>'. Reason: 'PicklingError("Can't pickle <class 'zstd.ZstdError'>: import of module 'zstd' failed")'
Steps to reproduce the bug
export OMOP_SOURCE="./ignore"
export EXTRACT_DESTINATION="./ignore_extract"
python3 download_bigquery.py som-nero-nigam-starr som-nero-nigam-starr.mimic_omop /local-scratch/nigam/projects/clmbr_text_assets/data/
python3 tools/stanford/flowsheet_cleaner.py --num_threads 5 $OMOP_SOURCE "${EXTRACT_DESTINATION}_flowsheets"
Expected results
The extract code should natively handle .csv.gz input, converting it from gzip to zstd.
Actual results
The BigQuery download produces .csv.gz files, which trip up the extract code.
Workaround
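# In bash, run `shopt -s globstar` first so that ** matches nested directories (zsh does this by default).
gunzip "$OMOP_SOURCE"/**/*.csv.gz
zstd -1 --rm "$OMOP_SOURCE"/**/*.csv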
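For anyone who wants to script the workaround without relying on shell globbing, here is a minimal Python sketch of the same two steps. The function names `gunzip_tree` and `zstd_compress` are hypothetical and not part of piton. It assumes the `zstd` command-line tool is on the PATH and simply mirrors the `zstd -1 --rm` call above, so it is not how the extract code itself handles compression.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def gunzip_tree(root):
    """Decompress every *.csv.gz under root to *.csv and delete the .gz file.

    Hypothetical helper; returns the list of .csv paths it wrote.
    """
    converted = []
    for gz_path in sorted(Path(root).rglob("*.csv.gz")):
        csv_path = gz_path.with_suffix("")  # strips only the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(csv_path, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large tables fit in memory
        gz_path.unlink()
        converted.append(csv_path)
    return converted


def zstd_compress(paths, level=1):
    """Compress each file with the zstd CLI, mirroring `zstd -1 --rm`.

    Assumes the zstd command-line tool is installed.
    """
    if shutil.which("zstd") is None:
        raise RuntimeError("zstd command-line tool not found on PATH")
    for path in paths:
        # --rm deletes the source .csv after writing <name>.csv.zst
        subprocess.run(["zstd", f"-{level}", "--rm", str(path)], check=True)
```

Calling `zstd_compress(gunzip_tree("./ignore"))` should leave the same .csv.zst files as the two shell commands, with `./ignore` standing in for `$OMOP_SOURCE`.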