mmcdermott/MEDS_transforms
A simple set of MEDS polars-based ETL and transformation functions
Python · MIT
Issues
If a time format is provided but the column is already a datetime type it should not throw an error and should instead warn the user
#232 opened by mmcdermott - 0
Update nested-ragged-tensors dependency to >=0.0.8
#212 opened by Oufattole - 0
Get test coverage to 100%
#216 opened by mmcdermott - 0
Should update log tests with pytest-loguru
#226 opened by mmcdermott - 0
Passing external splits during extraction fails due to type mismatch between string and Path
#222 opened by mmcdermott - 0
Upgrade nested_ragged_tensors to >= 0.0.8
#219 opened by Oufattole - 0
File paths with spaces in them break the runner
#217 opened by mmcdermott - 4
Attend to the new "MEDS_DEATH" code
#209 opened by rvandewater - 0
Should add a test with non-standard splits
#215 opened by mmcdermott - 0
Single stage tester is likely not checking for the right kinds of errors when `should_error` is True
#213 opened by mmcdermott - 4
We should be able to convert between different ontological code vocabularies.
#204 opened by mmcdermott - 5
Lock files should be pipeline ID specific in some way -- this will enable pipelines to flag when old run locks are present.
#194 opened by mmcdermott - 1
add_time_derived_measurements breaks if you use _script in the meds_transform_runner
#202 opened by Oufattole - 0
Stages that depend on code metadata having been recently computed (e.g., `filter_measurements`) should be better documented
#200 opened by mmcdermott - 2
Should distribute / package typing information too
#195 opened by mmcdermott - 2
`extract_code_metadata.py` should read in columns contributing to descriptions or parent codes as strings rather than inferring their types.
#190 opened by mmcdermott - 0
The unzipping solution causes errors if files have already been unzipped in the MIMIC-IV example
#188 opened by mmcdermott - 0
Duplication between `text_value` and `numeric_value` should be ignored when possible.
#182 opened by mmcdermott - 2
We need to be able to support joining on metadata based on partial code matches (e.g., no `valueuom`).
#148 opened by mmcdermott - 1
There should be an immediate way to identify when an entire stage has completed so entire pipelines can more directly short-circuit
#174 opened by mmcdermott - 4
Metadata extraction does not appear to be extracting some columns correctly
#156 opened by mmcdermott - 0
Metadata extraction should log a warning if code-part column names are not uniformly either extracted or not extracted across metadata sources.
#186 opened by mmcdermott - 0
MEDS-Extract Tests should be re-factored and split into multiple single-stage tests and one full-pipeline test
#183 opened by mmcdermott - 0
Make compatible with MEDS v0.3.2
#172 opened by mmcdermott - 0
Should pull the generic hydra resolvers (e.g., `get_script_docstring`) into a separate package
#180 opened by mmcdermott - 5
We need a more robust interface for ways of (a) processing numerical and categorical values and (b) normalizing output data in light of those modes.
#177 opened by mmcdermott - 0
See if we can support Python 3.11
#176 opened by mmcdermott - 0
Add badges to README
#170 opened by mmcdermott - 0
If a shard is empty, tensorization crashes.
#168 opened by mmcdermott - 5
Normalization stage is checking for aggregate_code_metadata/codes.parquet columns and metadata/codes.parquet columns in data/codes.parquet
#147 opened by Oufattole - 0
Multi-stage integration tests for pre-processing stages in sequence should be added.
#160 opened by mmcdermott - 1
Metadata input dir may be being set improperly to the last metadata stage's output directory instead of the `reducer_output_dir`
#161 opened by mmcdermott - 0
Using `do_summarize_all_codes` with `values/quantile` object configuration breaks the mapper
#165 opened by mmcdermott - 1
Typing should use Float32 Throughout
#158 opened by mmcdermott - 1
aggregate_code_metadata Quantile Binning CLI Bug
#162 opened by Oufattole - 0
Error message when `aggregate_code_metadata.py` gets an aggregation that should be an object but is just a string should be clearer.
#164 opened by mmcdermott - 1
Pipeline Configuration Improvements
#155 opened by mmcdermott - 0
`reshard_to_split` should (in a configurable manner) sub-shard the input rather than re-shard the input where possible.
#153 opened by mmcdermott - 0
The dropping of nulls and making the dataframe unique could be done once and shared across all time dependent fntrs.
#152 opened by mmcdermott