mmcdermott/MEDS_transforms

A simple set of MEDS polars-based ETL and transformation functions

Python · MIT

Pinned issues

Release 0.1 Tracker

#35 opened 6 months ago by mmcdermott

Open · 0

Issues

If a time format is provided but the column is already a datetime type it should not throw an error and should instead warn the user
#232 opened a month ago by mmcdermott
0
You should be able to re-type columns during extraction dynamically
#231 opened a month ago by mmcdermott
0
Boolean additional columns should be supported during extraction
#230 opened a month ago by mmcdermott
0
Update nested-ragged-tensors dependency to >=0.0.8
#212 opened 2 months ago by Oufattole
0
Get test coverage to 100%
#216 opened 2 months ago by mmcdermott
0
Operating only on splits via a split file is not appropriately tested.
#214 opened 2 months ago by mmcdermott
0
Should update log tests with pytest-loguru
#226 opened 2 months ago by mmcdermott
0
Passing external splits during extraction fails due to type mismatch between string and Path
#222 opened 2 months ago by mmcdermott
0
Filtering via the splits map file rather than shard names is broken.
#221 opened 2 months ago by mmcdermott
0
Error case integration tests should check for the right error message.
#220 opened 2 months ago by mmcdermott
0
Upgrade nested_ragged_tensors to >= 0.0.8
#219 opened 2 months ago by Oufattole
1
File paths with spaces in them break the runner
#217 opened 2 months ago by mmcdermott
0
Attend to the new "MEDS_DEATH" code
#209 opened 2 months ago by rvandewater
4
Should add a test with non-standard splits
#215 opened 2 months ago by mmcdermott
0
Single stage tester is likely not checking for the right kinds of errors when `should_error` is True
#213 opened 2 months ago by mmcdermott
0
We should be able to convert between different ontological code vocabularies.
#204 opened 3 months ago by mmcdermott
4
Static DataFrame Missing Rows for Patients Without Static Data
#205 opened 3 months ago by Oufattole
5
Lock files should be pipeline ID specific in some way -- this will enable pipelines to flag when old run locks are present.
#194 opened 4 months ago by mmcdermott
1
add_time_derived_measurements breaks if you use _script in the meds_transform_runner
#202 opened 4 months ago by Oufattole
1
All stages must have unique names or an error should be thrown.
#201 opened 4 months ago by mmcdermott
0
Stages that depend on code metadata having been recently computed (e.g., `filter_measurements`) should be better documented
#200 opened 4 months ago by mmcdermott
1
Misalignment Between Static and Event Sequence DataFrames
#197 opened 4 months ago by Oufattole
2
Add wget blocks to run.sh for MIMIC and eICU pipelines
#196 opened 4 months ago by coderabbitai
0
Should distribute / package typing information too
#195 opened 4 months ago by mmcdermott
1
The MIMIC ETL does not fully normalize parent codes to OMOP vocabs.
#181 opened 4 months ago by mmcdermott
2
`extract_code_metadata.py` should read in columns contributing to descriptions or parent codes as strings rather than inferring their types.
#190 opened 4 months ago by mmcdermott
0
The unzipping solution causes errors if files have already been unzipped in the MIMIC-IV example
#188 opened 4 months ago by mmcdermott
0
Duplication between `text_value` and `numeric_value` should be ignored when possible.
#182 opened 4 months ago by mmcdermott
0
We need to be able to support joining on metadata based on partial code matches (e.g., no `valueuom`).
#148 opened 5 months ago by mmcdermott
2
There should be an immediate way to identify when an entire stage has completed so entire pipelines can more directly short-circuit
#174 opened 4 months ago by mmcdermott
1
Metadata extraction does not appear to be extracting some columns correctly
#156 opened 4 months ago by mmcdermott
4
Metadata extraction should log a warning if code-part column names are not uniformly either extracted or not extracted across metadata sources.
#186 opened 4 months ago by mmcdermott
0
MEDS-Extract Tests should be re-factored and split into multiple single-stage tests and one full-pipeline test
#183 opened 4 months ago by mmcdermott
0
Make compatible with MEDS v0.3.2
#172 opened 4 months ago by mmcdermott
0
Should pull the generic hydra resolvers (e.g., `get_script_docstring`) into a separate package
#180 opened 4 months ago by mmcdermott
0
We need a more robust interface for ways of (a) processing numerical and categorical values and (b) normalizing output data in light of those modes.
#177 opened 4 months ago by mmcdermott
5
See if we can support Python 3.11
#176 opened 4 months ago by mmcdermott
0
Add badges to README
#170 opened 5 months ago by mmcdermott
0
If a shard is empty, tensorization crashes.
#168 opened 5 months ago by mmcdermott
0
Normalization stage is checking for aggregate_code_metadata/codes.parquet columns and metadata/codes.parquet columns in data/codes.parquet
#147 opened 5 months ago by Oufattole
5
Multi-stage integration tests for pre-processing stages in sequence should be added.
#160 opened 5 months ago by mmcdermott
0
Metadata input dir may be being set improperly to the last metadata stage's output directory instead of the `reducer_output_dir`
#161 opened 5 months ago by mmcdermott
1
Aggregation integration test should cover all integrations
#163 opened 5 months ago by mmcdermott
0
Using `do_summarize_all_codes` with `values/quantile` object configuration breaks the mapper
#165 opened 5 months ago by mmcdermott
0
Typing should use Float32 Throughout
#158 opened 5 months ago by mmcdermott
1
aggregate_code_metadata Quantile Binning CLI Bug
#162 opened 5 months ago by Oufattole
1
Error message when `aggregate_code_metadata.py` gets an aggregation that should be an object but is just a string should be clearer.
#164 opened 5 months ago by mmcdermott
0
Pipeline Configuration Improvements
#155 opened 5 months ago by mmcdermott
1
`reshard_to_split` should (in a configurable manner) sub-shard the input rather than re-shard the input where possible.
#153 opened 5 months ago by mmcdermott
0
The dropping of nulls and making the dataframe unique could be done once and shared across all time dependent fntrs.
#152 opened 5 months ago by mmcdermott
0