NIH Long COVID Computational Challenge -- Targeted Machine Learning Analysis Group

This is the formatted competition code for the L3C Challenge entry of the Targeted Machine Learning Analysis Group at UC Berkeley. See (TODO: maybe add writeup here for details of our analysis plan and results)

Steps for replicating the analysis

Obtain the synthetic data (contact @trberg for Box access).
Extract the synthetic data: tar -xzf synthetic_data.tar.gz

Add the additional data files to the synthetic data folder:

   LL_concept_sets_fusion_everyone.csv
   LL_DO_NOT_DELETE_REQUIRED_concept_sets_all.csv

Build the Docker container: utils/build.sh
Run the analysis: utils/do_analysis.sh
The fitted models and predictions will be written to the output folder.
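
Before building the container, it may help to confirm that the two additional concept set files named in the steps above were actually copied into the synthetic data folder. The following is a minimal sketch of such a check, not part of the repository: the helper name missing_required_files is hypothetical, and the folder name synthetic_data in the usage comment is an assumption about where the archive extracts.

```python
from pathlib import Path

# File names are taken from the replication steps in this README.
REQUIRED_FILES = (
    "LL_concept_sets_fusion_everyone.csv",
    "LL_DO_NOT_DELETE_REQUIRED_concept_sets_all.csv",
)


def missing_required_files(data_dir):
    """Return the required file names that are not present in data_dir.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    data_path = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_path / name).is_file()]


# Example usage (ASSUMPTION: the archive extracts to ./synthetic_data):
#     missing = missing_required_files("synthetic_data")
#     if missing:
#         raise SystemExit("Missing files: " + ", ".join(missing))
```

An empty list means both files are in place and utils/build.sh can be run next.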

Enclave code formatter

The Python module format_code processes raw code exported from the enclave (as in the src_raw folder) and generates runnable Python code (as in the src folder). R is not currently supported.

BerkeleyBiostats/l3c_ctml
