HCUP Extraction Loading

This repo contains the codebase for extracting raw HCUP data via the HCUP load scripts and then loading the results as Parquet files onto our UHC server. This is the EL part of our ELT pipeline. The Transformation step lives in a separate data warehouse repository, the hcup-dbt repo.

Please visit this repository's GitHub page for additional details.

Other useful links include:

Folders

This folder contains our extraction and loading codebase.

  • index.R: controller script. Use this to control the data extraction and loading process.
  • 📁 renv: project-specific dependency management, as per the renv package.
  • 📁 raw-hcup: local development folder containing .asc, .dta, .do and .csv files.
  • 📁 R: local project functions.
  • 📁 documents: supplemental documents.
  • 📁 code: local scripts.
  • 📁 clean: local cleaned objects.

ELT (Extract, Load, Transform) schematic

graph LR
classDef subgraph_padding fill:none,stroke:none
 subgraph lan [ELT]
 subgraph subgraph_padding1 [ ]
   style lan stroke-dasharray: 5 5
         subgraph  <b>E</b>xtraction
           n1[.asc]--Stata load program-->n2[.dta]
         end
         n2[.dta]---n3[.dta]
         subgraph <b>L</b>oading
           n3[.dta]--R-->n4[.parquet]
         end
         n4[.parquet]---n5[.parquet]
         subgraph <b>T</b>ransformation
            n5[.parquet]--DBT-DuckDB-->n6[analytical_files]
        end
        end
 
 end       
 class subgraph_padding1 subgraph_padding
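
The Loading step in the schematic (.dta to .parquet via R) could be sketched roughly as follows. This is a minimal illustration, not the code in index.R: the helper names `parquet_path_for()` and `dta_to_parquet()` are hypothetical, and it assumes the haven and arrow packages are available in the renv library.

```r
# Hypothetical helper: map a .dta path to the .parquet path it would be written to.
parquet_path_for <- function(dta_path, out_dir) {
  stem <- tools::file_path_sans_ext(basename(dta_path))
  file.path(out_dir, paste0(stem, ".parquet"))
}

# Hypothetical helper: convert one Stata file to a single Parquet file.
# Assumes the haven and arrow packages are installed (e.g. via renv).
dta_to_parquet <- function(dta_path, out_dir) {
  out_path <- parquet_path_for(dta_path, out_dir)
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  df <- haven::read_dta(dta_path)     # read the Stata file into a tibble
  arrow::write_parquet(df, out_path)  # write the tibble as Parquet
  invisible(out_path)                 # return the output path without printing it
}
```

A call such as `dta_to_parquet("raw-hcup/example.dta", "clean")` (an illustrative file name) would write `clean/example.parquet`. Stata value labels arrive as labelled columns; depending on how downstream tools read the file, they may need handling first, for example with `haven::zap_labels()`.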

Infrastructure Summary

graph LR
classDef subgraph_padding fill:none,stroke:none

 subgraph subgraph_padding1 [ ]
  
        
        
         subgraph n1["Existing Drexel Infrastructure"]
            n6["Encrypted UHC Server"]--Arrow/DuckDB-->n5["R/Python-Git"]
            n5["R/Python-Git"]--Arrow/DuckDB-->n6["Encrypted UHC Server"]
        end
     
 
 end       
 class subgraph_padding1 subgraph_padding
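
As a rough illustration of the Arrow/DuckDB link in the diagram, the sketch below previews a Parquet file from an R session using DuckDB. The helper names `preview_query()` and `preview_parquet()` and the example path are hypothetical, and the code assumes the DBI and duckdb packages are installed; the actual access pattern on the UHC server may differ.

```r
# Hypothetical helper: build a DuckDB query that previews a Parquet file.
preview_query <- function(parquet_path, n = 5L) {
  sprintf("SELECT * FROM read_parquet('%s') LIMIT %d", parquet_path, as.integer(n))
}

# Hypothetical helper: run the preview in an in-memory DuckDB session.
# Assumes the DBI and duckdb packages are installed.
preview_parquet <- function(parquet_path, n = 5L) {
  con <- DBI::dbConnect(duckdb::duckdb())
  # Close the connection and shut the database down when the function exits.
  on.exit(DBI::dbDisconnect(con, shutdown = TRUE), add = TRUE)
  DBI::dbGetQuery(con, preview_query(parquet_path, n))
}
```

For example, `preview_parquet("clean/example.parquet")` would return the first five rows as a data frame. DuckDB reads the Parquet file in place, so nothing needs to be imported into a database first.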