This repo contains the codebase for extracting raw HCUP data via the HCUP load scripts and then loading the results as Parquet onto our UHC server. This is the EL part of our ELT pipeline. The Transformation step is organized in a separate data warehouse repository, the hcup-dbt repo.
Please visit this repository's GitHub page for additional details.
Other useful links include:
This folder contains our extraction and loading codebase.
- index.R: controller script. Use this to control the data extraction and loading process.
- 📁 renv: project-specific dependency management, as per the renv package.
- 📁 raw-hcup: local development folder containing .acs, .dta, .do and .csv files.
- 📁 R: local project functions
- 📁 documents: supplemental documents
- 📁 code: local scripts
- 📁 clean: local cleaned objects
graph LR
    classDef subgraph_padding fill:none,stroke:none
    subgraph lan [ELT]
        subgraph subgraph_padding1 [ ]
            style lan stroke-dasharray: 5 5
            subgraph <b>E</b>xtraction
                n1[.acs]--Stata load program-->n2[.dta]
            end
            n2[.dta]---n3[.dta]
            subgraph <b>L</b>oading
                n3[.dta]--R-->n4[.parquet]
            end
            n4[.parquet]---n5[.parquet]
            subgraph <b>T</b>ransformation
                n5[.parquet]--DBT-DuckDB-->n6[analytical_files]
            end
        end
    end
    class subgraph_padding1 subgraph_padding
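
The Loading step in the diagram above (.dta to .parquet) is done in R in this repo. For anyone working from the Python side, a minimal sketch of the same idea might look like the following. It is not the repo's code: the function names, the example file name, and the use of `clean` as the output folder are assumptions made for illustration, and `pandas` with `pyarrow` would need to be installed.

```python
from pathlib import Path


def parquet_target(src, out_dir):
    """Return the .parquet path in out_dir that corresponds to a .dta file."""
    src = Path(src)
    if src.suffix.lower() != ".dta":
        raise ValueError(f"expected a .dta file, got {src.name}")
    return Path(out_dir) / (src.stem + ".parquet")


def dta_to_parquet(src, out_dir):
    """Read one Stata file and write it as Parquet; returns the output path."""
    import pandas as pd  # imported lazily; needs pandas plus pyarrow

    dst = parquet_target(src, out_dir)
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Loads the whole file into memory (hypothetical small-file case).
    pd.read_stata(src).to_parquet(dst, index=False)
    return dst
```

Under these assumptions, `dta_to_parquet("raw-hcup/example.dta", "clean")` would write `clean/example.parquet`. Files too large for memory would need a chunked approach; `pandas.read_stata` accepts a `chunksize` argument for that.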
graph LR
    classDef subgraph_padding fill:none,stroke:none
    subgraph subgraph_padding1 [ ]
        subgraph n1["Existing Drexel Infrastructure"]
            n6["Encrypted UHC Server"]--Arrow/DuckDB-->n5["R/Python-Git"]
            n5["R/Python-Git"]--Arrow/DuckDB-->n6["Encrypted UHC Server"]
        end
    end
    class subgraph_padding1 subgraph_padding
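
The diagram above shows R/Python code exchanging data with the encrypted UHC server through Arrow/DuckDB. As a rough sketch of what read access could look like from Python, the snippet below builds a DuckDB query over a Parquet file and runs it. The file path, column names, and both helper functions are hypothetical; only `read_parquet` (a DuckDB table function) and the `duckdb` Python package are real.

```python
def build_parquet_query(parquet_path, columns=None, limit=None):
    """Build a DuckDB SQL string selecting columns from a Parquet file or glob."""
    if columns:
        # Quote identifiers, doubling any embedded double quotes.
        cols = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
    else:
        cols = "*"
    path = str(parquet_path).replace("'", "''")  # escape single quotes for SQL
    sql = f"SELECT {cols} FROM read_parquet('{path}')"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


def query_parquet(parquet_path, columns=None, limit=None):
    """Run the query with DuckDB and return the result as a pandas DataFrame."""
    import duckdb  # imported lazily; needs the duckdb package (and pandas for .df())

    return duckdb.sql(build_parquet_query(parquet_path, columns, limit)).df()
```

For example, `query_parquet("clean/example.parquet", ["age", "los"], limit=5)` would return the first five rows of two assumed columns. Selecting columns explicitly tends to help on large Parquet files, since the format is columnar and DuckDB can skip columns that are not requested.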