- need to be superuser on a local postgresql
- create a database called
sana
- replace all instances of hardcoded file path
/home/rjspotter/code/sana
withpwd
- run the files in the src directory in order e.g.
psql -h localhost -f src/000_schema.sql sana
-
Using a Hub and Spoke architecture
-
Decided to separate the concepts of Member, Patient, and Person
- It's more flexible this way
- Since there's no direct link between Member and Patient in the data going through a 3rd concept allows you to refactor that concept later
- Especially since that 3rd concept is a little fuzzy
-
The surrogate (hub_id) key for Person is "Latin Normative" i.e. assumes that only letters and numbers are useful in identifying people
-
The surrogage key for Person is also not collision-proof so these results can only be considered "probabalisticly true"
-
Does not account for duplicate or late arriving data
-
hubs, links, and (non-conformed) satellites can be processed in parallel
-
"average per-claim cost across all members" doesn't specify which average so I gave mean, median, and mode
-
"most expensive patients" stops as the concept of a patient but could join in the concept of a person if we needed to display name et al
- Unit Rate is blank so I dropped it from the warehouse.