An open source repository for applying dbt to NGPVAN pipeline data
- About the package
- How do I use this package?
- Layers
- Macros
- DOCUMENTATION COMING SOON
- Coming in a future release...
- How is this package maintained and can I contribute?
- Code of Conduct
About the package
Current version: v1.1.0
This package is currently limited to a selection of transactional NGPVAN tables which can contain voterfile data, and several reference/lookup tables used to enhance that data.
Our goal is to add more tables over time and eventually have a complete package covering the 100+ NGPVAN tables provided in their Pipeline sync.
NGPVAN is a complicated system with a lot of variety in functionality and configuration, and every organization uses it differently. We are hoping to develop this package to be extremely flexible and allow for a wide variety of use-cases and levels of complexity.
NOTE: The package is currently optimized for BigQuery, but we are working to add Redshift compatibility.
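How do I use this package?
Include the following dbt-ngpvan package version in your packages.yml file:
    packages:
      - git: "https://github.com/move-coop/dbt-ngpvan.git"
        revision: v1.1.0  # a git tag, branch, or commit SHA; version ranges only work for dbt Hub packages
Run dbt deps to install the package.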
Before you can run this package in your dbt project, you must add your source schema(s) to its config settings.
Add the following configuration to your root dbt_project.yml file:
    vars:
      dbt_ngpvan_config:
        schema_list: ['your_schema']
If you have raw NGPVAN data in more than one schema, you can list all of them here, and the package will union the same raw tables across all schemas. All models include a source_schema column so you can differentiate between data coming from different schemas.
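For example, an organization whose Pipeline sync lands in two schemas could list both. The schema names in this sketch are hypothetical; each base model would then union its matching tables from both schemas, and source_schema records which schema every row came from.

```yaml
# Root dbt_project.yml
vars:
  dbt_ngpvan_config:
    # Hypothetical schema names, one per NGPVAN Pipeline sync destination.
    schema_list: ['ngpvan_committee_a', 'ngpvan_committee_b']
```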
There are a number of additional config settings you can use to customize the package to your needs.
Below are the default settings for these configs and more detail on using them.
- schema_list: REQUIRED
  - default: schema_list: []
- vendor_name
  - default: vendor_name: ngpvan
  - description: Many of us use different names to refer to NGPVAN: VAN, SmartVAN, NGP, EveryAction, etc. This setting controls the identifier each model will use in your database.
  - example: Changing this setting to 'van' means that running the base_ngpvan__contactscontacts model will create a view in your database named base_van__contactscontacts.
- source_database
  - default: source_database: target.database
  - description: Most folks won't need this setting; it is only needed if your raw/source data is in a different database than the one set in your profiles.yml file.
- table_logic
  - default: table_logic: pattern
  - description: By default, each base model in the package looks for any tables in the schema(s) you've listed above that match its table name pattern. This accounts for tables with prefixes (like the tsm_ used in many SmartVAN tables). This setting requires that your source tables match the table name patterns typically provided in a VAN Pipeline sync, with no underscores between the words of the table name: a table named tsm_tmc_contactsactivistcodes will work, but a table named contacts_activist_codes will NOT work. If your tables do not match these patterns, you'll need to either create views with the correct names or use the table_logic: list setting (still in development).
- table_exclude_list
  - default: table_exclude_list: []
  - description: List any tables in the source schema(s) that match the table patterns but should NOT be used as sources in the base models.
  - example: If you have both mymembers (_mym) and voterfile (_myv or _vf) tables but don't want both pulled into the base models, you can list one set here, e.g. [contactsactivistcodes_mym].
- lookup_tables
  - default: lookup_tables: false
  - description: Enables/disables staging models for lookup tables. Values from these tables are already joined into the appropriate staging models, so building stg models for them is typically unnecessary. Includes these tables:
    - codetypes
    - contacttypes
    - results
- packages
  - default:
        packages:
          - core:
              enabled: true
          - myvoters:
              enabled: true
  - description: Enables/disables models and certain functionality for different NGPVAN packages. Currently only the core and voterfile (myvoters) packages are supported; additional packages are in development. Set myvoters: enabled: false if you don't have _myv or _vf tables.
- table_list: IN DEVELOPMENT
- add_ons: IN DEVELOPMENT
- segmentation: IN DEVELOPMENT
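Putting several of these options together, a root dbt_project.yml for an organization that syncs NGPVAN data into two schemas, wants its models named with van, and has no _myv or _vf tables might look like the sketch below. It assumes that, like schema_list, every option above nests under dbt_ngpvan_config, and that packages takes the list structure shown in its default; the schema names are hypothetical.

```yaml
# Root dbt_project.yml: a sketch with hypothetical schema names.
# Assumes all options nest under dbt_ngpvan_config, as schema_list does.
vars:
  dbt_ngpvan_config:
    schema_list: ['ngpvan_sync_a', 'ngpvan_sync_b']  # hypothetical schemas
    vendor_name: van          # base models become base_van__{tablename}
    lookup_tables: true       # also build stg models for codetypes, contacttypes, results
    packages:
      - core:
          enabled: true
      - myvoters:
          enabled: false      # no _myv or _vf tables in these schemas
```

Options that are left out, such as source_database, table_logic, and table_exclude_list, should keep the defaults listed above.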
Layers
Base layer
- Function:
  - Prep layer for staging: unions data across schemas into a common table and adds metadata fields
- Default schema: dbt_base
- Default table names: base_{vendor_name | ngpvan}__{tablename}
- Default materialization: view
- Metadata columns added:
  - _dbt_source_relation
  - source_schema
  - source_table
  - database_mode (if myvoters is enabled)
  - is_myvoters (if myvoters is enabled)
  - segment_by (set to committeeid)
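Because the base models are views by default, projects that query them heavily may prefer tables. dbt lets a root project override configs for package models under models:, keyed by the package's project name. This is a minimal sketch, assuming the package's dbt project is named dbt_ngpvan and its base models live in a folder named base; check the package's own dbt_project.yml for the actual names.

```yaml
# Root dbt_project.yml: a sketch, not the package's documented config.
models:
  dbt_ngpvan:          # assumption: the package's dbt project name
    base:              # assumption: folder that holds the base models
      +materialized: table
```

If either name is wrong, dbt warns that configuration paths in dbt_project.yml do not apply to any resources, and the models stay views.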
Staging layer
- Function:
  - Creates the basic building blocks used by all downstream data transformations
  - Standardizes and cleans incoming raw data
  - Renames columns
- Default schema: dbt_staging
- Default table names: stg_ngpvan__{table_name}
- Transformations:
  - Cleans and formats fields such as timestamps, phone numbers, email addresses, etc.
- Metadata columns added:
  - All metadata columns created in the base layer
  - vendor column
- Additional notes:
  - Users may add their own columns by customizing the ngpvan__stg__additional_fields macro
Intermediate layer
- Function:
  - Creates denormalized tables for downstream querying, dashboards, and reporting, and for eventual joins/unions with data from other vendors/platforms
- Default schema: dbt_intermediate
- Default table names: int_ngpvan__{table_name}
- Tables included:
  - int_ngpvan__01__activist_codes
  - int_ngpvan__01__admins
  - int_ngpvan__01__contacts_attempts
  - int_ngpvan__01__survey_responses
- Transformations:
  - Major joins, transformations, and aggregations
  - Reformats columns and adds columns where helpful
- Metadata added:
  - All metadata columns created in the base and staging layers
- Additional notes:
  - Users may add their own columns by customizing the ngpvan__int__additional_fields macro
Macros
*** DOCUMENTATION COMING SOON ***
Coming in a future release...
- Macro documentation
- Add column lists and descriptions to YAML model files
- Complete set of models for "core" NGPVAN package
- Models for additional NGPVAN packages (digital, ngp, development, etc.)
- Individual model versioning
- Changelog
- Improved Redshift compatibility
- Additional adapter compatibility
How is this package maintained and can I contribute?
The team maintaining this package only maintains the latest version, so we highly recommend staying on the latest version of the package.
A small team of analytics engineers at withDataLove and The Movement Cooperative develops and maintains these dbt packages. However, the packages are made better by community contributions!
We highly encourage and welcome contributions to this package. Check out this dbt Discourse article to learn how to contribute to a dbt package!
Code of Conduct
Please read the code of conduct before using or contributing to this package!