A starter template for Snowflake Cloud Database
- Snowflake offers a one-month free trial, and with this repo you should be able to get a sense of the basics of Snowflake within an hour.
- This template will create the ROLE, USER, DATABASE, SCHEMA & WAREHOUSE objects necessary to get up and running with Snowflake.
Strongly recommend taking a look at managing Snowflake with Terraform once you're familiar with the basics.
- Snowflake instance (takes 5 minutes to set up, no credit card required for the one-month trial)
- SnowSQL CLI (optional)
Copy first_run.sql into a worksheet as in the screenshot below and Run All.
Or use the CLI
snowsql -c <your_connection_name> -f first_run.sql
The following is created, as described in first_run.sql (a trimmed sketch of the SQL follows the tree below):
├── DATABASES
│ ├── RAW # This is the landing pad for everything extracted and loaded
│ └── ANALYTICS # This database contains tables and views accessible to analysts and reporting
├── WAREHOUSES
│ ├── WAREHOUSE_INGEST # Tools like Stitch will use this warehouse to perform loads of new data
│ ├── WAREHOUSE_TRANSFORM # This is the warehouse that dataform/dbt will use to perform all data transformations
│ ├── WAREHOUSE_REPORT # BI tools will connect to this warehouse to run analytical queries
├── ROLES
│ ├── ROLE_INGEST # Give this role to your Extract/Load tools/scripts to load data
│ ├── ROLE_TRANSFORM # Give this role to Dataform/dbt to transform data, and Data Engineers
│ ├── ROLE_REPORT # Give this role to BI tools / Analysts to query analytics data
├── USERS
│ ├── USER_INGEST # eg: Stitch User
│ ├── USER_TRANSFORM # eg: Dataform User
│ ├── USER_REPORT # eg: Looker user
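If you'd like to see the shape of the SQL before running it, here is a minimal sketch of the pattern first_run.sql follows for one database/warehouse/role/user group. The object names match the tree above, but the warehouse size, auto-suspend value, and placeholder password are illustrative assumptions, not values copied from the file.

```sql
-- Hedged sketch (assumed, not copied from first_run.sql):
-- one database, one warehouse, one role, one user, wired together with grants.
CREATE DATABASE IF NOT EXISTS RAW;

CREATE WAREHOUSE IF NOT EXISTS WAREHOUSE_INGEST
  WITH WAREHOUSE_SIZE = 'XSMALL'   -- size is an illustrative choice
       AUTO_SUSPEND   = 60
       AUTO_RESUME    = TRUE;

CREATE ROLE IF NOT EXISTS ROLE_INGEST;
GRANT USAGE ON WAREHOUSE WAREHOUSE_INGEST TO ROLE ROLE_INGEST;
GRANT ALL ON DATABASE RAW TO ROLE ROLE_INGEST;

CREATE USER IF NOT EXISTS USER_INGEST
  PASSWORD          = '<change_me>'      -- placeholder
  DEFAULT_ROLE      = ROLE_INGEST
  DEFAULT_WAREHOUSE = WAREHOUSE_INGEST;
GRANT ROLE ROLE_INGEST TO USER USER_INGEST;
```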
Use the test_permissions.sql file to:
- create a base table in the RAW database and load a test row using the ROLE_INGEST role
- create a new table and view in ANALYTICS using the ROLE_TRANSFORM role
- query that view using the ROLE_REPORT role

NB: replace <USERNAME> in the file with your login name.
Or use the CLI:
snowsql -c <your_connection_name> -f test_permissions.sql
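For reference, a minimal sketch of the flow test_permissions.sql walks through is below. The table, view, and schema names (PUBLIC, TEST_TABLE, TEST_VIEW) are illustrative assumptions rather than the names used in the file; each step runs under a different role so a permission failure shows up immediately.

```sql
-- Hedged sketch of the permission test (names are illustrative).
USE ROLE ROLE_INGEST;
CREATE TABLE RAW.PUBLIC.TEST_TABLE (id INT, loaded_at TIMESTAMP);
INSERT INTO RAW.PUBLIC.TEST_TABLE VALUES (1, CURRENT_TIMESTAMP());

USE ROLE ROLE_TRANSFORM;
CREATE VIEW ANALYTICS.PUBLIC.TEST_VIEW AS
  SELECT * FROM RAW.PUBLIC.TEST_TABLE;

USE ROLE ROLE_REPORT;
SELECT * FROM ANALYTICS.PUBLIC.TEST_VIEW;
```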
JSON is handled very well in Snowflake and is worth a look. The test_json.sql file runs through flattening raw JSON into a table.
Or use the CLI:
snowsql -c <your_connection_name> -f test_json.sql
- Key to note is the RECURSIVE=>TRUE flag, illustrated in the sketch below
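A minimal sketch of that flattening pattern follows; the sample payload and column names are made up for illustration and are not taken from test_json.sql.

```sql
-- Parse a JSON string and flatten every nested key/value pair into rows.
WITH raw_json AS (
    SELECT PARSE_JSON('{"id": 1, "customer": {"name": "Ada", "tags": ["vip", "2020"]}}') AS payload
)
SELECT
    f.path  AS json_path,   -- full path to the element, e.g. customer.tags[0]
    f.value AS json_value
FROM raw_json,
     LATERAL FLATTEN(INPUT => payload, RECURSIVE => TRUE) f;  -- RECURSIVE walks nested objects and arrays
```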
UDFs allow you to create functions in SQL or JavaScript. The test_udf.sql file runs through the creation and testing of a SQL and a JavaScript UDF; see the docs for more, and the sketch below.
Or use the CLI:
snowsql -c <your_connection_name> -f test_udf.sql
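As a rough guide, the two UDF flavours look like the sketch below; the function names and bodies are illustrative, not copied from test_udf.sql.

```sql
-- SQL UDF: a simple scalar expression.
CREATE OR REPLACE FUNCTION area_of_circle(radius FLOAT)
  RETURNS FLOAT
  AS 'pi() * radius * radius';

-- JavaScript UDF: argument names are exposed to JavaScript in upper case (D).
CREATE OR REPLACE FUNCTION js_factorial(d DOUBLE)
  RETURNS DOUBLE
  LANGUAGE JAVASCRIPT
  AS $$
    let result = 1;
    for (let i = 2; i <= D; i++) {
      result *= i;
    }
    return result;
  $$;

SELECT area_of_circle(2), js_factorial(5);
```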
The first_run_drop.sql file will drop all objects created by first_run.sql
Or use the CLI:
snowsql -c <your_connection_name> -f first_run_drop.sql
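The teardown follows the same pattern in reverse; a hedged sketch of the kind of statements involved (assumed, not copied from first_run_drop.sql) is:

```sql
-- Drop one group of objects; repeat for the TRANSFORM and REPORT objects.
DROP USER IF EXISTS USER_INGEST;
DROP ROLE IF EXISTS ROLE_INGEST;
DROP WAREHOUSE IF EXISTS WAREHOUSE_INGEST;
DROP DATABASE IF EXISTS RAW;
DROP DATABASE IF EXISTS ANALYTICS;
```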
If you want to do this more than once, the SnowSQL CLI is great.
git clone https://github.com/mattarderne/snowflake-starter.git
cd snowflake-starter
snowsql -c <your_connection_name> -f first_run.sql
If the following script runs without error, that is an end-to-end test; it should take about a minute. (Change the <placeholders> in the file first.)
sh tests/run.sh
SnowAlert is a project maintained by Snowflake that provides some useful system monitoring features. I like to use some of the queries they have created to monitor cost spikes.
The snowAlert.sql file creates the views and runs the queries necessary to get alerts. Running it daily in Dataform/dbt is a nice way to get custom alerts about unusual spikes.
snowsql -c <your_connection_name> -f utils/snowAlert.sql -o friendly=false -o quiet=true
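For a sense of what those cost checks look like, here is a hedged sketch of a daily credit-usage query against the ACCOUNT_USAGE share; it is representative of the idea, not copied from snowAlert.sql.

```sql
-- Daily credits per warehouse over the last 30 days
-- (requires access to the SNOWFLAKE.ACCOUNT_USAGE share).
SELECT
    warehouse_name,
    DATE_TRUNC('day', start_time) AS usage_day,
    SUM(credits_used)             AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY usage_day DESC, credits DESC;
```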
If you'd like to keep track of the evolution of your Snowflake Data Warehouse, snowflakeinspector is a great tool to do just that. Query your metadata and paste the results into their tool and you'll get a nice explorable visualisation as below:
- Thanks to Trevor for reviews, thoughts, and guidance; check out his code here
- Thanks for the inspiration Calogica.com and Fishtown Analytics. JSON tutorial here
- Read my writing on the topic of Data Systems at groupby1
- think about adding Snowsql CLI
- add some JSON to the permissions test
- add the Snowflake credits query pack 12
- think about snowflake-inspector inclusion github
- add a UDF
- compare TO ROLE role and TO role
- update first_run.sql to use Variables like these
- add something about masking semi-structured data with Snowflake
- think about permissions management with permifrost
- add some more automation to the testing
- add an option to pipe some data into snowflake
- add a quick deploy BI tool with dbt? example
- think about adding some kind of Query credit usage analysis and troubleshooting
- think about adding some over-permission analysis
- add IP whitelisting to script
- create a script to run and specify account name etc
- fine tune the warehouse specifications appropriately
- add some features to make sure this is compatible with RA_datawarehouse
- update the sql to match the mm style guide
- Add some data with the docker dataset
- Schema:
  - explore the "analytics" database for primary keys, analyse for similarity, unnamed primary keys, variables etc
  - make suggestions
- Costs:
  - explore query history and build a recommendation for query optimisation
  - visualise