- Snowflake is a
fully managed
cloud data platform - It is deployed by the vendor
- No access to the underlying OS or hardware
- No need for ongoing component adminstration
- Cloud: AWS, Azure, GCP
- Service
- Storage seperated from compute
- Multiple compute clusters("Virtual Warehouses")
- Virtual warehouses concurrently access data without connection
- Centralized management
- Metadata seperate from storage and compute
- Full transactional consistency across entire system.
- Data Warehouse
- Data Lake
- Data Processing
- Data Sharing
-
Core Services:
- Virtual Warehouses: Cluster of compute resources that executes database queries and commands
- Snowpipe
-
Tools:
- Web portal
- Command line interface - SnowSQL
- Connectors and drivers
-
Connectors:
- Python
- Spark
- Kafka
-
Drivers:
- Node.js
- Go
- .NET
- JDBC
- OBBC
-
3rd Party Applications:
- Power BI
- tableau
- salesforce
- looker
- Storage
- It is
Flat rate per terabyte
- Based on
Region Payment Plan
- Storage Charges:
Data + Time Travel + Fail-Safe
- Time Travel: Enables access to data that has been changed or deleted at any point within a defined period
- Fail-Safe: Recovery from a catastrophic incident like storage failure, corruption or security breach, It is last resource measure
- Continuous Data Protection: Active Data(Data is active until it is changes) -> Time Travel (Records stay here for the duration of the Time Travel period(1-90 Days)) -> Automatice Fail-Safe (Records stay here for another 7 days after Time Travel Expiration)
- It is
- Compute Resources:
- Number and size of virtual warehouses
- Serverless features:
Snowpipe
,Background table mainternance
,Database replication
- Cloud services:
Authentication and Authorization
,Infrastructure management
,Query parsing and optimization
- Data transfer
Unit to measure to pay for compute resources
-
Standard:
- SQL data warehouse engine
- Encryption in transit and at rest
- All system tools available
- One day of time-travel
-
Enterprise
- Standard+
- Materialized views, data masking, point-lookup optimization
- Multi-Cluster warehouses
- Periodic rekey of encryption
- Up to 90 days of Time-Travel
-
Business Critical
- Enterprise
- HIPAA and PCI support
- Customer-Managed keys
- AWS and Azure Private links
- Disaster recovery failover and failback
-
Virtual Private Snowflake
- Business Critical+
- Dedicated metadata store and pool of virtual servers
- Isolated environment from all other snowflake accounts
-
Massively Parallel Processing Database Engine
-
Decoupled Compute and Storage
-
Tight integration with cloud storage
-
Operations:
-
Executing DDL does not always require an active Virtual Warehouse
-
It has procedure for the automatic operations
-
It provides the Flexibility and Elasticity
-
Size
-
The actual monetary cost of a credit depends on the edition of your account
-
Multiple Virtual Warehouses
-
Multi-Cluster Warehouses
-
Table Storage
- Micro Partitions:
- Automatic partitioning scheme
- 50MB-500MB in size
- Compressed columnar storage
- Pruned during query execution
- Clusteting key
- Tables 1TB+
- Orderes the micro-partition records based on the key
- Automatically maintained by Snowflake
- Useful for range and equqlity predicates on the key
- Search Optimization Service
- Serverless feature
- Enterprise edition
- Table-level property
- Micro Partitions:
Not Null is enforced
- User Defined Functions
- External Functions
- Stored procedures
- Modules can be coded in SQL or JavaScript
-
Installation: https://docs.snowflake.com/en/user-guide/snowsql-install-config
-
Run command:
snowsql -a <account-name> -u <login-name>
-
Add the account-name, username and passord in the ./snowsql/config file
-
Run:
snowsql
-
Supported file formats
-
Options:
-
Best Practices
- Loading virtual warehouses
- 10 to 100 MB compressed
- Aggregate or split data to get this range
- Seperate the files by folders with source and date information
- Web portal
- Copy command
- Snowpipe
-
It is used for loading data from files into Snowflake tables and unloading data into files
-
Types: Internal: Storaged and managed by snowflake, External: s3, azure, gcp storage
-
Internal Stages: User Stage, Table Stage, Named Stage
-
Import Process
- Continuous data loading
- Severless feature
- Same file best practices
- Recommended 1 minute cadence
- Script
-
Export Process Script
- Access Control
-
Users can temporarily assume the permissions of any of their roles when necessary.
-
Role activation can be done through USE ROLE Command and Web Portal
-
Built-In Roles: ACCOUNTADMIN, SECURITYADMIN, SYSADMIN, USERADMI, PUBLIC
-
Securables,
-
Securbles: Virtual Warehouses, Users, Integrations
- Monitoring involves virtual warehouse and query usages.