The Semantic Model Generator
is used to generate a semantic model for use in your Snowflake account.
Your workflow should be:
- Setup to set credentials.
- Usage to create a model either through Python or command line.
- Post Generation to fill out the rest of your semantic model.
- Validating Your Final Semantic Model to ensure any changes you've made are valid.
- Admin App spin up the admin app for easier generation of your semantic model.
Or, if you want to see what a semantic model looks like, skip to Examples.
We currently leverage credentials saved as environment variables. Note, host
is optional depending on your Snowflake deployment.
A. To find your Account locator, please execute the following sql command in your account.
SELECT CURRENT_ACCOUNT_LOCATOR();
B. To find the SNOWFLAKE_HOST
for your account, follow these instructions.
- Currently we recommend you to look under the
Account locator (legacy)
method of connection for better compatibility on API. - It typically follows format of:
https://<accountlocator>.<region>.<cloud>.snowflakecomputing.com
- To set these on Mac OS/Linux:
export SNOWFLAKE_ROLE="<your-snowflake-role>"
export SNOWFLAKE_WAREHOUSE="<your-snowflake-warehouse>"
export SNOWFLAKE_USER="<your-snowflake-user>"
export SNOWFLAKE_PASSWORD="<your-snowflake-password>"
export SNOWFLAKE_ACCOUNT_LOCATOR="<your-snowflake-account-locator>"
export SNOWFLAKE_HOST="<your-snowflake-host>"
- To set these on Windows:
set SNOWFLAKE_ROLE=<your-snowflake-role>
set SNOWFLAKE_WAREHOUSE=<your-snowflake-warehouse>
set SNOWFLAKE_USER=<your-snowflake-user>
set SNOWFLAKE_PASSWORD=<your-snowflake-password>
set SNOWFLAKE_ACCOUNT_LOCATOR=<your-snowflake-account-locator>
set SNOWFLAKE_HOST=<your-snowflake-host>
- To set these within a Python environment:
import os
# Setting environment variables
os.environ['SNOWFLAKE_ROLE'] = '<your-snowflake-role>'
os.environ['SNOWFLAKE_WAREHOUSE'] = '<your-snowflake-warehouse>'
os.environ['SNOWFLAKE_USER'] = '<your-snowflake-user>'
os.environ['SNOWFLAKE_PASSWORD'] = '<your-snowflake-password>'
os.environ['SNOWFLAKE_ACCOUNT_LOCATOR']='<your-snowflake-account-locator>'
os.environ['SNOWFLAKE_HOST'] = '<your-snowflake-host>'
You may generate a semantic model for a given list of fully qualified tables following the {database}.{schema}.{table}
format. Each table in this list should be a physical table or a view present in your database.
If your snowflake tables and comments do not have comments, we currently leverages cortex LLM function to auto-generate description suggestions. Those generation are suffixed with '__' and additional comment to remind you to confirm/modity the descriptions.
All generated semantic models by default are saved either under semantic_model_generator/output_models
if running from the root of this project or the current directory you're in.
- Ensure you have installed the Python package. Note, the version below should be the latest version under the
dist/
directory.
pip install dist/*.whl
- Activate Python shell
python
- Generate a semantic model.
from semantic_model_generator.generate_model import generate_base_semantic_model_from_snowflake
BASE_TABLES = ['<your-database-name-1>.<your-schema-name-1>.<your-base-table-or-view-name-1>','<your-database-name-2>.<your-schema-name-2>.<your-base-table-or-view-name-2>']
SNOWFLAKE_ACCOUNT = "<your-snowflake-account>"
SEMANTIC_MODEL_NAME = "<a-meaningful-semantic-model-name>"
generate_base_semantic_model_from_snowflake(
base_tables=BASE_TABLES,
snowflake_account=SNOWFLAKE_ACCOUNT,
semantic_model_name=SEMANTIC_MODEL_NAME,
)
Unlike the Python route above, using the CLI assumes that you will manage your environment using poetry
and pyenv
for Python versions.
This has only been tested on Mas OS/Linux.
- If you need brew,
make install-homebrew
. - If you need pyenv,
make install-pyenv
andmake install-python-3.8
. make setup
Make setup will install poetry if needed.
This is the script version run on the command line.
poetry shell
. This will activate your virtual environment.- Run on your command line.
python -m semantic_model_generator.generate_model \
--base_tables "['<your-database-name-1>.<your-schema-name-1>.<your-base-table-or-view-name-1>','<your-database-name-2>.<your-schema-name-2>.<your-base-table-or-view-name-2>']" \
--semantic_model_name "<a-meaningful-semantic-model-name>" \
--snowflake_account="<your-snowflake-account>"
Important: After generation, your YAML files will have a series of lines with # <FILL-OUT>
. Please take the time to fill these out with your business context.
By default, the generated semantic model will contain all columns from the provided tables/views. However, it's highly encouraged to only keep relevant columns and drop any unwanted columns from the generated semantic model.
In addition, consider adding the following elements to your semantic model:
- Logical columns for a given table/view that are expressions over physical columns.
- Example:
col1 - col2
could be theexpr
for a logical column.
- Example:
- Synonyms. Any additional synonyms for column names.
- Filters. Additional filters with their relevant
expr
.
After you've edited your semantic model, you can validate this file before uploading.
- Using Python. Ensure you've installed the package.
from semantic_model_generator.validate_model import validate_from_local_path
YAML_PATH="/path/to/your/model_yaml.yaml"
SNOWFLAKE_ACCOUNT="<your-snowflake-account>"
validate_from_local_path(
yaml_path=YAML_PATH,
snowflake_account=SNOWFLAKE_ACCOUNT
)
- Using the command line. Ensure
poetry shell
is activated.
python -m semantic_model_generator.validate_model \
--yaml_path="/path/to/your/model_yaml.yaml \
--snowflake_account="<your-account-name>"
- Ensure you have installed the Python package.
pip install dist/*.whl
- Start the admin streamlit app on your local.
make run_admin_app
If you have an example table in your account with the following DDL statements.
CREATE TABLE sales.public.sd_data (
id SERIAL PRIMARY KEY,
dt DATETIME,
cat VARCHAR(255),
loc VARCHAR(255),
cntry VARCHAR(255)
chn VARCHAR(50),
amt DECIMAL(10, 2),
unts INT,
cst DECIMAL(10, 2)
);
Here is an example semantic model, with data elements automatically generated from this repo and filled out by a user.
# Name and description of the semantic model.
name: Sales Data
description: This semantic model can be used for asking questions over the sales data.
# A semantic model can contain one or more tables.
tables:
# A logical table on top of the 'sd_data' base table.
- name: sales_data
description: A logical table capturing daily sales information across different store locations and product categories.
# The fully qualified name of the base table.
base_table:
database: sales
schema: public
table: sd_data
# Dimension columns in the logical table.
dimensions:
- name: product_category
synonyms:
- "item_category"
- "product_type"
description: The category of the product sold.
expr: cat
data_type: NUMBER
unique: false
sample_values:
- "501"
- "544"
- name: store_country
description: The country where the sale took place.
expr: cntry
data_type: TEXT
unique: false
sample_values:
- "USA"
- "GBR"
- name: sales_channel
synonyms:
- "channel"
- "distribution_channel"
description: The channel through which the sale was made.
expr: chn
data_type: TEXT
unique: false
sample_values:
- "FB"
- "GOOGLE"
# Time dimension columns in the logical table.
time_dimensions:
- name: sale_timestamp
synonyms:
- "time_of_sale"
- "transaction_time"
description: The time when the sale occurred. In UTC.
expr: dt
data_type: TIMESTAMP
unique: false
# Measure columns in the logical table.
measures:
- name: sales_amount
synonyms:
- "revenue"
- "total_sales"
description: The total amount of money generated from the sale.
expr: amt
data_type: NUMBER
default_aggregation: sum
- name: sales_tax
description: The sales tax paid for this sale.
expr: amt * 0.0975
data_type: NUMBER
default_aggregation: sum
- name: units_sold
synonyms:
- "quantity_sold"
- "number_of_units"
description: The number of units sold in the transaction.
expr: unts
data_type: NUMBER
default_aggregation: sum
- name: cost
description: The cost of the product sold.
expr: cst
data_type: NUMBER
default_aggregation: sum
- name: profit
synonyms:
- "earnings"
- "net income"
description: The profit generated from a sale.
expr: amt - cst
data_type: NUMBER
default_aggregation: sum
# A table can define commonly used filters over it. These filters can then be referenced in user questions directly.
filters:
- name: north_america
synonyms:
- "North America"
- "N.A."
- "NA"
description: "A filter to restrict only to north american countries"
expr: cntry IN ('canada', 'mexico', 'usa')
In order to push a new build and release, follow the steps below. Note, only admins are allowed to push release/v
tags.
You should follow the setup commands from usage-cli to install poetry and create your environment.
- Checkout a new branch from main. You should name this branch
release/vYYYY-MM-DD
. - Bump the poetry:
poetry version patch
poetry version minor
poetry version major
- Update the
CHANGELOG.md
adding a relevant header for your version number along with a description of the changes made. - Run
make build
to create a new .whl file. - Push your files for approval.
- After approval, run
make release
which will cut a new release and attach the .whl file. - Merge in your pr.
- Note: If you
make release
does not trigger the GH action. Please delete the tag and push again.