Learn Snowpark is a project for building proficiency with Snowpark, Snowflake's framework for querying and processing Snowflake data from programming languages such as Python. The project is based on the official Snowflake-Labs template 'snowpark-python-template' (https://github.com/Snowflake-Labs/snowpark-python-template).
The first task in this project was uploading the classic Iris dataset, a staple of machine learning, to Snowflake for analysis. A series of tests and analyses followed, including:
- EDA (Exploratory Data Analysis): Exploratory analysis was performed with ydata_profiling, sweetviz, and dtale to understand the dataset's characteristics and surface patterns and trends. This step prepared the data for further analysis and decision-making.
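As an illustration, the kind of baseline summary those profiling tools automate can be sketched with plain pandas. The rows below are a tiny hand-typed slice of the Iris data for illustration only; the project's eda.py may be organized differently, and the commented ProfileReport call assumes ydata-profiling is installed.

```python
import pandas as pd

# A few sample rows of the Iris dataset (values typed in for illustration).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.3],
    "sepal_width":  [3.5, 3.0, 3.2, 3.3],
    "species": ["setosa", "setosa", "versicolor", "virginica"],
})

print(df.describe())                 # numeric summary statistics
print(df["species"].value_counts())  # class balance

# With ydata-profiling installed, a full interactive HTML report is one call:
# from ydata_profiling import ProfileReport
# ProfileReport(df, title="Iris EDA").to_file("iris_report.html")
```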
I developed the JSON connector so that environment variables are not required. Simply create a connection.json file in the project root with the following format:
{
"account" : "<replace with your account identifier>",
"user" : "<replace with your username>",
"password" : "<replace with your password>",
"role" : "<replace with your role>",
"warehouse" : "<replace with your warehouse>",
"database" : "<replace with your database>",
"schema" : "<replace with your schema>"
}
Once the file exists, use the get_json_config function found in src/util/local.py.
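A minimal sketch of how such a helper can load the file (a hypothetical re-implementation for illustration; the real get_json_config in src/util/local.py may differ):

```python
import json
from pathlib import Path

def get_json_config(path: str = "connection.json") -> dict:
    """Load Snowflake connection parameters from a JSON file."""
    return json.loads(Path(path).read_text())

# The returned dict can be passed straight to the Snowpark Session builder:
# from snowflake.snowpark import Session
# session = Session.builder.configs(get_json_config()).create()
```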
Alternatively, you can use the template's original connection mode: set the following environment variables with your Snowflake account information.
# Linux/MacOS
export SNOWSQL_ACCOUNT=<replace with your account identifier>
export SNOWSQL_USER=<replace with your username>
export SNOWSQL_ROLE=<replace with your role>
export SNOWSQL_PWD=<replace with your password>
export SNOWSQL_DATABASE=<replace with your database>
export SNOWSQL_SCHEMA=<replace with your schema>
export SNOWSQL_WAREHOUSE=<replace with your warehouse>
# Windows/PowerShell
$env:SNOWSQL_ACCOUNT = "<replace with your account identifier>"
$env:SNOWSQL_USER = "<replace with your username>"
$env:SNOWSQL_ROLE = "<replace with your role>"
$env:SNOWSQL_PWD = "<replace with your password>"
$env:SNOWSQL_DATABASE = "<replace with your database>"
$env:SNOWSQL_SCHEMA = "<replace with your schema>"
$env:SNOWSQL_WAREHOUSE = "<replace with your warehouse>"
Optional: You can set these environment variables permanently by editing your shell profile (on Linux/macOS) or using the System Properties menu (on Windows).
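For reference, the SNOWSQL_* variables map onto Session parameters roughly like this. This is a sketch under assumptions, not the template's actual helper, which may read the variables differently:

```python
import os

def get_env_config() -> dict:
    """Collect Snowflake connection parameters from SNOWSQL_* env vars.
    Raises KeyError if any variable is unset."""
    return {
        "account": os.environ["SNOWSQL_ACCOUNT"],
        "user": os.environ["SNOWSQL_USER"],
        "password": os.environ["SNOWSQL_PWD"],
        "role": os.environ["SNOWSQL_ROLE"],
        "warehouse": os.environ["SNOWSQL_WAREHOUSE"],
        "database": os.environ["SNOWSQL_DATABASE"],
        "schema": os.environ["SNOWSQL_SCHEMA"],
    }

# As with the JSON file, the dict feeds Session.builder.configs(...).
```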
Create and activate a conda environment using Anaconda:
conda env create --file environment.yml
conda activate snowpark
In VS Code, press Ctrl+Shift+P to open the command palette, then select Python: Select Interpreter and choose the snowpark interpreter under the Conda list.
In PyCharm, go to File > Settings > Project > Python Interpreter and select the snowpark interpreter.
- A Snowflake account
- Python 3.8 or greater
- An IDE or code editor (VS Code, PyCharm, etc.)
- Iris dataset: https://archive.ics.uci.edu/dataset/53/iris
Once you've set your credentials and installed the packages, you can test your connection to Snowflake by executing the stored procedure in app.py:
python src/app.py [connection.json]
You should see the following output:
------------------------------------------------------
|"HELLO_WORLD" |
------------------------------------------------------
|Welcome to Learn Snowpark! |
|Learn more: https://github.com/agr17/learn-snow... |
------------------------------------------------------
python src/sizes.py [connection.json]
python src/eda.py <ydata | dtale | sweetviz> [connection.json]
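The command line above suggests a simple dispatch on the first argument. A hypothetical sketch of that argument handling (the real eda.py may be structured differently):

```python
import sys

TOOLS = ("ydata", "dtale", "sweetviz")

def parse_args(argv):
    """Return (tool, config_path) from eda.py-style arguments."""
    if not argv or argv[0] not in TOOLS:
        raise SystemExit(f"usage: eda.py <{' | '.join(TOOLS)}> [connection.json]")
    config_path = argv[1] if len(argv) > 1 else None
    return argv[0], config_path

if __name__ == "__main__":
    tool, config = parse_args(sys.argv[1:])
```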
You can run the test suite locally from the project root:
python -m pytest
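As an example of the kind of check that runs without a live Snowflake connection, here is a hypothetical test (the file name tests/test_config.py and the test itself are assumptions, not part of the actual suite):

```python
import json
import tempfile
from pathlib import Path

# Keys the connection.json format described above requires.
REQUIRED_KEYS = {"account", "user", "password", "role",
                 "warehouse", "database", "schema"}

def test_connection_json_has_required_keys():
    # Write a throwaway config and verify every required key survives a round trip.
    sample = {k: "placeholder" for k in REQUIRED_KEYS}
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "connection.json"
        path.write_text(json.dumps(sample))
        cfg = json.loads(path.read_text())
    assert REQUIRED_KEYS <= set(cfg)
```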
Have an idea for an improvement? Fork this repository and open a PR with your idea!