What is Santoku?
Santoku is a toolkit written in Python for interacting with AWS, Google Cloud platform, Salesforce and Slack.
The purpose of Santoku is to have the interactions with all the external services collected in a single package. The package contains wrappers around the respective APIs and high level methods for the most common patterns in order to simplify the interaction with those services, whether by being shorter to type, more descriptive, more specific to our needs or simply easier to read for developers.
Quickstart
Installation
If you have the wheel, you can install it with:
pip install --upgrade --force-reinstall dist/santoku-*.whl
Run the following command to install it from PyPI:
pip install santoku
Or use the following to install it via Poetry
poetry add santoku
How To Use It
You can use the package as follows:
from santoku.slack.slack_bot_handler import SlackBotHandler
slack_bot = SlackBotHandler.from_aws_secrets_manager("your_secret")
Content
The package santoku
contains several subpackages: aws
, google
, salesforce
, slack
, sql
. Each subpackage provides connection to different external services and are formed by a collection of modules, where each module consists of handlers for more specific services. Each handler class has unit tests to ensure the correct behaviour of the methods of these classes.
AWS
AWS (Amazon Web Services) is a cloud computing platform that provides a set of primitive abstract technical infrastructure and distributed computing building blocks and tools.
The connection to AWS has been done through the AWS SDK, which in Python is called boto3. We provide wrappers of the boto3
SDK to make easy the operations to interact with different services.
The use of this subpackage requires having AWS credentials somewhere. We provide flexibility to either keep credentials in AWS credentials/configuration file, set environment variables, or to pass them directly as arguments in the initializer of each handler class. More info on AWS configurations and credentials here.
The unit tests in this subpackage implement mocks to the AWS services and do not pretended to access or modify the environment of your real account. In order to have safer unit tests for the AWS subpackage, we use moto, a mocking library for most AWS services, which allows our methods to interact with a fully mocked version of the AWS environment via decorators while not needing an actual connection to the internet or a test AWS account.
Amazon S3
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers scalability, data availability, security, and performance.
We provide methods to easily list and delete objects inside buckets; read and write content within S3 objects; upload a dataframe into csv or parket format to a specific location; generate and upload an Amazon Quicksight manifest in S3 in order to create analysis in Amazon Quicksight, and so on.
An object can be uploaded to S3 with the following:
from santoku.aws.s3_handler import S3Handler
s3_handler = S3Handler()
s3_handler.put_object(bucket="your_bucket_name", object_key="your_object_key", content="Your object content.")
AWS Secrets Manager
AWS Secrets Manager protects secrets needed to access applications, services, and IT resources. The service allows rotating, managing, and retrieving credentials, keys, and other secrets.
We provide methods to get the content of a previously created secret.
from santoku.aws.secrets_manager_handler import SecretsManagerHandler
secrets_manager = SecretsManagerHandler()
secret_content = secrets_manager.get_secret_value(secret_name="your_secret_name")
We use this service as our default credential manager. Most classes that require some form of authentication in santoku are provided with alternative class methods that retrieve the credentials directly from Secrets Manager. For example, instead of directly providing credentials to the BigQuery handling class, we simply provide it with the name of the secret where they are stored:
from santoku.google.bigquery import BigQueryHandler
bigquery_handler = BigQueryHandler(
type="your_type",
project_id="your_project_id"
private_key_id="your_private_key_id"
private_key="your_private_key"
client_email="your_client_email"
client_id="your_client_id"
auth_uri="your_auth_uri"
token_uri="your_token_uri"
auth_provider_x509_cert_url="your_auth_provider_x509_cert_url"
client_x509_cert_url="your_client_x509_cert_url"
)
or
bigquery_handler = BigQueryHandler.from_aws_secrets_manager(
secret_name="your_secret_name"
)
Amazon Simple Queue Service
Amazon Simple Queue Service (SQS) is a fully managed message queuing service that supports programmatic sending of messages via web service applications as a way to communicate over the Internet.
We provide methods to receive, delete, and send single or a batch of messages.
from santoku.aws.sqs_handler import SQSHandler
sqs_handler = SQSHandler()
entries = [
{
"Id": "Id1",
"MessageBody": "Your message 1",
},
{
"Id": "Id2",
"MessageBody": "Your message 1",
}
]
sqs_handler.send_message_batch(queue_name="your_queue_name", entries=entries)
Google Cloud Platform
Google Cloud Platform a suite of cloud computing services provided by Google that runs on the same Cloud infrastructure that Google uses internally for its end-user products.
The connection to Google Cloud Platform has been done using the google-cloud-core package.
The use of this subpackage requires having Google Cloud Platform credentials (in this case, a service account for programmatic access), these can be passed as arguments in the initializer of the handler class directly, or you can store them in AWS Secrets Manager and retrieve them during the initialization using the class method instead.
We provide a handler that allows doing queries on BigQuery services:
query_results = bigquery_handler.get_query_results(query="SELECT * FROM `your_table`")
Salesforce
Salesforce is a Customer Relationship Management (CRM) platform that gives to the marketing, sales, commerce, and service depertments a single, shared view of every customer.
The connection to Salesforce has been done using the Salesforce REST API.
The use of this subpackage requires having Salesforce credentials, these can be passed as arguments in the initializer of the handler class directly, or you can store them in AWS Secrets Manager and retrieve them during the initialization using the class method instead.
This subpackage provide methods to insert/modify/delete salesforce object records. You can perform operations by doing HTTP requests directly or using methods with higher level of abstraction, which are easier to handle. The lasts ones are just wrappers of the HTTP request method. To obtain records you can perform queries using SOQL.
The unit tests require valid Salesforce credentials to be executed. The tests are implemented in the way that no new data will remain in the account and no existent data will be modified. However, having Salesforce credentials for sandbox use is recommended.
You can use the package to perform a request as follows.
from santoku.salesforce.objects_handler import ObjectsHandler
objects_handler = ObjectsHandler(
auth_url="your_auth_url",
username="your_username",
password="your_password",
client_id="your_client_id",
client_secret="your_client_secret",
)
contact_payload = {"FirstName": "Alice", "LastName": "Ackerman", "Email": "alice@example.com"}
objects_handler.do_request(method="POST", path="sobjects/Contact", payload=contact_payload)
or insert a record with
objects_handler.insert_record(sobject="Contact", payload=contact_payload)
Finally, you can do a SOQL query with:
records = objects_handler.do_query_with_SOQL("SELECT Id, Name from Contact")
Slack
Slack is a proprietary business communication platform. A Slack Bot is a nifty way to run code and automate tasks. In Slack, a bot is controlled programmatically via a bot user token that can access one or more of Slack’s APIs.
The connection to Slack has been done using the Slack Web API.
The use of this subpackage requires having Slack API Token of a Slack Bot, which can be passed as argument in the initializer of the handler class directly, or you can store it in AWS Secrets Manager and retrieve it during the initialization using the class method instead.
This subpackage provide methods to send messages to a channel. A message can be sent with:
from santoku.slack.slack_bot_handler import SlackBotHandler
slack_bot_handler = SlackBotHandler(api_token="your_api_token")
slack_bot_handler.send_message(channel="your_chanel_name", message="Your message.")
SQL
SQL (Structured Query Language) is a domain-specific language designed for managing data held in a relational database management system (RDBMS). The purpose of this subpackage is to provide connection to different RDBMSs.
MySQL
MySQL is an open-source RDBMS. The connection to MySQL has been done using the MySQL Connector for python.
The use of this subpackage requires having MySQL authentication parameters, which can be passed as argument in the initializer of the handler class directly, or you can store it in AWS Secrets Manager and retrieve it during the initialization using the class method instead.
This subpackage provides methods to do queries and retrieve the results in different forms.
from santoku.sql.mysql_handler import MySQLHandler
mysql_handler = MySQLHandler(user="your_user", password="your_password", host="your_host", database="your_database")
mysql_handler.get_query_results(query="SELECT * FROM your_table")
Development
Project
We use Poetry to handle this project. Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. More detais in their documentation.
Poetry is already included in the development environment.
Dependencies
If you want to add dependencies to your project, you can specify them in the tool.poetry.dependencies
section of the pyproject.toml
file. Also, instead of modifying the pyproject.toml
file by hand, you can use the add command:
poetry add <package_name>
Poetry will automatically find a suitable version constraint and install the package and subdependencies.
Note: Remember to commit changes of poetry.lock
and pyproject.toml
files after adding a new dependency.
Note: You can find more details on how to handle versions of packages here.
Environment
We provide a development environment that uses the Visual Studio Code Remote - Containers extension. This extension lets you use a Docker container in order to have a consistent and easily reproducible development environment.
The files needed to build the container are located in the .devcontainer
directory:
devcontainer.json
contains a set of configurations, tells VSCode how to access the container and which extensions it should install.Dockerfile
defines instructions for the building of the container image.
More info here.
Environment Variables
The containerized environment will automatically set as environment variables the your variables stored in a .env
file at the top level of the repository. For example, this is required for certain tests that require credentials, which are (of course) not versioned. Be aware that:
- The Docker image building process will fail if you do not include a
.env
file at the top level of the repository. - If you change the contents of the
.env
file you will need to rebuild the container for the changes to take effect within the environment.
Sharing Git credentials with your container
The containerized environment will automatically forward your local SSH agent if one is running. More info here. It works for Windows and Linux.
Creating a PR
Create a pull request (PR) to propose and collaborate on changes to the project. These changes MUST BE proposed in a branch, which ensures that the main branch only contains finished and approved work.
Be sure to run tests locally before commiting your changes.
Running tests
The tests are implemented with pytest and there are unit tests for each of the handler modules. Tests in the aws
subpackage implement mocks to S3 and do not require real credentials, however, the remaining tests in other subpackages do.
To run the tests just execute pytest
if you are already inside Poetry virtual environment or poetry run pytest
.
Moreover, when a PR is created a GitHub Actions CI pipeline (see .github/workflows/ci.yml
) is executed. This pipeline is in charge of running tests.
Release
Wheel is automatically created and uploaded to PyPI by the GitHub Actions CD pipeline (see .github/workflows/cd.yml
) when the PR is merged in main branch.
Why Santoku?
From Wikipedia:
The Santoku bōchō (Japanese: 三徳包丁; "three virtues" or "three uses") or Bunka bōchō (文化包丁) is a general-purpose kitchen knife originating in Japan. Its blade is typically between 13 and 20 cm (5 and 8 in) long, and has a flat edge and a sheepsfoot blade that curves down an angle approaching 60 degrees at the point. The term Santoku may refer to the wide variety of ingredients that the knife can handle: meat, fish and vegetables, or to the tasks it can perform: slicing, chopping and dicing, either interpretation indicating a multi-use, general-purpose kitchen knife.