An example of how to build an ML training workflow on AWS with Prefect.
Blog (Japanese): https://developers.cyberagent.co.jp/blog/archives/38253/
| | Production | Develop |
|---|---|---|
| Agent | ECS Agent | Docker Agent |
| Storage | Docker Storage | Docker Storage |
| Executor | DaskExecutor | LocalDaskExecutor |
| Schedule | Every 1 hour | - |
| Notification | Slack | - |
The following simple ML model training pipeline is implemented:
Extract Raw Data -> Preprocess Data -> Train Model -> Validate Model -> Upload Trained Model
- Data: avazu-ctr-prediction dataset
- Preprocess: Feature Hashing
- Model: SGDClassifier
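To illustrate the preprocessing step, here is a minimal, standard-library-only sketch of feature hashing (the hashing trick). It is not the repository's implementation: the function name `hash_features`, the use of MD5 as a stable hash, the vector size, and the sample row are all assumptions made for illustration. In practice a library class such as scikit-learn's `FeatureHasher` would likely be used instead.

```python
import hashlib


def hash_features(row, n_features=2**18):
    """Map a dict of categorical fields to a sparse {index: count} vector."""
    vector = {}
    for field, value in row.items():
        token = f"{field}={value}".encode("utf-8")
        # MD5 serves only as a stable hash here; Python's built-in hash()
        # is salted per process and would give different indices each run.
        digest = hashlib.md5(token).digest()
        index = int.from_bytes(digest[:8], "big") % n_features
        vector[index] = vector.get(index, 0) + 1
    return vector


# Hypothetical row; field names and values are illustrative only.
row = {"site_id": "1fbe01fe", "device_type": "1", "banner_pos": "0"}
sparse = hash_features(row, n_features=1024)
```

Because every `field=value` pair lands in a fixed-size index space, high-cardinality categorical columns need no vocabulary, at the cost of occasional collisions.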
Run the Docker Agent in your local environment.
$ prefect agent docker start --label develop
[2022-10-12 10:24:09,492] INFO - agent | Registering agent...
[2022-10-12 10:24:09,736] INFO - agent | Registration successful!
 ____            __           _        _                    _
|  _ \ _ __ ___ / _| ___  ___| |_     / \   __ _  ___ _ __ | |_
| |_) | '__/ _ \ |_ / _ \/ __| __|   / _ \ / _` |/ _ \ '_ \| __|
|  __/| | |  __/  _|  __/ (__| |_   / ___ \ (_| |  __/ | | | |_
|_|   |_|  \___|_|  \___|\___|\__| /_/   \_\__, |\___|_| |_|\__|
                                           |___/
[2022-10-12 10:24:09,951] INFO - agent | Starting DockerAgent with labels ['develop']
[2022-10-12 10:24:09,951] INFO - agent | Agent documentation can be found at https://docs.prefect.io/orchestration/
[2022-10-12 10:24:09,951] INFO - agent | Waiting for flow runs...
Run the ECS Agent on an ECS Service.
You can use Terraform to set up the ECS Service that hosts the ECS Agent.
The following steps create the required infrastructure.
- Change variable values in infra/secrets.tfvars
access_key = "*****" # AWS Access Key
secret_key = "*****" # AWS Secret Key
region = "*****" # AWS Region
prefect_api_key = "*****" # Prefect Cloud API Key
- Run Terraform
cd infra
terraform init
terraform apply -var-file=vars.tfvars
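Before applying, it can help to confirm that no masked placeholder is left in the variables file. The following is a minimal sketch, not part of the repository: `find_unset_variables` and `REQUIRED_KEYS` are hypothetical names, and the parser only handles simple `key = "value"` lines like the ones listed above.

```python
import re

# Variable names taken from the tfvars listing above.
REQUIRED_KEYS = ("access_key", "secret_key", "region", "prefect_api_key")
ASSIGNMENT = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"')


def find_unset_variables(tfvars_text):
    """Return required keys that are missing, empty, or still masked with '*'."""
    values = {}
    for line in tfvars_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or set(values[key]) == {"*"}
    ]


# Illustrative input: one placeholder left in place, one key missing.
sample = '''
access_key = "*****" # AWS Access Key
secret_key = "abc123"
region = "ap-northeast-1"
'''
unset = find_unset_variables(sample)
```

For a real check, pass the contents of your tfvars file instead of `sample`; an empty list means every required variable has a non-placeholder value.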
- Change the AWS resource settings in config.toml to match your environment.
- Log in to Prefect Cloud with your API key.
$ prefect auth login -k pcs_*****
Logged in to Prefect Cloud tenant "*****'s Account" (XXXXXXX-s-account)
- Set PREFECT__USER_CONFIG_PATH to the path of config.toml so that Prefect reads it.
$ export PREFECT__USER_CONFIG_PATH="$PWD/config.toml"
- Log in to the AWS ECR registry so that Docker Storage can push the flow image.
$ aws ecr get-login-password --region ap-northeast-1 | docker login --username AWS --password-stdin *****.dkr.ecr.ap-northeast-1.amazonaws.com
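The two environment-dependent pieces in the steps above, the config path and the ECR registry host, can be checked with a small preflight script before registering. This is a minimal sketch under stated assumptions: `ecr_registry_host` and `config_path_problem` are hypothetical helper names that do not exist in the repository, and the account ID is a placeholder.

```python
import os
from pathlib import Path


def ecr_registry_host(account_id, region):
    """Build the private ECR registry hostname passed to `docker login`."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def config_path_problem(environ=os.environ):
    """Return a message if PREFECT__USER_CONFIG_PATH is unusable, else None."""
    raw = environ.get("PREFECT__USER_CONFIG_PATH")
    if not raw:
        return "PREFECT__USER_CONFIG_PATH is not set"
    if not Path(raw).is_file():
        return f"{raw} does not exist or is not a file"
    return None


# 123456789012 is a placeholder account ID, not a real one.
host = ecr_registry_host("123456789012", "ap-northeast-1")
# An empty mapping simulates a shell where the variable was never exported.
problem = config_path_problem({})
```

Calling `config_path_problem()` with no argument inspects the real environment, which is the form to use in an actual shell session.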
Register the flow with Prefect Cloud.
$ prefect register --project develop -p ml_workflow.py
Collecting flows...
Processing 'ml_workflow.py':
Building `Docker` storage...
[2022-10-14 11:10:30+0900] INFO - prefect.Docker | Building the flow's Docker storage...
Step 1/15 : FROM python:3.8.6-slim
....
Successfully tagged *****.dkr.ecr.ap-northeast-1.amazonaws.com/prefect_introduction/prod-prefect-flow:latest
[2022-10-13 23:17:33+0900] INFO - prefect.Docker | Pushing image to the registry...
Pushing [==================================================>] 590.2MB/578MB
Registering 'ml-workflow'... Done
└── ID: 519d3844-087a-44ef-8432-9804f06df1c5
└── Version: 1
======================== 1 registered ========================