openclimatefix/ocf-infrastructure

AWS MWAA

peterdudfield opened this issue · 13 comments

Should we start using Apache Airflow to run and trigger our services?

AWS MWAA - https://aws.amazon.com/managed-workflows-for-apache-airflow/pricing/

Things we want from it:

  • a UI to easily see what's going on
  • an easy way to re-run tasks
  • version control

Other benefits of using an airflow instance:

  • Add dependency checks (i.e. triggering a job based on the successful completion of another job)
  • Monitoring and alerting about DAG failures
  • Centralised solution for viewing all pipelines
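The dependency-check and failure-alerting benefits above boil down to two rules: a downstream job only runs if its upstream jobs succeeded, and a failure fires an alert. In Airflow these are handled by trigger rules (the default is `all_success`) and callbacks such as `on_failure_callback`. The sketch below is only a stdlib toy model of that behaviour (Python 3.9+ for `graphlib`), not Airflow's implementation; the task names mirror the services in this thread, and the failure and the alert are simulated.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Set[str]],
    on_failure: Callable[[str, Exception], None],
) -> Dict[str, str]:
    """Run tasks in dependency order and return the final state of each task.

    Toy version of Airflow's default "all_success" trigger rule: a task runs
    only if all of its upstream tasks succeeded, otherwise it is marked
    "upstream_failed". Every name in `upstream` must also be a key of `tasks`.
    """
    graph = {name: upstream.get(name, set()) for name in tasks}
    states: Dict[str, str] = {}
    # static_order() yields each task only after all of its upstream tasks
    for name in TopologicalSorter(graph).static_order():
        if any(states[parent] != "success" for parent in graph[name]):
            states[name] = "upstream_failed"
            continue
        try:
            tasks[name]()
            states[name] = "success"
        except Exception as exc:
            states[name] = "failed"
            on_failure(name, exc)  # stand-in for a real alert (Slack, email, ...)
    return states


def gsp_consumer() -> None:
    raise RuntimeError("GSP API unavailable")  # simulated failure


alerts: List[str] = []
states = run_pipeline(
    tasks={
        "pv-consumer": lambda: None,
        "gsp-consumer": gsp_consumer,
        "forecast-national": lambda: None,
    },
    upstream={"forecast-national": {"gsp-consumer"}},
    on_failure=lambda name, exc: alerts.append(f"{name} failed: {exc}"),
)
```

Here `forecast-national` ends up `upstream_failed` and a single alert is recorded for `gsp-consumer`, while the independent `pv-consumer` still succeeds.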

Suggested steps forward:

  1. Sol spends a day playing with it to see what can be done
  2. Try to estimate how much time it would take to move Nowcasting / Sites over to this
  3. Go / no go
  4. Move over to this system

Managed to kick off a task on ECS using:

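    from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

    # `cluster` holds the name (or ARN) of the ECS cluster, defined elsewhere in the DAG file
    gsp_consumer = EcsRunTaskOperator(
        task_id='gsp-consumer',
        task_definition="gsp",  # ECS task definition family to run
        cluster=cluster,
        overrides={},
        launch_type="FARGATE",
        # Fargate tasks use awsvpc networking, so subnets and security groups must be given
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-XXX"],
                "securityGroups": ["sg-XXX"],
                "assignPublicIp": "DISABLED",
            },
        },
    )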
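Every ECS task launched this way needs the same Fargate launch type and VPC settings, so the network block tends to be copied into each operator. A small helper can build the shared keyword arguments once. This is a sketch only: `ecs_fargate_kwargs` is a hypothetical helper (not part of Airflow), and the cluster name `my-cluster` is a placeholder.

```python
from typing import Any, Dict, List


def ecs_fargate_kwargs(
    task_id: str,
    task_definition: str,
    cluster: str,
    subnets: List[str],
    security_groups: List[str],
    assign_public_ip: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for one EcsRunTaskOperator."""
    return {
        "task_id": task_id,
        "task_definition": task_definition,
        "cluster": cluster,
        "overrides": {},
        "launch_type": "FARGATE",
        "network_configuration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                # ECS expects the strings "ENABLED" / "DISABLED", not booleans
                "assignPublicIp": "ENABLED" if assign_public_ip else "DISABLED",
            },
        },
    }


# Placeholder IDs, as in the snippet above
kwargs = ecs_fargate_kwargs("gsp-consumer", "gsp", "my-cluster", ["subnet-XXX"], ["sg-XXX"])
```

Inside a DAG file this would be used as `gsp_consumer = EcsRunTaskOperator(**ecs_fargate_kwargs(...))`, so each task only states what differs. Note that a Fargate task in a private subnet with `assignPublicIp` set to `DISABLED` generally needs a NAT gateway or VPC endpoints to pull its container image.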

Did something a bit more involved with:

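from airflow import DAG
from airflow.operators.latest_only import LatestOnlyOperator
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

# `cluster` and `default_args` are defined elsewhere in the DAG file.
# The DAG is scheduled every 5 minutes ("*/5 * * * *").
with DAG('general_data', schedule_interval="*/5 * * * *", default_args=default_args) as dag4:
    # Skips all downstream tasks unless this is the most recent scheduled run
    latest_only = LatestOnlyOperator(task_id="latest_only")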

    pv_consumer = EcsRunTaskOperator(
        task_id='pv-consumer',
        task_definition="pv",
        cluster=cluster,
        overrides={},
        launch_type="FARGATE",
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0c3a5f26667adb0c1"],
                "securityGroups": ["sg-05ef23a462a0932d9"],
                "assignPublicIp": "ENABLED",
            },
        },
    )

    gsp_consumer = EcsRunTaskOperator(
        task_id='gsp-consumer',
        task_definition="gsp",
        cluster=cluster,
        overrides={},
        launch_type="FARGATE",
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0c3a5f26667adb0c1"],
                "securityGroups": ["sg-05ef23a462a0932d9"],
                "assignPublicIp": "ENABLED",
            },
        },
    )

    national_forecaster = EcsRunTaskOperator(
        task_id='forecast-national',
        task_definition="forecast_national",
        cluster=cluster,
        overrides={},
        launch_type="FARGATE",
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0c3a5f26667adb0c1"],
                "securityGroups": ["sg-05ef23a462a0932d9"],
                "assignPublicIp": "ENABLED",
            },
        },
    )

    # pv-consumer runs on its own; forecast-national only runs once gsp-consumer succeeds
    latest_only >> pv_consumer
    latest_only >> gsp_consumer >> national_forecaster
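Together, the `*/5 * * * *` schedule and `LatestOnlyOperator` mean that when Airflow has several runs to work through (for example when catching up on missed intervals), only the most recent run actually launches the ECS tasks; older runs skip their downstream tasks. Airflow never skips downstream tasks for manually triggered runs. The sketch below is a stdlib-only toy model of that time-window check, assuming fixed 5-minute data intervals and ignoring manual triggers; it is not Airflow's own code.

```python
from datetime import datetime, timedelta
from typing import Dict, List

INTERVAL = timedelta(minutes=5)  # assumed to match the "*/5 * * * *" schedule


def due_interval_starts(first_start: datetime, now: datetime) -> List[datetime]:
    """Start times of every 5-minute data interval that has fully elapsed by `now`."""
    starts: List[datetime] = []
    start = first_start
    while start + INTERVAL <= now:  # a scheduled run is created once its interval has ended
        starts.append(start)
        start += INTERVAL
    return starts


def is_latest_run(interval_start: datetime, now: datetime) -> bool:
    """Toy LatestOnly check: does `now` fall inside the following interval?"""
    left = interval_start + INTERVAL  # end of this run's interval
    right = left + INTERVAL           # end of the next interval
    return left < now <= right


first_start = datetime(2023, 6, 21, 10, 0)
now = datetime(2023, 6, 21, 10, 12)
runs: Dict[str, bool] = {
    start.strftime("%H:%M"): is_latest_run(start, now)
    for start in due_interval_starts(first_start, now)
}
print(runs)  # → {'10:00': False, '10:05': True}
```

At 10:12 both the 10:00 and 10:05 intervals have elapsed, but only the 10:05 run would go on to trigger `pv-consumer` and `gsp-consumer`.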

[Screenshot 2023-06-21 at 10:46:20]

When running locally, I found I could only run one task at a time, which was annoying when trying to trigger several different things off.
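One possible cause, assuming the local instance used Airflow's default SQLite metadata database, is the SequentialExecutor: SQLite only works with that executor, and it runs one task instance at a time. Running tasks in parallel locally needs, for example, the LocalExecutor backed by Postgres or MySQL. Airflow reads any config option from an environment variable named `AIRFLOW__{SECTION}__{KEY}`; the sketch below builds those overrides. The Postgres connection string is a hypothetical placeholder.

```python
from typing import Dict


def airflow_env_var(section: str, key: str) -> str:
    """Environment variable that overrides `key` in the [section] of airflow.cfg."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def local_executor_env(sql_alchemy_conn: str) -> Dict[str, str]:
    """Env overrides to run tasks in parallel with the LocalExecutor.

    The LocalExecutor needs a Postgres or MySQL metadata database;
    SQLite only works with the SequentialExecutor.
    """
    return {
        airflow_env_var("core", "executor"): "LocalExecutor",
        # [database] sql_alchemy_conn is the Airflow 2.3+ location of this option
        airflow_env_var("database", "sql_alchemy_conn"): sql_alchemy_conn,
    }


# Hypothetical connection string for a local Postgres instance
env = local_executor_env("postgresql+psycopg2://airflow:airflow@localhost:5432/airflow")
```

These variables would be exported in the shell (or set in the container environment) before starting the scheduler and webserver.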

Tried spinning up MWAA on AWS (see notes).

Looking into MWAA costs, it comes to ~400 a month, so I will now look into running Airflow on Elastic Beanstalk on a small t3 machine. We only need a small machine, as Airflow would just be triggering tasks on AWS.

This could be a way to run it on ECS

Here is how to run Airflow using Docker; this might be useful if running on Elastic Beanstalk.

Update

  • spun up EKS (medium difficulty)

  • made a node on EC2 and a job to run gspconsumer

  • used k9s to view the cluster

  • used kubectl to connect to the cluster

  • running an (empty) EKS cluster could cost about $70 a month, whereas it would cost nothing on GCP. We could migrate, but it might be a hassle

  • briefly looked at Prefect and thought we might be able to run it on ELB/EC2
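The ~$70 figure for an empty EKS cluster is consistent with EKS's flat control-plane fee, assumed here to be $0.10 per cluster per hour (standard support pricing), over the usual ~730-hour billing month. The sketch below does that arithmetic; EC2 worker nodes are billed on top and can be costed with the same function by passing their own hourly rate and count.

```python
HOURS_PER_MONTH = 365 * 24 / 12  # 730 hours, the usual monthly approximation


def monthly_cost(hourly_rate_usd: float, count: int = 1) -> float:
    """Approximate monthly cost of `count` resources billed at an hourly rate."""
    return round(hourly_rate_usd * count * HOURS_PER_MONTH, 2)


# Assumption: EKS control-plane fee of $0.10/hour per cluster
eks_control_plane = monthly_cost(0.10)  # → 73.0
```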

Brief thoughts on Prefect:

  • Fairly straightforward to set up locally: not quite as easy as Dagster, but much easier than Airflow.
  • Nice UI (maybe the nicest?); however, I haven't yet worked out how to kick off and backfill jobs from the UI.
  • Can be run on EC2 and used to kick off tasks on ECS (!).
  • A fair amount of functionality seems to be locked behind the paid version, though most of it is non-critical.

I'll close this, as we decided to deploy an Airflow instance ourselves.