Metadata service implementation for Metaflow. This provides a thin wrapper around a database and keeps track of metadata associated with metaflow entities such as Flows, Runs, Steps, Tasks, and Artifacts.
For more information, see Metaflow's website
The Migration service is a tool to help users manage underlying DB migrations and launch the most recent compatible version of the metadata service
Note that it is possible to run the two services independently and a Dockerfile is supplied for each service. However the default Dockerfile combines the two services.
Also note that at runtime the migration service and the metadata service are completely disjoint and do not communicate with each other
The service depends on the following Environment Variables to be set:
- MF_METADATA_DB_HOST [defaults to localhost]
- MF_METADATA_DB_PORT [defaults to 5432]
- MF_METADATA_DB_USER [defaults to postgres]
- MF_METADATA_DB_PSWD [defaults to postgres]
- MF_METADATA_DB_NAME [defaults to postgres]
Optionally you can also overrider the host and port the service runs on
- MF_METADATA_PORT [defaults to 8080]
- MF_MIGRATION_PORT [defaults to 8082]
- MF_METADATA_HOST [defaults to 0.0.0.0]
pip3 install ./ python3 -m services.metadata_service.server
Swagger UI: http://localhost:8080/api/doc
docker-compose up -d
to access the container run
docker exec -it metadata_service /bin/bash
within the container curl the service directly
curl localhost:8080/ping
Latest release of the image is available on dockerhub
docker pull netflixoss/metaflow_metadata_service
Be sure to set the proper env variables when running the image
docker run -e MF_METADATA_DB_HOST='<instance_name>.us-east-1.rds.amazonaws.com' \ -e MF_METADATA_DB_PORT=5432 \ -e MF_METADATA_DB_USER='postgres' \ -e MF_METADATA_DB_PSWD='postgres' \ -e MF_METADATA_DB_NAME='metaflow' \ -it -p 8082:8082 -p 8080:8080 metaflow_metadata_service
Note may need to do a rolling restart to get latest version of the image if you don't have it already
You can manage the migration either via the api provided or with the utility cli provided with migration_tools.py
- check status and note version you are on
- Api:
/db_schema_status
- cli:
python3 migration_tools.py db-status
- Api:
- see if there are migrations to be run
- if there are any migrations to be run
is_up_to_date
should be false and a list of migrations to be applied will be shown underunapplied_migrations
- if there are any migrations to be run
- take backup of db
- in case anything goes wrong it is a good idea to take a back up of the db
- migrations may cause downtime depending on what is being run as part of the migration
- Note concurrent updates are not supported. it may be advisable to reduce your cluster size to a single node
- upgrade db schema
- Api:
/upgrade
- cli:
python3 migration_tools.py upgrade
- Api:
- check status again to verify you are on up to date version
- Api:
/db_schema_status
- cli:
python3 migration_tools.py db-status
- Note that
is_up_to_date
should be set to True andmigration_in_progress
should be set to False
- Api:
- do a rolling restart of the metadata service cluster
- In order for the migration to be effective a full restart of the containers is required
- latest available version of service should be ready
- cli:
python3 migration_tools.py metadata-service-version
- cli:
- If you had previously scaled down your cluster it should be safe to return it to the desired number of containers
Within the published metaflow_metadata_service image the migration service is packaged along with the latest version of the metadata service compatible with every version of the db. This means that multiple versions of the metadata service comes bundled with the image, each is installed under a different virtual env.
When the container spins up, the migration service is launched first and determines what virtualenv to activate depending on the schema version of the DB. This will determine which version of the metadata service will run.
Tests are run using Tox and pytest.
Run following command to execute tests in Dockerized environment:
docker-compose -f docker-compose.test.yml up -V --abort-on-container-exit
Above command will make sure there's PostgreSQL database available.
Usage without Docker:
# Run all tests tox # Run unit tests only tox -e unit # Run integration tests only tox -e integration # Run both unit & integrations tests in parallel tox -e unit,integration -p
There are several ways to get in touch with us:
- Open an issue at: https://github.com/Netflix/metaflow-service
- Email us at: help@metaflow.org
- Chat with us on: http://chat.metaflow.org