Setting Up an End-to-End CI/CD Framework
callahantiff commented
Task
Task Type: INFRASTRUCTURE
Determine which tools we will use to set up an end-to-end CI/CD framework.
TODO
The requirements for this system include:
- Leveraging GitHub Actions to:
  - Test the codebase
  - Download needed resources and build the Docker container
  - Deploy and run the Docker container via Google Cloud Run (one for each KG build type)
  - Generate baseline embeddings (#71)
  - Return all results
- Pushing certain files to the Neo4J instance and SPARQL Endpoint
Potential Configurations:
- CI/CD with Serverless Containers on GCP - Described here
- Consider using Google Cloud Composer to kick off the first task of the monthly build process, which downloads and preprocesses the data used for each build (LOD and Ontology data); see the DAG sketch below
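
A minimal sketch of what the Composer kick-off could look like, assuming an Airflow 2.x environment (Cloud Composer runs Airflow). The DAG id, bucket name, and script names below are hypothetical placeholders, not the project's actual build code:

```python
# Hypothetical Composer/Airflow DAG sketch; DAG id, bucket, and scripts are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="monthly_kg_build_kickoff",   # hypothetical DAG name
    schedule_interval="@monthly",        # run once per month
    start_date=datetime(2021, 1, 1),
    catchup=False,
) as dag:

    # TASK 1: download LOD and ontology data and write it to GCS
    download_data = BashOperator(
        task_id="download_lod_and_ontology_data",
        bash_command="python downloads.py --bucket example-kg-bucket",  # placeholder script/bucket
    )

    # TASK 2: preprocess and clean the downloaded data
    preprocess_data = BashOperator(
        task_id="preprocess_data",
        bash_command="python preprocess.py --bucket example-kg-bucket",  # placeholder script/bucket
    )

    download_data >> preprocess_data
```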
Proposed Tasks for CI/CD
- Download all LOD and Ontology data
- Preprocess and Clean data
- KG Build
callahantiff commented
TODO
- Script out data download and write to GCS (TASK 1)
  - If any download fails, default to the last build's version of the data and log the issue (see the download sketch after this list)
- Script out preprocessing of LOD and Ontology data (TASK 2)
  - Log any issues
  - Output updated `resource_info.txt`, `edge_source_list.txt`, and `ontology_source_list.txt` to the Docker container
  - Decide if ontologies are merged in TASK 2 and then the merged data is also sent to `resources/knowledge_graphs` in the Docker container
- Update Docker build trigger to pull from TASKS 1-2 (TASK 3)
- Add script that runs after each successful build and copies data from `release_v2.0.0/archived_builds/build_DDMMYYY` to `release_v2.0.0/current_build/` (see the copy sketch after this list)
- Update SPARQL Endpoint and Neo4J
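
One way the TASK 1 fallback behaviour could be scripted, assuming the `google-cloud-storage` and `requests` libraries; the function name, bucket, and object names are illustrative only, not part of the repo:

```python
# Rough sketch of the TASK 1 fallback; bucket and object names are placeholders.
import logging

import requests
from google.cloud import storage


def download_resource(url: str, bucket_name: str, blob_name: str, last_build_blob: str) -> None:
    """Download a resource and write it to GCS; on failure, fall back to the
    copy stored by the previous build and log the issue."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    try:
        response = requests.get(url, timeout=300)
        response.raise_for_status()
        bucket.blob(blob_name).upload_from_string(response.content)
    except Exception as exc:  # download failed -- reuse last build's data
        logging.error("Download of %s failed (%s); reusing last build's copy", url, exc)
        bucket.copy_blob(bucket.blob(last_build_blob), bucket, blob_name)
```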
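And a rough sketch of the post-build copy step, assuming the archived and current build directories live under the same GCS bucket; the bucket name and helper name are hypothetical:

```python
# Hedged sketch of the post-build copy from archived_builds/ to current_build/.
from google.cloud import storage


def promote_build(bucket_name: str, build_date: str) -> None:
    """Copy a finished build from archived_builds/ into current_build/."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    src_prefix = f"release_v2.0.0/archived_builds/build_{build_date}/"
    dst_prefix = "release_v2.0.0/current_build/"
    for blob in client.list_blobs(bucket_name, prefix=src_prefix):
        new_name = dst_prefix + blob.name[len(src_prefix):]
        bucket.copy_blob(blob, bucket, new_name)


# e.g. promote_build("example-kg-bucket", "01012021")
```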
callahantiff commented
Nearly done. The jobs run too long for GitHub-hosted GitHub Actions runners, so we need to explore options for a self-hosted runner. Considering using Terraform.