Foldy: a web-based platform for interactive protein structure analysis

Foldy

Foldy is a web tool for computational structural biology, centered on protein structure prediction with AlphaFold.

Local Deployment

Local development is supported with docker compose.

Initial setup

  1. Install Docker, which includes docker-compose [installation instructions]

  2. Clone this repository:

    git clone https://github.com/JBEI/foldy.git
    cd foldy
  3. For code completion and for running the frontend, install the frontend and backend dependencies with [conda]:

    conda create -y -n foldy-environment
    conda activate foldy-environment
    conda install -c conda-forge nodejs==16.17.1 python==3.7.13
    pip install -r backend/requirements.txt
    cd frontend
    npm install
  4. Start the backend in a new terminal window.

    • From the root of the foldy repo, run docker-compose up
  5. Start the frontend in a new terminal window.

    • From the frontend/ directory, run npm start
  6. Create the database tables in the local PostgreSQL instance.

    • From the root of the foldy repo, run docker-compose exec backend flask db upgrade
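
Before working through the steps above, it can help to confirm that the required command-line tools are on your PATH. The following is a minimal sketch and not part of the Foldy repository: the helper name `missing_tools` is hypothetical, and the tool list is inferred from the commands used in this section.

```sh
# missing_tools: print the name of each given command that is not on PATH.
# Hypothetical helper for illustration; it is not shipped with Foldy.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools used by the setup steps in this section.
missing_tools git docker docker-compose conda npm
```

If nothing is printed, every listed tool was found.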

For live development

Changes to both the frontend and backend will be live-reloaded.

  1. Start the backend in a new terminal window.
    • From the root of the foldy repo, run docker-compose up
  2. Start the frontend in a new terminal window.
    • From the frontend/ directory, run npm start
  3. Visit localhost:3000 in your browser.

Upgrading Database in development

If you change the database models, run the following commands to create a revision and migrate the database:

docker-compose exec backend flask db stamp $CURRENT_REVISION_NUMBER
docker-compose exec backend flask db migrate
docker-compose exec backend flask db upgrade
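
The commands above expect `$CURRENT_REVISION_NUMBER` to be set. One way to obtain it, sketched here under assumptions, is to parse the output of Flask-Migrate's `flask db current` command. The helper name `parse_revision` is hypothetical, and the pattern assumes Alembic's default 12-character hexadecimal revision IDs; adjust it if your revisions are named differently.

```sh
# parse_revision: read `flask db current` output on stdin and print the
# first revision ID found. Assumes default 12-character hex IDs.
parse_revision() {
  sed -n 's/^\([0-9a-f]\{12\}\).*/\1/p' | head -n 1
}

# Demonstration on sample output (this revision ID is made up):
printf 'a1b2c3d4e5f6 (head)\n' | parse_revision

# Against the running backend (commented out because it needs the containers up):
# CURRENT_REVISION_NUMBER=$(docker-compose exec -T backend flask db current | parse_revision)
```

The `-T` flag disables pseudo-TTY allocation so that the output can be piped cleanly.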

Development Tasks

TODO: Protein structure prediction tasks in the development environment are not actually performed. Instead a few test cases have been precomputed that auto-complete once queued.


Production Deployment

Once you are satisfied with the application, you can deploy it to production by following the procedure below.

Initial Setup

This site is built on Kubernetes (specifically Google Kubernetes Engine, GKE). A few Google Cloud resources need to be created, including a GKE cluster, and then all resources within GKE can be deployed at once. The Kubernetes config and its resources are expressed using a tool called Helm.

Prior to deployment, you must choose the following variables:

  • GOOGLE_PROJECT_ID: ID of your institution's Google Cloud project. It does not need to be Foldy-specific, and it can be retrieved from the Google Cloud console.

  • GKE_CLUSTER_NAME: Name of the Foldy Kubernetes cluster, typically 'foldy'

  • GOOGLE_SERVICE_ACCOUNT_ID: Name of the service account that Foldy uses, typically 'foldy-sa'

  • GOOGLE_SQL_DB_NAME: Name of the Cloud SQL database instance, typically 'foldy-db'

  • GOOGLE_SQL_DB_PASSWORD: Password for the SQL database. For example, the following command generates a secure password:

    python -c 'import secrets; print(secrets.token_urlsafe(32))'
  • FOLDY_DOMAIN: Domain name selected for foldy application

  • FOLDY_USER_EMAIL_DOMAIN: Email domain to allow access, e.g. "lbl.gov" will allow all users with "@lbl.gov" email addresses to access

  • GOOGLE_BUCKET_NAME: Name of the Google Cloud Storage bucket, for example 'berkeley-foldy-bucket'. The name must be globally unique, just as an email address must be.

  • GOOGLE_ARTIFACT_REPO: Name of google cloud docker image repository, typically 'foldy-repo'

  • GOOGLE_CLOUD_STATIC_IP_NAME: Name of google cloud static IP resource, typically 'foldy-ip'
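
The list above can be captured as shell variables so that later commands can reference them. This is a minimal sketch under assumptions: the project ID, domain, email domain, and bucket name below are made-up placeholders, and the helper `missing_vars` is hypothetical rather than part of Foldy.

```sh
# Placeholder values; replace each one with your own choice.
export GOOGLE_PROJECT_ID="my-project-id"            # hypothetical project ID
export GKE_CLUSTER_NAME="foldy"
export GOOGLE_SERVICE_ACCOUNT_ID="foldy-sa"
export GOOGLE_SQL_DB_NAME="foldy-db"
export FOLDY_DOMAIN="foldy.example.org"             # hypothetical domain
export FOLDY_USER_EMAIL_DOMAIN="example.org"        # hypothetical email domain
export GOOGLE_BUCKET_NAME="example-foldy-bucket"    # must be globally unique
export GOOGLE_ARTIFACT_REPO="foldy-repo"
export GOOGLE_CLOUD_STATIC_IP_NAME="foldy-ip"
# GOOGLE_SQL_DB_PASSWORD is deliberately not set here; export it separately
# from a generated value so it does not end up in a saved file.

# missing_vars: print the name of each given variable that is unset or empty.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

missing_vars GOOGLE_PROJECT_ID GKE_CLUSTER_NAME GOOGLE_SERVICE_ACCOUNT_ID \
  GOOGLE_SQL_DB_NAME GOOGLE_SQL_DB_PASSWORD FOLDY_DOMAIN \
  FOLDY_USER_EMAIL_DOMAIN GOOGLE_BUCKET_NAME GOOGLE_ARTIFACT_REPO \
  GOOGLE_CLOUD_STATIC_IP_NAME
```

The final call prints the name of any variable that still needs a value.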

These variables are used throughout this procedure. Once you have chosen them, complete the following steps:

  1. Clone this repo

    git clone --recurse-submodules https://github.com/JBEI/foldy.git
    cd foldy
  2. Copy the following templates:

    cp foldy/values_template.yaml foldy/values.yaml
    cp db_creation_resources_template.yaml db_creation_resources.yaml
  3. Choose a domain! We named our instance LBL Foldy and reserved the domain foldy.lbl.gov with our IT folks, and we think it reads pretty well. If you don't have an IT team who can provision a domain name / record for you, you can reserve an address like ourinstitute-foldy.com through any commercial domain registrar.

  4. Enable the Cloud Logging API for Prometheus / metrics

  5. Install the local tools gcloud, helm, and kubectl:

    1. Install the Google Cloud CLI [instructions here]
    2. Install the Helm CLI [instructions here]. Briefly: brew install helm
    3. Install the kubectl CLI [instructions here]. Briefly, make sure you run gcloud components install kubectl and gcloud components install gke-gcloud-auth-plugin
  6. Create the following Google Cloud resources:

    • Create the Foldy service account, which needs scopes/permissions to access the necessary Foldy resources

      • Create it from the Google Cloud console.
      • Make sure to grant the following roles:
        • artifact registry administrator
        • artifact registry reader
        • cloud sql client
        • compute admin
        • logging admin
        • monitoring admin
        • storage admin
        • storage object admin
      • Fill in service account details in cluster_config.yaml
    • Create the Kubernetes cluster

      gcloud container clusters create $GKE_CLUSTER_NAME --enable-managed-prometheus --zone=us-central1-c --workload-pool=$GOOGLE_PROJECT_ID.svc.id.goog
    • Enable kubectl

      gcloud container clusters get-credentials $GKE_CLUSTER_NAME
    • Create PostgreSQL DB:

      gcloud sql instances create ${GOOGLE_SQL_DB_NAME} --tier=db-f1-micro --region=us-central1 --storage-size=100GB --database-version=POSTGRES_13 --root-password=${GOOGLE_SQL_DB_PASSWORD}
      • Then, through the cloud console, enable private IP at https://console.cloud.google.com/sql/instances/${GOOGLE_SQL_DB_NAME}, and note the DB IP address as GOOGLE_SQL_DB_PRIVATE_IP
      • Now fill in DATABASE_URL in foldy/values.yaml, following this example: postgresql://postgres:${GOOGLE_SQL_DB_PASSWORD}@${GOOGLE_SQL_DB_PRIVATE_IP}/postgres
    • Allocate Static IP Address

      • From the Cloud Console, reserve an external static IP address
      • Make it IPv4, Regional (us-central1, attached to None)
      gcloud compute addresses create ${GOOGLE_CLOUD_STATIC_IP_NAME} --global
      gcloud compute addresses describe ${GOOGLE_CLOUD_STATIC_IP_NAME} --global
    • OAuth Client ID

      • Create OAuth Client ID for production
        • Using the Google cloud console.
        • Application type: Web Application
        • Name: ${GKE_CLUSTER_NAME}-prod
        • Authorized javascript origins: https://${FOLDY_DOMAIN}
        • Authorized redirect URIs: https://${FOLDY_DOMAIN}/api/authorize
        • Then paste the ID and secret in the GOOGLE_CLIENT_{ID,SECRET} fields in foldy/values.yaml
    • Create gcloud bucket using cloud console with following attributes:

      • Name = ${GOOGLE_BUCKET_NAME}
      • Multi-region
      • Autoclass storage class
      • Prevent public access
      • No object protection
    • Create gcloud docker image repo by running:

      gcloud artifacts repositories create ${GOOGLE_ARTIFACT_REPO} --repository-format=docker --location=us-central1
    • Enable permission to push and pull images from artifact registry with:

      gcloud auth configure-docker us-central1-docker.pkg.dev
    • Create node pools by running: bash scripts/create_nodepools.sh

  7. Fill out template files

    • Fill in SECRET_KEY in foldy/values.yaml with a secure random string, for example one generated with the following command:
    python -c 'import secrets; print(secrets.token_urlsafe(32))'
    • EMAIL_USERNAME and EMAIL_PASSWORD in foldy/values.yaml are optional. They will be used for status notifications, but they must be gmail credentials if specified.
    • Fill in variables in foldy/values.yaml with appropriate values
  8. Install the KEDA Helm/Kubernetes plugin by following its docs

  9. Bind service account to GKE

    gcloud iam service-accounts add-iam-policy-binding ${GOOGLE_SERVICE_ACCOUNT_ID}@${GOOGLE_PROJECT_ID}.iam.gserviceaccount.com --role roles/iam.workloadIdentityUser --member "serviceAccount:${GOOGLE_PROJECT_ID}.svc.id.goog[default/foldy-ksa]"
  10. Build and push docker images to your google artifact registry with

    bash build_and_deploy_containers.sh
  11. Make sure that the ImageVersion is properly set in foldy/values.yaml, then deploy the kubernetes services using

    helm install foldy foldy
  12. Initialize tables in PostgreSQL database

    kubectl exec service/backend -- env FLASK_APP=main.py flask db upgrade
  13. Fill out db_creation_resources.yaml with appropriate variables and download alphafold databases into a persistent volume with

    kubectl apply -f db_creation_resources.yaml

    You can monitor the progress of the database download with

    kubectl logs --follow --timestamps --previous create-dbs |less

    Note: don't run any jobs until the database download has completed.

  14. Reserve a domain name

    • Use this command to find the static IP address:
    gcloud compute addresses describe ${GOOGLE_CLOUD_STATIC_IP_NAME} --global
    • Add an A record for your domain pointing at the static IP address provisioned above.

Note: using the us-central1-c zone is required because most Google Cloud A100 GPUs are located there.
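
The DATABASE_URL value described in step 6 can be assembled from the database password and private IP. The sketch below is illustrative only: `make_database_url` is a hypothetical helper, the example values are made up, and the character check is a conservative assumption (it accepts only the characters produced by `secrets.token_urlsafe`, since others may need percent-encoding inside a URL).

```sh
# make_database_url PASSWORD PRIVATE_IP
# Prints the connection string in the format shown in step 6.
# Returns 1 if the password contains characters outside A-Z, a-z, 0-9,
# '-' and '_', which may need percent-encoding before use in a URL.
make_database_url() {
  case "$1" in
    *[!A-Za-z0-9_-]*)
      echo "password may need percent-encoding" >&2
      return 1
      ;;
  esac
  printf 'postgresql://postgres:%s@%s/postgres\n' "$1" "$2"
}

# Example with made-up values:
make_database_url "example_password-123" "10.0.0.3"
```

Passwords generated with the `secrets.token_urlsafe` command shown earlier pass this check unchanged.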

Deploying new code

  1. Increment ImageVersion in foldy/values.yaml

  2. Rebuild the docker images:

    ./build_and_deploy_containers.sh ${GOOGLE_PROJECT_ID} ${GOOGLE_ARTIFACT_REPO} ${IMAGE_VERSION}
  3. Update the Helm release:

    helm upgrade foldy foldy
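
The rebuild and upgrade steps above can be wrapped in a small script with a dry-run mode, so the commands can be previewed before anything is pushed. This is a sketch under assumptions: `run` and `deploy_new_code` are hypothetical helpers, the argument values are made up, and ImageVersion in foldy/values.yaml is assumed to have been incremented by hand first.

```sh
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# deploy_new_code PROJECT_ID ARTIFACT_REPO IMAGE_VERSION
# Assumes ImageVersion in foldy/values.yaml was already set to IMAGE_VERSION.
deploy_new_code() {
  run ./build_and_deploy_containers.sh "$1" "$2" "$3" &&
    run helm upgrade foldy foldy
}

# Preview the commands without running them (argument values are made up):
DRY_RUN=1
deploy_new_code my-project-id foldy-repo 2
```

Setting DRY_RUN=0 runs the same two commands for real, and the Helm upgrade is skipped if the image build fails.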

Acknowledgements

Foldy utilizes many separate libraries and packages including:

We thank all their contributors and maintainers!

Use of the third-party software, libraries, or code that Foldy relies on may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries, or code is subject to any such terms, and you should check that you can comply with any applicable restrictions or terms and conditions before use.

License

Foldy is distributed under a modified BSD license (see LICENSE).

Copyright Notice

Foldy Copyright (c) 2023, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy) and University of California, Berkeley. All rights reserved.

If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Intellectual Property Office at IPO@lbl.gov.

NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.