prod107: search engine deployment
Closed this issue · 3 comments
The deployment of a first version of the search engine stack is a target of the upcoming prod107 release. #359 introduces the playbook that deploys the stack, while #367 contains the logic to define a new group of servers where the service should be deployed.
While #367 focuses on deploying the service in the simple context of a pilot VM where all the services are colocated on a single node, for the scope of prod107 we will need to decide how to deploy it in the multi-node architecture used for production deployments.
The current set of instances (and their relationship) created for each deployment can be loosely summarized by:
idr-database -> idr-omeroreadonly-1, idr-omeroreadonly-2, idr-omeroreadonly-3, idr-omeroreadonly-4, idr-omeroreadwrite -> idr-proxy
idr-database, idr-omeroreadonly-1, idr-omeroreadonly-2, idr-omeroreadonly-3, idr-omeroreadonly-4, idr-omeroreadwrite -> idr-management
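The relationships above could be captured in an Ansible inventory along these lines (an illustrative sketch only; the actual group and host names defined in #367 may differ):

```ini
# Hypothetical inventory sketch of the per-deployment topology.
# Group names are assumptions for illustration, not the names used in #367.
[database]
idr-database

[omeroreadonly]
idr-omeroreadonly-1
idr-omeroreadonly-2
idr-omeroreadonly-3
idr-omeroreadonly-4

[omeroreadwrite]
idr-omeroreadwrite

[proxy]
idr-proxy

[management]
idr-management
```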
Listing the various architectures available
Option 1: deploy the app in the management instance
Pros: it benefits from the Docker prerequisites already being installed in the management instance (currently used for monitoring); it is the strategy originally used for #359 and is currently deployed on test104
Cons: the compute capacity of the management VM is limited, especially for a full indexing; consuming the search endpoint from the omero nodes would require going through the proxy
Option 2: deploy the app in the omeroreadwrite instance
Pros: this is a more scaled version of option 1, taking advantage of the larger compute capacity of the omeroreadwrite server. Additionally, it makes use of capacity of omeroreadwrite that is otherwise unused once a deployment moves to production (except for minor DB updates like adding DOIs/publications)
Cons: same as above, consuming the search endpoints from the omeroreadonly nodes currently requires going through the proxy, in the absence of additional nginx configuration
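As a rough illustration of the "additional nginx configuration" mentioned above, a fragment on omeroreadwrite could expose the searchengine app directly to the omeroreadonly nodes. This is a sketch only: the port, listen address, and paths are assumptions, not the values used in any deployment:

```
# Hypothetical nginx fragment on omeroreadwrite exposing the searchengine
# app (assumed to listen locally on port 5577) to the omeroreadonly nodes,
# bypassing idr-proxy. All ports and paths here are illustrative.
upstream searchengine {
    server 127.0.0.1:5577;
}

server {
    listen 8080;

    location /searchengine/ {
        proxy_pass http://searchengine/;
        proxy_set_header Host $host;
    }
}
```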
Option 3: deploy the app in a new searchengine instance
Pros: allows tailoring the compute/storage capacity of the instance to the exact needs of the app. Allows the various omero instances to access the searchengine service in the same way the database is accessed
Cons: requires one more instance to be created per production deployment, which probably needs to be reviewed against the global tenancy capacity
Option 4: deploy the app across all omero instances
Pros: for indexing, this keeps the benefit of option 2 by using the compute capacity of omeroreadwrite; if we are thinking of integrating with omero-web or idr-gallery, colocating the service simplifies that integration. It also starts scaling the service in the same way as the OMERO.web servers
Cons: probably requires additional thought on how to distribute the data, especially the Elasticsearch database, possibly moving towards an Elasticsearch cluster (option 4a) or moving Elasticsearch to yet another instance (option 4b)
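For option 4a, each omero node would run an Elasticsearch node joined into one cluster. A minimal sketch of the per-node configuration, assuming hypothetical cluster and node names:

```yaml
# Hypothetical elasticsearch.yml fragment for a cluster spanning the omero
# nodes (option 4a). Cluster name, node names, and the choice of seed/master
# hosts are illustrative assumptions.
cluster.name: idr-searchengine
node.name: idr-omeroreadonly-1   # unique per node
discovery.seed_hosts:
  - idr-omeroreadwrite
  - idr-omeroreadonly-1
  - idr-omeroreadonly-2
cluster.initial_master_nodes:
  - idr-omeroreadwrite
```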
I think it may be a good idea to start with option 3 and deploy it in a new instance. This will give us the opportunity to deploy, configure, and maybe reconfigure the instance and the apps without affecting anything else.
@khaledk2 coming back to this, a few outstanding questions:
- based on your latest investigation of indexing, what would you recommend for the compute capacity of a standalone searchengine VM? 16 VCPUs/64GB RAM like omeroreadwrite or 8 VCPUs/32GB RAM like omeroreadonly?
- what should be the typical size of the underlying data volume? And should this volume follow the same snapshotting/cloning lifecycle as the DB/binary repository/nginx cache?
- are we happy recreating an idr-testing deployment from scratch with the initial set of choices? @will-moore
@khaledk2 coming back to this, a few outstanding questions:
- based on your latest investigation of indexing, what would you recommend for the compute capacity of a standalone searchengine VM? 16 VCPUs/64GB RAM like omeroreadwrite or 8 VCPUs/32GB RAM like omeroreadonly?
It would be good to have a VM like pilot-idr0000-omeroreadwrite (16VCPUs/64GB RAM).
- what should be the typical size of the underlying data volume? And should this volume follow the same snapshotting/cloning lifecycle as the DB/binary repository/nginx cache?
A data volume of 50 to 100 GB should be fine (preferably SSD). Yes, I think this should be fine. I will test getting the Elasticsearch indices from a disk copy.
- are we happy recreating an idr-testing deployment from scratch with the initial set of choices? @will-moore
Yes, I think it should be fine.