wasmCloud/wadm

[FEAT] Emit application events on wadm topic

lachieh opened this issue · 5 comments

It would be good to have some more events to be able to subscribe to for outside systems to understand the state of an application. These are a few events that I know I'd be interested in using but this shouldn't be considered exhaustive.

  • Application status change (+ new status message)
  • Attempting change (scaling, deploying link, putting config)
  • Change successful/failure
  • App manifest deployed/undeployed
  • Health check (?) passed/failed/constant

I'd also add manifest deploy/undeploy events to this list. You can kind of get at them from the wadm.notify subject, but they're very much representations of wadm's internal state since they tell you things like when scalers are being created or deleted.

My ideal scenario would be that we either refactor how the wadm.status stream works or add a new stream where that each event type is actually reflected in the subject (ex. wadm.status.default.app.manifest_deploy, wadm.status.default.status_change. That way external consumers can subscribe to only the events they're interested in using subject filters instead of parsing each message to determine the type and discarding anything that doesn't match

after scanning some code, here's what I am thinking:

  • have an else block here where we dispatch statuses
                                            if let Err(e) = self
                                                .status_publisher
                                                .publish_status(format!("{}.{}", &name, &parsed_event), Status::new(
                                                    StatusInfo::appropriate_variant_based_on_parsed_event_type(format!({&parsed_event}))
                                                    vector_of_all_scalers_statuses
                                                ))
                                                .await
                                            {
                                                warn!(error = ?e, "Failed to publish application status");
                                            }

somethings I have doubts on

  • the current status stream is defined/templated as wadm.status.{lattice_id}.{manifest}, so I am not sure if that means the proposed app status topic wadm.status.{lattice_id}.{manifest}.{parsed_event.raw_type()} will work or we need to update the definition of the status stream
  • in terms of payload for these statuses, are the manifest/application's scaler statuses what we want to send to the listeners? just the event's specific scaler (via scaler_id) or all of them? something else entirely? or more additional info sourced from the event's inner data/payload?

the current status stream is defined/templated as wadm.status.{lattice_id}.{manifest}, so I am not sure if that means the proposed app status topic wadm.status.{lattice_id}.{manifest}.{parsed_event.raw_type()} will work or we need to update the definition of the status stream

I think here we should be able to publish to the topic just fine, but we won't actually be able to persist that status in the status stream. For a first PR we could add these new events which would be available for subscribing, but we'd have to publish the old status events on the old topic still. Using a new topic that's not technically in a stream would be easy, but if we wanted to persist them in a stream we'd have to come up with a good way to limit the number of events we keep for a particular lattice/application

in terms of payload for these statuses

The payloads should be per-event IMO, the overall application status could still include all of the scaler statuses and the individual events could include smaller sets of information.

What I would see as useful

  • App deployed/undeployed, which can mention the version and the application name
  • Host reaped, e.g. did not receive a heartbeat 2x and we remove it from state
  • @protochron anything else that would be really helpful to know from the operator side?

🔮 As a future change, I'm thinking that this could be a useful status extension:

  • Backing off, as implemented in #379 I was using the notify stream but I think we could use status and react to that change instead, removing the notify stream entirely.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this has been closed too eagerly, please feel free to tag a maintainer so we can keep working on the issue. Thank you for contributing to wasmCloud!