Green-Software-Foundation/if-plugins

Synchronize contributions from real-time-cloud project with cloud-metadata if-plugin

adrianco opened this issue · 3 comments

The real-time-cloud project has been developing a table of values for cloud provider region information, and working towards an IF implementation.

In the meantime, an early version of the table, with outdated headers and missing data, was copied into cloud-metadata as gsf-data.csv and was later updated to include geolocation information and some newer data.

We now want to contribute a final version of the table and have to merge the updates. This needs to be discussed jointly between the real-time-cloud team and the if-plugin cloud-metadata contributors so that we don't stomp on each other's work.

Hi @adrianco - thanks for raising the issue - we'd love to see the cloud-metadata plugin updated to use your much richer dataset.

Today we provide cloud-metadata with instance level info and return information about the hardware it uses. If region data is provided, all we do with it is return the other region IDs we know about (e.g. if you provide a location we'll return the same region in the right format for watt-time or EM queries, which are done in a separate plugin). As I understand from the call yesterday, you'd like to extend this behaviour so that providing region info returns more region-level metadata that can be used later in the execution pipeline. We can certainly do this!

Our current implementation uses a locally stored copy of an old version of your dataset. It's not very efficient for you and IF to both maintain copies of the dataset. It seems likely that they will fall out of sync again some time after this initial upgrade is done. It makes sense for yours to be the source of truth as you do the work on sourcing and verifying the data.

Maybe one option is for us to develop the plugin with you for this initial upgrade and then hand over ownership of the plugin to RTC altogether, migrating it into an RTC repository?

This is the model we are using for other groups that have some tight relationship with a specific plugin - ultimately the core IF library will only contain the most generic, all-purpose builtin features and everything else will be community-owned and maintained.

Here's a proposal for how this could happen:

  • you provide a specification for the behaviours you would like to see in the plugin - we can go back and forth on it as needed.
  • you provide a copy of the dataset you would like to pull data from
  • we update the plugin to your specification
  • we apply our testing and QA process to the new version of the plugin
  • we update the plugin documentation including instructions for how to run and test.
  • you fork the updated plugin into an RTC repository
  • we deprecate it in our if-plugins repository and schedule it for deletion
  • we delete it and link out to your repository from our documentation

Of course, you could still reach out to us for collaboration or support in maintaining the plugin where necessary, but it frees you to do the small updates to the plugin required when you change your data without relying on IF team availability.

What do you think of this idea?

@jawache

@jawache @adrianco any more thoughts on this?

Thanks @adrianco and @jmcook1186,

So I propose this approach:

  • RTC team maintains a CSV file in the rtc repo
  • IF team maintains a new generic plugin called lookup, which is configured to point to a CSV file (like the one maintained by the RTC team).

We can discuss how best to implement lookup, but the logic (look up values from a CSV file) is generic enough that it should be quite easy to define.

```yaml
initialize:
  plugins:
    cloud-region-metadata:
      method: Lookup
      path: builtins
      global-config:
        csv: https://github.com/Green-Software-Foundation/real-time-cloud/blob/main/cloud-region.csv
    cloud-instance-metadata:
      method: Lookup
      path: builtins
      global-config:
        csv: https://github.com/Green-Software-Foundation/real-time-cloud/blob/main/cloud-instance.csv
```

cloud-region-metadata would take cloud/vendor and cloud/region as inputs; these must map to a single row in https://github.com/Green-Software-Foundation/real-time-cloud/blob/main/cloud-region.csv, and the plugin would return the rest of the columns in that row as outputs.

cloud-instance-metadata would take cloud/vendor and cloud/instance as inputs; these must map to a single row in https://github.com/Green-Software-Foundation/real-time-cloud/blob/main/cloud-instance.csv, and the plugin would return the rest of the columns in that row as outputs.
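The core of that lookup behaviour could be sketched roughly like this (a hypothetical standalone illustration, not the actual plugin code; the column names and CSV contents here are made up for the example - the real columns come from the RTC dataset):

```python
import csv
import io

def lookup(csv_text, match):
    """Return the remaining columns of the single row whose values equal
    every key/value pair in `match`; error if zero or multiple rows match."""
    rows = [
        {k: v for k, v in row.items() if k not in match}
        for row in csv.DictReader(io.StringIO(csv_text))
        if all(row.get(k) == v for k, v in match.items())
    ]
    if len(rows) != 1:
        raise ValueError(f"expected exactly one matching row, found {len(rows)}")
    return rows[0]

# Illustrative data only -- stand-in for cloud-region.csv.
CSV = """cloud-provider,cloud-region,location,geolocation
aws,us-east-1,Virginia,"38.9,-77.4"
aws,eu-west-1,Ireland,"53.3,-6.2"
"""

# Matching on provider + region returns the rest of that row's columns.
print(lookup(CSV, {"cloud-provider": "aws", "cloud-region": "eu-west-1"}))
# -> {'location': 'Ireland', 'geolocation': '53.3,-6.2'}
```

The same function would serve both plugin instances: cloud-region-metadata matches on vendor + region, cloud-instance-metadata on vendor + instance; only the configured CSV and key columns differ.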

The current cloud-metadata plugin does essentially that logic internally. To replicate it we would need two instances of lookup, as above, but I think that's a pretty decent tradeoff for a generic, simple-to-use plugin.

As the RTC team generates useful datasets as CSV files in the future, we can easily expose them to people who are computing emissions using the lookup plugin without the need to write more code.

How does that sound @adrianco and @jmcook1186?