Green-Software-Foundation/carbon-aware-sdk

[Feature Request]: Missing option to select WattTime model version (Auditability)

Closed this issue · 6 comments

Contact Details

GitHub/Slack: Willmish

What happened?

Currently, neither the CA SDK nor the API offers a configuration option, endpoint parameter, or CLI command parameter to select the version of the configured Data Sources (specifically WattTime). This limits the auditability of the Carbon Aware SDK: users may be unable to revisit decisions made with the SDK in the past, because newer models can return different data than the Data Source returned previously.

FOR GSF members, see this thread for a further explanation: https://greensoftwarefdn.slack.com/archives/C023G5KJ6P5/p1674062862984559

As an example of how this might affect a real-world scenario, I ran this Python demo: https://github.com/Willmish/carbon-aware-sdk/tree/demo_sdk_notebook/samples/python-jupyter-notebook-webapi , once in October 2022 and again a few days ago. Here is a comparison of the carbon intensity values at 2 different times of day, for the same Azure region (westcentralus) and WattTime location (PACE), over the same time period (June 2022 - August 2022).

Run from October 2022:

[Image: Screenshot 2023-01-18 at 17 49 20]

Run from January 2023:

[Image: new output]

Clearly the resulting carbon intensity values differ significantly between those two runs.

As confirmed by @Henry-WattTime from WattTime, WattTime updated their data prediction model in November 2022. Since the SDK always uses the most recent version of the model, it cannot access the older data.

Suggested solution

  • Add a version configuration option to the DataSources, so that additional DataSources can be configured to point at older WattTime (or other service) models.
  • Add a parameter to the Web API endpoints / CLI commands to specify the Emissions and Forecast DataSource versions, and return those versions in the Carbon Aware SDK reply.

In my opinion, the second option, although it requires more code refactoring, is probably the one we would expect the SDK to have. Happy to hear any alternatives.
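To make the second option a bit more concrete, here is a rough Python sketch of a request that carries an optional data source version and a reply that echoes back the version actually used. Every name here (`EmissionsRequest`, `EmissionsReply`, `handle_request`, the `fetch` callback, and the version labels) is hypothetical and does not exist in the Carbon Aware SDK; it only illustrates the shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


# Hypothetical types: nothing here is part of the real Carbon Aware SDK.
@dataclass(frozen=True)
class EmissionsRequest:
    location: str
    start: str
    end: str
    # None means "use the newest model", which matches today's behaviour.
    data_source_version: Optional[str] = None


@dataclass(frozen=True)
class EmissionsReply:
    location: str
    ratings: List[float]
    data_source: str
    data_source_version: str  # echoed back so the result can be audited later


def handle_request(
    request: EmissionsRequest,
    fetch: Callable[[str, str, str, str], List[float]],
    available_versions: Sequence[str],
    data_source: str = "WattTime",
) -> EmissionsReply:
    """Resolve the requested version, fetch the data, and report the version used.

    `available_versions` is assumed to be ordered from oldest to newest.
    `fetch` stands in for whatever call would reach the data source.
    """
    version = request.data_source_version or available_versions[-1]
    if version not in available_versions:
        raise ValueError(f"Unknown {data_source} version: {version!r}")
    ratings = fetch(request.location, request.start, request.end, version)
    return EmissionsReply(request.location, ratings, data_source, version)
```

Because the reply always carries `data_source_version`, a user who stored an old reply would at least know which model produced it, even if they never pinned a version explicitly.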

client

WebAPI (Default)

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Took some time to look into this and just wanted to add notes here:

Note 1: Classification of this as a bug:
I would reconsider classifying this as a bug as we intentionally exposed the publicly available API with the accompanying model structure of that API to users of the CA SDK. This ensures users have the latest data and most accurate, up-to-date modeling that the data sources expose (such as WattTime). If we want to support non-public APIs (see more in note 2 about this), I think that would be a specific change we would want to make clear across the board in the SDK, and then provide clarity around when, why, and how a user can access older versions.

Note 2: Supporting a Non-Public version of WattTime API:
WattTime does not currently make previous versions of their API readily available and does not include any documentation or release notes around the changes between models and versions. If we did want access to that information, that would likely be through a "Non-public" API and specific routes/information that is exposed to the GSF. This probably would require having a steady contact/relationship within WattTime that can provide updates on when new versions are released and tracking of changes needed to be made in the repo because of that.

Note 3: Proposed Solution:
My $0.02 is that the first proposed solution may be better and more natural to how versioning could occur with the data sources. In the config files, we set the URLs that users will ping, which is likely where we would note access to previous versions (WattTime, for instance, is currently on v2 of its API, and that's what the URL uses). This also keeps the data source consistent across the different calls the user makes: once a user configures a version for a data source, that data source is static for all calls, so the data stays consistent.
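As a rough illustration of the config-level approach, the Python sketch below pins a version once in a data source configuration and derives every endpoint URL from it. The class name, field names, and the `https://data-source.example` base URL are all placeholders made up for this example, not the SDK's actual configuration schema or WattTime's real routes.

```python
from dataclasses import dataclass


# Hypothetical config object: field names are illustrative only.
@dataclass(frozen=True)
class DataSourceConfig:
    name: str
    base_url: str      # placeholder host, not a real data source URL
    api_version: str   # pinned once, then used for every call

    def endpoint(self, path: str) -> str:
        """Build a URL whose version segment comes from the pinned config."""
        return f"{self.base_url.rstrip('/')}/{self.api_version}/{path.lstrip('/')}"


# The config is frozen, so the version cannot drift between calls.
pinned = DataSourceConfig(
    name="WattTime",
    base_url="https://data-source.example/",
    api_version="v2",
)
emissions_url = pinned.endpoint("/data")
forecast_url = pinned.endpoint("forecast")
```

Since both URLs are derived from the same frozen object, emissions and forecast calls cannot end up on different versions within one configuration.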

From #384: post v1.1 release, this issue is relevant again, so we should rediscuss and rethink the approach to it.

From #394: Let's pick up the discussion and reconsider the issue for implementation. @YaSuenag, could you share your thoughts on this?

This issue has not had any activity in 120 days. Please review this issue and ensure it is still relevant. If no more activity is detected on this issue for the next 20 days, it will be closed automatically.

This issue has not had any activity for too long. If you believe this issue has been closed in error, please contact an administrator to re-open, or if absolutely relevant and necessary, create a new issue referencing this one.

FYI @Willmish - please reopen and remove stale tag accordingly