BJReplay/ha-solcast-solar

Error 429 connecting to solcast on HA restart, zapped all solcast sensors

Closed this issue · 107 comments

Describe the bug
I restarted Home Assistant, and when the Solcast integration started up it got an error 429, I believe from retrieving the rooftop sites info (screenshot attached).

This caused the rooftop sensor in HA to change to Unknown.

And ALL the Solcast PV forecast sensors changed to zero.

Fortunately I spotted this pretty quickly and was able to resolve it by reloading the Solcast integration, after which all the sensors populated correctly.

To Reproduce
Restart HA; if Solcast is busy, it's likely that the same will occur.

Expected behavior
On HA start, if an error 429 occurs, use the previous solcast sensor values. Do not set any sensors to Unknown or zero.

Desktop (please complete the following information):
HA 2024.5.5
HAOS 12.3
Solcast v4.0.25

I don't know if this will help you, but this helped me when I had the 429 error.

Wait around a day without making a call to Solcast, and then refresh at a random time, not on the hour.

If you have an automation to refresh data from Solcast exactly on the hour, then add a delay to it. For example, I am using a 2 min 45 sec delay on my automation that updates Solcast data.

Wait around a day without making a call to Solcast, and then refresh at a random time, not on the hour.

Yes, I have this in my Solcast automation and I'm getting pretty reliable Solcast polls now.

This problem occurred when Home Assistant was restarted (at a random time): the integration reached out to Solcast to get the site data, got a 429 error, and that zeroed out all the Solcast sensors. The enhancement request is to either prevent the initial Solcast call, or trap the 429 error and not overwrite the sensors if it occurs.

The enhancement request is to either prevent the initial Solcast call, or trap the 429 error

I am looking into adding a configuration option to suppress the initial call to Solcast... With two PV arrays I run out of available calls real quick doing restarts.

The initial call is not easily suppressed, as from what I can tell it returns the 'sites' along with forecast data. So I'll need to cache the sites data to survive the restart, which will result in data still showing up on the Energy dashboard right after restart. Doable, but it'll take time.
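Roughly what I have in mind, as a minimal sketch (the file path and helper names are made up here, not the final code):

import json
from pathlib import Path

SITES_CACHE = Path("/config/sites.json")  # hypothetical cache location

def save_sites(sites: dict) -> None:
    # Persist the last good sites response so a restart can survive a 429
    SITES_CACHE.write_text(json.dumps(sites))

def load_cached_sites() -> dict | None:
    # Return previously cached sites data, or None if no call has ever succeeded
    if SITES_CACHE.is_file():
        return json.loads(SITES_CACHE.read_text())
    return None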

One other integration I use has a toggle to enable/disable cloud update at integration startup and HA reboots. Obviously I don't know if this could be translated over to this integration, but it's worth mentioning.

Someone who knows the technical side of things and Python could check the PR here from the other integration to see if something can be borrowed.

So I'll need to cache the sites data to survive the restart

Just an alternative thought.. as hobbyist accounts can only have two rooftops, it might be simpler to capture two additional user-entered resource_ids in the configuration flow and if either/both are present then skip the startup discovery call to Solcast and use the supplied resource_id(s) instead?
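For illustration, the extra fields might look something like this in the config flow schema (the field names here are invented; this is only a sketch of the idea):

import voluptuous as vol

# If either optional field is supplied, the integration could skip the
# startup rooftop_sites discovery call and use these values directly
RESOURCE_ID_SCHEMA = vol.Schema({
    vol.Optional("resource_id_1"): str,
    vol.Optional("resource_id_2"): str,
})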

I did some code examination, only to find that the sites data looks rigged to be already cached by solcastapi.py. Then I did a few restarts of HA and checked the API use count at solcast.com.au. It did not go up, so getting the sites must not contribute to API calls, confirming the cache read of sites data.

However, earlier today I did notice a 429 from Solcast after restarting, with forecast data unavailable in HA. The solar production yellow bars were also missing. When I tried to check the log later, no 429 was present because the log clears on restart, so I need to wait until it happens again to see if it's the same issue reported at line 144.

An immediate restart of HA fixed the forecast and solar display.

So in summary: this should never happen on HA start with /config/sites.json (not solcast.json) present and the class variable apiCacheEnabled set True, because that section of code in the try/except block should read the cache and not do a REST call. But apiCacheEnabled is always False.

Debug logging for individual components also seems to turn off on HA restart, so I might need to enable system-wide debug logging for a while. Ugh. (I got logging sorted in the end.)

Delving further:

On startup, apiCacheEnabled is always False, hard-coded, so it's not caching the sites collection. It is caching forecasts only.

But it doesn't really need to cache the sites collection, because fetching the sites doesn't count against the daily API calls. So no impact, you'd think? Kind of.

What I think is happening here is that Solcast limits API calls when things get too busy, probably only when you have a hobbyist account, and there is already much said by Oziee on that, guiding folks not to gather exactly on the hour. Many folks don't listen, so if your restart occurred near the top of the hour it might fail for that reason.

But it didn't. It failed at a quarter to five.

I am wondering whether some form of retry is in order on failure. Caching is probably the better option, so that the API call never happens. But what if sites change? Get added? This is probably why Oziee left it permanently disabled: most likely to cut down on the number of issues logged against his repository. Not everyone's a Python coder... And not everyone knows how to find and delete a file in Linux.

But he wrote the code to do so. It'd be amusing to try and make it work in the real world.

So I suspect that API limiting when busy is behind this. I just did about twenty restarts in a row and none of them failed to get the sites from the API, despite me having had just one weird and seemingly random failure earlier today.

@gcoan, you could try out 490494c, which adds a retry mechanism, but testing my suspicion is going to be tough.

I am also going to expand on the retry concept, @gcoan. It seems sensible to me that if the rooftop sites collect fails, but there is a cache available to be read (because we restarted prior and all was good), then read it, log the circumstance with a warning, and move on.

I like that.

#26 just got merged. What could possibly go wrong? So it's another branch/PR for me to implement the belt/braces fall-back-on-cache.

@autoSteve yes that seems a sensible approach to use the cached sites data if we get a 429 error.

Do you want me to try the above release or is there another version coming with more changes?

Yeah, hold off @gcoan. I want the fall-back in there. Makes TOTAL sense.

I'm going to pull the pin for tonight (migraine kicking in, at curl-up-and-hide levels), but will (or you can, @autoSteve - welcome as a collaborator) build a release tomorrow (based on your fallback as noted above) when ready. The intention is to tidy up info.md as well (rather than just link to release notes), and to update the PR that requests this repo gets included in HACS (link to the latest release and runs of the HACS and hassfest actions).

I don't think I'm going to get around to implementing randomised polling before I travel, but I might also update the readme on making the randomised polling the first automation.

Ugh. I hate the flashy-flashy migraines, @BJReplay. Much spekky, but. Until the pain kicks in... Growing out of them, and thankfully haven't had one for two years. (Not to mozz meself, so touch wood...)

Thanks for the welcome.

Would you believe it, it looks like I had a power cut and HA restart last night, and a Solcast 429 error on startup!

Same error: the sites entities are no longer being provided by Solcast and all the other forecast entities are empty. It took two reloads before Solcast came back; the first reload, at 9:17:39 (which you would think would be a random enough time), failed with a 429.

Thanks @autoSteve and @BJReplay for looking into this. Hope you manage to hold off the migraine.

Line 144. There she goes, @gcoan. That's what we're chasing...

b38c62a is good to test, @gcoan.

I'll hold off merging until someone other than me has tried it.

After a successful start you should find the cache (containing json) has been created, which will be used in subsequent failure scenarios.


@autoSteve do I need to manually copy the files from that github release into the appropriate HA folder ?

It's just that for other integrations I download beta versions from HACS, and I'm not seeing any for Solcast.


@autoSteve do I need to manually copy the files from that github release into the appropriate HA folder ?

Ummm... I've not got that clever. Do you know your way around GitHub well enough to get the modified solcastapi.py from the commit file contents and manually replace your current one with the one in the commit?


I may need to get more clever with beta releases...

Thanks @autoSteve I figured it out, finding the updated solcastapi.py file in the github release and copying that to /config/custom_components/solcast_solar (making a copy first!).

Restarted HA and the /config/sites.json file has been created and it appears to contain my sites data.
I have restarted HA 3 more times, and on each restart I've not seen any Solcast messages in the HA Core logfile. So I'm not sure whether this means Solcast is not reporting 429 errors (and so no retry/use of the cached sites.json is required), or whether it's silently used the cached file. Should I see anything in the logfile so I can debug whether it is working or not?

Oh, and the Solcast entities remained OK throughout all these restarts.

If a rooftop sites gather was previously successful, then the data from that gather will be used (with a warning issued in the log) should a 429 occur three times in a row, with five-second pauses between calls.
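In sketch form the behaviour is roughly this (aiohttp-style calls assumed; the actual solcastapi.py differs in detail):

import asyncio
import logging

_LOGGER = logging.getLogger(__name__)

async def get_sites(session, url: str, cached_sites: dict | None) -> dict:
    status = None
    for attempt in range(1, 4):  # three tries, with five-second pauses
        async with session.get(url) as resp:
            status = resp.status
            if status == 200:
                return await resp.json()
        _LOGGER.debug("will retry GET rooftop_sites, retry %d", attempt)
        await asyncio.sleep(5)
    if cached_sites is not None:
        _LOGGER.warning("Timeout gathering rooftop sites data, last call result: %d, using cached data", status)
        return cached_sites
    raise RuntimeError("No sites data, and no cache from a previous successful start")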

I have good news, and I have bad news.

So... the good news first, @gcoan. By restarting almost exactly on the hour I was able to replicate a 429 for get sites. The retry mechanism for getting sites data worked perfectly, warned of the occurrence in the log, and the integration used the cache data. So that bit of code checks out as tested.


The not-so-good news is that more caching needs to be done, with the call to GetUserUsageAllowance subsequently failing with a 429. This call is also not subject to the 10-call limit, pretty well confirming that this is only occurring at generally busy times for Solcast, and probably for hobbyist accounts only. This also needs a retry mechanism and a separate cache to cope seamlessly with startup circumstances.

Back to some good news, though: should the call to GetUserUsageAllowance fail, it will not interfere and have sensors go AWOL, resulting in data gaps. It just makes my eye twitch, so it will be worked around...

It would also be interesting to build in a much longer retry interval for getting forecast data, say a one-minute interval. If that coincidentally happens at a busy time, then I think that forecast update would be skipped until the next scheduled one occurs.

I'm on it. Problem number one first, then I'll move on to get forecast retries.

Thanks @autoSteve I had been planning on doing some more 'on the hour' restart testing myself, I might as well hold off since you've found a further issue.

Looking at the log fragment you shared, I noticed a typo in the error message ('Responce' at the end). But moreover, I wonder whether the 429 as it stands should be logged as an Error, because the next line then logs a warning that the cache will be used. Could this be confusing: an error that isn't really an error, because it's being trapped and processed as a warning (that the cache is being used)?

Not my typo 😉 but fixed. Well spotted. There was another one in the code, too.

The log is in reverse time order. The warning about a 429 comes first, then the 429 error occurs that I'm about to work around.

Almost everyone knows HTTP status 404 as "not found". It does beg the question: what does "status 429" mean to a real person? Weirdo developers like me know that this is the HTTP status code for "go away, I'm busy, so try later...", but your average Joe needs to hit up Google to understand it, then raise an issue with Oziee, who would then facepalm for the twentieth time that month 😂. "Status 429, too busy, try later" or "Status 429, Solcast too busy", or similar, would be much better.

Thanks for the explanation, yes I was forgetting that the log file is in reverse order.

Agree, 429 is an obscure code that most people won't know about. Personally I would have used 503 Service Unavailable, which is a bit better known, but for most people it's a Google search either way. And as you say, they would then have queried it with Oziee when it's not his fault.

429 is a rate limiting status code.

429 Too Many Requests

The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests in a given amount of time ("rate limiting").

This is probably being thrown because the user has hit the API call limit. Although set at 10, multiple roofs would effectively halve that (and there are other calls that further reduce the API call count).

(I love Oziee, and he's built a great integration, but technically it is his fault, @gcoan. It's a 429. By its very definition as "rate limiting" he should code a retry after a back-off timer, or use cached data if it's not a biggie, not just error out...)

@swests, the Solcast API will return this at times even if the API call limit has not been reached. Some calls are not subject to the hobbyist 10-call limit, like the call for "How many calls can I make?".

For example, when I hit this 429 earlier I was at 6/10 daily calls according to the Solcast web site.

This 429 is occurring at generally busy times, when the site will refuse any call from a hobbyist.

But thanks for the input.

Yeah, I agree: the integration should be written defensively, trap errors where it can, use caches, and certainly not zero the data if it can't make a connection.

Unfortunately the growth of Home Assistant and home solar, and probably therefore the growth of this integration, have been part of the cause of the server load Solcast is trying to limit. Personally I'd be happy to pay a small subscription, or be directed by Solcast as to the specific time to make my API calls so they can balance the traffic over the day. Too many automations making calls at x:00:00 tipped them over the edge.
Solcast could also have reduced all hobbyist accounts to 10 API calls a day instead of leaving the older accounts still on 50.

But we make progress!

Totes. Take my (small amount) of money, @Solcast. I'd pay 5/10 bucks/quid/whatever per month for less restriction, but so would probably everyone else with HA using Solcast. = same problem for them, but some more dosh to solve the problem.

At least they haven't shut down to hobbyists altogether... their forecasts are actually better than our local Bureau of Met at predicting awful / slightly less-than-awful days.

Forgot to mention #12 in the commit description, @gcoan.

1b96dec expands retry/caching to the GetUserUsageAllowance call.

Should the call occur and receive a 429, then without a cache existing it gets real ugly and a bit confusing (logged errors, blah blah, resulting in GitHub issues raised and facepalms, although everything will work). However, if a cache exists, there should not be enough "o"s in "smooth" to describe the experience.

I even translate the 429 into "Solcast too busy".

Amusingly, the variables that this call sets do not seem to have any material effect on the operation of the integration, hence failure having no impact. It is my belief that Oziee probably intended to monitor the count of calls vs. the call limit to prevent call quota over-use, and therefore the 429. He never did so, but laid the foundation. It was his assumption that a 429 is only ever associated with over-quota use that was his undoing.

429s can happen at any given moment and for several reasons, so this code is entirely redundant, and I now see why it is completely useless and abandoned. It almost certainly doesn't even need to be called by init. I just wasted an hour of my life that I'll never get back, but the log is cleaner. 🤣

Ah, well. That was fun. Sometimes you got to focus on the bigger picture...

So far confirmed, @gcoan. I removed the call to GetUserUsageAllowance from init (my possibly wasted hour), where it is only ever called, and restarting on the hour resulted in a 429 for the sites gather; the script then used the cache and moved on, and all was well.

Logic pretty well confirmed. I will continue to monitor when the sun comes up here, but I can't see any reason in code why not making the call could alter anything. (The GetUserUsageAllowance function remains for posterity, but might make a comeback.)

Tomorrow is to tackle the actual forecast 429 status replies should they occur with a backed-off retry... Methinks one/two/four/eight/sixteen minutes perhaps. Then give up.

I may reinstate something like the GetUserUsageAllowance for this. Every call to get the forecast results in usage data calculated internally, which is currently kept. I'm thinking if usage is clearly over the limit for the next call, then don't bother with the retry/back-off, or even a call at all. That would require knowing the limit (could default to 10 on initial GetUserUsageAllowance call failure), and the remembered call count already resets to zero in code at UTC midnight.

I like it. I'll code it. Tomoz.
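Roughly the guard I have in mind (a sketch only; the names are invented, and it leans on the counter-reset-at-UTC-midnight behaviour described above):

from datetime import datetime, timezone

DEFAULT_DAILY_LIMIT = 10  # assumed if GetUserUsageAllowance has never succeeded

def may_call(api_used: int, api_limit: int | None, counter_day: datetime) -> bool:
    # A remembered counter recorded on a previous UTC day means quota has reset
    limit = api_limit if api_limit is not None else DEFAULT_DAILY_LIMIT
    if counter_day.date() != datetime.now(timezone.utc).date():
        return True
    return api_used < limit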

Sounds like you are continuing to have fun @autoSteve, but more significantly I should hold off installing 1b96dec if it contains changes to GetUserUsageAllowance that you're already making further changes to.

Tomorrow is to tackle the actual forecast 429 status replies should they occur with a backed-off retry... Methinks one/two/four/eight/sixteen minutes perhaps. Then give up.

Sounds like a great improvement to Solcast if it can gracefully handle 429 errors with auto-retry. I see failures every now and again in the Traces for my own Solcast update automation; some days every single update (at random trigger times) works, and some days I get a failure on at least one call.
But can I suggest you delay 1+random seconds, 2+random seconds, etc. I'm sure there are lots of people who still have automations that run at xx:00:00, and if Solcast pauses a precise minute we'll just keep moving the bottleneck forward in time.
I'm also wondering how long Home Assistant will allow the Solcast service call to execute for. It looks like service API calls were changed to remove timeouts in 2023.7, so in theory it could just keep retrying forever...
A 16-minute delay, especially after the preceding 1+2+4+8=15 minutes of tries and retries, feels a bit long to me.

I'm thinking if usage is clearly over the limit for the next call, then don't bother with the retry/back-off, or even a call at all. That would require knowing the limit (could default to 10 on initial GetUserUsageAllowance call failure), and the remembered call count already resets to zero in code at UTC midnight.

Agreed, there's no point in making service calls to Solcast that we know will fail, further overloading their servers. A warning message should be logged when this happens, though.

One final thought: I suggest any cache files written clearly indicate which integration they come from. I'm worried about a proliferation of files that are not identifiable to the end user, and thus 'sites.json' doesn't sit well with me. Ideally everything would be kept inside the same solcast.json file, but if that's not practicable then can we prefix the filenames, e.g. 'solcast_sites.json'.

Cheers

I suggest any cache files written clearly indicate which integration they come from.

Agreed. solcast-sites.json it is. You can rename your cache file after moving to code.next.

1+random seconds, 2+random seconds, etc.

Agreed.

Tomorrow is to tackle the actual forecast 429 status replies should they occur with a backed-off retry... Methinks one/two/four/eight/sixteen minutes perhaps. Then give up.

❤️ I had been reading the thread and was going to chime in and suggest that a back-off and retry with jitter is your best bet. Great to see you both come to that same conclusion.

https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/

I don't know a lot about python but https://pypi.org/project/backoff/ seems like it might be put to good use here.
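For example, something like this could wrap the fetch (a sketch only, assuming aiohttp and an invented exception class; I don't know whether it fits the integration's structure):

import aiohttp
import backoff

class SolcastBusy(Exception):
    # Raised when Solcast answers 429 ("too many requests")
    pass

@backoff.on_exception(backoff.expo, SolcastBusy, max_tries=5, jitter=backoff.full_jitter)
async def get_forecast(session: aiohttp.ClientSession, url: str) -> dict:
    # Retries with exponential back-off plus jitter whenever a 429 is seen
    async with session.get(url) as resp:
        if resp.status == 429:
            raise SolcastBusy()
        resp.raise_for_status()
        return await resp.json()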

Righto.

Re. commit: 08b3b33

The integration will now try up to five times to fetch forecast data. Between each try is a delay of (30 seconds * try count), plus a random number of seconds between zero and 30.

The number of API calls is also monitored, and if the limit is reached it will not attempt to gather the forecast.
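In outline, that behaviour is roughly the following (a simplified sketch, not the committed code; fetch here stands in for the real API call and returns None on a 429):

import asyncio
import logging
import random

_LOGGER = logging.getLogger(__name__)

async def fetch_forecast_with_retries(fetch, api_used: int, api_limit: int, max_tries: int = 5):
    if api_used >= api_limit:
        # Quota spent: fail gracefully rather than hammer a server that will 429 us
        _LOGGER.warning("API limit exceeded, not getting forecast")
        return None
    for try_count in range(1, max_tries + 1):
        data = await fetch()
        if data is not None:
            return data
        if try_count < max_tries:
            delay = 30 * try_count + random.randrange(0, 31)  # 30s * try count, plus 0-30s
            _LOGGER.debug("Solcast API is busy, pausing %d seconds before retry", delay)
            await asyncio.sleep(delay)
    return None  # all tries exhausted; skip until the next scheduled update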

This is quite untested code, so I have changed my automation to gather on the hour, which should result in some 429s being forthcoming.

If others want to test in a similar way to increase our collective chances then by all means do so. Save a backup copy of solcastapi.py and __init__.py.

I have logging configured in configuration.yaml like this so I can see everything that's going on, and am monitoring home-assistant.log.

logger:
  default: info
  logs:
    custom_components.solcast_solar: debug

@BJReplay, we should definitely not merge #27 until some testing checks out.

Ok, I will grab and run as well.

I have also changed the API attempts per day in my automation to six instead of five for my dual-array setup, which will definitely cause the API limit to be busted at some point.

So, I assume to test I:

  1. Grab __init__.py, solcastapi.py and sensor.py from this branch and copy them into my custom_components/solcast_solar directory
  2. Reconfigure Solcast limits as appropriate (I'm an early adopter with a 50 limit, and another service that runs hourly at 11 minutes past the hour from sunrise to sunset, so it makes around 11 calls a day at the moment)
  3. Reconfigure my automation to run on the hour every hour
  4. Add logging to configuration yaml
  5. Restart HA
  6. Test :)

On restart...

homeassistant  | 2024-06-15 15:17:13.479 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
homeassistant  | 2024-06-15 15:17:13.479 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
homeassistant  | 2024-06-15 15:17:13.479 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 3
homeassistant  | 2024-06-15 15:17:18.480 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Timeout gathering rooftop sites data, last call result: 429, using cached data if it exists
homeassistant  | 2024-06-15 15:17:18.480 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data Solcast.com http status Error 404 - Gathering rooftop sites data
homeassistant  | 2024-06-15 15:17:18.482 ERROR (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data Exception error: Traceback (most recent call last):
homeassistant  |   File "/config/custom_components/solcast_solar/solcastapi.py", line 177, in sites_data
homeassistant  |     raise Exception(f"SOLCAST - HTTP sites_data error: Solcast Error gathering rooftop sites data")
homeassistant  | Exception: SOLCAST - HTTP sites_data error: Solcast Error gathering rooftop sites data
homeassistant  |
homeassistant  | 2024-06-15 15:17:18.482 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - getting API limit and usage from solcast
homeassistant  | 2024-06-15 15:17:18.524 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing usage cache
homeassistant  | 2024-06-15 15:17:18.527 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API counter is 11/50
homeassistant  | 2024-06-15 15:17:18.527 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - load_saved_data site count is zero!
homeassistant  | 2024-06-15 15:17:18.527 INFO (MainThread) [custom_components.solcast_solar] Solcast Integration version number: v4.0.27
homeassistant  | 2024-06-15 15:17:18.527 DEBUG (MainThread) [custom_components.solcast_solar.coordinator] Finished fetching solcast_solar data in 0.000 seconds (success: True)
homeassistant  | 2024-06-15 15:17:18.537 INFO (MainThread) [homeassistant.components.sensor] Setting up solcast_solar.sensor
homeassistant  | 2024-06-15 15:17:18.547 INFO (MainThread) [homeassistant.components.select] Setting up solcast_solar.select
homeassistant  | 2024-06-15 15:17:18.547 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Solcast API data UTC times are converted to Australia/Sydney
homeassistant  | 2024-06-15 15:17:18.550 INFO (MainThread) [homeassistant.bootstrap] Home Assistant initialized in 26.65s

Ahh, I need to set up caching:

homeassistant | 2024-06-15 15:30:36.537 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migrating from version 7

homeassistant  | 2024-06-15 15:30:36.538 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migration to version 7 successful
homeassistant  | 2024-06-15 15:30:36.541 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST apiCacheEnabled=False, solcast-sites.json=False
homeassistant  | 2024-06-15 15:30:36.541 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - connecting to - https://api.solcast.com.au/rooftop_sites?format=json&api_key=REDACTED
homeassistant  | 2024-06-15 15:30:37.051 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
homeassistant  | 2024-06-15 15:30:37.051 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
homeassistant  | 2024-06-15 15:30:37.051 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 1
homeassistant  | 2024-06-15 15:30:42.064 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
homeassistant  | 2024-06-15 15:30:42.065 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
homeassistant  | 2024-06-15 15:30:42.065 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 2
homeassistant  | 2024-06-15 15:30:47.080 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
homeassistant  | 2024-06-15 15:30:47.080 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
homeassistant  | 2024-06-15 15:30:47.080 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 3
homeassistant  | 2024-06-15 15:30:52.082 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Timeout gathering rooftop sites data, last call result: 429, using cached data if it exists
homeassistant  | 2024-06-15 15:30:52.083 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data Solcast.com http status Error 404 - Gathering rooftop sites data
homeassistant  | 2024-06-15 15:30:52.087 ERROR (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data Exception error: Traceback (most recent call last):
homeassistant  |   File "/config/custom_components/solcast_solar/solcastapi.py", line 177, in sites_data
homeassistant  |     raise Exception(f"SOLCAST - HTTP sites_data error: Solcast Error gathering rooftop sites data")
homeassistant  | Exception: SOLCAST - HTTP sites_data error: Solcast Error gathering rooftop sites data
homeassistant  |
homeassistant  | 2024-06-15 15:30:52.087 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - getting API limit and usage from solcast
homeassistant  | 2024-06-15 15:30:52.099 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET GetUserUsageAllowance, retry 1
homeassistant  | 2024-06-15 15:30:57.100 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET GetUserUsageAllowance, retry 2
homeassistant  | 2024-06-15 15:31:02.102 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET GetUserUsageAllowance, retry 3
homeassistant  | 2024-06-15 15:31:07.103 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Timeout getting usage allowance, last call result: 429, using cached data if it exists
homeassistant  | 2024-06-15 15:31:07.104 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - loading cached usage
homeassistant  | 2024-06-15 15:31:07.110 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API counter is 11/50
homeassistant  | 2024-06-15 15:31:07.111 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - load_saved_data site count is zero!
homeassistant  | 2024-06-15 15:31:07.111 INFO (MainThread) [custom_components.solcast_solar] Solcast Integration version number: v4.0.27
homeassistant  | 2024-06-15 15:31:07.111 DEBUG (MainThread) [custom_components.solcast_solar.coordinator] Finished fetching solcast_solar data in 0.000 seconds (success: True)
homeassistant  | 2024-06-15 15:31:07.119 INFO (MainThread) [homeassistant.components.sensor] Setting up solcast_solar.sensor
homeassistant  | 2024-06-15 15:31:07.129 INFO (MainThread) [homeassistant.components.select] Setting up solcast_solar.select
homeassistant  | 2024-06-15 15:31:07.130 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Solcast API data UTC times are converted to Australia/Sydney

Nonetheless, it seems to be working well. (I restarted as the bremor/bureau_of_meteorology integration had a fix for the blocking warnings, so I downloaded that.)

We have a bug, Houston:


Excuse the long screenshot, but it shows the normal pattern of no polling overnight, and hourly polling during the day.

sensor.solcast_pv_forecast_api_last_polled is set to a null unix datetime

Also, I'm not sure how to set up caching.

Hmmm, third restart lucky?

homeassistant  | 2024-06-15 15:56:14.305 INFO (MainThread) [homeassistant.setup] Setting up solcast_solar
homeassistant  | 2024-06-15 15:56:14.305 INFO (MainThread) [homeassistant.setup] Setup of domain solcast_solar took 0.00 seconds
homeassistant  | 2024-06-15 15:56:14.305 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migrating from version 7
homeassistant  | 2024-06-15 15:56:14.306 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migration to version 7 successful
homeassistant  | 2024-06-15 15:56:14.309 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST apiCacheEnabled=False, solcast-sites.json=False
homeassistant  | 2024-06-15 15:56:14.309 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - connecting to - https://api.solcast.com.au/rooftop_sites?format=json&api_key=REDACTED
homeassistant  | 2024-06-15 15:56:14.594 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'dict'>
homeassistant  | 2024-06-15 15:56:14.594 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 200
homeassistant  | 2024-06-15 15:56:14.594 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing sites data cache
homeassistant  | 2024-06-15 15:56:14.608 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data: {'sites': [{'name': 'redacated', 'resource_id': 'redacted', 'capacity': 5, 'capacity_dc': 4.85, 'longitude': redacted, 'latitude': redacted, 'azimuth': -123, 'tilt': 5, 'install_date': '2012-04-18T14:00:00.0000000Z', 'loss_factor': 0.9, 'tags': ['Home ', 'redacted']}], 'page_count': 1, 'current_page': 1, 'total_records': 1}
homeassistant  | 2024-06-15 15:56:14.608 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - getting API limit and usage from solcast
homeassistant  | 2024-06-15 15:56:14.654 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing usage cache
homeassistant  | 2024-06-15 15:56:14.658 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API counter is 12/50
homeassistant  | 2024-06-15 15:56:14.668 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - load_saved_data file exists.. file type is <class 'dict'>
homeassistant  | 2024-06-15 15:56:14.682 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-15 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.698 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-16 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.704 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-17 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.705 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-18 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.707 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-19 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.712 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-20 contains all 48 records
homeassistant  | 2024-06-15 15:56:14.754 INFO (MainThread) [custom_components.solcast_solar] Solcast Integration version number: v4.0.27
homeassistant  | 2024-06-15 15:56:14.756 DEBUG (MainThread) [custom_components.solcast_solar.coordinator] Finished fetching solcast_solar data in 0.000 seconds (success: True)

The ugly exception is saying that the sites data could not be acquired, and that you did not have a cache created yet, @BJReplay. The sequence is a 429 failure for all three tries at the API, then a 404 to say the cache is not found. Honestly, it could be a little prettier.

Now that you have a solcast-sites.json file, that will not reoccur. However, for someone first setting up the integration, hitting that would be a head scratcher. Unlikely, but Murphy is not an optimist...

Maybe a readme note is advisable.

sensor.solcast_pv_forecast_api_last_polled is set to a null unix datetime

Did this go away after successful restart?

Did this go away after successful restart?

Yes, it did.

Seems to be working. Will monitor over the next day or so.

Honestly, it could be a little prettier.

I have added a further error message, which will be included in the next commit: SOLCAST - Solcast integration did not start correctly, as rooftop sites data is needed. Suggestion: Restart the integration

Further, if the cache is not available: SOLCAST - cached sites data is not yet available to cope with Solcast API being too busy - at least one successful API call is needed

A readme note is definitely advisable to explain what is happening should this be hit.

This makes me well pleased...

2024-06-15 17:00:28.059 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-15 17:00:28.059 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API polling for rooftop b68d-c05a-c2b3-2cf9
2024-06-15 17:00:28.059 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Polling API for rooftop_id b68d-c05a-c2b3-2cf9
2024-06-15 17:00:28.059 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code url - https://api.solcast.com.au/rooftop_sites/b68d-c05a-c2b3-2cf9/forecasts
2024-06-15 17:00:28.059 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-15 17:00:28.259 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 40 seconds before retry
2024-06-15 17:00:46.571 INFO (MainThread) [homeassistant.components.recorder.backup] Backup start notification, locking database for writes
2024-06-15 17:00:47.095 INFO (MainThread) [homeassistant.components.recorder.backup] Backup end notification, releasing write lock
2024-06-15 17:00:47.098 INFO (Recorder) [homeassistant.components.recorder.core] Database queue backlog reached 12 entries during backup
2024-06-15 17:01:08.261 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-15 17:01:08.352 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 67 seconds before retry
2024-06-15 17:02:15.354 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-15 17:02:15.848 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API returned data. API Counter incremented from 6 to 7
2024-06-15 17:02:15.855 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session returned data type is <class 'dict'>
2024-06-15 17:02:15.855 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session status is 200
2024-06-15 17:02:15.857 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data Returned: {'forecasts': [{'pv_estimate': 0, 'pv_estimate10': 0, 'pv_estimate90': 0, 'period_end':
 '2024-06-15T07:30:00.0000000Z', 'period': 'PT30M'}, {'pv_estimate': 0, 'pv_estimate10': 0, 'pv_estimate90': 0, 'period_end': '2024-06-15T08:00:00.0000000Z', 'period': 'PT30M'}, {'pv_estimate': 0, 'pv_estim

Exhausting the API usage quota does not make me well pleased. Much error. More work required...

A readme note is definitely advisable to explain what is happening should this be hit.

Agreed, but the logging that I've captured (and I assume you have as well) gives me a fair bit to work from to explain how it should work.

I added a note to the readme already.

And this makes me well pleased once quota runs out...

Will commit. Stand by.

2024-06-15 18:02:00.149 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-15 18:02:00.149 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API polling for rooftop b68d-c05a-c2b3-2cf9
2024-06-15 18:02:00.149 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Polling API for rooftop_id b68d-c05a-c2b3-2cf9
2024-06-15 18:02:00.149 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code url - https://api.solcast.com.au/rooftop_sites/b68d-c05a-c2b3-2cf9/forecasts
2024-06-15 18:02:00.149 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API limit exceeded, not getting forecast
2024-06-15 18:02:12.714 INFO (MainThread) [hass_nabucasa.google_report_state] Connected

Changes to coordinator.py are purely superficial.

Looking back on your logs, @BJReplay, it becomes super apparent how often 429s are hit. I can accept the on-the-hour, and on-the-half-hour 429s, but 17 minutes past the hour??? I thought this would be way harder to test, but catching a 429 seems to be easier than fishing in a barrel.

I'm calling this retry/caching code essential!

Folks, I reckon PR #27 is good for a healthy test.

  • Sites data GET is cached, with five second retries. If you get errors and this is not cached yet? Restart until it works...
  • API usage count/limit data is cached. If the GET fails, then I am still to contemplate what happens. I think the integration assumes zero. Probs. needs more contemplation and work.
  • Forecast GETs are retried five times with a back-off delay, plus an element of semi-randomness.
  • Forecast GETs where the API limit has been exhausted will fail gracefully with a warning

Looking forward to feedback before this is released.

This has been released by @BJReplay as 4.0.28. Check it out, and please report any issues whatsoever. Cheers.

I saw HACS was promoting 4.0.28 as the new release, with a well-justified shout-out to @autoSteve, so we're going to get a lot of in-the-field testing of it!

Renamed my sites file, installed 4.0.28, restarted; no messages at all in the HA log, but I see solcast-usage.json was created.

Restarted again at 13:00 to try to trigger the timeout errors, and got the following logged (screenshot attached).

Restarted again, and in the full log got a 429 on the sites data:

2024-06-15 13:11:21.854 INFO (MainThread) [homeassistant.setup] Setting up solcast_solar
2024-06-15 13:11:21.854 INFO (MainThread) [homeassistant.setup] Setup of domain solcast_solar took 0.00 seconds
2024-06-15 13:11:21.854 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migrating from version 7
2024-06-15 13:11:21.855 DEBUG (MainThread) [custom_components.solcast_solar] Solcast Migration to version 7 successful
2024-06-15 13:11:21.856 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST apiCacheEnabled=False, solcast-sites.json=True
2024-06-15 13:11:21.856 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - connecting to - https://api.solcast.com.au/rooftop_sites?format=json&api_key=REDACTED
2024-06-15 13:11:21.957 INFO (MainThread) [custom_components.tapo.hass_tapo] Detected model of 192.168.1.146: P110
2024-06-15 13:11:22.079 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
2024-06-15 13:11:22.080 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
2024-06-15 13:11:22.080 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 1
...
2024-06-15 13:11:28.430 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned data type is <class 'NoneType'>
2024-06-15 13:11:28.430 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data code http_session returned status 429
2024-06-15 13:11:28.430 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - will retry GET rooftop_sites, retry 2
...
2024-06-15 13:11:38.473 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Timeout gathering rooftop sites data, last call result: 429, using cached data if it exists
2024-06-15 13:11:38.473 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - loading cached sites data
2024-06-15 13:11:38.479 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - sites_data: {'sites': [{'name': '87 High Street (E)', 'resource_id': 'bc31-4d4c-e844-6a4e', 'capacity': 10, 'capacity_dc': 8.58, 'longitude': -0.163655, 'latitude': 52.114921, 'azimuth': -100, 'tilt': 37, 'install_date': '2023-01-07T00:00:00.0000000Z', 'loss_factor': 0.99, 'tags': ['UK', 'Bedfordshire', 'GivEnergy']}, {'name': '87 High Street (W)', 'resource_id': '23bb-0216-85c6-fd94', 'capacity': 9, 'capacity_dc': 6.34, 'longitude': -0.163704, 'latitude': 52.114912, 'azimuth': 80, 'tilt': 37, 'install_date': '2023-01-07T00:00:00.0000000Z', 'loss_factor': 0.98, 'tags': ['UK', ' Bedfordshire', ' GivEnergy']}], 'page_count': 1, 'current_page': 1, 'total_records': 2}
2024-06-15 13:11:38.480 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - getting API limit and usage from solcast
2024-06-15 13:11:38.810 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing usage cache
2024-06-15 13:11:38.829 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API counter is 6/10
2024-06-15 13:11:38.844 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - load_saved_data file exists.. file type is <class 'dict'>
2024-06-15 13:11:38.862 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-15 contains all 48 records
2024-06-15 13:11:38.863 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-16 contains all 48 records
2024-06-15 13:11:38.864 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-17 contains all 48 records
2024-06-15 13:11:38.864 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-18 contains all 48 records
2024-06-15 13:11:38.865 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-19 contains all 48 records
2024-06-15 13:11:38.867 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-20 contains all 48 records
2024-06-15 13:11:38.877 INFO (MainThread) [custom_components.solcast_solar] Solcast Integration version number: v4.0.28
2024-06-15 13:11:38.877 DEBUG (MainThread) [custom_components.solcast_solar.coordinator] Finished fetching solcast_solar data in 0.000 seconds (success: True)

So two retries for the sites file and then it used the cached version successfully.

I'll change my automation for the regular Solcast updates to run at 00:00 and add a few extra polls that will go over the API limit, but it's all looking great so far.

The two site downloads at 14:00:00 both worked successfully, code 200 first time!

Will have a look at the next scheduled poll.

I did see this message in the debug log for both site polls, which is a bit strange:
2024-06-15 14:00:06.311 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast returned 336 records (should be 168)

I did find one bug, I think: the solcast-usage.json file isn't being updated correctly.

In the debug log:

2024-06-15 14:00:05.777 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API returned data. API Counter incremented from 7 to 8
2024-06-15 14:00:06.305 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session returned data type is <class 'dict'>
2024-06-15 14:00:06.305 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session status is 200

The JSON file contains:
{"daily_limit": 10, "daily_limit_consumed": 6}

And it was only updated when HA was last restarted.

Good to know @gcoan. Technically the cache not updating is not a bug; it's by design. But it's a stupid design, I guess...

I'll fix it.

Thanks @autoSteve. I called it out because I assumed that the "run out of API calls" detection would be using this JSON file, and that it wouldn't be detecting when to stop polling if the file wasn't updated?

The file is only used on script restart, should the API call get a 429 repeatedly. So it definitely should update every time there is a successful API forecast call. That way, should a restart occur and a 429 be seen at the moment it's asking Solcast "how many calls have I got left?", it will know where it was up to without actually receiving an answer.
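In sketch form (the file name and JSON shape are as seen in the cache above; the function name is invented):

import json
from pathlib import Path

USAGE_CACHE = Path("/config/solcast-usage.json")

def record_usage(daily_limit: int, consumed: int) -> None:
    # Rewrite the cache after every successful forecast call, so a restart that
    # gets a 429 from GetUserUsageAllowance still knows the day's count
    USAGE_CACHE.write_text(json.dumps({"daily_limit": daily_limit, "daily_limit_consumed": consumed}))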

A couple of quick further updates:

I added extra Solcast polls tonight after my API calls were exhausted; they all worked as expected, logging that the call couldn't be made.

At 01:00:00 the next update run ran. I had deliberately moved it to the 00:00 time to check the retry logic, which worked flawlessly:


2024-06-16 01:00:00.509 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-16 01:00:00.510 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API polling for rooftop bc31-4d4c-e844-6a4e
2024-06-16 01:00:00.510 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Polling API for rooftop_id bc31-4d4c-e844-6a4e
2024-06-16 01:00:00.510 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code url - https://api.solcast.com.au/rooftop_sites/bc31-4d4c-e844-6a4e/forecasts
2024-06-16 01:00:00.510 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-16 01:00:00.579 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 31 seconds before retry
2024-06-16 01:00:31.583 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-16 01:00:31.637 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 85 seconds before retry
2024-06-16 01:01:56.639 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-16 01:01:56.702 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 99 seconds before retry
2024-06-16 01:03:35.703 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-16 01:03:36.691 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API returned data. API Counter incremented from 0 to 1
2024-06-16 01:03:37.214 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session returned data type is <class 'dict'>
2024-06-16 01:03:37.214 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session status is 200
2024-06-16 01:03:37.216 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data Returned: {'forecasts': [{'pv_estimate': 0, 'pv_estimate10': 0, 'pv_estimate90': 0, 'period_end': '2024-06-16T00:30:00.0000000Z', 

All good 👍

Good to know @gcoan. Technically the cache not updating is not a bug; it's by design. But it's a stupid design, I guess...

I'll fix it.

f73c4ad

I did see this message in the debug log for both site polls, which is a bit strange:
2024-06-15 14:00:06.311 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast returned 336 records (should be 168)

I've been noticing that, too. I've not delved into why, but things seem to work.

v4.0.29 released, which addresses the bugs you uncovered, @gcoan.

Interesting occurrence: issue #31. A restart after the 4.0.29 upgrade got log entries that say "you gotta restart", yet an issue report was raised... Hmmm. The integration was reporting exactly the right thing, and yet, could it do better? (The bloke subsequently restarted several times and all was good.)

So could it do better?

Probably.

I am investigating a back-off retry for the sites data, but then folks would probably report "Ummm. Gah.... My Solcast integration hasn't started as quickly as I'd want, and there are all these retry log entries..."

@autoSteve What you've added to Solcast (caching of sites and retrying solar forecasts) should fix a lot of the problems, but if there isn't any retry on the initial sites get then I guess we are going to occasionally get issues such as #31 until every Solcast integration user has a cached copy of the sites file.

I think it would be worthwhile to add the retry to the sites file. There is a risk that some might complain about the startup time but most people don't monitor individual integration startups and they're all asynchronous starts anyway.

Maybe in the logged message make it clear that it's a Solcast error.

e.g.

'Solcast busy, unable to retrieve sites data and no local cached copy, please reload the Solcast integration to retry'

I believe you can just reload the integration rather than having to do a full HA restart

Updated to V4.0.29, deleted the sites.json and solcast.json files and restarted HA and all sensors are zapped. Log shows the following:
solcast.log

Reverted to my copy of V4.0.23.1, deleted solcast related files and restarted HA and all sensors back again (2 roofs).

Also, I noticed the Solcast System Option "Enable polling for updates" was enabled after I upgraded to V4.0.29, but I can't be sure how/when it got enabled. Should this option be deleted going forward? I can raise an issue for it if need be.

Updated to V4.0.29, deleted the sites.json and solcast.json files and restarted HA and all sensors are zapped. Log shows the following:
solcast.log

Reverted to my copy of V4.0.23.1, deleted solcast related files and restarted HA and all sensors back again (2 roofs).

4.0.29 is working fine for me. I don't think your failure to get it working was anything to do with the new version; it was just a Solcast error.

Older versions of the integration make a call to Solcast on initial startup to get the sites data; if that fails (e.g. error 429, the Solcast site is busy) then they zap all the sensors and the integration fails. That's what I raised this issue about.

In 4.0.28 and 4.0.29 the logic has been changed: the integration now tries a few times to get the sites data, and if that fails then it uses the cached copy of the sites (now called solcast-sites.json). If you look at the logfile you will see the repeated failures, then the attempt to use the cached file, which also failed. If you had just reloaded the Solcast integration, chances are it would have worked that time.
When you reverted back to 4.0.23, you were just lucky that the Solcast server was able to provide your sites data.

Once the sites file has been successfully downloaded and cached you should never see this startup issue again.

v4.0.29 released, which addresses the bugs you uncovered, @gcoan.

Installed v4.0.29 yesterday morning, but somehow I had exhausted my 10 API calls so wasn't able to test it all day. This morning, with the API count reset, I could see the integration polling successfully (no 429 errors), and the solcast-usage.json file is being updated each time. Nice work @autoSteve 👍

I assume that restarting the integration/HA causes an API poll to occur on startup? This is I think why I went from 6 to 10 API calls yesterday morning. I should have been on 4 but think I must have had some extra test API calls in the schedule which used me up.

I assume that restarting the integration/HA causes an API poll to occur on startup?

That's not consistent with my testing. I have restarted countless times and the API count is not increased.

Yes, the integration does REST calls on startup, but these calls do not increment the API use count.

Yes, I appreciate I might have just been unlucky, then lucky, but I was in danger of running out of API calls (I only have 10), so thought I'd better bail out for today and make a report to HQ. I don't actually have solar PV yet, so I'm not dependent on it working (unlike Predbat users). I'll try again with V4.0.29 (or latest) when my API limit resets and let you know how I get on. I'm also not convinced that a reload is as clean as a restart, so I'll opt for the latter when I give it a go, and I'll repeat what I did earlier and delete all the solcast/sites files before I do a HA restart.
Keep up the good work folks, it's nice to see the enthusiasm to improve this integration.

v4.0.29 has an issue.

Suggest installing v4.0.30 sooner rather than later.

I assume that restarting the integration/HA causes an API poll to occur on startup?
That's not consistent with my testing. I have restarted countless times and the API count is not increased.

Yes, the integration does REST calls on startup, but these calls do not increment the API use count.

Strange. Can't explain how I went from having 2 to 4 API calls around 2am on Sunday morning and then skipped from 6 to 10 API calls in the morning.

Today's pattern is how I expect the API count to increase, using up by 10 API limit with calls at 1am, 6am, 10am, 2pm, 6:30pm (2 sites).

Anyway, never mind, it's in the past. I've installed v4.0.31 and it's working fine.

Bizarre @gcoan.

I had a super odd incident yesterday (today?) where I coincidentally restarted the integration at exactly the same time that the automation to update forecast fired.

Weird stuff happened: I saw three consecutive forecast gathers for my two arrays in the log, all one second apart. So I went from two calls for the day to eight in a blink.

Methinks: Is some kind of HA automation retry going on?

Only the gods know. 😂 It broke my head, and like you I moved on.

But it did get me thinking.

If this is a thing, then surely building a belt/braces into the integration might not be a bad thing. Logic like:

"I've been asked to update forecast. Fine. ... But now I am being asked to update forecast again so soon? Ah, yeah, nah, I'm not doing that as the request is implausible..."

Within five minutes is aggressive, but it would have saved me four precious calls... I'm thinking within ten minutes of the last successful update.

Makes sense, there's no point in making another forecast download call (and consuming an API call/two calls) so soon after the first one.

Arguably even 30 minutes would be reasonable. Those on 50 API call limits with one array would be calling at most every 30 minutes (24 Γ— 60 / 50 β‰ˆ 29 minutes per call), or hourly with two arrays. And to be honest I'd say they shouldn't be making so many calls anyway; that's part of why Solcast is overloaded. I would imagine there is some sort of periodic update to the forecast at Solcast's end.

Addendum: I just read that Solcast has a global system of satellites that track cloud coverage (which I didn't know), and the forecast is updated every 5-15 minutes: https://solcast.com/forecast-accuracy
So maybe 20 minutes?

I'll haggle ya down to fifteen, guv'na.

The night is but young, for me, so I'll take a look into it.

Deal !

You're on. Super experimental code, @gcoan.

72276b4

This has not had any testing at all. I am at 10/10 calls for today, given the earlier mishap (it was today 🀦🏻), so I cannot test until GMT 'flies' past here in many more hours. Those of us down here in Oz who do most dev at night, being from the future and all that, need to patiently wait until the globe gets with the programme.

My plan for testing: create an automation that fires solcast_solar.update_forecasts super frequently, or just call the service manually a few times. Observe the logs. If within fifteen minutes, I should get SOLCAST - not requesting forecast because time is within fifteen minutes of {get_last_updated_datetime()}, and if outside the fifteen-minute window, BAU.

If you don't want to waste calls, then wait until your next refresh time, and then fire it again within fifteen minutes.

You're on. Super experimental code, @gcoan.

72276b4

This has not had any testing at all.

@autoSteve It doesn't work; nice series of crashes, though:

2024-06-17 12:50:00.321 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 12:50:00.321 ERROR (MainThread) [homeassistant.components.automation.solcast_forecast_update] Solcast solar forecast update: Error executing script. Unexpected error for call_service at pos 1: name 'get_last_updated_datetime' is not defined
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 526, in _async_step
await getattr(self, handler)()
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 763, in _async_call_service_step
response_data = await self._async_run_long_action(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
return await long_task
^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2741, in async_call
response_data = await coro
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2784, in _execute_service
return await target(service_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/config/custom_components/solcast_solar/__init__.py", line 153, in handle_service_update_forecast
await coordinator.service_event_update()
File "/config/custom_components/solcast_solar/coordinator.py", line 67, in service_event_update
await self.solcast.http_data(dopast=False)
File "/config/custom_components/solcast_solar/solcastapi.py", line 589, in http_data
if get_last_updated_datetime() + timedelta(minutes=15) < dt.now(timezone.utc):
^^^^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'get_last_updated_datetime' is not defined. Did you mean: 'self.get_last_updated_datetime'?
2024-06-17 12:50:00.325 ERROR (MainThread) [homeassistant.components.automation.solcast_forecast_update] While executing automation automation.solcast_forecast_update
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/components/automation/__init__.py", line 723, in async_trigger
return await self.action_script.async_run(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 1769, in async_run
return await asyncio.shield(create_eager_task(run.async_run()))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 464, in async_run
await self._async_step(log_exceptions=False)
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 528, in _async_step
self._handle_exception(
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 558, in _handle_exception
raise exception
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 526, in _async_step
await getattr(self, handler)()
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 763, in _async_call_service_step
response_data = await self._async_run_long_action(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 726, in _async_run_long_action
return await long_task
^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2741, in async_call
response_data = await coro
^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/core.py", line 2784, in _execute_service
return await target(service_call)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/config/custom_components/solcast_solar/__init__.py", line 153, in handle_service_update_forecast
await coordinator.service_event_update()
File "/config/custom_components/solcast_solar/coordinator.py", line 67, in service_event_update
await self.solcast.http_data(dopast=False)
File "/config/custom_components/solcast_solar/solcastapi.py", line 589, in http_data
if get_last_updated_datetime() + timedelta(minutes=15) < dt.now(timezone.utc):
^^^^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'get_last_updated_datetime' is not defined. Did you mean: 'self.get_last_updated_datetime'?
2024-06-17 12:52:00.111 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 12:52:00.111 ERROR (MainThread) [homeassistant.components.automation.solcast_forecast_update] Solcast solar forecast update: Error executing script. Unexpected error for call_service at pos 1: name 'get_last_updated_datetime' is not defined
[...the same pair of NameError tracebacks repeated verbatim for the 12:52 call...]

@autoSteve I fixed the crash in solcastapi.py, changing the code to:

        if self.get_last_updated_datetime() + timedelta(minutes=15) < dt.now(timezone.utc):
            _LOGGER.warning(f"SOLCAST - not requesting forecast because time is within fifteen minutes of {self.get_last_updated_datetime()}")

(added "self." to each of the get_last_updated_datetime() calls)

Did a manual run; it decided not to update Solcast. Not sure that is correct, as I haven't updated (successfully) since 10:15.
image

2024-06-17 13:38:09.712 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 13:38:09.712 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 09:15:55.606974+00:00

Looking at the code, I see it gets the 'last_updated' from itself. When I restarted HA to put this new code in, was this zeroed, so the 'has 15 minutes elapsed' calculation isn't working?
Either code around this, or use sensor.solcast_pv_forecast_api_last_polled?

Belay that idea; the comparison isn't right somehow. It's 13:50, or 12:50 UTC. Last update was 10:15 (9:15 UTC). It appears to be checking against the correct last update time in UTC. I tried again and then left it, so HA has been running for 23 minutes, but the comparison still fails:

2024-06-17 13:50:50.550 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 13:50:50.550 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 09:15:55.606974+00:00
2024-06-17 13:59:58.227 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 13:59:58.228 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 09:15:55.606974+00:00

Much. need. focus.

I've just been working around an issue for version= that oziee got us into...

Epic failure logs. I should frame that as my biggest Python fail. self. is "duh"...

Yeah, something is odd. The comparison is to dt.now(timezone.utc), so what the? I might need to log both the last updated time and the comparison time.

Replace line 590 with this. What do you get?

            _LOGGER.warning(f"SOLCAST - not requesting forecast because time is within fifteen minutes of {self.get_last_updated_datetime()}, time now is {dt.now(timezone.utc)}")

("self." ... Gah.)

OMG. It must be late.

Line 589, change < to >...

        if self.get_last_updated_datetime() + timedelta(minutes=15) > dt.now(timezone.utc):

@autoSteve so why did the Python interpreter not understand what you meant? Too literal, these computers. <, >, it's all very similar really...

I changed the two lines as above and restarted HA; it got 3 successive failures reading the sites file, so it fell back to using the cache and started up OK.

Ran my automation to pull the forecast: it got a 429 error, waited 59 seconds, then worked fine this time, updating my forecast ;-)

Ran it again a few minutes later, it correctly decided it didn't want to run:

2024-06-17 20:02:16.315 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-21 contains all 48 records
2024-06-17 20:02:16.316 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Data for 2024-06-22 contains all 48 records
2024-06-17 20:04:17.383 WARNING (MainThread) [homeassistant.components.websocket_api.http.connection] [139689409521136] from 195.166.158.163 (Home Assistant/2024.5.1 (io.robbie.HomeAssistant; build:2024.688; iPadOS 17.5.1)): Disconnected: Did not receive auth message within 10 seconds
2024-06-17 20:05:38.856 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-17 20:05:38.896 WARNING (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - not requesting forecast because time is within fifteen minutes of 2024-06-17 19:02:16.296654+00:00, time now is 2024-06-17 19:05:38.896497+00:00

Couple of suggestions from seeing the above behaviours:

  • Suggest that the pause and retry in the initial load of sites file be increased. At the moment its only a 5 second delay between the 3 retries which could well not be enough to let any Solcast overloading to have passed. I know this will further extend the startup time of the Solcast integration but something around 30-60 seconds between retries seems more as if it will give Solcast a chance to sort itself out

  • In the logging of the 'not less than 15 minutes have passed', the last run time is in UTC. I suspect people won't realise this is the case and we'll get spurious issues raised that 'the time is wrong'. Convert to local time zone? (I hate the different date/time routines, far too many options and need to think carefully what time structure you are passing to them)

Just in case it helps:

  • Disabled my V4.0.23.1 install.
  • Cleared solcast json files from config directory.
  • Upgraded to V4.0.31.
  • Cleared logs.
  • Restarted HA.

Solcast started sweet as a nut.

  • All expected sensors populated.
  • System Option "Enable polling for updates stayed disabled.
  • Log shows
    2024-06-17 22:58:05.462 INFO (MainThread) [homeassistant.components.automation.solcast_pv_forecast] Initialized trigger Solcast PV Forecast
    2024-06-17 22:59:15.576 INFO (MainThread) [homeassistant.setup] Setting up solcast_solar
    2024-06-17 22:59:22.545 INFO (MainThread) [custom_components.solcast_solar] Solcast Integration version number: v4.0.31
    2024-06-17 22:59:22.565 INFO (MainThread) [homeassistant.components.sensor] Setting up solcast_solar.sensor
    2024-06-17 22:59:22.655 INFO (MainThread) [homeassistant.components.select] Setting up solcast_solar.select
    2024-06-17 22:59:22.656 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Solcast API data UTC times are converted to Europe/London

System Info:
Core 2024.6.3
Supervisor 2024.06.0
Operating System 12.3
Frontend 20240610.1

Happy days

last run time is in UTC

This should localise it.

_LOGGER.warning(f"SOLCAST - not requesting forecast because time is within fifteen minutes of {self.get_last_updated_datetime().astimezone(self._tz)}")

@autoSteve I messed about with the log messages a bit more to make them readable. Here's my final version of the code fragment, which works perfectly, thanks:

    async def http_data(self, dopast = False):
        """Request forecast data via the Solcast API."""
        if self.get_last_updated_datetime() + timedelta(minutes=15) > dt.now(timezone.utc):
            _LOGGER.warning(f"SOLCAST - not requesting forecast because time now ({dt.now(timezone.utc).astimezone(self._tz)}) is within fifteen minutes of last forecast update at {self.get_last_updated_datetime().astimezone(self._tz)}")
            return

I had made almost exactly the same change, @gcoan. I just dropped the 'time now', because that is already present on the log entry's timestamp.

My last forecast poll was very close to timing out... 366 seconds total, so very near the total async timeout of 480 seconds.

I have another idea for retries to further improve reliability.

Should the successive back-offs eventually fail, then possibly wait a further delay and try the back-off sequence again. I have no idea of the best way to implement that at the moment, because it is an HA automation that triggers Solcast collection.

I wonder whether somebody way smarter than me at automations could think about that with me.

The API Last Polled time is exposed as a diagnostic, but there is no diagnostic info about the last failure to poll. If the last failure datetime were exposed, then I think writing an automation to trigger a retry after, say, 20 minutes would be trivial. Randomness would not be required, because the prior attempt used randomness.

Logic like: if the last failed poll is more recent than the last successful poll, and twenty minutes have elapsed since that failure, then fire a call.
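
Expressed as a quick Python sketch (the function and parameter names are mine, not the integration's):

    from datetime import datetime, timedelta, timezone

    def should_retry(last_failure: datetime | None, last_success: datetime) -> bool:
        """True when the most recent poll failed and at least twenty
        minutes have passed since that failure."""
        if last_failure is None:
            return False  # never failed, nothing to retry
        now = datetime.now(timezone.utc)
        return last_failure > last_success and now - last_failure >= timedelta(minutes=20)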

Thoughts?

2024-06-19 13:57:00.500 INFO (MainThread) [custom_components.solcast_solar] SOLCAST - Service call: update_forecasts
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API polling for rooftop b68d-c05a-c2b3-2cf9
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Polling API for rooftop_id b68d-c05a-c2b3-2cf9
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code url - https://api.solcast.com.au/rooftop_sites/b68d-c05a-c2b3-2cf9/forecasts
2024-06-19 13:57:00.500 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 13:57:00.652 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 47 seconds before retry
2024-06-19 13:57:07.790 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 13:57:47.654 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 13:57:47.691 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 86 seconds before retry
2024-06-19 13:58:10.258 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 13:59:12.693 INFO (MainThread) [homeassistant.components.automation.state_of_charge_to_mqtt] State of charge to MQTT: Running automation actions
2024-06-19 13:59:12.693 INFO (MainThread) [homeassistant.components.automation.state_of_charge_to_mqtt] State of charge to MQTT: Executing step call service
2024-06-19 13:59:13.693 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 13:59:13.787 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 96 seconds before retry
2024-06-19 13:59:50.048 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 14:00:49.788 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 14:00:49.874 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Solcast API is busy, pausing 138 seconds before retry
2024-06-19 14:00:59.326 INFO (MainThread) [homeassistant.components.recorder.backup] Backup start notification, locking database for writes
2024-06-19 14:00:59.879 INFO (MainThread) [homeassistant.components.recorder.backup] Backup end notification, releasing write lock
2024-06-19 14:00:59.881 INFO (Recorder) [homeassistant.components.recorder.core] Database queue backlog reached 2 entries during backup
2024-06-19 14:01:12.854 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 14:02:57.602 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2024-06-19 14:03:07.875 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - Fetching forecast
2024-06-19 14:03:08.382 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - API returned data. API Counter incremented from 4 to 5
2024-06-19 14:03:08.382 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - writing usage cache
2024-06-19 14:03:08.405 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session returned data type is <class 'dict'>
2024-06-19 14:03:08.405 DEBUG (MainThread) [custom_components.solcast_solar.solcastapi] SOLCAST - fetch_data code http_session status is 200

Thoughts?

Trying not to over-think it (as I spin up Oracle Cloud instances while Ampere capacity becomes available again - woot); my inclination is to just skip a poll.

Here's why: other than on first startup of the integration, you've got a forecast; it might be seven days out of date, but it's there.

It might not be up to date, but it's there.

Now, in my use case, for my actual battery control, I'm looking for near-term forecasts to work out whether or not to fine-tune my battery charging behaviour as I approach the start of expensive power - but really, that's just (as my old boss used to say) pencil f_#%ing: I know what my load is going to be, I know what my insolation is going to be Β± cloud cover changes in the next hour or so, and any updated forecast is just going to change my already planned charging activity slightly. Whether or not I get a new forecast won't change the fact that the wind might blow in a slightly different direction and make that updated forecast wrong in any case - so if I don't get it, I still might make the wrong decision.

Which is a really long way of saying: Try a few times, and if you can't get a result, skip that cycle.

Which is what my PV control algo has been doing for the last few years (blindly, mind you - it always uses the latest available forecast, always tries to update the forecast, and doesn't know whether or not it got an updated one - it just re-requests a forecast regularly through the day, and always re-adjusts charging targets based on remaining forecast insolation, load and state of charge).

From a practical point of view, I absolutely agree.

From an "I'm getting a 429 error, so raise an issue" POV, I think it has merit...

All the logging of retries is at debug level, so it happens silently in the background.

I poll roughly once per hour (but not on the hour, to avoid 429s). Not every hour, as the evening is when regular updates matter most to me. I have an "Allowed API use" sensor for specific times of the day. Here is the template:

    - name: "Solcast API usage max allowed"
      state: >
        {% set hours = [[0,48],[300,6],[400,10],[500,12],[600,16],[700,20],[1200,24],[1500,32],[2000,40]] %}
        {% set t = states('sensor.time').replace(':','')|int %}
        {% set ns = namespace(level = 0) %}
        {% for h in hours if t >= h[0] %}
          {% set ns.level = h[1] %}
        {% endfor %}
        {{ ns.level }}

It returns the maximum number of API calls that should have been used up to the current time. In another automation, offset 30 minutes from my normal polling automation, I then opportunistically check whether I am allowed to do another poll (i.e. if APIs used < sensor.solcast_api_usage_max_allowed), and if so, I poll again (while making sure I stay within my allowed usage). This helps if polling soon after 429s is important.
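
For anyone who finds the template dense, here is the same threshold lookup in plain Python, mirroring the values above (max_allowed_calls is just an illustrative name):

    def max_allowed_calls(hhmm: int) -> int:
        """Return the API call budget allowed by the given time of day,
        where hhmm is the time as an integer (e.g. 1430 for 2:30pm)."""
        schedule = [(0, 48), (300, 6), (400, 10), (500, 12), (600, 16),
                    (700, 20), (1200, 24), (1500, 32), (2000, 40)]
        level = 0
        for start, allowed in schedule:  # same walk as the Jinja for-loop
            if hhmm >= start:
                level = allowed
        return level

    assert max_allowed_calls(1310) == 24  # e.g. at 13:10 the budget is 24 calls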

Yeah, thinking about it while I was just mowing our little strip of nature strip, I can see a change to the readme, which I'll add to your draft PR, along the lines of:

As noted below, we suggest you automate calling solcast_solar.update_forecasts regularly through the day. Internally, solcast_solar.update_forecasts will attempt several retries if it gets a 429 busy response from the Solcast servers. However, if, after a number of attempts, it is unable to retrieve a forecast, it will log this (which you will see if you enable debug logging) but will otherwise fail silently. You will still have a forecast for the period (if one had been retrieved earlier) rather than a gap - but it will not be as up to date as you might have hoped.

It is up to you to build a belt and braces approach if it is absolutely essential for you to retrieve updated forecasts on a regular basis, bearing in mind that the integration already caches the forecasts that it has retrieved and these forecasts - even if out of date - are available for display and analysis.

but will otherwise fail silently

It retries silently, but fails loudly. πŸ˜‚

but will otherwise fail silently

It retries silently, but fails loudly. πŸ˜‚

I think we should make it clear that sensor.solcast_pv_forecast_api_last_polled contains the last successful poll time, so if it fails to poll due to 429s then this sensor's time won't be updated.

I agree with @BJReplay's thoughts about there being limited criticality to having an absolutely up-to-date forecast anyway; at best it's just a forecast, and clouds and stuff happen, which means it's never going to be 100% accurate. I update the forecast at 6am, 10am, 1pm, 6pm and 11pm to give Predbat (my battery automation software) the best chance to include a more refined view for the start of the solar day/afternoon peak/what tomorrow's solar will be etc, but if I miss an update it's not the end of the world.
[As an aside, I find I almost never hit the PV 50% estimate; it's over-optimistic, and I usually achieve somewhere around 30% of the way between the PV 10% and PV 50% estimates. Having 3 different arrays on the east and west sides of the roof is probably a key factor in this; the forecast is thus a composite sum.]

But failing silently to do something that was requested doesn't feel like good practice to me, and I do feel we ought to log the failure in an accessible way so that those who feel so inclined can automate using the information.
We have sensor.solcast_pv_forecast_api_last_polled, which is the last successful poll time; how about adding two attributes: last_poll_successful (True/False to indicate whether the retrieve succeeded) and last_poll_time (the time of the actual last poll, regardless of whether it was successful)? A sketch of what I mean follows.
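
A minimal sketch of the bookkeeping, assuming invented field names (the integration would then expose these via the sensor's extra state attributes):

    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class PollStatus:
        """Track every poll attempt, not just the successful ones."""
        last_poll_time: datetime | None = None  # most recent attempt, pass or fail
        last_poll_successful: bool = False      # did that attempt retrieve data?

        def record(self, success: bool) -> None:
            self.last_poll_time = datetime.now(timezone.utc)
            self.last_poll_successful = success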

But failing silently to do something that was requested

Like I said, it fails loudly with an error, but retries at debug log level. Folks would have no idea that it had actually thoughtfully retried before the failure occurred unless debug logging is on. I could log the retries about Solcast being busy as warnings if you both feel that that's better.

Last Poll Time confuses me. We have one that's already called API Last Polled, and we can't rename it (it should be called something like Forecast Last Received), so another name for Last Poll Time, I think. API Last Attempted Poll???

The logic would then go something like: if the last poll was not successful, and the current time has passed the last attempted poll time plus some minutes, then trigger the poll.

Why not add another sensor to the integration that returns a status, e.g. last_result {OK, CACHE, ERROR}?
OK = got fresh data
CACHE = using the cached data
ERROR = no cache, or another API error

This way it isn't a silent fail, and it's clear when further (debug) investigation is needed…
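
Roughly, in integration terms (all names here illustrative):

    from enum import Enum

    class LastResult(Enum):
        """Possible values for the suggested last_result status sensor."""
        OK = "ok"        # fresh data retrieved from the Solcast API
        CACHE = "cache"  # the API call failed; serving previously cached data
        ERROR = "error"  # no cache available, or another API error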