500 Internal Server Error on the first day of the month
clowncracker opened this issue · 19 comments
Describe the issue
I have installed the integration via HACS. Starting today I've hit a weird error that I haven't seen before. I tried to configure it the way I want, then deleted it and installed it again with the bare-minimum default configuration. When it loads, it shows the following on the Integrations page:
Failed setup, will retry: 500, message='Internal Server Error', url=URL('https://api.pirateweather.net/forecast/xxxx/xxxx,xxxxunits=si&extend=hourly&version=2')
I tried uninstalling and reinstalling via HACS, tried creating a new API key, and tried the web URL with my latitude/longitude, but it does not work. When I made both numbers positive in the web URL, it showed me a location in China. It looks like the issue is with my location specifically.
Home Assistant version
2024.6.4
Integration version
1.5.2
Troubleshooting steps
- I have updated my Home Assistant installation to the latest version.
- I have updated the Pirate Weather Integration to the latest version.
- I have gone through the documentation before opening this issue.
- I have searched this repository and API Repository to see if the issue has already been reported.
- I have restarted my Home Assistant installation.
- I have queried the API in my browser to confirm the issue is not with the API.
- I have written an informative title.
I'm also seeing this behavior, though I'm on version 1.5.3. I rolled back to 1.5.2 and had the same issue on that version as well.
I was seeing this too, and was in the middle of writing an update to this issue saying as much. Then I reloaded HA (for the n’th time whilst encountering this) and it seems to be back now… So it looks like this may be fixed?
Like skymoo said: this morning I refreshed my HA dashboard and PirateWeather is returning data, so I'm no longer having the issue myself.
This is an API issue which occurs on the first of every month, and it seems to fix itself after a number of hours. (See #242 and #208) I've transferred this issue over to the API repository and will leave it open for @alexander0042 to look into and fix.
Shoot, I thought I fixed this last time, but clearly not! It’s a bug with the date time conversion, but really should have been fixed, so this is frustrating!
Regardless, this sort of downtime isn’t acceptable! Let’s keep this issue open and high priority until I get the test working again
> Shoot, I thought I fixed this last time, but clearly not! It’s a bug with the date time conversion, but really should have been fixed, so this is frustrating!
What I find weird is that even though the fix didn't stop the issue from popping up again, the API seemed able to recover from it on its own.
Yea, it has to do with the datetime conversion when going back to the start of the day for the high/low values. Shouldn't be difficult to fix, but irritating to get working correctly.
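For anyone following along, here is a minimal sketch of one way to find the start of the local day without touching the day or month fields by hand. It is not the API's actual code, and the exact cause of the bug isn't stated in this thread. The function name `start_of_local_day` and the fixed UTC offset are assumptions made for illustration; a real implementation would more likely use a named time zone.

```python
from datetime import datetime, timedelta, timezone


def start_of_local_day(utc_time: datetime, utc_offset_hours: float) -> datetime:
    """Return local midnight for the day containing utc_time (hypothetical helper)."""
    # Assumption: the location's offset from UTC is known and fixed.
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_time = utc_time.astimezone(local_tz)
    # Only the time-of-day fields are reset. The date itself comes from the
    # timezone conversion, so month and year rollovers need no special cases.
    return local_time.replace(hour=0, minute=0, second=0, microsecond=0)


# 02:00 UTC on July 1 is still the evening of June 30 at UTC-4.
request_time = datetime(2024, 7, 1, 2, 0, tzinfo=timezone.utc)
day_start = start_of_local_day(request_time, -4)
```

In this example the start of the day lands in the previous month, which is the kind of edge case that would only show up around the first of the month.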
Ah, that would explain why it would eventually fix itself after a period of time.
@alexander0042 I'm seeing an Internal Server Error for my location again. Seems to only affect locations in the HRRR domain atm.
Seeing it too. Nothing related to ingest, so I'm looking into other causes now.
Seems to be working again on my end now. I know there was some downtime yesterday evening around 6pm EDT and it sorted itself out shortly after I noticed which maybe happened here as well?
Yea, same root cause, and it was actually ingest. Every so often one of the forecast files doesn't download, so I end up with the wrong sized file. I thought I'd added checks to every script, but missed the 0-18h HRRR. Added it now and checked the others, so this particular glitch should hopefully be closed for good! Couple other thoughts:
- I'm updating the status page to query a different location. It's currently querying somewhere outside of HRRR (0,0), which means it misses things like this.
- Going to push out a 2.0.11 with a new fallback to GFS instead of 500 if there's anything wrong with HRRR
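As an illustration of the kind of download check described above, here is a minimal sketch. It is not the actual ingest script: the helper name `is_complete_download` is hypothetical, and it assumes the expected size is known in advance (for instance from a `Content-Length` header).

```python
import os


def is_complete_download(path: str, expected_bytes: int) -> bool:
    """Return True only if the file exists and has exactly the expected size."""
    try:
        return os.path.getsize(path) == expected_bytes
    except OSError:
        # A missing or unreadable file counts as an incomplete download.
        return False
```

An ingest step could call this before processing each file and skip or retry the ones that fail, rather than building a forecast from a truncated file.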
Good to know this should be fixed going forward. I'll leave this issue open since it pertains to an issue at the start of the month.
> - I'm updating the status page to query a different location. It's currently querying somewhere outside of HRRR (0,0), which means it misses things like this.
Would it make sense then to have the status page query multiple different locations? 1 in the NBM domain but not in the HRRR domain, 1 in the HRRR domain and 1 in the GFS domain
EDIT: Looking at the status page, it seems to show the development endpoint as being down since June 15. Maybe whatever location it's using is having issues? I think you said you use 0,0?
> - Going to push out a 2.0.11 with a new fallback to GFS instead of 500 if there's anything wrong with HRRR
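A rough sketch of the multi-location idea, assuming one probe point per model domain. Only the (0,0) point comes from this thread; the other coordinates, the `PROBE_LOCATIONS` table, and `check_domains` are hypothetical placeholders and would need checking against the real domain boundaries.

```python
from typing import Callable, Dict, Tuple

# Hypothetical probe points, one per model domain. Only (0, 0) is mentioned
# in this thread; the others are illustrative guesses, not verified bounds.
PROBE_LOCATIONS: Dict[str, Tuple[float, float]] = {
    "hrrr": (39.0, -98.0),      # central US, assumed to be inside HRRR
    "nbm_only": (60.0, -110.0),  # placeholder: should be in NBM but outside HRRR
    "gfs_only": (0.0, 0.0),      # the point the status page currently queries
}


def check_domains(fetch_status: Callable[[float, float], int]) -> Dict[str, bool]:
    """Map each domain name to True if its probe returned HTTP 200."""
    results = {}
    for name, (lat, lon) in PROBE_LOCATIONS.items():
        try:
            results[name] = fetch_status(lat, lon) == 200
        except Exception:
            # A network error is reported as "down" for that domain.
            results[name] = False
    return results
```

Passing the HTTP call in as `fetch_status` keeps the sketch independent of any particular HTTP client, and means a failure in one domain shows up even when the others are healthy.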
In this case wouldn't a better fallback be NBM and then use GFS if there are issues with NBM and HRRR? Or would that be too complicated and just falling back to GFS be easier?
Yea, ideally it falls back to HRRR/NBM, and in most cases it should, this is just a massive try catch around a bunch of code as a backup!
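To make the fallback idea concrete, here is a minimal sketch of trying sources in priority order and using the first one that works. This is not the code shipped in 2.0.11; `first_working_forecast` and the loader callables are hypothetical names used for illustration.

```python
from typing import Callable, Sequence, Tuple


def first_working_forecast(
    sources: Sequence[Tuple[str, Callable[[], dict]]]
) -> Tuple[str, dict]:
    """Try each (name, loader) pair in order and return the first success."""
    last_error = None
    for name, load in sources:
        try:
            return name, load()
        except Exception as err:
            # Remember the failure and move on to the next source.
            last_error = err
    # Only reached if every source failed; this is where a 500 would remain.
    raise RuntimeError("all forecast sources failed") from last_error
```

With the sources ordered HRRR, NBM, GFS, a broken HRRR file would result in an NBM or GFS forecast instead of an error response.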
Testing out the auto update approach, so 2.0.11 should propagate slowly later today!
@alexander0042 Since today is the last day of July just checking in to see if this has been fixed at all. Will the fallback solution added in 2.0.11 solve this issue in the short term while a long term fix is worked on?
I think it's back
Yup, just checked and the API is down currently. The workaround to exclude HRRR still works and the issue will sort itself out in a few hours.
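For reference, a small sketch of building a request URL with the HRRR workaround applied. The URL shape follows the one in the error message at the top of this issue. The `exclude=hrrr` query value is an assumption based on the comment above, so check the API documentation for the exact parameter; `forecast_url` is a hypothetical helper.

```python
from typing import Sequence
from urllib.parse import urlencode


def forecast_url(api_key: str, lat: float, lon: float, exclude: Sequence[str] = ()) -> str:
    """Build a forecast URL, optionally excluding data sources (hypothetical helper)."""
    params = {"units": "si", "extend": "hourly", "version": "2"}
    if exclude:
        # Assumption: excluded items are passed as a comma-separated list.
        params["exclude"] = ",".join(exclude)
    query = urlencode(params, safe=",")
    return f"https://api.pirateweather.net/forecast/{api_key}/{lat},{lon}?{query}"


url = forecast_url("YOUR_API_KEY", 45.0, -75.0, exclude=["hrrr"])
```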
I guess the fix in 2.0.11 didn't fix this issue. Also checked the dev endpoint which is running 2.1 and it's also down.
Just commenting that the API is back up and running again this morning. Will ping @alexander0042 to look into this issue to hopefully solve it by the time September rolls around.
With the release of V2.1 this should finally be fixed. Will close this for now, but we can always re-open it if the problem occurs again.