python-jsonschema/check-jsonschema

OverflowError: mktime argument out of range

balihb opened this issue · 12 comments

balihb commented

Trying to use the following schema URL:

https://raw.githubusercontent.com/microsoft/AI/master/.ci/listing_config_schema.json

Gives the following error on the 2nd run:

Error: Unexpected Error building schema validator
OverflowError: mktime argument out of range
  in "C:\Users\balihb\.cache\pre-commit\repos2k4wtzx\py_env-python3.11\Lib\site-packages\check_jsonschema\checker.py", line 52
  >>> return self._schema_loader.get_validator(

Thanks for the bug report!

I wasn't able to reproduce this. In fact, I'm not able to use that schema at all, as it does not appear to be a valid JSON Schema. Checking it with --check-metaschema fails with a long list of errors.

Maybe the schema has changed between the time of your report and my test? I don't have a lot of bandwidth to look into this right now, but if the schema is valid, that would mean something is wrong with the validation currently done under --check-metaschema.

balihb commented

I was trying this on Windows. Might it be a problem with platform support?

It shouldn't be, but also it shouldn't fail at all so I'm remaining open-minded. 😉

Could you post the results of running one of the failing commands with --traceback-mode full?
My current suspicion is that something is amiss with the handling of Last-Modified times for file downloads, but it's not something I've seen before, so I need a bit more info to track it down.

balihb commented

@balihb I have seen OverflowError pop up with datetime operations on Windows when it's a 32-bit version of Python. Would you run the following to help identify whether this is 32-bit or 64-bit Python?

python -c "import sys; print(sys.version)"

That should help focus the troubleshooting effort, by either identifying -- or eliminating -- one possible cause for that OverflowError. Thanks!
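
If the version string is ambiguous, the pointer size gives a more direct answer. This is only a small sketch; `python_bitness` is a hypothetical helper name, not something from check-jsonschema:

```python
import struct
import sys


def python_bitness() -> int:
    """Return the interpreter's pointer width in bits (typically 32 or 64)."""
    # struct.calcsize("P") is the size in bytes of a C pointer for this build.
    return struct.calcsize("P") * 8


print(sys.version)
print(f"{python_bitness()}-bit Python")
```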

balihb commented

Thanks @balihb, and have a great weekend!

balihb commented

OK, I have two solutions:

    def _lastmod_from_response(self, response: requests.Response) -> float:
        if "last-modified" in response.headers:
            return time.mktime(
                time.strptime(
                    response.headers.get("last-modified"),
                    self._LASTMOD_FMT,
                )
            )
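        else:
            # Take mktime() of a moment two days after the epoch, which stays
            # in range on every platform, then subtract the two days again.
            return time.mktime(time.gmtime(2*86400)) - 2*86400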

but I think this should be enough:

    def _lastmod_from_response(self, response: requests.Response) -> float:
        if "last-modified" in response.headers:
            return time.mktime(
                time.strptime(
                    response.headers.get("last-modified"),
                    self._LASTMOD_FMT,
                )
            )
        else:
            return 0

balihb commented

A similar issue I've found:
neo4j/neo4j-python-driver#302
Also, from the time docs on mktime:
"The earliest date for which it can generate a time is platform-dependent."
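
To see that platform dependence directly, a probe like the following can be run on the affected machine. It is a sketch under an assumption: that the failure comes from asking mktime for a local time at or before the epoch, which some platforms (Windows in particular) reject. `try_mktime_at_epoch` is a hypothetical name used only for illustration.

```python
import time


def try_mktime_at_epoch():
    """Return mktime() for 1970-01-01 00:00:00 read as local time, or None if rejected."""
    try:
        # gmtime(0) yields the epoch as a struct_time; mktime() then interprets
        # those fields as *local* time, so the result can fall before the epoch
        # in timezones ahead of UTC.
        return time.mktime(time.gmtime(0))
    except OverflowError:
        # Raised when the platform's C mktime() cannot represent the value.
        return None


result = try_mktime_at_epoch()
print("not representable on this platform" if result is None else result)
```

On platforms that accept pre-epoch values this prints a number; where they are rejected it prints the fallback message instead of raising.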

Thanks for running this issue down and sharing your research!

@kurtmckee and I chatted about this a little bit, and I'm going to take a slightly different approach to fixing this, using try-except to handle a few different cases.

I'll post a PR shortly, so you'll be able to see the fix with tests, but the additional scenarios we wanted to handle include:

  • Last-Modified was present but malformed (unparseable)
  • Last-Modified was present with a value which triggers the OverflowError
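
For readers following along, the try-except idea could look roughly like this. It is a minimal sketch, not the code from the PR: `lastmod_from_headers`, `LASTMOD_FMT`, and `LASTMOD_DEFAULT` are hypothetical names, the format string is an assumed HTTP-date layout, and a plain mapping stands in for `response.headers`.

```python
import time
from typing import Mapping

# Assumed HTTP-date layout; the real _LASTMOD_FMT value is not shown in this thread.
LASTMOD_FMT = "%a, %d %b %Y %H:%M:%S %Z"
# Hypothetical fallback used whenever no usable timestamp is available.
LASTMOD_DEFAULT = 0.0


def lastmod_from_headers(headers: Mapping[str, str]) -> float:
    """Parse Last-Modified into a timestamp, falling back to a default on any failure."""
    try:
        return time.mktime(time.strptime(headers["last-modified"], LASTMOD_FMT))
    except KeyError:
        # Header is missing entirely.
        return LASTMOD_DEFAULT
    except ValueError:
        # Header is present but malformed (strptime could not parse it).
        return LASTMOD_DEFAULT
    except OverflowError:
        # Header parsed, but mktime() cannot represent the value on this platform.
        return LASTMOD_DEFAULT


print(lastmod_from_headers({}))
print(lastmod_from_headers({"last-modified": "not a date"}))
```

Note that a plain dict lookup is case-sensitive, whereas the headers object in requests is case-insensitive, so the lowercase key here is purely for the demo.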

@balihb, I'll make sure to credit you for your work in the changelog! 👍

This should be fixed in v0.23.2, which I just released.

Please let me know if you still see this problem or run into other issues!