spotify/pythonflow

Nondeterministic string hashing in Python 3.3+

Cjen1 opened this issue · 1 comment

Cjen1 commented

I was running into some weird issues with incorrect results when caching, to a file, a function applied to a string.

This is because Python 3.3+ salts its hash function with a random per-process seed (for strings at least), so the same string hashes differently in each new interpreter.
Specifically:

> python -c "print(hash('asdf'))"
-8690208562067163084
> python -c "print(hash('asdf'))"
-4220296486527231708
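The same comparison can be reproduced from a single script. This is a minimal sketch, assuming Python 3.7+ (for `capture_output`); `hash_in_new_process` is a hypothetical helper that evaluates `hash()` in a fresh interpreter, either with a random salt or with a fixed `PYTHONHASHSEED`:

```python
import os
import subprocess
import sys


def hash_in_new_process(text, seed=None):
    """Hypothetical helper: return hash(text) computed by a fresh interpreter.

    If seed is None, PYTHONHASHSEED is removed from the child's environment,
    so the child picks a random salt. Otherwise the given seed is used.
    """
    env = dict(os.environ)
    if seed is None:
        env.pop("PYTHONHASHSEED", None)
    else:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)  # int() ignores the trailing newline


if __name__ == "__main__":
    # Random salt: the two values will almost certainly differ.
    print(hash_in_new_process("asdf"), hash_in_new_process("asdf"))
    # Fixed seed: identical on every run of the same Python version.
    print(hash_in_new_process("asdf", seed=1), hash_in_new_process("asdf", seed=1))
```

Within a single process `hash('asdf')` never changes; the mismatch only shows up between interpreters. The exact value for a given seed can also differ between Python versions, so it should not be treated as a stable identifier.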

A quick fix is to set the environment variable PYTHONHASHSEED=1 before starting the interpreter.
The 'proper' fix would be to replace the internal hash function with something deterministic across processes; however, I couldn't immediately see the right place to inject that.

> PYTHONHASHSEED=1 python -c "print(hash('asdf'))"
-5132432945605986887
> PYTHONHASHSEED=1 python -c "print(hash('asdf'))"
-5132432945605986887
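For the 'proper' fix, one option is to derive cache keys from a cryptographic digest instead of the built-in `hash()`. This is a minimal sketch using a hypothetical `stable_hash` helper built on `hashlib`; it is not part of pythonflow's API, and where pythonflow would call it is left open here:

```python
import hashlib


def stable_hash(text: str) -> int:
    """Hypothetical deterministic stand-in for hash() on strings.

    SHA-256 over the UTF-8 bytes gives the same result in every process
    and Python version, regardless of PYTHONHASHSEED. The first 8 bytes
    are read as a signed 64-bit integer to resemble the range of hash().
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


key = stable_hash("asdf")
print(key)  # same value on every run
```

If the key is used as a cache file name, `hashlib.sha256(...).hexdigest()` may be more convenient than a signed integer.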

Thanks for reporting. This should only be a problem if results are cached across different processes. Do you have a reproducible code snippet to illustrate the issue?