inducer/islpy

Must not use Pybind's implicit object-identity `__hash__`

kaushikcfd opened this issue ยท 8 comments

To Reproduce

  1. Create a file: islpy_hash.py as:
import islpy as isl


a = isl.BasicSet("{[i,j]: 0<=i<=j<10}")

print(hash("hashing of islpy objects isn't Python compliant"))
print(hash(a))
print(hash(a))
  1. Call it as PYTHONHASHSEED=3 python islpy_hash.py.

Observed behavior

(py311_env) [line@line ~]$ PYTHONHASHSEED=3 python islpy_hash.py 
-5820926828285319218
8778101896947
8778101896947
(py311_env) [line@line ~]$ PYTHONHASHSEED=3 python islpy_hash.py 
-5820926828285319218
8727734557427
8727734557427
(py311_env) [line@line ~]$ PYTHONHASHSEED=3 python islpy_hash.py 
-5820926828285319218
8787476382451
8787476382451

Expected behavior

The message printed to stdout must be the same across interpreter runs.

Does __hash__ have documented cross-run semantics?

FWIW, isl_set_get_hash returns deterministic results as its seed is a constant.

But then something's weird. set_get_hash is a direct wrapper of isl_set_get_hash:

        uint32_t  set_get_hash(set const &arg_self)
        {
          isl_ctx *islpy_ctx = nullptr;

                    if (!arg_self.is_valid())
                      throw isl::error(
                        "passed invalid arg to isl_set_get_hash for self");
                    

                        islpy_ctx = isl_set_get_ctx(arg_self.m_data);
                        
if (islpy_ctx) isl_ctx_reset_error(islpy_ctx);
uint32_t result = isl_set_get_hash(arg_self.m_data);
return result;
        }

and

wrap_set.def("__hash__", isl::set_get_hash, "get_hash(self)\n\n:param self: :class:`Set`\n:return: uint32_t \n\n.. warning::\n\n    This function is not part of the officially public isl API. Use at your own risk.");

Aah I see. Set.__hash__ is consistent across runs (irrespective of PYTHONHASHSEED). However, BasicSet.__hash__ isn't. This is because ISL does not define isl_basic_set_get_hash and so isl.BasicSet.__hash__ is pybind11_builtins.pybind11_object.__hash__

Using pybind11_object.__hash__ is even more problematic as:

(py311_env) [line@line temp]$ cat understand_bset_hash.py 
import islpy as isl

a1 = isl.BasicSet("{[i]: 0<=i<512}")
a2 = isl.BasicSet("{[i]: 0<=i<512}")
print(hash(a1))
print(hash(a2))
print(a1 == a2)
print({a1: "hello", a2: "world"})

(py311_env) [line@line temp]$ python understand_bset_hash.py 
8766500567951
8766500487043
True
{BasicSet("{ [i] : 0 <= i <= 511 }"): 'hello', BasicSet("{ [i] : 0 <= i <= 511 }"): 'world'}

I wonder if we should use the upcasting logic for such types for which ISL does not define isl_xxx_get_hash.

See #103 for a first stab. I'm excited for all the ways in which this breaks loopy. ๐Ÿ™‚

Not sure if this is useful, but while looking at this issue, I noticed that isl does offer isl_basic_set_get_hash (see https://repo.or.cz/isl.git/blob/HEAD:/isl_map.c#l11315), but does not expose it in its headers, which is why islpy doesn't pick it up.

Manually adding it to the header and rebuilding islpy seems to make the two examples in this PR run as expected:

$ PYTHONHASHSEED=3 python islpy_hash.py
-2348628950087003433
1863046788
1863046788
$ PYTHONHASHSEED=3 python islpy_hash.py
-2348628950087003433
1863046788
1863046788
$ PYTHONHASHSEED=3 python islpy_hash.py
-2348628950087003433
1863046788
1863046788

$ PYTHONHASHSEED=3 python understand_bset_hash.py 
181834274
181834274
True
{BasicSet("{ [i] : 0 <= i <= 511 }"): 'world'}
$ PYTHONHASHSEED=3 python understand_bset_hash.py 
181834274
181834274
True
{BasicSet("{ [i] : 0 <= i <= 511 }"): 'world'}
$ PYTHONHASHSEED=3 python understand_bset_hash.py 
181834274
181834274
True
{BasicSet("{ [i] : 0 <= i <= 511 }"): 'world'}