data-apis/python-record-api

We don't know where ufuncs are from!

saulshanabrook opened this issue · 6 comments

During tracing, it's helpful to know not only which methods on the ufunc class are called (__call__, reduce, etc.) but also which ufuncs themselves are used (add, multiply, etc.).

Currently, we present the results not as the product of those two features, but as their union. i.e. we show stats for the reduce method on the ufunc class, but we don't show how many times reduce was called on add vs. multiply. That's one "issue", but the other, more pressing one is that we don't know where ufuncs come from!

All we know is their names. Up until now, I had been assuming they are all defined in the numpy module. However, scipy, for example, has many that are not.
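For concreteness, here is a minimal snippet showing what we can see on a ufunc object (assuming a NumPy version where ufuncs don't carry a __module__ attribute, as discussed further down):

import numpy as np

# A ufunc only tells us its name, not where it was defined or imported from.
print(np.add.__name__)                       # "add"
print(getattr(np.add, "__module__", None))   # historically missing / None for ufuncs
print(type(np.add))                          # <class 'numpy.ufunc'>

# scipy.special defines ufuncs of the same type that are *not* numpy attributes:
# import scipy.special
# print(type(scipy.special.erf))             # also <class 'numpy.ufunc'>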

We should somehow figure out how to understand where they were defined, or what module they were imported from.

I guess to do this, we would have to do some kind of traversal of imported modules to understand where they are defined? This could also help with the related problem of recording which module exports a certain type, instead of which module it was defined in.
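As a rough, hedged sketch of that traversal idea (not something python-record-api does today): scan the modules that are already imported, see which of them expose the ufunc object under its own name, and prefer the shortest (most public) module path.

import sys
import numpy as np

def guess_exporting_module(obj, name=None):
    # Hypothetical helper: find already-imported modules that export `obj`
    # under its own name, preferring the shortest (most public) module path.
    name = name or getattr(obj, "__name__", None)
    if name is None:
        return None
    candidates = [
        mod_name
        for mod_name, module in list(sys.modules.items())
        if module is not None and getattr(module, name, None) is obj
    ]
    return min(candidates, key=len) if candidates else None

print(guess_exporting_module(np.add))   # "numpy" rather than an internal submodule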

Another issue, which I touched on briefly above, is that we don't differentiate between the call signatures of different ufuncs.

The issue is really that we are trying to represent the calls to different ufunc types. So we want to say something like:

"When the ufunc name is "sin" we called "call" with these args".

But how do we write a type definition for that? How do we represent that in our current type hierarchy, where we talk about classes and methods?

i.e. if we look at the typing for ufuncs, it is similar to what we generate, but not separated by ufunc name: https://github.com/numpy/numpy-stubs/pull/44/files#diff-542b8065c42915076a70d8a091c6f08c
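To make the question concrete, here is a purely hypothetical sketch (not the numpy-stubs approach) of what typing "separated by ufunc name" could look like, using one protocol per ufunc name instead of a single ufunc class:

# Hypothetical sketch only: one protocol per ufunc name, so "sin" and "add"
# can advertise different call signatures and methods.
from typing import Any, Protocol
import numpy as np

class SinUfunc(Protocol):
    def __call__(self, x: Any, /) -> Any: ...

class AddUfunc(Protocol):
    def __call__(self, x: Any, y: Any, /) -> Any: ...
    def reduce(self, a: Any, axis: int = ...) -> Any: ...

sin: SinUfunc = np.sin   # a type checker may or may not accept these as-is
add: AddUfunc = np.add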

We should somehow figure out how to understand where [ufuncs] were defined, or what module they were imported from.

This is tricky when ufunc objects don't have a __module__ attribute! The best solution we've got so far in Hypothesis is to just check known modules which might define it... this actually works pretty well, but it would be nice to have a more principled way to do it.
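For reference, a minimal sketch of that "check known modules" approach (the module list here is illustrative, not Hypothesis' actual list):

import importlib
import numpy as np

# Illustrative list only; the real set of ufunc-defining modules is longer.
KNOWN_UFUNC_MODULES = ("numpy", "scipy.special")

def module_for_ufunc(ufunc):
    # Return the first known module that exports this exact ufunc object.
    for mod_name in KNOWN_UFUNC_MODULES:
        try:
            module = importlib.import_module(mod_name)
        except ImportError:
            continue
        if getattr(module, ufunc.__name__, None) is ufunc:
            return mod_name
    return None

print(module_for_ufunc(np.add))   # "numpy"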

@Zac-HD Ha yeah, we will have to do something similar, though ideally in a way that isn't hard-coded.

Since we already have tracing, I think we can do this by looking at all import or getattr (on modules) bytecode executions to see when they return a ufunc; then we know where it came from.
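A simplified, hedged sketch of that idea, intercepting attribute access on a module object rather than hooking bytecode directly (which is what the real tracer would do):

import sys
import types
import numpy as np

class RecordingModule(types.ModuleType):
    # Hypothetical wrapper: log (module, attribute) whenever attribute access
    # on the wrapped module returns a ufunc.
    def __init__(self, wrapped, log):
        super().__init__(wrapped.__name__)
        self.__dict__["_wrapped"] = wrapped
        self.__dict__["_log"] = log

    def __getattr__(self, name):
        value = getattr(self._wrapped, name)
        if isinstance(value, np.ufunc):
            self._log.add((self._wrapped.__name__, name))
        return value

log = set()
sys.modules["numpy"] = RecordingModule(np, log)

import numpy        # picks up the wrapper already in sys.modules
numpy.add(1, 2)     # the attribute access is recorded, then the ufunc is called
print(log)          # {('numpy', 'add')}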

I am curious, what do you use the modules for?

I needed module names to write import statements for the Hypothesis Ghostwriter, which outputs the source code for a property-based test! Actual output example:

$ hypothesis write numpy.matmul
import hypothesis.extra.numpy as npst
import numpy
from hypothesis import given, strategies as st

@given(
    data=st.data(),
    shapes=npst.mutually_broadcastable_shapes(signature="(n?,k),(k,m?)->(n?,m?)"),
    types=st.sampled_from(numpy.matmul.types).filter(lambda sig: "O" not in sig),
)
def test_gufunc_matmul(data, shapes, types):
    input_shapes, expected_shape = shapes
    input_dtypes, expected_dtype = types.split("->")
    array_st = [npst.arrays(d, s) for d, s in zip(input_dtypes, input_shapes)]

    a, b = data.draw(st.tuples(*array_st))
    result = numpy.matmul(a, b)
    assert result.shape == expected_shape
    assert result.dtype.char == expected_dtype

@Zac-HD Wow, that is amazing!