Geometry
Closed this issue · 9 comments
The handling of the Geometry
object within the embeddings needs some work & some consistency. I've done a bit of tweaking in 3fcc3d8, but there are still problems. In particular:
- we need a way to instantiate the estimator without passing the data.
- we should be able to pass a dictionary of geometry properties rather than a geometry object
- each embedding algorithm seems to treat missing geometries differently. Some call
fit_geometry
, some use a manual default, and some fail (I've fixed some of this in 3fcc3d8)
@jakevdp how should the init() geometry function handle the **kwargs for the adjacency, affinity & laplacian methods? e.g
def init(self,
adjacency_method='auto', adjacency_ags = None,
affinity_method='auto', adjacency_ags = None,
laplacian_method='auto', adjacency_ags = None):
or
def init(self,
adjacency_method='auto', affinity_method='auto', laplacian_method='auto',
**kwargs):
Edit: we shouldn't pass a nbhd radius or affinity radius directly either since they will depend on the adjacency and affinity (for example if it's number of neighbors instead of radius).
In a similar vein with respect to the following situation:
- Initialize geometry object, pass data matrix
- call get_adjacency_matrix(distance_method, **kwargs)
- do something else
- call get_adjacency_matrix(distance_method, **kwargs) again
Since kwargs can be of an arbitrary length I don't think it makes sense to store these and then check and see if they are the same so that if someone calls get_adjacency_matrix() twice then it calculates twice -- even if they have the same arguments. We should therefore encourage users to use the attribute geom.adjacency_matrix to get the existing matrix and only call get_adjacency_matrix() only when they want to re-calculate it.
Same with affinity & Laplacian. This simplifies things I think.
My thought would be:
def init(self, adjacency_method='auto', adjacency_kwds=None,
affinity_method='auto', adjacency_kwds=None,
laplacian_method='auto', adjacency_kwds=None):
Regarding the computation of the matrix... how about we change get_adjacency_matrix
to compute_adjacency_matrix
, so that it's more explicit? Then we could use an adjacency_matrix
attribute (or perhaps a property) to access the already-computed value.
I agree. One more quick questions. If I outline it like this:
def __init__(self, adjacency_method = 'auto', adjacency_kwds=None,
affinity_method = 'auto', affinity_kwds=None,
laplacian_method = 'auto', laplacian_kwds=None):
[...]
def compute_adjacency_matrix(self, copy = False, **kwargs):
[...]
self.adjacency_matrix = compute_adjacency_matrix(self.X, self.adjacency_method,
self.adjacency_kwds, **kwargs)
will the self.compute_adjacency_matrix work as intended passing both the adjacency_kwds passed at initialization AND the **kwargs passed to self.compute_adjacency_matrix() itself?
No, I think you have a danger of getting a repeated argument there. I'd do something like this:
kwds = self.adjacency_kwds.copy()
kwds.update(kwargs)
then pass **kwds
to the function.
(in Python 3.5 you could do kwds = {**self.adjacency_kwds, **kwargs}
but we need backward compatibility for the time being 😄)
I figured as much. Thanks again Jake! Things are going well and look much cleaner. I should be able to push the pull request tonight.
Awesome! I'm working on the other parts of the refactor, and I think I'll have it done tonight as well. Tomorrow we can work on merging our two sets of changes.
@jakevdp just an FYI the following functions inside test_geometry:
test_laplacian_unknown_normalization
test_laplacian_create_A_sparse
test_equal_original
are testing affinity/laplacian methods.
re-factor done.