PythonOT/POT

fgw barycenter often returns matrix with same entries

Closed this issue · 11 comments

Describe the bug

fgw_barycenters function often ( more than half of my tests) returns a matrix whose entries are the same. Is this considered to be the normal behavior ? In the case of the application on graphs, I always get a complete graph with self loops.

Screenshots

image

Code sample

n = 50
g1 = nx.erdos_renyi_graph(n,0.5)
g2 = nx.erdos_renyi_graph(n,0.5)

c1 = nx.adjacency_matrix(g1).toarray()
c2 = nx.adjacency_matrix(g2).toarray()

_,bary1 = fgw_barycenters( n , [[[1]]*n]*n , [c1,c2] , alpha = 0.5)

print(bary1)

Expected behavior

To have a non homogeneous matrix more often.

Environment (please complete the following information):

  • OS (e.g. MacOS, Windows, Linux):MacOs
  • Python version: Python 3.11.5
  • How was POT installed (source, pip, conda):pip

Output of the following code snippet:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import ot; print("POT", ot.__version__)

macOS-12.5.1-arm64-arm-64bit
Python 3.11.5 (main, Aug 24 2023, 15:09:32) [Clang 14.0.0 (clang-1400.0.29.202)]
NumPy 1.26.0
SciPy 1.11.3
POT 0.9.1

Additional context

I tried to test it with different values of alpha , even 1 which completely ignores the feature. I also tested with different features (not only all ones) and I still have the same issue. I suppose this has to do with the fact that it can converge to a local minima but I don't think it is supposed to happen this often.

I did not really know if this should have been reported as an issue or as a discussion post.

Hello @youssef62,
I believe it is exactly the same principle than in your previous opened issue #530 ...

Hello @youssef62, I believe it is exactly the same principle than in your previous opened issue #530 ...

I am not sure of that because i don't get integers as output this time.
I also still get the same results even after adding astype(np.float64) to c1 , c2 (even with float features also).

By default, I would advise to set input structure and feature matrices as np.array with float types. To be sure that you do not have degenerated intermediate transport plans because of the type. Then, indeed it can be a matter of poor local optimum because of poor initialization of the barycenter.

  • If you are purely interested in the structure matrices, I do not see the point of using the FGW solver especially without an adequate alpha. Results will clearly be sensitive to the alpha value c.f experiments in the original FGW paper.
  • I would advise to provide meaningful random initialization to init the barycenter structure. For instance, could you check how behaves the fgw loss when providing init_C whose entries are uniformly sampled in [0, 1] and then set symmetric ? And also to have a thoughtful evaluation of the function outputs by using log=True and computing the resulting (f)gw loss.

Thank you for your answer and tips.

  • I do care about features. I just omitted them to showcase the bug.
  • I modified the sample to fit your explanations and the same thing still happens.
    Here's the code:
n = 5
g1 = nx.erdos_renyi_graph(n,0.5)
g2 = nx.erdos_renyi_graph(n,0.5)

c1 = nx.adjacency_matrix(g1).toarray().astype(np.float64)
c2 = nx.adjacency_matrix(g2).toarray().astype(np.float64)


init_c = np.random.random((n,n)).astype(np.float64)

f = np.random.random((2,n,2)).astype(np.float64)

_,bary1,log = fgw_barycenters( n , f, np.array([c1,c2]) , alpha = .5 , init_C= init_c , symmetric=True , log=True)


print(bary1)
print(log)

the ouput is

[[0.36 0.36 0.36 0.36 0.36]
 [0.36 0.36 0.36 0.36 0.36]
 [0.36 0.36 0.36 0.36 0.36]
 [0.36 0.36 0.36 0.36 0.36]
 [0.36 0.36 0.36 0.36 0.36]]
{'err_feature': [1.8001614649348003, 0.0], 'err_structure': [1.7216815043238423, 0.0], 'Ts_iter': `...`

for 'Ts_iter' I have 3 matrices with 0.04 in all the entries , for the transport map 'T' it is also a matrix with only 0.04. For 'Ms' i get

Ms: [array([[0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
       [0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
       [0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
       [0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119],
       [0.04697823, 0.04457876, 0.03974605, 0.48802576, 0.12923119]]), array([[0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
       [0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
       [0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
       [0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472],
       [0.15718075, 0.1456845 , 0.0750637 , 0.1852994 , 0.24760472]])]

I don't think this is particularly useful to you.
Can you provide me with a sample where this function works as expected for you ( multiple times in preference ) ? So I can test it locally.
Thanks in advance

Hello @youssef62,

Thank you for your feedback.
After doing my own experiments, better default settings for these functions should clearly be considered to avoid poor local optimum.
i) we set by default the features of the FGW barycenter to zero making the first computations of FGW distances independent to the barycenter features. For your own experiments I advise you to set init_X different from 0, e.g matrix of ones or uniformly sampled in the range of obversed features. A good option used in papers, if affordable, would be to perform kmeans on concatenated input features with k being the number of nodes in the barycenter.
ii) 'warmstart=False' clearly seem to have a negative impact of the solver. I advise you to set 'warmstart=True'.
I will soon fix this in a PR.

Remark: you should not provide an asymmetric matrix init_C while setting 'symmetric=True'.

Best,
Cédric

I like the ability to use kmeans init, we should do thator at least provide this as possible init!

I ran some more tests and this issue comes up much less in version 0.8.2 than in 0.9.1 ( approximately half the of the time as exposed to always , using the code above ).

We did some debugging of optimization around 0.9 so it should work better (maybe it staus stuck in initilization place more because of this), @cedricvincentcuaz we need to look into this please (in addition to the new init).

Hello @youssef62 and @rflamary,

So building on the previous examples, I compared in many settings the GW and FGW barycenter solvers in both versions 0.82 and 0.91.
In all cases, there can be slight differences in solvers' results between both versions because of different rounding errors. This is because we removed some unnecessary computations in fgw and gw solvers after 0.9 that considerably increased speed. This is the only difference between both versions when compared with exactly the same hyper-parameters, affecting both gw and fgw barycenter solvers. And can randomly allows to avoid to be stuck at the 1st BCD iteration if a local optimum is provided at initialization.

potential Improvements:

  • In both versions, 'fgw_barycenter' overrides the initialization 'init_C' and 'init_X' within the first iteration of the BCD loop. By computing the closed-forms over the barycenter structure $C(T_0)$ and feature $F(T_0)$ using non-informative transport plans $T_0 = p.q_i^\top$. It makes the fgw solver using $T_0$ as init stuck at the first step (checked on adjacency matrices only). Plus, this is not consistent with the 'gromov_barycenters' solver that uses 'init_C' within the calls to the gw solvers at the first BCD iteration. So I suggest to change that in the 'fgw_barycenter' solver.
  • It is not a bug but I find that the log of (f)gw barycenters is a bit mis-leading. To not change the API, we should add the value of the objective function at each step of the BCD.

Thanks @cedricvincentcuaz , I think your analysis is spot on. The way it is now is that init_C is completely ignored because we update C with T and T is initialised independently from init_C.

In my tests , locally I updated T before C so the new T takes init_C into account. I put the following block :

if warmstartT:
            T = [fused_gromov_wasserstein(
                Ms[s], C, Cs[s], p, ps[s], loss_fun=loss_fun, alpha=alpha, armijo=armijo, symmetric=symmetric,
                G0=T[s], max_iter=max_iter, tol_rel=1e-5, tol_abs=0., verbose=verbose, **kwargs) for s in range(S)]
        else:
            T = [fused_gromov_wasserstein(
                Ms[s], C, Cs[s], p, ps[s], loss_fun=loss_fun, alpha=alpha, armijo=armijo, symmetric=symmetric,
                G0=None, max_iter=max_iter, tol_rel=1e-5, tol_abs=0., verbose=verbose, **kwargs) for s in range(S)]
       

Before this one :

if not fixed_structure:
            T_temp = [t.T for t in T]
            if loss_fun == 'square_loss':
                C = update_square_loss(p, lambdas, T_temp, Cs)

            elif loss_fun == 'kl_loss':
                C = update_kl_loss(p, lambdas, T_temp, Cs)

And then I did some experiments and the bug seems to be gone now.
I think it makes sense to do that for features too.

The PR has been merged, I'm closing the issue, thanks to you both.