chembl/FPSim2

Similarity matrix IDs don't match .smi IDs

Nick-Mul opened this issue · 1 comment

Hello, I really like the package, but I recently started playing around with the similarity matrix, which I've converted into an edge list using networkx. However, the IDs in the edge list don't make sense; they seem to have been scrambled somewhere along the line. For example, two molecules that the edge list says are identical are clearly different when I look up their IDs in the SMILES file the .h5 file was generated from.

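Sorry for the slow reply. When the .h5 file is created, the fingerprints are sorted by the number of "on bits" in each molecule's fingerprint, so the row and column positions in the similarity matrix follow this sorted order, not the order of the input file. To get the IDs in the new order: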

ids = fpe.fps[:, 0]

This is explained in the docs: https://chembl.github.io/FPSim2/source/user_guide/sim_matrix.html
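A minimal sketch of the edge-list conversion, assuming the matrix is available as (row, column, value) triplets whose positions follow the sorted order of `fpe.fps`. The function name `matrix_to_edge_list` and the toy data are hypothetical; the point is only that each position is looked up in `ids` instead of being used directly as a molecule ID or a line number in the .smi file.

```python
from typing import Iterable, List, Sequence, Tuple


def matrix_to_edge_list(
    rows: Iterable[int],
    cols: Iterable[int],
    values: Iterable[float],
    ids: Sequence[int],
) -> List[Tuple[int, int, float]]:
    """Translate (row, col, value) matrix entries into (id_a, id_b, value) edges.

    Hypothetical helper: rows and cols are positions in the popcount-sorted
    fingerprint order, so they are mapped through `ids` (e.g. fpe.fps[:, 0]).
    """
    edges = []
    for r, c, v in zip(rows, cols, values):
        # The matrix is symmetric: keep each pair once and skip the diagonal.
        if r < c:
            edges.append((int(ids[r]), int(ids[c]), float(v)))
    return edges


# Toy data (hypothetical): sorted positions 0, 1, 2 belong to molecule IDs 7, 3, 5.
ids = [7, 3, 5]
rows, cols, vals = [0, 1, 2, 1], [1, 0, 1, 2], [0.8, 0.8, 0.6, 0.6]
print(matrix_to_edge_list(rows, cols, vals, ids))  # [(7, 3, 0.8), (3, 5, 0.6)]
```

With a SciPy sparse matrix, `coo = matrix.tocoo()` exposes `coo.row`, `coo.col` and `coo.data`, which can be passed in along with `ids = fpe.fps[:, 0]`; the resulting list can then be loaded with networkx's `Graph.add_weighted_edges_from`.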

Closing the issue, but feel free to reopen it if you have more questions.