gem-pasteur/macsyfinder

[BUG] miscalculation of alignment coverage

Closed this issue · 8 comments

Hi, I encounter an error with DefenseFinder. I don't know if it's truly related to DefenseFinder or to MacSyFinder. So I cite the issue here.
issue#68

Hi Jérôme,

I don't know either if it is DefenseFinder or MacSyFinder.

Did you ever encounter this kind of error when using MacSyFinder? For instance when using TXSScan or CasFinder? I will investigate if you do. In that case, it would be nice if you can send us an example.

Best,

Sophie

Hi,
When I used the --preserve_raw option of DF, I could look at the res_hmm_extract file from MSF.
So now I think I figured it out. The problem is coming from the HMM of DF. So, if I'm right, MSF is not in cause here.

Hello, I reopen the issue here, since it comes from how macsyfinder compute the coverage of the profile when there are multiple profile within a .hmm file.

see mdmparis/defense-finder#68 (comment) (second part) to have an exemple of the problem.

The bug is likely here :

def get_profile_len(path: str) -> int:
"""
Parse the HMM profile to extract the length and the presence of GA bit threshold
:param path: The path to the hmm profile used to produce the hmm search output to analyse
:return: the length, presence of ga bit threshold
"""
with open(path) as file:
for line in file:
if line.startswith("LENG"):
length = int(line.split()[1])
break
return length

where the length of the profile is the first encountered in the file, and not associated with a given profile.

the problem is that you provide several profile in one file while msf expected one profle per file
This profile MUST have the same name as the name of the gene mentioned in the definition. For instance, a component named “GeneA” in the macsy-model would correspond by default to a HMM profile “GeneA.hmm” enclosed in the macsy-model package.

Thank @bneron.
I arrived to the same conclusion.

@jeanrjc here is the documentation for model profile: https://gensoft.pasteur.fr/docs/macsyfinder/2.1.1/modeler_guide/modeling.html#providing-hmm-profiles

ah thanks we missed this part of the doc, we'll update our profiles. Maybe there could be a check somewhere to prevent this error ? Like when pushing on macsydata, or when running macsyfinder ?

I add some controls in command macsydata check <path to package>
now I check if

  • profile file is not empty
  • there is only one profile per hmm file

This feature is available on master version and will be in the next release.

Consider to use macsydata init to create a package skeleton.
From msf v 2.1.4, this command also install a pre-push git hook that run macsydata check and prevent to publish bad package.
To use this version of msf pip install --upgrade macsyfinder[model]
for older version of package you can install manually the pre-push git hoot (see msf documentation)