How is the sequence identity calculated in an efficient manner?
Opened this issue · 1 comments
eric-jm-lang commented
Hello,
In your excellent paper, a key aspect is the sequence identity between the artificial sequences and any known natural sequences.
May I ask how this sequence identity is calculated efficiently? It seems to require screening entire databases for each sequence.
Many thanks in advance
jeffreyruffolo commented
These values are calculated using the MMseqs2 tool to find the closest matches between the generated sequences and the protein databases. We report the identity to the top database hit for each generated sequence.
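For readers who want to reproduce this kind of number, here is a minimal sketch of the post-processing step, not the exact pipeline used for the paper. It assumes the search was run with something like `mmseqs easy-search generated.fasta natural_db.fasta hits.m8 tmp` (file names are placeholders), that the result file uses the default tab-separated column order, and that "top hit" means the hit with the highest bit score. The function name `top_hit_identity` is hypothetical.

```python
import csv
import io

# Assumed default column order of MMseqs2 tab-separated (.m8) output.
M8_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]


def top_hit_identity(m8_lines):
    """Return {query_id: (target_id, identity)} for the best hit of each query.

    "Best" is taken to mean the highest bit score (an assumption).
    Queries with no hit in the file are simply absent from the result.
    """
    best = {}
    for row in csv.reader(m8_lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(M8_COLUMNS, row))
        bits = float(rec["bits"])
        query = rec["query"]
        if query not in best or bits > best[query][2]:
            best[query] = (rec["target"], float(rec["fident"]), bits)
    # Drop the bit score, keeping only the target and its identity.
    return {q: (target, ident) for q, (target, ident, _) in best.items()}


# Made-up example rows standing in for a real hits.m8 file.
example = io.StringIO(
    "gen_1\tnatA\t0.412\t120\t70\t1\t1\t120\t5\t124\t1e-20\t95\n"
    "gen_1\tnatB\t0.655\t118\t40\t0\t2\t119\t1\t118\t1e-40\t160\n"
    "gen_2\tnatC\t0.301\t90\t62\t2\t3\t92\t10\t99\t1e-08\t52\n"
)
result = top_hit_identity(example)
for query, (target, ident) in result.items():
    print(query, target, ident)
```

With a real result file, pass the open file handle instead of the `StringIO` object, e.g. `top_hit_identity(open("hits.m8"))`. Generated sequences that never appear in the output had no database hit at the search settings used.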