Dependency-free, Cython-compatible scoring matrices to use with biological sequences.
Scoring matrices are used to score the matches and mismatches between two characters at the same position in a sequence alignment. Some of these matrices are derived from substitution matrices, which use evolutionary modeling.
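As a rough illustration of the idea, the sketch below scores an ungapped alignment with a toy nucleotide matrix. The alphabet, the +1/-1 weights and the `score_alignment` helper are assumptions made up for this example; they are not part of the package.

```python
# Toy nucleotide scoring matrix (illustrative values only):
# +1 for a match, -1 for a mismatch.
ALPHABET = "ACGT"
TOY_MATRIX = [[1 if a == b else -1 for b in ALPHABET] for a in ALPHABET]


def score_alignment(seq1, seq2):
    """Sum the matrix weights over the aligned positions of two sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    total = 0
    for a, b in zip(seq1, seq2):
        # Each character is mapped to its row/column index in the matrix.
        total += TOY_MATRIX[ALPHABET.index(a)][ALPHABET.index(b)]
    return total


print(score_alignment("ACGT", "ACGA"))  # 3 matches and 1 mismatch: prints 2
```

Real matrices such as BLOSUM62 follow the same lookup pattern but use weights that differ for each pair of residues.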
The scoring-matrices package is a dependency-free, batteries-included library to handle and distribute common substitution matrices:
- no external dependencies: The matrices are distributed as-is: you don't need the whole Biopython ecosystem, or even NumPy.
- Cython compatibility: The ScoringMatrix is a Cython class that can be inherited, and the matrix data can be accessed as either a raw pointer or a typed memoryview.
- most common matrices: The package distributes the most common matrices, such as those used by the NCBI BLAST+ suite.
scoring-matrices can be installed directly from PyPI,
which hosts some pre-built wheels for the x86-64 architecture (Linux/OSX/Windows)
and the Aarch64 architecture (Linux/OSX), as well as the code required to
compile from source with Cython:
$ pip install scoring-matrices
Otherwise, scoring-matrices is also available as a Bioconda package:
$ conda install bioconda::scoring-matrices
- Import the ScoringMatrix class from the installed module:
from scoring_matrices import ScoringMatrix
- Load one of the built-in matrices:
blosum62 = ScoringMatrix.from_name("BLOSUM62")
- Get individual matrix weights either by index or by alphabet letter:
x = blosum62[0, 0]
y = blosum62['A', 'A']
- Get a row of the matrix either by index or by alphabet letter:
row_x = blosum62[0]
row_y = blosum62['A']
- Access the matrix weights as raw pointers to constant data:
from scoring_matrices cimport ScoringMatrix
cdef ScoringMatrix blosum = ScoringMatrix.from_name("BLOSUM62")
cdef const float* data = blosum.data_ptr()  # dense array
cdef const float** matrix = blosum.matrix_ptr()  # array of pointers
- Access the ScoringMatrix weights as a typed memoryview using the buffer protocol in more recent versions of Python:
from scoring_matrices cimport ScoringMatrix
cdef ScoringMatrix blosum = ScoringMatrix.from_name("BLOSUM62")
cdef const float[:, :] weights = blosum
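The Python steps above can be combined into a small helper. This is a minimal sketch: `ungapped_score` and `main` are hypothetical names introduced here, the two peptide strings are arbitrary examples, and only the `ScoringMatrix.from_name` constructor and the letter-pair indexing shown above are used from the package.

```python
def ungapped_score(matrix, query, target):
    """Sum the pairwise weights of two equal-length sequences.

    `matrix` is any object that supports `matrix[a, b]` lookups by letter,
    such as a ScoringMatrix.
    """
    if len(query) != len(target):
        raise ValueError("sequences must have the same length")
    return sum(matrix[q, t] for q, t in zip(query, target))


def main():
    # Imported lazily so the helper above stays usable without the package.
    from scoring_matrices import ScoringMatrix

    blosum62 = ScoringMatrix.from_name("BLOSUM62")
    print(ungapped_score(blosum62, "HEAGAWGHEE", "PAWHEAEHEE"))
```

With the package installed, calling `main()` prints the BLOSUM62 score of the two example peptides aligned without gaps.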
Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.
Contributions are more than welcome! See CONTRIBUTING.md for more details.
This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.
This library is provided under the MIT License. Matrices were collected from the MMseqs2, Biopython and NCBI BLAST+ sources and are believed to be in the public domain.
This project was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller team.