🧬🔠 scoring-matrices

Dependency free, Cython-compatible scoring matrices to use with biological sequences.

🗺️ Overview

Scoring matrices are used to score the matches and mismatches between two characters at the same position in a sequence alignment. Some of these matrices are derived from substitution matrices, which use evolutionary modeling.
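To make the idea concrete, here is a minimal pure-Python sketch of a toy scoring matrix for DNA. It does not use this package: the names `ALPHABET`, `INDEX`, `MATRIX` and `pair_score` are hypothetical, and the +1/-1 weights are illustrative values rather than a published matrix.

```python
# Toy match/mismatch matrix for DNA (illustrative values, not a published matrix).
ALPHABET = "ACGT"

# Map each letter to its row/column index in the matrix.
INDEX = {letter: i for i, letter in enumerate(ALPHABET)}

# Square matrix: +1.0 on the diagonal (match), -1.0 elsewhere (mismatch).
MATRIX = [
    [1.0 if i == j else -1.0 for j in range(len(ALPHABET))]
    for i in range(len(ALPHABET))
]


def pair_score(a, b):
    """Return the score for aligning character `a` against character `b`."""
    return MATRIX[INDEX[a]][INDEX[b]]
```

Matrices such as BLOSUM62 follow the same layout, but over the amino-acid alphabet and with weights derived from observed substitutions.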

The scoring-matrices package is a dependency-free, batteries-included library to handle and distribute common substitution matrices:

  • no external dependencies: The matrices are distributed as-is: you don't need the whole Biopython ecosystem, or even NumPy.
  • Cython compatibility: The ScoringMatrix is a Cython class that can be inherited, and the matrix data can be accessed as either a raw pointer, or a typed memoryview.
  • most common matrices: The package distributes most common matrices, such as those used by the NCBI BLAST+ suite, including:
    • PAM matrices by Dayhoff et al. (1978).
    • BLOSUM matrices by Henikoff & Henikoff (1992).
    • VTML matrices by Müller et al. (2002).
    • BENNER matrices by Benner et al. (1994).

🔧 Installing

scoring-matrices can be installed directly from PyPI, which hosts some pre-built wheels for the x86-64 architecture (Linux/OSX/Windows) and the Aarch64 architecture (Linux/OSX), as well as the code required to compile from source with Cython:

$ pip install scoring-matrices

Otherwise, scoring-matrices is also available as a Bioconda package:

$ conda install bioconda::scoring-matrices

💡 Usage

Python

  • Import the ScoringMatrix class from the installed module:
    from scoring_matrices import ScoringMatrix
  • Load one of the built-in matrices:
    blosum62 = ScoringMatrix.from_name("BLOSUM62")
  • Get individual matrix weights either by index or by alphabet letter:
    x = blosum62[0, 0]
    y = blosum62['A', 'A']
  • Get a row of the matrix either by index or by alphabet letter:
    row_x = blosum62[0]
    row_y = blosum62['A']
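As a possible next step, the letter-based lookup shown above can be used to score two sequences that are already aligned without gaps. The following is a sketch, not part of the package: `ungapped_score` and `demo` are hypothetical helpers, the two peptide strings are arbitrary examples, and the code assumes only the `from_name` constructor and the `matrix[a, b]` lookup described above.

```python
def ungapped_score(matrix, query, target):
    """Sum the pairwise weights of two equal-length, pre-aligned sequences.

    `matrix` can be any object supporting `matrix[a, b]` lookups by letter,
    such as a ScoringMatrix loaded with `ScoringMatrix.from_name`.
    """
    if len(query) != len(target):
        raise ValueError("sequences must have the same length")
    return sum(matrix[a, b] for a, b in zip(query, target))


def demo():
    # Imported lazily so the helper above stays usable without the package.
    from scoring_matrices import ScoringMatrix

    blosum62 = ScoringMatrix.from_name("BLOSUM62")
    # Arbitrary example peptides of equal length.
    print(ungapped_score(blosum62, "HEAGAWGHEE", "PAWHEAEHEE"))
```

Calling `demo()` requires the package to be installed. A real aligner would also handle gaps, which this helper deliberately ignores.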

Cython

  • Access the matrix weights as raw pointers to constant data:
    from scoring_matrices cimport ScoringMatrix
    
    cdef ScoringMatrix blosum = ScoringMatrix.from_name("BLOSUM62")
    cdef const float*  data   = blosum.data_ptr()    # dense array
    cdef const float** matrix = blosum.matrix_ptr()  # array of pointers
  • Access the ScoringMatrix weights as a typed memoryview using the buffer protocol in more recent versions of Python:
    from scoring_matrices cimport ScoringMatrix
    
    cdef ScoringMatrix     blosum  = ScoringMatrix.from_name("BLOSUM62")
    cdef const float[:, :] weights = blosum

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the MIT License. Matrices were collected from the MMseqs2, Biopython and NCBI BLAST+ sources and are believed to be in the public domain.

This project was developed by Martin Larralde during his PhD project at the Leiden University Medical Center in the Zeller team.