Family 7 glycoside hydrolases (GH7s), also known as family 7 cellulases (Cel7s), are principal enzymes for cellulose degradation, both in nature and in industry. In this work, machine learning (ML) is applied to relate the amino acid sequence of GH7s to function by identifying the key sequence features that the ML algorithms rely on and that correlate with functional subtypes.
The strategies utilized in this work may be adapted to uncover sequence-function relationships in other protein families.
: Use hidden Markov models (HMM) to discriminate GH7 functional subtypes (CBH vs EG)
: Use supervised machine learning to discriminate GH7 functional subtypes
: Derive position-specific classification rules for discriminating GH7 functional subtypes
: Use supervised ML to predict the presence of carbohydrate-binding modules (CBM) in GH7s
: contains ad hoc functions for bioinformatic analysis
: for analyzing results and plotting the figures in the manuscript
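As a rough illustration of what a position-specific classification rule can look like, the sketch below searches an alignment for the single "residue at position" test that best separates two labeled subtypes. It is not the code in this repository: the function name `best_position_rule` and the short aligned fragments are hypothetical, made up for the example, and only the Python standard library is used.

```python
def best_position_rule(sequences, labels):
    """Find the one-position rule that best separates two classes.

    A rule has the form: "if the residue at `pos` is `residue`, predict
    `positive`, otherwise predict `negative`". Positions are 0-based
    alignment columns. Returns (accuracy, pos, residue, positive, negative).
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("exactly two classes are required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    best = None
    for pos in range(length):
        # Only residues actually observed in this column can form a rule.
        for residue in sorted({seq[pos] for seq in sequences}):
            for i, positive in enumerate(classes):
                negative = classes[1 - i]
                correct = sum(
                    (positive if seq[pos] == residue else negative) == label
                    for seq, label in zip(sequences, labels)
                )
                accuracy = correct / len(sequences)
                # Keep the first rule found at the highest accuracy.
                if best is None or accuracy > best[0]:
                    best = (accuracy, pos, residue, positive, negative)
    return best


# Hypothetical aligned fragments (NOT real GH7 sequences), three per subtype.
SEQUENCES = ["AWTCG", "AWSCG", "GWTCA", "AYTCG", "GYSCA", "AYTCA"]
LABELS = ["CBH", "CBH", "CBH", "EG", "EG", "EG"]

if __name__ == "__main__":
    accuracy, pos, residue, positive, negative = best_position_rule(SEQUENCES, LABELS)
    print(f"position {pos}: {residue} -> {positive}, otherwise {negative} "
          f"(accuracy {accuracy:.2f})")
```

An exhaustive single-position search is used here only because it is easy to read. On a real alignment, gap characters and class imbalance would need explicit handling, and accuracy should be estimated on held-out sequences rather than on the training set.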
If you find this work useful, please cite this paper:
Gado, J.E., Harrison, B.E., Sandgren, M., Ståhlberg, J., Beckham, G.T., and Payne, C.M. Machine learning reveals sequence-function relationships in family 7 glycoside hydrolases. Submitted to FEBS (2020).