The python script to encode protein sequences using different encoding methods.
- Python 2 or 3
- json
- argparse
./SequenceEncoding.py -seq="XXXXXX" [options]
positional arguments:
seq The protein sequence to be encoded.
optional arguments:
-h, --help show this help message and exit
--encoding_type ENCODING_TYPE
The encoding type. [One_hot, One_hot_6_bit,
Binary_5_bit, Hydrophobicity_matrix,
Meiler_parameters, Acthely_factors, PAM250, BLOSUM62,
Miyazawa_energies, Micheletti_potentials, AESNN3,
ANN4D, ProtVec].
Default: One_hot.
--overlap The word extraction strategy for ProtVec encoding: overlap.
--non-overlap The word extraction strategy for ProtVec encoding: non-overlap.
Xiaoyang Jing, Qiwen Dong, Daocheng Hong, Ruqian Lu. Amino acid encoding methods for protein sequence: a comprehensive review and assessment. Submitted, 2019.