Quadtree is a gradient-boosted decision tree model for predicting guanine quadruplexes in DNA sequences. It is built on top of the LightGBM Python library. Each base in the sequence is encoded according to a given encoding prescription. The model was trained for use with a sliding window and analyses the whole sequence. It can be used as a Python script or through the preview website quadtree.vercel.app
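The encoding and sliding-window steps described above can be sketched roughly as follows. This is only an illustration: the integer codes in `ENCODING`, the helper names `encode` and `sliding_windows`, and the window size and step are all assumptions, not the prescription or settings actually used by Quadtree.

```python
from typing import Iterator, List, Tuple

# Hypothetical encoding prescription: one integer per base, 0 for anything else.
ENCODING = {"A": 1, "C": 2, "G": 3, "T": 4}


def encode(sequence: str) -> List[int]:
    """Map each base to its integer code (unknown symbols become 0)."""
    return [ENCODING.get(base, 0) for base in sequence.upper()]


def sliding_windows(
    sequence: str, size: int, step: int = 1
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (start, encoded_window) for every full window in the sequence."""
    encoded = encode(sequence)
    for start in range(0, len(encoded) - size + 1, step):
        yield start, encoded[start:start + size]


# Each encoded window would then be passed to the classifier as one sample.
windows = list(sliding_windows("GGGTTAGGG", size=4, step=2))
```

Sequences shorter than the window yield no windows in this sketch; how the real predictor handles that case is not stated here.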
quadtree
├─ web -> preview website source code
└─ python
   ├─ model -> lightgbm model params
   ├─ train -> example files showing how training was performed
   └─ quadtree.py -> predictor
- lightgbm==3.3.2
- numpy==1.21.2
Before use, install the requirements:
pip install -r requirements.txt
from quadtree import Quadtree
model = Quadtree()
- sequence as a string (maximum length is not limited)
- threshold (recommended value is 0.2)
- quadnet model file path
result = model.analyse(
    sequence='ATTAATACTTTTAACAATTGTAGTATATAAAAAAGGGAGTAACC...',
    model_path='/path/to/quadnet_model.txt',
    score_threshold=0.1
)
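Since the sequence is passed as a plain string, longer inputs are usually read from a file first. Below is a minimal sketch assuming the input is in FASTA format. `read_fasta` is a hypothetical helper written for this example and is not part of quadtree.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {record_name: sequence} (hypothetical helper)."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            # Use the first word of the header as the record name.
            name = header[0] if header else f"record_{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


# In-memory example; with a real file, pass the open file object instead.
example = read_fasta([">seq1 demo", "ATTA", "ggg", "", ">seq2", "CC"])
```

Each value in the returned dictionary can then be passed as the `sequence` argument.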
Results are returned in a form that can be loaded directly into a pandas DataFrame:
import pandas as pd
df = pd.DataFrame(result)
| | index | position | sequence | length |
---|---|---|---|---|
0 | 0 | 907 | GCAACAATGGCTGATCCAGAAGGTACAGACGGGGAGGGCACGGGTTGTAACGGCTGGTTTTATGTACAAGCTATTGTAGACAAAAAAACAGGAGATGTAATATCA | 105 |
1 | 1 | 1184 | GAGGCAGCACAGAAAACAGTCCATTAGGGGAGCGGCTGGAGGTGGATACAGAGTTAAGTCCACGGTTACAAGAAATATCTTTAAATAGTGGGCAGA | 96 |
2 | 2 | 1389 | ATGTAGTGGCGGCAGTACGGAGGCTATAGACAACGGGGGCACAGAGGGCAACAACAGCAGTGTAGACGGTACAAGTGACAATAGCAATATAGAAAATGTAAATCCAC | 107 |
3 | 3 | 1635 | AGATTGGGTTACAGCTATATTTGGAGTAAACCCAACAATAGCAGAAGGATTTAAAACACTAATACAGCCATTTAT | 75 |
4 | 4 | 2229 | AATAGATGAAGGGGGAGATTGGAGACCAATAGTGCAATTCCTGCGATACCAACAAATAGAGTTTATAACATTTTTAG | 77 |
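Each row gives a start position and a length, so the end coordinate of a predicted region follows by addition. The sketch below uses the `position` and `length` values from the first three rows of the table. It assumes that `position` is a 0-based start offset, which is not stated above, and `to_intervals` is a hypothetical helper.

```python
# position/length values copied from the first three rows of the table.
hits = [
    {"position": 907, "length": 105},
    {"position": 1184, "length": 96},
    {"position": 1389, "length": 107},
]


def to_intervals(rows):
    """Return (start, end) pairs with an exclusive end coordinate."""
    return [(row["position"], row["position"] + row["length"]) for row in rows]


intervals = to_intervals(hits)
```

With the DataFrame from above, the same end coordinates can be computed as `df["position"] + df["length"]`.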
These parameters were used to train the LightGBM model:
LGBM Classifier | value |
---|---|
colsample bytree | 0.817574864502621 |
learning rate | 0.03744835808549148 |
max bin | 127 |
min child samples | 3 |
number of estimators | 1000 |
number of leaves | 74 |
regularization alpha | 0.0033803043003857677 |
regularization lambda | 0.7013136087939289 |
objective | binary |
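For reference, the table above can be written as keyword arguments for LightGBM's scikit-learn interface. The values are copied from the table. The mapping of the table labels to parameter names (for example "number of leaves" to `num_leaves`) is an assumption, and `build_classifier` is a hypothetical helper, not the project's training script.

```python
# Values from the table; the parameter-name mapping is an assumption.
LGBM_PARAMS = {
    "colsample_bytree": 0.817574864502621,
    "learning_rate": 0.03744835808549148,
    "max_bin": 127,
    "min_child_samples": 3,
    "n_estimators": 1000,
    "num_leaves": 74,
    "reg_alpha": 0.0033803043003857677,
    "reg_lambda": 0.7013136087939289,
    "objective": "binary",
}


def build_classifier(params=None):
    """Create an untrained LGBMClassifier (requires lightgbm to be installed)."""
    import lightgbm as lgb  # imported lazily so the dict is usable on its own

    return lgb.LGBMClassifier(**(params or LGBM_PARAMS))
```

The classifier returned by `build_classifier()` is untrained. Fitting it needs the training data, which is not shown here.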
- Patrik Kaura - Main developer - patrikkaura
This project is licensed under the MIT License - see the LICENSE file for details.