This work updates and upgrades SWAAT, a workflow that uses the structural properties of missense variants to predict their likely outcome (deleterious, benign). ADME (absorption, distribution, metabolism and excretion) genes are those involved in key facets of drug metabolism. Variants in these genes may result in adverse reactions, which may be related to drug safety or efficacy. Classifying variants by their impact on the protein (deleterious, neutral, benign) could help prevent adverse reactions or support drug safety protocols.
The upgrade addresses three main areas.
SWAAT uses the deprecated TransVar to map genomic to protein coordinates. Task: match each amino acid in a human protein sequence to its genome coordinates.
We needed an output that looks like this:
AA_ID chr AA start end
1 6 R 12321 12323
2 6 H 12324 12326
Given an Ensembl protein ID, we have developed an R script that retrieves the amino acid sequence of that protein, converts it to a dataframe and adds the residue numbering. With this sequence, we can query the genomic coordinates (GRCh38) of each residue, add them to the dataframe, and write the result to a CSV file.
Example of output for CYP2D6 (GRCh38):
"ID","RefAminoAcid","CHR","start","end"
1,"M",22,42130789,42130791
2,"G",22,42130786,42130788
3,"L",22,42130783,42130785
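The mapping step can be sketched in Python as follows. This is a minimal illustration, not the R script itself: the function names, the exon list format and the strand encoding are assumptions, and the single coding interval in the usage note is inferred from the three example rows above rather than taken from an Ensembl record. The coordinates decrease with residue number in that example, which is what a reverse-strand gene produces, so the sketch walks the coding bases in transcription order before grouping them into codons.

```python
import csv
import io


def cds_positions(cds_exons, strand):
    """List every coding base position in transcription order.

    cds_exons: (start, end) genomic intervals, 1-based inclusive (assumed format).
    strand: 1 for forward, -1 for reverse (assumed encoding).
    """
    positions = []
    for start, end in sorted(cds_exons, reverse=(strand == -1)):
        bases = range(start, end + 1)
        positions.extend(reversed(bases) if strand == -1 else bases)
    return positions


def map_residues(sequence, chrom, cds_exons, strand):
    """Return (ID, RefAminoAcid, CHR, start, end) for each residue."""
    positions = cds_positions(cds_exons, strand)
    if len(positions) < 3 * len(sequence):
        raise ValueError("coding sequence is shorter than the protein")
    rows = []
    for i, amino_acid in enumerate(sequence):
        codon = positions[3 * i:3 * i + 3]
        # A codon split across two exons spans the intron between them here.
        rows.append((i + 1, amino_acid, chrom, min(codon), max(codon)))
    return rows


def to_csv(rows):
    """Serialise the rows with quoted strings and unquoted numbers."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(["ID", "RefAminoAcid", "CHR", "start", "end"])
    writer.writerows(rows)
    return out.getvalue()
```

Calling `to_csv(map_residues("MGL", 22, [(42130783, 42130791)], -1))` reproduces the header and the three example rows shown above. A real run would need the full set of coding exons for the transcript.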
Next steps: now that we can retrieve the sequence, use it to create a table of positions and amino acids that can be matched to the retrieved coordinates.
SWAAT uses a random forest machine learning model to classify variants based on structural features. While this model outperforms genomic classifiers, it could still be improved in terms of accuracy and specificity.
Task: investigate other ML methods to improve SWAAT predictions. The code is currently hosted in this R script.
Methods tried so far: a naive Bayes classifier (NBC) and an NBC with PCA, both using multiclass modeling (splitting the data into categories).
NB: The accuracy of the multiclass model is quite low (0.5861); more effort should be channelled here to improve it. Adding PCA raised the accuracy only slightly, to 0.602.
Next steps: resolve the issues with data transformation, specifically the handling of negative integers in the data to be assessed.
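For orientation, a multiclass naive Bayes classifier can be sketched in a few lines of Python. This is not the code in the R script. It assumes Gaussian likelihoods, and the function names, feature values and labels are made up for illustration. A Gaussian likelihood accepts negative feature values directly, which may be relevant to the transformation issue noted above if that issue comes from a step that requires non-negative input.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


# Made-up two-feature training set with negative values in both columns.
X = [[-2.0, 0.1], [-1.8, 0.3], [0.0, 1.0], [0.2, 1.2], [2.1, -1.0], [1.9, -1.2]]
y = ["deleterious", "deleterious", "neutral", "neutral", "benign", "benign"]
model = fit_gaussian_nb(X, y)
```

With this toy model, `predict(model, [-1.9, 0.2])` selects the class whose training points lie closest to the sample. Real SWAAT features would need the same treatment applied to the training and test splits.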
Task: generate a basic version of a web app. Flow: VCF input > processing at the back end (SWAAT) > rendered report or raw data (CSV/TSV).
Basic HTML/CSS code for the user interface (front end) has been built. For the back end, a Flask app is being deployed to handle the input and run SWAAT.
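The VCF-to-raw-data part of that flow can be sketched independently of the web framework. The snippet below is an assumption about how the back end might be organised, not the project's code. `annotate` is a hypothetical placeholder standing in for the SWAAT call, and the column selection is illustrative. It relies only on the standard VCF layout, in which lines starting with `#` are meta or header lines and the first five tab-separated fields are CHROM, POS, ID, REF and ALT.

```python
import csv
import io

VCF_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT")


def parse_vcf(text):
    """Yield one dict per (variant, ALT allele) found in VCF text."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"malformed VCF line: {line!r}")
        chrom, pos, var_id, ref, alts = fields[:5]
        for alt in alts.split(","):  # one record per alternate allele
            yield {"CHROM": chrom, "POS": int(pos), "ID": var_id,
                   "REF": ref, "ALT": alt}


def annotate(variant):
    """Hypothetical placeholder for the SWAAT prediction step."""
    return "not_run"


def build_report(vcf_text, delimiter="\t"):
    """Return the raw report as TSV (default) or CSV (delimiter=',')."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(VCF_COLUMNS + ("prediction",))
    for variant in parse_vcf(vcf_text):
        writer.writerow([variant[c] for c in VCF_COLUMNS] + [annotate(variant)])
    return out.getvalue()
```

A Flask route could read the uploaded file as text, pass it to `build_report`, and return the string as a download or feed it to the rendered report. Keeping the parsing outside the route makes it testable without starting the server.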
Next steps:
Houcem Othman
Isabel Mensah
Tsaone Tamuhla
Yagoub Adam
Jorge da Rocha
Kais Ghedira
Yaniv Swiel