Visualizing Protein Residue Modifications

This project is about gaining insight from protein data through visualization. We thank our client, Laura Garrison, for trusting us with their data.

The dataset

The protein dataset consists of 9 columns and 8,354 rows. The following are the attributes of the data:

UniAcc: Protein ID in UniProt database (this is just a unique identifier for the protein for a given species, e.g., human).

RES: Amino acid residue - there are 20 different amino acids, each of which has a unique single-letter ID.

POS: The location ID of a given residue in a given protein.

MOD: Specific modification that is occurring at that site.

Entry: Similar to UniAcc but more human-readable; identifies the gene and species of the protein together as one string.

Gene: Name of gene that encodes the resulting protein.

Species: Species that protein comes from, e.g., human, mouse.

Classification: High-level classification of the type of modification that is occurring at the site. This is a more coarse-grained classification relative to MOD attribute.

PathogenicMutation: Boolean. If True, this site is associated with a disease.
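As a rough illustration of how these attributes might be read before plotting, here is a minimal Python sketch using only the standard library. It assumes the dataset is a comma-separated file whose header uses exactly the attribute names listed above, that POS holds integers, and that PathogenicMutation is stored as the text "True" or "False". None of these details are confirmed by the dataset itself. The helper names `load_rows` and `count_by_classification` are hypothetical.

```python
import csv
from collections import Counter

# Column names taken from the attribute list above (assumed to match the file header).
EXPECTED_COLUMNS = [
    "UniAcc", "RES", "POS", "MOD", "Entry",
    "Gene", "Species", "Classification", "PathogenicMutation",
]


def load_rows(lines):
    """Parse CSV text lines into a list of dicts and check the header."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        # Assumption: POS is an integer position within the protein.
        row["POS"] = int(row["POS"])
        # Assumption: the boolean is stored as the text "True" / "False".
        row["PathogenicMutation"] = (
            row["PathogenicMutation"].strip().lower() == "true"
        )
        rows.append(row)
    return rows


def count_by_classification(rows, pathogenic_only=False):
    """Count modification sites per Classification value."""
    return Counter(
        row["Classification"]
        for row in rows
        if row["PathogenicMutation"] or not pathogenic_only
    )
```

To use it, open the file with `open(path, newline="")`, where `path` points to wherever the dataset is saved, and pass the file object to `load_rows`. The counts returned by `count_by_classification` could then feed a bar chart of modification classes, optionally restricted to disease-associated sites.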

The link to the dataset is here

Project closure

Meet our team

Zahra, Iran
Yeganeh, Iran
Leo, Argentina
Ezekiel, Nigeria

Presentation deck

The link to the presentation deck is here.

