/ICTP_workshop

Primary language: Python. License: MIT.

Visualizing Protein Residue Modifications

This project is about gaining insight from protein data through visualization. We thank our client, Laura Garrison, for trusting us with their data.

The dataset

The protein dataset consists of 9 columns and 8,354 rows. The following are the attributes of the data:

UniAcc: Protein ID in UniProt database (this is just a unique identifier for the protein for a given species, e.g., human).

RES: Amino acid residue. There are 20 different amino acids, each with a unique single-letter ID.

POS: The location ID of a given residue in a given protein.

MOD: Specific modification that is occurring at that site.

Entry: Similar to UniAcc except more human-readable, identifies the gene+species of protein together as one string.

Gene: Name of gene that encodes the resulting protein.

Species: Species that protein comes from, e.g., human, mouse.

Classification: High-level classification of the type of modification that is occurring at the site. This is coarser-grained than the MOD attribute.

PathogenicMutation: Boolean. If True, this site is associated with a disease.
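As a minimal sketch of how a table with these nine attributes might be loaded and summarized before plotting, the snippet below uses pandas. It assumes the data is available as a CSV with the column names listed above. The function names (load_modifications, summarize) and the three sample rows are hypothetical placeholders invented for illustration; they are not taken from the real dataset or from this repository's code.

```python
import io

import pandas as pd

# Column names as described in the attribute list above.
EXPECTED_COLUMNS = [
    "UniAcc", "RES", "POS", "MOD", "Entry",
    "Gene", "Species", "Classification", "PathogenicMutation",
]

# Made-up placeholder rows so the sketch runs on its own; NOT real data.
SAMPLE_CSV = """UniAcc,RES,POS,MOD,Entry,Gene,Species,Classification,PathogenicMutation
X00001,S,10,mod_a,GENE1_HUMAN,GENE1,human,class_1,True
X00001,K,25,mod_b,GENE1_HUMAN,GENE1,human,class_2,False
X00002,S,7,mod_a,GENE2_MOUSE,GENE2,mouse,class_1,False
"""


def load_modifications(source):
    """Read the table and check that all expected columns are present."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


def summarize(df):
    """Count sites and disease-associated sites per Classification."""
    return df.groupby("Classification")["PathogenicMutation"].agg(
        sites="size",       # number of rows (modification sites) in the class
        pathogenic="sum",   # True counts as 1, so this counts pathogenic sites
    )


df = load_modifications(io.StringIO(SAMPLE_CSV))
summary = summarize(df)
print(summary)
```

To try this on the actual file, pass its path to load_modifications in place of the in-memory sample. A per-classification summary like this is one possible starting point for a bar chart.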


The link to the dataset is here

Project closure

During project closing phase

Meet our team

  • Zahra, Iran

  • Yeganeh, Iran

  • Leo, Argentina

  • Ezekiel, Nigeria

Presentation deck

The link to the presentation deck is here.