This project is about gaining insight from protein data using visualization. We thank our client, Laura Garrison, for trusting us with their data.
The protein dataset consists of 9 columns and 8,354 rows. The following are the attributes of the data:
UniAcc: Protein ID in UniProt database (this is just a unique identifier for the protein for a given species, e.g., human).
RES: Amino acid residue. There are 20 different amino acids, each with a unique single-letter ID.
POS: The location ID of a given residue in a given protein.
MOD: Specific modification that is occurring at that site.
Entry: Similar to UniAcc except more human-readable, identifies the gene+species of protein together as one string.
Gene: Name of gene that encodes the resulting protein.
Species: Species that protein comes from, e.g., human, mouse.
Classification: High-level classification of the type of modification that is occurring at the site. This is a coarser-grained classification than the MOD attribute.
PathogenicMutation: Boolean. If True, this site is associated with a disease.
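
The attributes above can be read and summarized before plotting. The following is a minimal sketch, not the project's actual loading code. It assumes the dataset is a CSV file whose header uses exactly the nine attribute names listed above, and that PathogenicMutation is stored as the text True or False. The sample rows, `load_rows`, and `count_by` are made up for illustration and are not taken from the real data.

```python
import csv
import io
from collections import Counter

# Column names as listed in the attribute description (assumed to match the header).
EXPECTED_COLUMNS = [
    "UniAcc", "RES", "POS", "MOD", "Entry",
    "Gene", "Species", "Classification", "PathogenicMutation",
]

# Made-up placeholder rows for illustration only; NOT taken from the real dataset.
SAMPLE_CSV = """UniAcc,RES,POS,MOD,Entry,Gene,Species,Classification,PathogenicMutation
X00001,S,10,mod_a,GENEA_HUMAN,GENEA,human,class_1,True
X00001,K,42,mod_b,GENEA_HUMAN,GENEA,human,class_2,False
X00002,S,7,mod_a,GENEB_MOUSE,GENEB,mouse,class_1,False
"""


def load_rows(handle):
    """Read rows from a CSV handle, check the header, and convert POS and the boolean flag."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for raw in reader:
        row = dict(raw)
        row["POS"] = int(row["POS"])  # residue position as an integer
        # Assumption: the flag is written as the text "True" or "False".
        row["PathogenicMutation"] = row["PathogenicMutation"].strip().lower() == "true"
        rows.append(row)
    return rows


def count_by(rows, column, pathogenic_only=False):
    """Count rows per value of `column`, optionally keeping only disease-associated sites."""
    return Counter(
        r[column] for r in rows
        if r["PathogenicMutation"] or not pathogenic_only
    )


rows = load_rows(io.StringIO(SAMPLE_CSV))
class_counts = count_by(rows, "Classification")
pathogenic_by_residue = count_by(rows, "RES", pathogenic_only=True)
print(class_counts)
print(pathogenic_by_residue)
```

To run this against the real file, the `io.StringIO(SAMPLE_CSV)` argument could be replaced with an opened file handle, for example `open(path, newline="")`, where `path` points to the downloaded dataset. Counts like these are a natural input for bar charts of modification classes or of disease-associated residues.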
The link to the dataset is here
- Zahra, Iran
- Yeganeh, Iran
- Leo, Argentina
- Ezekiel, Nigeria
The link to the presentation deck is here.