Comparative Analysis of the Interactome of both β-Arrestin Isoforms
Conformational flexibility of β-arrestins – how these scaffolding proteins guide and transform the functionality of GPCRs
Raphael Silvanus Haider*[1,2] , Mona Reichel*[1], Edda Sofie Fabienne Matthees[1], Carsten Hoffmann[1]
[1]: Institut für Molekulare Zellbiologie, CMB – Center for Molecular Biomedicine, Universitätsklinikum Jena, Friedrich-Schiller-Universität Jena, Hans-Knöll Straße 2, D-07745 Jena, Germany
[2]: Division of Physiology, Pharmacology and Neuroscience, School of Life Sciences, Queen's Medical Centre, University of Nottingham, Nottingham, UK
[*] contributed equally
This code was written and used to perform the comparative interactome analysis of β-arrestin1 and 2 found in DOI.
Resource data
Details about enriched GO term clustering
Short description
Interacting proteins were retrieved from STRINGdb 11.5 (Szklarczyk et al. 2018). GO enrichment analysis was performed using the online tool Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.8 (Huang et al. 2009, Sherman et al. 2022). Enriched GO terms were clustered and visualized using the R/Bioconductor package simplifyEnrichment (Gu & Huebschmann 2022).
Workflow
1. Retrieving interacting proteins of β-arrestin1 and 2 from STRINGdb
get_stringDB.py
- retrieves all proteins interacting with β-arrestin1 or 2 from STRINGdb using STRINGdb API
- entries with confidence score < 0.5 are removed
- returns xlsx file with ENSP of interacting proteins, confidence score of this interaction and the "uniqueness" of the interaction (whether this interaction is unique to one of the β-arrestin isoforms)
reformat_stringPPIdf.py
- splits xlsx file based on the "uniqueness" column
- returns xlsx files for β-arrestin1, β-arrestin2 and interactors of both isoforms
(Note, that proteins interacting with both isoforms are found twice in these exports)
2. Translating ENSP IDs to uniprot accession number
ENSP_to_uniprotID.py
- translates ENSP IDs exported from STRINGdb to uniprot accession number using the uniprot ID mapping tool via the API
- Note, that order of fetched IDs do not match order of requested IDs - to prevent mistranslation, the script creates a translation dictionary which is then applied to the imported ENSP IDs
- returns input xlsx with additional column containing uniprot accession IDs
3. Extract information from uniprot entries
extract_uniprot_info.py
- uniprot accession number duplicates in input table are removed (which are all proteins interacting with both β-arrestins as described above)
- using the uniprot accession number, selected information from each uniprot entry is retrieved via APIs (to test how certain information can be accessed from xml entry, test.xml was created)
- according to the uniprot Keyword 'G-protein coupled receptor', resulting table is separated into interactors
which are GPCRs and non-GPCRs
-
these xlsx files are part of this repository:
interactors_stringDB_ID_gpcrs.xlsx, interactors_stringDB_ID_nogpcrs.xlsx
-
GO_list.py
is unused for analysis, but gave an initial overview
- retrieves GO terms of all three ontologies from uniprot using APIs
- lists each GO term separately with corresponding proteins
4. GO enrichment analysis via DAVID
GO enrichment analysis of the biological process ontology was performed using the online tool DAVID 6.8. Uniprot accession numbers of non-GPCR proteins obtained from extract_uniprot_info.py were used as input.
In detail, functional annotation analysis was performed using the uniprot accession IDs of non-GPCR proteins interacting with exclusively one β-arrestin isoform (gene list input) against all non-GPCR interacting proteins of this isoform (background).The functional annotation chart created by DAVID was downloaded and used in the last step visualizeEnrichment.R.
5. Clustering and visualization of enriched GO terms
visualizeEnrichment.R
-
uses functional annotation chart created by DAVID 6.8 as input
-
enriched GO terms for β-arrestin1-only or β-arrestin2-only interacting proteins are clustered according to their similarity and displayed as heatmap using
simplifyGOFromMultipleLists()
of the R/Bioconductor package simplifyEnrichment (Gu & Huebschmann 2022) -
GO terms found in each cluster are exported as xlsx file
- part of this repository: GO_in_clusters_list.xlsx
-
additionally, proteins which contributed to each cluster due to their GO term assignment are exported as xlsx file
- these files are part of this repository: proteins_in_clusters_bArr1.xlsx, proteins_in_clusters_bArr2.xlsx
(Note, that one protein can contribute to several clusters as each protein has several GO terms assigned)
This project is licensed under the terms of the MIT license.