The features extracted for the database of membrane protein dimer analysis can be replicated through this repository.
INSTALLATION REQUIREMENTS: First, you will need to run
python setup.py
in your terminal to install the dependencies required for feature extraction. PSI-BLAST and AutoDockTools are not included in these dependencies and must be installed independently. Additionally, the non-redundant (nr) database must be downloaded from NCBI (https://ftp.ncbi.nlm.nih.gov/blast/db/).
Before feature extraction, you should perform PRE-PROCESSING of the PDB files. To do so, you need to:
- Trim non-transmembrane residues
- Remove heteroatoms
- Mutate exotic amino acids
- Model incomplete structures
- Extract dimers from the structure files
- Add hydrogens
For additional details on how to perform data pre-processing, please see our review, "Structural Characterization of Membrane Protein Dimers", published in Methods in Molecular Biology - Protein Supersecondary Structures (https://www.springer.com/us/book/9781493991600).
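As a rough illustration of one of the pre-processing steps above (removing heteroatoms), the sketch below filters HETATM records out of PDB-formatted text using only the standard library. The function name `strip_heteroatoms` is hypothetical and not part of this repository. The other steps (trimming, mutating, modelling, adding hydrogens) need dedicated tools and are not covered here.

```python
def strip_heteroatoms(pdb_text):
    """Return PDB text without HETATM records (hypothetical helper)."""
    kept = []
    for line in pdb_text.splitlines():
        # PDB record names occupy the first six columns of each line.
        if line[:6] == "HETATM":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Note that modified residues such as selenomethionine are often stored as HETATM records, so this filter would drop them unless exotic amino acids are handled first. ANISOU or CONECT records that refer to removed atoms are left untouched by this sketch.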
run.py computes all the features below and loads the libraries needed to produce the output files. It looks for information in the intermediate file mensadb_fetcher.py. To obtain all the features, run:
python run.py [pdbid] [chains]
Example:
python run.py 1a0t PQ
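When several dimers need to be processed, the call above can be repeated from a small driver. The sketch below is only an assumption about how one might do that: `build_run_command` and `run_many` are hypothetical names, and the only thing taken from this README is the `python run.py [pdbid] [chains]` argument order.

```python
import subprocess
import sys


def build_run_command(pdb_id, chains, script="run.py"):
    """Build the argument list for one run.py call (hypothetical helper)."""
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, script, pdb_id, chains]


def run_many(dimers, script="run.py"):
    """Run the script once per (pdb_id, chains) pair, stopping at the first failure."""
    for pdb_id, chains in dimers:
        # check=True raises CalledProcessError if run.py exits with an error.
        subprocess.run(build_run_command(pdb_id, chains, script), check=True)
```

For instance, `run_many([("1a0t", "PQ")])` would issue the same command as the example above.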
dssp_features.py extracts the features from a DSSP output file. It also requires the corresponding PDB file. To obtain the DSSP output file, use the DSSP executable and run:
dssp -i [pdb_name.pdb] > [output_name.txt]
on Windows, or
mkdssp -i [pdb_name.pdb] > [output_name.txt]
on UNIX-based operating systems. To obtain the DSSP features, you can run:
python dssp_features.py
obtaining the following:
- DSSP index
- Amino acid number
- Amino acid code
- Chain
- Secondary Structure
- BP
- ASA
- NH-->O_1_relidx
- O-->NH_1_relidx
- NH-->O_1_energy
- O-->NH_1_energy
- TCO
- KAPPA
- Alpha
- Phi
- Psi
- X-CA
- Y-CA
- Z-CA
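To illustrate where several of the fields listed above come from, the sketch below reads a few of them from DSSP residue lines. The column offsets assume the classic fixed-width DSSP output format, and the function names are hypothetical; dssp_features.py may parse the file differently.

```python
def parse_dssp_residue_line(line):
    """Extract a few fields from one fixed-width DSSP residue line."""
    return {
        "dssp_index": int(line[0:5]),
        "residue_number": int(line[5:10]),
        "chain": line[11],
        "amino_acid": line[13],
        # A blank secondary-structure column is reported here as "-".
        "secondary_structure": line[16].strip() or "-",
        "asa": int(line[34:38]),
    }


def parse_dssp(text):
    """Return one dict per residue found after the DSSP table header."""
    records = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("  #  RESIDUE"):
            # Residue lines start right after this header line.
            in_table = True
            continue
        # Chain-break lines carry "!" in the amino acid column; skip them.
        if in_table and len(line) >= 38 and line[13] != "!":
            records.append(parse_dssp_residue_line(line))
    return records
```

Only six of the listed fields are read here; the remaining columns (hydrogen-bond partners, angles, and CA coordinates) follow the same fixed-width pattern further along each line.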
features_pssm.py extracts the PSSM "jsd" features from a PSI-BLAST output file. To retrieve the required PSSM files, you will need a local psiblast installation, the non-redundant (nr) database and your input file. With these, run:
psiblast -query [fasta_file.fasta] -evalue 0.001 -num_iterations 3 -db [nr] -outfmt 5 -out pssm_output_name.txt -out_ascii_pssm [output_name.pssm] -num_threads 6
Running this step can be very time-consuming, depending on the computer and the protein. To obtain the PSSM "jsd" features output, you can run:
python features_pssm.py
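The "jsd" label presumably refers to the Jensen-Shannon divergence, a score commonly used to compare the amino acid distribution of an alignment column against a background distribution. That reading is an assumption, and the sketch below is a generic, textbook formulation rather than the computation performed by features_pssm.py.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution halfway between p and q.
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; zero-probability terms contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (distributions with no overlap).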
process_binana.py extracts the features from the BINding ANAlyser (BINANA) output file. BINANA can be downloaded from http://rocce-vm0.ucsd.edu/data/sw/hosted/binana/#download. To obtain the BINANA output, you can run:
python binana_1_2_0.py -receptor /path/to/receptor.pdbqt -ligand /path/to/ligand.pdbqt -output_file /path/to/output.pdb
as stated on the software's website. To use this command, you will need their binana_1_2_0.py script, as well as the ".pdbqt" input files. To obtain the selected features from the BINANA output, you can run:
python process_binana.py
A single CSV file is written for each of the possible features. These features relate specifically to a dimer:
- Below 2.5 Angstrom residues
- Below 4 Angstrom residues
- Hydrogen Bonds
- Hydrophobic contacts
- Pi-Pi bond stack
- T - stack
- Cation - Pi interaction
- Salt-bridges
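To make the two distance-based features above more concrete, the sketch below finds residue pairs from two chains that have at least one pair of atoms closer than a cutoff. It is not BINANA's implementation: the function name and the `(residue_label, (x, y, z))` atom layout are assumptions chosen for illustration.

```python
import math


def close_residue_pairs(atoms_a, atoms_b, cutoff):
    """Return sorted residue pairs with at least one atom pair below the cutoff.

    Each atom is a (residue_label, (x, y, z)) tuple; this layout is an assumption.
    """
    pairs = set()
    for res_a, xyz_a in atoms_a:
        for res_b, xyz_b in atoms_b:
            # Euclidean distance between the two atoms, in the input's units.
            if math.dist(xyz_a, xyz_b) < cutoff:
                pairs.add((res_a, res_b))
    return sorted(pairs)
```

Calling it with cutoffs of 2.5 and 4.0 on coordinates in Angstrom would give pair lists analogous to the first two features. The all-against-all loop is quadratic in the number of atoms, which is acceptable for a sketch but slow for large interfaces.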
generate_class.py uses VMD to extract the interfacial and surface classification of each residue. It makes use of 5 other scripts located in the "mensa_class" folder. Using these scripts requires a Python-based installation of VMD, which can be obtained with:
conda install -c conda-forge vmd-python
The whole code can be run with:
generate_outputs(input_pdb).joint_call(autodock, autodock_2)
Check the path list and replace the entries with your own locations. The possible classes are:
- non-interface and non-surface: 0
- non-interface and surface: 2
- interface and surface: 3
References
- Preto A.J., Matos-Filipe P., Koukos P.I., Renault P., Sousa S.F., Moreira I.S. (2019) Structural Characterization of Membrane Protein Dimers. In: Kister A. (eds) Protein Supersecondary Structures. Methods in Molecular Biology, vol 1958. Humana Press, New York, NY
Please cite
- Matos-Filipe P., Preto A.J., Koukos P.I., Mourão J., Bonvin A.M.J.J., Moreira I.S. MENSADB: A Thorough Structural Analysis of Membrane Protein Dimers. Available in arXiv:1902.02321 (https://arxiv.org/pdf/1902.02321.pdf)