MENSAdb

The features extracted for the database of membrane protein dimer analysis can be replicated through this repository.

INSTALLATION REQUIREMENTS: First, run python setup.py in your terminal to install the dependencies needed for feature extraction. PSI-BLAST and AutoDockTools are not included in these dependencies and must be installed separately. In addition, the non-redundant (nr) database must be downloaded from NCBI (https://ftp.ncbi.nlm.nih.gov/blast/db/).
Before feature extraction, the PDB files must go through PRE-PROCESSING. This involves the following steps:
- Trim non-transmembrane residues
- Remove heteroatoms
- Mutate exotic amino acids
- Model incomplete structures
- Extract dimers from the structure files
- Add hydrogens
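
As an illustration of one of the steps above, heteroatom removal can be sketched in plain Python by dropping every HETATM record. This is a minimal sketch and not the pre-processing code used for MENSAdb; the function name strip_heteroatoms and the example records are hypothetical. Note that modified residues stored as HETATM records would be dropped as well.

```python
def strip_heteroatoms(pdb_lines):
    """Return the PDB lines with every HETATM record removed."""
    # PDB record types occupy the first six columns of each line.
    return [line for line in pdb_lines if not line.startswith("HETATM")]


# Hypothetical records, for illustration only.
example = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "HETATM    2  O   HOH A 101      10.000  10.000  10.000  1.00 20.00           O",
    "END",
]
cleaned = strip_heteroatoms(example)
```

For a real file, read the lines with open(path).readlines() and write the filtered list to a new file so that the original structure is kept.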

For additional details on how to perform data pre-processing, please see our review, "Structural Characterization of Membrane Protein Dimers", published in Methods in Molecular Biology - Protein Supersecondary Structures (https://www.springer.com/us/book/9781493991600).

A - Obtain all the features using a single PDB file as input.

run.py computes all the features listed below and loads the libraries needed to produce the output files. It reads information from the intermediate file mensadb_fetcher.py. To obtain all the features, run: python run.py [pdbid] [chains]

Example: python run.py 1a0t PQ
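
To process several dimers in a row, the call above can be wrapped in a short script. The sketch below assumes that run.py is in the current working directory and takes exactly the two positional arguments shown above; the helper names build_run_command and run_batch are hypothetical and not part of this repository.

```python
import subprocess
import sys


def build_run_command(pdbid, chains, script="run.py"):
    """Build the argument list for: python run.py [pdbid] [chains]."""
    return [sys.executable, script, pdbid, chains]


def run_batch(dimers, dry_run=True):
    """Build (and optionally execute) one run.py call per (pdbid, chains) pair."""
    commands = [build_run_command(pdbid, chains) for pdbid, chains in dimers]
    if not dry_run:
        for command in commands:
            # check=True stops the batch at the first failing structure.
            subprocess.run(command, check=True)
    return commands


# Dry run: only builds the command for the example above.
commands = run_batch([("1a0t", "PQ")])
```

Call run_batch(dimers, dry_run=False) to actually execute the commands.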

B - Obtain each feature individually using a single PDB file as input.

dssp_features.py extracts the features from a DSSP output file. It also requires the corresponding PDB file. To generate the DSSP output file, use the DSSP executable and run: dssp -i [pdb_name.pdb] > [output_name.txt] on Windows, or mkdssp -i [pdb_name.pdb] > [output_name.txt] on UNIX-based operating systems. To obtain the DSSP features, run python dssp_features.py, which yields the following:
- DSSP index
- Amino acid number
- Amino acid code
- Chain
- Secondary Structure
- BP
- ASA
- NH-->O_1_relidx
- O-->NH_1_relidx
- NH-->O_1_energy
- O-->NH_1_energy
- TCO
- KAPPA
- Alpha
- Phi
- Psi
- X-CA
- Y-CA
- Z-CA
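
For orientation, a few of the fields listed above can be read from a classic fixed-width DSSP file as sketched below. This is not the code in dssp_features.py; the column offsets are an assumption based on the classic DSSP text layout, and the function names and the synthetic row are hypothetical.

```python
def parse_dssp_line(line):
    """Read a subset of fields from one residue row (assumed classic DSSP offsets)."""
    return {
        "dssp_index": int(line[0:5]),
        "residue_number": int(line[5:10]),
        "chain": line[11],
        "amino_acid": line[13],
        "secondary_structure": line[16],
        "asa": int(line[34:38]),
    }


def parse_dssp(text):
    """Parse every residue row that follows the '  #  RESIDUE' header line."""
    records = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("  #  RESIDUE"):
            in_table = True
            continue
        # Chain-break rows have an empty residue-number field; skip them.
        if in_table and len(line) >= 38 and line[5:10].strip():
            records.append(parse_dssp_line(line))
    return records


# Synthetic row built to match the assumed offsets, for illustration only.
row = "%5d%5d %s %s  %s%s%4d" % (1, 1, "A", "M", "H", " " * 17, 123)
records = parse_dssp("HEADER\n  #  RESIDUE AA STRUCTURE\n" + row + "\n")
```

Check the offsets against your own DSSP output before relying on them, since the layout depends on the DSSP version and output format.
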
features_pssm.py extracts the PSSM "jsd" features from a PSI-BLAST output file. To generate the required PSSM files you need a local PSI-BLAST installation, the non-redundant (nr) database and your input FASTA file. With these in place, run: psiblast -query [fasta_file.fasta] -evalue 0.001 -num_iterations 3 -db [nr] -outfmt 5 -out pssm_output_name.txt -out_ascii_pssm [output_name.pssm] -num_threads 6. This step can be very time-consuming, depending on the computer and the protein. To obtain the PSSM "jsd" features output, run: python features_pssm.py.
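
The psiblast call above can also be assembled from Python, which helps when many FASTA files have to be processed. The sketch below only reuses the flags and values given above; the function name build_psiblast_command and the file names in the example are hypothetical.

```python
import shlex


def build_psiblast_command(fasta_file, nr_db, xml_out, pssm_out, threads=6):
    """Assemble the psiblast call from this README as an argument list."""
    return [
        "psiblast",
        "-query", fasta_file,
        "-evalue", "0.001",
        "-num_iterations", "3",
        "-db", nr_db,
        "-outfmt", "5",
        "-out", xml_out,
        "-out_ascii_pssm", pssm_out,
        "-num_threads", str(threads),
    ]


# Placeholder file names, for illustration only.
command = build_psiblast_command("1a0t_P.fasta", "nr", "1a0t_P.txt", "1a0t_P.pssm")
command_line = shlex.join(command)  # shell-ready string, Python 3.8+
```

Pass the list to subprocess.run(command, check=True) to execute it once psiblast is on your PATH.
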
process_binana.py extracts the features from the BINding ANAlyser (BINANA) output file (to download BINANA, go to http://rocce-vm0.ucsd.edu/data/sw/hosted/binana/#download). To generate the BINANA output, run: python binana_1_2_0.py -receptor /path/to/receptor.pdbqt -ligand /path/to/ligand.pdbqt -output_file /path/to/output.pdb, as stated on the software's website. This command requires their binana_1_2_0.py script as well as the ".pdbqt" input files. To extract the selected features from the BINANA output, run: python process_binana.py. A separate csv file is written for each of the possible features. These features apply specifically to a dimer:
- Below 2.5 Angstrom residues
- Below 4 Angstrom residues
- Hydrogen Bonds
- Hydrophobic contacts
- Pi-Pi bond stack
- T - stack
- Cation - Pi interaction
- Salt-bridges
generate_class.py uses VMD to extract the interfacial and surface classification of each residue. It relies on 5 other scripts located in the "mensa_class" folder. These scripts require the Python-based VMD, which can be installed with: conda install -c conda-forge vmd-python. The whole code can be run with generate_outputs(input_pdb).joint_call(autodock, autodock_2). Check the path list and replace the entries with your own locations. The possible classes are:
- non-interface and non-surface: 0
- non-interface and surface: 2
- interface and surface: 3
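
When reading the classification output, the numeric codes above can be mapped back to their labels with a small lookup. This is a minimal sketch that only covers the codes listed above; the names CLASS_LABELS and describe_class are hypothetical and not part of generate_class.py.

```python
# Codes and labels exactly as listed in this README.
CLASS_LABELS = {
    0: "non-interface and non-surface",
    2: "non-interface and surface",
    3: "interface and surface",
}


def describe_class(code):
    """Return the label for a class code, or a fallback for unlisted codes."""
    return CLASS_LABELS.get(code, "unknown class code")


labels = [describe_class(code) for code in (0, 2, 3)]
```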

References

Preto A.J., Matos-Filipe P., Koukos P.I., Renault P., Sousa S.F., Moreira I.S. (2019) Structural Characterization of Membrane Protein Dimers. In: Kister A. (eds) Protein Supersecondary Structures. Methods in Molecular Biology, vol 1958. Humana Press, New York, NY

Please cite

Matos-Filipe P., Preto A.J., Koukos P.I., Mourão J., Bonvin A.M.J.J., Moreira I.S. MENSADB: A Thorough Structural Analysis of Membrane Protein Dimers. Available in arXiv:1902.02321 (https://arxiv.org/pdf/1902.02321.pdf)
