pisa-analysis

Analyses the interfaces of entry assemblies with PISA-Lite and writes the results to JSON files


Assembly interfaces analysis

Basic information

This Python package uses PISA-Lite to analyse macromolecular interfaces and interactions in assemblies.

The code will:

  • Analyse macromolecular interfaces with PISA
  • Create a JSON dictionary with assembly interaction and interface information

git clone https://github.com/PDBe-KB/pisa-analysis

cd pisa-analysis 

Dependencies

The process runs PISA-Lite as a subprocess, so PISA-Lite must be compiled beforehand. For more information on how to compile PISA-Lite, visit our internal page:

PISA-Lite documentation

To simplify running the process, you can set two environment variables for PISA:

Add the directory containing the 'pisa' binary to PATH:

export PATH="$PATH:your_path_to_pisa/pisa-lite/build"

A path to the setup directory of PISA:

export PISA_SETUP_DIR="/your_path_to_pisa/pisa-lite/setup"

Additionally, the PISA setup directory must contain a PISA configuration template named pisa_cfg_tmp:

cp pisa_cfg_tmp your_path_to_pisa/pisa-lite/setup
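Before launching the process, it can help to confirm that the setup described above is in place. The following is a minimal sketch, not part of the package: the function name check_pisa_environment is hypothetical, and it only checks the three things this section mentions (the 'pisa' binary on PATH, the PISA_SETUP_DIR variable, and the pisa_cfg_tmp template).

```python
import os
import shutil
from pathlib import Path


def check_pisa_environment(env=None):
    """Return a list of problems found; an empty list means the setup looks usable.

    Hypothetical helper, not part of pisa-analysis.
    """
    env = os.environ if env is None else env
    problems = []

    # The 'pisa' binary should be discoverable through PATH.
    if shutil.which("pisa", path=env.get("PATH", "")) is None:
        problems.append("binary 'pisa' not found on PATH")

    # PISA_SETUP_DIR should point at a directory holding pisa_cfg_tmp.
    setup_dir = env.get("PISA_SETUP_DIR")
    if not setup_dir:
        problems.append("PISA_SETUP_DIR is not set")
    elif not (Path(setup_dir) / "pisa_cfg_tmp").is_file():
        problems.append(f"pisa_cfg_tmp not found in {setup_dir}")

    return problems


if __name__ == "__main__":
    for problem in check_pisa_environment():
        print("WARNING:", problem)
```

Returning a list rather than raising on the first failure lets all missing pieces be reported in one pass.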

Other dependencies can be installed with:

pip install -r requirements.txt

See requirements.txt

For development:

pre-commit usage

pip install pre-commit
pre-commit
pre-commit install

Usage

pisa-analysis/pisa_utils/run.py [-h] -i INPUT_CIF_DIR --pdb_id PDB_ID --assembly_id ASSEMBLY_CODE -o OUTPUT_DIR_JSON --output_xml OUTPUT_DIR_XML

OR

pisa-analysis/pisa_utils/run.py --input_cif INPUT_CIF_DIR  --pdb_id PDB_ID --assembly_id ASSEMBLY_CODE --output_json OUTPUT_DIR_JSON --output_xml OUTPUT_DIR_XML

OR install the pisa_analysis module:

cd pisa-analysis/

python setup.py install

usage:

pisa_analysis [-h] -i INPUT_CIF_DIR --pdb_id PDB_ID --assembly_id ASSEMBLY_CODE -o OUTPUT_PATH_JSON --output_xml OUTPUT_DIR_XML

Other optional arguments are:

--input_updated_cif  
--force  
--pisa_setup_dir
--pisa_binary

input_updated_cif : Updated CIF file for the PDB entry

force : Always run the PISA-Lite calculation

pisa_setup_dir : Path to the 'setup' directory of PISA-Lite

pisa_binary : Binary file for PISA-Lite
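When the tool is driven from another Python script, the arguments above can be assembled into a command list. This is a sketch under assumptions: build_pisa_analysis_command is a hypothetical helper, it assumes the installed pisa_analysis command accepts the same long option names shown for run.py, that --force is a switch without a value, and that the other optional arguments each take a single value. The paths in the example are placeholders.

```python
def build_pisa_analysis_command(input_cif_dir, pdb_id, assembly_id,
                                output_json, output_xml, force=False,
                                input_updated_cif=None, pisa_setup_dir=None,
                                pisa_binary=None):
    """Build an argument list for the pisa_analysis command (hypothetical helper)."""
    cmd = [
        "pisa_analysis",
        "--input_cif", str(input_cif_dir),
        "--pdb_id", str(pdb_id),
        "--assembly_id", str(assembly_id),
        "--output_json", str(output_json),
        "--output_xml", str(output_xml),
    ]
    # Assumption: --force is a boolean switch that takes no value.
    if force:
        cmd.append("--force")
    # Assumption: each remaining optional argument takes exactly one value.
    optional = (
        ("--input_updated_cif", input_updated_cif),
        ("--pisa_setup_dir", pisa_setup_dir),
        ("--pisa_binary", pisa_binary),
    )
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    cmd = build_pisa_analysis_command(
        "data/cif", "1d2s", "1", "out/json", "out/xml", force=True
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems with paths that contain spaces.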

The process is as follows:

  1. The process first runs PISA-Lite in a subprocess and generates two XML files:

    • interfaces.xml
    • assembly.xml

    The XML files are saved in the output directory defined by the --output_xml argument. If the XML files exist and are valid, the process skips running PISA-Lite unless the --force argument is given.

  2. Next, the process parses the XML files generated by PISA-Lite and creates a dictionary that contains all assembly interface and interaction information.

  3. While creating the interfaces dictionary for the entry, the process reads UniProt accessions and sequence numbers from an updated CIF file using Gemmi.

  4. The process also parses the assembly.xml file generated by PISA-Lite and creates a simplified dictionary with assembly information.

  5. Finally, the process dumps the dictionaries into JSON files. The JSON files are saved in the output directory defined by the -o or --output_json argument. The output JSON files are:

    xxx-assemX_interfaces.json and xxx-assemblyX.json

    where xxx is the PDB ID of the entry and X is the assembly code.
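The skip rule in step 1 and the file naming in step 5 can be sketched as follows. This is an illustration, not the package's code: the function names are hypothetical, and "valid" is assumed here to mean only that each file exists and parses as well-formed XML, since the exact validity check is not described above.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# File names produced by PISA-Lite, as listed in step 1.
XML_NAMES = ("interfaces.xml", "assembly.xml")


def xml_outputs_are_valid(output_xml_dir):
    """Assumed validity test: both files exist and are well-formed XML."""
    for name in XML_NAMES:
        path = Path(output_xml_dir) / name
        if not path.is_file():
            return False
        try:
            ET.parse(path)
        except ET.ParseError:
            return False
    return True


def should_run_pisa(output_xml_dir, force=False):
    """Run PISA-Lite when forced, or when usable XML output is missing."""
    return force or not xml_outputs_are_valid(output_xml_dir)


def output_json_names(pdb_id, assembly_id):
    """File names following the xxx-assemX_interfaces.json / xxx-assemblyX.json pattern."""
    return (
        f"{pdb_id}-assem{assembly_id}_interfaces.json",
        f"{pdb_id}-assembly{assembly_id}.json",
    )


if __name__ == "__main__":
    print(should_run_pisa("out/xml"))
    print(output_json_names("1d2s", "1"))
```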

Expected JSON files

Documentation on the assembly interfaces JSON file and its schema can be found here:

https://pisalite.docs.apiary.io/#reference/0/pisaqualifierjson/interaction-interface-data-per-pdb-assembly-entry

The simplified assembly JSON output looks as follows:

{
   "PISA": {
      "pdb_id": "1d2s", 
      "assembly_id": "1", 
      "pisa_version": "2.0", 
      "assembly": {
         "id": "1", 
         "size": "8", 
         "macromolecular_size": "2", 
         "dissociation_energy": -3.96, 
         "accessible_surface_area": 15146.45, 
         "buried_surface_area": 3156.79, 
         "entropy": 12.09, 
         "dissociation_area": 733.07, 
         "solvation_energy_gain": -41.09, 
         "number_of_uc": "0", 
         "number_of_dissociated_elements": "2", 
         "symmetry_number": "2", 
         "formula": "A(2)a(4)b(2)", 
         "composition": "A-2A[CA](4)[DHT](2)"
      }
   }
}
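In the example above, several counts (for instance "size" and "symmetry_number") are stored as strings, while energies and areas are numbers. A consumer may want to normalise them on load. The sketch below is an assumption about how a downstream script might do that: load_assembly_summary and INT_FIELDS are hypothetical names, and the list of integer-like fields is taken only from the sample shown here.

```python
import json

# Fields that appear as quoted integers in the sample output above.
INT_FIELDS = (
    "size",
    "macromolecular_size",
    "number_of_uc",
    "number_of_dissociated_elements",
    "symmetry_number",
)


def load_assembly_summary(text):
    """Parse the simplified assembly JSON and cast string counts to int."""
    assembly = dict(json.loads(text)["PISA"]["assembly"])
    for key in INT_FIELDS:
        if key in assembly:
            assembly[key] = int(assembly[key])
    return assembly


if __name__ == "__main__":
    # Abbreviated copy of the sample output shown above.
    sample = """
    {"PISA": {"pdb_id": "1d2s", "assembly_id": "1", "pisa_version": "2.0",
              "assembly": {"id": "1", "size": "8", "macromolecular_size": "2",
                           "dissociation_energy": -3.96,
                           "symmetry_number": "2",
                           "formula": "A(2)a(4)b(2)"}}}
    """
    summary = load_assembly_summary(sample)
    print(summary["size"], summary["dissociation_energy"])
```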

Versioning

We use SemVer for versioning.

Authors

See all contributors here.

License

See LICENSE

Acknowledgements