Primary language: MATLAB. License: MIT.

bioStructureM

A basic MATLAB package with VMD-like selection syntax for analyzing protein structures.

Quick Start

Import path

Add the bioStructureM directories to the MATLAB path:

addpath('your_bioStructureM_root/core');  
addpath('your_bioStructureM_root/atomselector');  

Read a local PDB file (e.g. 1BFG)

pdbStruct is a MATLAB structure array with one element per atom. Each element contains the following fields:

| Field name    | Data type    | Description                                   |
| ------------- | ------------ | --------------------------------------------- |
| alternate     | char         | Alternate location indicator                  |
| atomno        | double       | Atom serial number                            |
| atmname       | char         | Atom name (at most 4 characters)              |
| bval          | double       | Temperature factor (B-factor)                 |
| charge        | char         | Charge on the atom                            |
| coord         | 3 x 1 double | Coordinates in Angstroms                      |
| elementSymbol | char         | Element symbol                                |
| iCode         | char         | Code for insertion of residues                |
| occupancy     | double       | Occupancy                                     |
| record        | char         | Record name, either ATOM or HETATM            |
| resname       | char         | Residue name                                  |
| resno         | char         | Residue sequence number                       |
| segment       | char         | Segment identifier                            |
| subunit       | char         | Chain identifier                              |

pdbStruct = readPDB('1BFG.pdb');

Get coordinate data

getCoord returns an n-by-3 array, where n is the number of atoms in pdbStruct.

crd = getCoord(pdbStruct);

Get other attributes of the PDB structure

For numeric (double) fields

bfactor = [pdbStruct.bval];  % bfactor is a double array.

For character fields

atomName = {pdbStruct.atmname}; % atomName is a cell array.
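
The two idioms above rely on how MATLAB expands a field of a structure array into a comma-separated list. The following is a minimal sketch on a hand-built toy structure array (the values are made up, and only a few of the fields are included); it does not call bioStructureM. The last line shows one way an n-by-3 coordinate array can be assembled from 3 x 1 coord fields; whether getCoord works this way internally is an assumption.

```matlab
% Toy stand-in for pdbStruct: a 1-by-3 struct array (made-up values).
atoms = struct( ...
    'atmname', {'N', 'CA', 'C'}, ...
    'bval',    {10.5, 12.0, 11.2}, ...
    'coord',   {[0; 0; 0], [1.5; 0; 0], [2.0; 1.4; 0]});

% Numeric fields: square brackets concatenate the list into a row vector.
bfactor = [atoms.bval];        % 1-by-3 double

% Character fields: braces collect the list into a cell array.
atomName = {atoms.atmname};    % 1-by-3 cell

% Each coord is 3-by-1, so concatenation gives 3-by-n; transpose for n-by-3.
crd = [atoms.coord]';
```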

Get center of geometry

gcenter = getGeometrycenter(pdbStruct);  

Get center of mass

Before calling getCenterOfMass, assign a mass to each atom with assignMass.

pdbStruct = assignMass(pdbStruct);
mcenter = getCenterOfMass(pdbStruct);  
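
For reference, the two centers differ only in the weighting. The sketch below uses the standard definitions on made-up coordinates and masses in plain MATLAB; it is not the package's implementation, and the mass values are hypothetical.

```matlab
% Made-up coordinates (n-by-3, Angstroms) and hypothetical per-atom masses.
crd  = [0 0 0; 2 0 0; 0 3 0];
mass = [12; 12; 16];

% Center of geometry: unweighted mean of the coordinates.
gcenter = mean(crd, 1);

% Center of mass: mass-weighted mean of the coordinates.
mcenter = (mass' * crd) / sum(mass);
```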

Atom selection (as)

Use VMD-like syntax to select specific atoms.
Select by atom name.

CaStruct = as('name CA',pdbStruct);

Select by residue ID

T73 = as('resi 73',pdbStruct);

Select protein or water

protein  = as('protein',pdbStruct);
water = as('water',pdbStruct);  

"as" returns a structure array with the same fields as the original structure.

Simple Selection

  • name atomname {selected-atom-names}
    Select by atom name. Separate multiple atom names with spaces.

      as('name CA C O N',pdbStruct)
      as('atomname CA C O N',pdbStruct)
    

"name" and "atomname" support simple regular expressions. For example, "name H*" selects every atom whose name starts with H (H1, H2, HD1, and so on).

   as('name H*',pdbStruct)

Because "*" is treated as a pattern character, escape it as "H\*" to select atoms whose name is literally "H*".
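
The difference between the pattern "H*" and the escaped "H\*" can be illustrated with MATLAB's built-in regexp and regexptranslate on a made-up list of atom names. This is only a sketch of the matching behavior described above; how "as" translates patterns internally is an assumption.

```matlab
names = {'H1', 'H2', 'HD1', 'CA', 'H*'};   % made-up atom names

% Wildcard: '*' stands for any run of characters, so 'H*' becomes '^H.*$'.
pat = ['^' regexptranslate('wildcard', 'H*') '$'];
isWild = ~cellfun(@isempty, regexp(names, pat, 'once'));

% Literal: escaping the asterisk matches only the name 'H*' itself.
isLiteral = ~cellfun(@isempty, regexp(names, '^H\*$', 'once'));
```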

  • resi resid residue {selected-resids}
    select by residue ids

      as('resi 73 80',pdbStruct)
      as('resid 73 80',pdbStruct)
    

Select a range of residue IDs with start:end or start:step:end. Several ranges and single IDs can be combined:

    as('resi 19:90',pdbStruct)
    as('resi 19:2:90',pdbStruct)
    as('resi 19:31 40:60 144',pdbStruct)
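
The range syntax follows MATLAB's colon operator. As a rough sketch (plain MATLAB on a made-up residue-number vector, not the package's parser), a selection string such as '19:2:25 30:32 34' can be expanded and applied like this:

```matlab
resno = 15:35;   % hypothetical residue numbers, one per atom

% Split on spaces and let MATLAB expand each start:step:end token.
tokens = strsplit('19:2:25 30:32 34', ' ');
wanted = cellfun(@str2num, tokens, 'UniformOutput', false);
wanted = [wanted{:}];

% Logical mask of the atoms whose residue number is in the requested set.
mask = ismember(resno, wanted);
selected = resno(mask);
```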

  • record {ATOM|HETATM}

      as('record HETATM',pdbStruct)
    

  • insertion {single-character}
    select by insertion code (iCode) of residues

      as('insertion A',pdbStruct)
      as('insertion A \s',pdbStruct)
    

"\s" selects the atoms whose insertion code is empty.

  • bval beta {<|<=|>|>=|=}{value}
    Select by temperature factor (B-factor)

      as('bval >40',pdbStruct)
      as('bval =40',pdbStruct)  
    

Note: Put a space between "bval" and the condition. For example, "bval>40" is not a valid selection.

  • resn restype {residue-names}

      as('resn ALA',pdbStruct)
      as('resn ALA TYR',pdbStruct)  
    

  • seq sequence {protein-sequence}  

      as('seq GGFFLRIHPDGRVD',pdbStruct)
      as('sequence GGFFLRIHPDGRVD',pdbStruct) 
    

  • chain c. {one-character-chain-ID}

      as('c. A',pdbStruct)
      as('c. A B',pdbStruct)
      as('c. \s',pdbStruct)
    

"\s" selects the atoms whose chain ID is empty.

  • segment segid {segids}

      as('segid PROA',pdbStruct)
      as('segment PROA WAT',pdbStruct)
    

  • atomnum atomicnumber {atom-indexes}
    Same syntax as resid
  • x y z {<|<=|>|>=|=}{value}
    Same syntax as bval

Single keywords

    as('protein',pdbStruct)
    as('all',pdbStruct)  

keywords:

  • all
  • protein
  • backbone
  • water wat
  • nucleic
  • het. HETATM

Selection Operator

    as('protein or water',pdbStruct)
    as('(protein and c. A) or water',pdbStruct)

  • and &
    Select the intersection of two selections

      as('resi 73 and name CA CB',pdbStruct)
      as('resi 73 and name CA CB and bval >10',pdbStruct)
    

  • or |
    Select the union of two selections

      as('protein or water',pdbStruct)
    

  • not
    Select all atoms not in selection

      as('not water',pdbStruct)
    

  • within {distance} of

      as('water within 4 of protein',pdbStruct)  
    

Call water sel1 and protein sel2. The command selects every atom in sel1 that is within 4 Angstroms of any atom in sel2.
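
The distance test behind "within" can be sketched in plain MATLAB on made-up coordinates. This is an illustration of the idea, not the package's code; in particular, whether the cutoff comparison is inclusive is an assumption.

```matlab
% Made-up coordinates (n-by-3, Angstroms).
sel1 = [0 0 0; 10 0 0; 0 3 0];    % e.g. water atoms
sel2 = [1 0 0; 0 0 8];            % e.g. protein atoms
cutoff = 4;

% Pairwise distances: row i = atom i of sel1, column j = atom j of sel2.
n1 = size(sel1, 1);
n2 = size(sel2, 1);
d = zeros(n1, n2);
for j = 1:n2
    delta = sel1 - repmat(sel2(j, :), n1, 1);
    d(:, j) = sqrt(sum(delta.^2, 2));
end

% Keep the sel1 atoms that lie within the cutoff of at least one sel2 atom.
keep = any(d <= cutoff, 2);
```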

  • ()
    Change the order in which a selection command is evaluated.
    Without parentheses, the following command selects only the O atoms of water.

      as('protein and name CA or water and name O',pdbStruct)  
    

With parentheses added, the command selects the CA atoms of protein and the O atoms of water:

    as('(protein and name CA) or (water and name O)',pdbStruct)  
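
The two results can be reproduced with logical masks in plain MATLAB. The sketch assumes that "as" applies "and" and "or" strictly from left to right when no parentheses are given, which is what the behavior described above implies; the per-atom flags are made up.

```matlab
% One entry per atom (made-up flags): two protein atoms, two water atoms.
isProtein = [true  true  false false];
isWater   = [false false true  true ];
isCA      = [true  false false false];
isO       = [false true  true  false];

% Left-to-right reading of 'protein and name CA or water and name O'.
leftToRight = ((isProtein & isCA) | isWater) & isO;

% Explicit grouping: '(protein and name CA) or (water and name O)'.
grouped = (isProtein & isCA) | (isWater & isO);
```

Here leftToRight keeps only the water O atom, while grouped keeps the protein CA atom and the water O atom.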

  • byres
    Extend selection to complete residues

      as('byres (protein within 4 of resi 73)',pdbStruct)
    

  • bychain
    Extend the selection to all atoms in the same chain.

      as('bychain resi 73',pdbStruct)  
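
The idea behind byres and bychain can be sketched with ismember on made-up per-atom labels. This is plain MATLAB for illustration only; the variable names are hypothetical and the package may implement the extension differently.

```matlab
% Made-up per-atom labels for six atoms.
resno = [72 72 73 73 74 74];
chain = {'A', 'A', 'A', 'A', 'B', 'B'};
seed  = logical([0 0 1 0 0 0]);     % initial selection: one atom of residue 73

% byres: keep every atom whose (chain, residue) pair appears in the seed.
key = strcat(chain, '_', arrayfun(@num2str, resno, 'UniformOutput', false));
byresMask = ismember(key, key(seed));

% bychain: keep every atom whose chain ID appears in the seed.
bychainMask = ismember(chain, chain(seed));
```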
    

Set attribute by selection

This section shows how to set the value of a field for a selected set of atoms.

asSetAttribute('protein',pdbStruct,'segment','PROA')

This changes the segment field of the protein atoms to "PROA".

asSetAttribute('all',pdbStruct,'bval',0)
newCABval = ones(1,144)*100;
asSetAttribute('protein and name CA',pdbStruct,'bval',newCABval)  

This sets the B-factor of the CA atoms to 100 and the B-factor of all other atoms to zero.
Note: Pass either a single value or exactly one value per selected atom.
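
The single-value and one-value-per-atom cases correspond to two standard MATLAB idioms for assigning into a field of a structure array. The sketch below shows them on a toy structure array with made-up values; it does not call asSetAttribute, and how that function assigns values internally is an assumption.

```matlab
% Toy structure array with made-up values.
atoms = struct('atmname', {'N', 'CA', 'C', 'CA'}, 'bval', {10, 20, 30, 40});
isCA  = strcmp({atoms.atmname}, 'CA');

% Single value: every atom receives the same number.
[atoms.bval] = deal(0);

% One value per selected atom: the count must match the selection size.
newVals = [100 200];
assert(numel(newVals) == nnz(isCA));
vals = num2cell(newVals);
[atoms(isCA).bval] = vals{:};
```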

Save a new PDB file

Save pdbStruct as a .pdb file:

createPDB(pdbStruct,'output_path.pdb')