This project is under development and solely for internal use. Many parts are in flux, and no guarantees about correctness or stability can be made.
bentopy uses Rust to speed up some I/O operations on large files, so a Rust compiler is required during installation. To check whether one is available, you can run
cargo --version
If it is not present, you can install it by any means you prefer. Installation through rustup is very convenient.
If you don't care about peeking into the sources and just want access to the program, this is the quickest option.
python3 -m venv venv && source venv/bin/activate # Not required, but often convenient.
pip3 install git+https://github.com/marrink-lab/bentopy
git clone https://github.com/marrink-lab/bentopy
cd bentopy
python3 -m venv venv && source venv/bin/activate
pip3 install .
bentopy currently features four subcommands: pack, render, mask, and grocat.
You can learn about the available options through the help information.
bentopy --help
bentopy pack --help
A typical bentopy workflow may look like this.
bentopy grocat -> bentopy mask -> bentopy pack -> bentopy render -> bentopy grocat
What follows is a brief explanation and example invocation of these subcommands. A more detailed walkthrough can be found in the example section.
pack provides the core functionality of bentopy. Given an input configuration file, a packing of the input structures within the specified space is created.
bentopy pack input.json --rearrange --seed 5172
Pack a system defined in input.json. Prior to packing, rearrange the specified structures according to a size heuristic to improve the possible density, and set the random seed to 5172.
This packing is stored as a placement list, which is a json file that describes which structures at what rotations are placed where. In order to create a structure file (and topology file) from this placement list that can be read by molecular visualization and simulation programs, the render subcommand can be used.
bentopy render placements.json structure.gro -t topol.top
Render placements.json created by pack to a gro file at structure.gro and write a topology file to topol.top.
To set up a configuration for pack, you must define a space into which the structures will be packed. This space can be defined according to an analytical function, such as a sphere. However, bentopy can also pack arbitrary spaces provided as voxel masks. Any boolean numpy array stored as a compressed file (.npz) of the correct dimensions can function as a valid mask.
The mask subcommand provides a convenient and powerful means of setting up such masks based on your existing structures from the command line. mask can be used to automatically or manually select different compartments as determined by mdvcontainment.
bentopy mask chrom_mem.gro mask.npz --autofill
Determine the compartments in chrom_mem.gro and automatically select the innermost compartment (--autofill). From that selected compartment, write a mask to mask.npz.
As the name suggests, grocat is a tool for concatenating gro files. Though this is a relatively simple operation, grocat provides a convenient way of telling apart different sections of large models by optionally specifying a new residue name for a whole file in the argument list by appending :<residue name> to a file path.
bentopy grocat chromosome.gro:CHROM membrane.gro:MEM -o chrom_mem.gro
Concatenate chromosome.gro and membrane.gro into chrom_mem.gro, setting the residue names of the chromosome atoms to CHROM and those of the membrane to MEM in the concatenated structure.
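The path:residue-name argument form can be illustrated with a short Python sketch. This is not grocat's own parser. The function name split_grocat_arg is hypothetical, and the sketch assumes that the residue name follows the last colon in the argument.

```python
from typing import Optional, Tuple


def split_grocat_arg(arg: str) -> Tuple[str, Optional[str]]:
    """Split 'path.gro:RESNAME' into (path, resname).

    Hypothetical helper for illustration; resname is None when no
    ':<residue name>' suffix is present.
    """
    path, sep, resname = arg.rpartition(":")
    if not sep:
        # No colon at all: the whole argument is the file path.
        return arg, None
    return path, resname


for arg in ["chromosome.gro:CHROM", "membrane.gro:MEM", "water.gro"]:
    print(split_grocat_arg(arg))
```

The third argument, water.gro, is a made-up file name that shows the case where no residue name is given.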
Let's try to pack a spherical system that is full of lysozyme structures. First, we want a structure to pack, so we can download the structure for 3LYZ. We place it in a structures directory to stay organized.
wget https://files.rcsb.org/download/3lyz.pdb
mkdir structures
mv 3lyz.pdb structures
Now we can set up our input configuration, which we will call 3lyz_input.json:
{
"space": {
"size": [100, 100, 100],
"resolution": 0.5,
"compartments": [
{
"id": "main",
"shape": "spherical"
}
]
},
"output": {
"title": "3lyz",
"dir": "output",
"topol_includes": [
"forcefields/forcefield.itp",
"structures/3lyz.itp"
]
},
"segments": [
{
"name": "3lyz",
"number": 6500,
"path": "structures/3lyz.pdb",
"compartments": ["main"]
}
]
}
We set the space up to a size of 100×100×100 nm, with a resolution of 0.5 nm. The mask—the volume that defines where structures can be placed—is set to be derived from a spherical analytical function.
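If you prefer to generate the configuration from a script, the following Python sketch builds the same structure and checks that every compartment a segment refers to is actually defined. The check and the function name undefined_compartments are our own illustration, not something bentopy provides, and the optional topol_includes field is left out for brevity.

```python
import json

config = {
    "space": {
        "size": [100, 100, 100],
        "resolution": 0.5,
        "compartments": [{"id": "main", "shape": "spherical"}],
    },
    "output": {"title": "3lyz", "dir": "output"},
    "segments": [
        {
            "name": "3lyz",
            "number": 6500,
            "path": "structures/3lyz.pdb",
            "compartments": ["main"],
        }
    ],
}


def undefined_compartments(config: dict) -> set:
    """Return compartment ids used by segments but not defined in the space."""
    defined = {c["id"] for c in config["space"]["compartments"]}
    used = {cid for seg in config["segments"] for cid in seg["compartments"]}
    return used - defined


missing = undefined_compartments(config)
if missing:
    raise SystemExit(f"undefined compartments: {sorted(missing)}")

# Serialize the checked configuration; this text can be saved as 3lyz_input.json.
text = json.dumps(config, indent=2)
print(text)
```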
If you want to use a custom mask, such as one created with bentopy mask, you can specify the compartment in the following manner.
"compartments": [
{
"id": "main",
- "shape": "spherical"
+ "voxels": { "path": "mask.npz" }
}
]
Here, voxels and the associated path point to a precomputed voxel mask. This mask can be any data that can be loaded by np.load() and interpreted as a three-dimensional boolean mask. The shape of the provided mask must equal the size given in the space section divided by the resolution.
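The size rule can be made concrete with a small Python sketch. The function name expected_mask_shape is hypothetical, and rejecting sizes that are not a whole multiple of the resolution is an assumption of this sketch, not documented bentopy behaviour.

```python
def expected_mask_shape(size, resolution):
    """Voxels per axis: the space size (nm) divided by the resolution (nm)."""
    shape = []
    for extent in size:
        voxels = extent / resolution
        # Assumption: a size must be a whole number of voxels.
        if abs(voxels - round(voxels)) > 1e-9:
            raise ValueError(
                f"{extent} nm is not a whole multiple of {resolution} nm"
            )
        shape.append(round(voxels))
    return tuple(shape)


# The space from the example configuration: 100 nm per axis at 0.5 nm resolution.
print(expected_mask_shape([100, 100, 100], 0.5))  # (200, 200, 200)
```

A mask for the example space would therefore need to be a 200×200×200 boolean array.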
Constraining compartments.
A compartment definition can also take a constraint parameter. Currently, only the axis predicate is available, which constrains all placements in that compartment such that only placements with the specified value for that axis are considered valid. The following example of a compartment definition accepts placements as valid if and only if the z-component of a placement is at 50 nm.
{
"id": "flat",
"constraint": "axis:z=50.0",
"shape": "cuboid"
}
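To make the constraint string format concrete, here is a Python sketch that parses an axis predicate and tests a position against it. It is an illustration only, not bentopy's implementation. The function names and the tolerance used for comparing floating point values are assumptions.

```python
def parse_axis_constraint(constraint: str):
    """Parse a string such as 'axis:z=50.0' into ('z', 50.0)."""
    predicate, sep, body = constraint.partition(":")
    if predicate != "axis" or not sep:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    axis, sep, value = body.partition("=")
    if axis not in ("x", "y", "z") or not sep:
        raise ValueError(f"malformed axis constraint: {constraint!r}")
    return axis, float(value)


def satisfies(position, constraint: str, tolerance: float = 1e-6) -> bool:
    """Check whether an (x, y, z) position in nm lies at the constrained value.

    The tolerance is an assumption made for this sketch.
    """
    axis, value = parse_axis_constraint(constraint)
    return abs(position["xyz".index(axis)] - value) <= tolerance


print(parse_axis_constraint("axis:z=50.0"))
print(satisfies((12.0, 80.5, 50.0), "axis:z=50.0"))
print(satisfies((12.0, 80.5, 49.0), "axis:z=50.0"))
```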
In output, we set a title and directory to write the placement list to. With the optional field topol_includes, we can specify which itp files are to be included if the placement list produced from this config is written to a topology file (.top).
Note
For this example, we filled this field with dummy paths.
Finally, in the segments section, we define a list of structures to place. In our case there is only one, which we give the name "3lyz", and we set the number of instances to place to 6500. The path tells pack where the structure file for this segment can be found.
Important
The name record must be selected carefully. If you want to write out a valid topology file using bentopy render, the value of name must correspond to the names in the itp files.
Constraining segment rotations and setting a center adjustment.
For some structures, it can be helpful or necessary to constrain the rotation of certain segments. The rotation_axes parameter takes a string with the axes over which a structure may be randomly rotated. The structure will not be rotated over any axis that is not mentioned. For instance, the axes definition "xyz" indicates full rotational freedom and is the tacit default (rotation is allowed over the x, y, and z axes), while "z" constrains the rotation such that it may only occur over the z-axis, leaving the x and y rotation as provided in the structure file.
The center parameter can be used to provide an offset in nm. When omitted, its default value is "auto, auto, auto", which defines the center as the geometric center of the structure. Any of the three values can be replaced by a floating point value, which sets an adjustment from the auto center.
See #24, which tracks the development of an additional keep parameter, which would respect the center for some axis as its zero-value in the structure file.
{
"name": "1a0s",
"number": 100,
"path": "structures/1a0s.pdb",
"rotation_axes": "z",
"center": "auto, auto, -1.2",
"compartments": ["flat"]
}
With the above segment definition, up to 100 instances of the structure will be placed in the compartment with the id "flat", with a -1.2 nm offset to its geometric center over the z-axis, while only allowing rotation over its z-axis.
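The way a center string combines with the geometric center can be sketched in Python. This reflects our reading of the description above, where a number is an offset added to the automatically determined center for that axis. The function name resolve_center and the toy coordinates are hypothetical.

```python
def resolve_center(center: str, positions):
    """Return the (x, y, z) center in nm for a string such as 'auto, auto, -1.2'.

    'auto' keeps the geometric center for that axis; a number is treated as
    an offset from it (an assumption based on the parameter description).
    """
    n = len(positions)
    geometric = [sum(p[i] for p in positions) / n for i in range(3)]
    fields = [field.strip() for field in center.split(",")]
    if len(fields) != 3:
        raise ValueError(f"expected three comma-separated values: {center!r}")
    resolved = []
    for base, field in zip(geometric, fields):
        offset = 0.0 if field == "auto" else float(field)
        resolved.append(base + offset)
    return tuple(resolved)


# Two toy atom positions whose geometric center is (1.0, 1.0, 1.0).
atoms = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
print(resolve_center("auto, auto, auto", atoms))
print(resolve_center("auto, auto, -1.2", atoms))
```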
Now, we are ready to pack the system. We could simply do this as follows.
bentopy pack 3lyz_input.json
In order to make the procedure deterministic, the --seed parameter can be set. This means that the same command will produce the same output between runs.
bentopy pack --seed 1312 3lyz_input.json
In case we want to pack multiple structures, we may want to pass the --rearrange flag as well. This will re-order the structures such that large structures are placed first, and small structures are placed last. This placement heuristic can lead to denser packings. When it is not set, the order of the structures in the input configuration is respected.
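The effect of the rearrange heuristic can be pictured as a sort from large to small. How bentopy measures the size of a structure is not specified here, so this Python sketch takes a precomputed size per segment as input. Both the function name and the size values are hypothetical.

```python
def rearrange(segments, sizes):
    """Order segments from large to small by a caller-supplied size measure.

    Python's sort is stable, so segments of equal size keep the order in
    which they appear in the configuration.
    """
    return sorted(segments, key=lambda seg: sizes[seg["name"]], reverse=True)


segments = [{"name": "3lyz"}, {"name": "1a0s"}]
# Hypothetical size measure per segment name, for illustration only.
sizes = {"3lyz": 1.0, "1a0s": 2.5}

print([seg["name"] for seg in rearrange(segments, sizes)])  # ['1a0s', '3lyz']
```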
After the command finishes, we will find that output/3lyz_placements.json has been created. This is a single-line json file, which can be hard to inspect. If you are curious, you can use a tool such as jq to look at what was written in a more readable form.
jq . output/3lyz_placements.json
The output may look like this (some lines have been cut and adjusted for legibility).
{
"title": "3lyz",
"size": [ 100, 100, 100 ],
"topol_includes": [ ... ],
"placements": [
{
"name": "3lyz",
"path": "structures/3lyz.pdb",
"batches": [
[
[
[ 1.0, 0.0, 0.0 ],
[ 0.0, 1.0, 0.0 ],
[ 0.0, 0.0, 1.0 ]
],
[
[ 8, 46, 68 ],
[ 26, 62, 88 ],
... many many more of such lines ...
]
],
[
[
[ 0.3658391780537972, -0.3882572475566672, -0.8458238619952991 ],
[ -0.8851693094147572, -0.4258733932991502, -0.18736901171236636 ],
[ -0.28746650147647396, 0.8172442490465064, -0.49947457185455224 ]
],
[
[ 31, 41, 56 ],
[ 61, 53, 4 ],
... many many more of such lines ...
]
]
... and on and on and on ...
]
}
]
}
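Based on the layout shown above, a placement list can also be summarized with a few lines of Python. The sketch assumes that each batch is a pair of a 3×3 rotation matrix followed by a list of positions, as in the excerpt. The function name count_placements is hypothetical.

```python
def count_placements(placement_list: dict) -> dict:
    """Count placed instances per segment name.

    Assumes each batch is [rotation_matrix, positions], as in the excerpt.
    """
    counts = {}
    for segment in placement_list["placements"]:
        total = 0
        for _rotation, positions in segment["batches"]:
            total += len(positions)
        counts[segment["name"]] = counts.get(segment["name"], 0) + total
    return counts


# A shortened stand-in for the placement list shown above.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
example = {
    "title": "3lyz",
    "size": [100, 100, 100],
    "placements": [
        {
            "name": "3lyz",
            "path": "structures/3lyz.pdb",
            "batches": [
                [identity, [[8, 46, 68], [26, 62, 88]]],
                [identity, [[31, 41, 56], [61, 53, 4]]],
            ],
        }
    ],
}

print(count_placements(example))  # {'3lyz': 4}
```

To apply this to a real file, load it with json.load() from output/3lyz_placements.json and pass the resulting dictionary to the function.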
render reads in the placement list and writes out a gro file (and optionally, a top topology file). This is a separate operation, since the packed systems can become very large. Storing the placement list as an intermediate product decouples the hard task of packing from the simple work of writing it into a structure file.
We want to render out the placement list we just created into a structure file called 3lyz_sphere.gro. Additionally, we would like to produce a topology file (topol.top) that Gromacs uses to understand how the structure file is built up.
bentopy render output/3lyz_placements.json 3lyz_sphere.gro -t topol.top
You can now inspect the 3lyz_sphere.gro structure in a molecular visualization program of your preference.
But beware! We just created a big structure, and some programs may have a hard time keeping up.
Luckily, bentopy render has some additional tricks up its sleeve to ease this load.
In case you want to inspect only a small part of a very large placement list, the --limits option allows you to select a cuboid within the volume defined by the placement list from which the placed structures will be rendered. The volume that is cut out is defined by a sequence of six comma-separated values in the order minx,maxx,miny,maxy,minz,maxz. If a value is a number, it is interpreted as a dimension in nm. If it is not a number (the phrase 'none' is conventional), no limit is set on that dimension.
For example, to only render a 10×10×10 nm cube extending from the point (40, 40, 40) to (50, 50, 50), we can pass the following limits.
bentopy render output/3lyz_placements.json 3lyz_small_cube.gro --limits 40,50,40,50,40,50
Perhaps we would like to see a pancake instead! To do this, we can define the limits only for the z-direction.
bentopy render output/3lyz_placements.json 3lyz_pancake.gro --limits none,none,none,none,45,55
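The limits format can be illustrated with a Python sketch that parses the six values and tests whether a point lies inside the selected cuboid. This is not the code render uses. Treating the bounds as inclusive and testing a single point per placement are assumptions of this sketch, and the function names are hypothetical.

```python
def parse_limits(limits: str):
    """Parse 'minx,maxx,miny,maxy,minz,maxz' into three (min, max) pairs.

    Any field that is not a number (for example 'none') becomes None,
    meaning that side is unbounded.
    """
    fields = limits.split(",")
    if len(fields) != 6:
        raise ValueError(f"expected six comma-separated values: {limits!r}")
    values = []
    for field in fields:
        try:
            values.append(float(field))
        except ValueError:
            values.append(None)
    return [(values[0], values[1]), (values[2], values[3]), (values[4], values[5])]


def within(point, bounds) -> bool:
    """Check whether an (x, y, z) point in nm lies inside the bounds (inclusive)."""
    for coord, (low, high) in zip(point, bounds):
        if low is not None and coord < low:
            return False
        if high is not None and coord > high:
            return False
    return True


pancake = parse_limits("none,none,none,none,45,55")
print(pancake)
print(within((10.0, 90.0, 50.0), pancake))
print(within((10.0, 90.0, 60.0), pancake))
```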
Using --limits, we can cut out a part of the packed structure, but perhaps you want to inspect the total structure without loading as many atoms. For this, you can try the --mode option, which gives you the ability to only render out certain atoms (backbone, alpha carbon) or beads (representing each residue, or even only one per structure instance). By default, the mode is full, and we have just seen its output. Let's try alpha now.
[!WARNING] Some of these options (backbone) may not be functional right now.
bentopy render output/3lyz_placements.json 3lyz_alpha.gro --mode alpha
Now, we can compare the sizes of the files.
wc -l 3lyz_sphere.gro 3lyz_alpha.gro
Reducing the number of atoms that are rendered out can shorten the time it takes to load and inspect a packing, if necessary.
[!NOTE] Using modes other than full (the default) is obviously not relevant beyond inspection and analysis of the packed structure. To reflect this, the option to write a topology file and setting a mode are mutually exclusive.
In case you want to render out a structure based on a placement list that you or a colleague have created in a different environment, it can be useful to direct render to read the input structures from a different directory. To do this, you can set a root path for the structures with the --root option. This path will be prepended to any relative structure path that is defined in the placement list.
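The effect of a root path on the structure paths can be sketched in Python as follows. The function name resolve_structure_path and the directory /data/shared are hypothetical, and leaving absolute paths untouched follows from the statement that only relative paths are affected.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_structure_path(path: str, root: Optional[str] = None) -> str:
    """Prepend root to a relative structure path; keep absolute paths as they are."""
    structure = PurePosixPath(path)
    if root is None or structure.is_absolute():
        return str(structure)
    return str(PurePosixPath(root) / structure)


print(resolve_structure_path("structures/3lyz.pdb"))
print(resolve_structure_path("structures/3lyz.pdb", root="/data/shared"))
print(resolve_structure_path("/opt/structures/3lyz.pdb", root="/data/shared"))
```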