de.bioforscher.jstructure is a light-weight library for structural bioinformatics. It provides a stream-based API to work with macromolecular structures in PDB format.
de.bioforscher.jstructure hierarchy is close to that BioJava of: a Structure
contains any
number of Chain
objects which contain any number of Group
objects which
contain any number of Atom
objects. Each element of this hierarchy implements
a Container
interface, so it is possible to retrieve all registered children
in a standardized manner. A Group
is an AtomContainer
, so it provides
a function Stream<Atom> atoms()
which is used to get access to all Atoms
linked to this particular Group
. A Chain
is a GroupContainer
as well
as an AtomContainer
, thus one can retrieve all Group
objects by calling
Stream<Group> groups()
and all Atom
objects by Stream<Atom> atoms()
.
Selection
provides a more fine-grained retrieval of child elements in a nature similar to
that of a step-wise builder and will recognize the level of the selection.
Furthermore, each element is aware of its parent container. This reference is
automatically set, when a children element (such as an Atom
) is added to a
parent container (in that case a Group
). This allows for individual atoms
to return their PDB representation as an ATOM record (which depends
information on the parent Group
as well as the parent Chain
). In
consequence every element of the hierarchy can compose its PDB representation
by gathering all ATOM records of all Atom
objects associated to it.
To any of these classes arbitrary data can be attached to.
// fetch a structure or load from local PDB if setup
Structure structure = StructureParser.fromPdbId("1brr").parse();
// select a chain
Chain chainB = structure.select()
.chainId("B")
.asChain();
// or a residue
AminoAcid aminoAcid1 = chainB.select()
.residueNumber(60)
.asAminoAcid();
// and another one
AminoAcid aminoAcid2 = chainB.select()
.residueNumber(100)
.asAminoAcid();
// compute their distance
System.out.println("distance of " + aminoAcid1 + " and " + aminoAcid2 + ": " +
StandardFormat.format(aminoAcid1.calculate()
.centroid()
.distance(aminoAcid2.calculate()
.centroid())));
// access amino acid-specific atoms
chainB.select()
.aminoAcids()
.groupName("TRP")
.asFilteredGroups()
.map(Tryptophan.class::cast)
.map(tryptophan -> tryptophan + " CG position: " +
Arrays.toString(tryptophan.getCg().getCoordinates()))
.forEach(System.out::println);
// compute features on-the-fly and resolve dependencies
// e.g. assign some random value to each amino acid
structure.aminoAcids()
.forEach(aminoAcid -> aminoAcid.getFeatureContainer().addFeature(new Feature(new Random().nextDouble())));
chainB.aminoAcids()
.map(aminoAcid -> aminoAcid + " random feature: " +
StandardFormat.format(aminoAcid.getFeature(Feature.class).getValue()))
.forEach(System.out::println);
System.out.println("averages among chains:");
structure.chainsWithAminoAcids()
.map(chain -> chain.getChainIdentifier() + "'s average random feature: " +
StandardFormat.format(chain.aminoAcids()
.map(aminoAcid -> aminoAcid.getFeature(Feature.class))
.mapToDouble(Feature::getValue)
.average()
.getAsDouble()))
.forEach(System.out::println);
Several FeatureProvider
implementations are provided which allow the computation or
annotation of values such as secondary structure information, accessible surface area values,
membrane topology, evolutionary information or UniProt data.
Dependencies between them are resolved automatically and the user can request features which will be computed on-the-fly, when they are not already present.