
Rust wrapper for the RDKit using CFFI

Primary language: C++. License: MIT.

rdkitcffi

This is an 🚧 experimental 🚧 Rust wrapper for some functionality of the great RDKit cheminformatics library.

It uses RDKit's new C Foreign Function Interface (CFFI); see also this blog post.

Use it at your own risk; it is not yet recommended for production use :-)

Note that RDKit exposes only limited functionality via CFFI, and not all of it is available through this interface yet.
Have a look at the examples below and the test functions.

There are still some dependencies on specific versions of Boost and RDKit (some headers and the shared library) that require some manual work; see also the installation section.
Currently, those dependencies are added directly to the repository for the sake of simplicity. This should eventually be done in a better, more dynamic way.

Currently, only Linux is supported; however, support for macOS should also be viable.

Examples

Basic usage:

use rdkitcffi::Molecule;

let smiles = "OCCC#CO";
let mol = Molecule::new(smiles, "").unwrap();

let natoms = mol.get_numatoms();

Additional arguments can be passed via JSON:

use rdkitcffi::Molecule;

let json_args = "{\"removeHs\":false,\"canonical\":false}";
let mol = Molecule::new("c1cc(O[H])ccc1", json_args).unwrap();
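
Hand-escaping quotes in the JSON argument string is easy to get wrong. The following is a minimal sketch of two ways to avoid it using only the standard library: a raw string literal, and a small helper that assembles the string from boolean flags. The helper `json_flags` is hypothetical and not part of rdkitcffi; it assumes the keys are plain identifiers that need no JSON escaping.

```rust
/// Builds a JSON object string from boolean flags, e.g. {"removeHs":false}.
/// Hypothetical helper, not part of rdkitcffi. It assumes keys are plain
/// ASCII identifiers, so no JSON escaping is performed.
fn json_flags(flags: &[(&str, bool)]) -> String {
    let body: Vec<String> = flags
        .iter()
        .map(|(key, value)| format!("\"{}\":{}", key, value))
        .collect();
    format!("{{{}}}", body.join(","))
}

fn main() {
    // A raw string literal needs no backslash escaping.
    let raw = r#"{"removeHs":false,"canonical":false}"#;
    // The same string, assembled from key/value pairs.
    let built = json_flags(&[("removeHs", false), ("canonical", false)]);
    assert_eq!(raw, built);
    println!("{}", built);
}
```

Either string could then be passed as the second argument to `Molecule::new`, as in the example above.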

Working with SD files and filtering invalid molecules:

use rdkitcffi::{Molecule,read_sdfile};

// each record that could not be parsed is represented as None
let mol_opt_list: Vec<Option<Molecule>> = read_sdfile("data/test.sdf");
// keep only the valid molecules
let mut mol_list: Vec<Molecule> = mol_opt_list.into_iter().flatten().collect();
mol_list.iter_mut().for_each(|m| m.remove_all_hs());

Dealing with invalid molecules:

use rdkitcffi::Molecule;

let result = Molecule::new("OCCO", "");
match result {
   Some(m) => println!("Result: {:?}", m),
   None => println!("Could not get molecule!"),
};
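
When processing many inputs, it can help to record which ones failed instead of only printing a message. The sketch below separates successes from failures for any parser that returns an `Option`, which is the shape `Molecule::new` has in the example above. To keep it runnable without RDKit, it uses a stand-in parser, `toy_parse`, which is purely illustrative and does not validate chemistry; `parse_all` is likewise a hypothetical helper, not part of rdkitcffi.

```rust
/// Applies `parse` to every input and separates successes from failures.
/// Returns the parsed values and the input strings that were rejected.
fn parse_all<T>(
    inputs: &[&str],
    parse: impl Fn(&str) -> Option<T>,
) -> (Vec<T>, Vec<String>) {
    let mut parsed = Vec::new();
    let mut failed = Vec::new();
    for &input in inputs {
        match parse(input) {
            Some(value) => parsed.push(value),
            None => failed.push(input.to_string()),
        }
    }
    (parsed, failed)
}

/// Stand-in parser so the sketch runs without RDKit: it accepts any
/// non-empty string without whitespace. It does NOT check chemistry.
fn toy_parse(s: &str) -> Option<String> {
    if s.is_empty() || s.contains(char::is_whitespace) {
        None
    } else {
        Some(s.to_string())
    }
}

fn main() {
    let (ok, bad) = parse_all(&["OCCO", "", "CC O"], toy_parse);
    println!("parsed {} inputs, rejected {:?}", ok.len(), bad);
}
```

With the real crate, the closure would presumably be something like `|s| Molecule::new(s, "")`.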

Getting a JSON representation (via serde_json):

use rdkitcffi::Molecule;

let mol = Molecule::new("OCCO", "").unwrap();
println!("json: {:?}", mol.get_json(""));

Neutralizing a zwitterion:

use rdkitcffi::Molecule;

let mut mol = Molecule::new("C(C(=O)[O-])[NH3+]", "").unwrap();
mol.neutralize("");
println!("{:?}", mol.get_smiles(""));

Computing RDKit descriptors:

use rdkitcffi::Molecule;

let mol = Molecule::new("CCCN", "").unwrap();
let desc = mol.get_descriptors_as_dict();
let nrot = desc.get("NumRotatableBonds");
let logp = desc.get("CrippenClogP");
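
A lookup by name returns nothing if the descriptor name is misspelled or unavailable, so it is worth handling the missing case explicitly. The sketch below assumes the descriptor dictionary behaves like a `HashMap` keyed by descriptor name; the `f64` value type, the helper `pick`, and the numbers in `main` are placeholders for illustration, not RDKit output.

```rust
use std::collections::HashMap;

/// Looks up several descriptor names at once.
/// Returns the (name, value) pairs that were found and the names that were not.
/// Hypothetical helper; the f64 value type is an assumption.
fn pick<'a>(
    desc: &HashMap<String, f64>,
    names: &[&'a str],
) -> (Vec<(&'a str, f64)>, Vec<&'a str>) {
    let mut found = Vec::new();
    let mut missing = Vec::new();
    for &name in names {
        match desc.get(name) {
            Some(&value) => found.push((name, value)),
            None => missing.push(name),
        }
    }
    (found, missing)
}

fn main() {
    // Placeholder values standing in for a real descriptor dictionary.
    let mut desc = HashMap::new();
    desc.insert("NumRotatableBonds".to_string(), 1.0);
    desc.insert("CrippenClogP".to_string(), 0.25);

    let (found, missing) = pick(&desc, &["NumRotatableBonds", "NoSuchDescriptor"]);
    println!("found: {:?}, missing: {:?}", found, missing);
}
```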

Creating a polars dataframe:

use rdkitcffi::Molecule;
use polars::prelude::*;
use polars::df;

let mol_list: Vec<Molecule> = rdkitcffi::read_smifile_unwrap("data/test.smi");
let a: Vec<_> = mol_list.iter().map(|m| m.get_smiles("")).collect();
let df = df!( "smiles" => a).unwrap();

Installation

Currently, only Linux is supported.
In some cases you may also have to install some additional packages:

sudo apt-get install build-essential
sudo apt-get install libclang-dev

Download the repo:

git clone https://github.com/chrissly31415/rdkitcffi.git  

If you have a Rust/cargo installation, just run:

cd rdkitcffi
cargo build  
cargo test --lib  

After installation, update your LD_LIBRARY_PATH so that binaries can run without cargo, e.g.:

export LD_LIBRARY_PATH=/home/username/rdkitcffi/lib/rdkitcffi_linux/linux-64/:$LD_LIBRARY_PATH
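
If a binary fails to start because the shared library cannot be found, a quick check of the environment can narrow down the cause. The following is a small diagnostic sketch, assuming Linux and the example directory from the export line above; `path_list_contains` is a hypothetical helper, not part of rdkitcffi.

```rust
use std::env;
use std::path::Path;

/// Returns true if `dir` is one of the entries of a PATH-style list.
/// Paths are compared by components, so a trailing slash does not matter.
fn path_list_contains(path_list: &str, dir: &str) -> bool {
    env::split_paths(path_list).any(|entry| entry == Path::new(dir))
}

fn main() {
    // Example directory taken from the export line; adjust to your checkout.
    let dir = "/home/username/rdkitcffi/lib/rdkitcffi_linux/linux-64";
    let current = env::var("LD_LIBRARY_PATH").unwrap_or_default();
    if path_list_contains(&current, dir) {
        println!("LD_LIBRARY_PATH contains {}", dir);
    } else {
        println!("LD_LIBRARY_PATH is missing {}", dir);
    }
}
```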