bedtools-like functionality for interval sets in Rust

Primary language: Rust. License: MIT.

bedrs

Summary

This is an interval library written in Rust. It uses the trait system, generics, monomorphization, and procedural macros to provide efficient interval operations along with quality-of-life features for developers.

It centers on the [Coordinates] trait, which, once implemented on an arbitrary interval type, allows for a wide range of genomic interval arithmetic.

It also introduces a new collection type, [IntervalContainer], which acts as a collection of [Coordinates] and has many set operations implemented.

Interval arithmetic can be thought of as set theoretic operations (like intersection, union, difference, complement, etc.) on intervals with associated chromosomes, strands, and other genomic markers.

This library facilitates the development of these types of operations on arbitrary types and lets the user tailor their structures to minimize computational overhead, but also remains a flexible library for general interval operations.

Usage

The main benefit of this library is that it is trait-based. You can define your own types, and as long as they implement the [Coordinates] trait they can use the other functions within the library.

For detailed usage and examples please review the documentation.

[Coordinates] Trait

The library centers around the [Coordinates] trait.

This trait defines some minimal functions that are required for all set operations. This includes things like getting the chromosome ID of an interval, or the start and endpoints of that interval, or the strand.

The trait can be implemented by hand. Alternatively, if your struct follows the field naming conventions used in the library (chr, start, end, strand), you can add #[derive(Coordinates)] to your custom interval type.

use bedrs::prelude::*;

// define a custom interval struct for testing
#[derive(Default, Coordinates)]
struct MyInterval {
    chr: usize,
    start: usize,
    end: usize,
}
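
To illustrate why a trait-based design is useful, here is a minimal, standard-library-only sketch. The `Coords` trait, its `overlaps` method, and the `Gene` and `Peak` types are hypothetical stand-ins invented for this example; they are not the bedrs definitions. The sketch assumes half-open intervals, `[start, end)`.

```rust
// Hypothetical stand-in for a coordinates trait; NOT the bedrs definition.
trait Coords {
    fn chr(&self) -> usize;
    fn start(&self) -> usize;
    fn end(&self) -> usize;

    // Provided method: every implementor gets it for free.
    // Intervals are assumed to be half-open: [start, end).
    fn overlaps<O: Coords>(&self, other: &O) -> bool {
        self.chr() == other.chr()
            && self.start() < other.end()
            && other.start() < self.end()
    }
}

// Two unrelated user-defined types carrying different extra data.
#[allow(dead_code)]
struct Gene {
    chr: usize,
    start: usize,
    end: usize,
    name: &'static str,
}

#[allow(dead_code)]
struct Peak {
    chr: usize,
    start: usize,
    end: usize,
    score: f64,
}

impl Coords for Gene {
    fn chr(&self) -> usize { self.chr }
    fn start(&self) -> usize { self.start }
    fn end(&self) -> usize { self.end }
}

impl Coords for Peak {
    fn chr(&self) -> usize { self.chr }
    fn start(&self) -> usize { self.start }
    fn end(&self) -> usize { self.end }
}
```

Because `overlaps` is written once against the trait, a `Gene` can be compared with a `Peak` without either type knowing about the other, and monomorphization compiles a specialized copy for each pair of types.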

Interval Types

While you can create your own interval types, there are plenty of 'batteries-included' types you can use in your own libraries already.

These include:

  • [Bed3]
  • [Bed4]
  • [Bed6]
  • [Bed12]
  • [BedGraph]
  • [Gtf]
  • [MetaInterval]
  • [StrandedBed3]

These are pre-built interval types and can be used in many use cases:

use bedrs::prelude::*;

// An interval on chromosome 1 and spanning base 20 <-> 40
let a = Bed3::new(1, 20, 40);

// An interval on chromosome 1 and spanning base 30 <-> 50
let b = Bed3::new(1, 30, 50);

// Find the intersecting interval of the two
// This returns an Option<Bed3> because they may not intersect.
let c = a.intersect(&b).unwrap();

assert_eq!(c.chr(), &1);
assert_eq!(c.start(), 30);
assert_eq!(c.end(), 40);
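
For readers curious about what an intersection like the one above computes, the following standard-library-only sketch shows one way to do it. The `Iv` struct and the free function `intersect` are hypothetical names for this illustration and are not part of bedrs; the sketch assumes half-open intervals, so intervals that only touch at an endpoint do not intersect.

```rust
// Hypothetical minimal interval type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Iv {
    chr: usize,
    start: usize,
    end: usize,
}

// Intersection of two half-open intervals [start, end).
// Returns None if they lie on different chromosomes or do not overlap.
fn intersect(a: &Iv, b: &Iv) -> Option<Iv> {
    if a.chr != b.chr {
        return None;
    }
    // The shared region starts at the later start and ends at the earlier end.
    let start = a.start.max(b.start);
    let end = a.end.min(b.end);
    if start < end {
        Some(Iv { chr: a.chr, start, end })
    } else {
        None
    }
}
```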

Interval Operations

  • [Overlap]
  • [Distance]
  • [Intersect]
  • [Segment]
  • [Subtract]
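
As a rough picture of what pairwise operations such as distance and subtraction involve, here is a standard-library-only sketch. The `Iv` type and the functions `distance` and `subtract` are hypothetical and written for this example; the signatures and return types in bedrs may differ. Intervals are assumed to be half-open, so touching intervals have a distance of 0 in this sketch.

```rust
// Hypothetical minimal interval type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Iv {
    chr: usize,
    start: usize,
    end: usize,
}

// Gap between two intervals; 0 if they overlap or touch,
// None if they are on different chromosomes.
fn distance(a: &Iv, b: &Iv) -> Option<usize> {
    if a.chr != b.chr {
        return None;
    }
    if a.end <= b.start {
        Some(b.start - a.end)
    } else if b.end <= a.start {
        Some(a.start - b.end)
    } else {
        Some(0)
    }
}

// Remove the part of `a` covered by `b`.
// The result has zero, one, or two pieces.
fn subtract(a: &Iv, b: &Iv) -> Vec<Iv> {
    // No overlap (or different chromosome): `a` is returned unchanged.
    if a.chr != b.chr || b.end <= a.start || b.start >= a.end {
        return vec![*a];
    }
    let mut out = Vec::new();
    // Piece of `a` to the left of `b`.
    if a.start < b.start {
        out.push(Iv { chr: a.chr, start: a.start, end: b.start });
    }
    // Piece of `a` to the right of `b`.
    if b.end < a.end {
        out.push(Iv { chr: a.chr, start: b.end, end: a.end });
    }
    out
}
```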

Interval Set Operations

Set operations are performed using the methods of the [IntervalContainer].

We can build an [IntervalContainer] easily on any collection of intervals:

use bedrs::prelude::*;

let set = IntervalContainer::new(vec![
    Bed3::new(1, 20, 30),
    Bed3::new(1, 30, 40),
    Bed3::new(1, 40, 50),
]);

assert_eq!(set.len(), 3);

The available set operations are listed below. For details on each of them, please explore the [IntervalContainer] documentation and its associated methods.

  • Bound
  • Closest
  • Complement
  • Find
  • Internal
  • Merge
  • Sample
  • Intersect
  • Segment
  • Subtract
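
To give a sense of how a set operation such as merge can work, here is a standard-library-only sketch of the usual sort-then-sweep approach. The `Iv` type and the `merge` function are hypothetical and are not the bedrs implementation. In this sketch, intervals that touch end-to-start are merged as well; that is a choice made for the example.

```rust
// Hypothetical minimal interval type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Iv {
    chr: usize,
    start: usize,
    end: usize,
}

// Merge overlapping or touching intervals on the same chromosome.
fn merge(mut ivs: Vec<Iv>) -> Vec<Iv> {
    // Sort by chromosome, then start, so candidates for merging are adjacent.
    ivs.sort_by_key(|iv| (iv.chr, iv.start, iv.end));

    let mut out: Vec<Iv> = Vec::new();
    for iv in ivs {
        if let Some(last) = out.last_mut() {
            // Extend the current run if `iv` starts inside or at its end.
            if last.chr == iv.chr && iv.start <= last.end {
                last.end = last.end.max(iv.end);
                continue;
            }
        }
        // Otherwise start a new run.
        out.push(iv);
    }
    out
}
```

Sorting dominates the cost, so the sketch runs in O(n log n) time for n intervals; on input that is already sorted, the sweep itself is a single linear pass.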

Other Work

This library is heavily inspired by other interval libraries in Rust, which are listed below:

It was also motivated by the following interval toolkits in C++ and C respectively: