/bioclojure

Seed of a bioclojure library

Primary LanguageClojure

BioClojure

BioClojure is (the seed of) a collection of functions that provides bioinformatics functionality to the clojure programming language. Clojure is a young functional programming language that runs on the JVM. BioClojure uses incanter to provide R-like statistical functionality.

Initial focus

Initial focus is next-generation sequencing and sequence variation. This includes:

  • handling FASTQ and bam files (todo; probably interfacing with Picard)
  • handling VCF files (done)

Installation

Either just download the latest jar file, or compile the jar yourself:

lein deps
lein compile
lein uberjar

“lein deps” will automatically download all dependencies.

Example

Convert VCF file to tab-delimited

This library includes a function to convert VCF files to tab-delimited format. This conversion includes all fields that are defined in the file; no information is omitted (i.e. all INFO tags are included).

Caution: this does not work for VCFv4 yet. That version allows for genotype fields to be dropped entirely. (Is simple to implement, but no time)

./scripts/vcf2tsv NA12878.vcf > NA12878.tsv

Load VCF file into incanter and show histogram of quality scores

cljr repl
> (use 'bioclojure)
> (ns bioclojure)
> (def a (load-vcf "./data/sample.vcf"))
> (with-data a
>   (view (histogram ($ :QUAL))))

Example histogram

What about log-scales?

At the moment incanter charts don’t provide logarithmic scales yet. To do this yourself, do

> (with-data a
>   (view (histogram (map #(log %) ($ :QUAL)))))

Dependencies

Are automatically installed with “lein deps”

  • clojure-contrib
  • incanter

Disclaimers

  • As I’m completely new to clojure, this library will be slow to grow, at least initially.
  • As I don’t use clojure (much) in real-life work (yet), this library will be slow to grow, at least initially.
  • As I have to focus on real work, focus will be what I need: NGS.

Copyright © 2010 Jan Aerts
Licensed under the same terms as clojure