An Introduction To Applied Bioinformatics

Bioinformatics, as I see it, is the application of the tools of computer science (things like programming languages, algorithms, and databases) to address biological problems (for example, inferring the evolutionary relationship between a group of organisms based on fragments of their genomes, or understanding if or how the community of microorganisms that live in my gut changes if I modify my diet). Bioinformatics is a rapidly growing field, largely in response to the vast increase in the quantity of data that biologists now grapple with. Students from varied disciplines (e.g., biology, computer science, statistics, and biochemistry) and stages of their educational careers (undergraduate, graduate, or postdoctoral) are becoming interested in bioinformatics.

I teach bioinformatics at the undergraduate and graduate levels at Northern Arizona University. This repository contains some of the materials that I've developed in these courses, and represents an initial attempt to organize these materials in a standalone way. If you'd like to read a little more about the project, see my blog post on microbe.net.

Disclaimer

This project is at a very early stage of development. It's not ready for prime-time by any means, but I fall firmly into the "publish early, publish often" mindset, hence its public availability. I am very interested in feedback in the form of email (gregcaporaso@gmail.com) or pull requests.

The code in the iab module is not sufficiently tested, documented, or optimized for production use. As code reaches those quality standards it will be ported to scikit-bio. I do not recommend using the code in the iab module outside of these notebooks. In other words, don't import iab outside of the notebooks; if you want access to the functionality in your own code, import skbio instead.

Currently, the best example of where I'm hoping to go with these materials is the multiple sequence alignment chapter.

Outline

To browse the book, start here.

Getting started
Fundamentals
Pairwise alignment (contains an exercise)
Database searching and determining the statistical significance of an alignment
Phylogeny reconstruction: distances, distance matrices and hierarchical clustering with UPGMA
Multiple sequence alignment (contains an exercise)
Read mapping and clustering
Applications
Studying biological diversity

How to read the book

There are two ways to read An Introduction To Applied Bioinformatics:

The recommended way to read the book is to download and run the IPython notebooks interactively. You can do this by cloning the GitHub repository, installing the package and its dependencies, and launching the IPython Notebook. Instructions for doing this are provided below in the Installation section.
The easiest way to read the book is to view the static notebooks online using nbviewer. You should start here.

If you're new to using IPython or the IPython Notebook, you can find more information at the IPython website, IPython Notebook website, and the IPython Notebook example gallery.

Installation

If you're going to read the book interactively (recommended), you'll need to clone this repository, install some dependencies, and launch the IPython Notebook. For example, the following commands should work for Linux and Mac OS X users:

git clone https://github.com/gregcaporaso/An-Introduction-To-Applied-Bioinformatics.git
cd An-Introduction-To-Applied-Bioinformatics
pip install numpy
pip install .

Finally, launch the IPython Notebook to get started (be sure that you're in the An-Introduction-To-Applied-Bioinformatics directory when you run this command):

ipython notebook --pylab inline Index.ipynb

That's it!

If you'd like to install the book's dependencies manually (or some other way than using pip), here's what you'll need:

Python 2.7
numpy >= 1.7
scipy >= 0.13.0
matplotlib >= 1.1.0
pandas >= 0.13.1
IPython >= 2.0.0
tornado
pyzmq
jinja2
scikit-bio == 0.1.4
biom-format < 2.0.0 (temporarily, soon to be 2.0.0)
pyqi
future

Note that even if you have all of the above dependencies installed, you should still run pip install . as there is a small, required codebase included with the book (the iab module).
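If you install the dependencies manually, it can be handy to confirm that what you have meets the minimum versions listed above. The following is a minimal sketch of such a check, not part of the book's codebase: the names `MINIMUM_VERSIONS`, `version_tuple`, `meets_minimum`, and `check_dependencies` are hypothetical, only the packages with a ">=" bound are covered, and it assumes each package exposes a `__version__` attribute (a common convention, but not a guarantee).

```python
import importlib
import re

# Minimum versions copied from the dependency list above. Only packages
# listed with a ">=" bound are included in this sketch.
MINIMUM_VERSIONS = {
    "numpy": "1.7",
    "scipy": "0.13.0",
    "matplotlib": "1.1.0",
    "pandas": "0.13.1",
    "IPython": "2.0.0",
}


def version_tuple(version):
    """Turn a version string such as '0.13.1' into (0, 13, 1).

    Only the leading digits of each dot-separated piece are kept, so
    '1.8.0rc1' becomes (1, 8, 0). This is a simplification, not a
    complete version parser.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that '1.7' and '1.7.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return a dict mapping each package name to a status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        # Assumption: the package exposes its version as __version__.
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[name] = "unknown version"
        elif meets_minimum(installed, minimum):
            report[name] = "ok (%s)" % installed
        else:
            report[name] = "too old (%s < %s)" % (installed, minimum)
    return report


if __name__ == "__main__":
    for name, status in sorted(check_dependencies().items()):
        print("%s: %s" % (name, status))
```

Saving this as a script and running it with the same Python interpreter you use for the notebooks prints one status line per package. It does not check the pinned or upper-bounded requirements (scikit-bio and biom-format), which would need an exact or "less than" comparison instead.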

More information

These materials are primarily being developed by Greg Caporaso (GitHub: @gregcaporaso) in the Caporaso Lab at Northern Arizona University. You can find information on the courses I teach on my teaching website and information on my research and lab on my lab website.

See the repository's contributors page for information on who has contributed to the project.

Acknowledgements

Development of An Introduction to Applied Bioinformatics was supported in part by Arizona's Technology and Research Initiative Fund. The style of the project was inspired by Bayesian Methods for Hackers.

I want to thank the IPython Developers for all of their work on the IPython Notebook, as well as the QIIME developers and scikit-bio developers for the countless discussions over the years that helped me develop my understanding of the material presented here. This project wouldn't be possible without all of you, and I look forward to many more years of productive, fun and exciting work together!

License

An Introduction to Applied Bioinformatics by The Caporaso Laboratory is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://github.com/gregcaporaso/An-Introduction-To-Applied-Bioinformatics.
