bioplotz

A plot package for bioinformatics

Primary language: Python. License: BSD 2-Clause "Simplified" (BSD-2-Clause).

Introduction

bioplotz is a Python package for drawing common bioinformatics figures: Manhattan plots, chromosome plots, gene cluster plots, and multiple alignment plots.

Dependencies

Python modules:
    numpy
    matplotlib
    pandas

Installation

Install via pip

pip install bioplotz

Install from source code

pip install git+https://github.com/sc-zhang/bioplotz.git --user

Usage

Manhattan Plot

import bioplotz as bp

fig, ax = bp.manhattan(data, threshold=0, color=['orange', 'green'], threshold_line_color='blue', log_base=0,
                       reverse=False, xtick_labels=True, ytick_labels=True, ax=None, marker='.', s=1, **kwargs)

| parameter | value type | explanation |
|---|---|---|
| data | dict or list | dict: key is the block name, value is [[x1, x2, ..., xn], [y1, y2, ..., yn]]; list: a list like [[x1, y1], [x2, y2], ..., [xn, yn]] |
| threshold | value or list | value: use when only one threshold line is plotted; list: use when more than one threshold line is plotted, like [threshold_value1, threshold_value2]. Note: if log_base is set, threshold values must be transformed manually with the same log_base; if reverse is True, threshold values must be set to their opposite numbers |
| color | list | colors used for the blocks; if there are more blocks than colors, the colors are reused cyclically |
| threshold_line_color | value or list | value: use when threshold is a single value; list: use when threshold is a list |
| threshold_line_width | value | the line width of the threshold lines |
| block_line_width | value | if there is only one color, each block is drawn with a border; this parameter sets the border width |
| log_base | value | log_base = 0: values are not log-transformed; log_base != 0: values are log-transformed with this base |
| reverse | Boolean | if all data are lower than 0, set it to True to show the opposite values |
| other parameters | value | same as the parameters of pyplot.scatter |
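
The two accepted shapes of data and the note on thresholds can be illustrated with a small sketch. It uses only the standard library and does not call bioplotz; the records and the helper names to_block_dict and transform_threshold are hypothetical, and the threshold handling is one reading of the note above (transform with the same log_base, then negate when reverse is True).

```python
import math
from collections import defaultdict

# Hypothetical input records: (block name, position, p-value).
records = [
    ("Chr1", 100, 1e-3),
    ("Chr1", 250, 1e-9),
    ("Chr2", 80, 0.5),
]


def to_block_dict(rows):
    """Group rows into {block name: [[x1, ..., xn], [y1, ..., yn]]}."""
    blocks = defaultdict(lambda: [[], []])
    for block, x, y in rows:
        blocks[block][0].append(x)
        blocks[block][1].append(y)
    return dict(blocks)


def transform_threshold(value, log_base=0, reverse=False):
    """Put a raw threshold on the same scale as the transformed values.

    Assumption: log_base != 0 means a logarithm with that base, and
    reverse=True means the sign is flipped afterwards.
    """
    if log_base != 0:
        value = math.log(value, log_base)
    return -value if reverse else value


data = to_block_dict(records)
# A p-value cutoff of 5e-8 expressed as -log10(p).
threshold = transform_threshold(5e-8, log_base=10, reverse=True)
```

Under these assumptions, the results could be passed on as bp.manhattan(data, threshold=threshold, log_base=10, reverse=True).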

Chromosome Plot

import bioplotz as bp

fig, ax, clb = bp.chromosome(chr_len_db, chr_order, bed_data, centro_pos, value_type="numeric", orientation="vertical", **kwargs)

| parameter | value type | optional | default | explanation |
|---|---|---|---|---|
| chr_len_db | dict | No | - | key: chromosome name; value: chromosome length |
| chr_order | list | Yes | None | the custom chromosome order, like ["Chr1", "Chr3", "Chr2"]; the names must match the keys in chr_len_db |
| bed_data | list | Yes | None | two-dimensional list, like [[chromosome name, start pos, end pos, value/color]] |
| centro_pos | dict | Yes | None | key: chromosome name; value: middle position of the centromere |
| value_type | str | Yes | numeric | numeric: the 4th column of bed_data is a value; color: the 4th column of bed_data is a color; marker: unlike the other two types, bed_data needs 5 columns, where the 4th column is the marker (same as the marker parameter of pyplot.scatter) and the 5th column is the color |
| orientation | str | Yes | vertical | "vertical" or "horizontal" |
| cmap | str | Yes | gist_rainbow | colormap used for the colorbar |
| cmap_parts | int | Yes | 100 | number of parts the colormap is split into |
| s | float or array-like, shape (n,) | Yes | None | same as the parameter s of pyplot.scatter |
| other parameters | value | Yes | None | same as the parameters of pyplot.plot |

  • If value_type is numeric, the returned clb is the colorbar; otherwise it is None.
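
As an illustration of the input shapes described above, the following sketch builds bed_data for value_type="numeric" from BED-like text and checks it against chr_len_db and chr_order. It uses only the standard library; the sample data and the helpers parse_bed and check_inputs are hypothetical and not part of bioplotz.

```python
# Hypothetical BED-like text: chromosome, start, end, numeric value.
bed_text = """
Chr1 100 200 0.5
Chr2 50 150 1.2
"""

chr_len_db = {"Chr1": 1000, "Chr2": 800}
chr_order = ["Chr2", "Chr1"]


def parse_bed(text):
    """Return rows shaped like [[chromosome name, start, end, value], ...]."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end, value = line.split()[:4]
        rows.append([chrom, int(start), int(end), float(value)])
    return rows


def check_inputs(chr_len_db, chr_order, bed_data):
    """Raise ValueError if the inputs disagree with each other."""
    if set(chr_order) != set(chr_len_db):
        raise ValueError("chr_order must contain the same names as chr_len_db")
    for chrom, start, end, _ in bed_data:
        if chrom not in chr_len_db:
            raise ValueError(f"unknown chromosome: {chrom}")
        if not 0 <= start <= end <= chr_len_db[chrom]:
            raise ValueError(f"interval out of range on {chrom}: {start}-{end}")


bed_data = parse_bed(bed_text)
check_inputs(chr_len_db, chr_order, bed_data)
```

The checked values could then be passed to bp.chromosome in the order shown in the signature above.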

Gene Cluster Plot

import bioplotz as bp

fig, ax = bp.genecluster(gene_list)

| parameter | value type | optional | default | explanation |
|---|---|---|---|---|
| gene_list | list | No | - | two-dimensional list, like [[gene name, start pos, end pos, direction (+/-), color], ..., [gene name, start pos, end pos, direction (+/-), color]] |
| edgecolor | list or str | Yes | None | list: same length as gene_list, like ["green", "blue", ..., "red"]; str: one edge color shared by all genes |
| edgewidth | int | Yes | 1 | edge width for all genes |
| lw | int | Yes | 3 | line width of the genome backbone |

Note: the best figsize is (gene count, 1), for example plt.figure(figsize=(16, 1)), and savefig should be called with bbox_inches='tight'.
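
The gene_list layout and the figsize rule can be sketched as follows. The gene records and the helpers validate_gene_list, suggested_figsize, and edgecolors_by_direction are hypothetical illustrations, not functions provided by bioplotz.

```python
# Hypothetical genes: [gene name, start pos, end pos, direction, color].
gene_list = [
    ["geneA", 100, 900, "+", "orange"],
    ["geneB", 1200, 2000, "-", "green"],
    ["geneC", 2300, 3100, "+", "orange"],
]


def validate_gene_list(genes):
    """Check that every entry has five fields and a valid direction."""
    for entry in genes:
        if len(entry) != 5:
            raise ValueError(f"expected 5 fields, got {len(entry)}: {entry}")
        name, start, end, direction, _color = entry
        if direction not in ("+", "-"):
            raise ValueError(f"{name}: direction must be '+' or '-'")
        if start > end:
            raise ValueError(f"{name}: start is greater than end")


def suggested_figsize(genes):
    """Figure size following the rule (gene count, 1)."""
    return (len(genes), 1)


def edgecolors_by_direction(genes, plus="black", minus="gray"):
    """Build an edgecolor list with one entry per gene."""
    return [plus if entry[3] == "+" else minus for entry in genes]


validate_gene_list(gene_list)
figsize = suggested_figsize(gene_list)
edgecolor = edgecolors_by_direction(gene_list)
```

Here figsize could be given to plt.figure(figsize=figsize) before plotting, and edgecolor has the same length as gene_list, as the table above requires for the list form.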

Multi Alignment Plot

import bioplotz as bp

fig, ax = bp.multialign(data)

| parameter | value type | optional | default | explanation |
|---|---|---|---|---|
| data | dict | No | - | key: gene name; value: aligned sequence |
| match_color | str | Yes | blue | color of matched bases |
| match_background_color | str | Yes | None | background color of matched bases |
| mismatch_color | str | Yes | red | color of mismatched bases |
| mismatch_background_color | str | Yes | None | background color of mismatched bases |
| base_per_line | int | Yes | 80 | number of bases displayed on each line |
| highlight_positions | list | Yes | None | positions to highlight, 0-based |
| highlight_color | str | Yes | green | color of the positions in highlight_positions |
| highlight_background_color | str | Yes | None | background color of the positions in highlight_positions |
| **kwargs | any | Yes | - | same as the parameters of ax.text |

Note: the figsize should be (base_per_line/10, x), where x = align_length/base_per_line*gene_count/5, and the font must be monospaced, such as "Courier New". Users may therefore need to add code like the following.

from matplotlib import font_manager
import bioplotz as bp

# Load a monospaced font from a local font file
basefont = font_manager.FontProperties(fname="/path/to/font.ttf")
fig, ax = bp.multialign(data, fontproperties=basefont)

or

import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = 'Courier New'
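
The figsize rule given in the note for this plot is plain arithmetic, so it can be computed ahead of time. A minimal sketch, assuming all aligned sequences have the same length; the helper multialign_figsize and the sample sequences are hypothetical and not part of bioplotz.

```python
def multialign_figsize(data, base_per_line=80):
    """Return (base_per_line/10, align_length/base_per_line*gene_count/5)."""
    lengths = {len(seq) for seq in data.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    align_length = lengths.pop()
    gene_count = len(data)
    width = base_per_line / 10
    height = align_length / base_per_line * gene_count / 5
    return (width, height)


# Hypothetical alignment: two genes, 160 aligned bases each.
data = {
    "gene1": "ATGC" * 40,
    "gene2": "ATGA" * 40,
}
figsize = multialign_figsize(data, base_per_line=80)
```

For this input the width is 80/10 = 8 and the height is 160/80*2/5 = 0.8, which could be given to plt.figure(figsize=figsize) before plotting.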