RNAseq_figure_plotter_python

Requires only one command line. Generate nine common RNAseq figures from an RNAseq result table with Python.

Generates nine different plots (bar, box, density, dot, heatmap, histogram, line, scatter, or violin) from an RNAseq result table using the seaborn library.

This software runs in a Python 2.7 environment. Run "conda install -c anaconda seaborn=0.9.0" to update seaborn before using rnaseq_figure_plotter.

It is a Python script. Run it with "python rnaseq_figure_plotter.py -i input_file -t bar -o output_file -g gene_list_file ... -c 5 -s 6".

parameters of rnaseq_figure_plotter

HELP		-h, --help		show this help message and exit

required parameters

INPUT		-i, --input		input file name

TYPE		-t, --type		choose plot types (bar, box, density, dot, heatmap, histogram, line, scatter, or violin)

general optional parameters

OUTPUT		-o, --output		default output; output file name

GENE		-g, --gene		file name of specific gene ID list; generate "output"_gene_selection.txt file

LOG2		-l, --log		default None; apply a log transformation (2 for log2, 10 for log10, e for loge)

LOG2_NUMBER	-lgn, --log_number		default 0.000000001; number added to each value before the log transformation to avoid -inf

XAXIS		-x, --xaxis		default sample; choose x-axis (gene, sample, or value)

YAXIS		-y, --yaxis		default data; choose y-axis (gene, sample, or value)

ZAXIS		-z, --zaxis		default gene; choose z-axis (gene, sample, or value)

COLOR		-c, --color		default 1; choose color type (1-10)

FIGURE_SAVE_FORMAT	-f, --figure_save_format		default pdf; choose format of figures (eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, or tiff)

optional parameters for individual plot types

STYLE		-s, --style		default 1; choose style of figures (1-8)

ZSCORE		-zs, --zscore		default None; apply z-score transformation in heatmap (1 for column, i.e. --xaxis; 2 for row, i.e. --zaxis)

CLUSTER_COLUMN	-cc, --cluster_column		default None; apply column cluster function for heatmap (on; 1)

CLUSTER_ROW	-cr, --cluster_row		default None; apply row cluster function for heatmap (on; 1)

SCATTER_COLUMN	-sc, --scatter_column		default None; names of the two samples (columns) to compare in a scatter plot, separated by a comma (example "sample1,sample2")

SCATTER_ROW	-sr, --scatter_row		default None; names of the two genes (rows) to compare in a scatter plot, separated by a comma (example "geneA,geneB")
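
The options listed above could be declared with Python's standard argparse module roughly as follows. This is a minimal Python 3 sketch, not the script's actual source: the function name build_parser is hypothetical, only a subset of the options is shown, and the defaults are copied from the list above.

```python
import argparse

PLOT_TYPES = ["bar", "box", "density", "dot", "heatmap",
              "histogram", "line", "scatter", "violin"]

def build_parser():
    # Hypothetical helper; mirrors a subset of the documented options.
    parser = argparse.ArgumentParser(prog="rnaseq_figure_plotter.py")
    parser.add_argument("-i", "--input", required=True, help="input file name")
    parser.add_argument("-t", "--type", required=True, choices=PLOT_TYPES,
                        help="plot type")
    parser.add_argument("-o", "--output", default="output",
                        help="output file name")
    parser.add_argument("-g", "--gene", default=None,
                        help="file name of specific gene ID list")
    parser.add_argument("-l", "--log", default=None, choices=["2", "10", "e"],
                        help="log base")
    parser.add_argument("-lgn", "--log_number", type=float, default=0.000000001,
                        help="number added before the log transformation")
    parser.add_argument("-c", "--color", type=int, default=1,
                        choices=range(1, 11), help="color type (1-10)")
    parser.add_argument("-s", "--style", type=int, default=1,
                        choices=range(1, 9), help="figure style (1-8)")
    parser.add_argument("-f", "--figure_save_format", default="pdf",
                        help="format of saved figures")
    return parser

# Parse the example command line from the introduction (without -g).
args = build_parser().parse_args(
    ["-i", "input_file", "-t", "bar", "-o", "output_file", "-c", "5", "-s", "6"])
print(args.type, args.color, args.style, args.log_number)
```

Restricting -t, -c, and -s with choices makes argparse reject values outside the documented ranges before any plotting starts.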

input file format (-i, --input)

The input file must be a tab-delimited file. The first column contains gene IDs and the first row contains sample names. Gene expression values start from the second column and the second row.

An example input file looks like the following:

		sample1	sample2	sample3	sample4	sample5
	geneA	1	3	5.5	7	2
	geneB	100	267	55	79	62
	geneC	0.3	0.65	9.5	0.87	2.1
	geneD	205	356	78	67	2900
	geneE	1001	3001	5500	7001	2001
	geneF	2	2	2	2	2
	geneG	0.01	0.03	0.5	0.07	0.02
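
A table in this layout can be read with Python's standard csv module. The following is a minimal Python 3 sketch under stated assumptions: the function name read_expression_table is hypothetical, the header row is assumed to start with an empty cell above the gene IDs, and every expression value is assumed to be numeric.

```python
import csv
import io

# A shortened copy of the example table above.
EXAMPLE = (
    "\tsample1\tsample2\tsample3\n"
    "geneA\t1\t3\t5.5\n"
    "geneB\t100\t267\t55\n"
)

def read_expression_table(handle):
    """Return (sample_names, {gene_id: [values]}) from a tab-delimited table."""
    rows = csv.reader(handle, delimiter="\t")
    header = next(rows)
    samples = header[1:]      # assumed: first cell is the empty corner cell
    table = {}
    for row in rows:
        if not row:
            continue          # skip blank lines
        table[row[0]] = [float(value) for value in row[1:]]
    return samples, table

samples, table = read_expression_table(io.StringIO(EXAMPLE))
print(samples)
print(table["geneA"])
```

For a real file, pass an open file handle instead of the io.StringIO object.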

type of plots (-t, --type)

You can choose from nine plot types: bar, box, density, dot, heatmap, histogram, line, scatter, or violin.

All plots are generated by using Seaborn (https://seaborn.pydata.org).

output file name (-o, --output)

Provide output file name.

specific gene id list file format (-g, --gene)

Put one gene ID per line in the first column (IDs are separated by \n).

An example gene ID list file looks like the following:

	geneA
	geneD
	geneG

The (-g, --gene) function automatically selects the expression values of the listed gene IDs and writes them to the "output"_gene_selection.txt file.

An example "output"_gene_selection.txt file looks like the following:

	geneA	1	3	5.5	7	2
	geneD	205	356	78	67	2900
	geneG	0.01	0.03	0.5	0.07	0.02
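
The selection step amounts to filtering the table rows by gene ID. A minimal Python 3 sketch, assuming the table is held as a dictionary keyed by gene ID (the function name select_genes and the in-memory layout are illustrative, not taken from the script):

```python
def select_genes(table, gene_ids):
    """Keep only the rows whose gene ID is in gene_ids, in table order."""
    wanted = set(gene_ids)
    return {gene: values for gene, values in table.items() if gene in wanted}

table = {
    "geneA": [1, 3, 5.5, 7, 2],
    "geneB": [100, 267, 55, 79, 62],
    "geneD": [205, 356, 78, 67, 2900],
    "geneG": [0.01, 0.03, 0.5, 0.07, 0.02],
}

# Contents of the gene ID list file: one ID per line.
gene_list_text = "geneA\ngeneD\ngeneG\n"
gene_ids = [line.strip() for line in gene_list_text.splitlines() if line.strip()]

selected = select_genes(table, gene_ids)
for gene, values in selected.items():
    print("\t".join([gene] + [str(value) for value in values]))
```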

log transformation (-l, --log) and (-lgn, --log_number)

Apply a log2, log10, or loge transformation to the gene expression values by passing 2, 10, or e, respectively, to the (-l, --log) function. By default the (-l, --log) function is off (None).

To avoid -inf log values when generating plots, the (-lgn, --log_number) function adds a tiny value to every expression value (default 0.000000001). You can customize this value by typing a number (example 0, 0.000001, 0.000000000000000001, etc.).
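
The effect of the added number can be illustrated with the standard math module. This is a minimal Python 3 sketch, not the script's own code; the function name log_transform and the parameter name pseudocount are hypothetical.

```python
import math

def log_transform(values, base, pseudocount=0.000000001):
    """Return log(value + pseudocount) in the given base ("2", "10", or "e")."""
    log_functions = {"2": math.log2, "10": math.log10, "e": math.log}
    log = log_functions[base]
    # Adding the pseudocount keeps zero expression values inside the
    # domain of the logarithm.
    return [log(value + pseudocount) for value in values]

print(log_transform([1, 100, 0], "2"))
```

Note that the math functions raise ValueError for an input of 0, whereas array libraries such as NumPy return -inf. In both cases a small positive added number keeps the result finite.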

axis (-x, --xaxis), (-y, --yaxis), and (-z, --zaxis)

The defaults for the x-axis, y-axis, and z-axis are sample, data, and gene, respectively. Sample, data, and gene refer to sample name, gene expression value, and gene ID, respectively.

The following table shows which axes you can modify for each plot type.

plots		x-axis	y-axis	legend
bar		x	y	z*
box		x	y
density		x*
dot		x	y	z*
heatmap		x*	z*	
histogram	x*
line		x*	y(data)	z*
scatter
violin		x	y

* only sample or gene can be chosen
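
One way to picture the three axis names is as the columns of a "long" table with one record per gene and sample. The Python 3 sketch below shows that reshaping. Whether the script reshapes its data this way internally is an assumption, and the function name to_long_format is hypothetical.

```python
def to_long_format(samples, table):
    """Turn {gene: [values]} into a list of gene/sample/value records."""
    records = []
    for gene, values in table.items():
        for sample, value in zip(samples, values):
            records.append({"gene": gene, "sample": sample, "value": value})
    return records

samples = ["sample1", "sample2"]
table = {"geneA": [1, 3], "geneB": [100, 267]}
records = to_long_format(samples, table)

# Each axis is then just the name of one field of the record.
x_axis, y_axis, z_axis = "sample", "value", "gene"
for record in records:
    print(record[x_axis], record[y_axis], record[z_axis])
```

With the data in this form, swapping the x-axis and the legend only means swapping two field names.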

color settings (-c, --color)

The Seaborn color palettes (https://seaborn.pydata.org/tutorial/color_palettes.html) are used for the color settings. The settings are as follows:

settings	palette			color description
1		RdBu_r (default)	red to blue 
2		Reds			red to white
3		Blues			blue to white
4		RdYlBu_r		red to yellow to blue
5		RdGy_r			red to gray
6		Paired			read seaborn website
7		cubehelix		read seaborn website
8		muted			read seaborn website
9		hls			read seaborn website
10		Set2			read seaborn website

save figure format (-f, --figure_save_format)

Provide the format for saved figures. The default is pdf; you can also choose eps, jpeg, jpg, pgf, png, ps, raw, rgba, svg, svgz, tif, or tiff.

style settings (-s, --style)

Seaborn set_style and set_context (https://seaborn.pydata.org/tutorial/aesthetics.html) are used for the style settings.

set_style controls the background and set_context controls the size (paper: small; talk: large). The settings are as follows:

settings	set_style	set_context
1		whitegrid	paper
2		whitegrid	talk
3		white		paper
4		white		talk
5		darkgrid	paper
6		darkgrid	talk
7		dark		paper
8		dark		talk
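
The table pairs each of the four backgrounds with the two sizes, so the setting number can be mapped to a (set_style, set_context) pair as in the Python 3 sketch below. The names STYLES, CONTEXTS, and STYLE_SETTINGS are illustrative and not taken from the script.

```python
from itertools import product

STYLES = ["whitegrid", "white", "darkgrid", "dark"]
CONTEXTS = ["paper", "talk"]

# Settings 1-8 follow the table order: each style with paper, then with talk.
STYLE_SETTINGS = {
    number: pair
    for number, pair in enumerate(product(STYLES, CONTEXTS), start=1)
}

style, context = STYLE_SETTINGS[6]
print(style, context)  # darkgrid talk
```

The resulting pair could then be passed to seaborn.set_style(style) and seaborn.set_context(context).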

z-score transformation (-zs, --zscore)

The (-zs, --zscore) function applies to heatmaps only. Use 1 to apply the z-score by column (-x, --xaxis) or 2 to apply it by row (-z, --zaxis).
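
A z-score rescales each row or column to mean 0 and standard deviation 1. The Python 3 sketch below illustrates this with the standard statistics module. It uses the sample standard deviation, which is an assumption (the script may compute it differently), and the function names are hypothetical.

```python
from statistics import mean, stdev

def zscore(values):
    """Standardize numbers to mean 0 and sample standard deviation 1."""
    center, spread = mean(values), stdev(values)
    return [(value - center) / spread for value in values]

def zscore_table(rows, by="row"):
    """Apply zscore to every row, or to every column when by == "column"."""
    if by == "row":
        return [zscore(row) for row in rows]
    columns = [zscore(list(column)) for column in zip(*rows)]
    return [list(row) for row in zip(*columns)]   # transpose back to rows

rows = [[1, 3, 5.5, 7, 2],          # geneA from the example input
        [100, 267, 55, 79, 62]]     # geneB from the example input
scaled = zscore_table(rows, by="row")
print([round(value, 2) for value in scaled[0]])
```

A row with identical values, such as geneF in the example input, has a standard deviation of 0 and would raise ZeroDivisionError in this sketch, so such rows need separate handling.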

cluster function for heatmap (-cc, --cluster_column) and (-cr, --cluster_row)

Apply clustering to columns and/or rows by passing 1.

scatter plot two dataset setting (-sc, --scatter_column) and (-sr, --scatter_row)

Specify the two datasets to compare, by column (sample) with the (-sc, --scatter_column) function or by row (gene) with the (-sr, --scatter_row) function. This setting is required for a scatter plot.

Both functions take the two datasets as "x-axis,y-axis", with the samples or genes separated by a comma (,). Examples for (-sc, --scatter_column) and (-sr, --scatter_row) are "sample1,sample3" and "geneA,geneG", respectively. Colors cannot be changed in the scatter plot function.
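
The "x-axis,y-axis" string can be split into two names and used to pull out paired values, as in this minimal Python 3 sketch for the column (sample) case. The function name scatter_pairs and the in-memory table layout are assumptions made for illustration.

```python
def scatter_pairs(samples, table, scatter_column):
    """Return (x, y) points for two samples given as "x-axis,y-axis"."""
    x_name, y_name = [name.strip() for name in scatter_column.split(",")]
    x_index, y_index = samples.index(x_name), samples.index(y_name)
    # One point per gene: its value in the x sample and in the y sample.
    return [(values[x_index], values[y_index]) for values in table.values()]

samples = ["sample1", "sample2", "sample3"]
table = {"geneA": [1, 3, 5.5], "geneB": [100, 267, 55]}
points = scatter_pairs(samples, table, "sample1,sample3")
print(points)  # [(1, 5.5), (100, 55)]
```

The row (gene) case works the same way, with two gene IDs selecting two rows and one point per sample.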