DEG_workflow

Workflow to perform Differential Expression Gene analysis from raw fastq files.

USAGE:

Initializing experiment:

Create manage_input_files.sh manually, from which all raw samples must be linked with a suitable name to another custom folder (e.g. "raw_files", "samples" or "renamed_sample"). This step must be performed for customizate samples according to the experiment and keep original sample names in order to facilitate sample backtracking (In case of failure) at same time. New sample names must be easy to interpret and it is recommended short and non redundant names.

After that create samples_to_process.lst: Is a file containing a list of new sample names. IMPORTANT: Sample name must not include paired-end related information and file extension.

mRNA-seq or miRNA-seq differential expression analysis

1º: Create an experiment design table. Experiment design table must include sample names as first column (with header: sample_name) and all features as other columns, and it also must include a header with feature names.

Experiment design example:

sample_name	id	clinical_status	age_group	location
ch_dis_1	patien1	diseased	child	usa
ad_dis_2	patien2	diseased	adult	europe
ch_hl_3	patien3	healthy	child	europe
ad_hl_4	patien4	healthy	adult	usa
ch_dis_5	patien5	diseased	child	usa
ad_hl_6	patien6	healthy	adult	europe
ad_dis_7	patien7	diseased	senior	usa
ad_dis_8	patien8	diseased	adult	usa
ch_hl_9	patien9	healthy	child	usa

2º: Create all target (comparison) files. The target comparison file is used in DEGenesHunter comparison. It consist in a table with 3 columns (sample name, replicate number and treatment). To generate target files it is recommended to use ./aux_parsers/generate_targets.rb. This script takes the experiment design table as input (-e), and it generate target by parsing a config string. The config string is given by -t flag and must be designed manually and it can generate all possible targets according filtering criteria (explanined later). The config string consist in information for one or more targets (divided by ';': "TargetA;TargetB"). In target information target name and conditions mus be indicated. Target name and conditions are separated by '>' and control and treatment features mus be indicated as 'feature_name:control_feature1/control_feature2,treatment_feature1,treatment_feature2'. Example. For healthy vs disease and child vs others comparisons according example experimen design table, the target string should be 'hl_vs_dis>clinical_status:healthy,diseased;ch_vs_other>age_group:child,adult/senior'.

Filtering criteria can also be used. You can select only samples that fulfill a specific feature using -f flag with an argument like "FEATURE_NAME=feature". For select only USA samples must be used -f "location=usa". This script also admit a blacklist file (one column table) with samples that will not be included in experiment using -b flag (This is very useful for reject noisy samples).

3º: Change name of config_daemon_default to config_daemon

4º: Customize all parametres from config_daemon (there you will find a more detailed description).