Understand gene expression regulation with ROGER (Roche Omnibus of Gene Expression Regulation)
- Python 3 or greater
- Install Python 3 (https://www.python.org/downloads/) on your system or create a virtual environment
- Clone this repository to your preferred ROGER installation directory
git clone git@github.com:bedapub/roger.git
- Switch to the cloned root directory of ROGER and install ROGER through pip:
pip install -e .
- Create a configuration file called .roger_config.cfg in your user home directory and set your SQLAlchemy database URI there. An example configuration file could look like this:
# http://docs.sqlalchemy.org/en/latest/core/engines.html
SQLALCHEMY_DATABASE_URI="sqlite:///roger-schema.db"
ROGER_DATA_FOLDER="some_folder" #default is /tmp or other temporary directory
- Initialize ROGER with the following command:
roger init
- Populate ROGER with additional data by using its command line interface. See setup_example.sh for more information
- Install additional R packages:
  - ribiosROGER, ribiosIO and ribiosExpression from RIBIOS
  - Bioconductor and the limma package
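The configuration step in the list above can also be scripted. The following is a minimal sketch, assuming a POSIX shell. It writes the example values shown above (placeholders, not recommended settings) and leaves an existing configuration file untouched.

```shell
#!/bin/sh
# Sketch: create ~/.roger_config.cfg with the example values from this README.
# The values are placeholders; adjust the database URI and data folder as needed.
config_file="$HOME/.roger_config.cfg"

if [ -e "$config_file" ]; then
    # Never overwrite a configuration that is already in place.
    echo "Keeping existing $config_file"
else
    cat > "$config_file" <<'EOF'
# http://docs.sqlalchemy.org/en/latest/core/engines.html
SQLALCHEMY_DATABASE_URI="sqlite:///roger-schema.db"
ROGER_DATA_FOLDER="some_folder"
EOF
    echo "Wrote $config_file"
fi
```

After the file is in place, roger init can be run as described above.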
After installation, ROGER can only consume expression data from human. You can use the add-species command to add support for other species in ROGER. For example, the following commands:
roger add-species "rnorvegicus_gene_ensembl" 10116
roger add-species "mmusculus_gene_ensembl" 10090
would enable ROGER to consume data from rat and mouse as well. Internally, ROGER will download gene identifiers from the Ensembl BioMart service and assign them internal identifiers to allow efficient indexing.
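When several species are needed, the two commands above can be generated from a small list. This is only a sketch: the DRY_RUN variable and the run helper are hypothetical conveniences, not part of ROGER, and by default the script only prints the commands it would execute.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical switch for this sketch (not a ROGER option).
# With the default of 1 the commands are printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Each input line holds an Ensembl dataset name and an NCBI taxon ID,
# taken from the add-species examples above.
add_all_species() {
    while read -r dataset taxon_id; do
        # </dev/null keeps the command from consuming the remaining list lines.
        run roger add-species "$dataset" "$taxon_id" </dev/null || return 1
    done <<EOF
rnorvegicus_gene_ensembl 10116
mmusculus_gene_ensembl 10090
EOF
}

add_all_species
```

Further species can be added by appending one line per species to the list inside add_all_species.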
Every data set used for the following examples can be found in test_data.
- Create a new microarray dataset:
roger add-ds-ma test_data/ds/ma-example-signals.gct 10090 affy_mouse430_2
This will add the normalized expression matrix "ma-example-signals.gct" to the database. ROGER will automatically annotate the feature names inside the expression matrix based on the given taxon ID (here mouse) and feature symbol type (here the AFFY Mouse430A 2 probe set). Additionally, ROGER will use the GCT file name as the study name unless another name is specified with the --name parameter. You can use roger show-symbol-types 10090 to see a list of supported feature types / probe sets.
- Add a design matrix:
roger add-design "test_data/ds/ma-example-design.txt" ma-example-signals
Use roger remove-design ma-example-design ma-example-signals to remove the design matrix from ma-example-signals.
- Add a contrast matrix:
roger add-contrast test_data/ds/ma-example-contrast.txt ma-example-design ma-example-signals
- Execute limma on the added data set:
roger run-dge-ma ma-example-contrast ma-example-design ma-example-signals
- Execute CAMERA for limma:
roger run-gse ma-example-contrast ma-example-design ma-example-signals limma
- Export results to files:
roger export-dge-table ma-example-contrast ma-example-design ma-example-signals limma output.txt
roger export-gse-table ma-example-contrast ma-example-design ma-example-signals limma CAMERA output.txt
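The microarray steps above can be chained in one script. The sketch below assumes that design and contrast names, like the study name, default to the file name without its extension, which is what the examples above suggest. DRY_RUN, run and name_of are hypothetical helpers for this sketch, not part of ROGER, and by default the commands are only printed.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical switch for this sketch (not a ROGER option).
# With the default of 1 the commands are printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Assumed naming rule: file name without directory and extension.
name_of() {
    base=$(basename "$1")
    printf '%s\n' "${base%.*}"
}

gct=test_data/ds/ma-example-signals.gct
design_file=test_data/ds/ma-example-design.txt
contrast_file=test_data/ds/ma-example-contrast.txt

study=$(name_of "$gct")
design=$(name_of "$design_file")
contrast=$(name_of "$contrast_file")

# Import the data, then run limma and CAMERA; stop at the first failing step.
ma_workflow() {
    run roger add-ds-ma "$gct" 10090 affy_mouse430_2 &&
    run roger add-design "$design_file" "$study" &&
    run roger add-contrast "$contrast_file" "$design" "$study" &&
    run roger run-dge-ma "$contrast" "$design" "$study" &&
    run roger run-gse "$contrast" "$design" "$study" limma
}

ma_workflow
```

The export commands can be appended to ma_workflow in the same way once the analysis steps succeed.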
The DGE analysis for RNA-seq data works similarly to the DGE analysis for microarray data. Only the commands for importing data sets and running the DGE analysis differ, because they provide slightly different optional parameters:
roger add-ds-rnaseq test_data/ds/rnaseq-example-readCounts.gct 9606 entrezgene
roger add-design test_data/ds/rnaseq-example-DesignMatrix.txt rnaseq-example-readCounts
roger add-contrast test_data/ds/rnaseq-example-ContrastMatrix.txt rnaseq-example-DesignMatrix rnaseq-example-readCounts
roger run-dge-rnaseq rnaseq-example-ContrastMatrix rnaseq-example-DesignMatrix rnaseq-example-readCounts
roger run-gse rnaseq-example-ContrastMatrix rnaseq-example-DesignMatrix rnaseq-example-readCounts edgeR
Use roger --help to get a more detailed description of all available roger commands.