LinkageIO/Camoco

min_single_sample_expr Filter

Closed this issue · 3 comments

It seems that although the default min_single_sample_expr threshold is 0.3, the code is actually filtering the dataset with a threshold of 5. See the program log and summary below.

[LOG] Wed Dec 14 17:08:07 2016 - ------------Quality Control
[LOG] Wed Dec 14 17:08:07 2016 - Raw Starting set: 26851 genes 23 accessions
[LOG] Wed Dec 14 17:08:43 2016 - Found out 0 genes not in Reference Genome: maize - AGPv4 - maize
[LOG] Wed Dec 14 17:08:43 2016 - Filtering expression values lower than 0
[LOG] Wed Dec 14 17:08:46 2016 - Found 0 genes with > 0.2 missing data
[LOG] Wed Dec 14 17:08:50 2016 - Found 3340 genes which do not have one sample above 5
[LOG] Wed Dec 14 17:08:52 2016 - Found 0 accessions with > 0.3 missing data
[LOG] Wed Dec 14 17:08:52 2016 - Genes passing QC:
has_id                 26851
pass_membership        26851
pass_missing_data      26851
pass_min_expression    23511
PASS_ALL               23511
            COB Dataset: grn23
                Desc: 1.0
                RawType: RNASEQ
                TransformationLog: raw->quality_control->arcsinh
                Num Genes: 23,511(88%)
                Num Accessions: 23
                Num Edges: 276,371,805

            Raw
            ------------------
            Num Raw Genes: 26,851
            Num Raw Accessions: 23

            QC Parameters
            ------------------
            min expr level: 0
                - expression below this is set to NaN
            max gene missing data: 0.2
                - genes missing more than this percent are removed
            max accession missing data: 5
                - Accession missing more than this percent are removed
            min single sample expr: 0.3
                - genes must have at least this amount of expression in
                  on accession
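
For reference, the filter described by "genes must have at least this amount of expression" can be sketched in a few lines of plain Python. This is only an illustration, not Camoco's code: the toy expression values, the helper name `passes_min_expression`, and the choice of `>=` for the comparison are all assumptions (the log line says "above 5", so the real check may be strict).

```python
# Toy expression matrix: gene -> expression per accession (made-up values).
expr = {
    "geneA": [0.1, 0.4, 6.2],
    "geneB": [0.0, 0.3, 4.9],
    "geneC": [5.0, 0.0, 0.0],
}


def passes_min_expression(values, min_single_sample_expr=5):
    """Return True if at least one sample reaches the threshold.

    Hypothetical helper; the >= comparison is an assumption.
    """
    return any(v >= min_single_sample_expr for v in values)


# Keep only genes with at least one sufficiently expressed sample.
passing = [gene for gene, values in expr.items() if passes_min_expression(values)]
print(passing)
```

With a threshold of 0.3 instead of 5, geneB would also pass, which is why the two thresholds give very different gene counts.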

Hey @orionzhou, this was a bug in the summary function. It should have reported max accession missing data as 0.3 and min single sample expr as 5. The actual filters are correct; the two values just got swapped in the print function.
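
To illustrate how this kind of swap can be avoided, here is a minimal sketch of a summary printer that looks up each value by name rather than by position. It is not the actual Camoco summary code: the dictionary keys and the `format_qc_summary` function are hypothetical, and the values are the ones from the log above.

```python
# Hypothetical QC parameter names; values are taken from the log in this issue.
qc_params = {
    "min_expr": 0,
    "max_gene_missing_data": 0.2,
    "max_accession_missing_data": 0.3,
    "min_single_sample_expr": 5,
}


def format_qc_summary(params):
    """Render the QC parameters, filling each slot by name.

    Named placeholders make it hard to swap two values by accident,
    which is easy to do with positional formatting.
    """
    template = (
        "min expr level: {min_expr}\n"
        "max gene missing data: {max_gene_missing_data}\n"
        "max accession missing data: {max_accession_missing_data}\n"
        "min single sample expr: {min_single_sample_expr}"
    )
    return template.format(**params)


summary = format_qc_summary(qc_params)
print(summary)
```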

Also, this is fixed in the dev branch.

The dev branch was merged with the master branch.