min_single_sample_expr Filter
Closed this issue · 3 comments
orionzhou commented
Although the default min_single_sample_expr threshold is 0.3, the code appears to filter the dataset with a threshold of 5. See the program log and summary below.
[LOG] Wed Dec 14 17:08:07 2016 - ------------Quality Control
[LOG] Wed Dec 14 17:08:07 2016 - Raw Starting set: 26851 genes 23 accessions
[LOG] Wed Dec 14 17:08:43 2016 - Found out 0 genes not in Reference Genome: maize - AGPv4 - maize
[LOG] Wed Dec 14 17:08:43 2016 - Filtering expression values lower than 0
[LOG] Wed Dec 14 17:08:46 2016 - Found 0 genes with > 0.2 missing data
[LOG] Wed Dec 14 17:08:50 2016 - Found 3340 genes which do not have one sample above 5
[LOG] Wed Dec 14 17:08:52 2016 - Found 0 accessions with > 0.3 missing data
[LOG] Wed Dec 14 17:08:52 2016 - Genes passing QC:
has_id 26851
pass_membership 26851
pass_missing_data 26851
pass_min_expression 23511
PASS_ALL 23511
COB Dataset: grn23
Desc: 1.0
RawType: RNASEQ
TransformationLog: raw->quality_control->arcsinh
Num Genes: 23,511(88%)
Num Accessions: 23
Num Edges: 276,371,805
Raw
------------------
Num Raw Genes: 26,851
Num Raw Accessions: 23
QC Parameters
------------------
min expr level: 0
- expression below this is set to NaN
max gene missing data: 0.2
- genes missing more than this percent are removed
max accession missing data: 5
- Accession missing more than this percent are removed
min single sample expr: 0.3
- genes must have at least this amount of expression in
on accession
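For reference, the check behind the log line "Found 3340 genes which do not have one sample above 5" can be sketched roughly as below. This is only an illustration, not the package's actual code: the function name `passes_min_single_sample_expr`, the toy data, and the strict `>` comparison are all assumptions.

```python
# Hypothetical sketch of a "min single sample expr" filter; not the real implementation.
def passes_min_single_sample_expr(values, threshold=5):
    """Return True if at least one non-missing sample is above the threshold."""
    # Assumption: "above" means strictly greater than; missing values are None.
    return any(v is not None and v > threshold for v in values)

# Toy expression table: gene -> expression per accession (None = missing).
expr = {
    "geneA": [0.1, 0.4, 7.2],
    "geneB": [0.0, 0.3, 4.9],
    "geneC": [None, 5.0, 5.1],
}

# Keep only genes with at least one sample above the threshold.
kept = [gene for gene, vals in expr.items() if passes_min_single_sample_expr(vals)]
print(kept)
```

With a threshold of 0.3 instead of 5, geneB would also pass, which is why the two values lead to very different gene counts.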
schae234 commented
Hey @orionzhou, this was a bug in the summary function. It should have reported max accession missing data as 0.3 and min single sample expr as 5. The actual filters are correct; the two values were just swapped in the print function.
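To illustrate how this kind of swap can happen, here is a minimal sketch. It is not the actual summary code: the `qc` dictionary, its keys, and both function names are hypothetical. Positional format arguments passed in the wrong order print each value under the other label, while named fields keep each label tied to its own value.

```python
# Hypothetical QC parameters; keys mirror the summary labels, not the real internals.
qc = {
    "max_accession_missing_data": 0.3,
    "min_single_sample_expr": 5,
}

def summary_swapped(qc):
    # Positional arguments in the wrong order: labels and values no longer match.
    return "max accession missing data: {}\nmin single sample expr: {}".format(
        qc["min_single_sample_expr"], qc["max_accession_missing_data"]
    )

def summary_fixed(qc):
    # Named fields bind each label to its own parameter.
    return (
        "max accession missing data: {max_accession_missing_data}\n"
        "min single sample expr: {min_single_sample_expr}"
    ).format(**qc)

print(summary_fixed(qc))
```

In both versions the filtering itself would use the same `qc` values; only the printed report differs.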
schae234 commented
Also, this is fixed in the dev branch.
schae234 commented
The dev branch was merged into the master branch.