more flexible file designation for sailfish
Closed this issue · 6 comments
Trying to import sailfish results that aren't in the typical dir/quant.sf and dir/stats.tsv convention, because I've moved/renamed some things:
> sf <- tximport("data/sailfish-txps.txt", type="salmon", gene2tx=grch38_gt)
reading in files
1 Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'data/stats.tsv': No such file or directory
Coming from this bit of code:
tmp2 <- read.table(file.path(dirname(x), "stats.tsv"),
tximport is strictly looking in the same path as the results file for stats.tsv. Suggest allowing this file to be specified explicitly as an argument, but defaulting back to the current behavior?
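Something along these lines is what I have in mind. This is only a rough sketch: resolve_stats_file and its stats_file argument are hypothetical names, not anything that exists in tximport today.

```r
# Hypothetical helper (not part of tximport): pick the stats file for one
# quantification file, preferring an explicit path when one is supplied.
resolve_stats_file <- function(quant_file, stats_file = NULL) {
  if (is.null(stats_file)) {
    # Default: same convention as the snippet above, i.e. look for
    # stats.tsv next to the results file.
    stats_file <- file.path(dirname(quant_file), "stats.tsv")
  }
  if (!file.exists(stats_file)) {
    stop("stats file not found: ", stats_file, call. = FALSE)
  }
  stats_file
}
```

With this, a renamed stats file can be passed in explicitly, while callers that follow the usual directory layout don't need to change anything.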
Yeah, I'll have to rework this.
I just commented it out for now
Moving forward I may be able to help simplify this. The reason for the (somewhat tortured) structure of the output (e.g. effective lengths being in a separate file) was to maintain backward compatibility with prior versions of Sailfish & Salmon. However, if it would be helpful here (and/or in other contexts), starting in the next release I'd be willing to break backward compatibility of the output format and put the effective length (and any other useful information) directly into the quant.sf file. Also, I think it might be useful to remove the comment character # in front of the line that names the columns so that they can be read in more easily with the typical tools (e.g. read.table and pandas.read_table). The default could then be to read the effective lengths directly from the quant.sf file and fall back to this strategy if e.g. the input is from an older version of Sailfish or Salmon. Thoughts?
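For illustration, the read-then-fall-back logic could look roughly like the sketch below. The function name read_quant, the column names Name and EffectiveLength, and the two-column layout assumed for stats.tsv are all assumptions made for the example, not confirmed file formats.

```r
# Sketch only: column names and the stats.tsv layout are assumed for
# illustration and may not match real Sailfish/Salmon output.
read_quant <- function(quant_file, stats_file = NULL) {
  lines <- readLines(quant_file)
  is_comment <- startsWith(lines, "#")
  if (any(is_comment)) {
    # Older style: column names sit on the last '#'-prefixed line.
    header <- sub("^#\\s*", "", lines[max(which(is_comment))])
    body <- lines[!is_comment]
  } else {
    # Newer style: a plain header line, as proposed above.
    header <- lines[1]
    body <- lines[-1]
  }
  quant <- read.table(text = c(header, body), header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)
  if (!"EffectiveLength" %in% names(quant)) {
    # Fall back to the separate stats file next to the results file.
    if (is.null(stats_file)) {
      stats_file <- file.path(dirname(quant_file), "stats.tsv")
    }
    stats <- read.table(stats_file, sep = "\t", stringsAsFactors = FALSE,
                        col.names = c("Name", "EffectiveLength"))
    quant$EffectiveLength <-
      stats$EffectiveLength[match(quant$Name, stats$Name)]
  }
  quant
}
```

The check is on the presence of the column rather than on a version string, so the same code path handles both layouts without the caller saying which one it has.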
+1 for both of those suggestions.
+1 for both as well.
Simplifying here will help me with something else I want to do, which is make it easy for users to swap in readr::read_table, which is 50x faster.
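One way the swap could work is to take the reader as a function argument. This is a minimal sketch; import_quant and its reader argument are hypothetical names for illustration, not tximport's actual interface.

```r
# Hypothetical wrapper: the caller chooses which function parses each file.
import_quant <- function(files, reader = utils::read.delim, ...) {
  # Each file path is passed as the first argument to `reader`;
  # any extra arguments in `...` are forwarded unchanged.
  tables <- lapply(files, reader, ...)
  names(tables) <- basename(files)
  tables
}
```

Users with readr installed could then pass one of its readers (for example reader = readr::read_tsv for tab-separated files) without the importer itself depending on readr.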
I was also going to suggest readr. Faster, no stringsAsFactors, tbl_df goodness, etc.
I think we're set here. Rather than go looking for stats.tsv, the effective length will be in the quant.sf file for future versions of Sailfish/Salmon, and tximport will now autodetect whether it's an old or new version.