Reading files with float values
What would be the Tidier equivalent of this tidyverse code? The second column contains "NA", and values like 123423.6.
read_pheno <- function(col, phenofile){
  read_tsv(phenofile) |>
    select(ID = sona_id, pheno = col)
}
Thanks @ymer. This is totally doable, but right now we don't have file read-write capabilities built into Tidier. I've been waiting to see whether it's worth trying to build this, but I'm increasingly convinced that it's worth it after playing with existing packages' APIs.
For now, we can rely on the CSV.jl package.
Here's how we would define the function:
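using TidierData, CSV

function read_pheno(col, phenofile)
    @chain begin
        # Parse the string "NA" as a missing value
        CSV.read(phenofile, DataFrame; missingstring = "NA")
        # !! interpolates the symbol stored in col as a column name
        @select(!!col)
    end
end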
And here's how we would call it, assuming that we want to select a column named x and that the path to the file is stored in a variable named filename.
read_pheno(:x, filename)
Notice that when we call the function, we refer to the column as a symbol :x. Inside of @select(), the symbol :x is converted to the bare column name x by using the bang-bang !! operator. Also, notice that the @chain block can start directly with a begin rather than a data frame, which is nice for situations where the first part of the chain is a long line of code.
Also, CSV.jl can auto-detect the delimiter, but if you wanted to be really specific, you could include the delimiter like this: CSV.read(phenofile, DataFrame; delim = '\t', missingstring = "NA").
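To check how the missingstring and delim options behave on float data, here is a minimal sketch that writes a small tab-separated file and reads it back. The file contents and the column name x are made up for illustration, and the sketch loads DataFrames directly instead of TidierData so that it only depends on the reading step.

```julia
using CSV, DataFrames

# Hypothetical sample data: a tab-separated file whose second column
# mixes float values with the string "NA"
path = tempname() * ".tsv"
write(path, "sona_id\tx\nA1\t123423.6\nA2\tNA\nA3\t42.0\n")

# Read with an explicit tab delimiter and treat "NA" as missing
df = CSV.read(path, DataFrame; delim = '\t', missingstring = "NA")

# The column is parsed as floats that allow missing values
println(eltype(df.x))
println(ismissing(df.x[2]))
```

With "NA" declared as the missing string, the column should come back as a float column that allows missing values, not as a column of strings.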
Going to close this issue for now. Eventually, I hope to add a TidierFiles.jl package which is intended for reading and writing data to files.