TidierOrg/TidierData.jl

Reading files with float values


ymer commented

What would be the Tidier equivalent of this tidyverse code? The second column contains "NA" entries as well as float values like 123423.6.

library(tidyverse)  # loads readr (read_tsv) and dplyr (select)

read_pheno <- function(col, phenofile) {
  read_tsv(phenofile) |>
    select(ID = sona_id, pheno = all_of(col))
}

Thanks @ymer. This is totally doable, but right now Tidier doesn't have built-in file read/write capabilities. I've been holding off on building them, but after playing with existing packages' APIs, I'm increasingly convinced it's worth it.

For now, we can rely on the CSV.jl package.

Here's how we would define the function:

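using TidierData, CSV

function read_pheno(col, phenofile)
    @chain begin
        # Read the file; CSV.jl turns "NA" entries into `missing`
        CSV.read(phenofile, DataFrame; missingstring = "NA")
        # `!!col` interpolates the Symbol passed in as `col`
        @select(!!col)
    end
end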

And here's how we would call it, assuming we want to select a column named x from the file whose path is stored in the variable filename:

read_pheno(:x, filename)

Notice that when we call the function, we refer to the column as a symbol, :x. Inside @select(), the bang-bang operator !! interpolates that symbol so it is treated as the bare column name x. Also, notice that the @chain block can start directly with begin rather than with a data frame; the first line inside the block then supplies the starting value, which is nice when that first step is a long line of code.
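The function above selects only the requested column, while the R version also keeps sona_id and renames both columns. Here is a minimal sketch of the full equivalent, assuming the file has a sona_id column. It uses DataFrames.jl's select directly, which takes Symbols without any interpolation and renames through old => new pairs. The helper name read_pheno_renamed and the sample data are hypothetical; TidierData's @rename should also handle the renaming step, but this sketch sticks to plain DataFrames.jl.

```julia
using CSV, DataFrames

# Hypothetical helper mirroring the R version: keep sona_id and the requested
# column, renamed to ID and pheno. Assumes the file has a sona_id column.
function read_pheno_renamed(col::Symbol, phenofile::AbstractString)
    # "NA" entries become `missing`; the delimiter is auto-detected.
    df = CSV.read(phenofile, DataFrame; missingstring = "NA")
    # `old => new` pairs select and rename in a single step.
    return select(df, :sona_id => :ID, col => :pheno)
end

# Hypothetical sample file: an ID column plus phenotype columns that mix
# "NA" with float values, as described in the question.
path = tempname() * ".tsv"
write(path, "sona_id\theight\tweight\nA1\t123423.6\t70.5\nA2\tNA\t81.0\nA3\t98.25\tNA\n")

result = read_pheno_renamed(:height, path)
```

Because "NA" is declared as the missing marker, the pheno column is parsed as floats with missing values rather than as strings.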

Also, CSV.jl can auto-detect the delimiter, but if you want to be explicit, you can pass it in like this: CSV.read(phenofile, DataFrame; delim = '\t', missingstring = "NA").
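As a quick sanity check on a tab-separated file, the sketch below (hypothetical sample data) reads the same file with the delimiter auto-detected and with delim = '\t' stated explicitly, then compares the two results. It uses isequal because isequal treats two missing values as equal, whereas == propagates missing.

```julia
using CSV, DataFrames

# Hypothetical tab-separated sample with "NA" entries and float values.
path = tempname() * ".tsv"
write(path, "sona_id\theight\nA1\t123423.6\nA2\tNA\nA3\t98.25\n")

# Let CSV.jl detect the delimiter from the file contents...
auto = CSV.read(path, DataFrame; missingstring = "NA")
# ...or state it explicitly.
explicit = CSV.read(path, DataFrame; delim = '\t', missingstring = "NA")

# Compare the two reads, treating missing values as equal.
isequal(auto, explicit)
```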

Closing this issue for now. Eventually, I hope to add a TidierFiles.jl package for reading and writing data files.