Address column mismatches
leeper opened this issue · 3 comments
leeper commented
Copied from gesistsa/rio#110 (@billdenney):
I generate .csvy files with Perl, and the order of the fields in the YAML header may not exactly match the order of the columns in the data.
It would be helpful if fields were matched to columns by comparing each column name with the field's 'name' value, rather than simply by position.
I think this would just be a change to the following code in .import.rio_csvy (lines 122-124 of import_method.R, currently):
for (i in seq_along(y$fields)) {
  attributes(out[, i]) <- y$fields[[i]]
}
becomes
already.matched <- rep(FALSE, ncol(out))
for (i in seq_along(y$fields)) {
  # Find the column(s) whose name equals this field's 'name' value
  idx.match <- which(names(out) %in% y$fields[[i]]$name)
  if (length(idx.match) == 0) {
    warning("Field name ", y$fields[[i]]$name, " is not found in the input file; please check your YAML header.")
  } else if (length(idx.match) > 1) {
    warning("Field name ", y$fields[[i]]$name, " is found more than once in the input file; please check your .csv header.")
  } else {
    if (already.matched[idx.match]) {
      warning("Column ", idx.match, " already has a field name match; please check your YAML header.")
    }
    # Record the match and assign attributes only when exactly one column matched
    already.matched[idx.match] <- TRUE
    attributes(out[, idx.match]) <- y$fields[[i]]
  }
}
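For anyone who wants to try the idea outside of rio, here is a minimal self-contained sketch of name-based matching. It is not the code that ended up in csvy or rio. The function name `match_fields_by_name` is hypothetical, and setting one attribute at a time with `attr<-` is a choice made for this sketch (so that existing attributes such as factor levels are kept), not something taken from the original code.

```r
# Hypothetical helper (not part of csvy or rio): attach YAML field metadata
# to data frame columns by name rather than by position.
match_fields_by_name <- function(out, fields) {
  matched <- rep(FALSE, ncol(out))
  for (field in fields) {
    idx <- which(names(out) == field$name)
    if (length(idx) == 0L) {
      warning("Field name ", field$name, " is not found in the data; skipping.")
      next
    }
    if (length(idx) > 1L) {
      warning("Field name ", field$name, " matches more than one column; skipping.")
      next
    }
    if (matched[idx]) {
      warning("Column ", idx, " already has a field name match.")
    }
    matched[idx] <- TRUE
    # Set each metadata entry separately so existing attributes survive
    for (key in setdiff(names(field), "name")) {
      attr(out[[idx]], key) <- field[[key]]
    }
  }
  out
}

# Header fields listed in a different order than the columns
out <- data.frame(b = 1:2, a = c("x", "y"), stringsAsFactors = FALSE)
fields <- list(
  list(name = "a", title = "Letter"),
  list(name = "b", title = "Count")
)
out <- match_fields_by_name(out, fields)
```

With this approach, a field whose name is missing or ambiguous produces a warning and is skipped instead of being attached to the wrong column.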
leeper commented
@billdenney: I've just sent an update to GitHub for this. Can you try again using csvy directly? I'll push this into rio momentarily.
billdenney commented
This one is fixed!
But the new version has some limitations on column naming. I'll open a new issue for that.
leeper commented
Great!