leeper/csvy

Address column mismatches

leeper opened this issue · 3 comments

Copied from gesistsa/rio#110 (@billdenney):

I generate .csvy files with Perl, and the order of the fields in the YAML header may not exactly match the order of the columns in the CSV body.

It would be helpful if fields were matched to columns by comparing each column name with the field's 'name' value, rather than simply by position.

I think this would just be a change to the following code in .import.rio_csvy (lines 122-124 of import_method.R, currently):

for (i in seq_along(y$fields)) {
    attributes(out[, i]) <- y$fields[[i]]
}
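To make the problem concrete, here is a small hypothetical example. The data frame and field list are invented for illustration, and the loop uses `out[[i]]` for the column assignment. With purely positional matching, field i is attached to column i whatever its name, so a header written in a different order mislabels the columns:

```r
# Hypothetical data: the YAML fields are listed in a different order
# from the CSV columns.
out <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0))
y <- list(fields = list(
  list(name = "score", description = "Test score"),
  list(name = "id", description = "Row identifier")
))

# Positional matching: field i is applied to column i.
for (i in seq_along(y$fields)) {
  attributes(out[[i]]) <- y$fields[[i]]
}

# Column "id" now carries the metadata written for "score".
attr(out$id, "name")  # "score"
```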

becomes

already.matched <- rep(FALSE, ncol(out))
for (i in seq_along(y$fields)) {
  # find the column(s) whose name equals this field's 'name' value
  idx.match <- which(names(out) == y$fields[[i]]$name)
  if (length(idx.match) == 0) {
    warning("Field name ", y$fields[[i]]$name, " is not found in the input file; please check your YAML header.")
    next  # nothing to attach the metadata to
  } else if (length(idx.match) > 1) {
    warning("Field name ", y$fields[[i]]$name, " is found more than once in the input file; please check your .csv header.")
    next  # ambiguous match; skip rather than guess
  } else if (already.matched[idx.match]) {
    warning("Column ", idx.match, " already has a field name match; please check your YAML header.")
    next  # keep the first field that matched this column
  }
  already.matched[idx.match] <- TRUE  # record the match so duplicates are caught
  attributes(out[, idx.match]) <- y$fields[[i]]
}
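For testing the idea outside rio, the same name-based loop can be wrapped in a small function. `match_fields()` is a hypothetical helper written for this sketch, not part of csvy or rio. It assumes `out` is a plain data.frame and that each field is a list with a `name` entry:

```r
# Hypothetical helper (not part of csvy or rio): attach YAML field
# metadata to data.frame columns by matching on each field's 'name'.
match_fields <- function(out, fields) {
  already.matched <- rep(FALSE, ncol(out))
  for (f in fields) {
    idx <- which(names(out) == f$name)
    if (length(idx) == 0) {
      warning("Field name ", f$name, " is not found in the input file.")
      next
    } else if (length(idx) > 1) {
      warning("Field name ", f$name, " is found more than once in the input file.")
      next
    } else if (already.matched[idx]) {
      warning("Column ", idx, " already has a field name match.")
      next
    }
    already.matched[idx] <- TRUE
    attributes(out[[idx]]) <- f  # replaces the column's attributes with the field list
  }
  out
}

# Header fields listed in a different order from the CSV columns
out <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0))
fields <- list(
  list(name = "score", description = "Test score"),
  list(name = "id", description = "Row identifier")
)
res <- match_fields(out, fields)
attr(res$id, "description")     # "Row identifier"
attr(res$score, "description")  # "Test score"
```

Because each field looks up its column by name, the order of the YAML header no longer matters, and a duplicated field name leaves the first match in place while raising a warning.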

@billdenney: I've just sent an update to GitHub for this. Can you try again using csvy directly? I'll push this into rio momentarily.

This one is fixed!

However, the new version has some limitations on column naming. I'll open a new issue for that.

Great!