Watts-Lab/surveyor

Improve robustness of CSV import

Closed this issue · 5 comments

Right now, CSVs with unusual formatting can cause problems. For example, look at the history of demographics.csv, where an export from Excel changed the line endings and introduced special characters (e.g., ... or ') that broke rendering.

If we use the default encoding in VS Code it seems to work fine, but we can't guarantee that everyone will save files with a compatible encoding. So the question is: can we add some steps to the CSV parser that handle this a bit better?

In particular, the messed-up line endings meant that some of the variable names in the CSV were not correct. Because the name variable comes first in the file, it was invisibly being made incompatible with the rest of our code, and that was messing everything up, e.g., inputs had no name tag.
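A rough sketch of what that cleanup step could look like, run on the raw file text before it reaches the CSV parser. Everything here is an assumption for illustration: `normalizeCsvText` is a made-up name, not something in surveyor, and I'm guessing that the culprits are a leading byte order mark, CRLF or bare CR line endings, and typographic characters such as a curly apostrophe or a single-character ellipsis.

```typescript
// Hypothetical helper, not part of the surveyor codebase.
// Cleans raw CSV text before it is handed to whatever CSV parser is in use.
function normalizeCsvText(raw: string): string {
  return raw
    // Drop a leading byte order mark, which would otherwise stick to the
    // first header (e.g. turning "name" into "\uFEFFname").
    .replace(/^\uFEFF/, "")
    // Convert CRLF and lone CR line endings to LF.
    .replace(/\r\n?/g, "\n")
    // Map curly single quotes to a plain apostrophe.
    .replace(/[\u2018\u2019]/g, "'")
    // Map the single-character ellipsis to three periods.
    .replace(/\u2026/g, "...");
  // Curly double quotes are deliberately left alone: turning them into
  // straight double quotes before parsing could break CSV field quoting.
}

// Usage: the first header comes out as a plain "name".
const cleaned = normalizeCsvText("\uFEFFname,prompt\r\nage,How old are you\u2026\r\n");
const firstHeader = cleaned.split("\n")[0].split(",")[0];
```

Doing this once on the whole file keeps the parser itself unchanged. Whether the special characters should be mapped to ASCII or kept and rendered correctly is a separate choice.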

I think this will need some regex work. Should this be handled on the server side (when first importing the survey) or in survey.pug when parsing through each line? Handling it once at import would be more efficient, while handling it in survey.pug allows for more line-by-line checks.
@markwhiting

I think it should probably be a validation function on the server that checks that everything is OK for a given survey. (It could actually run on the internal server representation instead of on the CSV specifically.)

Standardize CSV formatting (starting from what the CSV import currently does), and check the available libraries and their behaviors. See what the docs don't check, and ensure that those things are checked (e.g., are the answers valid answer types?).
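For discussion, here is a minimal sketch of a server-side validation pass over the parsed survey rather than the raw CSV. The `Question` shape, the `KNOWN_TYPES` list, and `validateSurvey` itself are all hypothetical. The real internal representation and the real set of answer types in surveyor may look different.

```typescript
// Hypothetical shape of one parsed survey row; the real fields may differ.
interface Question {
  name: string;
  type: string;
  prompt?: string;
}

// Assumed answer types, for illustration only.
const KNOWN_TYPES = new Set(["text", "number", "radio", "checkbox"]);

// Returns a list of human-readable problems; an empty list means the survey passed.
function validateSurvey(questions: Question[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();

  questions.forEach((q, i) => {
    const row = i + 1;

    if (typeof q.name !== "string" || q.name.trim() === "") {
      problems.push(`row ${row}: missing name`);
    } else if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(q.name)) {
      // Catches invisible characters (e.g. a stray BOM or CR) in a name.
      problems.push(`row ${row}: name "${q.name}" has unexpected characters`);
    } else if (seen.has(q.name)) {
      problems.push(`row ${row}: duplicate name "${q.name}"`);
    } else {
      seen.add(q.name);
    }

    if (!KNOWN_TYPES.has(q.type)) {
      problems.push(`row ${row}: unknown answer type "${q.type}"`);
    }
  });

  return problems;
}
```

Returning a list of problems instead of throwing on the first one would let the import step report everything wrong with a survey in one go.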

@karansampath how's this going? We seem to be running into issues a lot (e.g., see the two mentions above)

Not complete as of now, but on hold until needed.