digital-preservation/csv-validator

Errors referring to another column

filipegarcia opened this issue · 6 comments

I can't get the check for whether a value is in another column to work.

This is explained in Example 14 of the CSV Schema 1.1 specification:
http://digital-preservation.github.io/csv-schema/csv-schema-1.1.html#ex-14-example-14-a_column-in-some-string-the-value-of-a_column-must-be-a-substring-of-some-string-eg-some-or-string-or-me-st-etc-another_column-in-a_column-the-value-of-another_column-must-be-a-substring-of-the-contents-of-a_column

Example 14

a_column: in("some string")   //the value of a_column must be a substring of "some string" eg "some" or "string" or "me st" etc
another_column: in($a_column) //the value of another_column must be a substring of the contents of a_column
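To make the Example 14 semantics concrete, here is a minimal Python sketch of how in() behaves under the assumption that it is a plain substring test and that $a_column means "the value of a_column in the same row". The function names in_rule and check_row are hypothetical; this is an illustration, not csv-validator's implementation.

```python
# Hypothetical model of the CSV Schema in() rule: a value passes if it is a
# substring of the comparison string. With in($a_column), the comparison
# string is the value of a_column *in the same row*.


def in_rule(value: str, container: str) -> bool:
    """Return True if value occurs as a substring of container."""
    return value in container


def check_row(row: dict[str, str]) -> dict[str, bool]:
    """Apply the two Example 14 rules to one row (column name -> value)."""
    return {
        "a_column": in_rule(row["a_column"], "some string"),
        "another_column": in_rule(row["another_column"], row["a_column"]),
    }


print(check_row({"a_column": "me st", "another_column": "st"}))
# {'a_column': True, 'another_column': True}
print(check_row({"a_column": "string", "another_column": "some"}))
# {'a_column': True, 'another_column': False}
```

The second row fails only because "some" is not a substring of that row's own a_column value ("string"); no other row is consulted.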

I'm trying something like

employee_id: notEmpty
employee_name: notEmpty
employee_manager_id: in($employee_id) or empty

Is this feature implemented?

Yes, we frequently use this test. Would the employee_manager_id really be a substring of employee_id though? What relationship do you actually want between these fields?

I tried the is, any, in and even startsWith rules.

schema.csvs

version 1.1
@totalColumns 3
@separator ','
employee_id: notEmpty
employee_name: notEmpty
employee_manager_id: any($employee_id) or empty

test.csv

employee_id,employee_name,employee_manager_id
1234, john,
2344, smith, 1234
4566, doe, 1234

And I'm running the CLI validator:

$ validate test.csv schema.csvs
Error:   any($employee_id) or empty fails for line: 2, column: employee_manager_id, value: " 1234"
Error:   any($employee_id) or empty fails for line: 3, column: employee_manager_id, value: " 1234"
FAIL

I think it's because you have a space following each comma which is treated as part of the field value (see how the error message says value: " 1234").
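Python's standard csv module shows the same default behaviour, which makes the effect easy to see: a space after a comma stays in the field unless you explicitly ask for it to be skipped. This is only an analogy using Python's parser (skipinitialspace is a csv module option); it says nothing about how csv-validator itself parses files.

```python
import csv
import io

data = "employee_id,employee_name,employee_manager_id\n1234, john,\n2344, smith, 1234\n"

# By default the space after each comma is kept as part of the field value.
rows = list(csv.reader(io.StringIO(data)))
print(rows[2])  # ['2344', ' smith', ' 1234']

# skipinitialspace=True drops whitespace immediately following the delimiter.
rows_trimmed = list(csv.reader(io.StringIO(data), skipinitialspace=True))
print(rows_trimmed[2])  # ['2344', 'smith', '1234']
```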

Change your data to

employee_id,employee_name,employee_manager_id
1234,john,
2344,smith,1234
4566,doe,1234

and see what happens

I get the same result:

$ validate test.csv schema.csvs
Error:   any($employee_id) or empty fails for line: 2, column: employee_manager_id, value: "1234"
Error:   any($employee_id) or empty fails for line: 3, column: employee_manager_id, value: "1234"
FAIL

My real data is also clean, with no leading or trailing spaces.

I'm running CSV Validator - Command Line 1.1.5

Sorry, I should have noticed this the other day: the parser processes the CSV file line by line and does not refer across different lines, so this would only work if employee_id and employee_manager_id were identical in each row. I sort of alluded to that in my first comment, but without spelling it out explicitly. You can't check that a value in one column occurs in another column elsewhere in the file; doing so would go against the basic principles in http://digital-preservation.github.io/csv-schema/csv-schema-1.1.html#principles
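The row-by-row behaviour explains both failures in the original test data. The sketch below is a hypothetical Python model, assuming any($employee_id) or empty means "the value is empty, or equals this row's employee_id"; the names any_or_empty and failing_rows are made up for illustration and are not part of csv-validator.

```python
import csv
import io

# Hypothetical model: each rule is evaluated against one row at a time, so
# $employee_id refers to the employee_id of the *current* row only.

DATA = """employee_id,employee_name,employee_manager_id
1234,john,
2344,smith,1234
4566,doe,1234
"""


def any_or_empty(value: str, options: list[str]) -> bool:
    """Model of `any(...) or empty`: pass if value is empty or equals an option."""
    return value == "" or value in options


def failing_rows(text: str) -> list[int]:
    """Return 1-based data-row numbers (header excluded) that fail the rule."""
    failures = []
    for n, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        # Only this row's employee_id is visible to the rule.
        if not any_or_empty(row["employee_manager_id"], [row["employee_id"]]):
            failures.append(n)
    return failures


print(failing_rows(DATA))  # [2, 3]
```

john passes through the "or empty" branch, while smith and doe fail because 1234 is compared only with 2344 and 4566 respectively, never with john's row.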

Oh, I see, I missed that one.
I'll handle it in another layer, then.
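A cross-row check like this can live in a separate step that reads the whole file before comparing values. Below is a minimal Python sketch of such a referential check, assuming the file fits in memory; the function name unknown_managers and the sample manager id 9999 are hypothetical, chosen only to show a failing case.

```python
import csv
import io

# Hypothetical "second layer" check, run outside csv-validator: it reads the
# whole file first, so it can compare values across different rows.


def unknown_managers(text: str) -> list[tuple[str, str]]:
    """Return (employee_id, employee_manager_id) pairs whose manager id does
    not appear anywhere in the employee_id column. Empty manager ids pass."""
    rows = list(csv.DictReader(io.StringIO(text)))
    known_ids = {row["employee_id"] for row in rows}
    return [
        (row["employee_id"], row["employee_manager_id"])
        for row in rows
        if row["employee_manager_id"] and row["employee_manager_id"] not in known_ids
    ]


DATA = """employee_id,employee_name,employee_manager_id
1234,john,
2344,smith,1234
4566,doe,9999
"""

print(unknown_managers(DATA))  # [('4566', '9999')]
```

Splitting the work this way keeps csv-validator responsible for per-row structure, while the whole-file relationship is checked separately.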