Errors referring to another column
filipegarcia opened this issue · 6 comments
I can't get the check for whether a value is in another column to work.
Example 14
a_column: in("some string") //the value of a_column must be a substring of "some string" eg "some" or "string" or "me st" etc
another_column: in($a_column) //the value of another_column must be a substring of the contents of a_column
I'm trying something like
employee_id: notEmpty
employee_name: notEmpty
employee_manager_id: in($employee_id) or empty
Is this feature implemented?
Yes, we frequently use this test. Would the employee_manager_id really be a substring of employee_id though? What relationship do you actually want between these fields?
I tried the is, any, in and even startsWith expressions.
schema.csvs
version 1.1
@totalColumns 3
@separator ','
employee_id: notEmpty
employee_name: notEmpty
employee_manager_id: any($employee_id) or empty
test.csv
employee_id,employee_name,employee_manager_id
1234, john,
2344, smith, 1234
4566, doe, 1234
and I'm running the CLI validator:
$validate test.csv schema.csvs
Error: any($employee_id) or empty fails for line: 2, column: employee_manager_id, value: " 1234"
Error: any($employee_id) or empty fails for line: 3, column: employee_manager_id, value: " 1234"
FAIL
I think it's because you have a space following each comma which is treated as part of the field value (see how the error message says value: " 1234").
Change your data to
employee_id,employee_name,employee_manager_id
1234,john,
2344,smith,1234
4566,doe,1234
and see what happens
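To illustrate the whitespace point above, here is a minimal sketch using Python's standard csv module (not the validator's own parser). It shows that a space following a comma ends up inside the field value unless the parser is told to skip it; the sample rows are the ones from this thread.

```python
import csv
import io

# Sample data from this thread, with a space after each comma.
data = (
    "employee_id,employee_name,employee_manager_id\n"
    "1234, john,\n"
    "2344, smith, 1234\n"
)

# Default parsing: the space after the comma is kept in the field.
rows = list(csv.reader(io.StringIO(data)))
print(repr(rows[2][2]))

# skipinitialspace=True drops whitespace immediately after the delimiter.
trimmed = list(csv.reader(io.StringIO(data), skipinitialspace=True))
print(repr(trimmed[2][2]))
```

The first value printed still carries the leading space, which matches the value: " 1234" shown in the error message.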
I get the same thing:
$validate test.csv schema.csvs
Error: any($employee_id) or empty fails for line: 2, column: employee_manager_id, value: "1234"
Error: any($employee_id) or empty fails for line: 3, column: employee_manager_id, value: "1234"
FAIL
My real data is also clean, with no spaces before or after the values.
I'm running CSV Validator - Command Line 1.1.5
Sorry, I should have noticed this the other day: the parser operates line by line on the CSV file and does not refer across different lines, so this would only work if employee_id and employee_manager_id were identical in each row. I sort of alluded to that in my first comment, but without spelling it out explicitly. You can't check that a value in one column occurs in another column elsewhere in the file. To do so would go against the basic principles in http://digital-preservation.github.io/csv-schema/csv-schema-1.1.html#principles
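The row-by-row behaviour described above can be sketched in Python. This is only an illustration of the semantics, not the validator's implementation: it assumes that any($employee_id) or empty passes when the field is empty or equal to employee_id in the same row, and the function name row_by_row_failures is made up for this example.

```python
import csv
import io

# Cleaned sample data from this thread (no spaces after commas).
data = (
    "employee_id,employee_name,employee_manager_id\n"
    "1234,john,\n"
    "2344,smith,1234\n"
    "4566,doe,1234\n"
)

def row_by_row_failures(text):
    """Return data-row numbers where employee_manager_id is neither
    empty nor equal to employee_id in that same row."""
    failures = []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=1):
        manager = row["employee_manager_id"]
        # Only values from the current row are visible to the check.
        if manager != "" and manager != row["employee_id"]:
            failures.append(line_no)
    return failures

print(row_by_row_failures(data))
```

Under that assumption, rows 2 and 3 fail because 1234 is compared with 2344 and 4566 respectively, which lines up with the two errors reported.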
Ohh I see, I missed that one.
I'll take it to another layer then.
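For anyone landing here with the same need, one possible shape for that other layer is a small script run alongside the validator. The sketch below is an assumption about how such a check might look, not part of CSV Validator; the function name unknown_manager_ids is hypothetical and the column names are taken from the schema in this thread.

```python
import csv
import io

def unknown_manager_ids(text):
    """Return (data_row_number, manager_id) pairs whose manager id
    does not appear anywhere in the employee_id column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # First pass: collect every employee_id in the file.
    known_ids = {row["employee_id"] for row in rows}
    # Second pass: flag non-empty manager ids that are not known.
    return [
        (line_no, row["employee_manager_id"])
        for line_no, row in enumerate(rows, start=1)
        if row["employee_manager_id"]
        and row["employee_manager_id"] not in known_ids
    ]

sample = (
    "employee_id,employee_name,employee_manager_id\n"
    "1234,john,\n"
    "2344,smith,1234\n"
    "4566,doe,9999\n"
)
print(unknown_manager_ids(sample))
```

The whole file is read into memory so the id set is complete before any row is checked; for very large files the first pass could stream the ids instead.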