phenopackets/phenopacket-tools

ValidationItem

Closed this issue · 3 comments

@ielis @julesjacobsen
Currently, we have this

public interface ValidationItem {
    /**
     * @return basic description of the validator that produced this issue.
     */
    ValidatorInfo validatorInfo();
    // TODO - decide which enum to use here - either ErrorType or ValidationAspect
    ErrorTypeOLD errorType();
    String message()
}

We are replacing ErrorTypeOLD with an interface/class-based solution. I would suggest we consider adding something like context (We could add lines of a JSON file)

public interface ValidationErrorType {
    String name();
    String message();
}

Therefore, I would propose

public interface ValidationItem {
    ValidatorInfo validatorInfo();
    ValidationItem item()
}

We could also consider adding something to the ValidationItem interface that would allow us to capture the source of the error. For instance, a list of strings. Some of the validation will not be doing via the original phenopacket and thus it would cost a lot of extra work to show line numbers in general (e.g., ontology validators might point out pairs of inconsistent ontology terms).
On the other hand, for simplicity, I would suggest we do not support this right now -- this is an additional feature that we can add on after the basics are in place.

I think it would be more useful to have this interface

public interface ValidationItem {
    String validatorId();
    String validatorName();
    String category();
    String subcategory();
    String message();
}

that is, expose the individual fields of rather than requiring users to access the information via the objects. @ielis thoughts?

ielis commented

Source of the error

I think we should indeed save this for the future release.
My thoughts about reporting the source of error: the format depends on what do we choose as a reference. ValidationWorkflowRunner is provided with String, so we can choose to report errors by returning the int start and int end coordinates of the culprit region(s). The coordinates can be used to highlight the erroneous part(s), say in a similar way as we highlight concepts in text mining results.
Alternatively, we can refer to parts of the phenopacket hierarchy using the field names. Each phenopacket is a tree, hence, there exists a path from the root to the offending node. For instance:

  • missing phenopacket -> diseases entry for phenopacket -> medical_actions[3] -> treatment_intent meaning that the 3rd medical action intends to treat a disease X but the corresponding disease is not present in the diseases section.

ValidationItem hierarchy

I don't mind keeping hierarchy in ValidationItem, as long as the object's only purpose is to hold state. In this case, keeping the hierarchy can communicate which parts pertain to the validator (ValidatorInfo) and which to the actual validation result (ValidationErrorType).


PS: I'm not sure I'm getting the intent behind

public interface ValidationItem {
    ValidatorInfo validatorInfo();
    ValidationItem item();
}

ValidationItem contains another ValidationItem? Isn't that supposed to be the following?

public interface ValidationItem {
    ValidatorInfo validatorInfo();
    ValidationErrorType error();
}

refactored and discussed for 0.4.5