LUMC/pytest-workflow

Add hook for custom tests in test yaml

sstrong99 opened this issue · 5 comments

It would be convenient if the custom test infrastructure interfaced with the test.yml file, so that certain files could be validated using custom test code rather than one of the built-in validation methods. This can already be achieved with the custom test infrastructure, but that requires encoding information about the desired test results in two separate places (in the custom test and in the test.yml). If the test.yml could hook custom tests, then all the expected output information could be kept in the same place. For example:

test.yml

- name: ...
  command: ...
  files:
    - path: "file1.xyz"
      custom_validation:
        - test_method: "my_custom_validator"
          arg1: "val1"
          ...
        - test_method: "another_custom_validator"
          another_arg: "some_value"

tests/test_custom_validators.py

import pytest

@pytest.mark.custom_workflow_validator("my_custom_validator")
def test_xyz(filename, arg1):
    # This could, for example, remove a certain line from filename that contains a timestamp and then
    # check the checksum of the remaining contents against an md5sum, which would be passed in via arg1.
    ...
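
The check described in those comments could look roughly like the sketch below. It uses only the standard library and is independent of pytest-workflow. The function name `md5_without_marker_lines` and the `marker` parameter are made up for illustration, and the sketch assumes the timestamp line can be recognised by a fixed substring.

```python
import hashlib


def md5_without_marker_lines(filename, marker):
    """Return the MD5 hex digest of a file, skipping lines that contain `marker`.

    Hypothetical helper: `marker` is a substring that identifies the
    volatile line(s), e.g. a timestamp header.
    """
    marker_bytes = marker.encode()
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        for line in handle:
            # Leave out lines that are expected to differ between runs.
            if marker_bytes in line:
                continue
            digest.update(line)
    return digest.hexdigest()
```

A custom test would then compare the returned digest with the expected md5sum (the `arg1` value in the example above).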

We do a lot of custom tests to do file diffs, which are more informative than a simple md5 checksum. I did some brainstorming a while ago that was along the lines of what you're thinking here, although maybe not quite as general.

I like your idea of using the built-in pytest mechanisms to mark or "register" these custom tests, so that pytest-workflow doesn't have to do the job of figuring out where these things live.

I do appreciate suggestions such as this. The custom test interface is a bit clunky indeed as it uses the workflow name to figure out the directory and nothing more.

However, this code has to work within technical constraints as well. The major technical constraint here is the pytest API. It is not easy to flow information from the YAML to the "mark" interpreter. The project is already quite complex as it is. Complexity breeds bugs, and having the test framework as bug-free as possible is absolutely essential. A test framework that throws errors of its own accord is much less useful as a testing tool.

Custom validators are probably only a small percentage of your workflow tests. So while I think it would be convenient to have a feature such as this, I don't think it would merit the technical complexity that it adds to the project.

I have been thinking a bit about this. How about something like this:

- name: ...
  command: ...
  files:
    - path: "file1.xyz"
  custom_tests:
    - mytestfunc
    - myothertestfunc

Test funcs should be defined in some file. The test func will be called with the following arguments: workflow_dir, workflow_test_meta

  • workflow_dir, the temporary directory where the workflow is called.
  • workflow_test_meta, a dictionary containing all the information in the workflow test YAML for that particular workflow. That includes the command, the files, etc.
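
To make the calling convention concrete, here is a minimal sketch of how such a lookup might work. Nothing in it is existing pytest-workflow API: the `CUSTOM_TESTS` registry, the `custom_test` decorator and `run_custom_tests` are all hypothetical, and the sketch assumes test funcs are registered by name at import time.

```python
from pathlib import Path

# Hypothetical registry mapping test func names to callables.
CUSTOM_TESTS = {}


def custom_test(func):
    """Register a test func under its own name (hypothetical decorator)."""
    CUSTOM_TESTS[func.__name__] = func
    return func


@custom_test
def mytestfunc(workflow_dir, workflow_test_meta):
    # Example check: every file listed in the YAML exists in workflow_dir.
    for entry in workflow_test_meta.get("files", []):
        assert (Path(workflow_dir) / entry["path"]).exists()


def run_custom_tests(workflow_dir, workflow_test_meta):
    """Call every func named under 'custom_tests' with both arguments."""
    for name in workflow_test_meta.get("custom_tests", []):
        if name not in CUSTOM_TESTS:
            # A name in the YAML without a matching function is an error.
            raise LookupError(f"custom test {name!r} is not defined")
        CUSTOM_TESTS[name](workflow_dir, workflow_test_meta)
```

Failing loudly on an unknown name at least catches a YAML entry that points to a function that was never defined, although it does not tell you which file a function lives in.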

This leaves the YAML simple, and it explicitly couples workflow tests to defined functions. It is much less clunky than using the pytest.mark.workflow hook.

The implementation is not going to be trivial though.

@DavyCats @Redmar-van-den-Berg I'm also curious about your thoughts on this issue.

I'm not sure this would be an improvement. In the current situation, the custom tests get the workflow name, and you have to go search in the .yml files to find which workflow they refer to.
In the proposal, the link between the workflow and the custom tests is defined directly in the .yml files (yay), but now you have to go search for the tests in the .py files. So there still isn't really an explicit link, since you don't know where (or if!) these Python functions have been defined.

I quite like the current system, it is very straightforward. You get the workflow directory in your custom tests, and from there you can have your tests do whatever you want. Especially since the custom tests no longer run if the workflow itself fails, you can always trust that all expected files will exist.

@Redmar-van-den-Berg thank you for your observations. You are right that this solution merely moves the problem rather than solving it. So I won't implement that particular solution.