multimeric/PandasSchema

Distinct across multiple columns

Maarten-vd-Sande opened this issue · 7 comments

I am not sure if this is currently supported, or how to implement it as a custom validator.

I want distinct values across two columns, so that this is okay:

sample    value
1         2
2         2

But this is not:

sample    value
1         2
1         2

Unfortunately not. The design of pandas_schema 0.X.X is such that every validation runs on a per-column basis. This will be fixed in 1.X.X, and indeed I have a demonstration of this behaviour here: https://github.com/TMiguelT/PandasSchema/blob/9452513fbd2f58acc6ca8c3ff94062b07f3f7ffd/test/test_df_validations.py#L50-L61.

But who knows when that will be released, because it's been hard to find the time to finish it.

Great, this is already in the works! But then I already have a feature request: for it to work on a subset of columns 😇. It seems the current implementation does not support this, right?

If you want, I could start a PR for this (when I have time, so not too soon).

Ah I see. Just to be sure I understand: I then would use DistinctRowValidation and use validate on a list of columns?

Feel free to close the issue (either now or with the new release). Thanks for all your help 👍

Right, so you want unique rows, but across a subset of the columns. I don't think the upcoming release can do that yet, but I'll look into it.
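In the meantime, a possible workaround is to run the check in plain pandas, outside of PandasSchema. The sketch below is not part of the pandas_schema API; the helper name `find_duplicate_rows` and the extra `note` column are made up for illustration. It relies only on `DataFrame.duplicated` with its `subset` and `keep` parameters.

```python
import pandas as pd


def find_duplicate_rows(df, columns):
    """Return the rows whose combination of values in `columns` occurs more than once.

    Hypothetical helper, not a pandas_schema validator.
    """
    # keep=False marks every member of a duplicated group, not just the repeats
    mask = df.duplicated(subset=columns, keep=False)
    return df[mask]


# The two examples from the top of this issue, plus an unrelated column
ok = pd.DataFrame({"sample": [1, 2], "value": [2, 2], "note": ["a", "b"]})
bad = pd.DataFrame({"sample": [1, 1], "value": [2, 2], "note": ["a", "b"]})

print(find_duplicate_rows(ok, ["sample", "value"]))   # empty frame: rows are distinct
print(find_duplicate_rows(bad, ["sample", "value"]))  # both rows: same (sample, value)
```

Because only the columns listed in `columns` are compared, the differing `note` values do not hide the duplicate in the second frame.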

I'll keep the issue open since it's still not solved in a release.

Closing in favour of the more general #57 that I just opened.