Distinct across multiple columns
Maarten-vd-Sande opened this issue · 7 comments
I am not sure if this is currently supported, or how to implement it as a custom validator.
I want the values to be distinct across two columns taken together, so that this is okay:
sample value
1 2
2 2
But this is not:
sample value
1 2
1 2
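For reference, the requirement can be checked with plain pandas, outside pandas_schema. This is a minimal sketch that assumes the data is already in a DataFrame with the columns `sample` and `value`. `DataFrame.duplicated` flags every row that repeats an earlier row, and `has_duplicate_rows` is just an illustrative helper name.

```python
import pandas as pd

# The two tables from the question above
ok = pd.DataFrame({"sample": [1, 2], "value": [2, 2]})
bad = pd.DataFrame({"sample": [1, 1], "value": [2, 2]})


def has_duplicate_rows(df: pd.DataFrame) -> bool:
    # duplicated() is True for any row identical to an earlier row
    return bool(df.duplicated().any())


print(has_duplicate_rows(ok))   # False
print(has_duplicate_rows(bad))  # True
```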
Unfortunately not. The design of pandas_schema 0.X.X is such that every validation is on a per-column basis. This will be fixed in 1.X.X, and indeed I have a demonstration of this behaviour here: https://github.com/TMiguelT/PandasSchema/blob/9452513fbd2f58acc6ca8c3ff94062b07f3f7ffd/test/test_df_validations.py#L50-L61.
But who knows when that will be released, because it's been hard to find the time to finish it.
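To see why a purely per-column design cannot express this rule, here is a small pandas-only sketch with hypothetical data. Both columns contain repeats, so any check that looks at one column at a time fails, yet no two rows are identical. The rule only becomes checkable when all the columns are examined together.

```python
import pandas as pd

# Hypothetical data: "sample" and "value" each repeat, but every row is unique
df = pd.DataFrame({"sample": [1, 1, 2], "value": [2, 3, 2]})

# A per-column check only ever sees one Series at a time
per_column = {col: df[col].is_unique for col in df.columns}
print(per_column)  # both False: each column has a repeated value

# A row-level check considers all columns together
rows_distinct = not df.duplicated().any()
print(rows_distinct)  # True
```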
Great that this is already in the works! But then I already have a feature request: it should also work on a subset of columns. It seems the current implementation does not support this, right?
If you want (and if I have time, though not too soon), I could start a PR for this.
The current (and future) releases support this: https://tmiguelt.github.io/PandasSchema/#pandas_schema.schema.Schema.validate
Ah, I see. Just to be sure I understand: I would then use DistinctRowValidation and call validate on a list of columns?
Feel free to close the issue (either now or with the new release). Thanks for all your help!
Right, so you want unique rows, but only across a subset of the columns. I don't think you can currently do that in the future release, but I'll look into it.
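Until something like this lands in a release, one possible workaround is to run the check with pandas directly. The sketch below assumes that approach. `duplicate_row_errors` is a hypothetical helper, not part of pandas_schema. It relies on `DataFrame.duplicated(subset=..., keep=False)`, which compares only the listed columns and marks every member of a duplicate group, so each offending row gets reported.

```python
import pandas as pd


def duplicate_row_errors(df: pd.DataFrame, columns: list[str]) -> list[str]:
    """Hypothetical helper: report rows whose values in `columns` repeat."""
    # keep=False marks all rows in a duplicate group, not only the later ones
    mask = df.duplicated(subset=columns, keep=False)
    errors = []
    for idx in df.index[mask.to_numpy()]:
        values = df.loc[idx, columns].tolist()
        errors.append(f"row {idx}: {values} in columns {columns} is not distinct")
    return errors


df = pd.DataFrame({
    "sample": [1, 1, 2],
    "value": [2, 2, 2],
    "comment": ["a", "b", "c"],  # excluded from the check below
})

for err in duplicate_row_errors(df, ["sample", "value"]):
    print(err)
```

Rows 0 and 1 are reported because they agree on `sample` and `value`. Checking all three columns reports nothing, because the `comment` values differ.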
I'll keep the issue open since it's still not solved in a release.
Closing in favour of the more general #57 that I just opened.