canimus/cuallee

[JOSS Review] duckdb v0.9.2 does not work

FlorianK13 opened this issue · 8 comments

Describe the bug
With duckdb v0.9.2 I get the error:
Exception: Cuallee is not ready for this data structure. You can log a Feature Request in Github.

To Reproduce
Steps to reproduce the behavior:

  1. Install duckdb v0.9.2
  2. Read the taxi parquet file to duckdb and pandas
  3. Define some check
  4. Make sure that the check runs successfully for the pandas df
  5. Try to run the check with duckdb, get the error
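The report does not include the script itself, so the following is only a minimal sketch of what the steps above might look like. It assumes the DuckDB data was loaded with `duckdb.read_parquet` (as mentioned later in the thread) and uses a simple completeness rule as the check. The function name `reproduce` and its parameters `parquet_path` and `column` are hypothetical.

```python
def reproduce(parquet_path: str, column: str) -> None:
    """Run the same check against pandas and DuckDB (hypothetical helper)."""
    # Third-party imports live inside the function so that the module
    # itself can be loaded even when these packages are missing.
    import duckdb
    import pandas as pd
    from cuallee import Check, CheckLevel

    # Step 2: read the same parquet file with both frameworks.
    pandas_df = pd.read_parquet(parquet_path)
    duckdb_rel = duckdb.read_parquet(parquet_path)

    # Step 3: define a check; `column` is an assumed column name.
    check = Check(CheckLevel.WARNING, "Taxi")
    check.is_complete(column)

    # Step 4: the pandas validation is expected to succeed.
    print(check.validate(pandas_df))

    # Step 5: the DuckDB validation is where the reported error appears.
    try:
        print(check.validate(duckdb_rel))
    except Exception as err:
        print(f"DuckDB validation failed: {err}")
```

Calling `reproduce("path/to/file.parquet", "some_column")` with real values for both arguments should show whether the two frameworks behave differently in a given environment.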

Desktop (please complete the following information):

  • OS: Windows
  • python: 3.11.9
  • cuallee: 0.10.2

This issue is part of a JOSS Review

Hi @FlorianK13, the README.md referenced the wrong DuckDB version supported by cuallee.
The pyproject.toml, however, had and still has the right dependency version and compatibility, pointing to 0.10.2.
Upgrading your version of DuckDB, or installing an earlier version of cuallee that supported the older DuckDB release, will resolve your issue.
Fixed by #230
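Since the fix depends on which versions are installed, a quick way to check them is sketched below. It relies only on the standard library; the helper name `installed_version` is hypothetical and not part of cuallee.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions relevant to this issue.
for name in ("cuallee", "duckdb"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

The printed DuckDB version can then be compared against the dependency declared in cuallee's pyproject.toml.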

Ok, that also works for me. However, using df = duckdb.read_parquet("path/to/file.parquet") gives me an error when I try to validate this df with a check.
I think it is ok if you do not maintain multiple data formats for the different frameworks. However, could you add example scripts for the different frameworks to your documentation page, so that people know that, e.g., for duckdb they need to use the duckdb.connect syntax?
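For readers looking for the duckdb.connect syntax mentioned here, a minimal sketch follows. It assumes that cuallee validates a DuckDB connection rather than a relation, and that the data location is passed to the check through a `table_name` argument. The function name `validate_parquet_with_duckdb` and its parameters are hypothetical, so please compare with the cuallee documentation before relying on it.

```python
def validate_parquet_with_duckdb(parquet_path: str, column: str):
    """Validate a parquet file through a DuckDB connection (hypothetical helper)."""
    # Third-party imports live inside the function so that the module
    # itself can be loaded even when these packages are missing.
    import duckdb
    from cuallee import Check, CheckLevel

    # Assumption: the check is validated against a connection,
    # not against the result of duckdb.read_parquet.
    conn = duckdb.connect(":memory:")

    # Assumption: `table_name` tells the check which data to query.
    check = Check(CheckLevel.WARNING, "DuckDB", table_name=parquet_path)
    check.is_complete(column)  # `column` is an assumed column name

    return check.validate(conn)
```

The difference from the failing call is that the check receives the connection object, while the parquet path is attached to the check itself.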

I also have problems running the check for daft; a template in the documentation could help a lot there.

Thanks for the heads up, @FlorianK13. We purposely left out the Daft implementation in the paper because it has less than a year of use, though it was a commendable community contribution. We focused on six frameworks in our paper because I have professional experience with these and can confidently discuss the use of cuallee at scale, as well as its robustness and resilience within these frameworks.

For the JOSS submission, I believe covering the six highlighted frameworks—PySpark, Snowpark, Pandas, Polars, DuckDB, and BigQuery—already represents a significant effort. None of these implementations were trivial, and we only included those we considered mature and well-tested.

Given this context, I am open to addressing this issue but feel it is outside the scope of our current paper submission.

Agree?

Reference:

Fixed by #234

Hi @canimus, I was only testing the frameworks listed in the README, that's why I created this issue. I agree that integrating several frameworks with a large number of test cases is not trivial.

I also saw that you have added the supported frameworks with examples to the documentation. In my opinion, this makes cuallee much easier to use for new users.

@FlorianK13 thank you very much for your attention to detail. It is very much appreciated, and I am very glad about your observations. In the end, as you said, we aim to support a long-lived user community, and clarity is paramount to that. Have a great rest of your week.