duckdblabs/duckplyr

Error evaluating duckdb query: Not implemented Error: Sorting is not supported on big endian architectures [ FAIL 194 | WARN 1 | SKIP 321 | PASS 972 ]

Closed this issue · 12 comments

Quite a number of tests fail on big-endian (BE) systems with identical errors, for example:

══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test-as_duckplyr_df.R:32:3'): as_duckplyr_df() and anti_join(join_by(a)) ──
Error: Error evaluating duckdb query: TransactionContext Error: cannot start a transaction within a transaction
Backtrace:
    ▆
 1. └─testthat::expect_equal(pre, post) at test-as_duckplyr_df.R:32:2
 2.   └─testthat:::expect_waldo_equal("equal", act, exp, info, ..., tolerance = tolerance)
 3.     └─testthat:::waldo_compare(...)
 4.       └─waldo::compare(x, y, ..., x_arg = x_arg, y_arg = y_arg)
 5.         └─waldo:::compare_structure(x, y, paths = c(x_arg, y_arg), opts = opts)
 6.           └─waldo:::is_identical(x, y, opts)
── Error ('test-as_duckplyr_df.R:57:3'): as_duckplyr_df() and arrange(a) ───────
Error: Error evaluating duckdb query: Not implemented Error: Sorting is not supported on big endian architectures
Backtrace:
    ▆
 1. └─testthat::expect_equal(pre, post) at test-as_duckplyr_df.R:57:2
 2.   └─testthat:::expect_waldo_equal("equal", act, exp, info, ..., tolerance = tolerance)
 3.     └─testthat:::waldo_compare(...)
 4.       └─waldo::compare(x, y, ..., x_arg = x_arg, y_arg = y_arg)
 5.         └─waldo:::compare_structure(x, y, paths = c(x_arg, y_arg), opts = opts)
 6.           └─waldo:::is_identical(x, y, opts)

In addition, these fail:

── Error ('test-relational-duckdb.R:31:3'): duckdb_rel_from_df() ───────────────
Error: TransactionContext Error: Current transaction is aborted (please ROLLBACK)
Backtrace:
     ▆
  1. ├─testthat::expect_silent(duckdb_rel_from_df(df)) at test-relational-duckdb.R:31:2
  2. │ └─testthat:::quasi_capture(enquo(object), NULL, evaluate_promise)
  3. │   ├─testthat (local) .capture(...)
  4. │   │ ├─withr::with_output_sink(...)
  5. │   │ │ └─base::force(code)
  6. │   │ ├─base::withCallingHandlers(...)
  7. │   │ └─base::withVisible(code)
  8. │   └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
  9. └─duckplyr::duckdb_rel_from_df(df)
 10.   └─duckdb$rel_from_df(con, df, experimental = experimental)
 11.     └─duckdb:::rapi_rel_from_df(con@conn_ref, as.data.frame(df), experimental)
── Error ('test-relational-duckdb.R:46:3'): duckdb_rel_from_df() and changing column names ──
Error: TransactionContext Error: Current transaction is aborted (please ROLLBACK)
Backtrace:
    ▆
 1. ├─data.frame(a = 1) %>% duckplyr_select(a) at test-relational-duckdb.R:46:2
 2. └─duckplyr:::duckplyr_select(., a)
 3.   ├─dplyr::select(.data, ...)
 4.   └─duckplyr:::select.duckplyr_df(.data, ...)
 5.     ├─duckplyr:::rel_try(...)
 6.     └─duckplyr::duckdb_rel_from_df(.data)
 7.       └─duckdb$rel_from_df(con, df, experimental = experimental)
 8.         └─duckdb:::rapi_rel_from_df(con@conn_ref, as.data.frame(df), experimental)
── Error ('test-relational-duckdb.R:60:3'): rel_aggregate() ────────────────────
<packageNotFoundError/error/condition>
Error in `loadNamespace(x)`: there is no package called 'palmerpenguins'
Backtrace:
     ▆
  1. ├─... %>% ... at test-relational-duckdb.R:60:2
  2. ├─duckplyr::rel_aggregate(., list(expr_species), list(expr_aggregate))
  3. ├─duckplyr::duckdb_rel_from_df(.)
  4. │ ├─base::stopifnot(is.data.frame(df))
  5. │ └─base::is.data.frame(df)
  6. ├─duckplyr::as_duckplyr_df(.)
  7. ├─dplyr::mutate(., sex = as.character(sex))
  8. ├─dplyr::mutate(., island = as.character(island))
  9. ├─dplyr::mutate(., species = as.character(species))
 10. └─base::loadNamespace(x)
 11.   └─base::withRestarts(stop(cond), retry_loadNamespace = function() NULL)
 12.     └─base (local) withOneRestart(expr, restarts[[1L]])
 13.       └─base (local) doWithOneRestart(return(expr), restart)
── Error ('test-relational-duckdb.R:95:3'): duckdb_rel_from_df() uses materialized results ──
Error: TransactionContext Error: Current transaction is aborted (please ROLLBACK)
Backtrace:
    ▆
 1. ├─data.frame(a = 1) %>% duckplyr_filter(a == 1) at test-relational-duckdb.R:95:2
 2. └─duckplyr:::duckplyr_filter(., a == 1)
 3.   ├─dplyr::filter(.data, ...)
 4.   └─duckplyr:::filter.duckplyr_df(.data, ...)
 5.     ├─duckplyr:::rel_try(...)
 6.     └─duckplyr::duckdb_rel_from_df(.data)
 7.       └─duckdb$rel_from_df(con, df, experimental = experimental)
 8.         └─duckdb:::rapi_rel_from_df(con@conn_ref, as.data.frame(df), experimental)
── Failure ('test-relocate.R:109:3'): attributes of bare data frames are retained (#6341) ──
attr(out, "foo") (`actual`) not identical to "bar" (`expected`).

`actual` is NULL
`expected` is a character vector ('bar')
── Failure ('test-select.R:154:3'): duckplyr_select() keeps attributes of raw data frames (#5831) ──
attr(duckplyr_select(df, x), "a") (`actual`) not equal to "b" (`expected`).

`actual` is NULL
`expected` is a character vector ('b')

[ FAIL 194 | WARN 1 | SKIP 321 | PASS 972 ]

testthat.Rout.fail.txt

P.S. The palmerpenguins error can be ignored: the package is not installed because of allisonhorst/palmerpenguins#96.

Thanks. We would need a way to run tests on this architecture on GitHub Actions to address these and similar issues sustainably. What's the best way?

@krlmlr Thank you for responding.

Given that the error appears to be specific to big-endian platforms in general, it should presumably be reproducible on Linux or FreeBSD running on big-endian PowerPC or SPARC. Of these, I believe Linux should be fairly common (among the major distributions, Debian and Gentoo support or have supported big-endian PPC, and likely others do too).
I'm not really sure whether GitHub Actions supports any of those, though.

(Alternatively, it can be reproduced locally on macOS 10.6.8 Server under Rosetta, i.e. in a VM on any modern Intel Mac. I mention this only for information: it is by no means a convenient route unless one is specifically interested in macOS on PowerPC.)

We would have to be able to test this on GitHub Actions to support it. Can QEMU emulate big-endian platforms? What's the performance hit?

Even if this works, it looks like a major effort for which we will likely need help.

QEMU certainly does. Running big-endian Linux on it should be perfectly fine.
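Before chasing these failures in an emulated environment, it may be worth confirming that the guest really is big-endian. A minimal C sketch, compiled and run inside the guest; the function name `host_is_little_endian()` is hypothetical. It stores a known 32-bit value and inspects the byte at the lowest address:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the host stores the least significant
 * byte of a multi-byte integer first (little-endian), 0 otherwise
 * (big-endian, e.g. s390x or PowerPC in BE mode). */
int host_is_little_endian(void) {
    uint32_t probe = 0x01020304u;
    unsigned char first;
    /* Copy the byte at the lowest address of `probe`. */
    memcpy(&first, &probe, 1);
    /* Little-endian hosts store 04 03 02 01, big-endian hosts 01 02 03 04. */
    return first == 0x04;
}
```

On typical x86-64 and arm64 systems this returns 1; inside an s390x or big-endian PowerPC guest it should return 0. From R, `.Platform$endian` reports the same information as `"little"` or `"big"`.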

Can you show an example GHA workflow that does this?

@krlmlr Will something like this work? https://til.simonwillison.net/docker/emulate-s390x-with-qemu

Also:
wasm3/wasm3#125
google/flatbuffers#4939

So yes, apparently GHA can indeed be used.

Thanks. Actions are now running in https://github.com/krlmlr/wasm3/actions/runs/6525677714 so I can understand how they work.

This should really be filed against mainline DuckDB. DuckDB currently does not support all operations on big-endian architectures, and given current hardware trends it is unlikely to add support anytime soon. Happy to review a PR for this, of course.

@hannes If you could suggest what specifically should be fixed in the sources, that would be helpful.

It looks like there are two places where this is not supported, sorting and indexing:

https://github.com/search?q=repo%3Aduckdb%2Fduckdb+Radix%3A%3AIsLittleEndian%28%29&type=code
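Why would sorting care about byte order at all? A common technique for radix- or `memcmp()`-based sorting is to encode each key into bytes whose plain byte-wise order matches the value order, and the byte layout of that encoding interacts with how the host stores integers. The following is a generic C illustration of that technique, not DuckDB's code; `encode_sort_key()` and `compare_encoded()` are hypothetical names, and I am only assuming that DuckDB's radix encoding works along these lines, as the `Radix::IsLittleEndian()` guards suggest.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: encode a signed 32-bit key so that memcmp() on the encoded
 * bytes orders keys the same way as numeric comparison.
 *  1. Flip the sign bit so negative values sort before non-negative ones.
 *  2. Write the most significant byte first (big-endian byte order).
 * The shifts operate on the value, not on memory, so the result is the same
 * on little- and big-endian hosts. */
void encode_sort_key(int32_t value, unsigned char out[4]) {
    uint32_t u = (uint32_t)value ^ 0x80000000u;
    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
}

/* Hypothetical: compare two keys the way a memcmp()-based sort would.
 * Returns <0, 0 or >0, like memcmp(). */
int compare_encoded(int32_t a, int32_t b) {
    unsigned char ka[4], kb[4];
    encode_sort_key(a, ka);
    encode_sort_key(b, kb);
    return memcmp(ka, kb, sizeof ka);
}
```

For example, a little-endian host stores 1 as `01 00 00 00` and 256 as `00 01 00 00`, so a plain `memcmp()` of the native bytes would put 256 before 1; the value-based encoding above yields `80 00 00 01` and `80 00 01 00`, which compare correctly. Code that instead copies a value in host order and then unconditionally byte-swaps it is only correct on little-endian hosts, which would explain why such paths raise an error on big-endian ones rather than silently producing a wrong order.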

Out of curiosity, which hardware are you trying to run DuckDB on?

@hannes The main testing machine is a G5 Quad; it is pretty fast and has plenty of RAM. I have a few other PowerPC machines, but the G4s are slow and a pain to use for intensive tasks. So basically the Quad and another Power Mac, a 2.3 DC, are usable.

@hannes I have now opened an issue with DuckDB: duckdb/duckdb#9714
Hopefully it can be fixed in the source.