Fix descending sort for character data
Closed this issue · 4 comments
krlmlr commented
options(conflicts.policy = list(warn = FALSE))
library(duckplyr)
Sys.setenv("DUCKPLYR_FORCE" = "TRUE")
data.frame(a = letters[1:3]) |>
as_duckplyr_df() |>
arrange(desc(a))
#> Error: Binder Error: No function matches the given name and argument types '-(VARCHAR)'. You might need to add explicit type casts.
#> Candidate functions:
#> -(TINYINT) -> TINYINT
#> -(TINYINT, TINYINT) -> TINYINT
#> -(SMALLINT) -> SMALLINT
#> -(SMALLINT, SMALLINT) -> SMALLINT
#> -(INTEGER) -> INTEGER
#> -(INTEGER, INTEGER) -> INTEGER
#> -(BIGINT) -> BIGINT
#> -(BIGINT, BIGINT) -> BIGINT
#> -(HUGEINT) -> HUGEINT
#> -(HUGEINT, HUGEINT) -> HUGEINT
#> -(FLOAT) -> FLOAT
#> -(FLOAT, FLOAT) -> FLOAT
#> -(DOUBLE) -> DOUBLE
#> -(DOUBLE, DOUBLE) -> DOUBLE
#> -(DECIMAL) -> DECIMAL
#> -(DECIMAL, DECIMAL) -> DECIMAL
#> -(UTINYINT) -> UTINYINT
#> -(UTINYINT, UTINYINT) -> UTINYINT
#> -(USMALLINT) -> USMALLINT
#> -(USMALLINT, USMALLINT) -> USMALLINT
#> -(UINTEGER) -> UINTEGER
#> -(UINTEGER, UINTEGER) -> UINTEGER
#> -(UBIGINT) -> UBIGINT
#> -(UBIGINT, UBIGINT) -> UBIGINT
#> -(DATE, DATE) -> BIGINT
#> -(DATE, INTEGER) -> DATE
#> -(TIMESTAMP, TIMESTAMP) -> INTERVAL
#> -(INTERVAL, INTERVAL) -> INTERVAL
#> -(DATE, INTERVAL) -> DATE
#> -(TIME, INTERVAL) -> TIME
#> -(TIMESTAMP, INTERVAL) -> TIMESTAMP
#> -(INTERVAL) -> INTERVAL
Created on 2024-02-04 with reprex v2.1.0
toppyy commented
I'm new to the project, but had a look at this and found the following:
- Currently sorting in descending order relies on manipulating the values being sorted, not defining the sort order. Specifically
desc
is defined as a macro that negates the value of it's parameters before being sorted in an ascending order. - While the above is fine for numeric types, this issue arises because you cannot negate varchar-type
- If I understood correctly,
arrange
ultimately callsrapi_rel_order
in the duckdb-r -package, which createsOrderByNode
:s for duckdb to process. Said nodes have an ascending sort order as a hardcoded value
Possible solutions:
- Use the current solution and define an macro that alters varchars so that they are sorted correctly. However, I cannot think of any way to do this?
- Extract and pass the sort order of each expression in sort to
rapi_rel_order
and use that information to create OrderByNodes with the correct sort order- Basically means that if the expression is not wrapped in a desc-function call, the sort order is ascending?
- It is not clear how to pass the information? I don't know if we can pass it within the duckdb expressions or would this require changing the signature of
rapi_rel_order
.
- Something completely different?
krlmlr commented
Thanks. Your second solution sounds like the way to go. Would you like to take a stab?
toppyy commented
Sure, I'd like to give it a go.
ATM I plan to:
- Create a PR for duckdb-r -package that changes the signature of
rapi_rel_order
so that it can accept a vector of boolean values describing the sort order (if true, sort ascending)- The vector defaults to true for all expressions
- After 1. is merged to duckdb-r -package, tackle this issue by altering
rel_order
so that it passes the sort order of each expression torapi_rel_order
krlmlr commented
Thanks, I forgot how much work this was. Happy to support you here.