SeaQL/sea-query

Bug in logic for dropping parentheses

Closed this issue ยท 18 comments

Description

From https://github.com/SeaQL/sea-query/blob/master/src/expr.rs:

pub(crate) fn need_parentheses(&self) -> bool {
        match self {
            Self::Binary(left, oper, _) => !matches!(
                (left.as_ref(), oper),
                (Self::Binary(_, BinOper::And, _), BinOper::And)
                    | (Self::Binary(_, BinOper::Or, _), BinOper::Or)
            ),
            _ => false,
        }
    }

This logic appears flawed, since the expression
(("a" OR "b") OR "c") is determined to not need a parenthesis.
But it clearly does some times:
((("a" OR "b") OR "c") AND "d")

In binary_expr() (a method of QueryBuilder), need_parenthesis()=false is sufficient to drop them, but SimpleExpr really does not have enough information to be able to decide that.

Steps to Reproduce

mod tests {
    use crate::backend::CommonSqlQueryBuilder;
    use crate::{Alias, BinOper, ColumnRef, DynIden, IntoIden, QueryBuilder, SimpleExpr};

    #[test]
    fn test_parentheses() {
        let qb = CommonSqlQueryBuilder {};
        let mut sql_string = "".to_string();
        let a = SimpleExpr::Column(ColumnRef::Column(Alias::new("a").into_iden()));
        let b = SimpleExpr::Column(ColumnRef::Column(Alias::new("b").into_iden()));
        let c = SimpleExpr::Column(ColumnRef::Column(Alias::new("c").into_iden()));
        let d = SimpleExpr::Column(ColumnRef::Column(Alias::new("d").into_iden()));

        let x_op_y = |x,op,y| SimpleExpr::Binary(Box::new(x), op, Box::new(y));
        let a_or_b = x_op_y(a,BinOper::Or, b);
        let a_or_b_or_c = x_op_y(a_or_b,BinOper::Or, c);
        let a_or_b_or_c_and_d = x_op_y(a_or_b_or_c.clone(),BinOper::And, d);
        println!("Needs parentheses: {}", a_or_b_or_c.need_parentheses());
        qb.prepare_simple_expr(&a_or_b_or_c_and_d, &mut sql_string);

        println!("Sql: {}", sql_string);
    }
}

Outputs:
Needs parentheses: false
Sql: "a" OR "b" OR "c" AND "d"
But should be ("a" OR "b" OR "c") AND "d" or equivalent.

Expected Behavior

Expect parentheses to preserve semantics of expressions.

Actual Behavior

Parentheses are eliminated in a way that changes the semantics of the expression.

Reproduces How Often

Deterministic.

Versions

v0.30.0
DB / OS not relevant.

Additional Information

See test case above.

This was the reason the Condition API was introduced, and is the recommended way to build complex / mixed conditional expressions.

The reason () was (attempted to be) omitted is we wanted to make it more readable, i.e. to not grow into ((((((a or b) or c) ...

However as you pointed out, it may be non-trivial to do it properly.

Thanks for getting back to me quickly!
Yes, I think it might be a bit tricky to solve generally.
The nested or / and are however pretty much fixed by the conditions for the left hand side parenthesis in binary_expr():

&& left.is_binary()
&& op != left.get_bin_oper().unwrap()

Which appears sound - most builtin binary operations are parsed in a left associative way in PostgreSQL.
https://www.postgresql.org/docs/15/sql-syntax-lexical.html
But custom operators, which appear to be supported by SeaQL might be right associative. (Edit: this was a bit of confusion on my part, we are looking here at the case where the placing left-associative parens vs. not doing so, and postgresql docs states that it will parse custom operators left associative aswell. So not a matter of mathematical associativity.)

Also, the comparison operators are nonassociative, so you have to keep the parentheses. See the additional bug below.
I think the simple straight and narrow approach is best here, initially requiring parentheses and only introducing rules that are known to be sound based on the left- and right associativity of the operator in question.

#[test]
    fn test_parentheses2() {
        let qb = CommonSqlQueryBuilder {};
        let mut sql_string = "".to_string();
        let a = SimpleExpr::Column(ColumnRef::Column(Alias::new("a").into_iden()));
        let b = SimpleExpr::Column(ColumnRef::Column(Alias::new("b").into_iden()));
        let c = SimpleExpr::Column(ColumnRef::Column(Alias::new("c").into_iden()));

        let x_op_y = |x,op,y| SimpleExpr::Binary(Box::new(x), op, Box::new(y));
        let a_smaller_b = x_op_y(a,BinOper::SmallerThan, b);
        let a_smaller_b_smaller_c= x_op_y(a_smaller_b, BinOper::SmallerThan, c);
        qb.prepare_simple_expr(&a_smaller_b_smaller_c, &mut sql_string);

        println!("Sql: {}", sql_string);
    }

Produces:
Sql: "a" < "b" < "c"
But this is something like
("a" < "b") and ("b" < "c")

I could make a pull request if you like? Think I have an easy solution.

The handling of unary expressions also has a serious bug:

#[test]
fn test_parentheses3() {
    let qb = CommonSqlQueryBuilder {};
    let mut sql_string = "".to_string();
    let a = SimpleExpr::Column(ColumnRef::Column(Alias::new("a").into_iden()));
    let b = SimpleExpr::Column(ColumnRef::Column(Alias::new("b").into_iden()));
    let c = SimpleExpr::Column(ColumnRef::Column(Alias::new("c").into_iden()));

    let op_x = |op,x| SimpleExpr::Unary(op, Box::new(x));
    let x_op_y = |x,op,y| SimpleExpr::Binary(Box::new(x), op, Box::new(y));
    let a_and_b = x_op_y(a, BinOper::And, b);
    let not_a_and_b = op_x(UnOper::Not, a_and_b);
    qb.prepare_simple_expr(&not_a_and_b, &mut sql_string);

    println!("Sql 3: {}", sql_string);
}

#[test]
fn test_parentheses4() {
    let qb = CommonSqlQueryBuilder {};
    let mut sql_string = "".to_string();
    let a = SimpleExpr::Column(ColumnRef::Column(Alias::new("a").into_iden()));
    let b = SimpleExpr::Column(ColumnRef::Column(Alias::new("b").into_iden()));
    let c = SimpleExpr::Column(ColumnRef::Column(Alias::new("c").into_iden()));

    let op_x = |op,x| SimpleExpr::Unary(op, Box::new(x));
    let x_op_y = |x,op,y| SimpleExpr::Binary(Box::new(x), op, Box::new(y));
    let not_a = op_x(UnOper::Not, a);
    let not_a_and_b = x_op_y(not_a, BinOper::And, b);
    qb.prepare_simple_expr(&not_a_and_b, &mut sql_string);

    println!("Sql 4: {}", sql_string);
}
Both of these yield the same result:
Sql 3: NOT "a" AND "b"
Sql 4: NOT "a" AND "b"

Well, there is no easy comprehensive solution to these problems, which must be fixed by considering how the associativity and precedence rules for parsing expressions in the respective backends.

The idea was to in binary expressions:
Drop left parenthesis if either:

  • Left binary and left.op = op and left_associative(op)
  • precedence(left) > precedence(op)

Drop right parenthesis if:

  • precedence(right) > precedence(op)

And similarly for Unary expressions:

  • Drop expr parenthesis if expr does not contain an op or precedence(expr.op) > precedence(op)

However, the respective backends supported have differing operator precedence rules:

What can be done is to order some of those pairs of operators where all backends agree, and have precedence be unknown otherwise. Adding the simple things.. like "*/%+-" greater than the comparison operators and so on.

Otherwise SQL construction should know about the backend, which it does not appear to do now, and integer precedence values consistent with the precedence table of the backend can be assigned.

Thank you for your investigation and suggestion. It seemed very doable! I was wondering, can we further simplify the logic to "we add a paren if the operators are different", without regarding precedence? It might be more verbose, but it would not be wrong. I hope it'd not break too many of our existing test cases though.

Btw, the SQL building stage is database specific, so we can potentially have different logic for each backend. Though it is preferable if we can align them.

I think I changed my mind about needing DB specific information, since for the human SQL reader, you would essentially need to know the particularities of precedence rules for your DB to decide what the expression means. Better to go safe and readable and use some parentheses.

So I did as you suggested above and added paren if operators are different by default, but I also added some cases where parens can be dropped to cover exceptional situations and some broad precedence cases - otherwise one third of the tests broke, and the SQL was quite ugly.

Had a go at a pull request, let me know what you think.
Possibly the tests are a bit misplaced, or need to be duplicated across DBs?

#675

This code causes a panic I believe:

fn test_sea_query() {
        let e = SimpleExpr::from(true).and(SimpleExpr::from(true).and(true.into()).and(true.into()));
        println!("Expr: {}",  Query::select().and_where(e).to_string(SqliteQueryBuilder));  // panics
}

Please fix this soon; this is a serious bug. We are building queries dynamically and cannot simply "switch" to using Condition::all/Condition::any. I would much prefer sea query use unnecessary parentheses and preserve soundness rather than panic on only moderately complicated nested binary operations.

Edit: Sorry, forgot to post the location of the panic. This is the stack trace:

panicked at 'called `Option::unwrap()` on a `None` value', ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:1405:63
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   3: core::option::Option<T>::unwrap
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/option.rs:950:21
   4: sea_query::backend::query_builder::QueryBuilder::binary_expr
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:1405:43
   5: sea_query::backend::query_builder::QueryBuilder::prepare_simple_expr_common
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:312:22
   6: sea_query::backend::query_builder::QueryBuilder::prepare_simple_expr
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:285:9
   7: sea_query::backend::query_builder::QueryBuilder::prepare_condition_where::{{closure}}
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:1304:21
   8: core::iter::traits::iterator::Iterator::fold
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/iter/traits/iterator.rs:2482:21
   9: sea_query::backend::query_builder::QueryBuilder::prepare_condition_where
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:1283:9
  10: sea_query::backend::query_builder::QueryBuilder::prepare_condition
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:1260:17
  11: sea_query::backend::query_builder::QueryBuilder::prepare_select_statement
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:127:9
  12: <sea_query::query::select::SelectStatement as sea_query::query::traits::QueryStatementBuilder>::build_collect_any_into
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/query/

And the relevant lines are:

        let right_paren = (right.need_parentheses()
            || right.is_binary() && op != left.get_bin_oper().unwrap())
        ---------------------------------------------------   ^ unwrap called on None
            && !no_right_paren
            && !no_paren;

@prakol16 can you post the location and reason for the panic?

@prakol16 can you post the location and reason for the panic?

I edited my original comment.

This code causes a panic I believe:

fn test_sea_query() {
        let e = SimpleExpr::from(true).and(SimpleExpr::from(true).and(true.into()).and(true.into()));
        println!("Expr: {}",  Query::select().and_where(e).to_string(SqliteQueryBuilder));  // panics
}

Please fix this soon; this is a serious bug. We are building queries dynamically and cannot simply "switch" to using Condition::all/Condition::any. I would much prefer sea query use unnecessary parentheses and preserve soundness rather than panic on only moderately complicated nested binary operations.

Edit: Sorry, forgot to post the location of the panic. This is the stack trace:

panicked at 'called `Option::unwrap()` on a `None` value', ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:1405:63
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:117:5
   3: core::option::Option<T>::unwrap
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/option.rs:950:21
   4: sea_query::backend::query_builder::QueryBuilder::binary_expr
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:1405:43
   5: sea_query::backend::query_builder::QueryBuilder::prepare_simple_expr_common
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:312:22
   6: sea_query::backend::query_builder::QueryBuilder::prepare_simple_expr
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:285:9
   7: sea_query::backend::query_builder::QueryBuilder::prepare_condition_where::{{closure}}
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:1304:21
   8: core::iter::traits::iterator::Iterator::fold
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/iter/traits/iterator.rs:2482:21
   9: sea_query::backend::query_builder::QueryBuilder::prepare_condition_where
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:1283:9
  10: sea_query::backend::query_builder::QueryBuilder::prepare_condition
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:1260:17
  11: sea_query::backend::query_builder::QueryBuilder::prepare_select_statement
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/backend/query_builder.rs:127:9
  12: <sea_query::query::select::SelectStatement as sea_query::query::traits::QueryStatementBuilder>::build_collect_any_into
             at ~/.cargo/registry/src/index.crates.io-6f17d22bba15001f/sea-query-0.30.0/src/query/

And the relevant lines are:

        let right_paren = (right.need_parentheses()
            || right.is_binary() && op != left.get_bin_oper().unwrap())
        ---------------------------------------------------   ^ unwrap called on None
            && !no_right_paren
            && !no_paren;

Edit:I think this is fixed as a consequence of by bugfix in #675 (edited with correct PR)
Are you able to provide a test case?

Issue #657 seems to be about derive_attr which seems unrelated to this issue. Did you mean #675 ?

I agree #675 likely solves this issue. I just wanted to point out that this isn't just a soundness issue -- it can cause crashes/panics as well. I think this puts a little pressure to get the PR merged and released sooner.

Test case:

#[test]
fn test_issue_674_nested_logical_panic() {
    let e = SimpleExpr::from(true).and(SimpleExpr::from(true).and(true.into()).and(true.into()));

    assert_eq!(
        Query::select()
            .columns([Char::Character])
            .from(Char::Table)
            .and_where(e)
            .to_string(PostgresQueryBuilder),
        r#"SELECT "character" FROM "character" WHERE TRUE AND (TRUE AND TRUE AND TRUE)"#
    );
}

I believe this is correct. Another correct option (with more parentheses) includes r#"SELECT "character" FROM "character" WHERE TRUE AND ((TRUE AND TRUE) AND TRUE)"#. I might have done the "manual query building" here wrong though -- the important thing is that it does not panic.

Issue #657 seems to be about derive_attr which seems unrelated to this issue. Did you mean #675 ?

I agree #675 likely solves this issue. I just wanted to point out that this isn't just a soundness issue -- it can cause crashes/panics as well. I think this puts a little pressure to get the PR merged and released sooner.

Test case:

#[test]
fn test_issue_674_nested_logical_panic() {
    let e = SimpleExpr::from(true).and(SimpleExpr::from(true).and(true.into()).and(true.into()));

    assert_eq!(
        Query::select()
            .columns([Char::Character])
            .from(Char::Table)
            .and_where(e)
            .to_string(PostgresQueryBuilder),
        r#"SELECT "character" FROM "character" WHERE TRUE AND (TRUE AND TRUE AND TRUE)"#
    );
}

I believe this is correct. Another correct option (with more parentheses) includes r#"SELECT "character" FROM "character" WHERE TRUE AND ((TRUE AND TRUE) AND TRUE)"#. I might have done the "manual query building" here wrong though -- the important thing is that it does not panic.

Thanks for the test case! Put it into the PR. Strictly we can drop the parentheses around the right hand expression here.
This requires rules on the mathematical associativity of AND. Let's leave that for another time :-)

Can we get a release for this @tyt2y3 I think we hit a bug in our codebase due to that too...

Sure. I am planning to get on it tmr.

Backported the fixes to 0.30.1

@tyt2y3 Could we have a "give me all the parentheses" mode (either feature flag or runtime).
Not that that I don't trust the implicit priority of operations but I don't hahaha (is it even standard between databases?), so I would sleep better if I could force it to add parentheses.

Yeah, that's fair. I think it is better to be a runtime option instead of a feature flag.
I remember serde_json having a arbitrary_precision feature flag which gives different runtime behaviour. Which from my experience was pretty difficult to manage.
I am not sure whether using lazy_static for global options would be a better idea though.