github/codeql-cli-binaries

`No need to rerun` although query changed (`codeql database analyze`)

RasmusWL opened this issue · 5 comments

If you alter a query after you have run `codeql database analyze` with that query, any subsequent calls to `codeql database analyze` will not run the new query, but will reuse the results from the old, outdated version.

This seems to have been a problem for some users: github/codeql#5084

Steps to reproduce

  1. mkdir src && echo 'print(42)' > src/wat.py
  2. codeql database create db --language python --source-root src/
  3. Add dummy query and qlpack:
    qlpack.yml:
name: codeql-python-no-rerun-example
version: 0.0.0
libraryPathDependencies: codeql-python

wat.ql:

/**
 * @kind problem
 * @id py/example-of-no-rerun
 * @name Calls
 * @description Finds any calls
 * @problem.severity error
 * @tags call
 */

import python

from CallNode call
// where none()
select call, "a call"
  4. Run the query: `codeql database analyze db wat.ql --format=csv --output=out.csv && cat out.csv` and notice that output contains the one call.
  5. Alter the query, by uncommenting the `where none()` part
  6. Run the query: `codeql database analyze db wat.ql --format=csv --output=out.csv && cat out.csv` and notice that output still contains the one call 😱
    Running queries.
    [1/1] No need to rerun /home/rasmus/tmp/wat/wat.ql.
    Shutting down query evaluator.
    Interpreting results.
    "Calls","Finds any calls","error","a call","/wat.py","1","1","1","9"
    
  7. Run the query, forcing a rerun: `codeql database analyze --rerun db wat.ql --format=csv --output=out.csv && cat out.csv` and notice that output is now empty (as it should be)
    Running queries.
    Compiling query plan for /home/rasmus/tmp/wat/wat.ql.
    [1/1] Found in cache: /home/rasmus/tmp/wat/wat.ql.
    Starting evaluation of codeql-python-no-rerun-example/wat.ql.
    [1/1 eval 43ms] Evaluation done; writing results to codeql-python-no-rerun-example/wat.bqrs.
    Shutting down query evaluator.
    Interpreting results.
    

Additional info

I'm using codeql cli version 2.5.5+202105241554plus. (locally built)

Supplementary info:
The same can be reproduced with codeql cli version 2.5.0.

This is by design, though perhaps not the most user-friendly design in hindsight.

The result of `codeql database run-queries` (which is the first half of `codeql database analyze`) stores the output of queries in a format that doesn't remember what the actual text of the QL source of each query was; it just identifies the result by filename within the database's results directory.

At the time we designed the CLI, we imagined that the primary reason anyone would run `codeql database run-queries` multiple times on a single database was that the first run had timed out, run out of RAM, or otherwise died. The next run would then be an attempt to see whether leaving out the queries that did succeed the first time around would allow the rest to squeeze through too. Therefore the default behavior of `run-queries` is to skip queries that already appear to have results ready.

This behavior can be changed by giving the --rerun option, in which case all queries will be evaluated afresh even if they already have results.

You can make this behavior the default by adding a line reading

database analyze --rerun

to your ~/.config/codeql/config.
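For anyone who wants to script that, here is a minimal sketch that appends the line only if it is not already present. The function name `enable_rerun_default` is made up for this example; the file path and the line itself are the ones given above.

```shell
# Sketch only: add "database analyze --rerun" to the CodeQL config file,
# without duplicating the line if it is already there.
# enable_rerun_default is a hypothetical helper name, not part of the CLI.
enable_rerun_default() {
  # Default path as mentioned in the comment above; pass another path to override.
  config_file="${1:-$HOME/.config/codeql/config}"
  line='database analyze --rerun'

  mkdir -p "$(dirname "$config_file")"
  touch "$config_file"

  # -x: whole-line match, -F: fixed string, -q: no output
  grep -qxF -e "$line" "$config_file" || printf '%s\n' "$line" >> "$config_file"
}

# Usage:
#   enable_rerun_default
```

Calling it twice leaves a single copy of the line in the file.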

Thanks for providing that extra insight. Avoiding rerunning all queries does seem very reasonable 👍

I guess there is no easy way to determine whether a query has changed, since that is not just the query text, but also all transitive imports it depends on. So I'm very understanding about the fact that there is no easy solution for "fixing" this. (so if you want to close this as "wont-fix", that's totally fine by me)
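As a rough local workaround (not something the CLI does), one could fingerprint the query sources and pass `--rerun` only when the fingerprint changes. The sketch below assumes GNU coreutils/findutils (`sha256sum`, `sort -z`, `xargs -r`); `pack_fingerprint`, `rerun_flag`, and the stamp file are names invented for this example. It only hashes `.ql`, `.qll`, and `qlpack.yml` files under the given directory, so changes in library packs that live elsewhere (such as `codeql-python`) would go unnoticed.

```shell
# Sketch only: hash every QL source file under a directory.
# Files are sorted so the result is stable between runs.
pack_fingerprint() {
  find "$1" -type f \( -name '*.ql' -o -name '*.qll' -o -name 'qlpack.yml' \) -print0 \
    | sort -z \
    | xargs -0 -r sha256sum \
    | sha256sum \
    | cut -d' ' -f1
}

# Print "--rerun" if the fingerprint differs from the one saved in the
# stamp file (or no stamp exists yet), then save the new fingerprint.
rerun_flag() {
  dir="$1"
  stamp="$2"
  new="$(pack_fingerprint "$dir")"
  old="$(cat "$stamp" 2>/dev/null || true)"
  if [ "$new" != "$old" ]; then
    printf '%s\n' "$new" > "$stamp"
    echo "--rerun"
  fi
}

# Usage (substitution left unquoted so an empty result adds no argument):
#   codeql database analyze $(rerun_flag . .query-fingerprint) db wat.ql --format=csv --output=out.csv
```

Note that the stamp is updated before the analysis runs, so a failed run would need `--rerun` passed by hand the next time.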

@hmakholm, @RasmusWL,

Would you mind explaining if and how `--rerun` can be used via the GitHub Action?

@carlspring this was a problem with local development. Please open a new issue instead 👍