/codeql-dataflow-ii-java

Primary LanguageCodeQLApache License 2.0Apache-2.0

./images/under-construction.png

SQL injection example

This is a more advanced version of https://github.com/advanced-security/codeql-workshops-staging/tree/master/java/codeql-dataflow-sql-injection

Initial results should show a false positive, and false negative (missing path). To that end:

  • false negative: db write using struct => dataflow won’t work, taint-tracking should. Debug using partial paths. Add some extra library function so that additional taint steps are needed to propagate.
  • false positive: second write using safe sql command (pre-compiled), initially shown as false positive (will be sanitizer). Add a checking-function in this path also (integer from string?). This one will become a barrier function.

Setup and sample run

The jdbc connector at https://github.com/xerial/sqlite-jdbc, from here is included in the git repository.

# Use a simple headline prompt 
PS1='
\033[32m---- SQL injection demo ----\[\033[33m\033[0m\]
$?:$ '


# Build
./build.sh

# Prepare db
./admin -r
./admin -c
./admin -s 

# Add regular user interactively
./add-user 2>> users.log
First User

# Check
./admin -s

# Add Johnny Droptable 
./add-user 2>> users.log
Johnny'); DROP TABLE users; --

# And the problem:
./admin -s

# Check the log
tail users.log

Identify the problem

./add-user is reading from STDIN, and writing to a database; looking at the code in ./AddUser.java leads to

System.console().readLine();

for the read and

conn.createStatement().executeUpdate(query);

for the write.

This problem is thus a dataflow problem; in codeql terminology we have

  • a source at the System.console().readLine();
  • a sink at the conn.createStatement().executeUpdate(query);

We write codeql to identify these two, and then connect them via

  • a dataflow configuration – for this problem, the more general taintflow configuration.

Build codeql database

To get started, build the codeql database (adjust paths to your setup):

# Build the db with source commit id.
cd ~/local/codeql-dataflow-ii-java
SRCDIR=$(pwd)
DB=$SRCDIR/java-sqli-$(cd $SRCDIR && git rev-parse --short HEAD)

echo $DB
test -d "$DB" && rm -fR "$DB"
mkdir -p "$DB"

cd $SRCDIR && codeql database create --language=java -s . -j 8 -v $DB --command='./build.sh'

# Check for AddUser in the db
unzip -v $DB/src.zip | grep AddUser

Then add this database directory to your VS Code DATABASES tab.

Build codeql database in steps

For larger projects, using a single command to build everything is costly when any part of the build fails.

To build a database in steps, use the following sequence, adjusting paths to your setup:

# Build the db with source commit id.
export PATH=$HOME/local/vmsync/codeql250:"$PATH"
SRCDIR=$HOME/local/codeql-training-material.java-sqli/java/codeql-dataflow-sql-injection
DB=$SRCDIR/java-sqli-$(cd $SRCDIR && git rev-parse --short HEAD)

# Check paths
echo $DB
echo $SRCDIR

# Prepare db directory
test -d "$DB" && rm -fR "$DB"
mkdir -p "$DB"

# Run the build
cd $SRCDIR
codeql database init --language=java -s . -v $DB
# Repeat trace-command as needed to cover all targets
codeql database trace-command -v $DB -- make 
codeql database finalize -j4 $DB

Then add this database directory to your VS Code DATABASES tab.

Develop the query bottom-up

  1. Identify the source part of the
    System.console().readLine();
        

    expression, the buf argument. Start from a from..where..select, then convert to a predicate.

  2. Identify the sink part of the
    conn.createStatement().executeUpdate(query);
        

    expression, the query argument. Again start from from..where..select, then convert to a predicate.

  3. Fill in the taintflow configuration boilerplate
    class SqliFlowConfig extends TaintTracking::Configuration {
        SqliFlowConfig() { this = "SqliFlow" }
    
        override predicate isSource(DataFlow::Node node) {
            none()
                }
    
        override predicate isSink(DataFlow::Node node) {
            none()
                }
    }
        

The final query (without isAdditionalTaintStep) is

/**
 * @name SQLI Vulnerability
 * @description Using untrusted strings in a sql query allows sql injection attacks.
 * @kind path-problem
 * @id java/SQLIVulnerable
 * @problem.severity warning
 */

import java
import semmle.code.java.dataflow.TaintTracking
import DataFlow::PathGraph

class SqliFlowConfig extends TaintTracking::Configuration {
    SqliFlowConfig() { this = "SqliFlow" }

    override predicate isSource(DataFlow::Node source) {
       // System.console().readLine();
       exists(Call read |
           read.getCallee().getName() = "readLine" and
           read = source.asExpr()
       )
   }

    override predicate isSink(DataFlow::Node sink) {
       // conn.createStatement().executeUpdate(query);
       exists(Call exec |
           exec.getCallee().getName() = "executeUpdate" and
           exec.getArgument(0) = sink.asExpr()
       )
   }
}

from SqliFlowConfig conf, DataFlow::PathNode source, DataFlow::PathNode sink
where conf.hasFlowPath(source, sink)
select sink, source, sink, "Possible SQL injection"

misc

codeql  resolve library-path --query=SqlInjection.ql 

The data flow and taint tracking libraries have been extended with versions of `isBarrierIn`, `isBarrierOut`, and `isBarrierGuard`, respectively `isSanitizerIn`, `isSanitizerOut`, and `isSanitizerGuard`, that support flow states.

module BarrierGuard<guardChecksSig/3 guardChecks> {
  /** Gets a node that is safely guarded by the given guard check. */
  Node getABarrierNode() {
    exists(Guard g, SsaVariable v, boolean branch, RValue use |
      guardChecks(g, v.getAUse(), branch) and
      use = v.getAUse() and
      g.controls(use.getBasicBlock(), branch) and
      result.asExpr() = use
    )
  }
}

/**
 * DEPRECATED: Use `BarrierGuard` module instead.
 *
 * A guard that validates some expression.
 *
 * To use this in a configuration, extend the class and provide a
 * characteristic predicate precisely specifying the guard, and override
 * `checks` to specify what is being validated and in which branch.
 *
 * It is important that all extending classes in scope are disjoint.
 */
deprecated class BarrierGuard extends Guard {
  /** Holds if this guard validates `e` upon evaluating to `branch`. */
  abstract predicate checks(Expr e, boolean branch);

  /** Gets a node guarded by this guard. */
  final Node getAGuardedNode() {
    exists(SsaVariable v, boolean branch, RValue use |
      this.checks(v.getAUse(), branch) and
      use = v.getAUse() and
      this.controls(use.getBasicBlock(), branch) and
      result.asExpr() = use
    )
  }
}

override predicate isSanitizer(DataFlow::Node sanitizer) {
  exists(Call noSemi |
    noSemi.getCallee().getName() = "no_semi" 
    and sanitizer.asExpr() = noSemi
  )
}

sanitizer.asExpr() = noSemi does not find no_semi or writeNicknames or write_info. It finds writeNicknames1 though.

The partial flow no longer exists?

from EntityNameMatchCfg cfg, DataFlow::PartialPathNode src, DataFlow::PartialPathNode sink
where cfg.hasPartialFlow(src, sink, 25)
select src, sink

https://codeql.github.com/docs/writing-codeql-queries/debugging-data-flow-queries-using-partial-flow/

String line = System.console().readLine();
if (line.isEmpty()) break;
nicknames.addNickname(line);

line as source only flows to nicknames;

nicknames.addNickname() has to propagate to nicknames.getNicknames()