This workshop is based on the workshop codeql-js-goof-workshop.
- Install Visual Studio Code.
- Install the CodeQL extension for Visual Studio Code.
- Install the latest CodeQL CLI and make it available on your PATH.
- Clone this repository recursively to ensure the submodule is cloned:
  git clone --recursive https://github.com/rvermeulen/codeql-workshop-javascript-prototype-pollution
- Install the CodeQL pack dependencies using the command CodeQL: Install Pack Dependencies and select exercises and solutions.
- Build the database nodejs-goof-db.zip by running the command make or by manually executing the commands associated with the nodejs-goof-db.zip Make target. Alternatively, you can download a pre-built database.
- Select the database using the command CodeQL: Choose Database from Archive and pass the path to the database.
This workshop is an introduction to QL for JavaScript and will cover:
- How to build a database for a JavaScript project.
- How QL represents JavaScript source code.
- How to describe JavaScript program elements in QL.
- How to use QL classes to create reusable patterns.
- How to identify API calls to external functions using the API graph.
- How to use global data-flow to find flows from untrusted data to security sensitive operations.
In this workshop we will introduce QL for JavaScript by finding a JavaScript prototype pollution vulnerability in a deliberately vulnerable NodeJS application named Goof by Snyk Labs.
Prototype pollution is a type of vulnerability in which an attacker is able to modify Object.prototype.
This can happen when recursively merging a user-controlled object with another object, allowing an attacker to modify the built-in Object prototype.
Once that is done, every object inheriting from Object.prototype sees the injected property, and later requests can abuse it to obtain privileges they should not have.
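To make the mechanism concrete, here is a minimal JavaScript sketch. The naiveMerge helper is hypothetical and written only for illustration; it is neither the Lodash implementation of merge nor code from Goof. It assumes a merge that recurses into nested objects without filtering the __proto__ key.

```javascript
// Hypothetical recursive merge, written only for illustration.
// It copies every key, including "__proto__", which is the root of the problem.
function naiveMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    const bothObjects =
      value !== null && typeof value === "object" &&
      target[key] !== null && typeof target[key] === "object";
    if (bothObjects) {
      // For the key "__proto__", target[key] is Object.prototype itself.
      naiveMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// JSON.parse creates an own property literally named "__proto__".
const body = JSON.parse('{"text": "hi", "__proto__": {"canDelete": true}}');
const message = naiveMerge({}, body);

// An unrelated object, created separately, now inherits the injected property.
const otherUser = { name: "user" };
const polluted = otherUser.canDelete === true;
console.log(polluted);

// Undo the pollution so the rest of the process is unaffected.
delete Object.prototype.canDelete;
```

The unrelated otherUser object never had canDelete assigned to it, which is why a later permission check that reads such a property can be fooled.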
The Goof application includes an example exploit for this vulnerability.
The exploit abuses the merge of a user-controlled object in the Chat add handler (line 334) to gain the same privileges as an admin user.
curl --request PUT \
--url "$GOOF_HOST/chat" \
--header 'content-type: application/json' \
--data '{"auth": {"name": "user", "password": "pwd"}, "message": { "text": "😈", "__proto__": {"canDelete": true}}}'
The corresponding vulnerable merge call from the Chat add handler:
...
_.merge(message, req.body.message, {
id: lastId++,
timestamp: Date.now(),
userName: user.name,
});
...
The key pieces of information for writing the query are:
- The request is processed by the Chat add handler.
- Information from the request object containing user-controlled information is provided to the merge call.
Using the exercises we will incrementally build a final query to find the prototype pollution vulnerability.
In the first part we will identify the entry point that processes user-provided data.
Then we will identify the security-sensitive merge call that can be abused by an attacker.
Finally, we will use global dataflow to connect the two by creating a configuration that determines if user-controlled data can reach the merge call.
We will start with reasoning about the abstract syntax tree (AST) of our JavaScript program to identify all functions.
Implement Exercises1.ql such that it finds all functions in the program.
Hints
- Use the autocompletion function to determine a useful QL class to describe functions.
- Alternatively, use the VS Code file explorer to open a file from the selected database and use the command CodeQL: View AST to view the AST of the file to determine if there are useful QL classes to solve the question.
A solution can be found in the query Exercise1.ql.
Finding all the functions in the program is a good starting point.
However, we are interested in the function representing the add handler.
Implement Exercises2.ql such that it finds all functions in the program with the name add.
Hints
- The class Function has a member predicate named getName that returns the name of the function.
- Use the formula = to compare the name to "add".
A solution can be found in the query Exercise2.ql.
The results of the query that finds all functions named add include a lot of unrelated functions.
To exclude the unrelated functions we need to add more constraints to the pattern described by our query.
Which characteristics of our target function can be used to distinguish it? The signature, that is the parameters, might provide a solution.
Implement Exercises3.ql such that it finds our Chat handler add.
Hints
- The class Function has a member predicate named getNumParameter that returns the number of parameters.
- The class Function has a member predicate named getParameter that returns a Parameter given an index.
- The class Parameter describes the formal parameters of a function and has a member predicate getName that returns its name.
A solution can be found in the query Exercise3.ql.
We have now described our add handler precisely enough to find it in the JavaScript program.
Recall that predicates and classes allow you to encapsulate logical conditions in a reusable format.
Convert your solution to Exercise3.ql into a class in Exercises4.ql by replacing the none() formula in the characteristic predicate of the AddChatHandler class.
A solution can be found in the query Exercise4.ql.
Now that we have found our add handler, we are going to look for the call to the merge function.
Implement Exercises5.ql such that it finds all method calls in the program.
Hints
- Use the autocompletion function to determine a useful QL class to describe method calls.
- Alternatively, use the VS Code file explorer to open a file from the selected database and use the command CodeQL: View AST to view the AST of the file to determine if there are useful QL classes to solve the question.
A solution can be found in the query Exercise5.ql.
The next step is to restrict the method calls to those that call the method merge.
Implement Exercises6.ql such that it finds all method calls to the method merge in the program.
Hints
- The class MethodCallExpr has a member predicate getCalleeName that returns the name of the called method.
- Use the formula = to compare the method name to "merge".
A solution can be found in the query Exercise6.ql.
Unlike the search for functions with the name add, our solution returns only a single result because this JavaScript program only has a single call to a merge method.
However, this is unlikely to be the case in real-world applications.
To improve the precision of our query we can look at the qualifier _ of the method call.
Looking at the definition of _ we can see that it refers to the module lodash.
var _ = require('lodash');
We will start by identifying the import of Lodash.
Implement Exercises7.ql such that it finds all the imports of the module named "lodash".
Hints
- Imported modules are represented by the class ModuleImportNode, part of the DataFlow module.
- The DataFlow module provides the predicate moduleImport to reason about module imports by name.
A solution can be found in the query Exercise7.ql.
Having found the import of the Lodash module, we need to identify calls to its member merge.
Implement Exercises8.ql such that it finds all calls to member merge of the "lodash" module.
Hints
- The class ModuleImportNode, part of the DataFlow module, has a member predicate getAMemberCall to reason about member calls by name.
A solution can be found in the query Exercise8.ql.
While the previous query is sufficiently precise in this case to find the member call to merge in the Lodash module, the question provides an opportunity to look at API graphs.
API graphs are a uniform interface for referring to functions, classes, and methods defined in external libraries. They were first added to the CodeQL standard library for Python (hence the link to the Python documentation).
The most common entry point into the API graph is the importing of an external package or module.
Using the predicate moduleImport, part of the API module, we can find those entry points.
For every node in the API graph whose name can be statically inferred, we can reason about its members using the getMember predicate.
Complete Exercises9.ql by implementing the predicate lodash, representing the Lodash module, and the class LodashMergeCall, representing calls to its member merge, using the API graph implemented by the API module.
Hints
- The Node class, part of the API module, has the member predicate getACall to reason about calls to the member.
A solution can be found in the query Exercise9.ql.
With the merge calls identified we can start focussing on the vulnerability.
A user-provided value that is passed to the merge call can be exploited by an attacker.
Reuse the identification of the merge call from Exercises9.ql and identify the arguments to the merge call in Exercises10.ql.
Hints
- The CallNode class, part of the API module, has the member predicate getAnArgument to reason about arguments passed to the call.
A solution can be found in the query Exercise10.ql.
Having identified both the entry point and the security-sensitive operation, we can move on to reasoning about dataflow. We have in fact already used parts of the available dataflow analysis: because JavaScript is a dynamic language, reasoning about imported modules and using the API graph both rely on it.
The dataflow graph is built on top of the AST, but contains more detailed semantic information about the flow of information through the program. This allows us to determine where user-controlled data is used and whether that use poses a security risk.
In this exercise we are going to make use of global dataflow analysis. Global dataflow analysis can track the use of values across function and method boundaries. This analysis is computationally expensive, and to keep it tractable we have to restrict it to the parts of the program that are relevant. This is done using a configuration pattern in which we extend a dataflow or taint-tracking configuration and provide predicates to configure the analysis.
For this workshop it suffices to introduce the concepts source and sink.
The global dataflow analysis is configured by specifying the sources, the starting points of the analysis, that need to be considered, and the sinks, the program elements where the analysis stops and is considered complete.
With the sources and sinks defined, the global dataflow analysis will try to determine if there is a sink that is reachable from a source.
In other words, does there exist a path from a source to a sink?
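The source-to-sink question can be pictured as plain graph reachability. The following JavaScript sketch is a toy model only: the flowEdges graph, its node names, and the findFlow helper are all invented for illustration, and the real CodeQL analysis handles far more (for example flow through calls and properties) than a simple search.

```javascript
// Toy flow graph: each key flows into the nodes listed in its array.
// All node names are invented for this example.
const flowEdges = {
  "req": ["req.body"],
  "req.body": ["req.body.message"],
  "req.body.message": ["merge argument"],
  "config": ["logger argument"],
};

// Breadth-first search for a path from any source to any sink.
// Returns the path as an array of node names, or null if no sink is reachable.
function findFlow(edges, sources, sinks) {
  const queue = sources.map((s) => [s]);
  const visited = new Set(sources);
  while (queue.length > 0) {
    const path = queue.shift();
    const node = path[path.length - 1];
    if (sinks.has(node)) return path;
    for (const next of edges[node] || []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push([...path, next]);
      }
    }
  }
  return null;
}

const sinks = new Set(["merge argument"]);
const flowPath = findFlow(flowEdges, ["req"], sinks);   // source reaches the sink
const noFlow = findFlow(flowEdges, ["config"], sinks);  // unrelated source does not
console.log(flowPath.join(" -> "));
```

Restricting the search to declared sources and sinks is what keeps the work bounded, which mirrors why the configuration asks for both.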
Complete Exercise11.ql by copying your solution from Exercises4.ql and implementing the getRequestParameter predicate.
Use quick evaluation on the isSource member predicate of the PrototypePollutionConfiguration to validate that it finds the correct request parameter.
Hints
- The CallNode class, part of the API module, has the member predicate getAnArgument to reason about arguments passed to the call.
A solution can be found in the query Exercise11.ql.
The last step is to specify the sink of the global dataflow configuration.
Reuse your solution for Exercises10.ql and complete Exercises10.ql by implementing the class LodashMergeSink.
Running the query should provide a dataflow path from the req parameter to an argument of the merge call.
A solution can be found in the query Exercise11.ql.