End to end natural language user story verification for web3 apps

StoryCheck

StoryCheck verifies user stories for Web3 apps built on Ethereum. It is an experimental app testing playground and also exposes an API served via Gradio on port 7860.

It takes markdown-formatted user stories as input, with steps written in natural language. It then parses the text and executes the steps in a virtual web browser (via Playwright), closely emulating the actions of a real user. It uses RefExp GPT to predict UI element coordinates from a referring expression.

Note: StoryCheck is currently most reliable for testing the UI of smartphones and tablets.

Example User Story Input

# Creating a new DAO LLC via SporosDAO.xyz on Goerli Test Net

## Prerequisites

- Chain
  - Id 5
  - Block 8856964

## User Steps

1. Browse to https://app.sporosdao.xyz/
1. Click on Create a new company button
1. Click on Go! right arrow button left of Available
1. Select On-chain name text field
1. Type Test DAO
1. Select Token Symbol text field
1. Type TDO
1. Click on Continue button in the bottom right corner
1. Select Address text field above Enter the wallet address
1. Type 0x5389199D5168174FA177908685FbD52A7138Ed1a
1. Select text field below Initial Tokens
1. Type 1200
1. Select text field under Email
1. Type test@email.com
1. Scroll down
1. Click on Continue button
1. Click on Continue button at the top
1. Scroll up
1. Click on the checkbox left of Agree
1. Scroll down
1. Click on Continue button
1. Scroll up
1. Click on Deploy Now button
1. Press Tab
1. Press Tab
1. Press Enter
1. Press Home

## Expected Results

- Wallet transactions match snapshot

Prerequisites Section

The prerequisites section sets conditions that let the test start from a deterministic blockchain state, which in turn allows for predictable results. The only prerequisite currently supported is Chain at the top level, with Id as a required parameter and Block and RPC as optional parameters. These parameters are passed to anvil to create a local EVM fork for the test run.

Default Prerequisites

By default each test starts with 10,000 ETH in the mock user wallet (same as anvil default test accounts).

In order to fund the mock wallet with other tokens (e.g. USDC, DAI, NFTs), the User Steps section of the story file should begin with prompts that initiate the funding via front end interactions (e.g. Uniswap flow for ETH/USDC swap).

Custom RPC

Often Web3 Apps use front end libraries such as wagmi.sh to access current chain state. When that is the case, the user story should include the exact RPC URL used by the front end as a prerequisite. That allows StoryCheck to intercept all calls directed to the RPC and reroute towards the local blockchain fork. This is important to ensure that the app reads and writes from/to the local chain fork.
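As a rough illustration of the rerouting idea, the sketch below builds a Playwright route handler that answers intercepted JSON-RPC requests from the local fork. It is not StoryCheck's actual implementation. The names `LOCAL_FORK_URL`, `post_json` and `make_rpc_rerouter` are hypothetical, and the address assumes anvil is listening on its default port.

```python
import urllib.request
from typing import Callable

# Assumption: anvil listens on its default address.
LOCAL_FORK_URL = "http://127.0.0.1:8545"


def post_json(url: str, payload: bytes) -> bytes:
    """Forward a raw JSON-RPC payload to `url` and return the raw response body."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def make_rpc_rerouter(forward: Callable[[str, bytes], bytes] = post_json):
    """Build a route handler that answers RPC calls from the local fork."""

    def handle(route, request):
        # Send the intercepted JSON-RPC body to the local fork
        # instead of the remote RPC the front end asked for.
        body = forward(LOCAL_FORK_URL, request.post_data_buffer or b"")
        route.fulfill(status=200, content_type="application/json", body=body)

    return handle
```

With a Playwright sync `page`, the handler could be registered as `page.route(rpc_url, make_rpc_rerouter())`, where `rpc_url` is the RPC URL from the Prerequisites section. The handler fulfills the request itself rather than rewriting the URL, which sidesteps the protocol change from https to http.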

Example 1. Ethereum Mainnet test

The following example sets up a local fork of ETH Mainnet starting from the latest block using a default RPC.

## Prerequisites

- Chain
  - Id 1

Example 2. Goerli test with specific block and RPC

The following example sets up a local fork of Goerli Testnet starting from the given block number and using a given RPC URL.

## Prerequisites

- Chain
  - Id 5
  - Block 8856964
  - RPC https://eth-goerli.g.alchemy.com/v2/3HpUm27w8PfGlJzZa4jxnxSYs9vQNMMM
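To make the mapping from prerequisites to an anvil fork more concrete, here is a minimal sketch that reads the Chain parameters from a story and builds an anvil command line. The function names and the `default_rpc` parameter are hypothetical, and StoryCheck may invoke anvil differently. The only anvil options used are `--fork-url`, `--chain-id` and `--fork-block-number`.

```python
import re
from typing import Dict, List, Optional


def parse_chain_prerequisites(markdown: str) -> Dict[str, str]:
    """Collect the indented `- Key value` items under the top-level `- Chain` item."""
    params: Dict[str, str] = {}
    in_chain = False
    for line in markdown.splitlines():
        if re.match(r"^- ", line):
            # A top-level bullet either opens or closes the Chain block.
            in_chain = line.strip() == "- Chain"
            continue
        match = re.match(r"^\s+- (\w+)\s+(\S+)", line)
        if in_chain and match:
            params[match.group(1)] = match.group(2)
    return params


def anvil_command(params: Dict[str, str], default_rpc: Optional[str] = None) -> List[str]:
    """Translate Chain parameters into an anvil fork command line."""
    if "Id" not in params:
        raise ValueError("Chain prerequisite requires an Id")
    rpc = params.get("RPC", default_rpc)
    if rpc is None:
        raise ValueError("no RPC given and no default RPC configured")
    cmd = ["anvil", "--fork-url", rpc, "--chain-id", params["Id"]]
    if "Block" in params:
        # Pin the fork to a block so every run starts from the same state.
        cmd += ["--fork-block-number", params["Block"]]
    return cmd
```

Omitting Block leaves `--fork-block-number` out, so the fork starts from the latest block, as in Example 1.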

User Story Section

The format of user steps in this section resembles the HOWTO documentation of a web3 app. Teams may use the same markdown in their documentation (e.g. GitBook, Notion, Docusaurus) and execute it with StoryCheck to make sure that the latest web app behavior is in sync with the docs.

User Story Prompts

Each step in a user story is classified as an action prompt from the following set:

  • Browse - prompts that start with browse and include a URL link to a web page are interpreted as browser navigation actions. For example browse to https://app.uniswap.org. For implementation details, see Playwright goto.
  • Click - prompts that start with click, tap, or select followed by a natural language referring expression of a UI element are interpreted as click actions with the corresponding UI element target. For example click on Submit button at the bottom or select logo next to ETH option. For implementation details, see Playwright mouse click and RefExp GPT.
  • Type - prompts that start with the keyword type, input or enter (case insensitive) followed by a string are interpreted as a keyboard input action. For example Type 1000 or Type MyNewDAO. For implementation details, see Playwright type.
  • Scroll - prompts that start with scroll followed by up or down are interpreted as Press PageUp and Press PageDown, respectively.
  • Press - prompts that start with press followed by a keyboard key code (F1 - F12, Digit0 - Digit9, KeyA - KeyZ, Backquote, Minus, Equal, Backslash, Backspace, Tab, Delete, Escape, ArrowDown, End, Enter, Home, Insert, PageDown, PageUp, ArrowRight, ArrowUp) are interpreted as a single key press action. For further details, see Playwright press.
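The keyword rules in the list above can be sketched as a small lookup on the first word of each step. This is only an illustration of the classification described here. `KEYWORDS` and `classify_step` are hypothetical names, and the real classifier may be more tolerant of phrasing.

```python
from typing import Tuple

# Leading keyword -> action label, following the list above.
KEYWORDS = {
    "browse": "browse",
    "click": "click", "tap": "click", "select": "click",
    "type": "type", "input": "type", "enter": "type",
    "scroll": "scroll",
    "press": "press",
}


def classify_step(step: str) -> Tuple[str, str]:
    """Return (action, argument) for one user step, matched on its first word."""
    first, _, rest = step.strip().partition(" ")
    action = KEYWORDS.get(first.lower())  # keywords are case insensitive
    if action is None:
        raise ValueError(f"unrecognized step: {step!r}")
    return action, rest.strip()
```

For instance, `classify_step("Select On-chain name text field")` yields a click action, while `classify_step("Press Enter")` yields a press action, because only the leading word decides the class.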

Expected Results Section

The Expected Results section currently implements a default transaction snapshot check, similar to Jest snapshot matching. The first time a test is run, all write transactions going through window.ethereum are recorded and saved. Subsequent runs must match these write transactions. If there is a mismatch, then one of three changes took place in the UI under test:

  • Developers changed the frontend code in a significant way. This warrants a careful code review and update of the user stories.
  • There is malicious injected code that changes the behavior of the app. A big red alert is in order! App infrastructure is compromised: hosting providers, third party libraries, or build tools.
  • There is a bug in some of the third party dependencies that affects UI behavior. Developer attention required to track down and fix the root cause.

Saved Snapshots

Wallet transaction snapshots are saved to a file with the .snapshot.json extension, in the same directory as the story markdown file.

├─ astory.md
├─ astory.snapshot.json
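The record-then-compare behavior can be sketched in a few lines. This is a simplified illustration, not StoryCheck's code. `snapshot_path` and `check_snapshot` are hypothetical names, and the transaction list is assumed to be JSON-serializable.

```python
import json
from pathlib import Path
from typing import Any, List


def snapshot_path(story: Path) -> Path:
    """astory.md -> astory.snapshot.json, next to the story file."""
    return story.with_name(story.stem + ".snapshot.json")


def check_snapshot(story: Path, transactions: List[Any]) -> bool:
    """Record transactions on the first run; compare against the record afterwards."""
    path = snapshot_path(story)
    if not path.exists():
        # First run: nothing to compare against, so save and pass.
        path.write_text(json.dumps(transactions, indent=2, sort_keys=True))
        return True
    return json.loads(path.read_text()) == transactions
```

Deleting the snapshot file resets the baseline, so the next run records a fresh set of transactions.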

High level design

flowchart TD
    A[User Story] -->|check| B(StoryCheck)
    B --> |parse| C[Markdown Parser]
    B -->|play| D[Browser Driver / playwright]
    D -->|locate UI element| E[AI Model]
    D -->|sign tx| F[Mock Wallet / EIP1193Bridge]
    F -->|blockchain tx| G[Local EVM Fork / anvil]

Directory structure

├─ .\ — "Main StoryCheck python app."
│  │
│  ├─ markdown — "Markdown parser. Outputs abstract syntax tree (AST) to interpreter."
│  │
│  ├──┬─ interpreter — "Runtime engine which takes AST as input and executes it."
│  │  │
│  ├──┼──┬─ browser — "Playwright browser driver."
│  │  │  │
│  │  │  └─ mock_wallet — "JavaScript mock wallet provider injected in playwright page context as Metamask."
│  │  │
│  │  ├─ ai — "RefExp GPT AI model that predicts UI element location based on natural language referring expressions."
│  │  │
│  │  └─ blockchain — "Local EVM fork runtime via Foundry Anvil."
│  │
│  └─ examples — "Example user stories."

How to Build and Run

This project is pre-configured to build and run via Gitpod.

To run locally or in another dev environment, copy the steps from .gitpod.yml.

Command line arguments

StoryCheck can be run as a shell command or as a web service.

$>./storycheck.sh --help


usage: StoryCheck by GuardianUI [-h] [-o OUTPUT_DIR] [--serve] storypath

Parses and executes user stories written in markdown format.

positional arguments:
  storypath             Path to the user story input markdown file (e.g. mystory.md).

options:
  -h, --help            show this help message and exit
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Directory where all results from the storycheck run will be stored. Defaults to "results"
  --serve               Run as a web service. Defaults to "False".

Copyright(c) guardianui.com 2023

For example, to run a check of mystory.md, use:

./storycheck.sh mystory.md

Command line exit codes

If all story checks pass, the command returns with exit code 0. Otherwise, if any test fails or another error occurs, the exit code is non-zero. This makes it possible to use StoryCheck in shell scripts or CI scripts.
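A script can rely on the exit code alone. The sketch below wraps the command in Python. `storycheck_command` and `run_story` are hypothetical helper names, and the `runner` parameter exists only so the logic can be exercised without launching the tool.

```python
import subprocess
from typing import List


def storycheck_command(story: str, output_dir: str = "results") -> List[str]:
    """Build the command line using only the documented options."""
    return ["./storycheck.sh", "--output-dir", output_dir, story]


def run_story(story: str, output_dir: str = "results", runner=subprocess.run) -> bool:
    """Return True when the story check passes, i.e. the exit code is 0."""
    result = runner(storycheck_command(story, output_dir))
    return result.returncode == 0
```

A caller could then end with `sys.exit(0 if run_story("mystory.md") else 1)` to pass the failure on to the surrounding shell or CI job.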

Using in CI scripts

StoryCheck can be used as a test step in CI scripts. Here is an example GitHub Action that sets up a StoryCheck environment and runs checks. If the StoryCheck step fails, the CI script fails as well.

Output Directory Artifacts

The output directory of a test run is either specified via --output-dir command line argument or defaults to ./results. It contains a number of helpful artifacts for debugging:

├─ ./results — "Main output directory for an input story file."
│  │
│  ├─ storycheck.log — "Consolidated log file between test runner, browser and EVM."
│  │
│  ├─ tx_log_snapshot.json — "Snapshot of all blockchain write transactions."
│  │
│  ├─ videos/ — "Video recordings of browser interactions."
│  │
│  ├─ screenshots/ — "Browser screenshot for every prompt in the User Steps section."
│  │
│  ├─ anvil-out.json — "Configuration for the anvil EVM fork."
│  │
│  ├─ trace.zip — "Session trace for Playwright Trace Viewer."
│  │

Contributing

Thanks for your interest in contributing!

Please start with a new discussion before opening an Issue or Pull Request.