Feature Request: Add support for clearing tables between tests
goldsam opened this issue · 9 comments
Testing would be easier if database state could be reset at the beginning of each test.
Hey Sam!
I totally agree, this will simplify my tests as well. I'll put this on my agenda.
I ended up solving this by creating a helper script which explicitly deletes every item from one or all tables. I invoke these methods in a `beforeEach` block to reset database state before each test. Although this works, it has some limitations.
What I quickly discovered was that Jest runs test files in parallel, which becomes problematic for code using a shared resource such as DynamoDB because of race conditions. I solved this (at least temporarily) by restructuring my tests so that all code using a given table is invoked from one root test file and thus executed sequentially, as in the sketch below.
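A minimal sketch of that root-file pattern (the file names here are hypothetical):

```ts
// users.table.test.ts -- hypothetical root test file.
// Pulling every suite that touches the `users` table into one file makes
// Jest schedule them in a single worker, so they run sequentially instead
// of racing each other on shared table state.
describe('users table', () => {
  require('./users/create.tests'); // each file registers its own it() blocks
  require('./users/query.tests');
});
```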
A better solution would be to create distinct "environments" for each test. I can think of a few approaches:
- Within the same DynamoDB instance, create table names unique to each test using some kind of random prefix or suffix. That prefix/suffix could then be injected into the test so it knows which table names to use. This seems ugly to me.
- Instead, allow the test code to simply set up and tear down the database state manually. It would be nice to simply invoke a method and pass it the `tables` configuration array, similar to what is supported in `jest-dynamodb-config.js`. Admittedly, this solution requires no changes to your library.
- Spin up a new DynamoDB instance for each test and inject the port number of that instance into the test environment. This has a lot of cost overhead but provides a strong level of data isolation.
Although this is arguably an unrelated or, at best, secondary problem, it seemed worthwhile to at least start a discussion.
Below is the `dynamodb-utils.ts` helper script I mentioned above:
```ts
import { DynamoDB } from 'aws-sdk';
import { AttributeMap, KeySchemaElement, Key } from 'aws-sdk/clients/dynamodb';

// Extract the key attributes of an item according to the table's key schema.
function itemToKey(item: AttributeMap, keySchema: KeySchemaElement[]): Key {
  return keySchema.reduce<Key>(
    (itemKey, { AttributeName }) => ({
      ...itemKey,
      [AttributeName]: item[AttributeName],
    }),
    {}
  );
}

export async function clearTable(dynamoDB: DynamoDB, tableName: string): Promise<void> {
  // get the table keys
  const { Table = {} } = await dynamoDB
    .describeTable({ TableName: tableName })
    .promise();
  const keySchema = Table.KeySchema || [];

  // get the items to delete
  // (note: a single scan returns at most 1 MB of data; for larger tables
  // you would need to page with ExclusiveStartKey/LastEvaluatedKey)
  const scanResult = await dynamoDB
    .scan({
      AttributesToGet: keySchema.map(key => key.AttributeName),
      TableName: tableName,
      ConsistentRead: true,
    })
    .promise();
  const items = scanResult.Items || [];

  const deleteRequests = items.map(item => ({
    DeleteRequest: { Key: itemToKey(item, keySchema) },
  }));

  // batchWriteItem accepts at most 25 requests per call, so delete in chunks
  for (let i = 0; i < deleteRequests.length; i += 25) {
    await dynamoDB
      .batchWriteItem({
        RequestItems: { [tableName]: deleteRequests.slice(i, i + 25) },
      })
      .promise();
  }
}

export async function clearAllTables(dynamoDB: DynamoDB): Promise<void> {
  const { TableNames = [] } = await dynamoDB.listTables().promise();
  for (const tableName of TableNames) {
    await clearTable(dynamoDB, tableName);
  }
  // give DynamoDB local a moment to settle before the next test runs
  await new Promise(resolve => setTimeout(resolve, 500));
}
```
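A minimal usage sketch, assuming DynamoDB local is listening on port 8000 (adjust the endpoint to your `jest-dynamodb-config.js`):

```ts
import { DynamoDB } from 'aws-sdk';
import { clearAllTables } from './dynamodb-utils';

// DynamoDB local accepts any credentials; only the endpoint has to match
// the port from jest-dynamodb-config.js
const dynamoDB = new DynamoDB({
  endpoint: 'http://localhost:8000',
  region: 'local',
  accessKeyId: 'fake',
  secretAccessKey: 'fake',
});

beforeEach(async () => {
  await clearAllTables(dynamoDB);
});
```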
Is there a way to utilize support for transactions to make this even faster?
@blakedietz Do you mean batch operations? Yes, batch operations would definitely have been faster. Another possible approach (sketched below) might be to:
1. Read the table schema definition.
2. Delete the table in one operation.
3. Recreate the table using the schema definition from (1).
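A minimal sketch of that idea with the v2 SDK (the function name and `BillingMode` are assumptions; in practice the throughput settings would also come from step 1):

```ts
import { DynamoDB } from 'aws-sdk';

export async function recreateTable(dynamoDB: DynamoDB, tableName: string): Promise<void> {
  // 1. read the table schema definition
  const { Table } = await dynamoDB.describeTable({ TableName: tableName }).promise();
  if (!Table || !Table.KeySchema || !Table.AttributeDefinitions) {
    throw new Error(`Cannot describe table: ${tableName}`);
  }

  // 2. delete the table in one operation
  await dynamoDB.deleteTable({ TableName: tableName }).promise();
  await dynamoDB.waitFor('tableNotExists', { TableName: tableName }).promise();

  // 3. recreate the table from the saved schema
  // (BillingMode is an assumption here; copy your real throughput settings)
  await dynamoDB
    .createTable({
      TableName: tableName,
      KeySchema: Table.KeySchema,
      AttributeDefinitions: Table.AttributeDefinitions,
      BillingMode: 'PAY_PER_REQUEST',
    })
    .promise();
  await dynamoDB.waitFor('tableExists', { TableName: tableName }).promise();
}
```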
@goldsam FYI - I rewrote this library with this use case in mind:
https://github.com/freshollie/jest-dynalite
I used `dynalite` as a mock for Dynamo instead of `dynamodb-local` for several reasons. Firstly, `dynalite` is much lighter, which allows us to spin up a single instance for each runner. Secondly, `dynalite` allows tables to be created and destroyed quickly. Thirdly, `dynalite` does not need Java to run.
`jest-dynalite` provides isolation between tests and between test suites. Give it a go.
Great job, @freshollie. I've mentioned `jest-dynalite` in the README: https://github.com/shelfio/jest-dynamodb#alternatives
Having the same problem myself. Good thing I found this issue; now I know it's not just me.
I'm using @goldsam's utils (thank you very much for sharing!), adapted to TypeScript and dynamoose (they were already 95% compatible), and I ran into an interesting issue I want to share here. Keep reading if you're trying the same thing and running multiple tests fails while running them individually works just fine.
By clearing all your tables at once on every run, you may end up with a race condition, because Jest tries to optimize by running multiple test files simultaneously. One test may end up clearing entire tables in the middle of another test's run.
If this is your case, an easy fix is to just run tests sequentially (a minimal config for this is sketched after the list below). I prefer this to the other options.
Other options:
- Ensure tests don't get affected by other tests or existing data (sometimes hard)
- Make each test clean up its own data. This approach prevents you from re-using mock data across multiple tests.
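For reference, forcing sequential execution uses only standard Jest options, nothing specific to this library:

```js
// jest.config.js: run everything in a single worker so test files that
// share DynamoDB tables cannot race each other
module.exports = {
  maxWorkers: 1, // roughly equivalent to passing --runInBand on the CLI
};
```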
Just to add to this thread, since we had a lot of discussion here about it (and thanks everyone for their hard work on this!):
- `jest-dynamodb` is based on the official AWS DynamoDB Local implementation. "Official" indeed, but this is not the real implementation of the actual service.
- `jest-dynalite` is based on `dynalite`, an implementation of Amazon's DynamoDB built on LevelDB. `dynalite` doesn't support transactions or DynamoDB Streams, so if you need them, that's an immediate show-stopper.
Unfortunately, you can't really rely on the workaround mentioned earlier in this issue: `DeleteRequest`s are write operations, and because your read operations are eventually consistent (whatever you request, see below), you might get "zombie" items after deleting a table's items, which is probably not what you expect from a unit-test isolation perspective.
This issue is quite rare, but running on the order of 10,000 tests surfaces it from time to time, producing flaky tests. There is no way around it, as one limitation of DynamoDB Local is that it does not honor strongly consistent reads:
> Read operations are eventually consistent. However, due to the speed of DynamoDB running on your computer, most reads appear to be strongly consistent.
Source (emphasis on the word "most" added).
Said differently, starting a test by assuming that the database is empty requires strongly consistent reads if you "clean" a table before running it, and DynamoDB Local's limitations mean you can't assume that.
We implemented option 1 from @goldsam's post above (creating a new table for each test). Even though they called it ugly, we believe it is, at least conceptually, a classic test-isolation strategy (avoiding collisions by partitioning the space), and the best approach short of spinning up a new database instance for each test (as the excellent `jest-dynalite` does).
This offers some important features:
- Each single test is truly isolated from the others.
- Because of that, tests can run in parallel, which quickly becomes important if you try to run thousands of tests locally in less than a minute, or in less than 10 minutes on CI, over multiple `jest` workers.
The following assumes you follow the single-table design with DynamoDB.
Here is our setup:
```js
// jest-dynamodb-config.js
module.exports = {
  tables: [], // a new table is created before each test, so don't declare anything here
  port: 8000,
  options: [
    '-sharedDb',
    // `-inMemory` uses an in-memory sqlite database, which is orders of
    // magnitude faster than the file-backed default. This makes creating a
    // new table instant and accelerates all database operations.
    // It should probably be a `jest-dynamodb` default.
    '-inMemory',
  ],
};
```
In your `jest` tests:
```js
beforeEach(async () => {
  // reset all modules to isolate every single test...
  jest.resetModules();
  jest.mock('./../store/client');

  const { prepareNewTable } = require('./helper');
  const tableName = await prepareNewTable();

  // ...so that your database code uses the new table name for every test
  process.env.TableName = tableName;
});
```
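One optional addition, not in our original setup (with `-inMemory`, everything is discarded when the instance stops anyway): a symmetric `afterEach` that drops the per-test table, assuming the mocked client still talks to the local instance the same way `prepareNewTable` does:

```js
afterEach(async () => {
  const { DeleteTableCommand } = require('@aws-sdk/client-dynamodb');
  // re-require the client: jest.resetModules() ran in beforeEach, so a
  // top-level import could point at a stale module instance
  const { dynamoDBClient } = require('./../store/client');
  await dynamoDBClient.send(
    new DeleteTableCommand({ TableName: process.env.TableName })
  );
});
```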
`helper` creates a new table based on the CloudFormation template, but gives the table a random name each time to provide isolation:
```js
// helper.js
'use strict';

const crypto = require('crypto');
const fs = require('fs');
const { dynamoDBClient } = require('./../store/client');
const { CreateTableCommand } = require('@aws-sdk/client-dynamodb');
const yaml = require('js-yaml');
const { CLOUDFORMATION_SCHEMA } = require('cloudformation-js-yaml-schema');

// ---------------------------------------------------------------------------

const getCloudFormationDynamoDbTableSchema = () => {
  const templateYaml = '../template.yaml';
  const templateYamlContent = fs.readFileSync(templateYaml, 'utf8');
  const cf = yaml.load(templateYamlContent, { schema: CLOUDFORMATION_SCHEMA });

  const resources = Object.values(cf.Resources);
  const tables = resources
    .filter(r => r.Type === 'AWS::DynamoDB::Table')
    .map(r => {
      const table = r.Properties;
      delete table.TableName; // will be renamed
      delete table.TimeToLiveSpecification; // errors on DynamoDB local
      return table;
    });

  return tables[0]; // we have only one table per service
};

const TABLE_SCHEMA = getCloudFormationDynamoDbTableSchema();

// ---------------------------------------------------------------------------

const prepareNewTable = async () => {
  const tableName = crypto.randomBytes(16).toString('hex');
  await dynamoDBClient.send(
    new CreateTableCommand({
      ...TABLE_SCHEMA,
      TableName: tableName,
    })
  );
  return tableName;
};

// ---------------------------------------------------------------------------

module.exports = {
  prepareNewTable,
};
```
The store implementation:
```js
'use strict';

const {
  GetCommand,
  QueryCommand,
  TransactWriteCommand,
  UpdateCommand,
} = require('@aws-sdk/lib-dynamodb');
const { dynamoDBClient } = require('./client');

// this gets evaluated on each single test,
// because modules are reset for each single test
const { TableName } = process.env;
```
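For completeness, a hypothetical read method on top of that (the key attribute names `pk`/`sk` are assumptions, not from our actual code):

```js
// hypothetical store method; `TableName` above is re-read on every test
// because jest.resetModules() re-evaluates this module
const getItem = async (pk, sk) => {
  const { Item } = await dynamoDBClient.send(
    new GetCommand({ TableName, Key: { pk, sk } })
  );
  return Item;
};

module.exports = { getItem };
```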
Running around 1,000 tests for a service (with all the rest of the code) takes around 10 seconds on a 10-CPU computer, and around 2 minutes on GitHub Actions, with DynamoDB taking most of the time for each test. That's a bit slow, but it means you can probably run 5,000 tests in around the ideal 10 minutes on CI, which should provide an excellent level of unit testing in most cases.
Finally, this approach avoids leaking any sort of test infrastructure into your production code.