Feature Request: Add support for clearing tables between tests
goldsam opened this issue · 9 comments
Testing would be easier if database state could be reset at the beginning of each test.
Hey Sam!
I totally agree, this will simplify my tests as well. I'll put this on my agenda.
I ended up solving this by creating a helper script which explicitly deletes every item from one or all tables. I invoke these methods in a `beforeEach` block to reset database state before each test. Although this works, it has some limitations.
What I quickly discovered was that Jest runs test files in parallel, which becomes problematic for code using a shared resource such as DynamoDB because of race conditions. I solved this (at least temporarily) by restructuring my tests so that all code using a given table is invoked from one root test file and thus executed sequentially, as in the sketch below.
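A minimal sketch of that root-file pattern (the file names here are hypothetical):

```ts
// users.table.test.ts -- hypothetical root test file.
// Pulling every suite that touches the `users` table into one file makes
// Jest schedule them in a single worker, so they run sequentially instead
// of racing each other on shared table state.
describe('users table', () => {
  require('./users/create.tests'); // each file registers its own it() blocks
  require('./users/query.tests');
});
```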
A better solution would be to create distinct "environments" for each test. I can think of a few approaches:
- Within the same DynamoDB instance, create table names unique to each test using some kind of random prefix or suffix. That prefix/suffix could then be injected into the test so it knows which table names to use. This seems ugly to me.
- Instead, allow the test code to simply set up and tear down the database state manually. It would be nice to simply invoke a method and pass it the `tables` configuration array, similar to what is supported in `jest-dynamodb-config.js`. Admittedly, this solution requires no changes to your library.
- Spin up a new DynamoDB instance for each test and inject the port number of that instance into the test environment. This has a lot of cost overhead but provides a strong level of data isolation.
Although this is arguably an unrelated or, at best, secondary problem, it seemed worthwhile to at least start a discussion.
Below is the `dynamodb-utils.ts` helper script I mentioned above:
```ts
import { DynamoDB } from 'aws-sdk';
import { AttributeMap, KeySchemaElement, Key } from 'aws-sdk/clients/dynamodb';

// Extract the key attributes of an item according to the table's key schema.
function itemToKey(item: AttributeMap, keySchema: KeySchemaElement[]): Key {
  return keySchema.reduce<Key>(
    (itemKey, { AttributeName }) => ({
      ...itemKey,
      [AttributeName]: item[AttributeName],
    }),
    {}
  );
}

export async function clearTable(dynamoDB: DynamoDB, tableName: string): Promise<void> {
  // get the table keys
  const { Table = {} } = await dynamoDB
    .describeTable({ TableName: tableName })
    .promise();
  const keySchema = Table.KeySchema || [];

  // get the items to delete
  // (note: a single scan returns at most 1 MB of data; for larger tables
  // you would need to page with ExclusiveStartKey/LastEvaluatedKey)
  const scanResult = await dynamoDB
    .scan({
      AttributesToGet: keySchema.map(key => key.AttributeName),
      TableName: tableName,
      ConsistentRead: true,
    })
    .promise();
  const items = scanResult.Items || [];

  const deleteRequests = items.map(item => ({
    DeleteRequest: { Key: itemToKey(item, keySchema) },
  }));

  // batchWriteItem accepts at most 25 requests per call, so delete in chunks
  for (let i = 0; i < deleteRequests.length; i += 25) {
    await dynamoDB
      .batchWriteItem({
        RequestItems: { [tableName]: deleteRequests.slice(i, i + 25) },
      })
      .promise();
  }
}

export async function clearAllTables(dynamoDB: DynamoDB): Promise<void> {
  const { TableNames = [] } = await dynamoDB.listTables().promise();
  for (const tableName of TableNames) {
    await clearTable(dynamoDB, tableName);
  }
  // give DynamoDB local a moment to settle before the next test runs
  await new Promise(resolve => setTimeout(resolve, 500));
}
```
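A minimal usage sketch, assuming DynamoDB local is listening on port 8000 (adjust the endpoint to your `jest-dynamodb-config.js`):

```ts
import { DynamoDB } from 'aws-sdk';
import { clearAllTables } from './dynamodb-utils';

// DynamoDB local accepts any credentials; only the endpoint has to match
// the port from jest-dynamodb-config.js
const dynamoDB = new DynamoDB({
  endpoint: 'http://localhost:8000',
  region: 'local',
  accessKeyId: 'fake',
  secretAccessKey: 'fake',
});

beforeEach(async () => {
  await clearAllTables(dynamoDB);
});
```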
Is there a way to utilize support for transactions to make this even faster?
@blakedietz Do you mean batch operations? Yes, batch operations would definitely have been faster. Another possible approach (sketched below) might be to:
1. Read the table schema definition.
2. Delete the table in one operation.
3. Recreate the table using the schema definition from (1).
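A minimal sketch of that idea with the v2 SDK (the function name and `BillingMode` are assumptions; in practice the throughput settings would also come from step 1):

```ts
import { DynamoDB } from 'aws-sdk';

export async function recreateTable(dynamoDB: DynamoDB, tableName: string): Promise<void> {
  // 1. read the table schema definition
  const { Table } = await dynamoDB.describeTable({ TableName: tableName }).promise();
  if (!Table || !Table.KeySchema || !Table.AttributeDefinitions) {
    throw new Error(`Cannot describe table: ${tableName}`);
  }

  // 2. delete the table in one operation
  await dynamoDB.deleteTable({ TableName: tableName }).promise();
  await dynamoDB.waitFor('tableNotExists', { TableName: tableName }).promise();

  // 3. recreate the table from the saved schema
  // (BillingMode is an assumption here; copy your real throughput settings)
  await dynamoDB
    .createTable({
      TableName: tableName,
      KeySchema: Table.KeySchema,
      AttributeDefinitions: Table.AttributeDefinitions,
      BillingMode: 'PAY_PER_REQUEST',
    })
    .promise();
  await dynamoDB.waitFor('tableExists', { TableName: tableName }).promise();
}
```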
@goldsam FYI - I rewrote this library with this use case in mind:
https://github.com/freshollie/jest-dynalite
I used `dynalite` as a mock for Dynamo instead of `dynamodb-local` for several reasons. Firstly, `dynalite` is much lighter, which allows us to spin up a single instance for each runner. Secondly, `dynalite` allows tables to be created and destroyed quickly. Thirdly, `dynalite` does not need Java to run.
`jest-dynalite` provides isolation between tests and between test suites. Give it a go.
Great job, @freshollie. I've mentioned `jest-dynalite` in the README: https://github.com/shelfio/jest-dynamodb#alternatives
Having the same problem myself. Good thing I found this issue; now I know it's not just me.
I'm using @goldsam's utils (thank you very much for sharing!), adapted to TypeScript and dynamoose (they were already 95% compatible), and I ran into an interesting issue I want to share here. Keep reading if you're trying the same thing and running multiple tests fails while running them individually works just fine.
By clearing all your tables at once on every run, you may end up with a race condition, because Jest tries to optimize by running multiple test files simultaneously. One test may end up clearing entire tables in the middle of another test's run.
If this is your case, an easy fix is to just run tests sequentially (a minimal config for this is sketched after the list below). I prefer this to the other options.
Other options:
- Ensure tests don't get affected by other tests or existing data (sometimes hard)
- Make each test clean up its own data. This approach prevents you from re-using mock data across multiple tests.
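For reference, forcing sequential execution uses only standard Jest options, nothing specific to this library:

```js
// jest.config.js: run everything in a single worker so test files that
// share DynamoDB tables cannot race each other
module.exports = {
  maxWorkers: 1, // roughly equivalent to passing --runInBand on the CLI
};
```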
Just to add to this thread, since we had a lot of discussion here about it (and thanks everyone for their hard work on this!):
- `jest-dynamodb` is based on the official AWS DynamoDB Local implementation. "Official" indeed, but this is not the real implementation of the actual service.
- `jest-dynalite` is based on `dynalite`, an implementation of Amazon's DynamoDB built on LevelDB. `dynalite` doesn't support transactions or DynamoDB Streams, so if you need them, that's an immediate show-stopper.
Unfortunately, you can't really rely on the workaround mentioned earlier in this issue: `DeleteRequest`s are write operations, and because your read operations are eventually consistent (whatever you request, see below), you might get "zombie" items after deleting a table's items, which is probably not what you expect from a unit-test isolation perspective.
This issue is quite rare, but running on the order of 10,000 tests surfaces it from time to time, producing flaky tests. There is no way around it, as one limitation of DynamoDB Local is that it does not honor strongly consistent reads:
> Read operations are eventually consistent. However, due to the speed of DynamoDB running on your computer, most reads appear to be strongly consistent.
Source (emphasis on the word "most" added).
Said differently, starting a test by assuming that the database is empty requires strongly consistent reads if you "clean" a table before running it, and DynamoDB Local's limitations mean you can't assume that.
We implemented option 1 from @goldsam's post above (creating a new table for each test). Even though they called it ugly, we believe it is, at least conceptually, a classic test-isolation strategy (avoiding collisions by partitioning the space), and the best approach short of spinning up a new database instance for each test (as the excellent `jest-dynalite` does).
This offers some important features:
- Each single test is truly isolated from the others.
- Because of that, tests can run in parallel, which quickly becomes important if you try to run thousands of tests locally in less than a minute, or in less than 10 minutes on CI, over multiple `jest` workers.
The following assumes you follow the single-table design with DynamoDB.
Here is our setup:
```js
// jest-dynamodb-config.js
module.exports = {
  tables: [], // a new table is created before each test, so don't declare anything here
  port: 8000,
  options: [
    '-sharedDb',
    // `-inMemory` uses an in-memory sqlite database, which is orders of
    // magnitude faster than the file-backed default. This makes creating a
    // new table instant and accelerates all database operations.
    // It should probably be a `jest-dynamodb` default.
    '-inMemory',
  ],
};
```
In your `jest` tests:
```js
beforeEach(async () => {
  // reset all modules to isolate every single test...
  jest.resetModules();
  jest.mock('./../store/client');

  const { prepareNewTable } = require('./helper');
  const tableName = await prepareNewTable();

  // ...so that your database code uses the new table name for every test
  process.env.TableName = tableName;
});
```
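One optional addition, not in our original setup (with `-inMemory`, everything is discarded when the instance stops anyway): a symmetric `afterEach` that drops the per-test table, assuming the mocked client still talks to the local instance the same way `prepareNewTable` does:

```js
afterEach(async () => {
  const { DeleteTableCommand } = require('@aws-sdk/client-dynamodb');
  // re-require the client: jest.resetModules() ran in beforeEach, so a
  // top-level import could point at a stale module instance
  const { dynamoDBClient } = require('./../store/client');
  await dynamoDBClient.send(
    new DeleteTableCommand({ TableName: process.env.TableName })
  );
});
```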
`helper` creates a new table based on the CloudFormation template, but gives the table a random name each time to provide isolation:
```js
// helper.js
'use strict';

const crypto = require('crypto');
const fs = require('fs');
const { dynamoDBClient } = require('./../store/client');
const { CreateTableCommand } = require('@aws-sdk/client-dynamodb');
const yaml = require('js-yaml');
const { CLOUDFORMATION_SCHEMA } = require('cloudformation-js-yaml-schema');

// ---------------------------------------------------------------------------

const getCloudFormationDynamoDbTableSchema = () => {
  const templateYaml = '../template.yaml';
  const templateYamlContent = fs.readFileSync(templateYaml, 'utf8');
  const cf = yaml.load(templateYamlContent, { schema: CLOUDFORMATION_SCHEMA });

  const resources = Object.values(cf.Resources);
  const tables = resources
    .filter(r => r.Type === 'AWS::DynamoDB::Table')
    .map(r => {
      const table = r.Properties;
      delete table.TableName; // will be renamed
      delete table.TimeToLiveSpecification; // errors on DynamoDB local
      return table;
    });

  return tables[0]; // we have only one table per service
};

const TABLE_SCHEMA = getCloudFormationDynamoDbTableSchema();

// ---------------------------------------------------------------------------

const prepareNewTable = async () => {
  const tableName = crypto.randomBytes(16).toString('hex');
  await dynamoDBClient.send(
    new CreateTableCommand({
      ...TABLE_SCHEMA,
      TableName: tableName,
    })
  );
  return tableName;
};

// ---------------------------------------------------------------------------

module.exports = {
  prepareNewTable,
};
```
The store implementation:
```js
'use strict';

const {
  GetCommand,
  QueryCommand,
  TransactWriteCommand,
  UpdateCommand,
} = require('@aws-sdk/lib-dynamodb');
const { dynamoDBClient } = require('./client');

// this gets evaluated on each single test,
// because modules are reset for each single test
const { TableName } = process.env;
```
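For completeness, a hypothetical read method on top of that (the key attribute names `pk`/`sk` are assumptions, not from our actual code):

```js
// hypothetical store method; `TableName` above is re-read on every test
// because jest.resetModules() re-evaluates this module
const getItem = async (pk, sk) => {
  const { Item } = await dynamoDBClient.send(
    new GetCommand({ TableName, Key: { pk, sk } })
  );
  return Item;
};

module.exports = { getItem };
```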
Running around 1,000 tests for a service (with all the rest of the code) takes around 10 seconds on a 10-CPU computer, and around 2 minutes on GitHub Actions, with DynamoDB taking most of the time for each test. That's a bit slow, but it means you can probably run 5,000 tests in around the ideal 10 minutes on CI, which should provide an excellent level of unit testing in most cases.
Finally, this approach avoids leaking any sort of test infrastructure into your production code.