-
Test Structures
An optimal way to structure tests for readability and maintainability.
-
In this section, we share testing techniques beyond the famous test pyramid (unit, integration, and e2e): how microservice architecture affects testing, and what the efficient ways to test it are.
-
Measuring the quality of the implemented Tests
-
Leveraging continuous testing for failing fast and continuous improvement
-
My closing thoughts about the current state of testing techniques in today's era.
-
My research materials that helped me a lot when writing this.
-
The citations I made when writing this.
🌟 "Test only as much as needed, strive to keep it nimble, sometimes it's even worth dropping some tests and trade reliability for agility and simplicity"
-- Yoni Goldberg
Test structures can differ depending on the unit under test.
Some units are simple enough to test with plain assertions and little or no setup.
Some units have several methods to test.
Some functions are data-driven, and for those we need a technique called Table-Driven Testing.
Regardless of the situation, there are 3 important parts we need to uphold when testing a unit.
-
What is being tested?
E.g., the ProductRepository#add method
-
Under what circumstances and scenario?
E.g., no price is passed to the method
-
What is the expected result?
E.g., the new product is not approved.
// (1.) What is being tested
describe('isOver18', function() {
// (2.) Under what scenario
describe('WHEN the age is under 18', function() {
// (3.) Expected result
it('should return false', function() {
expect(isOver18({ age: 16 })).to.be.false;
});
});
});
-
The function we are testing is isOver18.
-
In this part, we define the scenario in which the passed age is under 18. You can also structure the test by combining the WHEN keyword with the assertion description. E.g.: it('should return false, when the age is under 18', function() { ... })
// (1.) Unit under test
describe('ProductRepository', function() {
  // (2.) Category of the unit test
  describe('Add New Product', function() {
    // (3.) Pre-conditions, preparation
    describe('GIVEN no price is specified', function() {
      beforeEach(function() {
        this.product = { name: 'Kraker oots' };
      });
      // (4.) What scenario
      context('WHEN #add is called', function() {
        beforeEach(function() {
          this.newProduct = new ProductRepository().add(this.product);
        });
        // (5.) Expected Result
        it('should set the status as PENDING', function() {
          expect(this.newProduct.status).to.equals('PENDING');
        });
      });
    });
  });
  describe('Update Product', function() {
    describe('GIVEN no product found', function() {
      ...
      describe('WHEN #update is called', function() {
        ...
        it('should throw an error', function() {
          ....
        });
      });
    });
  });
});
-
The unit under test is only ProductRepository.
-
Since ProductRepository has more than one method, we can categorize the tests by method. In our case, Add New Product and Update Product are the categories.
-
Some tests require pre-conditions in order to produce certain results. We can describe these by starting with the keyword GIVEN. In addition, we can prepare/set up the requirements for the scenario here.
-
In this part, we use the requirements prepared in (3.) together with the function we need to test. We should start it with the WHEN keyword.
-
Lastly, we define what to expect when the function is called under those circumstances with the requirements we prepared. In BDD, the assertion part comes from the THEN keyword.
// (1.) What is being tested
describe('isEligibleToVote', function() {
// (2.) Table-driven testing technique setup
const tests = [
{
name: 'should return false, when age is under 18',
input: { age: 17, naturalizationDate: new Date(), country: 'Philippines' },
expected: false,
},
{
name: 'should return false, when the naturalization date is null',
input: { age: 18, naturalizationDate: null, country: 'Philippines' },
expected: false,
},
{
name: 'should return false, when the country is not equals to Philippines',
input: { age: 18, naturalizationDate: new Date(), country: 'Brazil' },
expected: false,
},
...
];
// (3.) Applying the table testing
tests.forEach(test => {
it(test.name, function() {
expect(isEligibleToVote(test.input)).to.equals(test.expected)
});
});
});
- The unit under test is isEligibleToVote
- Here, we set up the table testing
💡 TAKE NOTE that WE NEED to name each test, so that later on we can treat the names as clear-cut documentation for this function
⚠️ WARNING Table-driven testing is still unconventional in the NodeJS environment. It also breaks the default convention of eslint-plugin-mocha. Fortunately, you can customize your rules.
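As a sketch of that customization: assuming the rule that objects to the tests.forEach(...) call inside describe is mocha/no-setup-in-describe (check the rule name your linter actually reports), an .eslintrc.js could relax it for test files only. The test/**/*.js glob is a hypothetical project layout.

```javascript
// .eslintrc.js (sketch; adjust the glob and the rule name to your project)
module.exports = {
  plugins: ['mocha'],
  extends: ['plugin:mocha/recommended'],
  overrides: [
    {
      // Hypothetical location of the test files
      files: ['test/**/*.js'],
      rules: {
        // Assumed to be the rule that flags generating tests in a loop
        'mocha/no-setup-in-describe': 'off',
      },
    },
  ],
};
```

Scoping the override to test files keeps the rule active wherever table-driven tests are not used.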
You can base the scenarios on the documented requirements
-
Base your tests on the documented requirements. They might be written formally, or just communicated via email or Slack. Ask the QA or PM for help!
-
Name your tests using the language of the product. Naming your tests in terms close to the business requirements, using scenarios and expectations, helps correlate the code with the business expectations.
-
This minimizes the communication gap between technical and non-technical people. It also makes things easier for people who didn't write the code, like QA, for people who have little knowledge about the feature, like DevOps fellows, and for the future YOU.
A - Arrange
Set up your test; this can go under the GIVEN keyword
A - Act
Execute the unit under test; it can also use the variables that were set up in the GIVEN section
A - Assert
Assert the result and make sure it satisfies the expectation
This will allow the reader or reviewer to assess your code more easily.
😞 BAD EXAMPLE
describe('Customer classifier', function() {
  it('should be classified as premium', function() {
    const customerToClassify = {
      spend: 505,
      joined: new Date(),
      id: 1
    };
    const DBStub = sinon.stub(dataAccess, 'getCustomer').returns({
      id: 1,
      classification: 'regular',
    });
    const actual = CustomClassifier.classify(customerToClassify);
    expect(actual).to.equals('premium');
  });
});
Although the example is pretty straightforward from the developer's point of view, it is horrible for non-technical people. IT DOES NOT state any of the steps explaining why the customer should be classified as premium.
🥳 THIS IS MUCH BETTER
describe('Customer classifier', function() {
  describe('GIVEN the classification is regular', function() {
    before(function() {
      this.customerToClassify = { spend: 505, joined: new Date(), id: 1 };
      this.DBStub = sinon.stub(dataAccess, 'getCustomer').returns({
        id: 1,
        classification: 'regular',
      });
    });
    context('WHEN customer spent more than $500', function() {
      before(function() {
        this.receivedClassification = CustomClassifier.classify(this.customerToClassify);
      });
      it('should be classified as premium', function() {
        expect(this.receivedClassification).to.equals('premium');
      });
    });
  });
});
The test runner then prints these descriptions legibly, in a form that both technical and non-technical people can read.
Coding your tests in a declarative style allows the reader to grasp them instantly without spending even a single brain-CPU cycle.
Based on the examples above, we are structuring our tests BDD-style.
😞 BAD
test("When asking for an admin, ensure only ordered admins in results", () => {
  //assuming we've added here two admins "admin1", "admin2" and "user1"
  const allAdmins = getUsers({ adminOnly: true });
  let admin1Found = false;
  let admin2Found = false;
  allAdmins.forEach(aSingleUser => {
    if (aSingleUser === "user1") {
      assert.notEqual(aSingleUser, "user1", "A user was found and not admin");
    }
    if (aSingleUser === "admin1") {
      admin1Found = true;
    }
    if (aSingleUser === "admin2") {
      admin2Found = true;
    }
  });
  if (!admin1Found || !admin2Found) {
    throw new Error("Not all admins were returned");
  }
});
🥳 GOOD
describe('User', function() {
describe('GIVEN the list of accounts', function() {
before(function() {
sinon.stub(UserModel, 'find').resolves([...]);
});
context('WHEN asking for admin', function() {
before(function() {
this.allAdmins = getUsers({ adminOnly: true });
});
it('should ensure only ordered admins in results', function () {
expect(this.allAdmins).to.include.all.ordered.members([
'admin1',
'admin2',
]);
});
});
});
});
Testing the internals brings huge overhead for almost nothing. If your code/API delivers the right results, should you really invest your next 3 hours in testing HOW it worked internally and then maintain those fragile tests?
Whenever a public behavior is checked, the private implementation is also implicitly tested, and your tests will break only if there is an actual problem (e.g. wrong output).
😞 BAD
class ProductService {
  //this method is only used internally
  //Changing this name will make the tests fail
  calculateVATAdd(priceWithoutVAT) {
    return { finalPrice: priceWithoutVAT * 1.2 };
    //Changing the result format or key name above will make the tests fail
  }
  //public method
  getPrice(productId) {
    const desiredProduct = DB.getProduct(productId);
    const finalPrice = this.calculateVATAdd(desiredProduct.price).finalPrice;
    return finalPrice;
  }
}
it("White-box test: When the internal methods get 0 vat, it returns 0 response", async () => {
  //There's no requirement to allow users to calculate the VAT, only to show the final price. Nevertheless we falsely insist here on testing the class internals
  expect(new ProductService().calculateVATAdd(0).finalPrice).to.equal(0);
});
- Test doubles are a necessary evil because they are coupled to the application internals, yet some provide immense value
Before using test doubles, ask a very simple question: Do I use it to test functionality that appears, or could appear, in the requirements document? If not, it's a white-box testing smell.
😞 BAD
it("When a valid product is about to be deleted, ensure data access DAL was called once, with the right product and right config", async () => {
//Assume we already added a product
const dataAccessMock = sinon.mock(DAL);
//hmmm BAD: testing the internals is actually our main goal here, not just a side-effect
dataAccessMock
.expects("deleteProduct")
.once()
.withArgs(DBConfig, theProductWeJustAdded, true, false);
new ProductService().deletePrice(theProductWeJustAdded);
dataAccessMock.verify();
});
🥳 GOOD
describe('WHEN a valid product is about to be deleted', function() {
  before(function () {
    sinon.spy(Emailer.prototype, 'sendEmail');
    new ProductService().deletePrice(...);
  });
  it('should ensure an email is sent', function() {
    expect(Emailer.prototype.sendEmail.calledOnce).to.be.true;
  });
});
💡 TAKE NOTE Spies focus on testing the requirements, but as a side effect they unavoidably touch the internals.
Often production bugs are revealed under some very specific and surprising input -- the more realistic the test input is, the greater the chances of catching bugs early.
Use dedicated libraries like chance or Faker to generate pseudo-real data that resembles the variety and form of production data.
😞 BAD
const addProduct = (name, price) => {
const productNameRegexNoSpace = /^\S*$/; //no white-space allowed
if (!productNameRegexNoSpace.test(name)) return false; //this path never reached due to dull input
//some logic here
return true;
};
test("Wrong: When adding new product with valid properties, get successful confirmation", async () => {
//The string "Foo" which is used in all tests never triggers a false result
const addProductResult = addProduct("Foo", 5);
expect(addProductResult).toBe(true);
//Positive-false: the operation succeeded because we never tried with long
//product name including spaces
});
🥳 GOOD
it("Better: When adding new valid product, get successful confirmation", async () => {
const addProductResult = addProduct(faker.commerce.productName(), faker.random.number());
//Generated random input: {'Sleek Cotton Computer', 85481}
expect(addProductResult).to.be.true;
//Test failed, the random input triggered some path we never planned for.
//We discovered a bug early!
});
Typically we choose a few input samples for each test, even when the input format resembles real-world data. However, in production, an API that is called with 5 parameters can be invoked with thousands of different permutations, one of which might bring our process down (see Fuzz Testing).
import fc from "fast-check";
describe("Product service", () => {
describe("Adding new", () => {
//this will run 100 times with different random properties
it("Add new product with random yet valid properties, always successful", () =>
fc.assert(
fc.property(fc.integer(), fc.string(), (id, name) => {
expect(addNewProduct(id, name).status).toEqual("approved");
})
));
});
});
😞 BAD
before(async () => {
  //adding sites and admins data to our DB. Where is the data? Outside, in some external JSON or migration framework
  await DB.AddSeedDataFromJson('seed.json');
});
it("When updating site name, get successful confirmation", async () => {
  //I know that site name "Portal" exists - I saw it in the seed files
  const siteToUpdate = await SiteService.getSiteByName("Portal");
  const updateNameResult = await SiteService.changeName(siteToUpdate, "newName");
  expect(updateNameResult).to.be.true;
});
it("When querying by site name, get the right site", async () => {
  //I know that site name "Portal" exists - I saw it in the seed files
  const siteToCheck = await SiteService.getSiteByName("Portal");
  expect(siteToCheck.name).to.be.equal("Portal"); //Failure! The previous test changed the name :[
});
🥳 GOOD
it("When updating site name, get successful confirmation", async () => {
  //the test adds a fresh new record and acts on that record only
  const siteUnderTest = await SiteService.addSite({
    name: "siteForUpdateTest"
  });
  const updateNameResult = await SiteService.changeName(siteUnderTest, "newName");
  expect(updateNameResult).to.be.true;
});
When trying to assert that some input triggers an error, it might look right to use try-catch-finally and assert that the catch clause was entered.
😞 BAD
it("When no product name, it throws error 400", async () => {
  let errorWeExpectFor = null;
  try {
    const result = await addNewProduct({});
  } catch (error) {
    expect(error.code).to.equal("InvalidInput");
    errorWeExpectFor = error;
  }
  expect(errorWeExpectFor).not.to.be.null;
  //if this assertion fails, the test results/reports will only show
  //that some value is null, there won't be a word about a missing exception
});
🥳 GOOD
it("When no product name, it throws error 400", async () => {
  //relies on the chai-as-promised plugin for the promise assertions
  await expect(addNewProduct({}))
    .to.be.rejectedWith(AppError)
    .and.eventually.have.property("code", "InvalidInput");
});
Different tests must run in different scenarios: quick smoke and IO-less tests should run when a developer saves or commits a file, while full end-to-end tests usually run when a new pull request is submitted, etc.
🥳 GOOD
//this test is fast (no DB) and we're tagging it correspondingly
//now the user/CI can run it frequently
describe("Order service", function() {
describe("Add new order #cold-test #sanity", function() {
test("Scenario - no currency was supplied. Expectation - Use the default currency #sanity", function() {
//code logic here
});
});
});
Apply some structure to your test suite so an occasional visitor could easily understand the requirements (tests are the best documentation) and the various scenarios that are being tested.
😞 BAD
test("Then the response status should decline", () => {});
test("Then it should send email", () => {});
test("Then there should not be a new transfer record", () => {});
🥳 GOOD
// Unit under test
describe("Transfer service", () => {
//Scenario
describe("When no credit", () => {
//Expectation
test("Then the response status should decline", () => {});
//Expectation
test("Then it should send email to admin", () => {});
});
});
Learn and practice TDD principles
Given all the dramatic changes that we've seen in the recent 10 years (microservices, cloud, serverless), the testing pyramid, like any other model, must be wrong sometimes despite its usefulness. For example, consider an IoT application that ingests many events into a message bus like Kafka/RabbitMQ, which then flow into some data warehouse and are eventually queried by some analytics UI. Should we really spend 50% of our testing budget on writing unit tests for an application that is integration-centric and has almost no logic? As the diversity of application types increases (bots, crypto, Alexa skills), so do the chances of finding scenarios where the testing pyramid is not the best match.
⚠️ WARNING The TDD argument in the software world takes a typical false-dichotomy face: some preach using it everywhere, others think it's the devil. Everyone who speaks in absolutes is wrong.
- Gathering logs from different machines
- Capturing performance metrics real-time
- Tracing individual requests running through different machines.
- Each unit test covers a tiny portion of the application and it's expensive to cover the whole, whereas end-to-end testing easily covers a lot of ground but is flaky and slower. Why not apply a balanced approach and write tests that are bigger than unit tests but smaller than end-to-end tests? Component testing is the unsung hero of the testing world -- it provides the best of both worlds: reasonable performance and the possibility to apply TDD patterns, plus realistic and great coverage.
Component tests focus on the microservice 'unit': they work against the API and don't mock anything that belongs to the microservice itself (e.g. the real DB, or at least an in-memory version of that DB), but stub anything external, like calls to other microservices.
So your microservice has multiple clients. Consumer-driven contracts and the PACT framework were born to formalize this process with a very disruptive approach. PACT can record the client's expectations and put them in a shared location, the "broker", so the server can pull the expectations and run them on every build using the PACT library to detect broken contracts -- a client expectation that is not met. By doing so, all server-client API mismatches are caught early during build/CI, which might save you a great deal of frustration.
Many avoid middleware testing because middlewares represent a small portion of the system and require a live Express server. Both reasons are wrong -- middlewares are small but affect all or most of the requests, and they can be tested easily as pure functions that get { request, response } JS objects.
Using static analysis tools helps by giving objective ways to improve code quality and keep your code maintainable.
Most software testing is about logic and data only, but some of the worst things that happen (and are really hard to mitigate) are infrastructural issues.
For example, did you ever test what happens when your process memory is overloaded, or when the server/process dies, or whether your monitoring system notices when the API becomes 50% slower?
Chaos engineering aims to provide awareness, frameworks, and tools for testing our app's resiliency to chaotic issues.
chaos monkey
kube-monkey
- Get enough coverage for being confident; ~80% seems to be the lucky number. The purpose of testing is to get enough confidence for moving fast; obviously, the more code is tested, the more confident the team can be. Coverage is a measure of how many code lines (and branches, statements, etc.) are being reached by the tests.
The long answer is that it depends on many factors, like the type of application: if you're building the next generation of Airbus or a medical application, then 100% is a must. For a cartoon pictures website, 50% might be too much. Most testing enthusiasts claim that the right coverage threshold is contextual, yet most of them also mention 80% as a rule of thumb.
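A threshold like this can be enforced automatically. As a sketch, assuming nyc (the Istanbul command-line tool) is the coverage tool in use, an nyc.config.js along these lines makes the run fail when coverage drops below the chosen numbers; the 80% values simply mirror the rule of thumb and should be adapted to the project.

```javascript
// nyc.config.js (sketch; thresholds are illustrative, not a recommendation)
module.exports = {
  'check-coverage': true, // fail the run when any threshold below is not met
  lines: 80,
  branches: 80,
  functions: 80,
  statements: 80,
};
```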
-
Inspect coverage reports to detect untested areas and other oddities. Some issues sneak just under the radar and are really hard to find using traditional tools. These are not really bugs but rather surprising application behavior that might have a severe impact.
-
Measure logical coverage using mutation testing
The traditional coverage metric often lies: it may show you 100% code coverage while none of your functions, not even one, returns the right response.
Mutation-based testing is here to help by measuring the amount of code that was actually TESTED, not just VISITED. Stryker, a JavaScript library for mutation testing, works as follows:
-
It intentionally changes the code and "plants bugs". For example, the code newOrder.price === 0 becomes newOrder.price !== 0. These "bugs" are called mutations.
-
It runs the tests. If all succeed, then we have a problem -- the tests didn't serve their purpose of discovering bugs, and the mutations are said to have survived. If the tests fail, then great, the mutations were killed.
-
-
Preventing test code issues with Test linters
- Enrich your linters and abort builds that have linting issues
- Shorten the feedback loop with local developer-CI
- Perform e2e testing over a true production
- Parallelize Test Execution
- Stay Away from Legal Issues using License and Plagiarism Check
- Constantly inspect for vulnerable dependencies
- Automate dependency updates
- Others
-
Use a declarative syntax
-
Opt for a vendor that has native Docker Support
-
Fail early, run your fastest tests first
-
Create multiple pipelines/jobs for each event, reuse steps between them.
-
Never embed secrets in a job declaration, grab them from a secret store or from the job's configuration
-
Explicitly bump version in a release or at least ensure the developer did so
-
Build only once and perform all the inspections over the single build artifact (e.g. Docker image)
-
Test in an ephemeral environment that doesn't carry state between builds. Caching node_modules might be the only exception
-
Build matrix: Run the same CI steps using multiple Node Version
-