12 Keys to Write Senior-Level Tests (Test Desiderata in JavaScript)
"Desiderata" derives from the Latin for "things desired." Kent Beck wrote an article called "Test Desiderata". Together with Kelly Sutton, he created a YouTube playlist about that article in which they explain the desirable properties of automated tests. Some of these properties are trade-offs, and not every test needs to have them.
The examples Kent and Kelly give are in Ruby. Consider this article a TL;DW with additions from me and a translation into JavaScript.
The examples in this tutorial use RITEway because of its genius API that forces you to write good tests, and in one case Jest, because RITEway prohibits you from writing bad tests. I highly encourage you to watch Kent's and Kelly's playlist in addition to reading this article, even if you, like me, don't know Ruby. Their explanations are fun and enlightening.
Obviously you still want to learn Jest and Vitest because they are the most popular testing frameworks in JavaScript, but that's a topic for another article.
Let's dive into the principles.
Behavioral-Sensitive
Tests should be sensitive to the behavior of your code.
If you change the behavior of the code, the results of the tests should change, too. If you change the minimum wage in calculateSF to 150 dollars, the tests for the San Francisco zip code fail.
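Here is a minimal sketch of what behavior-sensitive tests could look like. Only calculateSF is named in the text; the calculate signature, the zip codes, and the wage figures are my assumptions for illustration. The check helper is a small stand-in for RITEway's assert({ given, should, actual, expected }) so the snippet runs without dependencies.

```javascript
// Hypothetical minimum hourly wage for San Francisco (illustrative figure only).
const SF_MINIMUM_WAGE = 18;

// San Francisco: never pay less than the local minimum wage.
const calculateSF = ({ hours, rate }) =>
  hours * Math.max(rate, SF_MINIMUM_WAGE);

// Public API: pick the calculation based on the zip code (assumed shape).
const calculate = ({ hours, rate, zip }) =>
  zip === '94103' ? calculateSF({ hours, rate }) : hours * rate;

// Stand-in for RITEway's assert, not the real library.
const check = ({ given, should, actual, expected }) => ({
  description: `Given ${given}: should ${should}`,
  pass: JSON.stringify(actual) === JSON.stringify(expected),
});

const results = [
  check({
    given: 'a San Francisco zip code and a rate below the minimum wage',
    should: 'pay the minimum wage',
    actual: calculate({ hours: 40, rate: 10, zip: '94103' }),
    expected: 720,
  }),
  check({
    given: 'a zip code outside San Francisco',
    should: 'pay hours times rate',
    actual: calculate({ hours: 40, rate: 10, zip: '10001' }),
    expected: 400,
  }),
];

results.forEach(({ description, pass }) =>
  console.log(`${pass ? '✔' : '✖'} ${description}`)
);
```

If you set SF_MINIMUM_WAGE to 150, the first check computes 6000 instead of 720 and fails, while the second check keeps passing.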
Structure-Insensitive
Tests should be insensitive to the internals of your code's structure.
RITEway makes it nearly impossible to write structure-sensitive tests because there is no mocking API like mock functions, spies or module mocks.
So the next example uses Jest to write a structure-sensitive test by using spyOn. You need to create an hourlyWageCalculator object to be able to use spyOn.
Here we made the assumption that calculateSF exists and that we are using it. If we refactor the code to inline the San Francisco calculation, the tests fail because they are structure sensitive.
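To make the failure mode concrete without pulling in Jest, here is a sketch that uses a hand-rolled spyOn as a stand-in for jest.spyOn. The calculator objects, the zip code, and the numbers are assumptions; the point is only that the spy-based check breaks after a refactor that leaves the behavior untouched.

```javascript
// Hand-rolled stand-in for jest.spyOn: wraps a method and counts its calls.
const spyOn = (object, methodName) => {
  const original = object[methodName];
  const spy = { callCount: 0 };
  object[methodName] = function (...args) {
    spy.callCount += 1;
    return original.apply(this, args);
  };
  return spy;
};

// Hypothetical calculator whose calculate delegates to calculateSF.
const hourlyWageCalculator = {
  calculateSF({ hours, rate }) {
    return hours * Math.max(rate, 18);
  },
  calculate({ hours, rate, zip }) {
    return zip === '94103' ? this.calculateSF({ hours, rate }) : hours * rate;
  },
};

// Same behavior, but the San Francisco calculation is inlined.
const refactoredCalculator = {
  calculateSF({ hours, rate }) {
    return hours * Math.max(rate, 18);
  },
  calculate({ hours, rate, zip }) {
    return zip === '94103' ? hours * Math.max(rate, 18) : hours * rate;
  },
};

// One behavioral check (the result) and one structural check (the spy).
const runTest = calculator => {
  const spy = spyOn(calculator, 'calculateSF');
  const pay = calculator.calculate({ hours: 40, rate: 10, zip: '94103' });
  return { behaviorPasses: pay === 720, structurePasses: spy.callCount === 1 };
};

const before = runTest(hourlyWageCalculator);
const after = runTest(refactoredCalculator);

console.log(before); // { behaviorPasses: true, structurePasses: true }
console.log(after); // { behaviorPasses: true, structurePasses: false }
```

The behavioral check passes for both versions. Only the check that peeks at the internals turns red after the refactor.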
If you refactor code without changing its behavior, the test results should stay the same. The RITEway tests from the behavior section above are invariant to the structure of the code. You might know this property as "avoid testing implementation details".
Readable
"Tests, like code, are read more than written." - Kent Beck
Some things that make code more readable actually reduce the readability of tests.
You might notice that most people work 40 hours per week, create a defaultHours constant, and reuse it in your test setups. But DRYing up your tests often makes them less readable.
Avoid writing a test that is only a few lines long but impossible to understand without reading a lot of shared setup code elsewhere.
"You're not supposed to repeat yourself, unless you're supposed to repeat yourself." - Kent Beck
Tests should be self-contained and therefore tests can be verbose. The reader must be able to extract the motivation for the test and what happens in the test.
(Obviously, there are trade-offs and there are occasions where you want to generalize your test setup.)
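A small sketch of the difference, with a hypothetical calculate function and made-up numbers. Both test cases use the { given, should, actual, expected } shape that RITEway's assert takes, written here as plain objects so the snippet has no dependencies.

```javascript
// Hypothetical unit under test.
const calculate = ({ hours, rate }) => hours * rate;

// Shared setup: in a real suite this constant might live in another file.
const defaultHours = 40;

// DRY, but to verify 800 the reader has to look up defaultHours.
const dryTest = {
  given: 'the default hours and a rate',
  should: 'return the weekly pay',
  actual: calculate({ hours: defaultHours, rate: 20 }),
  expected: 800,
};

// Self-contained: everything needed to check the expectation is right here.
const readableTest = {
  given: '40 hours at a rate of 20',
  should: 'return a weekly pay of 800',
  actual: calculate({ hours: 40, rate: 20 }),
  expected: 800,
};

[dryTest, readableTest].forEach(({ given, should, actual, expected }) =>
  console.log(`${actual === expected ? '✔' : '✖'} Given ${given}: should ${should}`)
);
```

Both tests pass. The second one simply tells the whole story on its own.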
Writable
Tests should be easy to write. If the tests for your code are hard to write, it indicates your code has a bad API, is too tightly coupled or you lack good setup functions.
In these tests, firstName, lastName and address could be deleted and are irrelevant for calculate. It's the fault of the system that the tests are expensive relative to the code under test.
If you need dozens or hundreds of lines of test setup to test a few lines of code, that's a design problem, not a test problem. For example, your tests might use a lot of mocks because your code is too tightly coupled and has many side effects.
"If you're unable to test it, you designed it wrong." - Kent Beck
Fast
Tests should run quickly.
On average it takes your brain 23 minutes to re-focus on a task after it gets distracted. And those distractions can happen within seconds. Good tests help you stay in flow because they run fast.
Add this delay function before each assert in the tests.
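A delay function along these lines would do. This is a sketch, and the three seconds and the slowTest wrapper are arbitrary choices of mine.

```javascript
// Resolves after the given number of milliseconds.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical slow test: waits before producing the values to assert on.
const slowTest = async () => {
  await delay(3000); // three seconds, an arbitrary choice
  return {
    given: '2 and 3',
    should: 'return their sum',
    actual: 2 + 3,
    expected: 5,
  };
};
```

In an async test callback you would write await delay(3000) on the line before each assertion.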
Now run your tests and see how you feel. Watch how your mind drifts off while you wait for those tests to finish ...
You can run your unit tests in under a second by isolating side effects and composing your code from pure functions. Your CI / CD pipeline (including your integration and functional tests) should run in under 10 minutes.
Avoid interrupting flow to keep your engineering team productive.
Automated
Machines should do the testing, not humans.
Avoid having QA teams manually test your release candidates for regressions. This is what automated tests are for.
The benefits of automated tests compound. Initially, clicking through your app by hand might be faster and cheaper than writing the test.
However, as your app grows, not investing in automated tests causes compounding debt.
- It gets harder to add tests.
- Manually going through your program's flow takes longer than a machine clicking through it.
- Machines have better memory than humans. Once a test is written, the machine doesn't forget the test case and prevents regressions.
- You pay a computer less per hour than an engineer.
- While the automation (CI / CD pipeline) runs, you can work on new features.
Automated tests give you strong evidence of program correctness (when your tests are good).
Isolated
Isolated tests can run in any order and in parallel. Whether one test passes or not should have no effect on whether another test passes. To achieve this, the state needs to be the same before and after each test.
If you change the order of the tests, they both fail, even though the code is the same. If the code was right before, the tests should still pass and if it was wrong before, the tests should still fail.
In these tests the order matters because they both share mutable state. Tests should have no shared state and they should be isolated from their environment (e.g. DOM, database, scope etc.).
Tests that depend on the order they run in point to a flaw in the code's design. Prefer to write code where you initialize the state locally (in the scope of the test). If your code is correct and must change the state, you should restore the initial state after each test.
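Here is a sketch of order-dependent tests next to an isolated alternative. The cart, addItem, and add names are made up for this illustration, and each test is reduced to a function that returns true when it passes.

```javascript
// Shared mutable state: both tests read and write the same array.
let cart = [];
const addItem = item => {
  cart.push(item);
  return cart.length;
};

const sharedFirst = () => addItem('apple') === 1; // passes only if it runs first
const sharedSecond = () => addItem('pear') === 2; // passes only if it runs second

// Isolated: each test creates the state it needs locally.
const add = (items, item) => [...items, item];

const isolatedFirst = () => add([], 'apple').length === 1;
const isolatedSecond = () => add(['apple'], 'pear').length === 2;

const run = tests => tests.map(test => test());

const sharedInOrder = run([sharedFirst, sharedSecond]);
cart = []; // restore the initial state before the next run
const sharedReversed = run([sharedSecond, sharedFirst]);
const isolatedReversed = run([isolatedSecond, isolatedFirst]);

console.log(sharedInOrder); // [ true, true ]
console.log(sharedReversed); // [ false, false ]
console.log(isolatedReversed); // [ true, true ]
```

The isolated tests pass in either order because nothing outlives the test that created it.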
Composable
"The confidence provided by one test should combine with the confidence provided by other tests." - Kent Beck
We ensure the correctness of inc and double and then ensure the correctness of the composed function incDouble. The calculation of inc is isolated from double and vice versa, and both are isolated from incDouble.
To make your tests composable, your code needs to be composable, too.
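A sketch of the three functions and their tests. The pipe helper and the inputs 41, 21 and 20 are my assumptions, and check is a dependency-free stand-in for RITEway's assert.

```javascript
const inc = n => n + 1;
const double = n => n * 2;

// Left-to-right function composition.
const pipe = (...fns) => x => fns.reduce((value, fn) => fn(value), x);

// Increment first, then double the result.
const incDouble = pipe(inc, double);

// Stand-in for RITEway's assert, not the real library.
const check = ({ given, should, actual, expected }) => ({
  description: `Given ${given}: should ${should}`,
  pass: actual === expected,
});

const results = [
  check({
    given: 'a number',
    should: 'increment it by 1',
    actual: inc(41),
    expected: 42,
  }),
  check({
    given: 'a number',
    should: 'return the number doubled',
    actual: double(21),
    expected: 42,
  }),
  check({
    given: 'a number',
    should: 'increment it by 1 and double the result',
    actual: incDouble(20),
    expected: 42,
  }),
];

results.forEach(({ description, pass }) =>
  console.log(`${pass ? '✔' : '✖'} ${description}`)
);
```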
Specific
A failing test should point you to the cause of failure.
If you break incDouble by changing the order in the function composition, your test points you to the failure.
incDouble
✖ Given a number: should increment it by 1 and double the result
------------------------------------------------------------------
operator: deepEqual
diff: "42" => "41"
source: ...
The stack trace (given by RITEway in the source key) tells us exactly which test failed and therefore which component broke.
If you restore incDouble and break inc by having it add 2 to your number, two tests break.
inc
✖ Given a number: should increment it by 1
--------------------------------------------
operator: deepEqual
diff: "42" => "43"
source: ...
double
✔ Given a number: should return the number doubled
incDouble
✖ Given a number: should increment it by 1 and double the result
------------------------------------------------------------------
operator: deepEqual
diff: "42" => "44"
source: ...
Using the stack trace you can find the test that points you to the cause of both errors. When you encounter multiple errors in your stack, the leaves are likely the source of the error.
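The numbers in the two outputs above can be reproduced with a short sketch. The pipe helper and the inputs 41, 21 and 20 are assumptions on my part, chosen because they match the diffs shown.

```javascript
const pipe = (...fns) => x => fns.reduce((value, fn) => fn(value), x);

const inc = n => n + 1;
const double = n => n * 2;

// Break 1: wrong composition order (double first, then increment).
const incDoubleWrongOrder = pipe(double, inc);

// Break 2: inc adds 2 instead of 1, and incDouble is built from it.
const brokenInc = n => n + 2;
const incDoubleWithBrokenInc = pipe(brokenInc, double);

const firstBreak = { incDouble: incDoubleWrongOrder(20) };

const secondBreak = {
  inc: brokenInc(41),
  double: double(21),
  incDouble: incDoubleWithBrokenInc(20),
};

console.log(firstBreak); // { incDouble: 41 }
console.log(secondBreak); // { inc: 43, double: 42, incDouble: 44 }
```

In the second case double still returns 42, so the failing leaf, inc, is the place to start looking.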
Deterministic
Given the same code, tests should produce the same results.
A few years ago, I helped with a project that had a flaky test.
What made this test flaky was the side effect in getShouldUpdateToken. The selector creates a new Date object. Occasionally, this would happen a millisecond after the test setup and the test would fail.
You can make this test deterministic by making the unit under test a pure function and explicitly passing the arguments to the function in the test.
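I can't share the original code, so the following is only a sketch of the idea. The expiresAt and now parameters are hypothetical; the text only tells us that getShouldUpdateToken created a new Date internally.

```javascript
// Impure: reads the clock itself, so the result depends on when it runs.
const getShouldUpdateTokenImpure = ({ expiresAt }) =>
  new Date().getTime() >= expiresAt;

// Pure: the current time is an explicit argument (hypothetical signature).
const getShouldUpdateToken = ({ expiresAt, now }) => now >= expiresAt;

// The test controls both timestamps, so it cannot drift by a millisecond.
const expiresAt = Date.UTC(2024, 0, 1);

const results = {
  beforeExpiry: getShouldUpdateToken({ expiresAt, now: expiresAt - 1 }),
  atExpiry: getShouldUpdateToken({ expiresAt, now: expiresAt }),
};

console.log(results); // { beforeExpiry: false, atExpiry: true }
```

The caller in production code passes Date.now() for now, which keeps the side effect at the edge of the program.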
Tests shouldn't be flaky (even though some test frameworks accommodate unreliable tests). They should be independent of random number generation, times, dates, locations, language settings, operating systems etc.
BTW, the example that Kent and Kelly give in their video in Ruby excellently visualizes the problem for E2E tests.
Predictive
Passing tests should give you confidence to deploy to production.
Some environments and conditions are only created in production environments. Sometimes you can't know a condition is going to exist and even if you know about it, it's hard or impossible to replicate it.
If you find an error in production, you can write a test for it when you fix it and prevent the bug from resurfacing.
100% predictability can rarely be achieved in the real world.
To maximize predictability, aim for 100% case coverage instead of 100% code coverage by using unit tests and integration tests. You can have 100% of your code covered by unit tests and still ship a broken application.
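A tiny illustration of the gap between code coverage and case coverage. The average function is a made-up example: a single test executes every line of it, yet one input case is still broken.

```javascript
// Hypothetical unit under test.
const average = numbers =>
  numbers.reduce((sum, n) => sum + n, 0) / numbers.length;

// This one case executes the whole function: 100% code coverage.
const coveredCase = average([2, 4, 6]);

// An uncovered case: an empty list divides 0 by 0.
const missedCase = average([]);

console.log(coveredCase); // 4
console.log(missedCase); // NaN
```

A coverage report would show this function as fully covered. Only a test for the empty list reveals the problem.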
Inspiring
Tests should give you that feeling of confidence.
Tests should make a difference to team productivity. Your team should be faster and merge more often. Good tests serve as documentation, help with onboarding and aid in understanding code that someone else wrote. (That someone else might be your past you.) They should allow you to increase your deploy frequency while decreasing your deploy risk. Tests make you relax. Tests allow you to rewrite or delete code.
"And finally, tests are a way to liberate energy, to move from worry to hope, creativity, and empathy, to demonstrate trustworthiness.
Passing tests should feel good." - Kent Beck