-
Unit test is a good negative indicator, but it's a bad positive indicator:
- If you find that code is hard to unit test, it’s a strong sign that the code needs improvement. (code is tight coupling).
- The fact that you can easily unit test your code base doesn’t necessarily mean it’s of good quality. The project can be a disaster even when it exhibits a high degree of decoupling
The difference in growth dynamics between projects with good and bad tests. A project with badly written tests exhibits the properties of a project with good tests at the beginning, but it eventually falls into the stagnation phase.
-
You can’t achieve the goal of unit testing by just throwing more tests at the project. You need to consider both the test’s value and its upkeep cost. The cost component is determined by the amount of time spent on various activities:
- Refactoring the test when you refactor the underlying code
- Running the test on each code change
- Dealing with false alarms raised by the test
- Spending time reading the test when you’re trying to understand how the underlying code behaves
-
Code is a liability, not an asset. The more code you introduce, the more you extend the surface area for potential bugs in your software, and the higher the project’s upkeep cost. It’s always better to solve problems with as little code as possible. And Tests are code, too. You should view them as the part of your code base that aims at solving a particular problem: ensuring the application’s correctness. Unit tests, just like any other code, are also vulnerable to bugs and require maintenance.
It’s the same situation as with the ability to unit test the code: coverage metrics are a good negative indicator but a bad positive one.
If a metric shows that there’s too little coverage in your code base—say, only 10%—that’s a good indication that you are not testing enough. But the reverse isn’t true: even 100% coverage isn’t a guarantee that you have a good-quality test suite. A test suite that provides high coverage can still be of poor quality.
This metric shows the ratio of the number of code lines executed by at least one test and the total number of lines in the production code base.
Example:
def is_long_string(str) # 1
if str.length > 10 # 1
return true #2
end# 1
return false # 1
end # 1
# unit test
assert_equal false, is_long_string('abcde')
1 Covered by the test
2 Not covered by the test
This gives us 4/5 = 0.8 = 80% code coverage.
If I refactor code like this:
def long_string?(str)
str.length > 10 ? true : false
end
This gives us 100% code coverage. But the test still verifies the same number of possible outcomes.
As you see, i didn't improve the test suite with this refactoring.
Branch coverage provides more precise results than code coverage because it helps cope with code coverage’s shortcomings. Instead of using the raw number of code lines, this metric focuses on control structures, such as if and switch statements.
def long_string?(str)
str.length > 10 ? true : false
end
There are 2 branches in above example: one for the situation when the length of the string argument is greater than five characters, and the other one when it’s not. The test covers only one of these branches, so the branch coverage metric is 1/2 = 0.5 = 50%
Although the branch coverage metric yields better results than code coverage, you still can’t rely on either of them to determine the quality of your test suite, for two reasons:
- You can’t guarantee that the test verifies all the possible outcomes of the system under test.
- No coverage metric can take into account code paths in external libraries.
- Example for (1)
class Example
def initialize
@str = ''
@is_only_number = false
end
def check_str(str)
is_number = /\d+/.match(str) ? true : false
@is_only_number = is_number
is_number
end
end
# a bad test code:
test 'check_str' do
example= Example.new
assert_equal true, check_str('12323')
assert_equal false, check_str('abc12a')
end
The check_str
method now has two outcomes: an explicit one, which is encoded by the return value; and an implicit one, which is the new value of the property. And in spite of not verifying the second, implicit outcome, the coverage metrics would still show the same results: 100% for the code coverage and 100% for the branch coverage. As you can see, the coverage metrics don’t guarantee that the underlying code is tested, only that it has been executed at some point. If you remove @is_only_number = is_number
line, the test code still pass.
- Example for (2) - No coverage metric can take into account code paths in external libraries
# bert is 3rd party libary
class Bert
def self.encode(obj)
case obj
when Symbol then "symbol"
when String then "string"
else
fail(obj)
end
end
end
# code on our application
def encode(val)
Bert.encode(val)
end
# test code
test "encode should return class name of value" do
assert_equal 'string', encode("abc")
end
The branch coverage metric shows 100%, and the test verifies all components of the method’s outcome.
It has a single such component anyway—the return value. At the same time, this test is nowhere near being exhaustive.
It doesn’t take into account the code paths the bert
libray’s encode
method may go through.
The Bert.encode
has plenty of branches that are hidden from the test and that might lead to different results.
Here are just a few possible arguments:
null
valueinterger
value ....
This is not to say that coverage metrics should take into account code paths in external libraries (they shouldn’t), but rather to show you that you can’t rely on those metrics to see how good or bad your unit tests are. Coverage metrics can’t possibly tell whether your tests are exhaustive; nor can they say if you have enough tests.
An extreme Example:
class Example
def initialize
@str = ''
@is_only_number = false
end
def check_str(str)
is_number = /\d+/.match(str) ? true : false
@is_only_number = is_number
is_number
end
end
test 'check_str' do
example = Example.new
result = check_str('12323')
result2 = check_str('abc12a')
end
This test has both code and branch coverage metrics showing 100%. But at the same time, it is completely useless because it doesn’t verify anything.
A story from the author of the book
- Protection against regressions
- Resistance to refactoring
- Fast feedback
- Maintainability
-
Note that it’s not only the amount of code that matters, but also its complexity and domain significance. Code that represents complex business logic is more important than boilerplate code—bugs in business-critical functionality are the most damaging.
-
Furthermore, in addition to your code, the code you didn’t write also counts: for example, libraries, frameworks, and any external systems used in the project. That code influences the working of your software almost as much as your own code. For the best protection, the test must include those libraries, frameworks, and external systems in the testing scope, in order to check that the assumptions your software makes about these dependencies are correct.
The second attribute of a good unit test is resistance to refactoring—the degree to which a test can sustain a refactoring of the underlying application code without turning red (failing).
This situation is called a false positive. A false positive is a false alarm. It’s a result indicating that the test fails, although in reality, the functionality it covers works as intended.
To evaluate how well a test scores on the metric of resisting to refactoring, you need to look at how many false positives the test generates. The fewer, the better.
Example:
def take_x_character(str, x)
str.slice(0, x)
end
# a bad test code
test "take_x_character" do
a_str = 'aaaa'
a_str.expects(:slice).with(0, 3).returns("cds").once
assert_equal "cds", take_x_character(a_str, 3)
end
after 3 days, you decide to refactor your code as following. And the test failed.
def take_x_character(str, x)
str[0..(x-1)]
end