Measuring Software Test Failures Tutorial

Understand how to measure software test failures so you can improve the quality of your software and the speed at which you ship new code.

Key Takeaways

  • Measuring software test failures is crucial for improving software quality and delivery speed, with the Test Failure Ratio (defect density) serving as a valuable metric for evaluating effectiveness and identifying areas for improvement.

  • Addressing common software test failure questions involves managing flaky tests, prioritizing tests based on their impact and historical data, and optimizing test suites to enhance performance.

  • Monitoring test suite run times, setting acceptable thresholds, and tracking testing frequency are essential for maintaining software quality and efficient delivery.

  • Recognizing the increasing failure rate of test suites and investigating the underlying causes helps in maintaining and enhancing software quality.

  • Launchable's Intelligent Test Failure Diagnostics streamlines the identification of crucial issues within failing tests, simplifying QA engineers' analysis.

You’ll never really learn from the past if you don’t look at it, and the same goes for your software testing. And while you can go deep into analyzing your test results, it’s much easier for you and your team to start by keeping an eye on your test failure ratio and to dive deeper only when needed.

Rather than taking test failures as a roadblock, let’s explore how you can better measure software test failures to increase delivery speed without sacrificing quality.

Measuring Software Test Failures

Test failure ratio, also known as the defect density or defect ratio, is a metric used to measure software quality. This metric is an excellent way to assess how effective your testing efforts are and even spot areas for improvement. It’s a pretty simple formula to figure out:

💡 Test Failure Ratio = Number of Defects found ÷ Size of the Software

For most teams, the formula only needs two data points:

  • Number of defects: Simply count up the number of defects you’ve found within the piece of software you’re trying to measure.

  • Size of the software: Most teams measure this in thousands of lines of code (KLOC), but you can define it however you’d like as long as it accurately represents the size and complexity of your software.

Let’s say we’re testing a component with about 30,000 lines of code and find 15 bugs: (15 ÷ 30,000) × 1,000 gives a ratio of 0.5 defects per 1,000 lines of code, or 0.0005 per line. The higher the number, the higher the concentration of defects in your software and the greater the impact on overall quality. A lower number means fewer defects relative to the software’s size, which points to a higher-quality product.
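
If you want to track this over time, the calculation is trivial to script. Here’s a minimal Python sketch using the hypothetical numbers from the example above:

```python
def defect_density(defects_found: int, lines_of_code: int) -> float:
    """Return the number of defects per 1,000 lines of code (KLOC)."""
    return defects_found / lines_of_code * 1000

# Hypothetical component from the example: 15 bugs found in ~30,000 lines of code
print(defect_density(15, 30_000))  # 0.5 defects per KLOC
```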

It’s important to note that no single metric can provide a clear picture of software quality. A multitude of other factors, such as defect severity, your testing process, and the context of the software itself, all factor into overall software quality. Your acceptable or expected defect density can also vary based on the type of software, project requirements, and industry standards.

How to Measure and Answer Common Software Test Failure Questions

Knowing the defect ratio for your software is a great way to get a peek at your software quality, but on its own it can’t paint the full picture of test suite health and failure insights.

But there are some steps your team can take to answer common software test failure questions in the pursuit of quality.

Are Your Tests Flaky?

A flaky test is an automated test with inconsistent results — passing or failing without any changes to the codebase. Flaky tests are harmful to the quality of your releases and the morale of your team. If you think some of your tests may be flaky:

  1. Keep track of the historical pass/fail status of each test.

  2. Calculate the “flakiness rate” by dividing the number of times the test has failed by the number of times the test has been run (see the sketch below).

If you suspect that some of your tests are consistently flaky, it’s probably time to take action. Your teams should investigate what’s causing the inconsistent results, whether it’s poor test isolation, timing issues, or resource problems.
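
For step 2, a minimal sketch of the flakiness-rate calculation might look like the following, assuming you’ve recorded each test’s historical results as a list of pass/fail booleans (the sample history is made up):

```python
from typing import List

def flakiness_rate(results: List[bool]) -> float:
    """Ratio of failed runs to total runs for a single test (True = pass)."""
    if not results:
        return 0.0
    return results.count(False) / len(results)

# Hypothetical history: the same test passing and failing with no code changes
history = [True, False, True, True, False, True, True, True, False, True]
print(f"Flakiness rate: {flakiness_rate(history):.0%}")  # 30%
```

A test that fails a meaningful share of its runs with no code changes is a strong candidate for quarantining or rewriting.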

What Tests Should Be Prioritized?

You likely have a huge swath of tests available to run, but how do you know which ones should run? Prioritizing tests keeps your testing effort focused on critical areas. To get a good baseline on which tests to prioritize, consider:

  • Categorize your tests by impact, splitting them into critical, high, medium, and low impact tests.

  • Use code coverage metrics to identify areas that aren’t being tested or aren’t tested often enough. Two useful metrics are branch coverage (the number of executed branches divided by the total number of branches, multiplied by 100) and path coverage (testing all possible execution paths, including different branches and loops).

  • Gather and analyze historical defect data to spot the areas of your codebase with the most defects and ensure they’re adequately covered.

Ideally, you should prioritize tests that cover the most critical parts of your codebase and those with a troubled history of defects; one simple way to combine those signals is sketched below. That way, you’ll always get the most out of your testing process without wasting time and resources.
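
Putting those signals together, here’s a rough Python sketch of one way to rank tests. The impact weights and sample data are invented for illustration; a real prioritization scheme would also fold in coverage metrics and runtime:

```python
# Hypothetical impact weights -- tune these with your team
IMPACT_WEIGHT = {"critical": 4, "high": 3, "medium": 2, "low": 1}

def priority_score(impact: str, historical_defects: int) -> int:
    """Simple score: impact weight scaled by the number of past defects."""
    return IMPACT_WEIGHT[impact] * (1 + historical_defects)

# Hypothetical test metadata: (name, impact category, defects found historically)
tests = [
    ("test_checkout_flow", "critical", 5),
    ("test_profile_avatar", "low", 0),
    ("test_payment_gateway", "high", 3),
]

for name, impact, defects in sorted(
    tests, key=lambda t: priority_score(t[1], t[2]), reverse=True
):
    print(f"{name}: priority score {priority_score(impact, defects)}")
```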

Are Test Suites Slowing Down?

Testing is a slow process, and while it can be sped up in some ways, it’ll always be time-consuming. Test suites, in particular, can impact development speed, but there are a few ways to speed them up:

  • Track time - Record how long each test suite takes to run. For example, you can use Python’s built-in time module to time a unittest suite (the test cases below are placeholders for your real tests):

```python
import time
import unittest

# Placeholder test cases -- replace these with your real tests
class YourTestCase1(unittest.TestCase):
    def runTest(self):
        self.assertTrue(True)

class YourTestCase2(unittest.TestCase):
    def runTest(self):
        self.assertTrue(True)

class YourTestSuite(unittest.TestSuite):
    def __init__(self):
        super().__init__()
        self.addTest(YourTestCase1())
        self.addTest(YourTestCase2())
        # Add more test cases as needed

if __name__ == "__main__":
    start_time = time.time()

    # Create an instance of your test suite
    test_suite = YourTestSuite()

    # Create a test runner and run the test suite
    test_runner = unittest.TextTestRunner()
    result = test_runner.run(test_suite)

    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Time taken to run the test suite: {elapsed_time:.2f} seconds")

    # Exit non-zero if there were any test failures or errors
    if not result.wasSuccessful():
        exit(1)
```

  • Set thresholds - Agree with your team on an acceptable run time for a test suite. Anything over that threshold should be analyzed and improved (or split into new test suites); a sketch of enforcing such a threshold follows below.

  • Monitor trends - Keeping an eye on your test suites can point out areas for improvement, especially if tests start taking longer than usual as your codebase grows.

If your test suites are slowing down (or are already slow to begin with), it may be time to start optimizing them for better performance. You can consider parallel testing, removing (or improving) slow and redundant tests, and ensuring that resources are used efficiently in your testing process.
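
If you want the time threshold to be enforced automatically, a small script in your CI pipeline can fail the build when the suite runs long. This is only a sketch: it assumes a pytest-based suite and an arbitrary 120-second budget, so adjust both to match how your project actually runs its tests:

```python
import subprocess
import sys
import time

# Hypothetical threshold agreed on with your team -- adjust to your suite
MAX_SUITE_SECONDS = 120

start = time.time()
# Run the suite however your project normally does (pytest used as an example)
result = subprocess.run([sys.executable, "-m", "pytest", "-q"])
elapsed = time.time() - start

print(f"Suite finished in {elapsed:.1f}s (budget: {MAX_SUITE_SECONDS}s)")
if result.returncode != 0 or elapsed > MAX_SUITE_SECONDS:
    sys.exit(1)  # Fail the build on test failures or a blown time budget
```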

Are Test Suite Runs Decreasing?

Your teams may be running test suites less often, which can be a sign that your testing coverage is shrinking over time. This could be due to any number of reasons, such as changes in your development processes or resource constraints. To combat this, there are two easy steps:

  1. Monitor how often teams are running test suites.

  2. Compare the coverage of recent test suite runs to historical data.

With that data in hand, you can investigate the underlying reasons test suites are being run less often (development practices, resource constraints, or other factors) and, hopefully, course-correct going forward for better test coverage.
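
As a rough illustration of step 1, assuming you can pull a timestamp for each suite run from your CI system, a few lines of Python can show whether runs are trending down (the dates below are made up):

```python
from collections import Counter
from datetime import date

# Hypothetical log of test suite run dates, e.g., exported from your CI system
run_dates = [
    date(2024, 5, 6), date(2024, 5, 7), date(2024, 5, 9),
    date(2024, 5, 14), date(2024, 5, 21),
]

# Count runs per ISO week to spot a downward trend
runs_per_week = Counter(d.isocalendar()[1] for d in run_dates)
for week, count in sorted(runs_per_week.items()):
    print(f"Week {week}: {count} run(s)")
```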

Are Test Suites Failing More Often?

Nobody likes it when tests fail, but it can be a noticeable symptom of declining software quality if your test suites are failing more often than usual. If you want to dive in and spot why this may be happening, you should:

  • Track the number of test failures over time.

  • Monitor the failure rate as a ratio of failed tests to total tests executed.

If your tests are failing more often than usual, it’s time to investigate why. It could be code changes breaking existing tests, integration issues, environment problems, or something else entirely. By putting a spotlight on these issues, you can flush them out and get back to maintaining and improving the quality of your software.
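
A lightweight way to watch the failure rate over time, assuming you record the number of failed and total tests for each suite run, might look like this (the counts are illustrative, and the 1.5× alert threshold is arbitrary):

```python
# Hypothetical (failed, total) counts for recent test suite runs, oldest first
runs = [(2, 400), (3, 400), (2, 410), (9, 415), (12, 420)]

failure_rates = [failed / total for failed, total in runs]
baseline = sum(failure_rates[:-1]) / len(failure_rates[:-1])
latest = failure_rates[-1]

print(f"Baseline failure rate: {baseline:.2%}, latest: {latest:.2%}")
if latest > baseline * 1.5:  # Arbitrary alert threshold
    print("Failure rate is trending up -- time to investigate recent changes.")
```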

Measure Software Test Failures and Triage Issues with Launchable

Dealing with failing tests involves more than spotting errors: distinguishing the critical failures among countless logs and understanding their history demands thorough analysis from QA engineers. Launchable’s Intelligent Test Failure Diagnostics solves traditional bug triage bottlenecks by finding and focusing on what truly matters.

We help teams identify and eliminate flaky tests by tracking the data your tests output. Each test gets its own score to show you how much of an impact it makes on your testing process.

Plus, we track how long your tests take and how often they’re run. Paired with our Predictive Test Selection, we show you which tests should be running and which ones are slowing down your teams. Measure and analyze your test failures with Launchable to increase quality and speed up your test cycles.