Current Approaches to Flaky Testing

The hurdles of flaky software tests and the top insights we gathered on how teams are approaching flaky test mitigation more effectively.

Key Takeaways

  • Flaky tests plague developers and development teams, and industry leaders are taking different approaches to handling them within their organizations.

  • Developers need a tool that measures flakiness and its potential impact.

  • How DevOps teams are approaching flaky test mitigation varies; different options are emerging to help troubleshoot such problems.

In our recent webinar, The State of the Art in tackling Flaky Tests, we discussed the flaky tests that plague developers and development teams, and the different approaches industry leaders are taking to handle them within their organizations. Here’s our summary of the common hurdles flaky software tests present, their causes, and the top insights we gathered on how teams are approaching flaky test mitigation more effectively.

Watch the on-demand webinar to dive further into how software teams around the world are tackling this problem.

Flaky Tests: The CI/CD Pipeline Challenge

The move to containers and microservices makes it easier for organizations to build and enhance enterprise applications. Rather than work with one large monolithic block of code, developers link small pieces of functionality together in a freeform fashion and then rely on automated testing to determine how well the code is written. 

The shift to CI/CD was designed to speed up development, support continuous integration, and improve software quality. But with continuous integration comes a need to address flaky test results faster and more efficiently in order to keep up with cycle times.

Continuous Integration Breeds Noise

Flakiness is a common product of today’s CI environment. In many cases, DevOps teams are under intense pressure to continuously deliver code as swiftly as possible.

One consequence of CI is the noise generated by flaky tests, which slows down the software delivery pipeline. The strain on timelines, compounded by mistrust of test results, makes flaky tests a drain on developer resources. Flaky tests result either from tests being written incorrectly or from the environment in which they run. The fewer variables within your test suite, the lower the probability of flakiness. Unit and component tests are inherently less likely to be flaky than API, UI, or regression tests. Shifting testing left means testing sooner, which helps reduce the likelihood of flaky tests, but ineffectively written tests can still cause inconclusive failures.

Because this problem is so prevalent, we looked at common approaches to handling flaky tests. From sprint planning methods, to running a series of automatic retries, to using tools that push code scanning results into a database for better visibility of trends, development teams are trying out ways to automate and observe tests that indicate flakiness.
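To make the automatic-retry idea concrete, here is a minimal sketch in Python. It assumes a test is simply a callable that raises an exception on failure. The names classify_with_retries and make_intermittent are hypothetical and chosen for illustration; they are not taken from any particular test framework.

```python
from typing import Callable


def classify_with_retries(test: Callable[[], None], max_attempts: int = 3) -> str:
    """Run `test` up to `max_attempts` times and label the outcome.

    Returns "pass" if the first attempt succeeds, "flaky" if a later
    attempt succeeds after at least one failure, and "fail" if every
    attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            test()
        except Exception:
            continue  # this attempt failed; retry if attempts remain
        return "pass" if attempt == 1 else "flaky"
    return "fail"


def make_intermittent(failures_before_pass: int) -> Callable[[], None]:
    """Build a stand-in test that fails a fixed number of times, then passes."""
    remaining = {"n": failures_before_pass}

    def test() -> None:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise AssertionError("simulated intermittent failure")

    return test


if __name__ == "__main__":
    print(classify_with_retries(make_intermittent(0)))  # stable test
    print(classify_with_retries(make_intermittent(1)))  # passes on the second try
    print(classify_with_retries(make_intermittent(5)))  # never passes within 3 attempts
```

A retry hides the noise but not the cause, so it is worth recording every "flaky" label somewhere durable so that trends stay visible over time.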

Industry Leader Approaches To Flaky Tests

All development teams face flaky tests, and world-leading organizations are trying out their own approaches to the problem. Google estimates that more than 1 in 7 of the tests written by its engineers show some flakiness. How DevOps teams are approaching flaky test mitigation varies; different options are emerging to help troubleshoot such problems.

  • Spotify developed a mathematical model that correlates flakiness with other failures.

Image Source: Spotify R&D Engineering

  • Dropbox triages testing data to determine what is going on, and uses simple automation to quarantine flaky tests aggressively.

Image Source: Dropbox.tech
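
A quarantine policy of the kind mentioned above can be sketched as follows. This is an illustrative Python sketch, not Dropbox's implementation: the Quarantine class, its threshold, and its method names are all assumptions. The idea is that a test observed to be flaky often enough is moved to a quarantine set, where it still runs but no longer blocks the build.

```python
from dataclasses import dataclass, field


@dataclass
class Quarantine:
    """Tracks tests whose failures should not block the build (hypothetical policy)."""

    threshold: int = 2  # assumed number of flaky incidents before quarantine
    incidents: dict[str, int] = field(default_factory=dict)
    quarantined: set[str] = field(default_factory=set)

    def record_flaky(self, test: str) -> None:
        """Count one flaky incident and quarantine the test at the threshold."""
        self.incidents[test] = self.incidents.get(test, 0) + 1
        if self.incidents[test] >= self.threshold:
            self.quarantined.add(test)

    def build_passes(self, failed_tests: set[str]) -> bool:
        """A build passes when every failing test is already quarantined."""
        return set(failed_tests) <= self.quarantined


if __name__ == "__main__":
    q = Quarantine(threshold=2)
    q.record_flaky("test_upload")
    print(q.build_passes({"test_upload"}))  # not yet quarantined, so the build is blocked
    q.record_flaky("test_upload")
    print(q.build_passes({"test_upload"}))  # quarantined, so the failure is ignored
```

A lower threshold makes the quarantine more aggressive: builds stay green, but real regressions can hide inside quarantined tests until someone triages them.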

Another approach to identifying flaky tests more effectively is mapping the memory consumption of the program against the tests, which lets teams correlate a high level of flakiness with the tests that occupy a large amount of memory.

Flaky test impact varies dramatically. GitHub found that 24% of tests exhibit flakiness, which is higher than Google’s estimate. But many of those tests had only one minor problematic item, and a mere 0.4% of the tests exhibited a great deal of flakiness, with more than 100 incidents. Knowing how flaky an application's tests are helps a company determine how much effort to put into identifying whether a test failed because of a true error or whether the results are just noise.
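
One way to put numbers on flakiness is to count, for each test, how many code revisions it both passed and failed on. The Python sketch below assumes that definition of a flaky incident. The Run record, the function names, and the sample data are hypothetical and not drawn from GitHub's or Google's tooling.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Run(NamedTuple):
    test: str     # test identifier
    commit: str   # code revision the test ran against
    passed: bool  # outcome of this run


def flaky_incidents(runs: Iterable[Run]) -> dict[str, int]:
    """Count, per test, the commits on which it both passed and failed."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test, run.commit)].add(run.passed)

    counts: dict[str, int] = defaultdict(int)
    for (test, _commit), seen in outcomes.items():
        counts[test] += int(len(seen) == 2)  # both True and False were observed
    return dict(counts)


def flaky_share(counts: dict[str, int], min_incidents: int = 1) -> float:
    """Fraction of tests with at least `min_incidents` flaky incidents."""
    if not counts:
        return 0.0
    return sum(n >= min_incidents for n in counts.values()) / len(counts)


if __name__ == "__main__":
    history = [
        Run("test_login", "a1", True),
        Run("test_login", "a1", False),  # same commit, different outcome: flaky
        Run("test_login", "b2", True),
        Run("test_cart", "a1", False),
        Run("test_cart", "a1", False),   # consistent failure: not flaky
        Run("test_cart", "b2", True),
    ]
    counts = flaky_incidents(history)
    print(counts)
    print(flaky_share(counts))
```

Raising min_incidents separates the tests that flaked once from the small group that flake constantly, which is the distinction the figures above draw.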

In sum, flakiness is an emerging DevOps problem, one that slows application development down. Developers need a tool that measures flakiness and its potential impact. With it, DevOps teams can test more intelligently, better understand just how well application code is designed, identify problem areas, escalate them as necessary, and create solid rather than flaky code.