13 times E2E testing could have saved the day

John Gluck
September 13, 2024

Let’s be real: on the internet, every day is Friday the 13th. Which makes this the perfect day to talk about all the times things went spectacularly wrong: rockets blowing up, websites crashing at the worst possible moments, and stock prices going haywire. While it’s easy to chalk these incidents up to bad luck, the reality is that many of these failures are the result of software defects.

Still, bad luck can and does play a part in these catastrophes, so knock on wood, cross your fingers, throw salt over your shoulder, and get ready for tales of misfortune smiling upon the unprepared.

13. Free money! The Bank of Ireland withdrawal glitch (2023)

It was quite the scene on the night of August 16 in Ireland—imagine people rushing to the nearest ATM like it was Black Friday for free money!

Thanks to a software glitch, customers could transfer up to €1,000 even with a negative balance, double the €500 limit that normally applies to customers who actually have the funds. But the following day, the fun was over, and customers woke up to the most predictable plot twist ever: the bank knew. Surprise! Anyone who had chased the free money was now overdrawn and had to pay it back.

The Bank of Ireland did not escape unscathed. It gave customers six months to return the money, interest-free. So, yeah, they basically handed out a bunch of unintentional, interest-free loans. And let’s not forget the PR mess.

If this had just been a one-off hiccup, that would be understandable. But, nope, this was just the latest in a long series of tech fails as the bank tried to move customers to digital banking. They apparently forgot the whole “making sure things actually work” part and likely skipped the good stuff: E2E integration testing, failure scenario simulations, real-time transaction synchronization testing, and stress testing of backup systems.
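
To make that concrete, here’s a minimal sketch in TypeScript of the kind of failure-scenario check that would have caught the glitch. The `requestTransfer` rule and account shape are hypothetical stand-ins, not the bank’s real API; the point is simply that a transfer exceeding the daily limit, or the customer’s available balance, gets rejected no matter what state the backend is in.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical, simplified transfer rule -- not Bank of Ireland's real logic.
interface Account {
  balance: number;
  dailyLimit: number;
}

function requestTransfer(account: Account, amount: number): "approved" | "rejected" {
  if (amount <= 0) return "rejected";
  if (amount > account.dailyLimit) return "rejected";
  if (amount > account.balance) return "rejected"; // no accidental overdrafts via this path
  return "approved";
}

// Failure-scenario checks: exactly the cases the glitch let through.
const overdrawn: Account = { balance: -42, dailyLimit: 500 };
assert.equal(requestTransfer(overdrawn, 1000), "rejected"); // negative balance, over the limit
assert.equal(requestTransfer(overdrawn, 100), "rejected");  // negative balance, under the limit

const healthy: Account = { balance: 2000, dailyLimit: 500 };
assert.equal(requestTransfer(healthy, 400), "approved");    // the normal case still works

console.log("transfer-limit checks passed");
```

In a real E2E suite, those assertions would run against the banking app’s UI or API while the core ledger is deliberately degraded, not against a toy function, but the invariant being tested is the same.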

12. A royal mess: Knight Capital (2012)

In 2012, Knight Capital turned a critical software update into a disaster. The culprit was a piece of old code that should’ve been retired but got reactivated during an update. Once live, this rogue code went haywire for 45 minutes, buying high and selling low.

By the time they managed to stop the frenzy, Knight Capital had lost a king’s ransom—a whopping $440 million.

What went wrong was a combination of oversights. Knight Capital ran E2E tests on seven servers but missed the eighth, where the old, unused code was reactivated unexpectedly. They also didn’t conduct failure mode testing to simulate scenarios where legacy code might accidentally be re-enabled. To make matters worse, there was no rollback plan in place, so once the faulty trades started, they had no way to quickly undo the damage.
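
To picture what that missing failure-mode check could look like, here’s a hedged sketch in TypeScript. The fleet snapshot, build strings, and `legacyOrderRouterEnabled` flag are all made up for illustration; the idea is a pre-deployment gate that refuses to go live while any server is running a stale build or has retired code switched on.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical fleet snapshot -- in reality this would come from your deployment tooling.
interface ServerConfig {
  host: string;
  build: string;
  legacyOrderRouterEnabled: boolean;
}

function verifyFleet(fleet: ServerConfig[], expectedBuild: string): string[] {
  const problems: string[] = [];
  for (const server of fleet) {
    if (server.build !== expectedBuild) {
      problems.push(`${server.host} is running stale build ${server.build}`);
    }
    if (server.legacyOrderRouterEnabled) {
      problems.push(`${server.host} has retired legacy code enabled`);
    }
  }
  return problems;
}

const fleet: ServerConfig[] = [
  { host: "trade-01", build: "8.2.0", legacyOrderRouterEnabled: false },
  // ...trade-02 through trade-07 omitted...
  { host: "trade-08", build: "7.4.1", legacyOrderRouterEnabled: true }, // the one that got missed
];

const problems = verifyFleet(fleet, "8.2.0");
assert.equal(problems.length, 0, `Deployment blocked:\n${problems.join("\n")}`);
```

Run against this made-up snapshot, the final assertion fails on trade-08 and blocks the rollout, which is the whole point.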

Ultimately, Knight Capital’s failure to properly test and remove old code left them with a king-sized headache—and an empty treasury to match.

11. When one typo took the internet down: AWS S3 outage (2017)

In 2017, the internet hit pause. This time, it wasn’t Kim Kardashian—it was AWS (Amazon Web Services) causing the chaos, thanks to one little typo. An AWS engineer meant to take down a few servers for maintenance but accidentally pulled the plug on far more. Suddenly, Amazon's Simple Storage Service (S3) wasn’t so simple anymore.

This wasn’t a minor inconvenience. For a few hours, Trello, Slack, Quora, and plenty of other places you pretend to “work” were offline. Businesses panicked, memes flooded in, and engineers everywhere started triple-checking their spelling.

Testing can’t prevent typos, but better safeguards could’ve stopped this from becoming a disaster. AWS didn’t plan on someone hitting the wrong key. No one asked, “What if we accidentally take down a big chunk of the internet?” Without a rollback mechanism, the AWS team had to manually clean up the mess. And stress-testing for unexpected shutdowns—like half the internet going dark—would have shown just how bad the blast radius could get.
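
One common class of safeguard here is a sanity check on the blast radius of an operational command before it runs. The sketch below is hypothetical tooling, not AWS’s actual playbook, but it shows the shape of the idea: a capacity-removal request that would leave too few servers, or take out a suspiciously large slice at once, gets refused.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical guard in front of a capacity-removal command.
interface RemovalRequest {
  totalServers: number;
  serversToRemove: number;
  minServers: number;
}

function validateRemoval(req: RemovalRequest): void {
  const remaining = req.totalServers - req.serversToRemove;

  // Never drop below the minimum needed to serve traffic...
  assert.ok(
    remaining >= req.minServers,
    `Refusing: only ${remaining} servers would remain (minimum is ${req.minServers})`
  );

  // ...and treat any single command that removes more than 10% of capacity as suspicious.
  assert.ok(
    req.serversToRemove / req.totalServers <= 0.1,
    "Refusing: removing more than 10% of capacity requires a second approval"
  );
}

validateRemoval({ totalServers: 1000, serversToRemove: 4, minServers: 800 });   // fine
validateRemoval({ totalServers: 1000, serversToRemove: 400, minServers: 800 }); // throws -- fat-fingered
```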

10. A prescription for chaos: Healthcare.gov launch (2013)

When Healthcare.gov launched in 2013, it crashed—hard. The site couldn’t handle the flood of users, and by the end of the first day, only six people had successfully signed up. Load testing wasn’t a priority, and the team severely underestimated the traffic to the site, which buckled under the pressure.

But it gets worse. The backend systems, which were supposed to verify insurance info and connect with agencies, weren’t cooperating. Without proper integration testing, these systems were like mismatched puzzle pieces. An E2E testing strategy could have simulated real-world traffic and caught the integration issues early, but instead, users were left refreshing their browsers in frustration.
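
Even a crude load test run before launch would have made the problem impossible to miss. Here’s a minimal sketch in TypeScript (Node 18+, with a placeholder staging URL and arbitrary concurrency numbers) that hammers a signup endpoint with concurrent requests and fails loudly if the error rate climbs:

```typescript
// Minimal load-test sketch: fire N concurrent signups and report the failure rate.
// The URL, payload, and thresholds are placeholders, not the real launch traffic profile.
const TARGET = "https://staging.example.gov/api/enroll";
const CONCURRENT_USERS = 500;

async function attemptSignup(): Promise<boolean> {
  try {
    const res = await fetch(TARGET, { method: "POST", body: JSON.stringify({ plan: "test" }) });
    return res.ok;
  } catch {
    return false; // timeout, connection refused, DNS failure...
  }
}

async function main() {
  const results = await Promise.all(
    Array.from({ length: CONCURRENT_USERS }, () => attemptSignup())
  );
  const failures = results.filter((ok) => !ok).length;
  console.log(`${failures}/${CONCURRENT_USERS} signups failed`);
  if (failures / CONCURRENT_USERS > 0.01) {
    throw new Error("Error rate above 1% -- not ready for launch traffic");
  }
}

main().catch((err) => {
  console.error(err.message);
  process.exitCode = 1;
});
```

A real pre-launch suite would ramp traffic gradually and exercise the whole signup flow, including the backend verification calls, rather than a single endpoint, but even this blunt instrument would have flagged the capacity problem.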

The Healthcare.gov fiasco ended up costing the U.S. government and taxpayers an estimated $1.7 billion. The site was originally budgeted at $93 million, but costs skyrocketed thanks to technical failures, months of fixes, and the extra contractors brought in to stabilize it. The lack of adequate testing played a major role in this expensive disaster, which took months and extensive overhauls to untangle.

9. When one line of code hung up the nation: AT&T long-distance outage (1990)

In 1990, AT&T’s long-distance service went down for nine hours, leaving millions without phone access—and it all came down to a single faulty line of code. The bug caused the system's switches to miscommunicate, triggering a constant reboot cycle across the network. The resulting communications blackout cost the company tens of millions.

Integration testing would have helped, though it wasn’t as simple as flipping a switch. Testing such a massive, distributed system would have been complicated, requiring the simulation of interactions between thousands of switches spread across the country. The technology of the day was more limited than what we have now, but running these tests wouldn’t have been impossible.

By simulating failure scenarios, AT&T could have caught the problem before unleashing chaos on their customers. Verifying how the switches handled errors—like the one that caused this cascading failure—was crucial, and skipping it turned out to be a costly oversight.
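
As a toy illustration of that kind of failure-scenario simulation, here’s a drastically simplified model in TypeScript. It has nothing to do with the real 4ESS software; it just shows how a simulation can surface a cascade: switches that mishandle a status message end up rebooting each other across the whole network, which is exactly the outcome a full-system test should assert can never happen.

```typescript
// Toy cascading-failure model -- a stand-in for simulating thousands of real switches.
type State = "up" | "rebooting";

class Switch {
  state: State = "up";
  private recentMessages = 0;

  constructor(private network: Switch[]) {}

  receiveStatusMessage() {
    if (this.state === "rebooting") return;
    this.recentMessages++;
    // The (buggy) handler: a second status message arriving before the first is
    // fully processed corrupts internal state and forces a reboot.
    if (this.recentMessages >= 2) this.reboot();
  }

  reboot() {
    this.state = "rebooting";
    // Crash, reboot, and announce are collapsed into one step for brevity:
    // the switch tells every neighbor it is back, flooding them with messages.
    for (const other of this.network) {
      if (other !== this) other.receiveStatusMessage();
    }
  }
}

const network: Switch[] = [];
for (let i = 0; i < 100; i++) network.push(new Switch(network));

// Two switches hiccup at nearly the same time...
network[0].reboot();
network[1].reboot();

const affected = network.filter((s) => s.state === "rebooting").length;
console.log(`${affected} of ${network.length} switches ended up rebooting`);
// A simulation-based test would assert the opposite of what this model prints:
// a couple of flapping switches must never take down the entire fleet.
```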

8. You’re grounded: U.S. NOTAM system outage (2023)

In January 2023, the FAA managed to ground every single flight in the US—all thanks to a corrupted database file. A long-standing bug appeared during a routine maintenance procedure, causing the entire NOTAM (Notice to Air Missions) system to crash.
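
The missing test here is easy to sketch: regularly feed the system a deliberately corrupted or truncated data file and confirm it degrades gracefully instead of falling over. The loader below is a hypothetical stand-in (JSON instead of whatever format the FAA actually uses), but the shape of the check is the same.

```typescript
import { strict as assert } from "node:assert";

interface Notam {
  id: string;
  text: string;
}

// Hypothetical NOTAM-style loader: parse the incoming file, and fall back to the
// last known good copy instead of crashing when the file is corrupted.
function loadNotams(raw: string, lastKnownGood: Notam[]): Notam[] {
  try {
    const parsed = JSON.parse(raw);
    if (!Array.isArray(parsed)) throw new Error("not a list");
    return parsed as Notam[];
  } catch {
    console.warn("Corrupt NOTAM file detected -- serving last known good data");
    return lastKnownGood;
  }
}

const lastKnownGood: Notam[] = [{ id: "A0001/23", text: "RWY 09/27 CLOSED" }];

// Failure-scenario checks: corrupted or truncated files must never take the system down.
assert.deepEqual(loadNotams("\u0000\u0000garbage", lastKnownGood), lastKnownGood);
assert.deepEqual(loadNotams('[{"id":"A0002/23","text":"RW', lastKnownGood), lastKnownGood);
assert.equal(loadNotams('[{"id":"A0003/23","text":"OK"}]', lastKnownGood)[0].id, "A0003/23");

console.log("corrupted-file checks passed");
```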

Pilots rely on NOTAM for safety info before takeoff, and when that system goes dark, so do all runways. The FAA had no choice but to halt every flight for over 90 minutes, resulting in over 1,300 cancellations and 11,000 delays. The economic fallout was estimated at a $400 to $600 million hit to the airline industry. From grounded planes to disrupted business trips, the outage sent ripples across supply chains, travel plans, and pretty much every major airport.

In hindsight, it would’ve been much cheaper to prevent this mess. The FAA is now pouring $30 million into modernizing the system with better testing, redundancy, and everything they probably wished they'd had before the sky fell. If they'd made that investment earlier, they could’ve saved a small fortune and a whole lot of angry travelers.

7. When division didn’t conquer: Pentium FDIV Bug (1994)

In 1994, Intel ran into a major problem when its Pentium processor was found to have a flaw in its floating-point division unit. The bug caused minuscule but critical errors in high-precision calculations, especially for users in fields like science and engineering. The deeper the math, the bigger the problem.

If Intel had run comprehensive E2E tests on how the processor handled complex operations, they could have caught the issue before release. Real-world usage and mathematical simulations would have flagged the flaw.
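
The flaw is famously easy to demonstrate once you know which operands to try. The best-known example is 4,195,835 ÷ 3,145,727: in exact arithmetic, multiplying the quotient back by the divisor and subtracting leaves essentially zero, but the flawed FPU was reportedly off by roughly 256. A regression test over known-sensitive divisions, like this illustrative sketch, is the kind of check a mathematical simulation suite would run as a matter of course:

```typescript
import { strict as assert } from "node:assert";

// Residual check popularized during the FDIV bug: x - (x / y) * y should be ~0.
function divisionResidual(x: number, y: number): number {
  return x - (x / y) * y;
}

// The famous operand pair; later analyses catalogued many more that hit the
// defective lookup-table entries, and a real suite would include those too.
const sensitivePairs: Array<[number, number]> = [[4195835, 3145727]];

for (const [x, y] of sensitivePairs) {
  const residual = Math.abs(divisionResidual(x, y));
  // Correct hardware leaves only rounding noise; the buggy FPU is off by hundreds.
  assert.ok(residual < 1e-6, `Division ${x}/${y} is off by ${residual}`);
}

console.log("floating-point division spot checks passed");
```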

Intel had to fork out $475 million to recall and replace the affected processors, and it took some serious reputational damage on top of that.

6. The $327 million metric mix-up: Mars Climate Orbiter loss (1999)

In 1999, NASA’s Mars Climate Orbiter became the ultimate casualty of a simple yet catastrophic unit conversion error. One engineering team worked in metric units (newton-seconds), while another used imperial units (pound-force seconds). This mismatch caused the spacecraft to dip far too low into Mars’ atmosphere and erupt into a giant ball of flames.

Proper E2E testing could have flagged the issue long before the spacecraft ever left Earth. By thoroughly testing how the software systems interacted—including checks for unit conversions between teams—NASA could have caught the problem. In 1999, NASA had access to systems that could run these tests, but the lack of cross-system validation turned a promising mission into ash and one of the most expensive unit conversion blunders in history.
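
One way that kind of check looks in practice is to make units part of the data instead of an unwritten convention, so a cross-team contract test fails loudly the moment the wrong unit sneaks in. The types and sample values below are hypothetical; the only real-world number is the standard conversion of 1 pound-force second ≈ 4.448222 newton-seconds.

```typescript
import { strict as assert } from "node:assert";

// Tag impulse values with their unit so mismatches are detectable, not silent.
type Impulse = { value: number; unit: "N*s" | "lbf*s" };

const LBF_S_TO_N_S = 4.448222; // 1 pound-force second in newton-seconds

function toNewtonSeconds(i: Impulse): number {
  return i.unit === "N*s" ? i.value : i.value * LBF_S_TO_N_S;
}

// Cross-team contract test: the thruster-firing report from one team's software and the
// navigation model used by the other must agree once expressed in the same unit.
const groundSoftwareOutput: Impulse = { value: 100, unit: "lbf*s" }; // one team
const navigationModelInput: Impulse = { value: 100, unit: "N*s" };   // the other team

const a = toNewtonSeconds(groundSoftwareOutput);
const b = toNewtonSeconds(navigationModelInput);
const relativeError = Math.abs(a - b) / Math.max(a, b);

// A 4.45x disagreement is exactly what this check exists to catch.
assert.ok(relativeError < 0.01, `Unit mismatch suspected: ${a.toFixed(1)} vs ${b.toFixed(1)} N*s`);
```

Fed these mismatched sample values, the assertion fails, which is the point: the discrepancy surfaces on the ground instead of at Mars.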

5. The float that didn't fly: Ariane 5 rocket explosion (1996)

Floating points strike again. In 1996, the European Space Agency’s Ariane 5 rocket exploded just 37 seconds after launch due to a floating-point error in reused software from the Ariane 4. The software wasn’t built to handle the faster trajectory of the Ariane 5, causing a malfunction in the guidance system. This led the rocket off course and triggered its self-destruct mechanism. The problem stemmed from an unhandled floating-point exception, which could have been caught if the software had been adjusted for the Ariane 5’s unique conditions.

E2E testing of the entire flight sequence—particularly simulating Ariane 5’s flight parameters—would have flagged this issue. Testing how the reused software handled the rocket’s increased speed could have exposed the flaw well before launch. Unfortunately, without these full-system simulations, the error went unnoticed until it was too late.
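
The widely reported root cause was a 64-bit floating-point value related to horizontal velocity being converted into a 16-bit signed integer that couldn’t hold it. The sketch below is illustrative only (the trajectory numbers are placeholders, not real telemetry), but it shows how feeding Ariane 5-scale values through the same conversion, with a guard in place, turns a mid-flight failure into a test failure on the ground.

```typescript
import { strict as assert } from "node:assert";

const INT16_MIN = -32768;
const INT16_MAX = 32767;

// The conversion the reused code performed, with the guard it was missing:
// refuse loudly instead of silently overflowing a 16-bit register.
function toInt16(value: number): number {
  assert.ok(
    value >= INT16_MIN && value <= INT16_MAX,
    `Value ${value} does not fit in a 16-bit signed integer`
  );
  return Math.trunc(value);
}

// Placeholder trajectory profiles -- illustrative magnitudes only.
const ariane4HorizontalBias = [1200, 8400, 21000];  // stays within range
const ariane5HorizontalBias = [1500, 26000, 64000]; // the faster trajectory exceeds it

for (const v of ariane4HorizontalBias) toInt16(v); // passes: reuse was safe here
for (const v of ariane5HorizontalBias) toInt16(v); // throws on 64000: caught before launch
```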

4. A first-class failure that ruined lives: UK Post Office Horizon Scandal (2000-2014)

Imagine writing software so flawed that someone made a TV drama about it. That’s exactly what happened with the Horizon IT system used by the UK Post Office. From 2000 to 2014, this system reported false financial shortfalls, making it look as though sub-postmasters had stolen money from their branches and leading to the wrongful prosecution of over 700 of them. Many were convicted of fraud or theft, with some going to prison, while others faced financial ruin, destroyed reputations, and drawn-out legal battles. It became one of the largest miscarriages of justice in UK history.

E2E testing could have changed everything. Simulating real-world transactions across the entire system would have exposed the data integrity issues in Horizon. Instead of relying on faulty numbers, the Post Office could have identified the root problems early on and avoided prosecuting innocent people. Testing the system under real-world conditions—handling different transactions and account scenarios—could have caught the errors before they caused so much harm.
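
At its core, the missing check is a reconciliation: replay a branch’s transactions through the system and confirm the reported balance matches the arithmetic, to the penny. The ledger below is deliberately simple and nothing like Horizon’s real schema, but it shows the kind of invariant an E2E suite can assert across thousands of simulated branches and days.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical branch ledger entry: positive = money in, negative = money out.
interface Transaction {
  amount: number;
}

function expectedClosingBalance(opening: number, txns: Transaction[]): number {
  const total = txns.reduce((sum, t) => sum + t.amount, 0);
  return Math.round((opening + total) * 100) / 100; // keep it to whole pennies
}

const openingBalance = 1000.0;
const dayOfTransactions: Transaction[] = [
  { amount: 250.5 },
  { amount: -120.25 },
  { amount: 75.0 },
];

// Whatever closing figure the system reports for this branch...
const systemReportedClosing = 1205.25;

// ...must reconcile against the replayed transactions. A phantom shortfall here is a
// defect to investigate in the software, not evidence against the sub-postmaster.
assert.equal(systemReportedClosing, expectedClosingBalance(openingBalance, dayOfTransactions));

console.log("branch reconciliation passed");
```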

The impact has been massive. The Post Office has paid out hundreds of millions of pounds in compensation, and the reputational damage is immeasurable. Legal battles dragged on for years, and many wrongfully convicted individuals are still seeking justice. This remains one of the most haunting IT failures in modern history.

3. A deadly race condition: Therac-25 radiation overdoses (1985-1987)

In the mid-1980s, the Therac-25 radiation therapy machine became infamous for delivering fatal overdoses of radiation to patients due to a software bug. The issue stemmed from a race condition in the control software, which allowed the machine to enter an unsafe state when switching between different modes of operation. The software failed to detect or prevent these dangerous conditions, and the resulting massive overdoses caused several deaths and severe injuries.

End-to-end testing could have prevented these tragedies by simulating how the machine operated across all modes and scenarios, especially during quick transitions. Thorough testing would have revealed the race condition and how the software was interacting with hardware, allowing engineers to address the problem before the machines were used on patients.
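
Race conditions are hard to show faithfully in a short snippet, but the shape of the test is worth sketching: simulate the operator’s rapid edit, let the events interleave, and assert a safety invariant that must hold no matter the timing. The model below is grossly simplified and is not the Therac-25’s actual architecture or failure sequence; the delays and names are invented for illustration.

```typescript
import { strict as assert } from "node:assert";

type Mode = "electron" | "xray";

class TherapyMachine {
  beamEnergy: "low" | "high" = "low";
  turntable: "scatter" | "flattener" = "scatter";

  async setMode(mode: Mode) {
    if (mode === "xray") {
      this.beamEnergy = "high";                    // the software state flips instantly...
      await new Promise((r) => setTimeout(r, 50)); // ...but the hardware takes time to move
      this.turntable = "flattener";
    } else {
      this.beamEnergy = "low";
      await new Promise((r) => setTimeout(r, 50));
      this.turntable = "scatter";
    }
  }

  fire() {
    // Safety invariant: the high-energy beam must only ever pass through the flattener.
    assert.ok(
      !(this.beamEnergy === "high" && this.turntable !== "flattener"),
      "UNSAFE: high-energy beam fired without the flattener in place"
    );
  }
}

// The scenario: a fast operator changes the mode and fires before the hardware catches up.
async function rapidEditScenario() {
  const machine = new TherapyMachine();
  const transition = machine.setMode("xray"); // deliberately not awaited
  machine.fire();                             // fires mid-transition
  await transition;
}

rapidEditScenario().catch((err) => console.error(err.message));
```

In this toy model the invariant check fails during the transition, which is exactly the kind of result that should stop a machine from ever reaching a patient.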

The incident had widespread repercussions, including major lawsuits, and led to significant changes in medical device regulations. It serves as one of the most serious examples of how inadequate software testing can lead to catastrophic outcomes.

2. A dangerous oversight: Toyota unintended acceleration (2005-2010)

From 2005 to 2010, Toyota faced a massive crisis when cars across its lineup began experiencing unintended acceleration. A software bug in the throttle control system was identified as one of the possible causes. This terrifying flaw contributed to over 30 reported deaths and numerous accidents, sparking a firestorm of recalls, lawsuits, and safety concerns.

E2E testing could have mitigated this disaster. By running comprehensive tests under real-world driving conditions, Toyota might have discovered the interactions between the throttle control software, sensors, and mechanical components that were contributing to the issue. Simulating high-stress conditions—like sensor failures or unexpected throttle inputs—would have flagged these anomalies before cars ever hit the road.
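
Here’s what one of those simulated sensor-failure cases might look like as a check. The controller logic is a hypothetical sketch, not Toyota’s firmware: the idea is simply that when two redundant pedal sensors disagree, the system fails safe to idle rather than trusting either reading.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical throttle controller: two redundant pedal-position sensors report 0-100%.
// If they disagree beyond a small tolerance, fail safe instead of guessing.
const DISAGREEMENT_TOLERANCE = 5; // percentage points

function commandedThrottle(sensorA: number, sensorB: number): number {
  if (Math.abs(sensorA - sensorB) > DISAGREEMENT_TOLERANCE) {
    return 0; // limp-home / idle
  }
  return (sensorA + sensorB) / 2;
}

// Simulated sensor-failure scenarios an E2E suite would run alongside road tests.
assert.equal(commandedThrottle(20, 22), 21); // healthy sensors: normal behavior
assert.equal(commandedThrottle(85, 3), 0);   // one sensor stuck or shorted: fail safe
assert.equal(commandedThrottle(100, 0), 0);  // contradictory full-throttle reading: fail safe

console.log("throttle fail-safe checks passed");
```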

Toyota ended up recalling over 9 million vehicles globally and paying billions in settlements and fines. The incident severely dented their reputation for reliability, reminding automakers that thorough testing isn’t just about performance—it’s about saving lives.

1. A tragic miscalculation: Patriot missile failure (1991)

In the first Gulf War, a rounding error in the Patriot missile system's internal clock prevented it from intercepting a Scud missile, which hit an Army barracks and killed 28 soldiers. The bug only became noticeable after extended operating periods.
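
The arithmetic behind the failure is well documented: the system counted time in tenths of a second using 24-bit fixed-point values, and the tiny error in representing 0.1 accumulated with every tick. The back-of-the-envelope version below uses the commonly cited figures (roughly 0.000000095 seconds of error per tick and a Scud closing speed of about 1,676 m/s); treat it as a worked example, not a model of the fire-control software.

```typescript
// Accumulated clock drift after ~100 hours of continuous operation.
const ERROR_PER_TICK_SECONDS = 0.000000095; // error representing 0.1 s in 24-bit fixed point
const HOURS_UP = 100;
const ticks = HOURS_UP * 3600 * 10;         // one tick every tenth of a second

const driftSeconds = ticks * ERROR_PER_TICK_SECONDS;
const scudVelocityMetersPerSecond = 1676;   // approximate closing speed
const trackingErrorMeters = driftSeconds * scudVelocityMetersPerSecond;

console.log(`Clock drift after ${HOURS_UP} hours: ${driftSeconds.toFixed(3)} s`);
console.log(`Range-gate error at Scud speed: ~${Math.round(trackingErrorMeters)} m`);
// Prints roughly 0.34 s of drift and an error of more than 500 m -- far outside
// the window the radar uses to reacquire the incoming missile.
```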

E2E testing under long-duration, real-time conditions could have flagged this error before the system was deployed. By running tests that simulated extended operations—similar to the prolonged use seen in the Gulf War—the issue in the timing mechanism might have been identified and corrected, preventing the catastrophic failure.

The incident not only resulted in the tragic loss of life but also shook confidence in the reliability of military systems, especially during critical combat operations. This failure underscored the importance of thorough testing, especially in high-stakes environments like missile defense.

Prepare for the unexpected

Anyone can be a Monday morning quarterback, but that isn’t worth much once your company has suffered a preventable, multimillion-dollar bug. We know this from experience: two of our founders started QA Wolf because they had shipped preventable defects themselves and learned the hard way how to keep it from happening again.

No one wants to hear or say, “I told you so.” The only way to prevent it is to accept the up-front “ow” and spend what it takes to get your E2E testing right. And if you need a hand with that, QA Wolf is happy to help.
