Application Meltdown: Symptoms, Prevention, and Recovery

Picture this: you have an app. It has been released to the public and gets a lot of traffic. It generates significant revenue for your company and is the lifeblood of your business model. It has been lovingly maintained and expanded with new and exciting features that have steadily grown its popularity.

Then something happens. 

Your app starts having performance problems. Users start complaining, and you begin to lose market share. The app is in meltdown. You get pressure from the CEO to fix it immediately. So what now?

What causes App Meltdown?

A change to any component can have devastating consequences for the app. We've seen a change to a single SQL statement create a bottleneck on the database server, which pushed user wait times past the point users would tolerate. Users abandoned their sessions and went elsewhere, namely to competitors' websites. Likewise, a change to a firewall setting can choke traffic to the web server, with the same effect.

An app, together with its underlying infrastructure, is like a living, breathing organism, or more precisely, a finely tuned ecosystem: seemingly trivial changes can have dramatic adverse consequences. That ecosystem is also in constant flux. Apps have migrated from single servers in on-premises data centers to virtual machines, and then from on-premises virtual machines to cloud computing environments.

Every layer of the software stack changes constantly, from the servers' operating systems and database management systems to the underlying programming languages, connectivity software, and e-commerce engines. The network configuration changes too, including firewall settings and cybersecurity protocols. The result is an ever-changing ecosystem that is a monumental challenge to manage.

How do you prevent App Meltdown?

The best way to prevent App Meltdown is constant testing. A well-defined set of automated “smoke tests” that exercise every system component is critical. We recommend that these tests cover the large majority of your business transactions, say 90%. If the tests run automatically every day, you are notified as soon as something breaks and can start working out what changed since yesterday.
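To make the idea concrete, here is a minimal sketch in Python of a daily smoke-test runner, assuming each business transaction can be wrapped in a function that raises an exception when something is wrong. The check functions `check_login` and `check_checkout` are hypothetical placeholders for your own transactions, and scheduling is assumed to happen outside the script, for example through a cron entry such as `0 0 * * *`, which runs it every day at midnight.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SmokeResult:
    name: str
    passed: bool
    detail: str = ""


def run_smoke_tests(checks: dict[str, Callable[[], None]]) -> list[SmokeResult]:
    """Run every check; a check passes if it returns without raising."""
    results = []
    for name, check in checks.items():
        try:
            check()
            results.append(SmokeResult(name, True))
        except Exception as exc:  # any failure, including AssertionError
            results.append(SmokeResult(name, False, f"{type(exc).__name__}: {exc}"))
    return results


def format_report(results: list[SmokeResult]) -> str:
    """Summarize the run and list each failed check with its error."""
    failed = [r for r in results if not r.passed]
    lines = [f"{len(results) - len(failed)}/{len(results)} smoke tests passed"]
    for r in failed:
        lines.append(f"FAILED {r.name}: {r.detail}")
    return "\n".join(lines)


# Hypothetical checks: replace these with calls to your real transactions.
def check_login() -> None:
    assert True  # e.g. log in as a test user and confirm the home page loads


def check_checkout() -> None:
    raise AssertionError("order total was 0.00")  # simulated failure


if __name__ == "__main__":
    results = run_smoke_tests({"login": check_login, "checkout": check_checkout})
    print(format_report(results))
```

How the report reaches your team (email, a chat channel, a ticket) depends on your own alerting setup; the point is that a failure surfaces within a day instead of through customer complaints.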

For example, we had a global client with a large public web presence: over 6,000 separate web pages in several languages. Because the company was global and had a variety of brands, its web development was decentralized across those brands. The corporate IT department had no direct oversight of the changes but was responsible for the website's uptime. We created a series of automated smoke tests that ran every day at midnight. These tests exercised the most frequently used customer transactions to make sure they still worked. They also validated every link on every page to make sure there were no “page not found” errors. Every morning, corporate IT received a report detailing any failed tests, which they then investigated and corrected. Finally, we added automated load testing to confirm that the site could continue to handle expected traffic volumes.
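The link-validation part of that nightly job can be sketched with Python's standard library alone. This is an illustrative assumption, not the tool we built for the client: it collects the `href` of every `<a>` tag on a page, resolves relative links against the page URL, and flags any target that returns an HTTP status of 400 or above (404 being the classic “page not found”) or cannot be reached at all. The `status` parameter is a hypothetical hook that lets you swap in a different fetcher, which also makes the function easy to test without a network.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(page_url: str, html: str) -> list[str]:
    """Return absolute http(s) links found in html, resolved against page_url."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = [urljoin(page_url, href) for href in collector.links]
    # Skip mailto:, javascript:, and other non-web schemes.
    return [url for url in absolute if url.startswith(("http://", "https://"))]


def http_status(url: str) -> int:
    """Return the HTTP status code of a HEAD request, or 0 if unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code
    except OSError:  # network errors: DNS failure, refused connection, timeout
        return 0


def find_broken_links(
    page_url: str, html: str, status: Callable[[str], int] = http_status
) -> list[tuple[str, int]]:
    """Return (link, status) pairs for links that are unreachable or return >= 400."""
    broken = []
    for link in extract_links(page_url, html):
        code = status(link)
        if code == 0 or code >= 400:
            broken.append((link, code))
    return broken
```

A real crawler would also need to walk every page on the site, skip links it has already checked, and fall back to a GET request for servers that reject HEAD; those details are left out to keep the sketch short.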

If you need help…

Contact us if you have a business-critical app and want to avoid App Meltdown. We can help you build automated smoke tests that give you early warning when something has changed and affected your app.