Automating Edge Cases

Do the happy path first. Deal with edge cases later. Usually, this is sound advice, and admittedly it is exactly what I have been telling people for a long time. But when a wave is rolling at you, the idea of dealing with flood protection later suddenly does not feel like good advice anymore.

I had a flight booked from Munich to Berlin in March 2020. Nothing out of the ordinary, a regular business trip to visit a client. But this time, everything would turn out differently.

A few days before my scheduled on-site visit, in the face of the burgeoning coronavirus pandemic, our client switched to working from home. Going on site was obviously no longer advisable, so I cancelled my flight for the following Sunday evening (I usually arrive the night before an engagement). I received a confirmation email informing me about cancellation fees. Nothing too far out of the ordinary up to this point, at least if you ignore the fact that the whole world was already in the process of shutting down due to a pandemic of unprecedented scale.

On Sunday, things started to get really weird. I received another email from my airline inviting me to check in for my Monday flight to Berlin. That felt somewhat unusual, so I started to investigate. It seems that the airline had, at some point, cancelled the Sunday evening flight I had originally been booked on. Trying to be customer-friendly, they rebooked me onto the next available flight on Monday. This was of course completely unnecessary, since I had already cancelled my original reservation.

In addition, it was nonsensical, because I would have arrived in Berlin long after the meeting that I was set to co-chair had ended. Well, at least the airline had tried to be helpful.
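To make "automating edge cases" a little more concrete, here is a minimal, hypothetical sketch in Python of the check that apparently was missing in my case: a rebooking step that skips bookings the customer has already cancelled, plus two pytest-style tests that pin this behaviour down. I do not know how the airline's systems actually work; the names `Booking`, `BookingStatus` and `rebook`, and the flight identifiers, are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BookingStatus(Enum):
    ACTIVE = "active"
    CANCELLED_BY_CUSTOMER = "cancelled_by_customer"


@dataclass(frozen=True)
class Booking:
    booking_id: str
    flight: str  # hypothetical flight identifier, e.g. "XX100"
    status: BookingStatus


def rebook(booking: Booking, replacement: Optional[str]) -> Optional[Booking]:
    """Hypothetical rebooking step, run when the airline cancels booking.flight.

    Returns the new booking, or None if no rebooking should happen.
    """
    # The edge case: the customer has already cancelled this booking,
    # so there is nothing left to rebook.
    if booking.status is BookingStatus.CANCELLED_BY_CUSTOMER:
        return None
    # No replacement flight is available.
    if replacement is None:
        return None
    return Booking(booking.booking_id, replacement, BookingStatus.ACTIVE)


# pytest-style tests: plain functions with bare asserts.
def test_customer_cancelled_booking_is_not_rebooked():
    booking = Booking("B1", "XX100", BookingStatus.CANCELLED_BY_CUSTOMER)
    assert rebook(booking, "XX200") is None


def test_active_booking_moves_to_replacement_flight():
    booking = Booking("B2", "XX100", BookingStatus.ACTIVE)
    rebooked = rebook(booking, "XX200")
    assert rebooked == Booking("B2", "XX200", BookingStatus.ACTIVE)
```

The point is less the two `if` statements than the tests: once an edge case is written down as an automated test, it is checked again on every change, including the hasty ones made in the middle of a crisis.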

An isolated incident?

One might argue that this was just a glitch. After all, as a computer scientist and unconventional thinker, I have a tendency to find edge cases and bugs. But I am definitely not the only one (tweet is in German) who had such an experience. I think it is safe to assume that even in the early phase of the lockdown, the airline's IT systems were already in a pretty messed up state.

With systems seriously out of sync and a big wave rolling at them, many companies appear to have tried to take load off their computer systems by directing customers to the phone hotline. It is somewhat ironic that the software-based queues that keep you waiting for a call center agent seem to be so much more resilient to flooding than the ever so scalable, cloud-based web and application servers.

Granted, doing things manually could in fact be a business decision to throttle cancellation rates. In due course, my airline announced an "improvement": instead of cancelling flights online, customers were now supposed to call the hotline to reschedule flights. However, the website read: "If you need to contact us, please only do so if you wish to travel during the next 3 days."

Business decision or technical limitation: the impact is enormous, and the ripple effects are serious. Loads of corrective actions will become necessary. Incidentally, this reminds me of my first trip through the newly opened Terminal 5 at London's Heathrow airport, on its second day of operation. It was disorder, chaos, and mayhem. It took my bag about two weeks to find its way back to me, via Italy.

Disruptions: just a matter of time

In 2019, I gave the presentation "Beyond Clean Code - Building the Right Software" at a few conferences. One key message was that if software lives long enough, it will inevitably face disruptions in the market that require major changes to it.

This is why software has to be easy to change, and why test automation is crucial.

Back then, I gave a few examples of disruptions. One was the introduction of SEPA, which for Germans meant moving away from an established system of bank codes and account numbers. Another was GDPR, which forced many of us to rethink privacy, with serious consequences for system and application architecture.

Further back, there was the introduction of the Euro as a new currency, with Deutsche Mark and Euro both being valid during a transition period. More recently, Brexit turned a part of the EU into a third country, but without any discernible clarity or timeline, which made planning all but impossible.

From today's perspective, my list of disruptions was clearly missing one item: a pandemic.

Each disruption challenges fundamental assumptions that were built into systems, and requires substantial changes to existing software. Here is why this is not an easy task.

Quite a few software projects have launched successfully by freezing features at some point and testing the software manually, or by going through a long public beta phase in which the end users effectively tested the software in production. But how do such projects test the sudden changes that an impending crisis requires them to make?

Without test automation, one is left with more or less the following three choices:

  1. Leave the software as it is
  2. Take down the software for a testing phase
  3. Let the end users test

None of these choices is particularly appealing. One might feel inclined to pick option 3, because the other two options leave you with manual processing right away.

Choosing option 3 also comes at a high price. There will be countless bug reports that have to be triaged and dealt with, customers will grow more and more impatient as they wait for bug fixes or at least a reaction to their bug reports, and support hotlines will be flooded with calls. Loads of corrective actions will become necessary, and each of them might cause further ripple effects of its own. That is a vicious circle nobody wants to end up in.

It's all about software architecture

Martin Fowler defines software architecture as the decisions that are both important and hard to change. We have already established that business software is never really completed.

Nobody can predict which changes to software will become necessary. This, however, must not lead us down the dangerous path of building extremely generic software. Some have tried, and learned the hard way that you end up with software that is unmaintainable, because its generic nature makes it impossible for anyone to understand what it actually does. I know, because I am one of those who tried.

We cannot even predict disruptions. Some of them, like GDPR, appear on the horizon rather early on, but are ignored for far too long. Others, like Brexit, are inherently unpredictable: nobody knew if and when Brexit would happen, and even today, nobody knows when and how any rules will change.

Let us face it: nobody can predict the future. Various cult leaders have predicted the end of the world. Except for those who were smart enough to pick a date well beyond their expected lifespan, their predictions have been disproven regularly by the simple fact that we still exist. For example, the world did not end on December 21, 2012, remember?

True, we have been warned about a pandemic. But we have also been warned about another nuclear disaster, a Third World War, crashing stock and/or real estate markets, probably even a zombie apocalypse. Whatever happens, somebody has "predicted" it, somehow. It is pretty easy to point that out in retrospect, by the way.

Since we cannot predict the future, we need to design for change. We need to become more flexible and expand our abilities to adjust existing software to an ever-changing world.

Time to get ready, again

A high level of business automation is a key success factor, especially in times of crisis. Yet getting even close to 100% automation remains unrealistic, all the more so in times of crisis.

Traditionally, automation decisions have been driven by cost and revenue. This is not sufficient. We also need to take a risk-based approach when assessing the potential and impact of business automation. Some people use the term digitization here; I prefer automation.

We need to make our software and systems more flexible, to be able to adapt to changes with less manual labour and fewer ripple effects. Everything we have changed today, we might need to change back at some point in the future. Academically speaking, a most interesting feature about the coronavirus pandemic is that, at some point, things will be back to normal. Or will they?

One thing is for sure: coronavirus will not be the last major disruption we encounter. We should start to prepare, sooner rather than later.