By Quinton Anderson, Head of Engineering, Commonwealth Bank of Australia
As an engineer, one is used to choosing between many mutually exclusive options. You take a set of constrained resources as input, bring some users along for the journey and engineer the best possible solution given the constraints you have to work with. There are many ideal or purist solutions out there, and today you can access them easily through user case studies, blogs and a growing body of good research that will help frame your thinking. But that is all they do: help us frame our thinking. In the real world we can’t naively apply purist or external approaches to our problems; we have to recognise all of our constraints, engineer the best possible solution within them, and then evolve it later.
Constraints come in many forms: market forces, budgets, existing systems, the availability of appropriate people, time, and the list goes on. Given this mix of constraints we often have to compromise, and there are many familiar examples of compromise. For project managers it is time, scope and quality: you can have any two, but not all three. In programming languages you traditionally couldn’t have a language that was both expressive and statically typed; you had to choose the safety of a type system or back the long-term social benefits of expressiveness. Some choices are deemed to be mutually exclusive, and some genuinely are. However, many compromises that we made in the past are no longer necessary, because the world or the community has changed and adapted, or because we have unpacked some underlying assumptions or drivers and changed things. Today you can have both the safety of a type system and expressiveness without compromise, but that took many years of programming language research to achieve.
As the head of engineering for Analytics and Information at the Commonwealth Bank of Australia, I am expected to provide systems that are stable, available and secure, yet also support experimentation and rapid change. Open source and software-defined infrastructure allow me to achieve both, but before I explain how, let me provide a little context.
The Commonwealth Bank of Australia is Australia’s largest bank and one of the world’s largest by market capitalisation, with 19 million customers, 52,000 employees and 800,000 shareholders. Our vision is to excel at securing and enhancing the financial wellbeing of people, businesses and communities.
Our business has an increasing number of touch points with our customers, and our services are vital to them. Ours is a business of scale, with thousands of transactions per second and extreme expectations of service levels and resiliency. We are also fundamentally a data-driven business: our data sets are extremely rich, and we use them to drive towards our vision. Internally we have built many capability areas that have enabled us to maintain our competitive edge and lead our competitors in customer satisfaction. One capability that has received a lot of focus and investment in recent years is Analytics and Information.
Analytics and Information systems are now at the very heart of our business, with most business processes ending in the data warehouse and Hadoop environments -- increasingly, channel and product systems are enabled by and built on top of these core data systems.
In order to achieve our vision we leverage a large amount of data and experiment with it. We need technical environments that give our business users and data scientists the ability to iterate, fail and learn, while at the same time delivering stable and secure banking services. This is a difficult combination to achieve, but we are now achieving it.
The Analytics and Information (A&I) team at the Commonwealth Bank is addressing this problem with a two-pronged strategy: embracing open, and ensuring that we are robust even in the face of overwhelming change. Increasingly we are seeing evidence that this combination allows us to experiment and move fast while remaining stable, available and secure. The community has solved enough of the underlying issues that we no longer have to compromise, and this recognition is vital for our business.
I must admit, both prongs are quite vague when stated simply like that, so let me elaborate a little for you.
Embracing open is about two things: open standards and open source software. Standards are important because, when applied correctly, they allow us to incrementally decouple systems (some standards promote monoliths, which we don’t want). Embracing open source is important for many reasons. There are financial benefits to open source when applying it at scale (I must stress, at scale, where unit costs really matter), but more importantly open source allows us to experiment easily with tools and capabilities without the friction of the sales and commercial cycle. This has been a breath of fresh air for our engineers: they are free to reach out to any community and leverage its projects to help solve the problems in front of them. They don’t have to go through lengthy discussions, sign NDAs and engage various legal representatives in order to get their hands on tools that may not solve their problems anyway. Testing something is worth a thousand meetings. Now, that doesn’t mean that we don’t sign commercial agreements or use closed source software, nor does it mean that we support everything ourselves. We are pragmatic. We make the best engineering decisions based on the constraints in our world, but open source frees us to make those decisions, and to make them fast. It is also important to note that we do not embrace open source software blindly; security and privacy are primary concerns for us.
Being robust is less obvious, so let me explain. IT systems are relatively easy to build; as an industry, we now have good experience doing this. What is difficult is making IT systems stand the test of time. It is easy to build a system that is correct at a point in time, but such a system is inherently fragile in the face of change unless you are very careful about how you build it. There are various sources of change over time: people come and go, requirements change, the shape and skew of data change, non-functional requirements change, hardware degrades, and so on. What matters is the ability of a system to change safely and scalably as the world changes around it. If a system is slow and expensive to change, that is because some part of it, or of its delivery process, is fragile. Software engineers have known this for some time and have developed various strategies for making their code bases robust.
These strategies include tests, continuous integration, expressiveness, static code analysis, distributed version control, working in the open, code review and so on. The thing is, a robust code base is only part of the solution: code is simply academic until it is running in an environment and being used by a customer, and we can’t ignore Conway’s law. We have to extend this thinking into other parts of the delivery process, and there have been various attempts at doing so. Continuous delivery focuses on extending the rigour of continuous integration into the delivery and deployment process. Lean focuses on getting validation from our users as often as possible, questioning our assumptions and pivoting as we learn. DevOps focuses on breaking down the organisational and cultural barriers between those building systems and those running them.
The various versions of Agile provide principles and practices that help delivery teams make better decisions and optimise for change. Applying any one of these approaches in isolation results in failure. Agile without the supporting technical practices starts fast and gets slower and slower, and continuous delivery without the appropriate organisational structures and tooling results in many sprint teams trying to get things done. When you abstract over all of these ideas, frameworks and maturity models, you find that the underlying theme is dealing with change, and doing so in a way that is inexpensive, fast and results in sustainable outcomes for our business and our customers. We can’t apply just one approach; we have to be able to identify fragility in our systems, practices and culture, and find ways to make them robust in the face of overwhelming change.
So, how does software-defined infrastructure fit into this picture? What does it have to do with being robust? Quite simply, the more places within the system or delivery process that involve manual intervention, the more fragile things are. In order for engineers to implement continuous delivery, they have to automate the entire stack, and all of it has to be in version control and codified in some way. Codification allows both humans and machines to interpret artefacts, which keeps those artefacts simple to maintain over time (through automated tests, analysis, etc.). Infrastructure such as networks, servers and storage becomes simply a set of resources to be managed by automated processes, and those processes can then be made robust in the face of change.
What about the tangible benefits we have seen? Let me be clear that we are still at an early stage of this journey, only two years in, but we are already seeing very tangible benefits. Firstly, our operational expenditure has decreased significantly: not in an absolute sense (we are growing by more than 60% annually), but our unit costs have fallen markedly. Secondly, our developer experience has improved significantly since we started the journey. This has many indirect benefits, the greatest of which is attracting and retaining the right talent within our business. Finally, we have seen large reductions in operational complexity, which has been vital given the rate at which we add new functionality to our platforms.
The greatest benefits, however, have yet to be proven, and we continue to focus on our numbers. We expect further reductions in unit costs and even better developer experiences. Most importantly, we expect the cost of changing our systems to fall substantially over time, and the rate at which we deliver change to increase significantly and, importantly, keep accelerating. This will enable us to support our business to experiment, learn and ship those learnings rapidly, while maintaining the world-class availability and security expected of a bank.