Cloud overengineering for data platforms
Willem Conradie, CTO and Technical Director at PBT Group
I recently read an article about overengineering architecture in the cloud – and it got me thinking about how I have experienced this in practice, specifically with regard to data platform architecture in the cloud.
From my experience, there are two aspects that immediately jump to mind.
The first is the variety, and combinations, of data platform architectural patterns. Nowadays data architects must contend with a multitude of data architecture concepts, approaches, and patterns – to name a few: data lakes, data warehouses, data vaults, data lakehouses, data meshes, data products, and data fabrics. Trying to stay ahead of these concepts as they evolve, let alone navigating all of them in a corporate environment, can be a very tedious task.
Often companies end up implementing variations of the above, with bits and pieces of each. The chosen approach may look easy enough on paper, not least because technology in the cloud is so accessible. In practice, however, it leads to a complicated architecture, with the danger that the complication is only “experienced” during implementation. The data platform architecture ends up containing many layers, and many components in each layer, which all need to integrate, store, transform, serve and transport data at various frequencies.
Over and above the initial overengineering of the data platform architecture, maturing into a new architecture is an iterative process – typically one with one or more architecture transition states through which to grow into a defined target state.
This is often where the second form of “overengineering” takes shape.
Cloud service providers evangelise a modular architecture with a “fit-for-purpose” approach to tooling options in the cloud. Deciding which cloud services are required to realise a given capability is then driven by the chosen architecture, rather than by a focus on delivering business value.
One example I have seen in practice is a cloud service provider advising that data from relational databases must be replicated in near real-time into the data platform. The data architecture team then creates a principle dictating that all relational data must be ingested with the near real-time ingestion pattern and tooling. This approach is complex to implement, maintain and support. Add to this the scarcity of skills in the market – not just locally but globally – and there is considerable novelty risk associated with the approach.
Very often, most use cases don’t actually require real-time data ingestion; batch ingestion is sufficient. A batch-based ingestion approach is a robust, simple, cost-effective, and very mature way to ingest data. It still has its place in data management and must not be overlooked.
When there is a “real” near real-time requirement driven by business value, by all means go for it and add it to the data platform capabilities. The difference here is that introducing this capability should be purely driven by business demand, not architecture or technology.
What is important is to design your processes in such a way that adding near real-time capabilities, when needed, will not require any re-architecture or extensive redevelopment of existing processes.
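The idea above can be sketched in code – a minimal Python illustration with hypothetical names, not a prescribed implementation: if downstream loading depends only on an abstract ingestion contract, a near real-time source can later implement that same contract without any redevelopment of the existing batch processes.

```python
from abc import ABC, abstractmethod
from typing import Iterable

# Hypothetical record type: each record is a dict of column values.
Record = dict


class IngestionSource(ABC):
    """Common contract that both batch and near real-time sources fulfil."""

    @abstractmethod
    def read(self) -> Iterable[Record]:
        ...


class BatchSource(IngestionSource):
    """Simple batch source: yields all of its rows in one pass."""

    def __init__(self, rows: list[Record]):
        self.rows = rows

    def read(self) -> Iterable[Record]:
        yield from self.rows


def load(source: IngestionSource, target: list[Record]) -> None:
    """Downstream loading logic depends only on the interface, so a
    streaming source can be swapped in later without touching this code."""
    for record in source.read():
        target.append(record)


# Usage: downstream code stays identical whichever source is plugged in.
target: list[Record] = []
load(BatchSource([{"id": 1}, {"id": 2}]), target)
```

A later `StreamingSource` would simply implement `read()` over a message feed; `load` and everything behind it would remain unchanged.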
Why is overengineering in the cloud so easy? Multiple factors contribute. Gaining access to the cloud and deploying cloud services is very easy. Skills in the market are limited and still maturing. Internal resources often put up their hands to do the work, with good intentions, but sometimes don’t know upfront what the downstream implications of incorrect design decisions will be. Vendor guidance is very general and doesn’t always take client-specific environments into consideration. And sometimes technical resources overcomplicate solutions to challenge themselves, instead of focusing on delivering what is important to the business.
What are some of the implications of overengineering? It impedes one of the main benefits of the cloud: agility. A more complex solution is not just more complex to develop and implement; it is also more complex to evolve or decommission, and more difficult to maintain and support during its active use. The more cloud services a solution uses, the more skills the implementation team will have to acquire. In addition, the cloud costs of running the solution are inherently higher.
How do you go about managing this? It is important to have the correct architecture governance and controls in place to ensure solutions deployed in the cloud are “fit-for-purpose” from a business value perspective. Make use of the well-architected frameworks cloud service providers offer to establish these controls, but add your own flavour to them to ensure they make sense in your specific environment. Keep it simple. It is very easy to make a solution complex; it takes a lot more forethought and experience to make it simple. At the end of the day, I firmly believe that a simple solution is an elegant one.