Our introductory blog post on PaaS adoption in financial services outlined six core challenges. This follow-up examines one of those challenges in more detail: the operating model for provisioning and managing PaaS resources in the public cloud.
The concept of “Infrastructure as Code” (IaC) has been around for several years. The basic premise is that the cloud resources supporting an application should be managed alongside the application code, using the same software development processes and techniques to provision that infrastructure, and that changes to it should be made strictly through the code pipeline, even when responding to incidents.
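To make the premise concrete, here is a minimal sketch of what an IaC pipeline does at its core, similar in spirit to the plan step of tools such as Terraform. It assumes resources can be represented as plain dictionaries keyed by a resource name; the resource names and attributes are hypothetical. The pipeline compares the desired state declared in code with the state currently deployed and derives the changes to apply, so a manual change made outside the pipeline surfaces as drift to be reverted.

```python
from typing import Dict, List, Tuple

# A deployed or declared resource: name -> attributes (illustrative model only).
State = Dict[str, Dict[str, str]]


def plan(desired: State, current: State) -> List[Tuple[str, str]]:
    """Return (action, resource_name) pairs that move `current` towards `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in current:
            actions.append(("create", name))
        elif desired[name] != current[name]:
            # Covers drift too: a hand-made change is reverted to the coded state.
            actions.append(("update", name))
    for name in sorted(current):
        if name not in desired:
            # Anything deployed but not declared in code is removed.
            actions.append(("delete", name))
    return actions


desired = {
    "orders-db": {"type": "managed_postgres", "tier": "standard"},
    "orders-queue": {"type": "queue", "retention_days": "7"},
}
current = {
    "orders-db": {"type": "managed_postgres", "tier": "premium"},  # changed by hand
    "legacy-bucket": {"type": "object_store"},
}
print(plan(desired, current))
# [('update', 'orders-db'), ('create', 'orders-queue'), ('delete', 'legacy-bucket')]
```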
IaC brings many benefits across both technical and non-technical disciplines (compliance and audit among them), but it also carries very real risks that need to be actively managed. Many of those risks stem from application teams configuring cloud resources in ways that violate policies and standards, which can lead to the exposure of sensitive data (with the resulting fines and reputational damage), the loss of critical systems or poor cost management.
With IaaS having been the primary focus of public cloud adoption, a typical approach we see in financial services is to hide the low-level infrastructure management code behind an abstraction layer that manages many of these risks, while application teams manage a more abstract description of the infrastructure they require in their own code pipelines.
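A minimal sketch of such an abstraction layer, assuming a hypothetical catalogue in which an application team requests a database by size and data classification only, and a centrally owned function expands that request into low-level settings. The sizes, classifications and setting names are illustrative and do not correspond to any CSP's API.

```python
from dataclasses import dataclass

# Hypothetical catalogue maintained by the central infrastructure team.
SIZES = {
    "small": {"vcpus": 2, "memory_gb": 8},
    "medium": {"vcpus": 4, "memory_gb": 16},
    "large": {"vcpus": 8, "memory_gb": 32},
}


@dataclass(frozen=True)
class DatabaseRequest:
    """The abstract description an application team writes in its own pipeline."""
    name: str
    size: str            # must be a key of SIZES
    classification: str  # e.g. "public", "internal", "confidential"


def expand(request: DatabaseRequest) -> dict:
    """Translate the abstract request into low-level settings owned centrally."""
    if request.size not in SIZES:
        raise ValueError(f"unsupported size: {request.size}")
    config = {"name": request.name, **SIZES[request.size]}
    # Risk controls are applied here, inside the abstraction layer,
    # rather than being left to each application team.
    config["encrypted_at_rest"] = True
    config["public_network_access"] = False
    config["backup_retention_days"] = 35 if request.classification == "confidential" else 7
    return config


print(expand(DatabaseRequest("orders-db", "medium", "confidential")))
```

The catch is that every new service, and every new feature of an existing one, needs its own catalogue entry and translation logic before any team can use it.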
The heterogeneous nature and rapid feature velocity of PaaS mean that building and maintaining a similar abstraction layer for these higher-level services requires far greater investment, and slows down the adoption of new services and features. As a result, leading organisations are recognising that they have little choice but to provide direct access to the cloud providers’ APIs, managing risks using “sidecar” techniques.
The Evolution of Cloud Resource Management: A Shift in Operating Model
In their journey to adopt cloud services, financial institutions have always had to strike a balance between the immediacy of service provision that application developers crave and the need to control access to those services, principally from the perspective of information security and IT compliance, but also of cost.
While cloud programmes focused primarily on the adoption of IaaS, cloud/infrastructure teams within financial institutions retained greater control. Not only were they responsible for building the IaC abstraction layer, but in many cases they were also active participants in the provisioning workflow itself.
The uptake of PaaS is devolving greater responsibility to the users of those services. In managing that shift in operating model, however, it is important that lines of responsibility are clearly defined and understood.
Different Layers of Responsibility: The Apartment Building Analogy
Mapping out the lines of responsibility for managing service provision within any individual cloud service provider (CSP) can be a challenging process. Each CSP offers its own unique, constantly evolving set of services (and service architecture). Even when services are directly comparable, the terminology used to describe them differs across CSPs, making it difficult to find a common language for describing the operating model without preconceptions.
Responsibilities, therefore, need to be allocated at an abstract/logical level, and in doing so, a useful analogy is to think in terms of an apartment building.
Building Structure and Core Infrastructure: Within an apartment building, the structural elements and core infrastructure of the building are shared across all tenants. Foundations, walls, roof, stairs, elevators and the water, electricity and gas supplies are all managed centrally to make life easier for each occupant.
In the world of IaC, core structural elements will continue to be managed, controlled and operated by central teams. A simple example might be the direct network links that connect on-premises infrastructure to a CSP. Very few application teams would benefit from having to procure, configure and manage such services individually, and it is not desirable for the organisation to run several heterogeneous solutions. Most application teams would rather let a central team maintain the availability of network links and simply benefit from using them. Another example of building-level resources is the collection of telemetry data required for Security Operations to monitor the environment.
Apartment Units: An apartment building typically offers a handful of different layouts to choose from (studios, 1, 2, 3 bed, etc.), with a good deal of standardisation across units.
Within the world of IaC, these ‘units’ equate to accounts, groups or subscriptions (depending on the lexicon of the CSP), plus any standard infrastructure components required to support an individual application. They are essentially empty shells that app teams can fill according to their needs. Infrastructure teams may be responsible for configuring rules and policies at this level to govern what can be done within those shells, but for the most part, app teams should only encounter those restrictions when they try to do something the policy does not permit.
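A minimal sketch of this idea, assuming a hypothetical “unit” template: every new account or subscription of a given layout is stamped out with the same baseline guardrails (permitted regions and denied services), and a request from the app team is refused only when it falls outside them. Real CSPs express guardrails through their own policy engines; the classes, service names and region values here are illustrative.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Guardrails:
    """Rules set centrally by the infrastructure team (hypothetical shape)."""
    allowed_regions: FrozenSet[str]
    denied_services: FrozenSet[str]


@dataclass
class ApartmentUnit:
    """An empty account/subscription shell plus its centrally set guardrails."""
    app_name: str
    environment: str
    guardrails: Guardrails
    resources: list = field(default_factory=list)

    def request(self, service: str, region: str) -> Tuple[bool, str]:
        """Provision a service for the app team unless policy forbids it."""
        if region not in self.guardrails.allowed_regions:
            return False, f"region {region} not permitted"
        if service in self.guardrails.denied_services:
            return False, f"service {service} not permitted"
        self.resources.append((service, region))
        return True, "ok"


# Baseline shared by every unit of this layout (hypothetical values).
BASELINE = Guardrails(
    allowed_regions=frozenset({"eu-west-1", "eu-central-1"}),
    denied_services=frozenset({"public-file-share"}),
)

unit = ApartmentUnit("payments", "dev", BASELINE)
print(unit.request("managed-postgres", "eu-west-1"))  # (True, 'ok')
print(unit.request("managed-postgres", "us-east-1"))  # (False, 'region us-east-1 not permitted')
```

The design choice is that app teams never edit `Guardrails`; they only call `request`, so the restrictions stay invisible until a team attempts something the policy does not permit.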
Furnishings: Each tenant within an apartment building is permitted to furnish the inside of their units in any way they choose.
In the context of IaC, these ‘furnishings’ equate to the ‘services’ that need to be provisioned to develop, test and run applications. Whether they be Kubernetes clusters, managed databases or AI-powered analytics, application teams should be afforded direct access to CSP APIs, using whatever tooling they are comfortable with, to configure their own services – providing them with the freedom to operate using agile / DevOps approaches in line with their application codebase.
That is not to say application teams will have free rein. They still need to abide by IT compliance standards, and ensuring and assuring that those standards are met will involve a mix of in-band and out-of-band techniques. In-band controls prevent users from configuring resources in ways that contravene policy (important when those contraventions would have serious repercussions), while out-of-band controls continuously monitor resources to detect and alert on violations, and may even trigger the automated destruction of non-compliant resources. It is important to ensure that any controls an application team will experience in production are also exposed in non-production environments.
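The difference between the two kinds of control can be sketched with a single hypothetical rule: storage must not be publicly accessible. The in-band check runs before a change is applied and blocks it; the out-of-band scan runs over what is already deployed, reports violations and, when enabled, remediates them by deleting the resource. The resource shape and rule are assumptions for illustration, not any CSP’s policy language.

```python
from typing import Callable, Dict, List, Optional

Resource = Dict[str, object]
# A rule returns a violation message, or None if the resource is compliant.
Rule = Callable[[Resource], Optional[str]]


def no_public_storage(resource: Resource) -> Optional[str]:
    """Hypothetical rule: storage must never be publicly accessible."""
    if resource.get("type") == "storage" and resource.get("public_access"):
        return "storage must not allow public access"
    return None


RULES: List[Rule] = [no_public_storage]


def in_band_check(proposed: Resource) -> None:
    """Preventive control: refuse the change before it is applied."""
    violations = [msg for rule in RULES if (msg := rule(proposed))]
    if violations:
        raise PermissionError("; ".join(violations))


def out_of_band_scan(deployed: List[Resource], auto_remediate: bool = False) -> List[str]:
    """Detective control: report violations in deployed resources, optionally deleting them."""
    findings: List[str] = []
    for resource in list(deployed):  # iterate over a copy so removal is safe
        for rule in RULES:
            msg = rule(resource)
            if msg:
                findings.append(f"{resource['name']}: {msg}")
                if auto_remediate:
                    deployed.remove(resource)  # automated destruction of the resource
                break  # one finding per resource is enough to act on
    return findings


deployed = [
    {"name": "reports", "type": "storage", "public_access": True},
    {"name": "orders-db", "type": "database"},
]
print(out_of_band_scan(deployed, auto_remediate=True))  # ['reports: storage must not allow public access']
print([r["name"] for r in deployed])                     # ['orders-db']
```

Because both functions read the same `RULES` list, the same rule can be applied unchanged in non-production and production environments.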
The Road Ahead
Across these three levels, the standardisation, lifecycle, ownership and invocation of infrastructure code differ. Investing time in mapping each type of cloud resource to one of these buckets will help to delineate ownership and to set priorities for building out the cloud environment for application use.
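One lightweight way to record that mapping is a register that assigns each resource type to a layer, together with the owner and change route that follow from it. The resource types, owners and pipeline names below are hypothetical examples drawn from the analogy; each organisation would populate its own.

```python
from enum import Enum


class Layer(Enum):
    BUILDING = "building"        # shared core infrastructure
    UNIT = "unit"                # account/subscription shells and guardrails
    FURNISHINGS = "furnishings"  # PaaS services used by a single application


# Owner and change route for each layer (illustrative).
OWNERSHIP = {
    Layer.BUILDING: ("central cloud team", "central platform pipeline"),
    Layer.UNIT: ("central cloud team", "unit vending pipeline"),
    Layer.FURNISHINGS: ("application team", "application pipeline"),
}

# Example register of resource types (hypothetical entries).
REGISTER = {
    "on_prem_network_link": Layer.BUILDING,
    "security_telemetry_sink": Layer.BUILDING,
    "account_baseline_policy": Layer.UNIT,
    "kubernetes_cluster": Layer.FURNISHINGS,
    "managed_database": Layer.FURNISHINGS,
}


def owner_of(resource_type: str) -> str:
    """Describe who owns a resource type and how it is changed (KeyError if unmapped)."""
    layer = REGISTER[resource_type]
    owner, route = OWNERSHIP[layer]
    return f"{resource_type}: {layer.value} layer, owned by {owner}, changed via {route}"


for resource_type in REGISTER:
    print(owner_of(resource_type))
```

Raising on an unmapped resource type is deliberate: a resource nobody has classified is exactly the kind of ownership gap the exercise is meant to expose.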
The major CSPs, third parties and open-source projects offer tooling for in-band and out-of-band management of risks at the “furnishings” level, allowing application teams direct, autonomous access to the PaaS APIs. Investing in mechanisms for the rapid delivery of “Apartment Units” that deploy cloud-native policy enforcement (Azure Policy and Blueprints; AWS Service Control Policies, Config and RBAC; GCP Organization Policy), combined with third-party continuous compliance monitoring and more destructive “Simian Army”-style tooling, provides a comprehensive risk management suite. That investment will bring much greater business value than trying to keep up with hiding the CSP from the end user.
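As an example of cloud-native policy enforcement delivered with an “Apartment Unit”, the sketch below builds an AWS Service Control Policy that denies requests outside an approved set of regions, using the global `aws:RequestedRegion` condition key. It follows the general shape of AWS’s published region-restriction examples, but the approved regions and the short list of exempted global services are assumptions; a real policy needs a complete, reviewed exemption list.

```python
import json
from typing import Dict, List

APPROVED_REGIONS = ["eu-west-1", "eu-west-2"]  # assumption: set per organisation policy

# Global services are exempted because their requests may be served from a
# region outside the approved set (illustrative, not an exhaustive list).
GLOBAL_SERVICES = ["iam:*", "organizations:*", "route53:*", "cloudfront:*", "sts:*", "support:*"]


def region_restriction_scp(regions: List[str], exempt_actions: List[str]) -> Dict:
    """Build an SCP document that denies non-exempt actions outside `regions`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                "NotAction": exempt_actions,
                "Resource": "*",
                "Condition": {"StringNotEquals": {"aws:RequestedRegion": regions}},
            }
        ],
    }


print(json.dumps(region_restriction_scp(APPROVED_REGIONS, GLOBAL_SERVICES), indent=2))
```

The resulting JSON would then be attached, through AWS Organizations, to the organisational units that hold application accounts, so every new unit inherits it on creation.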
Mandating that all resources in non-development environments, at every layer of the model, are managed via code pipelines maintains discipline. It also makes some of the more difficult controls (such as privileged access) easier to manage, because the controls already in place for application code can be extended to infrastructure resource management.
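One way to check that the mandate is being followed is to review the CSP’s audit log for changes in non-development environments that were not made by the pipeline’s identity. A minimal sketch, assuming audit events have already been exported as dictionaries with `environment`, `actor` and `action` fields and that the pipeline runs as a hypothetical identity named `svc-deploy-pipeline`; real audit formats such as AWS CloudTrail or the Azure Activity Log are structured differently.

```python
from typing import Dict, Iterable, List

PIPELINE_IDENTITIES = {"svc-deploy-pipeline"}  # hypothetical pipeline identity
NON_DEV_ENVIRONMENTS = {"test", "staging", "prod"}
# Read-only calls are not changes, so they are skipped (illustrative prefixes).
READ_ONLY_PREFIXES = ("Get", "List", "Describe")


def out_of_pipeline_changes(events: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return write events in non-dev environments made by anyone but the pipeline."""
    flagged = []
    for event in events:
        if event["environment"] not in NON_DEV_ENVIRONMENTS:
            continue
        if event["action"].startswith(READ_ONLY_PREFIXES):
            continue
        if event["actor"] not in PIPELINE_IDENTITIES:
            flagged.append(event)
    return flagged


events = [
    {"environment": "prod", "actor": "svc-deploy-pipeline", "action": "UpdateDatabase"},
    {"environment": "prod", "actor": "jane.doe", "action": "UpdateDatabase"},
    {"environment": "prod", "actor": "jane.doe", "action": "DescribeDatabase"},
    {"environment": "dev", "actor": "jane.doe", "action": "DeleteBucket"},
]
print(out_of_pipeline_changes(events))
```

Once routine changes flow only through the pipeline, people need no standing write access to non-development environments, so any human write event this flags warrants investigation.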
On top of this, creating an in-house community that builds and maintains codified, compliant resource patterns in an “open source”-like manner, particularly for the more complex and configurable PaaS services, will give teams adopting cloud confidence that they are developing against approved resource configurations, and can be used to fast-track the reviews required to move into production.
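A minimal sketch of what such a shared, codified pattern might look like, assuming a hypothetical object-storage service: the pattern supplies approved defaults, lets teams override only an explicitly listed setting within an approved range, and rejects anything else. The setting names and limits are illustrative.

```python
from typing import Any, Callable, Dict

# Approved baseline maintained by the internal community (illustrative values).
APPROVED_DEFAULTS: Dict[str, Any] = {
    "encryption": "customer_managed_key",
    "public_access": False,
    "versioning": True,
    "retention_days": 30,
}

# Settings teams may change, each with a validator for the approved range.
OVERRIDABLE: Dict[str, Callable[[Any], bool]] = {
    "retention_days": lambda v: isinstance(v, int) and 30 <= v <= 3650,
}


def storage_pattern(name: str, **overrides: Any) -> Dict[str, Any]:
    """Return a compliant storage configuration, or raise if an override is not allowed."""
    config = {"name": name, **APPROVED_DEFAULTS}
    for key, value in overrides.items():
        validator = OVERRIDABLE.get(key)
        if validator is None:
            raise ValueError(f"{key} cannot be overridden in this pattern")
        if not validator(value):
            raise ValueError(f"{key}={value!r} is outside the approved range")
        config[key] = value
    return config


print(storage_pattern("trade-archive", retention_days=2555))
```

Because configurations built this way are compliant by construction, a production review can concentrate on the few permitted overrides rather than re-examining every setting.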