The Simian Army was originally developed by Netflix as a set of tools to ensure that its video streaming service was always available to customers worldwide without service degradation, while also ensuring compliance with policies relating to security, conformity and cost. That goal may seem relatively straightforward. Given the scale of Netflix's operations, however, it is anything but. To frame the challenge, consider the following statistics:
- Netflix serves approximately 150 million paying accounts across 190 countries
- It streams more than a billion hours of video content every week
- It is said to account for approximately 15% of the world's downstream internet traffic
Netflix clearly faces requirements in terms of scale and service quality that would dwarf those of most financial applications. Financial institutions, however, have other factors to consider: in particular, much more stringent policies and regulations governing access control, information security and availability. For example, MAS Notice 644 limits the total unscheduled downtime of each critical system operated by a bank to four hours within any 12-month period.
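To put the MAS 644 limit in perspective, a four-hour allowance can be converted into the availability percentage it implies. This is an illustrative sketch only: it assumes a 365-day year and treats the allowance as a single annual budget (the notice speaks of any 12-month period, which in practice means a rolling window), and the function name is hypothetical.

```python
# Convert a downtime allowance into the minimum availability it implies.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


def availability_from_downtime(max_downtime_hours: float,
                               period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the minimum availability (as a percentage) implied by a downtime budget."""
    if not 0 <= max_downtime_hours <= period_hours:
        raise ValueError("downtime must lie between zero and the length of the period")
    return 100.0 * (1 - max_downtime_hours / period_hours)


if __name__ == "__main__":
    # Example from the text: at most 4 hours of downtime in 12 months.
    print(f"{availability_from_downtime(4):.3f}%")  # 99.954%
```

A four-hour budget therefore corresponds to roughly 99.954% availability, slightly stricter than a 99.95% target, which would allow about 4.4 hours of downtime per year.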
This paper explains why the concepts introduced by the Simian Army matter to any financial institution adopting cloud services. It provides an overview of those concepts, namely chaos engineering and continuous compliance, along with a more detailed explanation of the relevant tools and guidance on implementing them, both in practical terms and through a suggested organisational model.
Why is the Simian Army Important to Financial Institutions?
Public cloud adoption by the financial services industry has lagged behind other sectors. Financial services firms are heavily regulated and subject to more stringent requirements relating to data privacy and security.
Applicable regulations and standards include, among others, Dodd-Frank, FFIEC guidance, PCI DSS, GLBA, SOX, the USA PATRIOT Act, MAS TRM, MAS 644, HKMA TM-G and GDPR. High-profile data leaks have also tempered the appetite for hosting critical workloads and sensitive data in the cloud, underlining the importance of controls and continuous compliance.
Operating in the cloud differs fundamentally from traditional approaches to managing IT infrastructure and software. Cloud platforms allow resources to be created, modified and destroyed orders of magnitude faster than traditional systems. Because they are built from large quantities of inexpensive commodity components, cloud environments are generally designed to expect relatively high rates of component failure.
The growing use of public cloud services in the financial services industry therefore requires a rethink of some key aspects of application development, service management and support:
Public cloud services can experience a higher rate of component failure than traditional dedicated on-premises infrastructure. Applications developed for the cloud must therefore be designed to tolerate failure, with resilience architected into the software itself. This core requirement has driven several trends in software design, including the adoption of microservices, a move from stateful to stateless architectures and a tendency to decouple data from applications.
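The idea of designing for failure can be illustrated with a miniature chaos experiment in the spirit of the Simian Army's Chaos Monkey, which randomly terminates production instances. The sketch below is a hypothetical in-memory simulation, not Netflix's implementation: `Service`, `chaos_monkey` and `run_experiment` are illustrative names, and "healing" stands in for whatever auto-scaling mechanism a real platform provides.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical stateless service made of interchangeable instances."""
    name: str
    desired_instances: int
    instances: list[str] = field(default_factory=list)
    next_id: int = 0

    def __post_init__(self) -> None:
        self.heal()

    def heal(self) -> None:
        # Mimics auto-scaling: launch replacements until the desired count is restored.
        while len(self.instances) < self.desired_instances:
            self.instances.append(f"{self.name}-{self.next_id}")
            self.next_id += 1

    def is_available(self) -> bool:
        # Stateless design: any surviving instance can serve any request.
        return len(self.instances) > 0


def chaos_monkey(service: Service, rng: random.Random) -> str:
    """Terminate one randomly chosen instance and return its ID."""
    if not service.instances:
        raise ValueError("no instances left to terminate")
    victim = rng.choice(service.instances)
    service.instances.remove(victim)
    return victim


def run_experiment(service: Service, rounds: int, seed: int = 42) -> bool:
    """Inject one failure per round; report whether the service stayed available."""
    rng = random.Random(seed)
    for _ in range(rounds):
        chaos_monkey(service, rng)
        if not service.is_available():
            return False
        service.heal()
    return True


if __name__ == "__main__":
    print(run_experiment(Service("payments-api", desired_instances=3), rounds=100))  # True
    print(run_experiment(Service("legacy-app", desired_instances=1), rounds=1))      # False
```

The redundant, stateless service survives every injected failure, while the single-instance service fails on the first one; surfacing that weakness deliberately, before a real outage does, is the point of the exercise.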
Similarly, for service management, the ease with which cloud resources can be provisioned allows applications to be rebuilt easily and at regular intervals, resetting system entropy, which is another potential cause of availability issues.
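A regular rebuild policy can be as simple as flagging every instance that has run longer than a maximum lifetime. The sketch below assumes launch times are already available (in practice they would come from the cloud provider's inventory); the seven-day limit and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def instances_due_for_rebuild(launch_times: dict[str, datetime],
                              max_age: timedelta,
                              now: datetime) -> list[str]:
    """Return IDs of instances older than max_age, oldest first."""
    overdue = [(launched, instance_id)
               for instance_id, launched in launch_times.items()
               if now - launched > max_age]
    return [instance_id for _, instance_id in sorted(overdue)]


if __name__ == "__main__":
    now = datetime(2020, 1, 31, tzinfo=timezone.utc)
    fleet = {
        "web-1": now - timedelta(days=10),
        "web-2": now - timedelta(days=2),
        "batch-1": now - timedelta(days=30),
    }
    # Hypothetical policy: no instance may run for more than 7 days without a rebuild.
    print(instances_due_for_rebuild(fleet, timedelta(days=7), now))  # ['batch-1', 'web-1']
```

Returning the oldest instances first lets a scheduler rebuild the most drift-prone machines before the rest.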
When it comes to information security and identity and access management, the financial industry is held to much more exacting standards than most other verticals. Many financial institutions have grown comfortable with Infrastructure as a Service (IaaS) by adopting an Infrastructure as Code (IaC) approach to define and enforce minimum security standards, but the adoption of Platform as a Service (PaaS) has introduced greater complexity and new challenges.
Locking down every potential attack surface in an environment that was primarily architected to be internet-facing, open and multi-tenant requires continuous monitoring to ensure that all security policies are correctly implemented and remain in force.
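Continuous security monitoring amounts to evaluating every resource against a set of policy rules, repeatedly or whenever configuration changes, much as the Simian Army's Security Monkey looked for misconfigurations. The sketch below is a hypothetical illustration: the configuration keys (`public_access`, `encryption_at_rest`, `min_tls_version`) and rule names are assumptions, not any provider's real schema.

```python
from typing import Callable

# Hypothetical policy set: each rule name maps to a predicate a resource must satisfy.
# Missing settings are treated as non-compliant (fail closed).
SECURITY_RULES: dict[str, Callable[[dict], bool]] = {
    "no_public_access": lambda r: not r.get("public_access", False),
    "encryption_at_rest": lambda r: r.get("encryption_at_rest", False) is True,
    "tls_1_2_or_later": lambda r: r.get("min_tls_version", "1.0") in {"1.2", "1.3"},
}


def audit(resources: list[dict]) -> list[tuple[str, str]]:
    """Return (resource_id, rule_name) pairs for every rule a resource breaks."""
    violations = []
    for resource in resources:
        for rule_name, check in SECURITY_RULES.items():
            if not check(resource):
                violations.append((resource["id"], rule_name))
    return violations


if __name__ == "__main__":
    inventory = [
        {"id": "db-1", "public_access": False, "encryption_at_rest": True,
         "min_tls_version": "1.2"},
        {"id": "bucket-7", "public_access": True, "encryption_at_rest": True},
    ]
    print(audit(inventory))
    # [('bucket-7', 'no_public_access'), ('bucket-7', 'tls_1_2_or_later')]
```

Failing closed means a setting that was never specified is reported rather than silently assumed safe.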
Cloud economics can be compelling with appropriate software architectures, but they require good hygiene. Making resources easier to procure can lead to sprawl, so organisations need to monitor services continuously to ensure that everything they procure is actually used. Equally, cloud resources are most cost-effective when software is architected for them: modern architectures reduce reliance on dedicated resources and help ensure firms pay only for the CPU cycles their application processes actually need.
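Monitoring for sprawl can start with flagging resources that are provisioned but barely used, the kind of clean-up the Simian Army's Janitor Monkey automated. The sketch below is hypothetical: the 5% CPU threshold, the 730-hour month and the input shapes are assumptions chosen for illustration.

```python
from statistics import mean

HOURS_PER_MONTH = 730.0  # average month: 8,760 hours / 12


def find_idle_resources(cpu_samples: dict[str, list[float]],
                        threshold_pct: float = 5.0) -> list[str]:
    """Return IDs of resources whose average CPU utilisation is below threshold_pct."""
    # Resources with no samples are skipped: missing data is not evidence of idleness.
    return sorted(resource_id for resource_id, samples in cpu_samples.items()
                  if samples and mean(samples) < threshold_pct)


def monthly_waste(idle: list[str], hourly_cost: dict[str, float]) -> float:
    """Estimate the monthly spend on idle resources at their hourly price."""
    return sum(hourly_cost[resource_id] for resource_id in idle) * HOURS_PER_MONTH


if __name__ == "__main__":
    cpu = {"vm-a": [1.0, 2.0, 3.0], "vm-b": [40.0, 60.0], "vm-c": []}
    idle = find_idle_resources(cpu)
    print(idle, monthly_waste(idle, {"vm-a": 0.5, "vm-b": 0.5}))  # ['vm-a'] 365.0
```

Attaching an estimated cost to each finding helps prioritise clean-up and gives application teams a concrete reason to act.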
The move towards Agile and DevOps development methodologies has evolved in tandem with the adoption of cloud. These approaches encapsulate a crucial benefit that financial services firms are trying to unlock: enabling software development teams to innovate faster. However, as resources become easier to provision and application teams take more responsibility for their own destiny, new risks must be managed.
As more responsibility shifts to application teams, it is vital that the environments those teams build and operate are continuously monitored for conformance with all relevant IT policies.
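One common conformity check, comparable in spirit to the Simian Army's Conformity Monkey, is verifying that every resource carries the metadata that policy requires, so that each one can be traced to an accountable team. The sketch below is hypothetical: the required tag names are assumptions, not a standard.

```python
# Hypothetical tagging policy: every resource must identify its owner, cost centre
# and data classification.
REQUIRED_TAGS = frozenset({"owner", "cost-centre", "data-classification"})


def missing_tags(resources: dict[str, dict[str, str]],
                 required: frozenset[str] = REQUIRED_TAGS) -> dict[str, set[str]]:
    """Map each non-conforming resource ID to the required tags it lacks."""
    report = {}
    for resource_id, tags in resources.items():
        # Blank values count as missing: an empty owner tag identifies no one.
        present = {key for key, value in tags.items() if value.strip()}
        gaps = set(required) - present
        if gaps:
            report[resource_id] = gaps
    return report


if __name__ == "__main__":
    resources = {
        "vm-1": {"owner": "payments-team", "cost-centre": "CC-104",
                 "data-classification": "confidential"},
        "vm-2": {"owner": " ", "cost-centre": "CC-200"},
    }
    report = missing_tags(resources)
    print({rid: sorted(gaps) for rid, gaps in report.items()})
    # {'vm-2': ['data-classification', 'owner']}
```

Reporting the specific missing tags, rather than a pass/fail flag, tells the owning team exactly what to fix.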
Download this whitepaper now to read the rest of these sections:
Introducing the Simian Army
Learn how tools like the Simian Army help organisations adapt to the cloud and minimise the risks associated with software-defined environments.
Discover the techniques used in chaos engineering to test resilience, and how they combine with continuous compliance to provide a holistic set of capabilities for enforcing compliance in the cloud.
Continuous compliance tools have evolved to address other key aspects of application design and service management that require a rethink in the cloud, namely security, cost and conformity. Learn about the different techniques used for continuous compliance in the cloud.
A Roadmap for Implementation
How to prepare developers and ITSM personnel alike for the Simian Army by applying a “shift left” philosophy and embedding chaos into the environments supporting the organisation’s software development and release pipeline.
Organisational Model – Who Should be Involved
Techniques for bringing chaos engineering and continuous compliance into your organisation. Doing so requires buy-in from cross-functional teams spanning multiple roles and responsibilities, teams that historically have not collaborated particularly effectively.
Build versus Buy
Some key factors to consider when determining whether to buy, build or integrate open source toolsets in implementing chaos engineering and continuous compliance.