The landscape of infrastructure and platform services is constantly changing, as is how we configure them for public and private cloud. Within this environment of change, Infrastructure as Code (IaC) offers the potential for automation and efficiency in managing infrastructure and platform resources through code. While the benefits of IaC seem obvious, its adoption comes with a unique set of challenges, especially at enterprise scale. In this blog, I list the top seven challenges that enterprises run into while using IaC. To explore options to overcome these challenges, read my next blog.
1. Learning Curve: Mastering IaC tools
Adopting IaC often requires mastering new tools and frameworks such as Terraform, Ansible, Chef, or Puppet. The learning curve can be steep, especially for teams accustomed to manual infrastructure provisioning, and not just because each tool brings a new language or an unfamiliar interface. Most IaC tools use declarative languages to define and manage infrastructure and resources, which means defining ‘what’ is needed and not ‘how’ it needs to be achieved. Shifting from imperative languages (familiar to many developers) to declarative ones is a challenge for even the most experienced programmers.
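To make the ‘what’ versus ‘how’ distinction concrete, here is a minimal sketch in plain Python. It is not how any particular IaC tool works internally; the in-memory "cloud", the server names, and the `reconcile` function are all assumptions made up for illustration. The author only declares the desired end state, and the tool works out the steps.

```python
# Hypothetical in-memory "cloud": maps a server name to its instance size.
def reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Work out the steps that turn `actual` into `desired` (the 'how')."""
    actions = []
    for name, size in sorted(desired.items()):
        if name not in actual:
            actions.append(f"create {name} ({size})")
        elif actual[name] != size:
            actions.append(f"resize {name} {actual[name]} -> {size}")
    # Anything running that was never declared gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(f"delete {name}")
    return actions


# Declarative input: the author states only the end state (the 'what').
desired = {"web-1": "small", "web-2": "small", "db-1": "large"}
# What currently exists in the (made-up) environment.
actual = {"web-1": "medium", "old-batch": "small"}

plan = reconcile(desired, actual)
for step in plan:
    print(step)
```

Running the same declaration against an environment that already matches it yields an empty plan, which is the property that makes the declarative style attractive and, for developers used to writing the steps themselves, unfamiliar.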
2. Tool Proliferation: Choosing the IaC tools wisely
The plethora of IaC tools available can lead to confusion and tool proliferation. Each IaC tool offers a unique spin on managing infrastructure, with its own strengths and areas of focus. Teams can end up using multiple tools to cover the full range of infrastructure management needs, which in turn causes fragmentation, complexity in automation, and compatibility issues as the tools vie for control.
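3. Infrastructure Expertise: Knowing the services behind the code
Using an IaC tool doesn’t remove the need for deep knowledge of the infrastructure and platform services being provisioned.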
As an example, AWS supports more than 200 services and over 800 different resource types. Add to that any self-managed, open-source services that the enterprise applications may depend on. On top of this, the public cloud service providers (AWS, Azure, GCP, etc.) keep introducing new services that offer faster, and sometimes cheaper, ways for applications to perform the same business task.
You will always need infrastructure expertise even if you simplify IaC.
4. Configuration Management: Versioning across teams and environments
Managing and versioning Infrastructure as Code without proper processes can lead to conflicts, especially when different team members make changes simultaneously. The situation gets significantly more complicated when different microservices, owned by different development teams, are bundled together as an ‘application’, and each microservice has its own configuration for how it needs to run. Throw different environments (Dev, QA, Staging, Production, etc.) into the mix, with each environment having its own set of configurations for the microservices, and you have a mess[h] of dependencies to be managed.
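One common way to keep that combination manageable is to store per-service defaults once and record only the differences for each environment. The sketch below assumes that layout; the service names, settings, and the `merge` and `resolve` helpers are hypothetical and not taken from any specific tool.

```python
from copy import deepcopy


def merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` on `base` without mutating either."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Hypothetical defaults, each owned by the team behind that microservice.
SERVICE_DEFAULTS = {
    "orders": {"replicas": 1, "resources": {"cpu": "250m", "memory": "256Mi"}},
    "payments": {"replicas": 1, "resources": {"cpu": "500m", "memory": "512Mi"}},
}

# Hypothetical per-environment overrides; only the differences are recorded.
ENV_OVERRIDES = {
    "dev": {},
    "production": {
        "orders": {"replicas": 4},
        "payments": {"replicas": 3, "resources": {"memory": "1Gi"}},
    },
}


def resolve(service: str, env: str) -> dict:
    """Effective configuration of one service in one environment."""
    return merge(SERVICE_DEFAULTS[service], ENV_OVERRIDES[env].get(service, {}))


prod_payments = resolve("payments", "production")
dev_payments = resolve("payments", "dev")
print(prod_payments)
```

Even in this toy form, the number of effective configurations is services multiplied by environments, which is why uncoordinated edits to either layer so easily collide.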
5. Configuration Drift: Detecting and remediating direct changes
While IaC solutions offer templates for standardizing application and service deployments across different environments, they introduce the problem of having to detect, track, and remediate configuration drift in each environment. Configuration drift is introduced when engineers make changes to platform resources directly in the cloud environment, rather than making those changes through the IaC. There are usually good operational reasons for making those changes directly, as the Ops or SRE teams responsible for the high availability and security of applications may not be the ones who wrote the IaC. For example, direct changes may be made to the cloud environment if an application faces scaling issues in production and needs an immediate fix.
Detecting configuration drift in any environment creates an additional dependency on an IaC tool that also provides state management of platform resources. The detection process, whether built in-house or provided by a commercial solution, has to reverse engineer IaC files from the environment’s state management files and compare them with the IaC files that were created for deployment. Remediating the drift requires presenting the delta IaC code changes to the owners of those IaC files, who can then decide to either accept or reject the changes that caused the drift.
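The comparison step can be sketched as follows. This is a simplification under stated assumptions: instead of regenerating IaC files from state, it compares two plain dictionaries of resource attributes, and the resource names, attributes, and the `detect_drift` function are invented for illustration.

```python
def detect_drift(declared: dict, live: dict) -> dict:
    """Compare declared IaC attributes with live state, resource by resource."""
    drift = {}
    for resource in sorted(declared.keys() | live.keys()):
        if resource not in live:
            drift[resource] = "missing from environment"
        elif resource not in declared:
            drift[resource] = "not managed by IaC"
        else:
            want, have = declared[resource], live[resource]
            # Record (declared, live) pairs for every attribute that differs.
            changed = {
                attr: (want.get(attr), have.get(attr))
                for attr in sorted(want.keys() | have.keys())
                if want.get(attr) != have.get(attr)
            }
            if changed:
                drift[resource] = changed
    return drift


# Hypothetical attributes as written in the IaC files.
declared = {
    "asg/web": {"min_size": 2, "max_size": 4},
    "sg/web": {"ingress_port": 443},
}
# Hypothetical attributes read back from the environment after an
# emergency scaling fix and an ad hoc bucket created by hand.
live = {
    "asg/web": {"min_size": 2, "max_size": 10},
    "sg/web": {"ingress_port": 443},
    "bucket/tmp": {"versioning": False},
}

drift = detect_drift(declared, live)
print(drift)
```

The resulting delta is what would be handed to the owners of the IaC files, who then decide whether to fold the change back into the code or revert the environment.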
6. Security: Enforcing zero trust through IaC
IaC code may inadvertently expose sensitive information or contain resource misconfigurations that pose security risks. The entire infrastructure and resource provisioning through IaC should enforce zero trust by granting only those privileges that the application and its microservices require to provide the business functionality, and nothing more. That means the DevOps engineers writing the IaC code need to create the right security groups, IAM roles and policies, and resource configurations to enforce the enterprise’s defined security policies as well as zero trust.
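As one small illustration of checking for excess privilege, the sketch below flags wildcard grants in a policy document shaped like an AWS IAM JSON policy (`Statement`, `Effect`, `Action`, `Resource`). The checker function, the sample policy, and the bucket name are assumptions for illustration; a real review would cover far more than wildcards.

```python
def as_list(value):
    """IAM policy fields may hold a single item or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_overbroad_statements(policy: dict) -> list[str]:
    """Flag Allow statements that grant wildcard actions or resources."""
    findings = []
    for index, stmt in enumerate(as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        for action in as_list(stmt.get("Action", [])):
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {index}: wildcard action {action!r}")
        for resource in as_list(stmt.get("Resource", [])):
            if resource == "*":
                findings.append(f"statement {index}: wildcard resource")
    return findings


# Hypothetical policy: the first statement is narrowly scoped, the second is not.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::orders-reports/*",
        },
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}

findings = find_overbroad_statements(policy)
for finding in findings:
    print(finding)
```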
7. Compliance: Enforcing architecture and policy
Introducing IaC can disrupt existing workflows that enforce architecture and policy compliance across all of the enterprise’s applications and services, especially if the implementation of IaC has not been a cross-team effort. While IaC offers a way to streamline and automate compliance, it also creates the need for new tools that scan the IaC code to ensure that compliance policies are followed and enforced through the IaC.
IaC code written by developers and DevOps engineers who are not well versed in the best design and security practices could be rejected by the scanning tool over multiple iterations. Each rejection then requires a manual explanation, followed by a manual revision, and so on, all of which could significantly slow down application deployment.
The flip side of that problem is that enterprises trying to address the slowdown in the deployment lifecycle may adopt templates that limit developers’ design choices and lead to outdated deployment architectures (compared with, for example, the latest AWS Well-Architected Framework).
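One way to shorten the reject, explain, revise loop described above is for each policy rule to carry its own explanation and suggested fix, so a rejection does not need a manual write-up. The sketch below assumes a made-up resource format and two invented rules (`ENC-001`, `TAG-001`); it does not reflect the rule syntax of any real scanning tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    applies_to: str                # resource type, or "*" for every type
    check: Callable[[dict], bool]  # returns True when the resource complies
    fix_hint: str                  # tells the author how to get past the rule


# Hypothetical enterprise policies.
RULES = [
    Rule("ENC-001", "storage_bucket",
         lambda r: r.get("encrypted") is True, "set encrypted = true"),
    Rule("TAG-001", "*",
         lambda r: "owner" in r.get("tags", {}), "add an 'owner' tag"),
]


def scan(resources: list[dict]) -> list[str]:
    """Return one self-explanatory message per policy violation."""
    violations = []
    for res in resources:
        for rule in RULES:
            if rule.applies_to in ("*", res["type"]) and not rule.check(res):
                violations.append(
                    f"{res['name']}: {rule.rule_id} failed; {rule.fix_hint}"
                )
    return violations


# Hypothetical resources extracted from an IaC change.
resources = [
    {"type": "storage_bucket", "name": "reports", "encrypted": False,
     "tags": {"owner": "orders-team"}},
    {"type": "vm", "name": "web-1", "tags": {}},
]

violations = scan(resources)
for line in violations:
    print(line)
```

Keeping the rules as data rather than baking them into fixed templates is a design choice: policies can be updated without forcing every team onto a single, possibly outdated, deployment architecture.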
While Infrastructure as Code offers solutions to one set of problems, it creates a whole new set of problems for the development and DevOps teams to deal with. But all hope is not lost. Read my next blog to learn how those teams can harness the power of IaC to achieve automation, consistency, and scalability in their infrastructure management processes, while not compromising on the flexibility, agility, security, and compliance needs of the enterprise.