Terraform Workflow at Scale: Best Practices

This guide covers best practices for running Terraform at scale. As a business grows, its infrastructure requirements evolve and demand robust, scalable, and efficient solutions. This post walks through how to scale a Terraform workflow so that it can serve hundreds or even thousands of users.

To begin, consider the Terraform workflow at a small scale. An individual writes some Terraform code locally, runs 'terraform plan' to preview the changes, and, if everything looks as expected, runs 'terraform apply' to make them. Terraform then records the resources it created in a 'state' file so it can track them on later runs.
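
As a rough sketch, the single-user loop above could be scripted in Python as follows. The 'terraform init' step and the 'tfplan' file name are additions for this example (init is needed once per working directory before planning), and the 'approve' and 'runner' parameters are hypothetical hooks so the sequence can be exercised without Terraform installed.

```python
import subprocess
from typing import Callable, List

# The plan file name is an arbitrary choice for this sketch.
PLAN_FILE = "tfplan"


def workflow_commands(plan_file: str = PLAN_FILE) -> List[List[str]]:
    """Return the CLI calls for one init / plan / apply cycle, in order."""
    return [
        ["terraform", "init"],                       # set up providers and backend
        ["terraform", "plan", f"-out={plan_file}"],  # preview changes, save the plan
        ["terraform", "apply", plan_file],           # apply exactly the saved plan
    ]


def run_workflow(
    workdir: str,
    approve: Callable[[], bool],
    runner: Callable[..., object] = subprocess.run,
) -> bool:
    """Run init and plan, then apply only if `approve()` returns True."""
    init_cmd, plan_cmd, apply_cmd = workflow_commands()
    # check=True makes subprocess.run raise on a non-zero exit code.
    runner(init_cmd, cwd=workdir, check=True)
    runner(plan_cmd, cwd=workdir, check=True)
    if not approve():
        return False  # the person reviewing the plan declined
    runner(apply_cmd, cwd=workdir, check=True)
    return True
```

Here 'approve' stands in for the person reading the plan output; for interactive use it could be something like `lambda: input("Apply? [y/N] ") == "y"`.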

To serve a larger user base, this workflow needs to change significantly. This post outlines three aspects to consider when scaling up: decomposing into smaller workspaces, enabling less experienced users, and governing risk.

Decomposition into Smaller Workspaces

Just as a company does not run one giant application that does everything, it should not keep all of its infrastructure in one giant Terraform configuration. Instead, break the configuration down into smaller units, known as 'workspaces', much as a large application is broken into services. For example, one workspace could define the core network, another a shared logging service, another the shared databases, and so on up to the individual applications. This hierarchical decomposition eases management: each workspace holds less Terraform code, and workspace boundaries can follow the organizational structure, so role-based access control can be applied per workspace.
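
To make the hierarchy concrete, here is a small Python sketch that models workspaces and the order in which they would need to be applied. The workspace names follow the example above; the 'app-checkout' workspace and the exact dependency edges are assumptions made for illustration, not something Terraform defines.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each workspace maps to the set of workspaces it depends on.
# The edges below are assumed for this sketch.
WORKSPACES = {
    "core-network": set(),
    "shared-logging": {"core-network"},
    "shared-databases": {"core-network"},
    "app-checkout": {"shared-logging", "shared-databases"},  # hypothetical app
}


def apply_order(workspaces):
    """Return workspace names so every dependency comes before its dependents."""
    return list(TopologicalSorter(workspaces).static_order())


def dependents_of(name, workspaces):
    """Return the workspaces that directly depend on `name`."""
    return sorted(ws for ws, deps in workspaces.items() if name in deps)
```

Calling `apply_order(WORKSPACES)` puts the core network first and the application last, and `dependents_of` gives a quick view of which teams are affected when a lower layer changes.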

Enabling Less Experienced Users

To help new teams onboard smoothly, a 'module registry' can be useful. Operationally savvy individuals (the 'producers') publish modules (for example, a Java app or a C# app) that define how that kind of workload should be set up. 'Consumers' then don't need to understand the intricacies of the infrastructure; they only supply the variables relevant to their use case.
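
From the consumer's side, using a published module comes down to naming it and filling in a few inputs. The Python sketch below builds such a module call in Terraform's JSON configuration syntax (the kind of content that would go in a '.tf.json' file). The registry address, version constraint, and input names ('app_name', 'instance_count') are all hypothetical.

```python
import json


def module_call(name: str, source: str, version: str, **inputs) -> dict:
    """Build a Terraform JSON-syntax module block that calls a registry module."""
    block = {"source": source, "version": version}
    block.update(inputs)  # the consumer supplies only the module's input variables
    return {"module": {name: block}}


def render(config: dict) -> str:
    """Serialize the configuration as text suitable for a .tf.json file."""
    return json.dumps(config, indent=2)


# Hypothetical module published by a producer team in a private registry.
CONFIG = module_call(
    "orders_service",
    source="app.terraform.io/example-org/java-app/aws",
    version="~> 1.2",
    app_name="orders",
    instance_count=2,
)
```

Everything about how a Java app is actually provisioned lives inside the module; the consumer's configuration stays a handful of lines.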

Risk Governance

At a small scale, trust is high. With hundreds or thousands of users, it is no longer possible to rely on trust alone, and operational and security risks grow. One mitigation is the decomposition described above: smaller units with role-based access control, so that only designated teams can access and change each one. However, app teams can still define arbitrary templates within their own workspaces, so the safety of what they deploy remains a concern.

The conventional approach is a central review of the code to ensure compliance with organizational policies. While effective, it is slow: a review can take days or weeks. An alternative is 'Policy as Code', using a framework such as Sentinel. Policies are written as code and evaluated automatically against each proposed change before deployment, which speeds up review and reduces risk.
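
Sentinel has its own policy language, so the following is not Sentinel code. It is a Python stand-in that illustrates the same idea by checking the JSON form of a plan, as produced by 'terraform show -json' on a saved plan file. The two rules (an allow-list of instance types and a required 'owner' tag) and the sample data are hypothetical.

```python
from typing import List

# Hypothetical organizational policy for this sketch.
ALLOWED_INSTANCE_TYPES = {"t3.micro", "t3.small"}
REQUIRED_TAG = "owner"


def violations(plan: dict) -> List[str]:
    """Return one message per policy violation found in a plan's resource changes."""
    found = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if "create" not in actions and "update" not in actions:
            continue  # nothing to check for no-op, read, or delete
        after = rc["change"].get("after") or {}
        if rc["type"] == "aws_instance":
            itype = after.get("instance_type")
            if itype not in ALLOWED_INSTANCE_TYPES:
                found.append(f"{rc['address']}: instance type {itype!r} is not allowed")
        if "tags" in after:  # only resources that expose a tags attribute
            tags = after["tags"] or {}
            if REQUIRED_TAG not in tags:
                found.append(f"{rc['address']}: missing required tag {REQUIRED_TAG!r}")
    return found


# Trimmed-down sample in the shape of a JSON plan (made-up resources).
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_instance.web", "type": "aws_instance",
         "change": {"actions": ["create"],
                    "after": {"instance_type": "t3.micro",
                              "tags": {"owner": "web-team"}}}},
        {"address": "aws_instance.batch", "type": "aws_instance",
         "change": {"actions": ["create"],
                    "after": {"instance_type": "m5.4xlarge", "tags": None}}},
        {"address": "aws_s3_bucket.old", "type": "aws_s3_bucket",
         "change": {"actions": ["delete"], "after": None}},
    ]
}
```

In a pipeline, a check like this would sit between plan and apply, with any non-empty result from `violations` blocking the apply. That replaces a days-long manual review with a check that runs in seconds on every change.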

Conclusion

Scaling Terraform requires decomposing infrastructure code into manageable workspaces, publishing reusable modules through a module registry, and automating policy enforcement. By adopting these practices, development groups can serve themselves quickly, reduce their dependence on central review teams, and deploy changes safely. For more resources on Terraform best practices and tips, visit HashiCorp's Terraform product page.
