OpenTofu Foundations: Multi-Environment Management with OpenTofu (Part 7)
In this session we will learn how to manage multiple deployments with the same OpenTofu configuration. We will start with workspaces, OpenTofu's built-in mechanism for managing multiple state files from a single configuration. Next we will cover some tricks to make working with workspaces less error prone. Lastly we will look at Checkov and how to implement security and policy scanning to ensure your deployments meet your compliance needs.
This is part 7 of a 10-week workshop. Check out part 6 or watch the recorded session here.
Why Bother with IaC?
To this point we have covered a lot of the how but have neglected the why. It is important to remind ourselves how we ended up with this technology in the first place. If we forget the horrors of the past, they will come back to bite us.
A Trip In the Technology Time Machine
The story really starts in 1993 with the release of CFEngine. Mark Burgess was managing a collection of workstations in a physics lab, and his time was being eaten up manually fixing and scripting one-off solutions on machines running different flavors of UNIX. To manage the problem, Mark created a declarative system that let him describe the desired working state of a system, and CFEngine would make it so. This was the first instance of configuration as code.
Not until 2009, with the release of Chef, did this declarative approach to server management gain wide adoption. Chef extended the declarative DSL with a full programming language, Ruby. Chef was built with a client/server architecture, a perfect fit for the dawn of EC2 three years prior, when servers stopped being hardware sitting in your data center.
AWS continued to force innovation with the release of managed services and serverless. Now configuration could be spread across your entire cloud account: Linux sysadmin problems at data center scale. Not surprisingly, the first release of Terraform and public access to AWS Lambda happened within months of each other in 2014.
With more and more smarts contained in sophisticated orchestration systems like Kubernetes and ECS, declarative management is absolutely crucial.
Why Not Click Around In The Console?
Cloud native systems are complex. The time between first creating a staging environment and a full production release can be months or years. Will you really remember which buttons you pushed? Which IAM roles were necessary? Did you open that security group up to the world for a reason? You need reproducibility in the cloud. Everything is ephemeral, so you should be able to spin up a whole environment if something malfunctions in a region. In the middle of an outage, are you going to remember the magical console incantation? Probably not.
The other major advantage of IaC and the tools we will cover today is parity. When working with dev, staging, and production, you want confidence that the changes validated in staging will actually work in production. To ensure that is the case, you want your environments to be structurally the same. However, you do not want your internal testing environments running at the same scale that powers production. This is where OpenTofu workspaces shine.
cd week-7/code
tofu init -var-file default.auto.tfvars
As we can see, our backend is remote and we have minimal changes.
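For reference, the S3 backend configured in the previous session looks roughly like this sketch; the bucket name is a placeholder for your own, and the exact state filename may differ in your setup:

terraform {
  backend "s3" {
    bucket = "your-state-bucket" # placeholder; use the bucket from part 6
    key    = "wordpress/terraform.tfstate"
    region = "us-west-2"
  }
}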
tofu workspace list
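The output should look like the following; the asterisk marks the currently selected workspace:

* default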
As you can see, OpenTofu automatically creates a default workspace. When handling a single deployment of your IaC, this is sufficient. We, however, will be managing staging and production. Let's create our first workspace for staging.
tofu workspace new staging
tofu plan -var-file default.auto.tfvars
As we can see, OpenTofu wants to create all of our resources over again. Workspaces prefix the key in our backend configuration: the key is now prefixed with our workspace name, giving us the ability to stamp out duplicate infrastructure in parity. We cannot run an apply yet, though, because it would fail due to naming collisions. Let's handle this by creating a new tfvars file for our workspace.
touch staging.auto.tfvars
echo 'name_prefix = "wk7-dave"' >> staging.auto.tfvars
tofu plan -var-file staging.auto.tfvars
This fixed the naming collisions we were going to see, but the tags are wrong, as this is no longer the dev environment. Using the terraform.workspace value, we can tag resources with the environment that owns them. We can add a locals block with a ternary to derive the environment name; this also keeps backwards compatibility with the default workspace.
locals {
  environment              = terraform.workspace == "default" ? "dev" : terraform.workspace
  environment_name_segment = terraform.workspace == "default" ? "" : "${terraform.workspace}-"
}
Now we can set the environment tag using our local so it reflects which workspace owns the resource:
provider "aws" {
region = "us-west-2"
default_tags {
tags = {
environment = local.environment
project = "opentofu-foundations"
}
}
}
Let's add the workspace name to the aws_db_instance and aws_instance names as well:
module "aws_db_instance" {
source = "github.com/massdriver-modules/otf-shared-modules//modules/aws_db_instance?ref=main"
name_prefix = "${var.name_prefix}-${local.environment_name_segment}db"
db_name = "wordpress"
username = "admin"
password = "yourpassword" # In production, use a secure method for passwords
tags = {
Owner = "YourName"
}
}
module "aws_instance" {
source = "github.com/massdriver-modules/otf-shared-modules//modules/aws_instance?ref=main"
name_prefix = "${var.name_prefix}-${local.environment_name_segment}instance"
instance_type = "t2.micro"
user_data = <<-EOF
#!/bin/bash
yum update -y
amazon-linux-extras install docker -y
service docker start
usermod -a -G docker ec2-user
docker run -d \
-e WORDPRESS_DB_HOST=${module.aws_db_instance.endpoint} \
-e WORDPRESS_DB_USER=${module.aws_db_instance.username} \
-e WORDPRESS_DB_PASSWORD=${module.aws_db_instance.password} \
-e WORDPRESS_DB_NAME=${module.aws_db_instance.db_name} \
-p 80:80 ${var.image.name}:${var.image.tag}
EOF
tags = {
Owner = "YourName"
}
}
Once we have a better differentiator for environments, we can change the name prefix back, since it is effectively a project name that we want kept in parity across environments.
name_prefix = "wk5-cory"
tofu plan -var-file staging.auto.tfvars
We can see the changes reflected in our plan. Let's apply it and look at the state file
tofu apply -var-file staging.auto.tfvars
We now see a new directory in our bucket, env:/. This is the default prefix applied when using non-default workspaces. Inside that directory we see staging/, the name of our workspace. Nested under the workspace identifier we see the wordpress/ directory, which begins the key we set in the s3 backend configuration. The default workspace prefix can be overridden in the backend configuration with the workspace_key_prefix attribute.
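A sketch of that override, with the same placeholder bucket name as before:

terraform {
  backend "s3" {
    bucket               = "your-state-bucket"
    key                  = "wordpress/terraform.tfstate"
    region               = "us-west-2"
    workspace_key_prefix = "environments" # replaces the default "env:" prefix
  }
}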
Making Prod
Just like staging, let's make a production workspace:
tofu workspace new production
touch production.auto.tfvars
echo 'name_prefix = "wk5-cory"' >> production.auto.tfvars
tofu plan -var-file production.auto.tfvars
Now that we have a production workspace, we will want to serve our WordPress blog on a larger instance than our internal staging deployment. Let's declare variables so we can set the instance types per environment:
variable "wordpress_instance_type" {
  type = string
}

variable "db_instance_class" {
  type = string
}
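These variables have no defaults, so each workspace's tfvars file must set them. If you want to guard against typos, OpenTofu supports validation blocks; here is an optional sketch, not part of the workshop code:

variable "wordpress_instance_type" {
  type        = string
  description = "EC2 instance type for the WordPress server"

  validation {
    condition     = can(regex("^t[23]\\.", var.wordpress_instance_type))
    error_message = "Instance type must be in the t2 or t3 family."
  }
}

With the variables declared, wire them into the modules: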
module "aws_db_instance" {
source = "github.com/massdriver-modules/otf-shared-modules//modules/aws_db_instance?ref=main"
name_prefix = "${var.name_prefix}-${local.environment_name_segment}db"
db_name = "wordpress"
username = "admin"
password = "yourpassword" # In production, use a secure method for passwords
instance_class = var.db_instance_class
tags = {
Owner = "YourName"
}
}
module "aws_instance" {
source = "github.com/massdriver-modules/otf-shared-modules//modules/aws_instance?ref=main"
name_prefix = "${var.name_prefix}-${local.environment_name_segment}instance"
instance_type = var.wordpress_instance_type
user_data = <<-EOF
#!/bin/bash
yum update -y
amazon-linux-extras install docker -y
service docker start
usermod -a -G docker ec2-user
docker run -d \
-e WORDPRESS_DB_HOST=${module.aws_db_instance.endpoint} \
-e WORDPRESS_DB_USER=${module.aws_db_instance.username} \
-e WORDPRESS_DB_PASSWORD=${module.aws_db_instance.password} \
-e WORDPRESS_DB_NAME=${module.aws_db_instance.db_name} \
-p 80:80 ${var.image.name}:${var.image.tag}
EOF
tags = {
Owner = "YourName"
}
}
Now add the values to your default tfvars file to verify backwards compatibility in the default workspace:
name_prefix = "wk5-cory"
wordpress_instance_type = "t2.micro"
db_instance_class = "db.t3.micro"
tofu workspace select default
tofu plan -var-file default.auto.tfvars
Looks like it will not make any changes in default. Let's do the same in staging.
name_prefix = "wk5-cory"
wordpress_instance_type = "t2.micro"
db_instance_class = "db.t3.micro"
tofu workspace select staging
tofu plan -var-file staging.auto.tfvars
With the plan clean in staging, we can do the same in production. Let's also increase the size of the instances in production:
name_prefix = "wk5-cory"
wordpress_instance_type = "t3.micro"
db_instance_class = "db.t3.small"
tofu workspace select production
tofu plan -var-file production.auto.tfvars
Workspace Automation
As you can see, we are doing a lot of juggling: multiple tfvars files, multiple state files. This is a recipe for a serious problem. Let's do some quick automation to make this whole thing easier. The quickest path is our old friend Make.
touch Makefile
.PHONY: default.plan staging.plan prod.plan
default.plan:
	tofu workspace select default
	tofu plan -var-file=default.auto.tfvars

staging.plan:
	tofu workspace select staging
	tofu plan -var-file=staging.auto.tfvars

prod.plan:
	tofu workspace select production
	tofu plan -var-file=production.auto.tfvars

.PHONY: default.deploy staging.deploy prod.deploy
default.deploy:
	tofu workspace select default
	tofu apply -var-file=default.auto.tfvars

staging.deploy:
	tofu workspace select staging
	tofu apply -var-file=staging.auto.tfvars

prod.deploy:
	tofu workspace select production
	tofu apply -var-file=production.auto.tfvars
We have now created an abstraction around the OpenTofu commands that ensures, at a minimum, that the correct variable file is used with the correct workspace, reducing the chance of user error. Let's test the functionality by planning default:
make default.plan
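As an aside: if the target names matched the workspace names exactly (production.plan instead of prod.plan), GNU Make pattern rules could collapse all of these targets into two recipes. A hypothetical sketch:

# One rule handles default.plan, staging.plan, production.plan, etc.
# $* expands to the stem matched by %.
%.plan:
	tofu workspace select $*
	tofu plan -var-file=$*.auto.tfvars

%.deploy:
	tofu workspace select $*
	tofu apply -var-file=$*.auto.tfvars

We will stick with the explicit targets for the rest of this session, since we are about to customize them per environment.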
Governance
Now that we have the ability to deploy to multiple environments, we need some form of governance in place to ensure that our infrastructure is secure in production. We can also implement checks in non-production for things like instance types, so large instances aren't running up the AWS bill when they are not required.
To enable these checks we will use Checkov in our pipelines. Checkov works by taking a plan generated by OpenTofu and evaluating it against a set of predefined security rules. Let's add Checkov now.
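If you don't already have Checkov installed, it is distributed as a Python package:

pip install checkov

With Checkov available, update the prod.plan target to scan the plan output: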
prod.plan:
	tofu workspace select production
	tofu plan -var-file=production.auto.tfvars -out ./production.plan
	tofu show -json production.plan > production.json
	checkov -f production.json
	rm production.json production.plan
make prod.plan
As we can see, a bunch of checks are run, some of which pass and some of which fail. This gives us a clear picture of our security posture. Let's skip the failing checks in our default workspace.
default.plan:
	tofu workspace select default
	tofu plan -var-file=default.auto.tfvars -out ./default.plan
	tofu show -json default.plan > default.json
	checkov -f default.json \
		--skip-check CKV_AWS_16,CKV_AWS_133,CKV_AWS_293,CKV_AWS_354,CKV_AWS_129,CKV_AWS_157,CKV_AWS_118,CKV_AWS_79,CKV_AWS_8,CKV_AWS_88,CKV_AWS_135,CKV_AWS_126,CKV_AWS_24,CKV_AWS_260,CKV2_AWS_60
	rm default.json default.plan
Let's run it now:
make default.plan
All remaining checks pass, which is good enough for a dev environment.
Let's add a custom policy that requires t2.micro instances in non-production environments.
touch preprod-policy.yaml
metadata:
  id: "CKV2_PRE_PROD_INSTANCE_TYPE"
  name: "Ensure instance types are t2.micro"
  category: "CONVENTION"
  severity: "HIGH"
definition:
  cond_type: "attribute"
  resource_types:
    - "aws_instance"
  attribute: "instance_type"
  operator: "equals"
  value: "t2.micro"
Let's update the Makefile to use this new policy for staging. We pass --soft-fail so a failing check is reported without blocking the plan:
staging.plan:
	tofu workspace select staging
	tofu plan -var-file=staging.auto.tfvars -out ./staging.plan
	tofu show -json staging.plan > staging.json
	checkov -f staging.json \
		--soft-fail \
		--external-checks-dir . \
		--run-all-external-checks \
		--skip-check CKV_AWS_16,CKV_AWS_133,CKV_AWS_293,CKV_AWS_354,CKV_AWS_129,CKV_AWS_157,CKV_AWS_118,CKV_AWS_79,CKV_AWS_8,CKV_AWS_88,CKV_AWS_135,CKV_AWS_126,CKV_AWS_24,CKV_AWS_260,CKV2_AWS_60
	rm staging.json staging.plan
Finally, update your staging tfvars to violate this policy:
wordpress_instance_type = "t3.micro"
make staging.plan
As we can see, our custom policy check has failed; thanks to --soft-fail, the make target still succeeds:
Check: CKV2_PRE_PROD_INSTANCE_TYPE: "Ensure instance types are t2.micro"
	FAILED for resource: module.aws_instance.aws_instance.this[0]
	Severity: HIGH
	File: /staging.json:0-0
The Path to a Platform
As you can see, all of this workspace juggling is a challenge. Imagine having to manage an unknown number of environments for feature branches or preview environments. Imagine you are tasked with creating blueprints for a company that buys media companies and runs thousands of WordPress instances. What would you have to do in that case?
- We will no longer have the bandwidth to be involved in every new infrastructure deployment. We will need a central hub where others can securely deploy infrastructure.
- We don't want to be in the business of managing vars files for each environment. How can we remove that from our purview?
- We need to maximize OpenTofu's capabilities as blueprints. We should be able to stamp out well crafted infrastructure without having to write more OpenTofu.
- We need to eliminate pipelines as a blocker to delivering infrastructure.
- We need to package security and validation in our blueprints to ensure end users can have confidence in what they are deploying.