Infrastructure as Code: Terraform Best Practices

April 2, 2025 · Simon Balfe

Building maintainable cloud infrastructure with Terraform, from module design to state management in production.

Why Infrastructure as Code Matters

We used to provision infrastructure through the AWS console. Click click click, wait for resources to spin up, forget what we did, and have no way to reproduce it. Then came the inevitable “it works on my account but not yours” conversations.

Terraform changed everything. Our entire infrastructure is now versioned, reviewable, and reproducible.

The Foundation: Project Structure

Here’s how we organize Terraform projects:

infrastructure/
├── modules/
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── ecs-service/
│   └── rds/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── prod/
└── README.md

Key principles:

  • Modules are reusable components
  • Environments consume modules with different configurations
  • Each environment has its own state file

Writing Reusable Modules

Here’s a module for an ECS service:

# modules/ecs-service/main.tf
resource "aws_ecs_task_definition" "app" {
  family                   = var.service_name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = aws_iam_role.execution.arn
  task_role_arn            = aws_iam_role.task.arn

  container_definitions = jsonencode([{
    name  = var.service_name
    image = var.image
    
    portMappings = [{
      containerPort = var.container_port
      protocol      = "tcp"
    }]
    
    environment = [
      for key, value in var.environment : {
        name  = key
        value = value
      }
    ]
    
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.app.name
        "awslogs-region"        = var.aws_region
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}

resource "aws_ecs_service" "app" {
  name            = var.service_name
  cluster         = var.cluster_id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.app.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = var.service_name
    container_port   = var.container_port
  }
}
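
The project structure above lists an outputs.tf for each module, but the post doesn't show one. A minimal sketch for this module, exposing the values an environment layer typically needs (the output names here are assumptions):

```hcl
# modules/ecs-service/outputs.tf
output "service_name" {
  description = "Name of the ECS service"
  value       = aws_ecs_service.app.name
}

output "security_group_id" {
  description = "Security group attached to the service tasks"
  value       = aws_security_group.app.id
}

output "target_group_arn" {
  description = "Target group the service registers with"
  value       = aws_lb_target_group.app.arn
}
```

Environments can then wire modules together, e.g. module.api_service.security_group_id in an RDS ingress rule.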

Variables and Validation

Make your modules self-documenting with good variable definitions:

# modules/ecs-service/variables.tf
variable "service_name" {
  description = "Name of the ECS service"
  type        = string
  
  validation {
    condition     = can(regex("^[a-z0-9-]+$", var.service_name))
    error_message = "Service name must be lowercase alphanumeric with hyphens."
  }
}

variable "cpu" {
  description = "CPU units for the task (256, 512, 1024, 2048, 4096)"
  type        = number
  default     = 256
  
  validation {
    condition     = contains([256, 512, 1024, 2048, 4096], var.cpu)
    error_message = "CPU must be one of: 256, 512, 1024, 2048, 4096."
  }
}

variable "memory" {
  description = "Memory (MB) for the task"
  type        = number
  default     = 512
}

variable "environment" {
  description = "Environment variables for the container"
  type        = map(string)
  default     = {}
}

Using Modules in Environments

# environments/prod/main.tf
module "api_service" {
  source = "../../modules/ecs-service"

  service_name     = "api-prod"
  cluster_id       = aws_ecs_cluster.main.id
  cpu              = 1024
  memory           = 2048
  desired_count    = 5
  image            = "123456789.dkr.ecr.us-east-1.amazonaws.com/api:latest"
  container_port   = 8080
  
  environment = {
    NODE_ENV     = "production"
    DATABASE_URL = data.aws_ssm_parameter.db_url.value
    REDIS_URL    = aws_elasticache_cluster.redis.cache_nodes[0].address
  }
  
  vpc_id             = data.aws_vpc.main.id
  private_subnet_ids = data.aws_subnets.private.ids
}
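
The data sources referenced above (data.aws_vpc.main, data.aws_subnets.private, data.aws_ssm_parameter.db_url) aren't shown in the post; they might look like this, where the tag values and parameter name are assumptions to adapt to your setup:

```hcl
# environments/prod/data.tf
data "aws_vpc" "main" {
  tags = {
    Name = "main" # assumed tag; match your VPC's tagging
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  tags = {
    Tier = "private" # assumed tag
  }
}

data "aws_ssm_parameter" "db_url" {
  name = "/prod/database/url" # assumed parameter name
}
```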

Remote State Management

Never store state files locally. Use S3 + DynamoDB for state locking:

# environments/prod/backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-locks"
  }
}

Create the S3 bucket and DynamoDB table first:

# bootstrap/main.tf
resource "aws_s3_bucket" "terraform_state" {
  bucket = "mycompany-terraform-state"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
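
State files can contain sensitive values (connection strings, generated passwords), so it's also worth blocking public access on the state bucket as part of the bootstrap; a sketch:

```hcl
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```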

Managing Secrets

Never commit secrets to version control. Use AWS Parameter Store or Secrets Manager:

# Store secret in AWS Secrets Manager
resource "aws_secretsmanager_secret" "db_password" {
  name = "prod/database/password"
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db_password.result
}

# Reference in ECS task
container_definitions = jsonencode([{
  name = "app"
  
  secrets = [{
    name      = "DATABASE_PASSWORD"
    valueFrom = aws_secretsmanager_secret.db_password.arn
  }]
}])
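
For the secrets block above to work, the ECS task execution role also needs permission to read the secret. A sketch, assuming the execution role defined in the module earlier:

```hcl
resource "aws_iam_role_policy" "execution_secrets" {
  name = "read-db-password"
  role = aws_iam_role.execution.id # execution role from the module

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["secretsmanager:GetSecretValue"]
      Resource = [aws_secretsmanager_secret.db_password.arn]
    }]
  })
}
```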

Terraform Workspaces (Use Sparingly)

Workspaces seem appealing but can cause issues:

# Create workspace
terraform workspace new staging

# Switch workspace
terraform workspace select prod

When to use:

  • Quick dev/test environments
  • Temporary infrastructure

When NOT to use:

  • Production infrastructure
  • Environments with different configurations

For prod, use separate directories with separate state files.
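
For the quick dev/test case where workspaces do fit, the current workspace name is exposed as terraform.workspace, which is handy for naming throwaway resources; a sketch (the AMI data source is assumed):

```hcl
resource "aws_instance" "test" {
  ami           = data.aws_ami.ubuntu.id # assumes an Ubuntu AMI data source
  instance_type = "t3.micro"

  tags = {
    Name = "test-${terraform.workspace}" # e.g. "test-staging"
  }
}
```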

Plan and Apply Workflow

Always review changes before applying:

# Initialize (first time or after adding providers)
terraform init

# Format code
terraform fmt -recursive

# Validate configuration
terraform validate

# Plan changes
terraform plan -out=tfplan

# Review the plan carefully!

# Apply the plan
terraform apply tfplan

CI/CD Integration

Our GitHub Actions workflow for Terraform:

name: Terraform

on:
  pull_request:
    paths:
      - 'infrastructure/**'
  push:
    branches:
      - main

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infrastructure/environments/prod
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      
      - name: Terraform Init
        run: terraform init
      
      - name: Terraform Format
        run: terraform fmt -check
      
      - name: Terraform Validate
        run: terraform validate
      
      - name: Terraform Plan
        run: terraform plan -no-color
      
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply -auto-approve

Import Existing Resources

Already have infrastructure? Import it:

# Import existing EC2 instance
terraform import aws_instance.example i-1234567890abcdef

# Import S3 bucket
terraform import aws_s3_bucket.example my-bucket-name

# Import RDS instance
terraform import aws_db_instance.example mydb

Write the resource definition, then import its state.
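
Since Terraform 1.5 you can also declare imports in configuration instead of running CLI commands, and have Terraform generate the resource definition for you:

```hcl
# Declarative import (Terraform 1.5+)
import {
  to = aws_s3_bucket.example
  id = "my-bucket-name"
}
```

Then run terraform plan -generate-config-out=generated.tf to produce a starting resource block, review it, and apply.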

Handling State Drift

Resources get modified outside Terraform (console, CLI). Detect drift:

# Review drift without modifying state (replaces the deprecated `terraform refresh`)
terraform plan -refresh-only

# See what changed
terraform plan

Common Pitfalls

1. Hardcoded Values

# ❌ Bad: Hardcoded values
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}

# ✅ Good: Use variables and data sources
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = var.instance_type
}
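
The var.instance_type referenced above would be declared alongside, with a sensible default (the default shown is an assumption):

```hcl
variable "instance_type" {
  description = "EC2 instance type for the web server"
  type        = string
  default     = "t3.micro"
}
```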

2. Not Using Lifecycle Blocks

Prevent accidental resource deletion:

resource "aws_db_instance" "prod" {
  # ... configuration ...

  lifecycle {
    prevent_destroy = true
    
    ignore_changes = [
      password,  # Don't detect drift in passwords
    ]
  }
}
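
Another lifecycle setting worth knowing is create_before_destroy, which builds the replacement resource before tearing down the old one when a change forces replacement; a sketch:

```hcl
resource "aws_launch_template" "web" {
  # ... configuration ...

  lifecycle {
    create_before_destroy = true
  }
}
```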

3. Large Blast Radius

Keep environments separate. A bug in dev config shouldn’t affect prod.

Terraform vs Alternatives

CloudFormation:

  • AWS-only
  • Verbose YAML
  • Better AWS support (new services)

Pulumi:

  • Real programming languages
  • Good for complex logic
  • Steeper learning curve

CDK:

  • Synthesizes to CloudFormation
  • Type-safe
  • AWS-focused

Terraform wins for:

  • Multi-cloud
  • Mature ecosystem
  • Gentle learning curve
  • Large community

Real-World Impact

After adopting Terraform:

  • Infrastructure changes: From hours to minutes
  • Environment consistency: Dev/staging/prod built from the same modules
  • Disaster recovery: Entire infrastructure rebuildable from code
  • Collaboration: Infrastructure changes go through PR review
  • Cost visibility: Easy to see what resources cost

Conclusion

Infrastructure as Code isn’t optional anymore. Terraform provides:

  • Version control for infrastructure
  • Reproducible environments
  • Change review process
  • Documentation through code

Start small. Terraform one service. Learn the patterns. Then expand. Your infrastructure will become more reliable, maintainable, and understandable.

The upfront investment in learning Terraform pays dividends every time you need to provision, modify, or understand your infrastructure.