Building a Production-Ready EKS Infrastructure for SaaS on AWS: Part 1 - The VPC Foundation
This is the first article in our comprehensive series on building a complete, production-ready infrastructure for running a SaaS application on AWS. By the end of this series, you'll have a fully functional, secure and scalable Amazon Elastic Kubernetes Service (EKS) cluster along with several essential components and tools.
In this initial article, we'll focus on laying the foundation - constructing a robust Virtual Private Cloud (VPC) that will host our entire infrastructure. We'll leverage Terraform to provision a VPC with best-in-class security and scalability features.
Understanding the "why" behind the infrastructure choices is just as important as the "how", so throughout this guide, I'll explain the reasoning and benefits of the various architectural decisions.
So let's start with the foundation: the Virtual Private Cloud (VPC).
Architecture Overview
Our VPC architecture follows AWS best practices with:
- Multi-AZ deployment across 3 availability zones
- Public, private, and intra (isolated) subnets
- NAT Gateways for outbound internet access
- VPC Endpoints for secure AWS service access
- Network flow logs for security monitoring
Understanding the Infrastructure Code
Let's break down each component of our VPC configuration and understand why we're implementing it this way.
1. Base Configuration and Variables
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_caller_identity" "current" {}

locals {
  project_name = "${var.project_name}-${var.environment}"
  azs          = slice(data.aws_availability_zones.available.names, 0, 3)

  tags = {
    project_name = var.project_name
    environment  = var.environment
    admin        = data.aws_caller_identity.current.arn
  }
}
Why?
- We dynamically fetch available AZs instead of hardcoding them
- We use exactly 3 AZs for optimal balance between high availability and cost
- Using slice ensures we don't accidentally deploy to more AZs than needed
How it helps:
- Prevents deployment failures if specific AZs are unavailable
- Ensures consistent high availability across regions
- Controls costs by limiting the number of NAT gateways and other per-AZ resources
- Establishes standardized tagging for resource management
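The input variables referenced throughout this article (var.project_name, var.environment, var.vpc_params) aren't shown in the snippets; a possible variables.tf matching how they are used could look like this (the exact shape of the vpc_params object is my assumption, inferred from the attribute references):

```hcl
# Hypothetical variables.tf; the vpc_params object shape is inferred
# from how its attributes are used in the module calls.
variable "project_name" {
  type        = string
  description = "Short project name used as a naming prefix"
}

variable "environment" {
  type        = string
  description = "Deployment environment, e.g. dev or prod"
}

variable "vpc_params" {
  description = "VPC sizing and feature toggles"
  type = object({
    vpc_cidr               = string
    private_bits           = number
    public_bits            = number
    intra_bits             = number
    enable_flow_log        = bool
    enable_nat_gateway     = bool
    single_nat_gateway     = bool
    one_nat_gateway_per_az = bool
  })
}
```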
2. VPC Configuration
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.1.2"

  name = local.project_name
  cidr = var.vpc_params.vpc_cidr
  azs  = local.azs
  tags = local.tags

  enable_flow_log = var.vpc_params.enable_flow_log

  # ... other configurations ...
}
Key features:
- Uses the official AWS VPC Terraform module
- Applies the shared tags through the module's tags argument, so every resource the module creates carries them
- Captures information about the IP traffic going in and out of the VPC's network interfaces with flow logs
Cost Warning ⚠️
Flow logs can be very costly, so keep them disabled in development environments.
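If you do enable them, the same VPC module can create the CloudWatch log group and the IAM role it writes with. The inputs below exist in recent versions of terraform-aws-modules/vpc, but verify them against the version you pin; a longer aggregation interval and a bounded retention period both help contain cost:

```hcl
module "vpc" {
  # ... other configurations ...

  enable_flow_log                      = var.vpc_params.enable_flow_log
  create_flow_log_cloudwatch_log_group = true
  create_flow_log_cloudwatch_iam_role  = true

  # Aggregate over 10 minutes instead of 1 to cut the number of records.
  flow_log_max_aggregation_interval = 600

  # Expire log data instead of keeping it forever.
  flow_log_cloudwatch_log_group_retention_in_days = 30
}
```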
3. Subnet Design
module "vpc" {
  # ... other configurations ...

  private_subnets = [for k, v in local.azs : cidrsubnet(var.vpc_params.vpc_cidr, var.vpc_params.private_bits, k)]
  public_subnets  = [for k, v in local.azs : cidrsubnet(var.vpc_params.vpc_cidr, var.vpc_params.public_bits, k + 48)]
  intra_subnets   = [for k, v in local.azs : cidrsubnet(var.vpc_params.vpc_cidr, var.vpc_params.intra_bits, k + 64)]

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }
}
Why Three Subnet Tiers?
Public Subnets:
- Used only for internet-facing load balancers
- Why? Minimizes attack surface by limiting internet-accessible resources
Private Subnets:
- Hosts your EKS worker nodes and application pods
- Why? Provides internet access through NAT while protecting from inbound traffic
Intra Subnets:
- Hosts the elastic network interfaces (ENIs) through which the EKS control plane reaches your VPC; these never need internet access. Only the ENIs live here: the control plane itself runs in an AWS-managed account, not in your subnets, as some might assume
- Why? Maximum isolation for the Kubernetes control plane's attachment points to your network
Why Use cidrsubnet?
- Automatically calculates non-overlapping CIDR ranges
- Makes it easy to adjust subnet sizes using variables
- Ensures consistent IP allocation across environments
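To make the arithmetic concrete, here is what those expressions evaluate to for an assumed vpc_cidr of 10.0.0.0/16 with private_bits = 4 and public_bits = intra_bits = 8 (illustrative values only, not prescribed by this article):

```hcl
# cidrsubnet(prefix, newbits, netnum) extends the prefix mask by newbits
# and selects subnet number netnum within the resulting range.
locals {
  demo_private = [for k in range(3) : cidrsubnet("10.0.0.0/16", 4, k)]
  # ["10.0.0.0/20", "10.0.16.0/20", "10.0.32.0/20"]

  demo_public = [for k in range(3) : cidrsubnet("10.0.0.0/16", 8, k + 48)]
  # ["10.0.48.0/24", "10.0.49.0/24", "10.0.50.0/24"]

  demo_intra = [for k in range(3) : cidrsubnet("10.0.0.0/16", 8, k + 64)]
  # ["10.0.64.0/24", "10.0.65.0/24", "10.0.66.0/24"]
}
```

With these values, the +48 and +64 offsets keep the public and intra /24 blocks clear of the three private /20 blocks, which occupy addresses up to 10.0.47.255.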
What Are These Tags?
private_subnet_tags = {
  "kubernetes.io/role/internal-elb" = 1
}

public_subnet_tags = {
  "kubernetes.io/role/elb" = 1
}
Why Do We Need These Tags?
These tags exist for the AWS Load Balancer Controller (AWS LBC) in EKS (we'll revisit it later in the series). They act as automatic subnet discovery markers: the controller looks for kubernetes.io/role/elb when placing internet-facing load balancers and kubernetes.io/role/internal-elb when placing internal ones.
4. NAT Gateway Configuration
module "vpc" {
  # ... other configurations ...

  enable_nat_gateway     = var.vpc_params.enable_nat_gateway
  single_nat_gateway     = var.vpc_params.single_nat_gateway
  one_nat_gateway_per_az = var.vpc_params.one_nat_gateway_per_az
}
Why Configurable NAT Gateways?
- Development environments: Use single_nat_gateway = true to reduce costs
- Production environments: Use one_nat_gateway_per_az = true for high availability
- Why variable? Allows environment-specific optimization of cost vs. reliability
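In practice, the per-environment difference collapses into a few flags in your tfvars files. For example (filenames and CIDR values here are hypothetical):

```hcl
# dev.tfvars: one shared NAT gateway to keep costs down.
vpc_params = {
  vpc_cidr               = "10.0.0.0/16"
  private_bits           = 4
  public_bits            = 8
  intra_bits             = 8
  enable_flow_log        = false
  enable_nat_gateway     = true
  single_nat_gateway     = true
  one_nat_gateway_per_az = false
}
```

```hcl
# prod.tfvars: one NAT gateway per AZ, so a zonal outage cannot take
# down outbound connectivity for the remaining zones.
vpc_params = {
  vpc_cidr               = "10.0.0.0/16"
  private_bits           = 4
  public_bits            = 8
  intra_bits             = 8
  enable_flow_log        = true
  enable_nat_gateway     = true
  single_nat_gateway     = false
  one_nat_gateway_per_az = true
}
```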
5. VPC Endpoints for AWS Services
module "endpoints" {
  source  = "terraform-aws-modules/vpc/aws//modules/vpc-endpoints"
  version = "~> 5.1.1"

  vpc_id = module.vpc.vpc_id

  endpoints = {
    s3 = {
      service         = "s3"
      service_type    = "Gateway"
      route_table_ids = flatten([module.vpc.intra_route_table_ids, module.vpc.private_route_table_ids])
    },
    dynamodb = {
      service      = "dynamodb"
      service_type = "Gateway"
      # Gateway endpoints only take effect once associated with route tables.
      route_table_ids = flatten([module.vpc.intra_route_table_ids, module.vpc.private_route_table_ids])
    },
    sts = {
      service             = "sts"
      private_dns_enabled = true
      subnet_ids          = module.vpc.private_subnets
    },
    ecr_api = {
      service             = "ecr.api"
      private_dns_enabled = true
      subnet_ids          = module.vpc.private_subnets
    },
    ecr_dkr = {
      service             = "ecr.dkr"
      private_dns_enabled = true
      subnet_ids          = module.vpc.private_subnets
    }
  }
}
Why VPC Endpoints?
Security:
- Traffic to AWS services never leaves AWS network
- Removes need for internet access for AWS API calls
Performance:
- Direct connection to AWS services
- Lower latency than internet-based API calls
Cost:
- Reduces NAT gateway traffic
- Can lower data transfer costs
Cost Warning ⚠️
Interface endpoints incur hourly costs along with additional data processing charges. In contrast, gateway endpoints do not use AWS PrivateLink and have no additional charges for usage.
Why These Specific Endpoints?
- S3: Required for pulling container image layers (ECR stores layers in S3) and for accessing application assets
- ECR: Needed to pull container images without traversing the internet
- STS: Required for IAM authentication, including IAM Roles for Service Accounts (IRSA)
- DynamoDB: Often used as the NoSQL database for serverless applications
6. VPC Endpoint Policy
data "aws_iam_policy_document" "endpoint_policy" {
  statement {
    # Allow requests only when they originate from this VPC; everything
    # else falls through to IAM's implicit deny. Endpoint policies also
    # require an explicit Principal element.
    effect    = "Allow"
    actions   = ["*"]
    resources = ["*"]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    condition {
      test     = "StringEquals"
      variable = "aws:SourceVpc"
      values   = [module.vpc.vpc_id]
    }
  }
}
Why This Policy?
- Allows access only to requests originating from within our VPC
- Anything not explicitly allowed is denied by default
- Prevents any potential cross-VPC access
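The policy document only takes effect once it is attached to the endpoints. The vpc-endpoints module accepts a policy per endpoint entry; a sketch for the STS endpoint (the other interface and gateway endpoints follow the same pattern):

```hcl
module "endpoints" {
  source  = "terraform-aws-modules/vpc/aws//modules/vpc-endpoints"
  version = "~> 5.1.1"

  vpc_id = module.vpc.vpc_id

  endpoints = {
    sts = {
      service             = "sts"
      private_dns_enabled = true
      subnet_ids          = module.vpc.private_subnets
      # Reject any request that does not originate from this VPC.
      policy = data.aws_iam_policy_document.endpoint_policy.json
    }
    # ... other endpoints as defined earlier ...
  }
}
```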
What's Next?
In the next article of this series, we'll build upon this VPC foundation to:
- Provision an EKS cluster
- Install essential cluster components and tools
- Set up cluster autoscaling
Our VPC configuration isn't just a network; it's the security foundation for our entire Kubernetes infrastructure. Each design decision supports security, scalability, or operational efficiency (often all three), giving us the robust foundation we need for a production-grade EKS cluster.
Stay tuned for Part 2, where we'll dive into EKS cluster configuration and explain how it integrates with the VPC we've just built!
Remember to adjust the CIDR ranges and NAT gateway configuration based on your specific requirements and budget considerations. The provided code can be easily customized through variables while maintaining the secure architecture patterns.