Integrate Hetzner Cloud Metrics into AWS CloudWatch with Terraform

In modern cloud architectures, organizations often leverage multiple cloud providers to optimize costs, avoid vendor lock-in, or utilize specific regional advantages. Hetzner Cloud offers exceptional price-performance for compute and networking, while AWS provides unmatched breadth in managed services and global infrastructure.

However, managing a hybrid cloud environment introduces a critical challenge: fragmented monitoring. Your operations team needs to context-switch between Hetzner’s monitoring interface and AWS CloudWatch, making it difficult to establish unified alerting, dashboards, and observability practices.

In this comprehensive guide, I’ll show you how to integrate Hetzner Cloud metrics into AWS CloudWatch using a serverless Terraform module that leverages Step Functions, EventBridge, and the Hetzner Cloud API.

The Hybrid Cloud Monitoring Challenge

The Problem

Organizations running hybrid infrastructure across Hetzner and AWS face several monitoring challenges:

1. Monitoring Fragmentation

Hetzner metrics in Hetzner Cloud Console
AWS metrics in CloudWatch
No single pane of glass for infrastructure health
Different metric formats and retention policies

2. Alerting Complexity

Separate alerting systems (Hetzner vs CloudWatch Alarms)
Inconsistent notification channels
Difficult to correlate cross-cloud issues
Complex on-call workflows

3. Compliance & Governance

Centralized logging requirements for auditing
Unified retention policies across providers
Consolidated cost tracking for observability

4. Operational Overhead

Training teams on multiple monitoring platforms
Maintaining separate dashboards and runbooks
Higher MTTR (Mean Time To Resolution) for incidents

The Solution: Centralized Monitoring in CloudWatch

By integrating Hetzner metrics into AWS CloudWatch, you gain:

Unified Dashboards: Single CloudWatch dashboard showing all infrastructure
Consistent Alerting: CloudWatch Alarms for both AWS and Hetzner resources
Correlation: Easy correlation between Hetzner and AWS metrics
Automation: Leverage AWS native tools (Lambda, EventBridge, SNS) for all alerting
Cost Efficiency: Consolidated observability stack reduces tool sprawl
Compliance: Centralized metric storage meeting retention requirements

Architecture Overview

The solution uses a serverless, event-driven architecture to periodically fetch metrics from Hetzner Cloud API and publish them to CloudWatch:

┌─────────────────────────────────────────────────────────────────┐
│                    Hetzner Cloud (External)                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
│  │ Load Balancer│  │   Server 1   │  │   Server 2   │          │
│  │  (Metrics)   │  │  (Metrics)   │  │  (Metrics)   │          │
│  └──────────────┘  └──────────────┘  └──────────────┘          │
└────────────────────┬────────────────────────────────────────────┘
                     │ Hetzner Cloud API
                     │ (Bearer Token Auth)
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                         AWS Account                              │
│                                                                  │
│  ┌───────────────────────────────────────────────────────┐     │
│  │          EventBridge Scheduler (Optional)              │     │
│  │        Trigger: rate(5 minutes) or cron               │     │
│  └───────────────────┬───────────────────────────────────┘     │
│                      │                                           │
│                      ▼                                           │
│  ┌───────────────────────────────────────────────────────┐     │
│  │           AWS Step Functions State Machine             │     │
│  │  ┌─────────────────────────────────────────────────┐  │     │
│  │  │ 1. Retrieve API Token from Secrets Manager      │  │     │
│  │  │    (via EventBridge Connection)                 │  │     │
│  │  └─────────────────────────────────────────────────┘  │     │
│  │  ┌─────────────────────────────────────────────────┐  │     │
│  │  │ 2. Fetch Metrics from Hetzner Cloud API         │  │     │
│  │  │    - GET /load_balancers/{id}/metrics           │  │     │
│  │  │    - GET /servers/{id}/metrics                  │  │     │
│  │  └─────────────────────────────────────────────────┘  │     │
│  │  ┌─────────────────────────────────────────────────┐  │     │
│  │  │ 3. Transform Metrics to CloudWatch Format       │  │     │
│  │  └─────────────────────────────────────────────────┘  │     │
│  │  ┌─────────────────────────────────────────────────┐  │     │
│  │  │ 4. Publish to CloudWatch Custom Metrics         │  │     │
│  │  │    - Namespace: HetznerLoadBalancer/HetznerServer│ │     │
│  │  │    - Dimensions: LoadBalancerId/ServerId        │  │     │
│  │  └─────────────────────────────────────────────────┘  │     │
│  └───────────────────────────────────────────────────────┘     │
│                      │                                           │
│                      ▼                                           │
│  ┌───────────────────────────────────────────────────────┐     │
│  │              AWS CloudWatch Metrics                    │     │
│  │  • HetznerLoadBalancer/* (connections, requests, BW)   │     │
│  │  • HetznerServer/* (CPU, disk I/O, network)           │     │
│  └───────────────────────────────────────────────────────┘     │
│                                                                  │
│  ┌───────────────────────────────────────────────────────┐     │
│  │         CloudWatch Dashboards & Alarms                 │     │
│  │  Unified view of AWS + Hetzner infrastructure         │     │
│  └───────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘

Key Components

EventBridge Connection: Securely stores Hetzner Cloud API token using AWS Secrets Manager
EventBridge Scheduler: Triggers metric collection on a schedule (e.g., every 5 minutes)
Step Functions State Machine: Orchestrates the metric collection workflow
Hetzner Cloud API: Source of truth for infrastructure metrics
CloudWatch Custom Metrics: Storage for Hetzner metrics in AWS
IAM Roles: Least-privilege permissions for Step Functions execution

Supported Hetzner Metrics

Load Balancer Metrics

The module collects comprehensive load balancer performance data:

Metric Name	CloudWatch Metric	Description	Unit
`open_connections`	OpenConnections	Current number of active connections	Count
`connections_per_second`	ConnectionsPerSecond	New connections established per second	Count/Second
`requests_per_second`	RequestsPerSecond	HTTP/HTTPS requests handled per second	Count/Second
`bandwidth.in`	BandwidthIn	Inbound network bandwidth	Bytes/Second
`bandwidth.out`	BandwidthOut	Outbound network bandwidth	Bytes/Second

Example Use Cases:

Monitor connection pool saturation
Alert on traffic spikes
Track request rate patterns
Identify bandwidth bottlenecks
Capacity planning for load balancers

Server Metrics

Comprehensive server performance and resource utilization:

Metric Name	CloudWatch Metric	Description	Unit
`cpu`	CPUUtilization	CPU usage percentage	Percent
`disk.0.iops.read`	DiskReadIOPS	Disk read operations per second	Count/Second
`disk.0.iops.write`	DiskWriteIOPS	Disk write operations per second	Count/Second
`disk.0.bandwidth.read`	DiskReadBandwidth	Disk read throughput	Bytes/Second
`disk.0.bandwidth.write`	DiskWriteBandwidth	Disk write throughput	Bytes/Second
`network.0.bandwidth.in`	NetworkIn	Inbound network bandwidth	Bytes/Second
`network.0.bandwidth.out`	NetworkOut	Outbound network bandwidth	Bytes/Second
`network.0.pps.in`	NetworkPacketsIn	Inbound packets per second	Count/Second
`network.0.pps.out`	NetworkPacketsOut	Outbound packets per second	Count/Second

Example Use Cases:

Monitor CPU utilization trends
Alert on disk I/O saturation
Track network throughput
Identify noisy neighbor issues
Optimize instance sizing

Quick Start: Deploy in 10 Minutes

Prerequisites

Before deploying, ensure you have:

Hetzner Cloud Account with API access
Hetzner Cloud API Token (Read-only permissions sufficient)
AWS Account with permissions for:
- Step Functions
- CloudWatch
- EventBridge
- IAM
- Secrets Manager
Terraform >= 1.0 installed locally

Step 1: Generate Hetzner Cloud API Token

Create a read-only API token in Hetzner Cloud Console:

# Navigate to: https://console.hetzner.cloud/
# → Select your project
# → Security → API Tokens
# → Generate API Token
#   - Description: "CloudWatch Integration (Read-Only)"
#   - Permissions: Read
#   - Copy the token (you won't see it again!)

Security Best Practice: Use read-only tokens. The integration only needs to fetch metrics, not modify resources.

Step 2: Get Hetzner Resource IDs

Find your load balancer or server IDs:

# Install Hetzner CLI (optional)
brew install hcloud  # macOS
# Or download from: https://github.com/hetznercloud/cli/releases

# Login
hcloud context create monitoring

# List load balancers
hcloud load-balancer list

# Output:
# ID        NAME            IPV4              TYPE    LOCATION
# 123456    production-lb   95.217.1.100      lb11    nbg1

# List servers
hcloud server list

# Output:
# ID        NAME         STATUS    IPV4             TYPE      DATACENTER
# 789012    web-01       running   95.217.1.101     cx21      nbg1-dc3
# 789013    web-02       running   95.217.1.102     cx21      nbg1-dc3

Note the resource IDs (e.g., 123456 for load balancer, 789012 for server).

Step 3: Create Terraform Configuration

Create a new directory and main.tf:

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "eu-central-1"  # Frankfurt - close to Hetzner Germany
}

# Store Hetzner API token securely
variable "hetzner_api_token" {
  description = "Hetzner Cloud API token (read-only)"
  type        = string
  sensitive   = true
}

# Monitor Hetzner Load Balancer
module "hetzner_lb_metrics" {
  source  = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  version = "~> 1.0"

  # API Authentication
  hetzner_cloud_api_token = var.hetzner_api_token

  # Metric Configuration
  metric_type = "loadbalancer"
  name        = "production-lb-metrics"

  # Resource to Monitor
  data = {
    loadbalancer_id = "123456"  # Replace with your LB ID
  }

  # Scheduling
  create_scheduler    = true
  schedule_expression = "rate(5 minutes)"

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
    Purpose     = "hetzner-monitoring"
  }
}

# Monitor Hetzner Server
module "hetzner_server_metrics" {
  source  = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  version = "~> 1.0"

  # Reuse the same EventBridge Connection (API token) from LB module
  create_event_connection = false
  event_connection_arn    = module.hetzner_lb_metrics.event_connection_arn

  # Metric Configuration
  metric_type = "server"
  name        = "web-01-metrics"

  # Resource to Monitor
  data = {
    server_id = "789012"  # Replace with your server ID
  }

  # Scheduling
  create_scheduler    = true
  schedule_expression = "rate(5 minutes)"

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
    ServerName  = "web-01"
  }
}

# Outputs
output "lb_state_machine_arn" {
  value       = module.hetzner_lb_metrics.state_machine_arn
  description = "Step Functions state machine ARN for load balancer metrics"
}

output "server_state_machine_arn" {
  value       = module.hetzner_server_metrics.state_machine_arn
  description = "Step Functions state machine ARN for server metrics"
}

output "cloudwatch_namespaces" {
  value = {
    loadbalancer = "HetznerLoadBalancer"
    server       = "HetznerServer"
  }
  description = "CloudWatch namespaces where metrics are published"
}

Create terraform.tfvars:

hetzner_api_token = "your-hetzner-api-token-here"

Security Note: Add terraform.tfvars to .gitignore to prevent committing secrets.

Step 4: Deploy the Integration

# Initialize Terraform
terraform init

# Review planned changes
terraform plan

# Deploy the integration
terraform apply -auto-approve

Deployment takes approximately 1-2 minutes and creates:

2 EventBridge Connections (or 1 if reusing)
2 Step Functions State Machines
2 EventBridge Scheduler rules
IAM roles and policies
CloudWatch Log Groups

Step 5: Verify Metrics Collection

Wait 5-10 minutes for the first metrics to appear, then verify:

# List available metrics for load balancer
aws cloudwatch list-metrics \
  --namespace HetznerLoadBalancer \
  --dimensions Name=LoadBalancerId,Value=123456

# Expected output:
# {
#     "Metrics": [
#         {
#             "Namespace": "HetznerLoadBalancer",
#             "MetricName": "OpenConnections",
#             "Dimensions": [{"Name": "LoadBalancerId", "Value": "123456"}]
#         },
#         {
#             "Namespace": "HetznerLoadBalancer",
#             "MetricName": "RequestsPerSecond",
#             "Dimensions": [{"Name": "LoadBalancerId", "Value": "123456"}]
#         }
#         // ... more metrics
#     ]
# }

# Get recent CPU metrics for server
aws cloudwatch get-metric-statistics \
  --namespace HetznerServer \
  --metric-name CPUUtilization \
  --dimensions Name=ServerId,Value=789012 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average

# Expected output:
# {
#     "Datapoints": [
#         {
#             "Timestamp": "2026-02-16T10:00:00Z",
#             "Average": 23.5,
#             "Unit": "Percent"
#         },
#         {
#             "Timestamp": "2026-02-16T10:05:00Z",
#             "Average": 25.1,
#             "Unit": "Percent"
#         }
#         // ... more datapoints
#     ]
# }

# Check Step Functions execution history
aws stepfunctions list-executions \
  --state-machine-arn $(terraform output -raw lb_state_machine_arn) \
  --max-results 5

Advanced Configuration Patterns

Pattern 1: Multi-Server Monitoring with Shared Connection

Monitor multiple servers efficiently by reusing a single EventBridge Connection:

# Create shared connection
module "hetzner_connection" {
  source  = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  version = "~> 1.0"

  hetzner_cloud_api_token = var.hetzner_api_token

  # Only create the connection, no scheduler
  create_scheduler = false
  metric_type      = "server"
  name             = "shared-connection"

  data = {
    server_id = "0"  # Placeholder
  }
}

# Monitor multiple servers using the shared connection
module "web_servers" {
  source   = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  version  = "~> 1.0"
  for_each = toset(["789012", "789013", "789014"])

  # Reuse connection
  create_event_connection = false
  event_connection_arn    = module.hetzner_connection.event_connection_arn

  metric_type = "server"
  name        = "web-server-${each.key}"

  data = {
    server_id = each.key
  }

  create_scheduler    = true
  schedule_expression = "rate(5 minutes)"
}

Pattern 2: Environment-Based Monitoring

Different collection frequencies for prod vs non-prod:

variable "environment" {
  type    = string
  default = "production"
}

locals {
  schedule_by_env = {
    production = "rate(5 minutes)"   # High frequency for production
    staging    = "rate(15 minutes)"  # Medium frequency for staging
    development = "rate(30 minutes)" # Low frequency for dev
  }

  servers_by_env = {
    production  = ["789012", "789013", "789014"]
    staging     = ["789020"]
    development = ["789030"]
  }
}

module "server_metrics" {
  source   = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  version  = "~> 1.0"
  for_each = toset(local.servers_by_env[var.environment])

  hetzner_cloud_api_token = var.hetzner_api_token

  metric_type = "server"
  name        = "${var.environment}-server-${each.key}"

  data = {
    server_id = each.key
  }

  create_scheduler    = true
  schedule_expression = local.schedule_by_env[var.environment]

  tags = {
    Environment = var.environment
  }
}

Pattern 3: Custom IAM Role with Additional Permissions

Use a custom IAM role with additional permissions (e.g., SNS for alerting):

# Create custom IAM role
resource "aws_iam_role" "custom_sfn_role" {
  name = "hetzner-cloudwatch-custom-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "states.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })
}

# Attach basic CloudWatch permissions
resource "aws_iam_role_policy" "cloudwatch_metrics" {
  role = aws_iam_role.custom_sfn_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "cloudwatch:PutMetricData"
        ]
        Resource = "*"
        Condition = {
          StringEquals = {
            "cloudwatch:namespace" = [
              "HetznerLoadBalancer",
              "HetznerServer"
            ]
          }
        }
      },
      {
        Effect = "Allow"
        Action = [
          "events:RetrieveConnectionCredentials"
        ]
        Resource = module.hetzner_lb_metrics.event_connection_arn
      }
    ]
  })
}

# Attach additional permissions (e.g., SNS for alerting within Step Functions)
resource "aws_iam_role_policy" "sns_publish" {
  role = aws_iam_role.custom_sfn_role.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "sns:Publish"
        ]
        Resource = aws_sns_topic.hetzner_alerts.arn
      }
    ]
  })
}

# Use the custom role
module "hetzner_lb_metrics" {
  source  = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  version = "~> 1.0"

  hetzner_cloud_api_token = var.hetzner_api_token

  # Use custom IAM role
  create_role = false
  role_arn    = aws_iam_role.custom_sfn_role.arn

  metric_type = "loadbalancer"
  name        = "production-lb-metrics"

  data = {
    loadbalancer_id = "123456"
  }

  create_scheduler    = true
  schedule_expression = "rate(5 minutes)"
}

Pattern 4: Cron-Based Scheduling

Use cron expressions for precise scheduling:

module "hetzner_server_metrics" {
  source  = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  version = "~> 1.0"

  hetzner_cloud_api_token = var.hetzner_api_token

  metric_type = "server"
  name        = "web-01-metrics"

  data = {
    server_id = "789012"
  }

  create_scheduler = true

  # Collect metrics every 5 minutes during business hours (UTC)
  # Monday-Friday, 8 AM - 6 PM UTC
  schedule_expression = "cron(*/5 8-18 ? * MON-FRI *)"

  # Or collect every minute during peak hours
  # schedule_expression = "cron(* 12-14 ? * * *)"

  # Or daily summary at midnight
  # schedule_expression = "cron(0 0 * * ? *)"
}

Creating CloudWatch Dashboards

Unified Hybrid Cloud Dashboard

Create a comprehensive dashboard showing both AWS and Hetzner resources:

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "metrics": [
          ["HetznerLoadBalancer", "RequestsPerSecond", {"stat": "Average", "label": "Hetzner LB Requests"}],
          ["AWS/ApplicationELB", "RequestCount", {"stat": "Sum", "label": "AWS ALB Requests"}]
        ],
        "view": "timeSeries",
        "stacked": false,
        "region": "eu-central-1",
        "title": "Load Balancer Traffic (Hybrid)",
        "period": 300,
        "yAxis": {
          "left": {
            "label": "Requests/sec",
            "showUnits": false
          }
        }
      }
    },
    {
      "type": "metric",
      "properties": {
        "metrics": [
          ["HetznerServer", "CPUUtilization", {"dimensions": {"ServerId": "789012"}, "label": "Hetzner Web-01"}],
          ["HetznerServer", "CPUUtilization", {"dimensions": {"ServerId": "789013"}, "label": "Hetzner Web-02"}],
          ["AWS/EC2", "CPUUtilization", {"dimensions": {"InstanceId": "i-1234567890abcdef0"}, "label": "AWS Web-01"}]
        ],
        "view": "timeSeries",
        "stacked": false,
        "region": "eu-central-1",
        "title": "Server CPU Utilization (All Clouds)",
        "period": 300,
        "yAxis": {
          "left": {
            "min": 0,
            "max": 100,
            "label": "Percent",
            "showUnits": false
          }
        }
      }
    },
    {
      "type": "metric",
      "properties": {
        "metrics": [
          ["HetznerServer", "NetworkIn", {"stat": "Average"}],
          ["HetznerServer", "NetworkOut", {"stat": "Average"}]
        ],
        "view": "timeSeries",
        "stacked": false,
        "region": "eu-central-1",
        "title": "Hetzner Network Bandwidth",
        "period": 300,
        "yAxis": {
          "left": {
            "label": "Bytes/sec",
            "showUnits": false
          }
        }
      }
    },
    {
      "type": "metric",
      "properties": {
        "metrics": [
          ["HetznerLoadBalancer", "OpenConnections", {"stat": "Maximum"}]
        ],
        "view": "singleValue",
        "region": "eu-central-1",
        "title": "Current Connections",
        "period": 300
      }
    }
  ]
}

Save as hetzner-dashboard.json and create:

aws cloudwatch put-dashboard \
  --dashboard-name HybridCloudMonitoring \
  --dashboard-body file://hetzner-dashboard.json

Access the dashboard in AWS Console:

https://console.aws.amazon.com/cloudwatch/home?region=eu-central-1#dashboards:name=HybridCloudMonitoring

Terraform-Managed Dashboard

Alternatively, manage dashboards as code:

resource "aws_cloudwatch_dashboard" "hybrid_monitoring" {
  dashboard_name = "HybridCloudInfrastructure"

  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
          metrics = [
            ["HetznerServer", "CPUUtilization", { dimensions = { ServerId = "789012" } }],
            ["HetznerServer", "CPUUtilization", { dimensions = { ServerId = "789013" } }]
          ]
          view   = "timeSeries"
          region = "eu-central-1"
          title  = "Hetzner Servers CPU"
          period = 300
        }
      },
      {
        type = "metric"
        properties = {
          metrics = [
            ["HetznerLoadBalancer", "RequestsPerSecond"],
            ["HetznerLoadBalancer", "ConnectionsPerSecond"]
          ]
          view   = "timeSeries"
          region = "eu-central-1"
          title  = "Load Balancer Performance"
          period = 300
        }
      }
    ]
  })
}

Setting Up CloudWatch Alarms

High CPU Alarm

Alert when Hetzner server CPU exceeds threshold:

resource "aws_cloudwatch_metric_alarm" "hetzner_high_cpu" {
  alarm_name          = "hetzner-web-01-high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "HetznerServer"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "Alert when Hetzner server CPU exceeds 80%"
  treat_missing_data  = "notBreaching"

  dimensions = {
    ServerId = "789012"
  }

  alarm_actions = [aws_sns_topic.infrastructure_alerts.arn]
  ok_actions    = [aws_sns_topic.infrastructure_alerts.arn]
}

Load Balancer Connection Spike

Detect unusual connection spikes:

resource "aws_cloudwatch_metric_alarm" "lb_connection_spike" {
  alarm_name          = "hetzner-lb-connection-spike"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "ConnectionsPerSecond"
  namespace           = "HetznerLoadBalancer"
  period              = 300
  statistic           = "Average"
  threshold           = 1000
  alarm_description   = "Alert on unusual connection spike (potential DDoS)"

  dimensions = {
    LoadBalancerId = "123456"
  }

  alarm_actions = [aws_sns_topic.security_alerts.arn]
}

Disk I/O Saturation

Monitor disk performance bottlenecks:

resource "aws_cloudwatch_metric_alarm" "high_disk_iops" {
  alarm_name          = "hetzner-server-high-disk-iops"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "DiskWriteIOPS"
  namespace           = "HetznerServer"
  period              = 300
  statistic           = "Average"
  threshold           = 5000
  alarm_description   = "Alert when disk write IOPS exceeds capacity"

  dimensions = {
    ServerId = "789012"
  }

  alarm_actions = [aws_sns_topic.infrastructure_alerts.arn]
}

Composite Alarm: Server Health

Combine multiple conditions:

resource "aws_cloudwatch_composite_alarm" "server_health" {
  alarm_name          = "hetzner-web-01-health"
  alarm_description   = "Composite alarm for overall server health"
  actions_enabled     = true
  alarm_actions       = [aws_sns_topic.critical_alerts.arn]

  alarm_rule = "ALARM(${aws_cloudwatch_metric_alarm.hetzner_high_cpu.alarm_name}) OR ALARM(${aws_cloudwatch_metric_alarm.high_disk_iops.alarm_name})"
}

Monitoring Step Functions Execution

Track Integration Health

Monitor the metric collection pipeline itself:

# Alarm on Step Functions execution failures
resource "aws_cloudwatch_metric_alarm" "sfn_execution_failed" {
  alarm_name          = "hetzner-metrics-collection-failure"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "ExecutionsFailed"
  namespace           = "AWS/States"
  period              = 300
  statistic           = "Sum"
  threshold           = 0
  alarm_description   = "Alert when metric collection fails"

  dimensions = {
    StateMachineArn = module.hetzner_lb_metrics.state_machine_arn
  }

  alarm_actions = [aws_sns_topic.infrastructure_alerts.arn]
}

# Monitor execution duration
resource "aws_cloudwatch_metric_alarm" "sfn_slow_execution" {
  alarm_name          = "hetzner-metrics-slow-collection"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "ExecutionTime"
  namespace           = "AWS/States"
  period              = 300
  statistic           = "Average"
  threshold           = 10000  # 10 seconds
  alarm_description   = "Alert when metric collection is slow"

  dimensions = {
    StateMachineArn = module.hetzner_server_metrics.state_machine_arn
  }

  alarm_actions = [aws_sns_topic.infrastructure_alerts.arn]
}

Query Step Functions Logs

Check CloudWatch Logs for execution details:

# Get the log group for Step Functions
LOG_GROUP="/aws/vendedlogs/states/hetzner-metrics"

# Query recent executions
aws logs filter-log-events \
  --log-group-name "$LOG_GROUP" \
  --start-time $(date -d '1 hour ago' +%s)000 \
  --filter-pattern "{ $.type = \"ExecutionSucceeded\" }"

# Query failed executions
aws logs filter-log-events \
  --log-group-name "$LOG_GROUP" \
  --filter-pattern "{ $.type = \"ExecutionFailed\" }" \
  --max-items 10

Cost Analysis

Pricing Breakdown

Components and Monthly Costs (us-east-1 pricing, 5-minute intervals):

1. Step Functions State Transitions

Executions per month: (60 min / 5 min) × 24 hours × 30 days = 8,640 executions
Cost: 8,640 × $0.000025 = $0.22/month per resource

2. CloudWatch Custom Metrics

Metrics per resource:
- Load Balancer: 5 metrics
- Server: 9 metrics

Cost per metric: $0.30/month
Load Balancer cost: 5 × $0.30 = $1.50/month
Server cost: 9 × $0.30 = $2.70/month

3. CloudWatch API Requests

PutMetricData calls: 8,640 executions × 1 API call = 8,640 requests
First 1M requests free, so: $0/month

4. AWS Secrets Manager (via EventBridge Connection)

1 secret (Hetzner API token): $0.40/month
API calls: 8,640 × $0.05 per 10,000 = $0.04/month
Total Secrets Manager: $0.44/month

5. EventBridge Scheduler

Invocations: 8,640/month
First 14M invocations free, so: $0/month

Total Monthly Cost per Resource:

Load Balancer: $0.22 (SFN) + $1.50 (CW) + $0.44 (Secrets) = $2.16/month
Server: $0.22 (SFN) + $2.70 (CW) + $0.44 (Secrets) = $3.36/month

Cost Sharing: When monitoring multiple resources with a shared EventBridge Connection, the $0.44 Secrets Manager cost is shared, reducing per-resource costs.

Example: 10 Servers + 2 Load Balancers

Step Functions: 12 × $0.22 = $2.64
CloudWatch Metrics: (2 × $1.50) + (10 × $2.70) = $30.00
Secrets Manager: $0.44 (shared)
Total: $33.08/month

Cost Optimization Strategies

1. Adjust Collection Frequency

Non-critical resources don’t need 5-minute granularity:

# Production: 5-minute intervals
schedule_expression = "rate(5 minutes)"   # $2.16/month per LB

# Development: 30-minute intervals
schedule_expression = "rate(30 minutes)"  # $0.36/month per LB
# Savings: 83%

2. Selective Metric Collection

Modify the Step Functions state machine to publish only critical metrics (requires custom state machine).

3. Use CloudWatch Metric Retention

Configure shorter retention for non-production metrics:

# Set 7-day retention for dev environment
aws logs put-retention-policy \
  --log-group-name /aws/vendedlogs/states/dev-hetzner-metrics \
  --retention-in-days 7

4. Schedule Collection Hours

Only collect metrics during business hours for non-production:

# Only Mon-Fri, 8 AM - 6 PM UTC
schedule_expression = "cron(*/5 8-18 ? * MON-FRI *)"
# Reduces executions from 8,640 to ~2,400/month
# Savings: ~72%

Security Best Practices

1. API Token Management

Use Read-Only Tokens:

# In Hetzner Cloud Console, create tokens with Read permissions only
# This prevents accidental infrastructure changes via compromised tokens

Rotate Tokens Regularly:

# Generate new token in Hetzner
NEW_TOKEN="new-token-value"

# Update Terraform variable
terraform apply -var="hetzner_api_token=$NEW_TOKEN"

# Old token is automatically replaced in Secrets Manager

Monitor Token Usage:

# CloudWatch alarm for authentication failures
resource "aws_cloudwatch_log_metric_filter" "auth_failures" {
  name           = "hetzner-api-auth-failures"
  log_group_name = "/aws/vendedlogs/states/hetzner-metrics"
  pattern        = "[... , status_code = 401, ...]"

  metric_transformation {
    name      = "HetznerAPIAuthFailure"
    namespace = "HetznerIntegration"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "auth_failure_alert" {
  alarm_name          = "hetzner-api-unauthorized"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "HetznerAPIAuthFailure"
  namespace           = "HetznerIntegration"
  period              = 300
  statistic           = "Sum"
  threshold           = 0
  alarm_description   = "Alert on Hetzner API authentication failures (token may be expired)"

  alarm_actions = [aws_sns_topic.security_alerts.arn]
}

2. IAM Least Privilege

Review and restrict IAM permissions:

# Restrict CloudWatch namespace
{
  "Effect": "Allow",
  "Action": "cloudwatch:PutMetricData",
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "cloudwatch:namespace": [
        "HetznerLoadBalancer",
        "HetznerServer"
      ]
    }
  }
}

# Restrict EventBridge Connection access
{
  "Effect": "Allow",
  "Action": "events:RetrieveConnectionCredentials",
  "Resource": "arn:aws:events:eu-central-1:123456789012:connection/hetzner-api/*"
}

3. Network Security

VPC Endpoints (for Step Functions in VPC):

# Use VPC endpoints to avoid internet egress
resource "aws_vpc_endpoint" "cloudwatch" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.eu-central-1.monitoring"
  vpc_endpoint_type = "Interface"

  subnet_ids         = var.private_subnet_ids
  security_group_ids = [aws_security_group.vpc_endpoints.id]
}

API Rate Limiting:

Hetzner Cloud API has rate limits (3600 requests/hour). With 5-minute intervals per resource:

12 requests/hour per resource
Safe limit: ~300 resources per API token

4. Compliance & Audit

Enable CloudTrail for State Machine Execution:

resource "aws_cloudwatch_log_group" "state_machine_logs" {
  name              = "/aws/vendedlogs/states/hetzner-metrics"
  retention_in_days = 90  # Adjust based on compliance requirements

  kms_key_id = aws_kms_key.logs.arn  # Encrypt logs at rest
}

# Update Step Functions to use CloudWatch Logs
# (module should support logging_configuration parameter)

Tag Resources for Compliance:

module "hetzner_lb_metrics" {
  source = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  # ... other config ...

  tags = {
    CostCenter  = "Infrastructure"
    Compliance  = "SOC2"
    DataClass   = "Metrics"
    Owner       = "[email protected]"
    Environment = "production"
  }
}

Troubleshooting Common Issues

Issue 1: No Metrics Appearing in CloudWatch

Symptoms:

CloudWatch namespaces empty
No datapoints when querying metrics

Diagnosis:

# Check Step Functions execution history
aws stepfunctions list-executions \
  --state-machine-arn $(terraform output -raw lb_state_machine_arn) \
  --max-results 5

# Get execution details
EXECUTION_ARN="arn:aws:states:eu-central-1:123456789012:execution:..."
aws stepfunctions describe-execution --execution-arn "$EXECUTION_ARN"

# Check for failed executions
aws stepfunctions list-executions \
  --state-machine-arn $(terraform output -raw lb_state_machine_arn) \
  --status-filter FAILED

Solutions:

Verify Hetzner API Token:

# Test token manually
curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.hetzner.cloud/v1/load_balancers/123456/metrics?type=requests&start=2026-02-16T00:00:00Z&end=2026-02-16T01:00:00Z

Check IAM Permissions:
- Ensure Step Functions role has cloudwatch:PutMetricData
- Verify events:RetrieveConnectionCredentials permission

Verify Resource IDs:

# Check if load balancer exists
hcloud load-balancer describe 123456

# Or via API
curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.hetzner.cloud/v1/load_balancers/123456

Issue 2: Step Functions Execution Timeout

Symptoms:

Executions showing “TIMED_OUT” status
Partial metrics collected

Diagnosis:

# Check execution duration
aws stepfunctions describe-execution \
  --execution-arn "$EXECUTION_ARN" \
  --query 'startDate,stopDate'

# Review CloudWatch Logs for errors
aws logs filter-log-events \
  --log-group-name /aws/vendedlogs/states/hetzner-metrics \
  --start-time $(date -d '1 hour ago' +%s)000

Solutions:

Increase State Machine Timeout (custom state machine modification needed)

Check Hetzner API Latency:

time curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.hetzner.cloud/v1/servers/789012/metrics?type=cpu

Verify Network Connectivity (if Step Functions in VPC):
- Ensure NAT Gateway or VPC endpoints configured
- Check security group rules

Issue 3: High AWS Costs

Symptoms:

Unexpected CloudWatch Metrics charges
High Step Functions execution costs

Diagnosis:

# Count custom metrics
aws cloudwatch list-metrics \
  --namespace HetznerLoadBalancer \
  | jq '.Metrics | length'

aws cloudwatch list-metrics \
  --namespace HetznerServer \
  | jq '.Metrics | length'

# Count Step Functions executions
aws stepfunctions describe-state-machine \
  --state-machine-arn $(terraform output -raw lb_state_machine_arn) \
  --query 'stateMachineArn' \
  | xargs -I {} aws cloudwatch get-metric-statistics \
    --namespace AWS/States \
    --metric-name ExecutionsSucceeded \
    --dimensions Name=StateMachineArn,Value={} \
    --start-time $(date -d '30 days ago' +%Y-%m-%dT%H:%M:%S) \
    --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
    --period 2592000 \
    --statistics Sum

Solutions:

Reduce Collection Frequency:

schedule_expression = "rate(15 minutes)"  # Down from 5 minutes
# Reduces costs by ~67%

Disable Non-Production Monitoring:

create_scheduler = var.environment == "production" ? true : false

Use Cost Allocation Tags:

tags = {
  CostCenter = "infrastructure-monitoring"
}

Then track in AWS Cost Explorer.

Issue 4: Metric Data Gaps

Symptoms:

Intermittent missing datapoints
Irregular metric intervals

Diagnosis:

# Check for failed executions
aws stepfunctions list-executions \
  --state-machine-arn $(terraform output -raw server_state_machine_arn) \
  --status-filter FAILED \
  --max-results 20

# Check EventBridge Scheduler success rate
aws scheduler get-schedule \
  --name "hetzner-server-metrics-schedule"

Solutions:

Enable Step Functions Retry: Modify state machine to add retry logic (custom state machine)

Check Hetzner API Rate Limits:

# Review response headers
curl -I -H "Authorization: Bearer YOUR_TOKEN" \
  https://api.hetzner.cloud/v1/servers/789012/metrics?type=cpu

# Look for: X-RateLimit-Remaining

Monitor EventBridge Scheduler:

resource "aws_cloudwatch_metric_alarm" "scheduler_failed" {
  alarm_name          = "hetzner-scheduler-failures"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "TargetErrorCount"
  namespace           = "AWS/Scheduler"
  period              = 300
  statistic           = "Sum"
  threshold           = 0

  dimensions = {
    ScheduleName = "hetzner-server-metrics-schedule"
  }

  alarm_actions = [aws_sns_topic.infrastructure_alerts.arn]
}

Real-World Use Cases

Use Case 1: Cost-Optimized Frontend Hosting

Scenario: Company uses Hetzner for static asset CDN (lower bandwidth costs) and AWS for dynamic API backend.

Challenge: Need unified monitoring to correlate frontend performance (Hetzner) with backend latency (AWS).

Solution:

# Monitor Hetzner load balancer serving static assets
module "cdn_lb_metrics" {
  source  = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  version = "~> 1.0"

  hetzner_cloud_api_token = var.hetzner_api_token
  metric_type             = "loadbalancer"
  name                    = "cdn-lb-metrics"

  data = {
    loadbalancer_id = var.hetzner_cdn_lb_id
  }

  create_scheduler    = true
  schedule_expression = "rate(1 minute)"  # High frequency for CDN
}

# Create correlation dashboard
resource "aws_cloudwatch_dashboard" "frontend_backend" {
  dashboard_name = "FullStackPerformance"

  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
          metrics = [
            ["HetznerLoadBalancer", "RequestsPerSecond", { label = "CDN Requests (Hetzner)" }],
            ["AWS/ApplicationELB", "RequestCount", { stat = "Sum", label = "API Requests (AWS)" }]
          ]
          title = "Frontend vs Backend Traffic"
        }
      }
    ]
  })
}

Benefits:

Correlate CDN traffic spikes with backend load
Identify caching effectiveness
Unified alerting for full-stack incidents

Use Case 2: Database Server Monitoring

Scenario: PostgreSQL databases hosted on Hetzner dedicated servers for cost savings.

Challenge: Need to monitor database server resources alongside AWS RDS instances.

Solution:

# Monitor Hetzner database servers
module "db_servers" {
  source   = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  version  = "~> 1.0"
  for_each = var.hetzner_db_servers

  hetzner_cloud_api_token = var.hetzner_api_token
  metric_type             = "server"
  name                    = "db-${each.key}"

  data = {
    server_id = each.value
  }

  create_scheduler    = true
  schedule_expression = "rate(1 minute)"  # Database needs high frequency

  tags = {
    Role     = "database"
    DBEngine = "postgresql"
  }
}

# Alert on database server resource saturation
resource "aws_cloudwatch_metric_alarm" "db_cpu_high" {
  for_each = var.hetzner_db_servers

  alarm_name          = "db-${each.key}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "HetznerServer"
  period              = 60
  statistic           = "Average"
  threshold           = 85

  dimensions = {
    ServerId = each.value
  }

  alarm_actions = [aws_sns_topic.database_alerts.arn]
}

Benefits:

Unified database monitoring across clouds
Consistent alerting for all database resources
Cost savings vs AWS RDS while maintaining observability

Use Case 3: Multi-Region Failover Monitoring

Scenario: Primary infrastructure in AWS, disaster recovery in Hetzner.

Challenge: Monitor DR resources without maintaining separate monitoring infrastructure.

Solution:

# Monitor DR load balancer in Hetzner
module "dr_lb_metrics" {
  source  = "KamranBiglari/hetzner-cloudwatch-integration/aws"
  version = "~> 1.0"

  hetzner_cloud_api_token = var.hetzner_api_token
  metric_type             = "loadbalancer"
  name                    = "dr-lb-metrics"

  data = {
    loadbalancer_id = var.hetzner_dr_lb_id
  }

  create_scheduler    = true
  schedule_expression = "rate(5 minutes)"
}

# Composite alarm for failover health
resource "aws_cloudwatch_composite_alarm" "dr_readiness" {
  alarm_name    = "disaster-recovery-readiness"
  alarm_actions = [aws_sns_topic.critical_alerts.arn]

  alarm_rule = join(" AND ", [
    "ALARM(hetzner-dr-lb-health)",           # DR LB is healthy
    "NOT ALARM(aws-primary-region-health)"   # Primary is down
  ])
}

Benefits:

Monitor DR resources from primary AWS account
Automated failover readiness checks
Single alerting pipeline for multi-cloud DR

Comparison: Alternative Monitoring Solutions

Solution	Hetzner → CloudWatch (This Module)	Grafana Cloud	Datadog	Prometheus + Thanos
Setup Complexity	Low (Terraform module)	Medium	Low	High
Cost (10 servers)	~$34/month	~$50/month	~$150/month	Self-hosted ($?)
AWS Native Integration	✅ Excellent	⚠️ Limited	✅ Good	⚠️ Limited
Unified Dashboards	✅ CloudWatch	✅ Grafana	✅ Datadog	✅ Grafana
Alerting	CloudWatch Alarms	Grafana Alerts	Datadog Monitors	Alertmanager
Retention	15 months (CW)	13 months	15 months	Configurable
Learning Curve	Low (if familiar with AWS)	Medium	Low	High
Vendor Lock-in	AWS	Grafana Labs	Datadog	None
Multi-Cloud Support	AWS + Hetzner	All clouds	All clouds	All clouds

Conclusion

Integrating Hetzner Cloud metrics into AWS CloudWatch provides a pragmatic solution for hybrid cloud observability without the complexity and cost of third-party monitoring platforms. By leveraging serverless AWS services (Step Functions, EventBridge, CloudWatch), you gain:

Key Benefits:

Unified Observability: Single pane of glass for all infrastructure
Cost Efficiency: ~$2-3/resource/month vs $15-20/resource with commercial tools
AWS Native: Seamless integration with existing CloudWatch dashboards and alarms
Infrastructure as Code: Full Terraform automation for repeatable deployments
Serverless: No monitoring infrastructure to manage or patch
Secure: API tokens in Secrets Manager, least-privilege IAM, encrypted transit

When to Use This Solution:

You have hybrid infrastructure across AWS and Hetzner
You want to leverage existing AWS CloudWatch expertise
You need cost-effective monitoring ($2-3/resource/month)
You prefer infrastructure as code (Terraform)
You want AWS-native alerting and automation

When to Consider Alternatives:

You need monitoring across 3+ cloud providers → Grafana Cloud or Datadog
You require advanced anomaly detection → Datadog or AWS CloudWatch Insights
You have complex custom metric requirements → Prometheus/Thanos
You need real-time alerting (< 1 minute) → Direct Hetzner API integration

This Terraform module solves a real pain point for hybrid cloud teams, enabling centralized monitoring without vendor lock-in to expensive observability platforms.

Additional Resources

GitHub Repository: terraform-aws-hetzner-cloudwatch-integration
Terraform Registry: KamranBiglari/hetzner-cloudwatch-integration/aws
Hetzner Cloud API: docs.hetzner.cloud
AWS Step Functions: docs.aws.amazon.com/step-functions
CloudWatch Custom Metrics: docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html

Have questions or suggestions? Open an issue on GitHub or reach out at [email protected]

Integrate Hetzner Cloud Metrics into AWS CloudWatch with Terraform

The Hybrid Cloud Monitoring Challenge

The Problem

The Solution: Centralized Monitoring in CloudWatch

Architecture Overview

Key Components

Supported Hetzner Metrics

Load Balancer Metrics

Server Metrics

Quick Start: Deploy in 10 Minutes

Prerequisites

Step 1: Generate Hetzner Cloud API Token

Step 2: Get Hetzner Resource IDs

Step 3: Create Terraform Configuration

Step 4: Deploy the Integration

Step 5: Verify Metrics Collection

Advanced Configuration Patterns

Pattern 1: Multi-Server Monitoring with Shared Connection

Pattern 2: Environment-Based Monitoring

Pattern 3: Custom IAM Role with Additional Permissions

Pattern 4: Cron-Based Scheduling

Creating CloudWatch Dashboards

Unified Hybrid Cloud Dashboard

Terraform-Managed Dashboard

Setting Up CloudWatch Alarms

High CPU Alarm

Load Balancer Connection Spike

Disk I/O Saturation

Composite Alarm: Server Health

Monitoring Step Functions Execution

Track Integration Health

Query Step Functions Logs

Cost Analysis

Pricing Breakdown

Cost Optimization Strategies

Security Best Practices

1. API Token Management

2. IAM Least Privilege

3. Network Security

4. Compliance & Audit

Troubleshooting Common Issues

Issue 1: No Metrics Appearing in CloudWatch

Issue 2: Step Functions Execution Timeout

Issue 3: High AWS Costs

Issue 4: Metric Data Gaps

Real-World Use Cases

Use Case 1: Cost-Optimized Frontend Hosting

Use Case 2: Database Server Monitoring

Use Case 3: Multi-Region Failover Monitoring

Comparison: Alternative Monitoring Solutions

Conclusion

Additional Resources

Related Articles

The Translation Layer: Kubernetes on Hetzner Cloud

Kubernetes Health Probes: Liveness, Readiness, and Startup Explained

Kubernetes Controllers: Deployment, StatefulSet, and ReplicaSet Explained