Monitoring Everything: Setting Up Prometheus, Grafana, and Loki with Terraform

Introduction

Operating modern infrastructure requires visibility into metrics, logs, and alerts. This guide walks through a monitoring stack built from Prometheus (metrics), Grafana (dashboards), and Loki (logs), deployed to Kubernetes as Helm releases and managed as Infrastructure as Code with Terraform. Keeping the whole stack in Terraform makes it reproducible across environments and lets monitoring changes be reviewed like any other code.

Core Components

Prometheus Architecture

Prometheus uses a pull model: it scrapes metrics from HTTP endpoints on a fixed interval and stores the samples in a local time-series database.

Key Features

  • Service discovery integration
  • PromQL query language
  • Alert management
  • Target scraping
  • Data retention policies

Technical Implementation


    # Prometheus Terraform configuration
    resource "helm_release" "prometheus" {
      name       = "prometheus"
      repository = "https://prometheus-community.github.io/helm-charts"
      chart      = "prometheus"
      namespace  = "monitoring"

      set {
        name  = "server.persistentVolume.size"
        value = "50Gi"
      }

      set {
        name  = "server.retention"
        value = "15d"
      }
    }
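
The release above presupposes a configured Helm provider and an existing monitoring namespace, neither of which the guide shows. A minimal sketch of that scaffolding follows. The kubeconfig path and the 2.x version constraints are assumptions, chosen because the `set { ... }` block syntax used in this guide belongs to the 2.x Helm provider.

```hcl
# Sketch only: provider wiring assumed by the helm_release resources in this guide.
terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0" # assumption: 2.x, which supports set { } blocks
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
  }
}

# Assumption: cluster credentials come from a local kubeconfig file.
provider "kubernetes" {
  config_path = "~/.kube/config"
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

# Create the namespace once so every release can depend on it.
resource "kubernetes_namespace" "monitoring" {
  metadata {
    name = "monitoring"
  }
}
```

Setting a release's `namespace` to `kubernetes_namespace.monitoring.metadata[0].name` instead of the literal string makes Terraform create the namespace before installing the chart.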

Grafana Deployment

Grafana provides visualization and analytics capabilities through:

Core Functions

  • Dashboard management
  • Data source integration
  • Alert configuration
  • User authentication
  • Plugin ecosystem

Implementation Details


    # Grafana Terraform setup
    resource "helm_release" "grafana" {
      name       = "grafana"
      repository = "https://grafana.github.io/helm-charts"
      chart      = "grafana"
      namespace  = "monitoring"

      set {
        name  = "persistence.enabled"
        value = "true"
      }

      set {
        name  = "adminPassword"
        value = var.grafana_admin_password
      }
    }
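
The Grafana release references `var.grafana_admin_password`, which is never declared. One way to declare it, as a sketch, is below. Marking the variable `sensitive` keeps the value out of plan output; supplying it through the `TF_VAR_grafana_admin_password` environment variable is a suggestion, not something the guide prescribes.

```hcl
# Sketch: declaration for the variable used by the Grafana release.
variable "grafana_admin_password" {
  description = "Admin password for Grafana. Supply it outside version control, e.g. via TF_VAR_grafana_admin_password."
  type        = string
  sensitive   = true # redacts the value in plan and apply output
}
```

The 2.x Helm provider also accepts a `set_sensitive` block with the same `name` and `value` arguments as `set`, which may be a better fit for the password than a plain `set` block.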

Loki Configuration

Loki aggregates logs and indexes only the labels attached to each log stream, not the full log text, which keeps the index small and storage costs low:

Technical Components

  • Log streaming
  • Label indexing
  • Query processing
  • Storage optimization
  • Retention management

Deployment Specification


    # Loki setup with Terraform
    # Note: the loki-stack chart nests Loki's own settings under the "loki."
    # prefix; confirm these value paths against the values.yaml of your chart version.
    resource "helm_release" "loki" {
      name       = "loki"
      repository = "https://grafana.github.io/helm-charts"
      chart      = "loki-stack"
      namespace  = "monitoring"

      set {
        name  = "persistence.enabled"
        value = "true"
      }

      set {
        name  = "loki.auth_enabled"
        value = "true"
      }
    }
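
With all three releases defined, Grafana still needs to be told where Prometheus and Loki live. A minimal sketch of data source provisioning, rendered as Helm values, follows. The service hostnames are assumptions derived from the release names used above, and `tenant-a` is a hypothetical tenant ID; verify both against your cluster.

```hcl
# Sketch: Grafana data source provisioning expressed as Helm values.
# Assumptions: services "prometheus-server" and "loki" exist in the
# "monitoring" namespace; "tenant-a" is a made-up tenant ID.
locals {
  grafana_datasource_values = yamlencode({
    datasources = {
      "datasources.yaml" = {
        apiVersion = 1
        datasources = [
          {
            name      = "Prometheus"
            type      = "prometheus"
            access    = "proxy"
            url       = "http://prometheus-server.monitoring.svc.cluster.local"
            isDefault = true
          },
          {
            name   = "Loki"
            type   = "loki"
            access = "proxy"
            url    = "http://loki.monitoring.svc.cluster.local:3100"
            # With Loki authentication enabled, each request must carry a tenant ID header.
            jsonData       = { httpHeaderName1 = "X-Scope-OrgID" }
            secureJsonData = { httpHeaderValue1 = "tenant-a" }
          },
        ]
      }
    }
  })
}
```

Adding `values = [local.grafana_datasource_values]` to the Grafana release would pass this to the chart.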

Kubernetes Integration

Service Discovery

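Prometheus's Kubernetes service discovery queries the Kubernetes API for scrape targets, so new pods are picked up automatically without editing a static target list: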

Implementation


    # prometheus-configmap.yaml
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
    data:
      prometheus.yml: |
        scrape_configs:
          # kubernetes_sd_configs must sit inside a named scrape job.
          - job_name: kubernetes-pods
            kubernetes_sd_configs:
              - role: pod
                namespaces:
                  names:
                    - default
                    - production
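
To keep this configuration in Terraform alongside the Helm releases, the same ConfigMap can be sketched as a `kubernetes_config_map` resource. The job name, the scrape interval, and the relabeling rules are illustrative assumptions. The rules keep only pods annotated with `prometheus.io/scrape: "true"` and copy the namespace and pod name into labels.

```hcl
# Sketch: the ConfigMap above as a Terraform resource, with relabeling added.
resource "kubernetes_config_map" "prometheus_config" {
  metadata {
    name      = "prometheus-config"
    namespace = "monitoring"
  }

  data = {
    "prometheus.yml" = yamlencode({
      global = { scrape_interval = "30s" } # assumption: illustrative interval
      scrape_configs = [{
        job_name = "kubernetes-pods" # hypothetical job name
        kubernetes_sd_configs = [{
          role       = "pod"
          namespaces = { names = ["default", "production"] }
        }]
        relabel_configs = [
          {
            # Scrape only pods that opt in through the annotation.
            source_labels = ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"]
            action        = "keep"
            regex         = "true"
          },
          {
            source_labels = ["__meta_kubernetes_namespace"]
            target_label  = "namespace"
          },
          {
            source_labels = ["__meta_kubernetes_pod_name"]
            target_label  = "pod"
          },
        ]
      }]
    })
  }
}
```

Note that a Prometheus server installed from the Helm chart reads the configuration the chart generates, so a standalone ConfigMap like this one only takes effect if the server is pointed at it.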

Resource Monitoring

Monitor container and node metrics through:

Key Metrics

  • CPU utilization
  • Memory consumption
  • Network traffic
  • Disk operations
  • Pod health status
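
Each of these metrics maps to a PromQL expression. The sketch below collects one candidate query per bullet in a Terraform local so that dashboards and alerts can share them. It assumes the kubelet's cAdvisor metrics and kube-state-metrics are being scraped; the map keys are hypothetical names.

```hcl
# Sketch: one candidate PromQL query per metric in the list above.
# Assumption: cAdvisor (container_*) and kube-state-metrics (kube_*) are scraped.
locals {
  resource_queries = {
    # CPU cores in use per pod, derived from the cumulative CPU counter.
    cpu_cores_by_pod = "sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=''}[5m]))"

    # Working-set memory per pod, in bytes.
    memory_bytes_by_pod = "sum by (namespace, pod) (container_memory_working_set_bytes{container!=''})"

    # Inbound network traffic per pod, in bytes per second.
    network_rx_bytes_per_second = "sum by (namespace, pod) (rate(container_network_receive_bytes_total[5m]))"

    # Filesystem read throughput per pod, in bytes per second.
    disk_read_bytes_per_second = "sum by (namespace, pod) (rate(container_fs_reads_bytes_total[5m]))"

    # Number of pods in each phase (Pending, Running, Failed, ...).
    pods_by_phase = "sum by (namespace, phase) (kube_pod_status_phase)"
  }
}

output "resource_queries" {
  value = local.resource_queries
}
```

The counters are wrapped in `rate()` because their raw values only ever increase; the gauges are summed directly.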

Alert Configuration

Alert Rules

Define meaningful alert thresholds:


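    # prometheus-rules.yaml
    groups:
      - name: kubernetes
        rules:
          - alert: HighCPUUsage
            # container_cpu_usage_seconds_total is a cumulative counter, so the
            # rule compares its per-second rate (CPU cores in use), not the raw value.
            expr: 'sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) > 0.8'
            for: 5m
            labels:
              severity: warning
            annotations:
              description: "Container CPU usage above 0.8 cores (80% of one core) for 5 minutes"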
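
A CPU alert of this kind should be written against the per-second rate of `container_cpu_usage_seconds_total`, because the metric is a counter that only grows. The sketch below expresses the rule that way and packages it as Helm values so it ships with the Prometheus release. It assumes the chart reads rule groups from a `serverFiles` entry named `alerting_rules.yml`; check that key against your chart version.

```hcl
# Sketch: the HighCPUUsage rule delivered through Helm values.
# Assumption: the chart loads rule groups from serverFiles."alerting_rules.yml".
locals {
  prometheus_alert_values = yamlencode({
    serverFiles = {
      "alerting_rules.yml" = {
        groups = [{
          name = "kubernetes"
          rules = [{
            alert = "HighCPUUsage"
            # rate() turns the cumulative counter into CPU cores in use.
            expr  = "sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!=''}[5m])) > 0.8"
            "for" = "5m" # quoted because "for" is also an HCL keyword
            labels = {
              severity = "warning"
            }
            annotations = {
              description = "Container CPU usage above 0.8 cores for 5 minutes"
            }
          }]
        }]
      }
    }
  })
}
```

Adding `values = [local.prometheus_alert_values]` to the Prometheus release would pass the rule to the chart.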

Alert Routing

Configure notification channels through:

  • Email integration
  • Slack notifications
  • PagerDuty alerts
  • Custom webhooks
  • OpsGenie integration
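
Routing is handled by Alertmanager, which matches alerts on their labels and forwards them to receivers. A minimal sketch of a routing tree covering two of the channels above follows: everything goes to Slack by default, and alerts labeled `severity="critical"` page through PagerDuty. The variable names, receiver names, channel, and timing values are all hypothetical.

```hcl
# Sketch: Alertmanager routing for Slack (default) and PagerDuty (critical).
# All names and timings here are illustrative assumptions.
variable "slack_webhook_url" {
  description = "Incoming webhook URL for the alerts channel."
  type        = string
  sensitive   = true
}

variable "pagerduty_routing_key" {
  description = "PagerDuty Events API v2 integration key."
  type        = string
  sensitive   = true
}

locals {
  alertmanager_config = yamlencode({
    route = {
      receiver        = "slack-default"
      group_by        = ["alertname", "namespace"]
      group_wait      = "30s"
      group_interval  = "5m"
      repeat_interval = "4h"
      routes = [{
        # Critical alerts are sent to the on-call rotation instead of Slack.
        matchers = ["severity=\"critical\""]
        receiver = "pagerduty-oncall"
      }]
    }
    receivers = [
      {
        name = "slack-default"
        slack_configs = [{
          api_url       = var.slack_webhook_url
          channel       = "#alerts"
          send_resolved = true
        }]
      },
      {
        name = "pagerduty-oncall"
        pagerduty_configs = [{
          routing_key = var.pagerduty_routing_key
        }]
      },
    ]
  })
}
```

The rendered YAML can be supplied to the chart's Alertmanager configuration; the exact values key depends on the chart version. Email, webhook, and OpsGenie receivers follow the same pattern with `email_configs`, `webhook_configs`, and `opsgenie_configs`.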

Dashboard Implementation

Resource Visualization

Create actionable dashboards displaying:

System Metrics

  • Infrastructure utilization
  • Application performance
  • Error rates
  • Latency measurements
  • Throughput statistics

Example Dashboard Query