Filip Gajic

Monitoring Everything: Setting Up Prometheus, Grafana, and Loki with Terraform

Introduction

Operating modern infrastructure requires observability across metrics, logs, and alerts. This guide walks through a complete monitoring stack built from Prometheus (metrics and alert rules), Loki (log aggregation), and Grafana (dashboards), with each component deployed as a Helm release managed through Terraform. Keeping the stack in Infrastructure as Code makes it reproducible across clusters and lets changes be reviewed like any other code.

Core Components

Prometheus Architecture

Prometheus collects metrics with a pull model: it scrapes HTTP endpoints exposed by its targets at a regular interval and stores the samples in a local time-series database:

Key Features

Service discovery integration
PromQL query language
Alert rule evaluation, with routing delegated to Alertmanager
Target scraping
Data retention policies

Technical Implementation


# Prometheus Terraform configuration
resource "helm_release" "prometheus" {
  name       = "prometheus"
  repository = "https://prometheus-community.github.io/helm-charts"
  chart      = "prometheus"
  # The namespace must already exist unless create_namespace is set on the release.
  namespace  = "monitoring"

  # Size of the persistent volume that backs the time-series database.
  set {
    name  = "server.persistentVolume.size"
    value = "50Gi"
  }

  # Samples older than 15 days are deleted.
  set {
    name  = "server.retention"
    value = "15d"
  }
}
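The release above assumes that the Helm provider is already configured and that the monitoring namespace exists. A minimal sketch of that groundwork could look like the following. The kubeconfig path and the provider version constraints are assumptions, not part of the original setup; the 2.x constraint on the Helm provider is chosen because this guide uses the `set { ... }` block syntax.

```hcl
terraform {
  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0" # assumed version constraint
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0" # assumed; 2.x accepts the set { ... } blocks used in this guide
    }
  }
}

# Assumption: cluster credentials are read from a local kubeconfig file.
provider "kubernetes" {
  config_path = "~/.kube/config"
}

provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}

# Create the shared namespace once so every release can be installed into it.
resource "kubernetes_namespace" "monitoring" {
  metadata {
    name = "monitoring"
  }
}
```

With this in place, a release can set namespace = kubernetes_namespace.monitoring.metadata[0].name instead of the literal string, which also makes Terraform create the namespace before installing the chart.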

Grafana Deployment

Grafana provides visualization and analytics capabilities through:

Core Functions

Dashboard management
Data source integration
Alert configuration
User authentication
Plugin ecosystem

Implementation Details


# Grafana Terraform setup
resource "helm_release" "grafana" {
  name       = "grafana"
  repository = "https://grafana.github.io/helm-charts"
  chart      = "grafana"
  namespace  = "monitoring"

  # Keep dashboards and settings across pod restarts.
  set {
    name  = "persistence.enabled"
    value = "true"
  }

  # set_sensitive keeps the password out of plan and apply output.
  set_sensitive {
    name  = "adminPassword"
    value = var.grafana_admin_password
  }
}
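The Grafana release references var.grafana_admin_password and lists data source integration as a core function, but neither is spelled out. The sketch below declares the variable and builds a values document that provisions Prometheus as a data source at install time. The prometheus_url value is a hypothetical in-cluster address: the real Service name depends on the Prometheus release name and chart version.

```hcl
# Marked sensitive so Terraform redacts the value in plan and apply output.
# The value is still written to the state file, so the state must be protected.
variable "grafana_admin_password" {
  type      = string
  sensitive = true
}

locals {
  # Hypothetical address of the Prometheus server Service; verify it in your cluster.
  prometheus_url = "http://prometheus-server.monitoring.svc.cluster.local"

  # Values for the Grafana chart that provision a data source on startup.
  grafana_values = yamlencode({
    datasources = {
      "datasources.yaml" = {
        apiVersion = 1
        datasources = [
          {
            name      = "Prometheus"
            type      = "prometheus"
            access    = "proxy"
            url       = local.prometheus_url
            isDefault = true
          }
        ]
      }
    }
  })
}
```

To use it, add values = [local.grafana_values] to the helm_release "grafana" resource. Provisioning the data source this way means a freshly installed Grafana can query Prometheus without any manual setup in the UI.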

Loki Configuration

Loki implements log aggregation with label-based indexing:

Technical Components

Log streaming
Label indexing
Query processing
Storage optimization
Retention management

Deployment Specification


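# Loki setup with Terraform
resource "helm_release" "loki" {
  name       = "loki"
  repository = "https://grafana.github.io/helm-charts"
  chart      = "loki-stack"
  namespace  = "monitoring"

  # In loki-stack, Loki's own settings are nested under the "loki" key.
  # The exact paths below assume that chart's value layout; check them
  # against the values.yaml of the chart version you install.
  set {
    name  = "loki.persistence.enabled"
    value = "true"
  }

  # With auth enabled, every request must carry an X-Scope-OrgID tenant header.
  set {
    name  = "loki.config.auth_enabled"
    value = "true"
  }
}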
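Because the release above turns on auth_enabled, Loki runs in multi-tenant mode and rejects requests that lack an X-Scope-OrgID header. A Grafana data source therefore has to send that header. The sketch below shows one way to express this as Grafana provisioning values. The tenant ID and the Loki URL are hypothetical placeholders, and whichever agent ships the logs has to write under the same tenant.

```hcl
# Hypothetical tenant name; it must match the tenant the log shipper writes to.
variable "loki_tenant_id" {
  type    = string
  default = "tenant-a"
}

locals {
  # Values for the Grafana chart that provision Loki as a data source.
  loki_datasource_values = yamlencode({
    datasources = {
      "loki-datasource.yaml" = {
        apiVersion = 1
        datasources = [
          {
            name   = "Loki"
            type   = "loki"
            access = "proxy"
            # Assumed in-cluster address; Loki listens on port 3100 by default.
            url = "http://loki.monitoring.svc.cluster.local:3100"
            # Grafana sends custom headers as numbered name/value pairs.
            jsonData = {
              httpHeaderName1 = "X-Scope-OrgID"
            }
            secureJsonData = {
              httpHeaderValue1 = var.loki_tenant_id
            }
          }
        ]
      }
    }
  })
}
```

If multi-tenancy is not needed, the simpler choice is to leave auth_enabled off, in which case Loki accepts requests without the tenant header.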

Kubernetes Integration

Service Discovery

Kubernetes service discovery enables automatic target detection:

Implementation


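# prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    scrape_configs:
      # Service discovery settings must sit inside a scrape job.
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          # Discover every pod in the listed namespaces as a scrape target.
          - role: pod
            namespaces:
              names:
                - default
                - production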
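Pod discovery on its own turns every pod in the listed namespaces into a target, including pods that expose no metrics. A common refinement is to keep only pods that opt in through an annotation. The sketch below expresses such a job in Terraform and passes it through the Prometheus chart's extraScrapeConfigs value, which takes the extra jobs as a YAML string. The job name and the prometheus.io/scrape annotation convention are assumptions here, not something the original configuration defines.

```hcl
locals {
  # One scrape job: discover pods, then keep only those annotated for scraping.
  annotated_pods_job = {
    job_name = "annotated-pods" # hypothetical name, chosen to avoid clashing with chart defaults
    kubernetes_sd_configs = [
      {
        role       = "pod"
        namespaces = { names = ["default", "production"] }
      }
    ]
    relabel_configs = [
      {
        # Drop every pod that is not annotated prometheus.io/scrape: "true".
        source_labels = ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"]
        action        = "keep"
        regex         = "true"
      },
      {
        # Copy discovery metadata into regular labels for use in queries.
        source_labels = ["__meta_kubernetes_namespace"]
        target_label  = "namespace"
      },
      {
        source_labels = ["__meta_kubernetes_pod_name"]
        target_label  = "pod"
      }
    ]
  }

  # extraScrapeConfigs is a string value, so the job list is encoded separately.
  prometheus_scrape_values = yamlencode({
    extraScrapeConfigs = yamlencode([local.annotated_pods_job])
  })
}
```

Adding values = [local.prometheus_scrape_values] to the Prometheus release keeps the scrape configuration in the same Terraform code as the rest of the stack instead of in a separately managed ConfigMap.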

Resource Monitoring

Monitor container and node metrics through:

Key Metrics

CPU utilization
Memory consumption
Network traffic
Disk operations
Pod health status
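The metrics listed above map onto series exposed by cAdvisor (through the kubelet) and by kube-state-metrics. As a rough sketch, they can be precomputed as recording rules and loaded through the Prometheus chart's serverFiles value. The rule names are hypothetical, and the last rule assumes kube-state-metrics is running in the cluster.

```hcl
locals {
  resource_recording_rules = {
    groups = [
      {
        name = "resource-usage"
        rules = [
          {
            # CPU cores in use per pod, averaged over 5 minutes.
            record = "namespace_pod:container_cpu_usage:rate5m"
            expr   = "sum by (namespace, pod) (rate(container_cpu_usage_seconds_total[5m]))"
          },
          {
            # Working-set memory per pod, in bytes.
            record = "namespace_pod:container_memory_working_set:sum"
            expr   = "sum by (namespace, pod) (container_memory_working_set_bytes)"
          },
          {
            # Inbound network traffic per pod, in bytes per second.
            record = "namespace_pod:container_network_receive_bytes:rate5m"
            expr   = "sum by (namespace, pod) (rate(container_network_receive_bytes_total[5m]))"
          },
          {
            # Disk write throughput per pod, in bytes per second.
            record = "namespace_pod:container_fs_writes_bytes:rate5m"
            expr   = "sum by (namespace, pod) (rate(container_fs_writes_bytes_total[5m]))"
          },
          {
            # Number of pods per namespace that are not running or completed.
            record = "namespace:kube_pod_unhealthy:count"
            expr   = "sum by (namespace) (kube_pod_status_phase{phase=~\"Pending|Failed|Unknown\"})"
          }
        ]
      }
    ]
  }

  # The chart writes each serverFiles entry into the server's configuration volume.
  prometheus_rule_values = yamlencode({
    serverFiles = {
      "recording_rules.yml" = local.resource_recording_rules
    }
  })
}
```

Note that the CPU, network, and disk series are cumulative counters, which is why each of those rules wraps them in rate() rather than reading the raw value.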

Alert Configuration

Alert Rules

Define meaningful alert thresholds:


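# prometheus-rules.yaml
groups:
- name: kubernetes
  rules:
  - alert: HighCPUUsage
    # The metric is a cumulative counter, so compare its per-second rate
    # (CPU cores in use) instead of the raw, ever-growing value.
    expr: sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) > 0.8
    for: 5m
    labels:
      severity: warning
    annotations:
      description: "Container CPU usage above 0.8 cores (80% of one core) for 5 minutes"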

Alert Routing

Configure notification channels through:

Email integration
Slack notifications
PagerDuty alerts
Custom webhooks
OpsGenie integration
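Routing is handled by Alertmanager, which matches alert labels against a routing tree and forwards each alert to a receiver. The sketch below sends everything to Slack by default and escalates alerts labeled severity="critical" to PagerDuty. It assumes a version of the Prometheus chart whose bundled Alertmanager reads its configuration from the alertmanager.config value; the receiver names, the Slack channel, and both variables are hypothetical.

```hcl
variable "slack_webhook_url" {
  type      = string
  sensitive = true
}

variable "pagerduty_routing_key" {
  type      = string
  sensitive = true
}

locals {
  alertmanager_values = yamlencode({
    alertmanager = {
      config = {
        route = {
          # Default receiver for any alert that no child route matches.
          receiver = "slack-default"
          group_by = ["alertname", "namespace"]
          routes = [
            {
              # Escalate critical alerts to the on-call rotation.
              matchers = ["severity=\"critical\""]
              receiver = "pagerduty-oncall"
            }
          ]
        }
        receivers = [
          {
            name = "slack-default"
            slack_configs = [
              {
                api_url       = var.slack_webhook_url
                channel       = "#alerts"
                send_resolved = true
              }
            ]
          },
          {
            name = "pagerduty-oncall"
            pagerduty_configs = [
              {
                routing_key = var.pagerduty_routing_key
              }
            ]
          }
        ]
      }
    }
  })
}
```

Passing values = [local.alertmanager_values] to the Prometheus release applies the routing tree. Email, OpsGenie, and webhook receivers follow the same pattern with their own *_configs blocks.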

Dashboard Implementation

Resource Visualization

Create actionable dashboards displaying:

System Metrics

Infrastructure utilization
Application performance
Error rates
Latency measurements
Throughput statistics
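Dashboards can be provisioned with the Grafana release rather than built by hand. The sketch below uses the Grafana chart's dashboardProviders and dashboards values to import the community "Node Exporter Full" dashboard (ID 1860 on grafana.com) at install time. It assumes a data source named Prometheus already exists and that the Grafana pod can reach grafana.com to download the dashboard.

```hcl
locals {
  grafana_dashboard_values = yamlencode({
    # Tell Grafana to load dashboards from files in a provider-specific folder.
    dashboardProviders = {
      "dashboardproviders.yaml" = {
        apiVersion = 1
        providers = [
          {
            name            = "default"
            orgId           = 1
            folder          = ""
            type            = "file"
            disableDeletion = false
            editable        = true
            options = {
              path = "/var/lib/grafana/dashboards/default"
            }
          }
        ]
      }
    }

    # Dashboards to download into the "default" provider on startup.
    dashboards = {
      default = {
        "node-exporter-full" = {
          gnetId     = 1860
          datasource = "Prometheus" # assumed data source name
        }
      }
    }
  })
}
```

Adding local.grafana_dashboard_values to the values list of the Grafana release gives every environment the same starting dashboards, and custom dashboards can be added to the same map as exported JSON.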