Blog

Writing about DevOps, platform engineering, Web3, and cloud infrastructure.

7 articles

How We Cut GPU Training Costs by 80% with Karpenter on EKS

2026-03-15

Building on-demand GPU infrastructure for distributed AI training on blockchain — and making it cost-efficient with Karpenter auto-scaling.

AWS, EKS, Karpenter

Zero-Trust Access for Exposed Services on ECS Fargate

2026-02-20

How I secured publicly exposed Prometheus, Grafana, and pgAdmin across 4 environments using Tailscale and AWS WAF.

Security, ECS, AWS

Grafana OIDC with DefGuard: RBAC-to-Org Mapping at Scale

2026-01-10

Setting up Grafana authentication via DefGuard OIDC with automatic RBAC-to-organization mapping and multi-tier AlertManager routing.

Grafana, Observability, OIDC

EKS Cost Optimization: From Over-Provisioned Clusters to FinOps-First Infrastructure

2025-11-05

A practical guide to rightsizing EKS workloads, implementing Karpenter, and reducing AWS spend by 30% without touching application code.

AWS, EKS, FinOps

Operating a PaaS at Scale: Lessons from Globo.com's TSURU

2025-09-18

What it's like to operate an internal developer platform serving hundreds of engineers at Brazil's largest media company — on EKS and GKE simultaneously.

Platform Engineering, Kubernetes, GKE

Building a Production-Ready Kubernetes Platform

2024-12-15

A deep dive into designing and operating a multi-tenant Kubernetes platform with GitOps, observability, and self-service capabilities.

Kubernetes, Platform Engineering, GitOps

Terraform at Scale: Lessons from Managing 500+ Resources

2024-11-02

Practical patterns for structuring Terraform codebases, managing state, and keeping drift under control in large-scale cloud environments.

Terraform, IaC, AWS