Agentic AI for DevOps

Build AI agents that run your infrastructure

The home for learning Agentic AI for DevOps — CI/CD, Kubernetes, incident response, observability, and cloud ops. Hands-on videos, build-along projects, and learning paths that take you from prompt to production.

10 Projects
3 Learning paths
6 Topic areas

What you'll master

Six ways AI is reshaping DevOps

Every pillar comes with videos, projects, and deep dives — so you don't just understand it, you ship it.

🤖

AI Agents for Ops

Build autonomous agents that triage alerts, investigate incidents, and take action on your infrastructure — not just chat about it.

🚀

Agentic CI/CD

Pipelines that fix their own failing builds, review PRs, and patch security bugs. CI/CD that thinks before it ships.

☸️

Kubernetes & IaC

Natural-language kubectl, AI-reviewed Terraform, and agents that diagnose cluster failures and reconcile drift.

🔭

Observability & On-Call

AI on-call engineers that read Prometheus, Grafana, and logs — correlate signals, find root cause, and summarize incidents.

🔐

DevSecOps

AI-driven security scanning, secret detection, and policy-as-code reviews wired straight into your pull requests.

☁️

Cloud & FinOps

Agents that audit AWS spend, right-size resources, and turn cloud cost reports into concrete, automated savings.

Build along

Agentic DevOps projects

Each one is a complete, real-world build you can ship to your own stack.

🚨
Building now

AI SRE — Autonomous Incident Investigator

The agent that finds root cause before you finish your coffee

An AI SRE that watches your alerts, then investigates across metrics, logs, and Kubernetes in an agentic tool-use loop — forming and testing hypotheses until it posts a ranked root-cause analysis with the evidence trail and a suggested fix to Slack. Read-only by design, so a human approves any action.

Mirrors Datadog Bits AI SRE, Resolve AI, Cleric AI

Claude Tool Use · Python / FastAPI · Prometheus · Loki
Intermediate · View guide →
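The hypothesis-testing loop behind the AI SRE can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the project's implementation: `query_metric` and `search_logs` are hypothetical read-only tools backed by canned data rather than Prometheus or Loki, and the hypotheses are hard-coded where a real agent would ask the model to propose them.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Canned observations standing in for Prometheus and Loki (assumption:
# a real build would reach them through read-only tool calls).
FAKE_METRICS = {"db_connections_in_use": 100, "db_connections_max": 100}
FAKE_LOGS = ["ERROR pool exhausted: no free connections", "WARN retrying query"]


def query_metric(name: str) -> float:
    """Hypothetical read-only metrics tool."""
    return FAKE_METRICS.get(name, 0)


def search_logs(needle: str) -> list[str]:
    """Hypothetical read-only log search tool."""
    return [line for line in FAKE_LOGS if needle in line]


@dataclass
class Hypothesis:
    cause: str
    suggested_fix: str
    # Each check returns one piece of evidence, or None if it found nothing.
    checks: list[Callable[[], Optional[str]]]
    evidence: list[str] = field(default_factory=list)

    @property
    def score(self) -> float:
        """Share of checks that produced evidence."""
        return len(self.evidence) / len(self.checks) if self.checks else 0.0


def pool_saturated() -> Optional[str]:
    used = query_metric("db_connections_in_use")
    limit = query_metric("db_connections_max")
    return f"connection pool at {used}/{limit}" if limit and used >= limit else None


def pool_errors_logged() -> Optional[str]:
    hits = search_logs("pool exhausted")
    return f"{len(hits)} 'pool exhausted' log line(s)" if hits else None


def oom_logged() -> Optional[str]:
    hits = search_logs("OOMKilled")
    return f"{len(hits)} OOMKilled log line(s)" if hits else None


def default_hypotheses() -> list[Hypothesis]:
    return [
        Hypothesis("Pod out of memory", "raise memory limits", [oom_logged]),
        Hypothesis("DB connection pool exhausted", "raise pool size",
                   [pool_saturated, pool_errors_logged]),
    ]


def investigate(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Test every hypothesis, then rank by how many checks found evidence."""
    for h in hypotheses:
        h.evidence = [e for check in h.checks if (e := check()) is not None]
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)


def format_report(ranked: list[Hypothesis]) -> str:
    """Render the ranked analysis as text; nothing here mutates infrastructure."""
    lines = []
    for rank, h in enumerate(ranked, start=1):
        lines.append(f"{rank}. {h.cause} (score {h.score:.2f}) | fix: {h.suggested_fix}")
        lines.extend(f"   evidence: {e}" for e in h.evidence)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(investigate(default_hypotheses())))
```

Every tool in the sketch only reads, which mirrors the read-only design: the report is a suggestion, and a human decides whether to act on it.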
🔌
On the roadmap

MCP DevOps Agent — your own “goose for ops”

One agent, every tool — via Model Context Protocol

An MCP-native agent that connects to your real infrastructure through MCP servers (kubectl, AWS, GitHub, Prometheus) and executes ops tasks end-to-end with human approval. The exact pattern Block and Uber run internally at massive scale.

Mirrors Block “codename goose”, Uber MCP platform

Claude · Model Context Protocol · kubectl MCP · AWS MCP
Intermediate · View guide →
📚
On the roadmap

RAG On-Call Copilot (Slack)

Answers on-call questions from your runbooks, instantly

A retrieval-augmented copilot that ingests your runbooks, wikis, and past incidents, then answers engineers' on-call questions in Slack with cited sources — cutting the “who knows about X?” tax. Uber's version saved ~13,000 engineering hours.

Mirrors Uber Genie, Moveworks

Claude · Embeddings · Vector DB (pgvector/Qdrant) · Slack
Beginner · View guide →
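The retrieve-then-cite step at the heart of the copilot can be sketched without any external service. This is a minimal sketch under stated assumptions: the runbook paths and text are invented for illustration, and `embed` is a bag-of-words stand-in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter

# Toy corpus; paths and text are illustrative, not from any real wiki.
RUNBOOKS = {
    "runbooks/postgres-failover.md":
        "Promote the replica when the primary postgres database is down, then repoint the service.",
    "runbooks/redis-eviction.md":
        "If redis memory is full, check eviction policy and raise maxmemory.",
    "incidents/checkout-latency.md":
        "Checkout latency spiked because the postgres connection pool was exhausted.",
}


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[tuple[str, float]]:
    """Return up to k (source, score) pairs, best match first, dropping zero scores."""
    q = embed(question)
    scored = [(source, cosine(q, embed(text))) for source, text in RUNBOOKS.items()]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(question: str) -> str:
    """Assemble the model prompt with numbered sources so answers can cite them."""
    hits = retrieve(question)
    context = "\n".join(
        f"[{i}] ({source}) {RUNBOOKS[source]}"
        for i, (source, _score) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("redis memory is full"))
```

Numbering the sources in the prompt is what lets the answer carry citations back into Slack; swapping `embed` for real embeddings changes the ranking quality, not the shape of the flow.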
🔍
On the roadmap

Agentic PR Reviewer

An AI reviewer that actually reads your codebase

A GitHub bot that reviews pull requests like a senior engineer — running shell and Python in a sandbox to navigate the diff, trace symbols, and leave inline comments. Not a single prompt: a real code-execution agent.

Mirrors CodeRabbit, Qodo, Greptile

Claude Tool Use · GitHub API · Sandbox (Docker/microVM) · Node.js / Python
Advanced · View guide →
🛡️
On the roadmap

AI Security Autofix

Find a vulnerability, ship the fix PR automatically

An agent that runs a SAST scanner (Semgrep/CodeQL), feeds each finding plus the surrounding code-flow to an LLM, and generates a verified fix as a pull request — the architecture GitHub ships to millions of repos.

Mirrors GitHub Copilot Autofix, Wiz, Snyk AI

Claude · Semgrep / CodeQL · GitHub Actions · Python
Intermediate · View guide →
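Feeding a finding plus its surrounding code to the model, then checking the result, might look like the sketch below. It is a sketch under stated assumptions: `Finding` is a simplified, hypothetical record rather than the Semgrep or CodeQL output schema, the rule ID and sample source are invented, and "verified" is reduced to "the same rule no longer fires on that file after a re-scan".

```python
from dataclasses import dataclass

# Invented sample file with an obvious SQL injection on line 4.
SOURCE = '''\
import sqlite3

def get_user(conn, name):
    query = "SELECT * FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchone()
'''


@dataclass(frozen=True)
class Finding:
    """Simplified scanner finding (hypothetical shape, not a real schema)."""
    rule_id: str
    message: str
    path: str
    line: int  # 1-indexed


def code_window(source: str, line: int, radius: int = 2) -> str:
    """Return the flagged line with `radius` lines of context, marked with '>>'."""
    lines = source.splitlines()
    start = max(line - 1 - radius, 0)
    end = min(line + radius, len(lines))
    return "\n".join(
        f"{'>>' if n == line else '  '} {n:>3}: {text}"
        for n, text in enumerate(lines[start:end], start=start + 1)
    )


def build_fix_prompt(finding: Finding, source: str) -> str:
    """Assemble the text an LLM would receive for one finding."""
    return (
        f"Rule: {finding.rule_id}\n"
        f"Problem: {finding.message}\n"
        f"File: {finding.path}\n\n"
        f"{code_window(source, finding.line)}\n\n"
        "Return a minimal patch that removes the vulnerability without changing behavior."
    )


def is_verified(finding: Finding, findings_after_fix: list[Finding]) -> bool:
    """A fix counts as verified here if a re-scan no longer reports the rule on that file."""
    return not any(
        f.rule_id == finding.rule_id and f.path == finding.path
        for f in findings_after_fix
    )


if __name__ == "__main__":
    finding = Finding("example.sql-injection", "User input formatted into SQL", "app/db.py", 4)
    print(build_fix_prompt(finding, SOURCE))
```

Re-running the scanner before opening the pull request is the cheap guardrail: a patch that does not clear the original finding never reaches a reviewer.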
🔁
On the roadmap

Self-Healing CI/CD Pipeline

A pipeline that fixes its own broken builds

When a GitHub Actions run fails, an agent reads the logs, reproduces the error, writes a fix, and opens a pull request — closing the loop on flaky builds and trivial breakages, with a human merging.

Mirrors GitHub Copilot coding agent, self-healing DevOps pattern

Claude API · GitHub Actions · Node.js · Octokit
Advanced · View guide →
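The "reads the logs" step of the self-healing pipeline can be sketched as a small triage function. This is a sketch under stated assumptions: the sample log is invented (it borrows the `##[group]` and `##[error]` markers that GitHub Actions logs use), the regexes cover only this one error shape, and reproducing the failure, calling the model, and opening the pull request are left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Invented log excerpt for illustration.
SAMPLE_LOG = """\
##[group]Run npm test
FAIL src/cart.test.js
  TypeError: Cannot read properties of undefined (reading 'total')
      at computeTotal (src/cart.js:14:22)
##[endgroup]
##[error]Process completed with exit code 1.
"""

STEP_RE = re.compile(r"^##\[group\]Run (.+)$", re.MULTILINE)
ERROR_RE = re.compile(r"^\s*(\w*(?:Error|Exception): .+)$", re.MULTILINE)
LOCATION_RE = re.compile(r"\(([^():\s]+):(\d+):\d+\)")


@dataclass
class Failure:
    step: str
    error: str
    file: str
    line: int


def triage(log: str) -> Optional[Failure]:
    """Extract the failing step, first error message, and first source location."""
    error = ERROR_RE.search(log)
    if error is None:
        return None  # nothing recognizable to fix; leave it to a human
    step = STEP_RE.search(log)
    location = LOCATION_RE.search(log, error.end())
    return Failure(
        step=step.group(1) if step else "unknown",
        error=error.group(1),
        file=location.group(1) if location else "",
        line=int(location.group(2)) if location else 0,
    )


def fix_request(failure: Failure) -> str:
    """Build the instruction an agent would send to the model."""
    return (
        f"CI step `{failure.step}` failed with: {failure.error}\n"
        f"Start at {failure.file}:{failure.line}. "
        "Propose a minimal patch as a unified diff."
    )


if __name__ == "__main__":
    failure = triage(SAMPLE_LOG)
    if failure:
        print(fix_request(failure))
```

Returning `None` for unrecognized logs keeps the agent conservative: it only proposes a fix when it can point at a concrete error and location, and the merge still belongs to a human.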
Don't know where to start?

Follow a learning path

Curated sequences that thread videos, projects, and articles into a clear route — beginner to advanced.

🌱
Beginner

Zero → Agentic DevOps

Start from nothing. Understand the production agentic loop, then ship your first two real agents — an on-call copilot and an incident investigator.

  1. Read: What is Agentic AI for DevOps?
  2. Project: RAG On-Call Copilot (start here)
  3. Project: AI SRE — Incident Investigator
  4. Video: Build your first ops agent
1–2 weekends
🚀
Intermediate

AI-Powered CI/CD

Make your pipelines intelligent — agents that review code, patch security bugs, and fix their own failing builds.

  1. Project: Agentic PR Reviewer
  2. Project: AI Security Autofix
  3. Project: Self-Healing CI/CD Pipeline
~2 weeks
🛡️
Advanced

AI Platform Engineering

The platform track: Kubernetes, Terraform, and cloud cost agents — then orchestrate them all into one autonomous DevOps team.

  1. Project: Kubernetes Copilot
  2. Project: Terraform Review & Drift Agent
  3. Project: AI Cloud Cost / FinOps Agent
  4. Project: Capstone: Multi-Agent Ops Platform
~3 weeks
Read the deep dives

DevOps articles

Long-form, battle-tested write-ups — pulled straight from the blog.

Deploy Three-Tier DevSecOps Kubernetes Project on AWS EKS with ArgoCD, Prometheus, Grafana, Jenkins
Docker · AWS · Kubernetes

Imagine a robust, secure, and scalable web application built with cutting-edge DevSecOps practices — now imagine achieving all that with automation and efficiency. In this guide, we take you…

Nov 19, 2024·11 min read
Deploy Vuejs Application on Google Kubernetes Engine (GKE)- Blue Green Deployment
Docker · Kubernetes · DevOps

I will use GitHub Actions to deploy our VueJS project on Google Kubernetes Engine (GKE).

Jan 29, 2024·1 min read
Understanding Kubernetes: A Comprehensive Guide to Container Orchestration
Docker · AWS · Kubernetes

In the realm of modern software development and deployment, managing and orchestrating containerized applications have become essential. Kubernetes, often abbreviated as K8s, has emerged as a leading…

Dec 13, 2023·10 min read
Github Actions CI/CD Pipeline: Deploy Dockeriz Django on AWS EC2 with PostgreSQL, Celery…
Docker · AWS · Kubernetes

If you are not a Medium member and want to read for free, you can read it on my LinkedIn.

Jan 1, 2025·8 min read
Deploy Django Application on AWS ECS Fargate using GitHub Actions and Terraform, A Complete CI/CD…
Docker · AWS · Kubernetes

Learn how to deploy a Dockerized Django application on AWS ECS Fargate effortlessly with GitHub Actions and Terraform in this comprehensive tutorial.

May 6, 2024·6 min read
DevOps Project: CI/CD Through Git, Jenkins and Tomcat
Docker · AWS · DevOps

Before you get started, make sure you have set up these things.

Nov 26, 2022·4 min read

Get new agentic DevOps builds in your inbox

No spam — just new projects, videos, and deep dives on building AI agents for DevOps.
