Infrastructure and observability for AI Agents?
You’ve probably used tools like ChatGPT, Grok, or Gemini to ask questions, write content, or get coding help. These tools are powered by large language models. They’re great at holding conversations, answering questions, and giving useful suggestions. But they’re reactive: they work only when you ask them something, and then they wait for your next input.
Now imagine something more powerful. What if the AI didn’t just talk, but actually did things for you? That’s what an AI agent does.
An AI agent is like a personal assistant that not only understands your request but also takes action on its own. It can plan, make decisions, use tools like apps or APIs, and even learn from past experiences. It doesn't just reply, it acts.
Let’s say you want to plan a trip. If you use ChatGPT, it might give you travel tips and links. But an AI agent could go a step further. It can check flight prices, compare hotel options, book the best one, and send you reminders, all without you doing each step manually. That’s the difference.
So yes, AI agents use language models like ChatGPT, but they add something more. They also use memory to remember things, tools to get work done, and planning to figure out the steps to reach a goal. This combination of an LLM plus memory, tools, and planning is what makes an AI agent so powerful.
Many people wonder, “Isn’t that just tool calling with ChatGPT?” And that’s a good question. In fact, modern ChatGPT can use tools when you enable function calling or plugins. But it still doesn’t plan ahead or remember across sessions by default. It needs a full agent framework around it to become a real AI agent.
You can build these frameworks using platforms like LangChain, OpenAgents, or Google’s Agent Builder. These help connect memory, tools, and actions so that the language model becomes a true AI agent.
Now let’s talk about MCP. MCP stands for Model Context Protocol. It’s not an agent by itself. Instead, it helps developers connect AI models to different tools easily. Think of MCP as the wiring system that helps your AI agent plug into useful tools. So, if you want to build an agent that books hotels, MCP can help connect your agent to the right APIs and services.
In short, MCP supports agents by making it easy to hook into the tools they need, but it’s not the brain or the planner; it’s more like the toolbox.
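To make the toolbox idea concrete, here is a minimal sketch of the wiring pattern a protocol like MCP standardizes: a registry that exposes named tools, with descriptions, to an agent. This is an illustration only; `ToolRegistry` and `search_hotels` are hypothetical inventions, not the real MCP SDK or API.

```python
# Illustrative sketch of the "wiring" idea: a registry that exposes named
# tools (functions plus descriptions) to an agent. ToolRegistry and
# search_hotels are hypothetical stand-ins, not the real MCP API.

class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, description, fn):
        self._tools[name] = {"description": description, "fn": fn}

    def list_tools(self):
        # The agent's LLM would see these descriptions when deciding
        # which tool to call.
        return {name: t["description"] for name, t in self._tools.items()}

    def call(self, name, **kwargs):
        return self._tools[name]["fn"](**kwargs)


def search_hotels(city, max_price):
    # Stand-in for a real hotel-booking API call.
    hotels = [
        {"name": "Hotel A", "city": "Delhi", "price": 90},
        {"name": "Hotel B", "city": "Delhi", "price": 150},
    ]
    return [h for h in hotels if h["city"] == city and h["price"] <= max_price]


registry = ToolRegistry()
registry.register("search_hotels", "Find hotels in a city under a price", search_hotels)

print(registry.list_tools())
print(registry.call("search_hotels", city="Delhi", max_price=100))
```

The point is the separation of concerns: the agent decides *what* to do, while the registry (the toolbox) handles *how* each tool is reached.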
To develop an AI agent, you usually start with a language model like GPT or Claude. Then, you add memory so it can remember things. You give it tools: maybe APIs, a calculator, or the ability to look things up. You also give it the ability to plan and take steps toward a goal.
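That recipe (model plus memory, tools, and planning) can be sketched as a simple loop. Everything below is a toy: `plan` is a hard-coded stand-in for an LLM call, and the tools are plain Python functions.

```python
# Toy agent loop: plan -> act with tools -> remember.
# The "planner" is a hard-coded stand-in; a real agent would call a
# model like GPT or Claude to break the goal into steps.

def plan(goal):
    # A real agent would ask the LLM which steps reach the goal.
    return [("lookup", {"query": goal}), ("calculate", {"expr": "2 + 3"})]

TOOLS = {
    "lookup": lambda query: f"Top result for '{query}'",
    "calculate": lambda expr: eval(expr),  # toy only; never eval untrusted input
}

def run_agent(goal):
    memory = []  # the agent remembers every step it took
    for tool_name, args in plan(goal):
        result = TOOLS[tool_name](**args)
        memory.append({"tool": tool_name, "args": args, "result": result})
    return memory

history = run_agent("plan a trip to Goa")
for step in history:
    print(step["tool"], "->", step["result"])
```

Even in this toy form you can see the three ingredients working together: the planner picks steps, the tools execute them, and the memory keeps a record the agent could reason over later.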
The end result is something that doesn’t just reply to your message: it understands what you want, figures out how to help, and gets it done.
AI agents are becoming more common across industries. From customer support to travel planning, writing, coding, and even security, they’re helping people save time and reduce effort. They’re not the future; they’re already here.
So the next time you’re chatting with an AI, ask yourself: is this just a conversation, or could it become an agent?
Now, here’s where things get tricky.
Once you build these smart agents and run them on platforms like Kubernetes, observability becomes one of the biggest challenges. It’s no longer just about whether your app is running or whether CPU is spiking. AI agents are unpredictable: they can choose different paths, use different tools, or even interact with other agents. You can’t debug them with basic metrics alone.
Even if you have traditional logging and monitoring in place, say Prometheus and an ELK stack, it’s not always enough. You may know that something went wrong, but not why it happened. AI agents can fail silently, make the wrong call, or overload your system if you don’t have visibility into their decisions and actions.
That’s why the industry is shifting focus toward agent observability. Projects like OpenTelemetry are defining new standards to make this easier. They're creating common ways to capture traces, logs, metrics, and events specifically for generative AI and agents.
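To illustrate the kind of signal these efforts aim at, here is a dependency-free sketch of span-style tracing around agent steps. Real deployments would use the OpenTelemetry SDK and its GenAI semantic conventions; the `trace_step` helper below is a hypothetical stand-in, not the OpenTelemetry API.

```python
import json
import time
from contextlib import contextmanager

# Dependency-free sketch of span-style tracing for agent steps.
# trace_step is a hypothetical stand-in for a real tracing SDK: it
# records a name, attributes, duration, and ok/error status per step.

SPANS = []

@contextmanager
def trace_step(name, **attributes):
    span = {"name": name, "attributes": attributes, "start": time.time()}
    try:
        yield span
        span["status"] = "ok"
    except Exception as exc:
        span["status"] = "error"
        span["error"] = repr(exc)
        raise
    finally:
        span["duration_s"] = time.time() - span["start"]
        SPANS.append(span)

# Trace the planning phase, then each tool call the plan produced.
with trace_step("agent.plan", goal="book a hotel"):
    steps = ["search_hotels", "compare_prices"]

for s in steps:
    with trace_step("agent.tool_call", tool=s):
        pass  # the actual tool invocation would happen here

print(json.dumps(SPANS, indent=2, default=str))
```

With spans like these you can answer the "why" questions that raw metrics can’t: which tool the agent chose, with what arguments, how long it took, and where it failed.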
Without this level of observability, you’re operating blind. And if your agent causes a cascading failure in a shared Kubernetes cluster, it might not be just one user impacted; it could bring down an entire platform.
I’ll be sharing a bit more on this at KubeCon India 2025:
Keynote: From Outage to Observability – Lessons From a Kubernetes Meltdown
🗓 Thursday, August 7, 2025
🕤 9:34 AM IST
📍 Hall 3
🎙 Speakers: Saiyam Pathak (LoftLabs) & Arnab Chatterjee (Nomura)
Now let’s move on to the stuff I have been reading.
Awesome Reads
Kubernetes v1.34 Sneak Peek - Kubernetes v1.34, expected in August 2025, brings significant enhancements like stable Dynamic Resource Allocation (DRA) for GPUs and hardware, production-ready tracing for kubelet and API Server, and KYAML, a safer Kubernetes-specific YAML format. Other highlights include improved ServiceAccount-based image pulls, fine-grained HPA scaling tolerance, and Deployment pod replacement policies, all aimed at making Kubernetes more scalable, secure, and developer-friendly.
Docker MCP Catalog: Finding the Right AI Tools for Your Project - As large language models evolve into action-driven AI agents, Docker’s new MCP Catalog and Toolkit simplifies how developers discover, run, and connect secure MCP-compatible tools like Redis, Grafana, and Jira directly from Docker Desktop. This centralized, containerized approach removes the messy manual configurations and fragmented discovery process, enabling faster, safer, and more scalable AI agent development.
Celebrating 10 years of GKE: Incredible customer journeys, amazing AI futures - Google Kubernetes Engine (GKE) celebrates 10 years of powering scalable, secure, and innovative workloads—from supporting Pokémon GO’s explosive growth to enabling enterprise AI platforms like Signify and Autopilot-driven AI ops. As AI transforms cloud-native development, GKE continues to evolve as the backbone for modern, intelligent applications with built-in flexibility, performance, and cost efficiency.
vCluster: The Performance Paradox – How Virtual Clusters Save Millions Without Sacrificing Speed - vCluster delivers massive cost savings for Kubernetes multi-tenancy by replacing dozens of full clusters with lightweight virtual clusters that share a single host, enabling better resource utilization, faster provisioning, and shared infrastructure. Despite its lightweight architecture, vCluster maintains strong performance through intelligent syncing, API server rate limiting, and native networking—solving the "performance paradox" where efficiency doesn’t come at the cost of speed.
Building Serverless Functions on Kubernetes using Knative - This blog explains how Knative brings true serverless capabilities—such as auto-scaling (even to zero), traffic splitting, and event-driven execution—to Kubernetes, enabling developers to deploy lightweight functions without managing underlying infrastructure. It walks through setting up Knative, its core components (Serving, Eventing, and Functions), how it compares to other FaaS platforms like AWS Lambda, and demonstrates deploying a Go-based function with autoscaling behavior on a Kubernetes cluster.
Awesome Repos and links
Vimo - Chat with Your Videos
Awesome X reads
https://x.com/Saboo_Shubham_/status/1950744989115588749
https://x.com/pushmeet/status/1950559524277878933
https://x.com/Alibaba_Qwen/status/1948406830688018471