My new book is out | 13 Kubernetes hacks you didn't know you needed
I am super excited to announce that I recently got the chance to collaborate on a new book with Daniele from LearnKube (formerly Learnk8s), who is a long-time friend of mine. It is currently published as a free ebook that you can download.
It goes through the GPU concepts, AI infrastructure challenges, and Kubernetes orchestration realities that every engineer eventually runs into. The book starts from the basics of how GPUs differ from CPUs and then walks all the way up to real-world multi-tenant GPU clusters. Along the way, we unpack both the theory and the production trade-offs that make this problem so hard.
Here’s what the book covers:
🌐 Foundations: How GPUs interact with the Linux kernel (syscalls, cgroups, namespaces) and why the assumptions behind those mechanisms don’t hold for GPUs.
⚙️ GPU & Kubernetes: Why GPU workloads break traditional isolation, scheduling, and resource enforcement in Kubernetes.
🔀 Sharing Strategies: Deep dives into GPU scheduling models, time-slicing, MIG, HAMi, vGPU, and KAI Scheduler — what works, what doesn’t, and why.
📊 Monitoring & Metrics: Why common tools like nvidia-smi and Kubernetes metrics mislead you, and how DCGM and smarter observability reveal true utilization.
🏗 Multi-Tenancy Approaches: Exploring security boundaries, namespace control, and the trust spectrum when running GPUs across teams.
☁️ Virtual Clusters (vCluster): A practical architecture and demo showing how vCluster enables GPU sharing across teams without chaos.
I recently had the chance to attend the ContainerDays conference in Hamburg, with around 1,200 people eager to learn and share their knowledge. It was a fun event filled with great conversations.
The two main highlights were → my talk (which was fun, had a full room, and was highly rated), and people coming up to say how much they appreciate Kubesimplify. These moments truly made the trip worth it.
I always try to keep learning new things in the cloud-native field and the latest technology trends. That’s why I’ve been diving back into AI → not so much from the foundational algorithms perspective, but from how it has evolved, the broader landscape, and origins like transformers. From there, I explore more from the infrastructure perspective, which is my domain. Drawing from my experience in the Kubernetes space, I decided to put together 13 practical Kubernetes hacks that are actually useful for people working in enterprises.
Kubesimplify updates
AWS Course to be released this month
Operator course to be released next month
Where will I be heading next?
Dubai GITEX
KCD Sri Lanka
KubeCon NA
Awesome Reads
Building VM golden images with Packer - The new Packer plugin for KubeVirt lets you build and automate VM golden images directly inside a Kubernetes cluster, eliminating the need for local virtualization tools and manual setup. With ISO-based installation, Kickstart automation, and integrated provisioning, it produces reusable bootable volumes that can be versioned, shared, and plugged into CI/CD pipelines.
vLLM Semantic Router: Next Phase in LLM inference - The vLLM Semantic Router introduces an open-source, intent-aware routing layer that classifies queries and directs them to either lightweight or reasoning-enabled inference paths, balancing cost, latency, and accuracy. Built on Rust with cloud-native Kubernetes and Envoy integration, it delivers up to 50% lower latency and token usage while improving accuracy, marking a shift toward smarter, selective LLM inference.
Kubernetes v1.34: Use An Init Container To Define App Environment Variables - Kubernetes v1.34 introduces an alpha feature that lets you define application environment variables using an init container and a file stored in an emptyDir volume, removing the need for ConfigMaps, Secrets, or mounted files. This simplifies configuration for workloads, though care must be taken with sensitive data since it resides in pod-level storage accessible from the node filesystem.
Kubernetes Primer: Dynamic Resource Allocation (DRA) for GPU Workloads - Kubernetes’ new Dynamic Resource Allocation (DRA) framework addresses the limitations of the old device plugin model by enabling GPU sharing, fractional allocation, and attribute-based scheduling. Using concepts like DeviceClass, ResourceClaim, and ResourceSlice, DRA brings flexibility similar to Kubernetes storage provisioning, making it a game-changer for AI, ML, and HPC workloads.
Why Your PostgreSQL Queries Are 10x Slower on Kubernetes (and How to Fix It) - PostgreSQL often runs up to 10x slower on Kubernetes due to hidden issues like network virtualization overhead, restrictive resource limits, storage I/O bottlenecks, misconfigured connection pooling, and default PostgreSQL settings not suited for containers. By tuning resources, using high-performance storage, optimizing networking, deploying PgBouncer, and adjusting PostgreSQL configs, teams can restore — and even improve — database performance in Kubernetes environments.
My Experiments with NotebookLM for Teaching - The article explores how NotebookLM can be used as a teaching companion to make learning more interactive and personalized for students, especially in the K–10 age group. By combining NotebookLM with tools like NapkinAI, CrosswordLabs, Gemini Storybooks, and Google AI Studio, the author demonstrates creative ways to generate timelines, crosswords, flashcards, and bilingual study guides that save educators time while engaging learners.
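To make the Kubernetes v1.34 item above a bit more concrete, here is a minimal sketch of what such a Pod might look like, written as a small Python script that prints the manifest as JSON (which `kubectl apply -f -` accepts). The `fileKeyRef` field names (`path`, `volumeName`, `key`) reflect my understanding of the alpha API, which also needs the `EnvFiles` feature gate enabled. The pod name, images, and variable values are made up for illustration, so please check the official docs before relying on it.

```python
import json


def build_envfile_pod(name="envfile-demo", key="CONFIG_VAR", value="HELLO"):
    """Build a Pod manifest where an init container writes an env file
    into an emptyDir volume and the app container reads one key from it.

    Assumption: the alpha `fileKeyRef` source takes `path`, `volumeName`
    and `key`. All names and images here are hypothetical examples.
    """
    volume = "config-volume"
    env_file = "config.env"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "initContainers": [
                {
                    "name": "generate-config",
                    "image": "busybox",
                    # Write a KEY=VALUE line into the shared volume.
                    "command": [
                        "sh",
                        "-c",
                        f'echo "{key}={value}" > /config/{env_file}',
                    ],
                    "volumeMounts": [{"name": volume, "mountPath": "/config"}],
                }
            ],
            "containers": [
                {
                    "name": "app",
                    "image": "busybox",
                    "command": ["sh", "-c", f"echo ${key} && sleep 3600"],
                    # The app container does not mount the volume itself;
                    # it only references a key inside the generated file.
                    "env": [
                        {
                            "name": key,
                            "valueFrom": {
                                "fileKeyRef": {
                                    "path": env_file,
                                    "volumeName": volume,
                                    "key": key,
                                }
                            },
                        }
                    ],
                }
            ],
            "volumes": [{"name": volume, "emptyDir": {}}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_envfile_pod(), indent=2))
```

If you save this as, say, `envfile_pod.py`, you could try `python envfile_pod.py | kubectl apply -f -` on a test cluster with the feature gate turned on. As the article's caveat suggests, avoid putting secrets in that file, since the emptyDir contents live on the node.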
Awesome Resources/Repos
agent-sandbox: enables easy management of isolated, stateful, singleton workloads, ideal for use cases like AI agent runtimes.
AIDev Europe: Sarcastically Speaking: Unlocking Multimodal Sentiment Analysis w/ NLP+Facial Landmark
opsmate - AI SRE Agent
Learn from X
https://x.com/ihteshamit/status/1966211223030202781
https://x.com/googleaidevs/status/1966629874246074431
If you liked this episode, please subscribe and share it with your friends.



