Which cloud provider should you learn in 2025?
Whenever I start talking about a DevOps roadmap or guiding people entering the cloud-native space, the first hurdle is always learning the cloud. In 2025, there are many cloud providers, but the basics remain the same. New providers are also emerging with their own niche offerings to compete with the bigger players, commonly referred to as Tier 1 cloud providers → AWS, GCP, Azure, and Oracle.
Everything is ultimately deployed on servers. When you own the hardware, it’s called bare metal. When someone else owns the hardware and you access it via APIs while paying for usage time, it’s called Cloud → for example, AWS Cloud.
What is the learning flow for any cloud provider?
There are three main foundational parts in any cloud: Compute, Network, and Storage. These are the building blocks of every cloud provider.
Compute – These are the actual servers with different capacity options. Think of it like buying a laptop: you choose the hardware, operating system, CPU, RAM, etc., and pay accordingly. Similarly, in the cloud, you select compute options through a console or CLI, and you can create, access, and delete these machines.
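To make the compute lifecycle concrete, here is a minimal sketch using the AWS CLI, assuming it is installed and configured with credentials and a default region. The AMI and instance IDs below are placeholders, not real resources:

```shell
# Launch one small virtual server (the AMI ID is a placeholder --
# look up a current one for your region before running this)
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t3.micro \
  --count 1

# List your instances and their states
aws ec2 describe-instances \
  --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
  --output text

# Terminate the machine when you are done, so billing stops
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
```

The same create/access/delete cycle exists in every provider; only the command names change.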
Network – This is the networking setup you define. At home, you connect multiple devices into a network; on the cloud, you do the same by creating networks, defining subnets (IP ranges), and assigning resources within those networks along with specific access controls.
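As a rough illustration of that home-network analogy on AWS, the sketch below creates a network (VPC), a subnet inside it, and an access-control group, again assuming a configured AWS CLI. The CIDR ranges and the VPC ID are illustrative placeholders:

```shell
# Create a private network with a /16 IP range
aws ec2 create-vpc --cidr-block 10.0.0.0/16

# Carve out a smaller subnet inside that VPC
# (substitute the VpcId returned by the previous command)
aws ec2 create-subnet \
  --vpc-id vpc-0123456789abcdef0 \
  --cidr-block 10.0.1.0/24

# Define access controls with a security group attached to the VPC
aws ec2 create-security-group \
  --group-name web-sg \
  --description "Allow HTTP" \
  --vpc-id vpc-0123456789abcdef0
```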
Storage – Just like buying storage drives (HDD, SSD, etc.), the cloud offers various storage types and policies you can choose from.
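For example, object storage on AWS (S3) lets you pick a storage class per object. This is a sketch assuming a configured AWS CLI; the bucket name is a placeholder, since bucket names must be globally unique:

```shell
# Create a bucket
aws s3 mb s3://my-example-bucket-2025

# Upload and list files
aws s3 cp ./notes.txt s3://my-example-bucket-2025/notes.txt
aws s3 ls s3://my-example-bucket-2025

# Choose a cheaper storage class for data you rarely read
aws s3 cp ./archive.zip s3://my-example-bucket-2025/archive.zip \
  --storage-class STANDARD_IA
```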
Apart from these, IAM (Identity and Access Management) is also a common service across all cloud providers. After this foundation, each provider expands with its own specific services → for example, Azure DevOps, Amazon EKS for Kubernetes, AWS Lambda for serverless, or Google Cloud Run and GKE.
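A minimal IAM flow looks like this on AWS, assuming a configured CLI with sufficient permissions. The user name is illustrative; the policy is a real AWS managed policy:

```shell
# Create a user identity
aws iam create-user --user-name demo-dev

# Grant it read-only access to S3 via an AWS managed policy
aws iam attach-user-policy \
  --user-name demo-dev \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```

Every provider has an equivalent of this users-and-permissions model, which is why learning it once transfers well.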
Once you learn the core services of a provider, you can go for certifications, crash courses, or project-based learning. Switching to another cloud provider becomes easier because the overall architecture is largely the same.
If you’re entering the cloud-native or DevOps ecosystem, learning cloud is almost mandatory. I recommend starting with AWS, as it has the most learning material and the largest market share. Using AWS (or any cloud) for your applications depends on your required services and budget. You can also explore smaller or sovereign cloud providers to keep data within your country’s jurisdiction or to reduce costs. Ultimately, it depends on your use cases.
In 2025, if you want a quick overview of AWS services, we at Kubesimplify collaborated with Sandip Das (AWS Hero) to launch a 2-hour beginner-friendly AWS crash course. After completing it, you can start building mini projects, dive into more complex architectures, or pursue certifications.
Go check out this amazing AWS crash course TODAY! And don’t forget to share it with your network.
In the end, I’d like to say: learning a cloud provider is almost mandatory, and learning by doing matters even more. Do let me know your thoughts on this!
What I have been doing and working on
I wrote a GPU book that’s available to download for free. If you want to learn about GPUs, GPU sharing techniques, and how to use GPUs with Kai Scheduler on Kubernetes clusters, grab your free copy.
I will be giving a talk and a workshop at Dubai Gitex this month.
I will be delivering a Keynote at KCD Sri Lanka this month.
KCD Delhi has been selected as one of the KCDs for 2026, and I am one of the organizers. Make sure to subscribe for further updates.
I am working on a GitOps Bootcamp on my Hindi YouTube channel.
I am updating my CKA and CKS books and also planning to create one-shot videos for CKA and CKS.
Awesome Reads
Introducing vCluster Auto Nodes — Practical deep dive: This blog introduces vCluster Auto Nodes, a feature that lets each vCluster provision its own worker nodes on demand across cloud, private, or bare-metal environments by integrating Karpenter with the vCluster Platform. The step-by-step guide shows how to set up a GKE cluster, install vCluster Platform, wire up Workload Identity, define NodeProviders via Terraform, and create a vCluster that dynamically spins up and scales down nodes seamlessly.
Azure Kubernetes Service Automatic: Fast and frictionless Kubernetes for all - Microsoft announced the general availability of AKS Automatic, a fully managed and optimized Kubernetes service that simplifies cluster setup, scaling, security, and operations with intelligent defaults and automation. It delivers production-ready clusters out of the box, integrates autoscaling and security best practices, and allows teams—from startups to enterprises—to focus on applications rather than infrastructure.
DeepSeek-V3.2-Exp in vLLM: Fine-Grained Sparse Attention in Action - The vLLM team announced Day 0 support for DeepSeek-V3.2-Exp, introducing DeepSeek Sparse Attention (DSA) with new CUDA kernels, FP8 KV cache handling, and out-of-the-box Blackwell GPU support to enable efficient long-context inference. This release highlights usage guides, batching challenges, performance optimizations, and ongoing work to expand support across hardware platforms and large-scale deployments.
Introducing Tunix: A JAX-Native Library for LLM Post-Training - Google has introduced Tunix, an open-source JAX-native library that simplifies post-training for large language models by offering supervised fine-tuning, preference tuning, reinforcement learning, and knowledge distillation in one unified toolkit. Built for TPUs and integrated with MaxText, Tunix empowers researchers and developers with high performance, full customizability, and demonstrated accuracy gains on benchmarks like GSM8K.
Practical Guide to Kueue and Custom Compute Classes - This blog explains how to build a cost-optimized AI platform on Google Kubernetes Engine (GKE) by combining Kueue, a Kubernetes-native job queuing system, with Custom Compute Classes (CCC). Together, they let you automatically queue AI training jobs, prioritize cheaper Spot VMs, and seamlessly fall back to on-demand instances, ensuring efficient scheduling, high utilization, and reduced GPU costs.
Awesome Repos/Learning Resources
RAG-Anything - RAG-Anything: All-in-One RAG Framework
Catwalk - A collection of LLM inference providers and models
If you like this edition, please subscribe for free and share in your network.

