Building a Kubernetes learning platform meant solving multi-tenancy: isolation, security, and resource fairness. Here's why I tried vCluster, abandoned it despite its brilliance, and landed on Kubevirt virtualization.
Multi-tenancy in Kubernetes presents several complex challenges, including security, fairness, and resource allocation. This post discusses those challenges and the technology choices I made for Labs4grabs.io, a Kubernetes-based learning platform. I will explore the requirements, benefits, and drawbacks of two key technologies, vCluster and KubeVirt, both of which I experimented with while developing the Labs4grabs.io backend. I will also explain why I ultimately abandoned vCluster completely, despite its brilliance.
My platform is a Kubernetes learning platform that aims to simulate real-world problems in a lab environment. Students are expected to research and solve the problems on their own with minimal guidance: a brief description and a few hints per lab.
The lab content is based on real issues I encountered during my work as a Kubernetes engineer for Berops, as well as my previous experience in the field.
Challenges are started directly from Slack.
Kubernetes multi-tenancy is like managing an apartment building where different tenants share space. Each tenant needs their own rooms, such as a bathroom, kitchen, and bedroom, and utilities such as water, gas, and electricity. Most importantly, tenants must not be able to access each other's rooms or utilities. Any tenant would be deeply disturbed if others entered their personal space, and the quality of life for everyone would suffer.
The same goes for Kubernetes tenants: they must not touch each other's resources, network bandwidth, and so on. Otherwise, the experience would suffer for people who want to improve their Kubernetes skills on my platform. There is also one more component, the most important one, that has no counterpart in the apartment analogy: the host system.
The biggest disaster would be tenants breaking out of their environment and freely accessing the host system, affecting other tenants or using the entire cluster's compute power to mine crypto, or worse. That was my biggest worry when choosing a multi-tenancy technology for the Labs4grabs.io tenant environments.
💡 Note on host platform: I chose Kubernetes as the platform to host the tenant environments. Kubernetes has scheduling, networking, storage management, security features, observability tooling, and, most importantly, a great community to turn to with any question I might have.

Host Kubernetes cluster with tenants deployed on worker.
When choosing and testing the right solution, there were a few factors that I had to consider:

Host Kubernetes cluster where a student cannot access the control plane of the host cluster directly but can access their own claimed environment freely.
The easiest form of multi-tenancy would be to provision a new Kubernetes user for each student, providing them with their own certificates and keys to access a namespace on the host cluster. This solution is simple but also poses significant risks. It would require students to access their lab environment through the kube API server of the host cluster. That could lead to issues such as students creating an excessive number of NodePort services, degrading the experience for other students, or flooding the API server with millions of requests, impacting performance for legitimate users.
While implementing policy engines like Kyverno or Gatekeeper could help prevent users from violating certain rules, it would require extensive trial and error to configure them correctly for each individual lab. Moreover, these policies may restrict students from creating their own namespaces, accessing the root file system, or deploying privileged containers, which are important aspects of learning Kubernetes.
vCluster is a Kubernetes cluster that runs on top of a host Kubernetes cluster. Instead of having its own node pools or networking, a vCluster schedules workloads inside the host cluster while maintaining its own control plane.
vCluster was an amazing solution for my multi-tenancy problem. It offered speed, better security, and ease of use. Its standout feature was the syncer, which replicated student-created resources from tenant environments onto the host cluster. You could specify which resources to replicate and how many of them to replicate. This feature was a game-changer for the content I could provide to students.
💡 For example, in the first intermediate level lab, titled “Debugging a Python Flask application,” vCluster syncer was used to allow students to create an ingress resource and make their deployed application publicly available.

vCluster creates a fake service and pod on the tenant that points to the pod scheduled on the host cluster.
In this demo, we will create a basic vCluster tenant in the student namespace. We will then create an NGINX pod and expose it using a NodePort service. This service will be replicated to the host cluster, allowing the NGINX pod to be accessible from the outside world via my host’s public IP.
---
apiVersion: v1
kind: Namespace
metadata:
  name: student
  labels:
    pod-security.kubernetes.io/enforce: privileged
vcluster create tenant -n student --connect=false
$ vcluster connect tenant --namespace student -- kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-5c96599dbd-fsmwj 1/1 Running 0 116s
$ vcluster connect tenant --namespace student -- kubectl run nginx --image nginx
pod/nginx created
$ vcluster connect tenant --namespace student -- kubectl expose pod nginx --type=NodePort --port 80
service/nginx exposed
$ kubectl get service -n student
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
....
nginx-x-default-x-tenant NodePort 10.43.229.163 <none> 80:31871/TCP 2m36s
$ curl $HOST_PUBLIC_IP:31871
<!DOCTYPE html>
GENERIC NGINX OUTPUT
</html>
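The host-side service name in the output above, nginx-x-default-x-tenant, appears to combine the resource name, its namespace inside the tenant, and the vCluster name. Here is a minimal Python sketch of that mapping, inferred only from this output. The function name is hypothetical, and the sketch does not model how vCluster handles names that would exceed Kubernetes length limits.

```python
def host_resource_name(name: str, tenant_namespace: str, vcluster_name: str) -> str:
    """Guess the name a synced resource gets on the host cluster.

    Assumption: the pattern <name>-x-<namespace>-x-<vcluster> is inferred
    from the `kubectl get service -n student` output in the demo, not taken
    from the vCluster source code.
    """
    return f"{name}-x-{tenant_namespace}-x-{vcluster_name}"


if __name__ == "__main__":
    # The nginx service lives in the tenant's "default" namespace,
    # inside the vCluster called "tenant".
    print(host_resource_name("nginx", "default", "tenant"))
```

This is handy when you need to find the host-side counterpart of something a student created inside their tenant.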
💡 You can connect to a vCluster without the vcluster command by using a public kubeconfig. This was a requirement for my labs, since I cannot expect students to install the vCluster CLI and still have a “real Kubernetes experience”.
sync:
  services:
    enabled: true
  ingresses:
    enabled: true
  persistentvolumeclaims:
    enabled: true
In summary, the limitations of vCluster would significantly restrict certain content and scenarios in the learning platform that I wanted to offer to my students. While the syncer did enable access to some content that other solutions could not, it would also block much more content than it allowed, which was not aligned with my goals for the platform. Additionally, I could still explore the possibility of developing a custom syncer to replicate the functionality of vCluster’s syncer on a smaller scale. Therefore, I have abandoned vCluster and decided to go with virtualization.
I also looked into Firecracker and Kata Containers, two technologies that together enable the Firecracker runtime in Kubernetes. I experimented with them but decided not to use Kata Containers because it required additional configuration for the Firecracker runtime, specifically with device mapper, which I wasn't comfortable with. There were also extra steps to configure SSH connections into the Firecracker containers, which would result in a large container that might not achieve my desired outcome: a basic Kubernetes cluster on a complete operating system that I can break however I want.
Given how much learning content vCluster would rule out, my research pointed towards virtualization. This approach offers security and complete separation from the host system, allowing for a full operating system and unlimited learning content.
Fortunately, there are two virtualization technologies available for use in Kubernetes: Kubevirt and Virtlet.
Before following the demo, you will need to install KubeVirt alongside the QEMU hypervisor as per this guide.
The installation of the actual lab environments is more complicated than vCluster due to userdata scripts and storage considerations, but I’ll spare you the details and focus on the most important aspects. In this demo I’ll use a generic container disk image.
The following YAML defines one of the virtual machines in a lab environment. Each environment has three: one control plane node and two worker nodes, each with slightly different userdata.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: myvm
  name: controlplane
  namespace: student
spec:
  running: true
  template:
    metadata:
      labels:
        kubevirt.io/vm: myvm
    spec:
      domain:
        devices:
          disks:
          - name: datavolumedisk1
            disk:
              bus: virtio
          - name: cloudinitdisk
            disk:
              bus: virtio
        resources:
          requests:
            memory: 1.5Gi
          limits:
            memory: 1.5Gi
        cpu:
          cores: 1
          threads: 2
      terminationGracePeriodSeconds: 0
      volumes:
      - name: datavolumedisk1
        containerDisk:
          image: "quay.io/containerdisks/ubuntu:22.04"
      - name: cloudinitdisk
        cloudInitNoCloud:
          userData: |
            #!/bin/bash
            echo "ssh-rsa public key" >> /home/ubuntu/.ssh/authorized_keys
---
apiVersion: v1
kind: Service
metadata:
  name: ssh
  namespace: student
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: nodeport
    port: 27017
    protocol: TCP
    targetPort: 22
  selector:
    kubevirt.io/vm: myvm
  type: NodePort
$ kubectl get svc -n student
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ssh NodePort 10.43.79.225 <none> 27017:32322/TCP 111m
$ ssh -i ~/.ssh/student ubuntu@$HOST_PUBLIC_IP -p 32322
...
ubuntu@controlplane:~$ ls /
bin boot dev etc home lib lib32 lib64 libx32 lost+found media mnt opt proc root run
KubeVirt is a tool that allows running virtual machines within the Kubernetes ecosystem. Essentially, a KubeVirt virtual machine is a Pod that is tightly coupled with a QEMU virtual machine instance.
There are several components to Kubevirt, but I will focus on two important ones for my use case: VirtualMachine and VirtualMachineInstance. Both of these are deployed as Custom Resource Definitions (CRDs) to the host Kubernetes cluster.
When a new VirtualMachine definition is added to the Kubernetes API, Kubevirt performs the following steps:
For more detailed information you can visit the official Kubevirt documentation.
You can run Kubevirt instances on PVCs or container disk images, which are snapshots of an entire operating system mounted as a container.
Initially, I tried using PVCs, but it was cumbersome. I had to create a generic “golden PVC” with all Kubernetes components, and cloning it across namespaces took around 3 minutes. Other platforms can do it instantly!
So I experimented with container disk images. Once these images were pulled onto my Kubernetes cluster, initializing new environments became much quicker, taking around 1 minute and 30 seconds.
Although this was an improvement, it was still too long. To optimize further, I created a cache of ready-to-use tenant environments. This reduced provisioning time to less than 30 seconds, which is acceptable. However, I still plan to make it even quicker in the future.
💡 The cache consists of a database of ready-to-claim environments that are already running on the host cluster. When a student starts a challenge, the corresponding entry is deleted from the database and the tenant environment is allocated to the student. The cache is refilled by a cron job.
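The claim-and-refill flow can be sketched in a few lines of Python. This is only an illustration under assumptions, not the platform's actual code: it uses SQLite from the standard library, and the table name, column names, and the `provision` callback are all hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None means autocommit, so transactions are explicit below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ready_envs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " lab TEXT NOT NULL,"
        " namespace TEXT NOT NULL UNIQUE)"
    )
    return conn


def claim(conn: sqlite3.Connection, lab: str) -> Optional[str]:
    """Remove one ready environment for `lab` and return its namespace."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, namespace FROM ready_envs WHERE lab = ? ORDER BY id LIMIT 1",
            (lab,),
        ).fetchone()
        if row is not None:
            conn.execute("DELETE FROM ready_envs WHERE id = ?", (row[0],))
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return row[1] if row is not None else None


def refill(conn: sqlite3.Connection, lab: str, target: int,
           provision: Callable[[], str]) -> int:
    """Top the cache up to `target` entries; meant to run from a cron job."""
    (ready,) = conn.execute(
        "SELECT COUNT(*) FROM ready_envs WHERE lab = ?", (lab,)
    ).fetchone()
    created = 0
    while ready + created < target:
        # provision() is a hypothetical hook that boots the VMs and
        # returns the namespace of the new environment.
        conn.execute(
            "INSERT INTO ready_envs (lab, namespace) VALUES (?, ?)",
            (lab, provision()),
        )
        created += 1
    return created
```

Selecting and deleting inside one transaction is what keeps two students who start a challenge at the same moment from receiving the same environment.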
To optimize resource management for the lab nodes, I increased the CPU limit per node from 100m to 1000m. This adjustment has resulted in faster provisioning and reduced wait times.
In terms of memory allocation, the control plane is allocated 1.5G, each node is allocated 1G, and an additional 100MB per virtual machine instance is consumed for internal containers in Kubevirt. Each lab consumes approximately 3.5-4G of memory and 3 threads. With a 64G machine, it is possible to run around 12 parallel labs, taking into account other components and no miners on virtual machines.
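The memory math can be checked with a short Python sketch. It assumes that "G" means GiB and "MB" means MiB, and the 16 GiB reserve for other components is a hypothetical figure chosen for illustration, not a number from my setup.

```python
MIB_PER_GIB = 1024

CONTROL_PLANE_MIB = 1536     # 1.5G for the control plane VM
WORKER_MIB = 1024            # 1G for each worker VM
WORKERS_PER_LAB = 2
KUBEVIRT_OVERHEAD_MIB = 100  # per VM, for KubeVirt's internal containers


def lab_memory_mib() -> int:
    """Memory one lab (1 control plane + 2 workers) needs, in MiB."""
    vms = 1 + WORKERS_PER_LAB
    return (CONTROL_PLANE_MIB
            + WORKERS_PER_LAB * WORKER_MIB
            + vms * KUBEVIRT_OVERHEAD_MIB)


def max_parallel_labs(host_mib: int, reserved_mib: int = 0) -> int:
    """How many labs fit after setting aside `reserved_mib` for everything else."""
    usable = max(host_mib - reserved_mib, 0)
    return usable // lab_memory_mib()


if __name__ == "__main__":
    host = 64 * MIB_PER_GIB
    print(lab_memory_mib())                                         # MiB per lab
    print(max_parallel_labs(host))                                  # no reserve
    print(max_parallel_labs(host, reserved_mib=16 * MIB_PER_GIB))   # with reserve
```

One lab comes out at 3884 MiB, about 3.8G. A 64G host would fit 16 labs on memory alone, so the figure of around 12 leaves roughly a quarter of the machine for the host cluster's own components.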
The networking security is primarily managed through restrictive networking policies. These policies effectively prevent access to other student environments and other Kubernetes components. Only the three Kubernetes components are allowed to communicate with each other, and the student is able to directly communicate with the control plane via SSH or kubeconfig. The student can then decide to hop onto node1 or node2 from the controlplane node.

The network topology from the student's perspective.
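To give an idea of what such restrictive policies can look like, here is a hedged Python sketch that builds two standard NetworkPolicy manifests and prints them as JSON, which kubectl accepts. These are not the exact policies running on the platform. The policy names are hypothetical, the label is borrowed from the VirtualMachine example, and a real setup would likely need further rules, for example for DNS.

```python
import json


def tenant_isolation_policy(namespace: str) -> dict:
    """Allow traffic only between pods of the same tenant namespace."""
    same_namespace = {"podSelector": {}}  # any pod in this namespace
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "tenant-isolation", "namespace": namespace},
        "spec": {
            "podSelector": {},  # applies to every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [{"from": [same_namespace]}],
            "egress": [{"to": [same_namespace]}],
        },
    }


def ssh_ingress_policy(namespace: str, vm_label: str) -> dict:
    """Additionally allow SSH from anywhere to the labelled VM pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-ssh", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"kubevirt.io/vm": vm_label}},
            "policyTypes": ["Ingress"],
            # No "from" clause: any source may reach TCP port 22.
            "ingress": [{"ports": [{"protocol": "TCP", "port": 22}]}],
        },
    }


if __name__ == "__main__":
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            tenant_isolation_policy("student"),
            ssh_ingress_policy("student", "myvm"),
        ],
    }
    print(json.dumps(manifest, indent=2))
```

Saved as a script, the output can be piped into `kubectl apply -f -`. NetworkPolicies are additive, so the second policy opens SSH on top of the namespace-wide isolation instead of replacing it.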
The operating system is Ubuntu 22.04. As for the Kubernetes distribution, I decided to use kubeadm to install Kubernetes on the tenant environments. Initially, I planned to use k3s or one of its variants, but I realized that if I intentionally caused issues in k3s, the k3s binary would simply not start. Troubleshooting k3s would then mostly mean finding the correct k3s command, which is not the focus of the content I wanted to create.
My intention is to teach people how Kubernetes works by intentionally breaking various layers, from the operating system to individual components of Kubernetes. The Kubeadm distribution was the only option I could think of that allowed me to achieve my vision.
Overall, my decision was based on the content I can provide to my students, without considering the drawbacks of the technology used. While Kubevirt may not be perfect, particularly in terms of resource consumption, I can alleviate it by allocating a slightly higher budget each month. I could easily run it on a large bare metal server from Hetzner auction for less than 50 euros per month, which would provide me with sufficient RAM and processing power. I ultimately concluded that while Kubevirt may have some drawbacks, it allows for more flexibility in providing learning content and offers better security through virtualization.
My biggest challenge has been, and still is, having knowledge of various technologies but struggling to effectively integrate them. I have conducted experiments on technologies I am familiar with, encountered roadblocks, and then moved on to other technologies, wasting a lot of time. While everything is functioning well, I believe there is huge room for improvement.
💡 For example, I experimented with using Packer to build golden images for Kubevirt for lab environments after abandoning vCluster. However, I later decided to abandon Packer as well since I was already using the Ansible provisioner and didn’t require an additional tool alongside Ansible.
Currently, I am enhancing the infrastructure based on feedback from students and my backlog which is ever growing. However, progress is slow due to the diverse tasks related to content, user experience, Slack bot, marketing, security and various other tasks. I constantly have to switch and juggle between these tasks. There are almost 70 tasks in my queue, making it a large side project alongside my full time job.
I am now realizing that Slack may not be the ideal tool for building a community. It is more suited for team collaboration and has limitations on API calls for free accounts. Additionally, its pricing is based on every Slack member, which would be costly if I wanted to access API calls that are only available for paid accounts, especially considering my growing community of students in my Slack workspace.
