Thursday, November 21, 2024

How we built a dynamic Kubernetes API Server for the API Aggregation Layer in Cozystack


Hi there! I'm Andrei Kvapil, but you might know me as @kvaps in communities dedicated to Kubernetes and cloud-native tools. In this article, I want to share how we implemented our own extension api-server in the open-source PaaS platform, Cozystack.

Kubernetes truly amazes me with its powerful extensibility features. You're probably already familiar with the controller concept and frameworks like kubebuilder and operator-sdk that help you implement it. In a nutshell, they allow you to extend your Kubernetes cluster by defining custom resources (CRDs) and writing additional controllers that handle your business logic for reconciling and managing these kinds of resources. This approach is well-documented, with a wealth of information available online on how to develop your own operators.

However, this is not the only way to extend the Kubernetes API. For more complex scenarios, such as implementing imperative logic, managing subresources, and dynamically generating responses, the Kubernetes API aggregation layer provides an effective alternative. Through the aggregation layer, you can develop a custom extension API server and seamlessly integrate it into the broader Kubernetes API framework.

In this article, I will explore the API aggregation layer, the types of challenges it is well-suited to address, cases where it may be less appropriate, and how we utilized this model to implement our own extension API server in Cozystack.

What Is the API Aggregation Layer?

First, let's get definitions straight to avoid any confusion down the road. The API aggregation layer is a feature in Kubernetes, while an extension api-server is a specific implementation of an API server for the aggregation layer. An extension API server is just like the standard Kubernetes API server, except it runs separately and handles requests for your specific resource types.

So, the aggregation layer lets you write your own extension API server, integrate it easily into Kubernetes, and directly process requests for resources in a certain group. Unlike the CRD mechanism, the extension API server is registered in Kubernetes as an APIService, which tells Kubernetes about the new API server and which APIs it serves.

You can execute this command to list all registered apiservices:

kubectl get apiservices.apiregistration.k8s.io

Example APIService:

NAME                         SERVICE                     AVAILABLE   AGE
v1alpha1.apps.cozystack.io   cozy-system/cozystack-api   True        7h29m

As soon as the Kubernetes API server receives a request for a resource in the apps.cozystack.io/v1alpha1 API, it redirects the request to our extension API server, which handles it according to the business logic we've built into it.
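
For reference, the registration behind this output looks roughly like the following APIService manifest (a sketch; TLS details such as caBundle are omitted):

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1alpha1.apps.cozystack.io
spec:
  group: apps.cozystack.io
  version: v1alpha1
  service:                      # the extension api-server handling this group
    name: cozystack-api
    namespace: cozy-system
    port: 443
  groupPriorityMinimum: 1000
  versionPriority: 15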

When to use the API Aggregation Layer

The API Aggregation Layer helps solve several problems where the usual CRD mechanism might not be enough. Let's break them down.

Imperative Logic and Subresources

Besides regular resources, Kubernetes also has something called subresources.

In Kubernetes, subresources are additional actions or operations you can perform on primary resources (like Pods, Deployments, Services) via the Kubernetes API. They provide interfaces to manage specific aspects of resources without affecting the entire object.

A simple example is status, which is traditionally exposed as a separate subresource that you can access independently from the parent object. The status field isn't meant to be changed by users directly; it's updated by controllers to reflect the actual state of the object.

But beyond /status, Pods in Kubernetes also have subresources like /exec, /portforward, and /log. Interestingly, unlike the usual declarative resources in Kubernetes, these represent endpoints for imperative operations: viewing logs, proxying connections, executing commands in a running container, and so on.

To support such imperative commands in your own API, you need to implement an extension API and an extension API server. Here are some well-known examples:

  • KubeVirt: An add-on for Kubernetes that extends its API capabilities to run traditional virtual machines. The extension api-server created as part of KubeVirt handles subresources like /restart, /console, and /vnc for virtual machines.
  • Knative: A Kubernetes add-on that extends its capabilities for serverless computing, implementing the /scale subresource to set up autoscaling for its resource types.

By the way, even though subresource logic in Kubernetes can be imperative, you can still manage access to subresources declaratively using Kubernetes' standard RBAC model.

For example, this is how you can control access to the /log and /exec subresources of the Pod kind:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default
  name: pod-and-pod-logs-reader
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]

You're not tied to etcd

Usually, the Kubernetes API server uses etcd as its backend. However, implementing your own API server doesn't lock you into etcd. If it doesn't make sense to store your server's state there, you can keep the information in any other system and generate responses on the fly. Here are a few cases to illustrate:

  • metrics-server is a standard extension for Kubernetes that lets you view real-time metrics of your nodes and pods. It serves the NodeMetrics and PodMetrics kinds in its own metrics.k8s.io API. Requests for these resources are answered with metrics fetched directly from the kubelets. So when you run kubectl top node or kubectl top pod, metrics-server collects metrics from cAdvisor in real time and returns them to you. Since the information is generated in real time and is only relevant at the moment of the request, there is no need to store it in etcd. This approach saves resources.

  • If needed, you can use a backend other than etcd and even implement a Kubernetes-compatible API on top of it. For example, if you use Postgres, you can create a transparent representation of its entities in the Kubernetes API: databases, users, and grants within Postgres would appear as regular Kubernetes resources, thanks to your extension API server. You could manage them using kubectl or any other Kubernetes-compatible tool. Unlike controllers, which implement business logic through custom resources and reconciliation, an extension API server eliminates the need for a separate controller for every kind. This means you don't have to sync state between the Kubernetes API and your backend.

One-time resources

  • Kubernetes has a special API used to provide users with information about their permissions. This is implemented using the SelfSubjectAccessReview API. One unusual detail of these resources is that you can't view them using get or list verbs. You can only create them (using the create verb) and receive output with information about what you have access to at that moment.

    If you try to run kubectl get selfsubjectaccessreviews directly, you'll just get an error like this:

    Error from server (MethodNotAllowed): the server does not allow this method on the requested resource
    

    The reason is that the Kubernetes API server doesn't support any other interaction with this type of resource (you can only CREATE them).

    The SelfSubjectAccessReview API supports commands such as:

    kubectl auth can-i create deployments --namespace dev
    

    When you run the command above, kubectl creates a SelfSubjectAccessReview via the Kubernetes API. Kubernetes then checks which permissions your user has and generates a personalized response in real time. This logic is quite different from a scenario where the resource is simply stored in etcd; a sketch of the object kubectl submits is shown after this list.

  • Similarly, KubeVirt's CDI (Containerized Data Importer) extension, which allows uploading files into a PVC from a local machine using the virtctl tool, requires a special token before the upload begins. This token is obtained by creating an UploadTokenRequest resource via the Kubernetes API. Kubernetes routes (proxies) all UploadTokenRequest creation requests to the CDI extension API server, which generates and returns the token in response.
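
Here is a minimal sketch of the review object kubectl builds for the can-i command above; creating it returns the same object with status.allowed filled in:

apiVersion: authorization.k8s.io/v1
kind: SelfSubjectAccessReview
spec:
  resourceAttributes:
    group: apps          # API group of the resource being checked
    resource: deployments
    verb: create
    namespace: dev

kubectl create -f ssar.yaml -o yaml   # response includes status.allowed: true/false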

Full control over conversion, validation, and output formatting

  • Your own API server can have all the capabilities of the vanilla Kubernetes API server. Resources created in your API server can be validated immediately on the server side without additional webhooks. CRDs also support server-side validation without webhooks, via Common Expression Language (CEL) rules and ValidatingAdmissionPolicies, but a custom API server allows for more complex and tailored validation logic when needed.

    Kubernetes allows you to serve multiple API versions for each resource type, traditionally v1alpha1, v1beta1, and v1. Only one version can be designated the storage version, and requests for all other versions must be automatically converted to it. With CRDs, this mechanism is implemented using conversion webhooks. In an extension API server, by contrast, you can implement your own conversion mechanism, mix different storage versions (one object might be serialized as v1, another as v2), or rely on an external backing API.

  • Directly implementing the Kubernetes API lets you format table output however you like, without being forced to follow the additionalPrinterColumns logic of CRDs. Instead, you can write your own formatter for the table output and its custom fields. With additionalPrinterColumns, you can only display field values selected via JSONPath; in your own API server, you can generate and insert values on the fly, formatting the table output as you wish.

Dynamic resource registration

  • The resources served by an extension api-server don't need to be pre-registered as CRDs. Once your extension API server is registered using an APIService, Kubernetes starts polling it to discover APIs and resources it can serve. After receiving a discovery response, the Kubernetes API server automatically registers all available types for this API group. Although this isn't considered common practice, you can implement logic that dynamically registers the resource types you need in your Kubernetes cluster.

When not to use the API Aggregation Layer

There are some anti-patterns where using the API Aggregation Layer isn't recommended. Let's go through them.

Unstable backend

If your API server stops responding, whether due to an unavailable backend or other issues, it may block some Kubernetes functionality. For example, when deleting namespaces, Kubernetes waits for a response from your API server to check whether any resources remain. If the response doesn't come, the namespace deletion is blocked.

You might also have encountered the situation where, with metrics-server unavailable, an extra message appears in stderr after every API request (even ones unrelated to metrics) stating that metrics.k8s.io is unavailable. This is another example of how the API Aggregation Layer can cause problems when the API server handling requests is down.

Slow requests

If you can't guarantee an instant response to user requests, it's better to consider using a CustomResourceDefinition and a controller. Otherwise, you risk making your cluster less stable. Many projects implement an extension API server only for a limited set of resources, particularly for imperative logic and subresources. This recommendation is also mentioned in the official Kubernetes documentation.

Why we needed it in Cozystack

As a reminder, we're developing the open-source PaaS platform Cozystack, which can also be used as a framework for building your own private cloud. Therefore, the ability to easily extend the platform is crucial for us.

Cozystack is built on top of FluxCD. Every application is packaged into its own Helm chart, ready for deployment in a tenant namespace. Deploying an application on the platform amounts to creating a HelmRelease resource that specifies the chart name and the parameters for the application; all the remaining logic is handled by FluxCD. This pattern lets us easily extend the platform with new applications: a new application just needs to be packaged into an appropriate Helm chart.

Interface of the Cozystack platform

So, in our platform, everything is configured as HelmRelease resources. However, we ran into two problems: the limitations of the RBAC model and the need for a public API. Let's delve into them.

Limitations of the RBAC model

The standard RBAC system in Kubernetes doesn't allow you to restrict access to a list of resources of the same kind based on labels or specific fields in the spec. When creating a role, you can limit access within a kind only by specifying concrete resource names in resourceNames. This works for verbs like get or update, but resourceNames is not honored for the list verb. As a result, you can limit listing by kind, but not by name.
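
To illustrate, here's a hypothetical Role that pins access to a single HelmRelease by name; the resourceNames filter works for get and update, but cannot constrain list:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: tenant-kvaps
  name: single-release-editor
rules:
- apiGroups: ["helm.toolkit.fluxcd.io"]
  resources: ["helmreleases"]
  resourceNames: ["postgres-db1"]   # honored for get/update, ignored for list
  verbs: ["get", "update"]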


So, we decided to introduce new resource types named after the Helm charts they use and to generate the list of available kinds dynamically at runtime in our extension API server. This way, we can reuse Kubernetes' standard RBAC model to manage access to specific resource types.
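
With chart-named kinds, access control becomes an ordinary RBAC rule per resource type. For example, a hypothetical role granting a tenant user full control over Postgres instances only:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: tenant-kvaps
  name: postgres-admin
rules:
- apiGroups: ["apps.cozystack.io"]
  resources: ["postgreses"]
  verbs: ["get", "list", "watch", "create", "update", "delete"]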

Need for a public API

Since our platform provides capabilities for deploying various managed services, we want to organize public access to the platform's API. However, we can't allow users to interact directly with resources like HelmRelease because that would let them specify arbitrary names and parameters for Helm charts to deploy, potentially compromising our system.

We wanted users to be able to deploy a specific service simply by creating a resource of the corresponding kind in Kubernetes, where the kind is named after the chart it's deployed from. Here are some examples:

  • kind: Kubernetes → chart: kubernetes
  • kind: Postgres → chart: postgres
  • kind: Redis → chart: redis
  • kind: VirtualMachine → chart: virtual-machine

Moreover, we don't want to add a new type to codegen and recompile our extension API server every time a new chart needs to be served. The schema update should happen dynamically, or be provided by the administrator via a ConfigMap.

Two-way conversion

We already have integrations and a dashboard that continue to use HelmRelease resources, and at this stage we didn't want to lose support for that API. Since we're simply translating one resource into another, support works both ways: if you create a HelmRelease, you'll get the corresponding custom resource in Kubernetes, and if you create the custom resource, it will also be available as a HelmRelease.

We don't have any additional controllers synchronizing state between these resources. All requests to resources in our extension API server are transparently proxied to HelmRelease resources and vice versa. This eliminates intermediate states and the need to write controllers and synchronization logic.

Implementation

To implement the Aggregation API, you might consider starting with the following projects:

  • apiserver-builder: currently in alpha and not updated for two years. It works like kubebuilder, providing a framework for creating an extension API server, letting you incrementally build the project structure and generate code for your resources.
  • sample-apiserver: A ready-made example of an implemented API server, based on official Kubernetes libraries, which you can use as a foundation for your project.

For practical reasons, we chose the second project. Here's what we needed to do:

Disable etcd support

In our case, we don't need it, since all resources are stored directly in the Kubernetes API.

You can disable etcd options by passing nil to RecommendedOptions.Etcd:
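
Here's a minimal sketch of what that looks like with the generic apiserver options used by sample-apiserver (package aliases and variable names are illustrative):

import (
    genericoptions "k8s.io/apiserver/pkg/server/options"
)

// In the options constructor:
opts := genericoptions.NewRecommendedOptions(
    "/registry/apps.cozystack.io", // etcd path prefix; irrelevant once Etcd is nil
    apiserver.Codecs.LegacyCodec(v1alpha1.SchemeGroupVersion),
)
opts.Etcd = nil // disable etcd storage: our registry proxies to HelmRelease objects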

Generate a common resource kind

We called it Application, and it looks like this:
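
A sketch of the type (field names are illustrative; the spec is deliberately unstructured so it can carry arbitrary chart values):

import (
    apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Application is the single generic kind behind every chart-backed resource.
type Application struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   ApplicationSpec   `json:"spec,omitempty"`
    Status ApplicationStatus `json:"status,omitempty"`
}

// ApplicationSpec carries the Helm values as-is, mirroring HelmRelease's spec.values.
type ApplicationSpec struct {
    Values *apiextensionsv1.JSON `json:"values,omitempty"`
}

type ApplicationStatus struct {
    Version    string             `json:"version,omitempty"`
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// ApplicationList is the list type registered alongside each kind.
type ApplicationList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []Application `json:"items"`
}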

This is a generic type used for any application type, and its handling logic is the same for all charts.

Set up configuration loading

Since we want to configure our extension api-server via a config file, we formed the config structure in Go:
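
A sketch of that structure (YAML tags and nesting are illustrative; compare it with the example config below):

type Config struct {
    Resources []ResourceConfig `yaml:"resources"`
}

type ResourceConfig struct {
    Application ApplicationConfig `yaml:"application"` // how the kind is exposed
    Release     ReleaseConfig     `yaml:"release"`     // how it maps to a HelmRelease
}

type ApplicationConfig struct {
    Kind     string `yaml:"kind"`     // e.g. "Postgres"
    Singular string `yaml:"singular"` // e.g. "postgres"
    Plural   string `yaml:"plural"`   // e.g. "postgreses"
}

type ReleaseConfig struct {
    Prefix string      `yaml:"prefix"` // HelmRelease name prefix, e.g. "postgres-"
    Chart  ChartConfig `yaml:"chart"`
}

type ChartConfig struct {
    Name      string    `yaml:"name"`
    SourceRef SourceRef `yaml:"sourceRef"`
}

type SourceRef struct {
    Kind      string `yaml:"kind"`
    Name      string `yaml:"name"`
    Namespace string `yaml:"namespace"`
}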

We also modified the resource registration logic so that the resources we create are registered in the scheme with different Kind values:
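
Conceptually, the registration loop adds the same Go type to the scheme once per configured kind; a sketch, assuming the Config structure above:

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

func registerKinds(scheme *runtime.Scheme, cfg *Config) {
    gv := schema.GroupVersion{Group: "apps.cozystack.io", Version: "v1alpha1"}
    for _, res := range cfg.Resources {
        // Register the generic Application type under a chart-specific Kind.
        scheme.AddKnownTypeWithName(gv.WithKind(res.Application.Kind), &Application{})
        scheme.AddKnownTypeWithName(gv.WithKind(res.Application.Kind+"List"), &ApplicationList{})
    }
    metav1.AddToGroupVersion(scheme, gv)
}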

As a result, we got a config where you can pass all possible types and specify what they should map to:
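
An example of such a config, matching the structure above (values are illustrative):

resources:
- application:
    kind: Postgres
    singular: postgres
    plural: postgreses
  release:
    prefix: postgres-          # HelmRelease "postgres-db1" <-> Postgres "db1"
    chart:
      name: postgres
      sourceRef:
        kind: HelmRepository
        name: cozystack-apps
        namespace: cozy-public
- application:
    kind: Redis
    singular: redis
    plural: redises
  release:
    prefix: redis-
    chart:
      name: redis
      sourceRef:
        kind: HelmRepository
        name: cozystack-apps
        namespace: cozy-public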

Implement our own registry

To keep state out of etcd and instead translate it directly into Kubernetes HelmRelease resources (and vice versa), we wrote conversion functions from Application to HelmRelease and back:
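
A sketch of both directions, assuming FluxCD's helm.toolkit.fluxcd.io/v2 Go types (the registry struct and its fields are illustrative, populated from the config above):

import (
    "strings"

    helmv2 "github.com/fluxcd/helm-controller/api/v2"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// registry serves one configured kind; fields come from the config.
type registry struct {
    kind      string    // e.g. "Postgres"
    prefix    string    // e.g. "postgres-"
    chartName string    // e.g. "postgres"
    sourceRef SourceRef // chart source from the config
}

func (r *registry) appToHelmRelease(app *Application) *helmv2.HelmRelease {
    return &helmv2.HelmRelease{
        ObjectMeta: metav1.ObjectMeta{
            Name:      r.prefix + app.Name, // e.g. "postgres-" + "db1"
            Namespace: app.Namespace,
        },
        Spec: helmv2.HelmReleaseSpec{
            Chart: &helmv2.HelmChartTemplate{
                Spec: helmv2.HelmChartTemplateSpec{
                    Chart: r.chartName,
                    SourceRef: helmv2.CrossNamespaceObjectReference{
                        Kind:      r.sourceRef.Kind,
                        Name:      r.sourceRef.Name,
                        Namespace: r.sourceRef.Namespace,
                    },
                },
            },
            Values: app.Spec.Values, // chart values pass through untouched
        },
    }
}

func (r *registry) helmReleaseToApp(hr *helmv2.HelmRelease) *Application {
    return &Application{
        ObjectMeta: metav1.ObjectMeta{
            Name:      strings.TrimPrefix(hr.Name, r.prefix),
            Namespace: hr.Namespace,
        },
        Spec: ApplicationSpec{Values: hr.Spec.Values},
    }
}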

We implemented logic to filter resources by chart name, sourceRef, and prefix in the HelmRelease name:
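
The filter can be sketched along these lines, reusing the registry type from the previous sketch: a HelmRelease belongs to a given kind only if all three attributes match.

func (r *registry) matches(hr *helmv2.HelmRelease) bool {
    // The name must carry the configured prefix, and the chart must be set.
    if hr.Spec.Chart == nil || !strings.HasPrefix(hr.Name, r.prefix) {
        return false
    }
    spec := hr.Spec.Chart.Spec
    return spec.Chart == r.chartName &&
        spec.SourceRef.Kind == r.sourceRef.Kind &&
        spec.SourceRef.Name == r.sourceRef.Name
}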

Then, using this logic, we implemented the methods Get(), Delete(), List(), Create().

You can see the full example in the Cozystack source code.

At the end of each method, we set the correct Kind and return an unstructured.Unstructured object so that Kubernetes serializes it correctly. Otherwise, everything would be serialized as kind: Application, which we don't want.
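
Putting it together, here's a sketch of Get(); the helmClient field is illustrative (any client able to read HelmRelease objects works):

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime"
)

func (r *registry) Get(ctx context.Context, name string, opts *metav1.GetOptions) (runtime.Object, error) {
    hr, err := r.helmClient.Get(ctx, r.prefix+name, metav1.GetOptions{})
    if err != nil {
        return nil, err
    }
    app := r.helmReleaseToApp(hr)

    // Convert to unstructured and overwrite the Kind, so the object is
    // serialized as e.g. "Postgres" rather than the generic "Application".
    content, err := runtime.DefaultUnstructuredConverter.ToUnstructured(app)
    if err != nil {
        return nil, err
    }
    u := &unstructured.Unstructured{Object: content}
    u.SetAPIVersion("apps.cozystack.io/v1alpha1")
    u.SetKind(r.kind)
    return u, nil
}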

What did we achieve?

In Cozystack, all our types from the ConfigMap are now available in Kubernetes as-is:

kubectl api-resources | grep cozystack
buckets           apps.cozystack.io/v1alpha1   true   Bucket
clickhouses       apps.cozystack.io/v1alpha1   true   ClickHouse
etcds             apps.cozystack.io/v1alpha1   true   Etcd
ferretdb          apps.cozystack.io/v1alpha1   true   FerretDB
httpcaches        apps.cozystack.io/v1alpha1   true   HTTPCache
ingresses         apps.cozystack.io/v1alpha1   true   Ingress
kafkas            apps.cozystack.io/v1alpha1   true   Kafka
kuberneteses      apps.cozystack.io/v1alpha1   true   Kubernetes
monitorings       apps.cozystack.io/v1alpha1   true   Monitoring
mysqls            apps.cozystack.io/v1alpha1   true   MySQL
natses            apps.cozystack.io/v1alpha1   true   NATS
postgreses        apps.cozystack.io/v1alpha1   true   Postgres
rabbitmqs         apps.cozystack.io/v1alpha1   true   RabbitMQ
redises           apps.cozystack.io/v1alpha1   true   Redis
seaweedfses       apps.cozystack.io/v1alpha1   true   SeaweedFS
tcpbalancers      apps.cozystack.io/v1alpha1   true   TCPBalancer
tenants           apps.cozystack.io/v1alpha1   true   Tenant
virtualmachines   apps.cozystack.io/v1alpha1   true   VirtualMachine
vmdisks           apps.cozystack.io/v1alpha1   true   VMDisk
vminstances       apps.cozystack.io/v1alpha1   true   VMInstance
vpns              apps.cozystack.io/v1alpha1   true   VPN

We can work with them just like regular Kubernetes resources.

Listing S3 Buckets:

kubectl get buckets.apps.cozystack.io -n tenant-kvaps

Example output:

NAME       READY   AGE   VERSION
foo        True    22h   0.1.0
testaasd   True    27h   0.1.0

Listing Kubernetes Clusters:

kubectl get kuberneteses.apps.cozystack.io -n tenant-kvaps

Example output:

NAME    READY   AGE   VERSION
abc     False   19h   0.14.0
asdte   True    22h   0.13.0

Listing Virtual Machine Disks:

kubectl get vmdisks.apps.cozystack.io -n tenant-kvaps

Example output:

NAME             READY   AGE   VERSION
docker           True    21d   0.1.0
test             True    18d   0.1.0
win2k25-iso      True    21d   0.1.0
win2k25-system   True    21d   0.1.0

Listing Virtual Machine Instances:

kubectl get vminstances.apps.cozystack.io -n tenant-kvaps

Example output:

NAME      READY   AGE   VERSION
docker    True    21d   0.1.0
test      True    18d   0.1.0
win2k25   True    20d   0.1.0

We can create, modify, and delete each of them, and every interaction is translated into HelmRelease resources, with the configured resource structure and name prefix applied.

To see all related Helm releases:

kubectl get helmreleases -n tenant-kvaps -l cozystack.io/ui

Example output:

NAME                     AGE   READY
bucket-foo               22h   True
bucket-testaasd          27h   True
kubernetes-abc           19h   False
kubernetes-asdte         22h   True
redis-test               18d   True
redis-yttt               12d   True
vm-disk-docker           21d   True
vm-disk-test             18d   True
vm-disk-win2k25-iso      21d   True
vm-disk-win2k25-system   21d   True
vm-instance-docker       21d   True
vm-instance-test         18d   True
vm-instance-win2k25      20d   True

Next Steps

We don’t intend to stop here with our API. In the future, we plan to add new features:

  • Add validation based on an OpenAPI spec generated directly from Helm charts.
  • Develop a controller that collects release notes from deployed releases and shows users access information for specific services.
  • Revamp our dashboard to work directly with the new API.

Conclusion

The API Aggregation Layer allowed us to quickly and efficiently solve our problem by providing a flexible mechanism for extending the Kubernetes API with dynamically registered resources and converting them on the fly. Ultimately, this made our platform even more flexible and extensible without the need to write code for each new resource.

You can test the API yourself in the open-source PaaS platform Cozystack, starting from version v0.18.
