Scaling apps and workloads is not a new phenomenon for software engineers. Some see their apps hit sudden usage surges (Ticketmaster, anyone?). Others run workers and jobs whose throughput varies with their business logic.
As more and more workloads run in Kubernetes, many software engineers have probably appreciated how easy it is to scale them using the built-in Horizontal Pod Autoscaler (HPA). The HPA automatically scales the number of pods your app is running based on resource metrics such as CPU and memory usage. This works fine for some scenarios, but in other cases autoscaling should ideally be more proactive.
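For reference, the Kubernetes documentation describes the HPA's core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with scaling skipped while that ratio stays within a tolerance of 1.0 (0.1 by default). The Python sketch below models only that formula. It deliberately leaves out stabilization windows, pod readiness handling and min/max clamping, which the real controller also applies.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         tolerance: float = 0.1) -> int:
    """Simplified HPA calculation: ceil(current * currentMetric / targetMetric)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(1.0 - ratio) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 pods averaging 90% CPU against a 60% target: scale out to 6.
print(hpa_desired_replicas(4, 90, 60))  # 6
# 4 pods at 63% against a 60% target is within tolerance: stay at 4.
print(hpa_desired_replicas(4, 63, 60))  # 4
```

Note that the metric has to climb before the formula reacts, which is why CPU-based scaling tends to trail a sudden burst of work.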
Kubernetes Event-driven Autoscaling (KEDA) extends the capabilities of the HPA and allows software engineers to scale workloads on external metrics, such as the number of messages in a queue, files in blob storage, Elasticsearch queries and many more, in a simple and cloud-native way. As the name implies, it fits well with workloads running in an event-driven architecture.
At Intility, we have successfully implemented KEDA to autoscale our Mobile Operator Data Platform. KEDA delivers on its promise, but there are several implications one should consider before implementing it in production. More on that in our summary.
This article shows how to install and use KEDA to autoscale an app that receives messages from an Azure Service Bus queue. Because Kubernetes is complex, we will spend some time setting up our local environment and explaining what's going on.
Okay, the list of prerequisites below is long. Not every tool is strictly necessary, but together they let you spend more time on the fun stuff, like testing KEDA, and less time on time-consuming chores, like "how do I set up a built-in registry in my cluster again?".
Clone the GitHub repository and navigate to the folder: <highlight-mono>git clone git@github.com:daniwk/app-scaling-keda.git && cd app-scaling-keda/<highlight-mono>
To install KEDA you need access to a Kubernetes (k8s) cluster with cluster-admin privileges, either local or hosted. In this article we'll use kind to create a local cluster. Additionally, we'll use ctlptl to configure the cluster declaratively and create it with a built-in container registry.
<highlight-mono>k9s<highlight-mono> offers a more user-friendly terminal interface to your Kubernetes cluster than the standard kubectl CLI.
Tilt deserves a blog post of its own. It's a toolkit that greatly improves local development of Kubernetes apps. It provides a tight feedback loop by automatically building your Dockerfile and deploying it live to your local k8s cluster with every code edit, in addition to a whole lot more. We'll use it here to skip the manual docker build and push process.
Log in to your Azure account. If you don't have one, sign up for a free subscription here.
Use the Azure Cloud Shell or the Azure CLI to get going:
Now that our local environment is up and running, we are ready to deploy our demo app and use KEDA to autoscale it. The demo app consists of:
To deploy the demo app, simply run <highlight-mono>tilt up<highlight-mono> from the root of the repository. Press <highlight-mono>space<highlight-mono> to open the Tilt UI, which shows the resources being provisioned. You can view the logs from the sender app by selecting <highlight-mono>sb-queue-sender<highlight-mono>, and the logs from the receiver app by selecting <highlight-mono>sb-queue-receiver<highlight-mono>.
You can also inspect your resources in k9s using the following commands:
Put simply, KEDA scales apps by monitoring an external source and feeding those metrics into the Horizontal Pod Autoscaler through its external metrics capability. KEDA thus extends existing Kubernetes functionality rather than replacing it, as many other Kubernetes tools do.
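KEDA's documentation splits the work in two: KEDA itself handles activation (scaling between 0 and 1 replicas), while the HPA it creates handles scaling between 1 and N. The sketch below is a simplified model of that split, not KEDA's actual code. The parameter `activation_threshold` stands in for a scaler's activation setting (for example the Service Bus scaler's `activationMessageCount`), and the cooldown period KEDA waits before scaling to zero is not modelled.

```python
from typing import Optional


def keda_activation_step(queue_length: int,
                         current_replicas: int,
                         min_replica_count: int = 0,
                         activation_threshold: int = 0) -> Optional[int]:
    """Simplified model of KEDA's activation phase.

    Returns the replica count KEDA sets itself, or None when the HPA
    that KEDA manages is responsible (the 1..N range).
    """
    is_active = queue_length > activation_threshold
    if not is_active and min_replica_count == 0:
        # No pending work: scale to (or stay at) zero.
        # Real KEDA waits for cooldownPeriod first; not modelled here.
        return 0
    if current_replicas == 0:
        # Work arrived while scaled to zero: wake the deployment up.
        return max(1, min_replica_count)
    # Between 1 and N replicas the HPA takes over.
    return None


print(keda_activation_step(queue_length=1000, current_replicas=0))  # 1
print(keda_activation_step(queue_length=1000, current_replicas=3))  # None
print(keda_activation_step(queue_length=0, current_replicas=3))     # 0
```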
Specifically, we will:
Install KEDA by running <highlight-mono>kubectl kustomize deploy/keda --enable-helm | kubectl apply -f -<highlight-mono>.
The ScaledObject is defined in <highlight-mono>deploy/autoscale/scaledobject.yaml<highlight-mono>:
Deploy it by running <highlight-mono>kubectl apply -f deploy/autoscale/scaledobject.yaml<highlight-mono>. To view this resource in k9s, type <highlight-mono>:scaledobject<highlight-mono> and press <highlight-mono>d<highlight-mono> to describe it. If you type <highlight-mono>:hpa<highlight-mono>, you will also see that KEDA has created a HorizontalPodAutoscaler resource.
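The contents of <highlight-mono>deploy/autoscale/scaledobject.yaml<highlight-mono> aren't reproduced in this text. As a hypothetical illustration of the fields a Service Bus ScaledObject typically sets, the Python sketch below builds one as a dictionary and prints it as JSON, which <highlight-mono>kubectl apply -f -<highlight-mono> also accepts. The queue name, the maxReplicaCount of 10 and the TriggerAuthentication name <highlight-mono>servicebus-auth<highlight-mono> are assumptions, and the TriggerAuthentication itself (holding the connection string) is not shown.

```python
import json


def build_scaled_object(name: str,
                        deployment: str,
                        queue_name: str,
                        min_replicas: int = 0,
                        max_replicas: int = 10,
                        message_count: int = 5,
                        auth_ref: str = "servicebus-auth") -> dict:
    """Return a KEDA ScaledObject manifest for an Azure Service Bus queue.

    queue_name, max_replicas and auth_ref are illustrative assumptions.
    """
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": name},
        "spec": {
            # The Deployment KEDA should scale.
            "scaleTargetRef": {"name": deployment},
            # 0 lets KEDA scale the receiver down to zero pods when idle.
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
            "triggers": [{
                "type": "azure-servicebus",
                "metadata": {
                    "queueName": queue_name,
                    # Target messages per replica; trigger metadata values are strings.
                    "messageCount": str(message_count),
                },
                # Hypothetical TriggerAuthentication holding the connection string.
                "authenticationRef": {"name": auth_ref},
            }],
        },
    }


if __name__ == "__main__":
    manifest = build_scaled_object("sb-queue-receiver", "sb-queue-receiver", "demo-queue")
    print(json.dumps(manifest, indent=2))
```

Saved as a script, its output could be piped straight into <highlight-mono>kubectl apply -f -<highlight-mono>.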
Now switch to your deployments in k9s (<highlight-mono>:deploy<highlight-mono>). In the READY column you can see that the receiver app has been scaled down to zero pods. This is controlled by the ScaledObject's <highlight-mono>spec.minReplicaCount<highlight-mono> parameter, which we configured above. To see the autoscaling in action, trigger an update for sb-queue-sender in the Tilt UI. This posts 1000 new messages to the Service Bus queue. After a few seconds, KEDA sees that there are 1000 unhandled messages in the queue and feeds this data to the HPA, which scales out our receiver app. The scaling is visible in the READY column (which shows ready pods out of total pods), or by pressing enter on the <highlight-mono>sb-queue-receiver<highlight-mono> deployment.
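For a rough sense of the numbers: KEDA's Service Bus scaler exposes the queue length to the HPA as an external metric with an average-per-replica target (the <highlight-mono>messageCount<highlight-mono> setting, 5 by default), and the HPA then computes ceil(queueLength / messageCount), subject to its tolerance and the replica bounds. The sketch below assumes a hypothetical maxReplicaCount of 10 and ignores the HPA's scale-up rate policies, which in practice may spread the jump over several steps.

```python
import math


def external_metric_replicas(queue_length: int,
                             target_per_replica: int,
                             current_replicas: int,
                             min_replicas: int,
                             max_replicas: int,
                             tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for an external metric with an AverageValue target.

    Assumes current_replicas >= 1, since 0 <-> 1 is handled by KEDA itself.
    """
    ratio = queue_length / (target_per_replica * current_replicas)
    desired = current_replicas
    if abs(1.0 - ratio) > tolerance:
        desired = math.ceil(queue_length / target_per_replica)
    # The HPA never goes below 1 replica; scaling to zero is KEDA's job.
    floor = max(1, min_replicas)
    return max(floor, min(desired, max_replicas))


# 1000 messages at 5 per replica -> 200 wanted, capped at maxReplicaCount.
print(external_metric_replicas(1000, 5, current_replicas=1, min_replicas=0, max_replicas=10))  # 10
# Queue nearly drained: 12 messages across 10 replicas -> 3.
print(external_metric_replicas(12, 5, current_replicas=10, min_replicas=0, max_replicas=10))   # 3
```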
We have now seen how to use KEDA to autoscale our Kubernetes apps based on an external data source, such as the number of messages in a Service Bus queue. Once your k8s cluster and external data source are available and ready, implementing KEDA is actually straightforward. However, having implemented it and run it in production for a while, we want to share our learnings.
KEDA delivers on its promise to be a single-purpose and lightweight component. As we have seen, from the developer's viewpoint, implementing autoscaling is simple. Additionally, KEDA offers a large number of ready-made scalers.
But you should plan ahead with your platform team before rolling it out in production. First, it's important to define resource requests and limits for your workloads. This helps the Kubernetes scheduler place your pods without bringing down your production environment. Second, if your workloads need to scale out to 50+ replicas, you should consider setting up one or more dedicated nodes to separate stable production workloads from workloads with surges. Enabling node autoscaling on your k8s cluster could also be a good idea.
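The scheduler places pods by their resource requests, not by what they actually use, so requests decide how many replicas fit on a node. The back-of-the-envelope sketch below uses hypothetical node and pod sizes and ignores DaemonSets and any other workloads sharing the node.

```python
import math


def pods_per_node(alloc_cpu_m: int, alloc_mem_mib: int,
                  req_cpu_m: int, req_mem_mib: int) -> int:
    """How many identical pods fit on one node, based on requests only."""
    return min(alloc_cpu_m // req_cpu_m, alloc_mem_mib // req_mem_mib)


def nodes_needed(replicas: int, per_node: int) -> int:
    """Nodes required to host all replicas at the given density."""
    return math.ceil(replicas / per_node)


# Hypothetical node: 3800m CPU and 14000 MiB allocatable.
# Hypothetical pod: requests 250m CPU and 512 MiB memory.
per_node = pods_per_node(3800, 14000, 250, 512)
print(per_node)                    # 15 (CPU is the limiting resource)
print(nodes_needed(50, per_node))  # 4
```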
Additionally, we recommend reading the KEDA docs, specifically the KEDA Concepts page and its notes on long-running executions. In short, if you run long-running executions in k8s Deployments, KEDA may scale down a replica that hasn't finished its processing. For these scenarios it's recommended to either use container lifecycle hooks or change the app to run as k8s Jobs instead.
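When Kubernetes removes a replica, it sends the container SIGTERM and waits up to terminationGracePeriodSeconds (30 seconds by default) before killing it. One common pattern, sketched below in Python with hypothetical names, is to catch SIGTERM, finish the message currently being processed and then stop taking new work. The grace period then has to be longer than your slowest message.

```python
import signal
import threading
from typing import Callable, Iterable, List


class GracefulWorker:
    """Hypothetical worker that finishes the current message on SIGTERM."""

    def __init__(self) -> None:
        self._stop = threading.Event()

    def request_stop(self, signum=None, frame=None) -> None:
        # Used as the SIGTERM handler: only sets a flag, never interrupts work.
        self._stop.set()

    def install(self) -> None:
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)

    def run(self, messages: Iterable[str], handle: Callable[[str], None]) -> List[str]:
        processed: List[str] = []
        for message in messages:
            if self._stop.is_set():
                break  # stop taking new work once termination is requested
            handle(message)  # the message in progress always runs to completion
            processed.append(message)
        return processed


if __name__ == "__main__":
    worker = GracefulWorker()
    worker.install()
    print(worker.run(["a", "b", "c"], handle=print))
```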