Automation of Kubernetes Testing

In the data engineering and DevOps worlds, Google Kubernetes Engine (GKE) has become an increasingly popular container orchestration tool. However, using this technology requires an understanding of Kubernetes clusters, nodes, container replicas (pods), and services. Because all of these pieces are needed to run a working application, testing them for weaknesses and errors is vital to success with GKE. A universal GKE test script composed of health checks and stress tests will save time on future deployments and give developers and admins confidence when deploying their applications on GKE.

Kubectl, the Kubernetes command-line tool, is an excellent way to perform these checks and tests on GKE. The recommended approach is to write a Bash script containing "kubectl get" commands with the jsonpath output option, which lets test writers parse the JSON output and return what is needed for each test in an automated fashion. Nodes should be tested upon cluster creation; pods and services should be tested after the application is deployed.

Node Test

Most tests will focus on the pods and services, but basic node health testing is a necessary first step. An example includes checking the node's and kubelet's statuses and ensuring that both are "Ready". Start by getting the count of nodes, then iterate through that list and check the "status.conditions[7].reason" and "status.conditions[7].type" fields. The example below shows how simple it is to parse the JSON output and return what's needed.

node_list=$(kubectl get nodes -o=jsonpath='{range .items[*]}|{.metadata.name}{end}')

#count the list of names -> needed for the loop
node_count=$(echo "$node_list" | grep -o "|" | wc -l)

for ((i=0; i<node_count; i++)); do
 #get the node reason for node i
 node_reason=$(kubectl get nodes -o=jsonpath="{.items[$i].status.conditions[7].reason}")
 #get the node type for node i
 node_type=$(kubectl get nodes -o=jsonpath="{.items[$i].status.conditions[7].type}")
 #logic to check for the "KubeletReady" reason and "Ready" type
 if [[ "$node_reason" != "KubeletReady" || "$node_type" != "Ready" ]]; then
  echo "node $i is not ready"
 fi
done

Pod Tests

While testing nodes is important, not all users have permission to view or edit node pools, and node health can be monitored through Stackdriver or other monitoring tools. Most developers will simply deploy their applications on nodes and perform tests on the pods and services. An example of a pod health check is comparing the number of ready pods to the number requested in the workload [1]. Another approach is checking the status of each pod.
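As a minimal sketch, the ready-versus-requested check for a single Deployment might look like the following (the workload name "my-app" is a placeholder; other workload types expose similar replica fields):

#compare ready replicas to requested replicas for a Deployment
ready=$(kubectl get deployment my-app -o=jsonpath='{.status.readyReplicas}')
requested=$(kubectl get deployment my-app -o=jsonpath='{.spec.replicas}')

#readyReplicas is omitted when zero pods are ready, so default it to 0
if [[ "${ready:-0}" -ne "$requested" ]]; then
 echo "my-app: only ${ready:-0} of $requested pods are ready"
fi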

An example of a pod stress test is ensuring that pods are resilient and restart when a node is unavailable or the pod is deleted. If a pod is scheduled on a node and that node goes down [2], the pod should be rescheduled on an available node [3]. In the kubectl output, the pod index order does not change if a pod is deleted or unscheduled, so to test that a pod is rebuilt on a different node, users can compare the pod's node name before and after the rebuild using the same pod index [4]. To drain the node try: "kubectl drain $node1_name --grace-period=900 --force --delete-local-data --ignore-daemonsets".
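A sketch of that check, assuming the pod of interest is at index $pod_index and allowing a fixed wait for rescheduling (the index variable and the sleep value are placeholders):

#record the node the pod is currently scheduled on
node_before=$(kubectl get pods -o=jsonpath="{.items[$pod_index].spec.nodeName}")

#drain that node so the pod must be rescheduled elsewhere
kubectl drain "$node_before" --grace-period=900 --force --delete-local-data --ignore-daemonsets

sleep 120

#the pod at the same index should now report a different node
node_after=$(kubectl get pods -o=jsonpath="{.items[$pod_index].spec.nodeName}")
if [[ "$node_before" == "$node_after" ]]; then
 echo "pod $pod_index was not rescheduled to a different node"
fi

#return the drained node to service
kubectl uncordon "$node_before"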

The stress test for deleting pods requires checking the pod's IP address. For StatefulSets this method is necessary, but for other workloads, verifying that the pod name has changed would also suffice. This example draws inspiration from chaoskube. Rather than deleting specific pods, the test selects pods at random, deletes them, and checks their IP addresses. Start by constructing a loop that returns a random number between 0 and n-1, where n is the number of pods. Then find that pod's IP address, delete the pod, and add another loop to check for the pod's new IP address. The first loop allows multiple pods to be tested, while the second accounts for pod restart time; rebuild times can vary, so the loop allows for multiple IP address checks.
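Putting those pieces together, a sketch of the random-delete test might look like this (the number of pods tested, the number of IP checks, and the sleep interval are arbitrary values):

pod_count=$(kubectl get pods -o=jsonpath='{.items[*].metadata.name}' | wc -w)

for ((t=0; t<3; t++)); do
 #pick a random pod index between 0 and n-1
 idx=$((RANDOM % pod_count))
 pod_name=$(kubectl get pods -o=jsonpath="{.items[$idx].metadata.name}")
 old_ip=$(kubectl get pods -o=jsonpath="{.items[$idx].status.podIP}")

 kubectl delete pod "$pod_name"

 #poll for the replacement pod's new IP address
 for ((c=0; c<10; c++)); do
  new_ip=$(kubectl get pods -o=jsonpath="{.items[$idx].status.podIP}")
  if [[ -n "$new_ip" && "$new_ip" != "$old_ip" ]]; then
   echo "pod index $idx rebuilt with new IP $new_ip"
   break
  fi
  sleep 30
 done
done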

The final pod stress test allows users to test their auto-scaling setup by utilizing their preferred load testing tool (such as Apache JMeter) and monitoring the number of pods in a workload over a given number of loops. Users can automate this test by including a load testing tool in the container image and inserting a command to increase the CPU/memory usage depending on their auto-scaling parameters. Otherwise, manually run the load test separately and automatically monitor the number of pods. This test is similar to the IP address test because it uses a finite number of loops to see if the workload has been scaled.
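A sketch of the monitoring half, assuming the load test is already running and the workload's pods carry an "app=my-app" label (the label, loop count, and sleep interval are placeholders):

#count pods before the load test takes effect
initial=$(kubectl get pods -l app=my-app --no-headers | wc -l)

for ((c=0; c<20; c++)); do
 current=$(kubectl get pods -l app=my-app --no-headers | wc -l)
 if [[ "$current" -gt "$initial" ]]; then
  echo "workload scaled from $initial to $current pods"
  break
 fi
 sleep 30
done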

Container Image Test

A popular feature in GKE is the rolling update. When an application needs to be updated, rather than creating a new workload and deleting the previous one, a rolling update gradually updates pods to point to a different container image or version [5]. There are a few options for performing this test, such as monitoring a service endpoint to watch traffic being gradually routed to the new application, or running a kubectl command to check a pod's container image. The test can be automated by hardcoding a desired container image in the rolling update, or by checking for a more recent version of the same image and using that.
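As a sketch, assuming a Deployment named "my-app" with a container named "my-container" and a hardcoded target image (all placeholder names):

#trigger the rolling update to the desired image
kubectl set image deployment/my-app my-container=gcr.io/my-project/my-app:v2

#wait for the rollout to finish
kubectl rollout status deployment/my-app --timeout=300s

#check that a pod is now running the expected image
image=$(kubectl get pods -l app=my-app -o=jsonpath='{.items[0].spec.containers[0].image}')
if [[ "$image" != "gcr.io/my-project/my-app:v2" ]]; then
 echo "rolling update did not apply the expected image"
fi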

Service Test

The final test checks a service endpoint's health to ensure that it is routing traffic to pods. Test developers could cover a couple of scenarios: services that do not expose the application externally and therefore have no external IP address, and services that cannot route traffic because their pods are failing. To confirm that a service is routing traffic to a pod, check the endpoint's response headers and look for a 200 status.
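A sketch covering both scenarios, assuming an externally exposed LoadBalancer service named "my-app-svc" (a placeholder):

#services that are not exposed externally will have no external IP
svc_ip=$(kubectl get service my-app-svc -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')

if [[ -z "$svc_ip" ]]; then
 echo "my-app-svc has no external IP address"
else
 #check the response headers for a healthy status code
 status=$(curl -s -o /dev/null -w '%{http_code}' "http://$svc_ip")
 echo "my-app-svc returned HTTP $status"
fi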

Summary

These tests are just examples of how the kubectl tool can be used to automate Kubernetes testing. Additional tests outside of kubectl can cover logging, monitoring, or the application itself. Hopefully, this gives GKE developers a sense of how testing could be done and where to start.

  1. Note that there is no single command for getting information on all workload types at once, so one approach is to get all pods, look up the workload type and workload name for each, create a unique list of those combinations, and then run commands to check the number of ready versus requested pods for each workload.
  2. Note that draining a node is the proxy here for making a node unavailable.
  3. Note that this could also be coupled with cluster autoscaling: if a pod needs to be scheduled and no available node can accommodate it, a new node can be created automatically.
  4. Note that StatefulSet workloads allow pods to keep their names when they are restarted, so a specific pod can be tested by name after it has been deleted or unscheduled.
  5. Note that this can be coupled with liveness and readiness checks to ensure that traffic is only routed to working pods.
