Unleashing Testing at Scale: How GovTech Edu Built a 200K RPS Load Testing Platform

Published in GovTech Edu, Jul 20, 2023 (15 min read)

Writers: Hamnah Suhaeri, Damarananta & Estu Fardani

TLDR

A product can be deemed exceptional when it successfully withstands an immensely heavy load test. But what does “heavy load” entail? One can envision the sheer number of users who would rely on and utilize the product daily following its launch. It can be considered outstanding if the product can perform flawlessly under such demanding conditions.

Prologue

As an organization that supports the Government in building technology products to serve millions of users, we must ensure that we build technology capable of enhancing the Indonesian community’s proficiency in existing systems. We accomplish this by conducting comprehensive performance testing. With a target of 2–4 million users and a system that allows simultaneous access, we need to be sure that the launched system has undergone rigorous performance testing. Government systems have often paid too little attention to quality as the public experiences it, leaving many citizens using state-owned systems with subpar performance.

Now, what is the process behind choosing K6 Distributed? Why did we select this tool over other options like Locust, Vegeta, JMeter, or others?

The decision to use K6 rests on several clear reasons. Various teams had already run independent load-test experiments with the K6 framework, and specific comparisons ultimately led GovTech to choose it. K6 test scripts are written in JavaScript, which makes setup easier: we use various operating systems, and JavaScript can be set up easily on all of them. Our non-technical team can still use JMeter to create scripts, because K6 provides a JMX converter for simple JMX scripts (without any advanced plugins).

What is K6

K6 is an open-source load-testing tool used to test the performance and scalability of web applications and APIs. It allows you to simulate a large number of concurrent virtual users, also known as “VUs,” and measure how your system performs under load.

Load testing with k6 involves creating test scenarios that simulate user behavior, sending HTTP requests, and collecting performance metrics. Here are the key components and steps involved in a k6 load test:

Scripting: Load tests in k6 are written as JavaScript code using the k6 scripting API. You define the behavior of virtual users, including the sequence of HTTP requests, headers, payloads, and any dynamic data needed.
Virtual Users (VUs): VUs are simulated users that generate traffic on your application or API. You define the number of VUs to simulate during the load test. Each VU executes the specified script independently, simulating real-world user behavior.
Results and Analysis: After the load test completes, k6 provides detailed test results, including graphs and statistics. You can analyze these results to identify performance issues, measure system capacity, validate SLAs, and make improvements to optimize your application’s performance.

K6’s flexibility, ease of use, and robust reporting capabilities make it a popular choice for load-testing web applications and APIs. It allows developers and performance engineers to validate the performance of their systems, detect performance bottlenecks, and make data-driven decisions to optimize application performance.

How GovTech Utilizes K6 for Load Testing

K6 can be used to perform a load test on a web service. The load test determines how many users or requests the system under test can serve per unit of time.

K6 is already used at GovTech to load test several GovTech services and products. A test load greater than 500 virtual users runs on a VM that has the K6 tools installed. A test load of 500 virtual users or fewer runs on the K6 tools installed on the Kubernetes cluster.

We then faced a challenge: pushing K6 to load test a system at more than 200,000 requests per second, in support of one of the government's applications. The National Assessment (a system used at various levels of education in Indonesia) needed to undergo traffic load testing with a target of 2 million users and a rate of 246,000 requests per second. With this, the government has developed a product of exceptional quality for the Indonesian community, amid other government products and systems that fail to satisfy their users.

How K6 works compared to other tools

K6 was adopted at GovTech to replace the earlier load testing tool, Typhoon. Typhoon is an interface service that simplifies testing with Apache JMeter.

Unlike JMeter, which uses XML to describe test scenarios, K6 uses JavaScript (.js) files. A K6 scenario file also takes fewer lines of code.

Here’s an example of a distributed.js test file:

// Copyright 2021 Kementerian Pendidikan dan Kebudayaan Republik Indonesia

import http from 'k6/http';
import { check, group, sleep } from 'k6';
import { htmlReport } from 'https://raw.githubusercontent.com/benc-uk/k6-reporter/main/dist/bundle.js';

export function handleSummary(data) {
  return {
    'public/summary.html': htmlReport(data),
    'target/summary_distributed.json': JSON.stringify(data),
  };
}

export const options = {
  scenarios: {
    Test: {
      executor: 'per-vu-iterations',
      vus: 240000,
      iterations: 100,
      maxDuration: '90s',
      exec: 'test',
    },
  },
};

export function test() {
  group('Test 1', () => {
    const params = {
      timeout: '60s',
    };

    const response = http.get('https://example.com/index.html', params);
    check(response, {
      'Test URL 1 Success': () => response.status.toString() === '200',
    });
    sleep(0.5);
  });
}

When using K6, several terms and variables are mentioned and used often, namely:

Duration (in seconds): how long the test is carried out
Iteration: how many times the requests are repeated within the duration

K6 Before Distributed Testing

K6 is currently deployed via the gitlab-runner mechanism. There are two types of gitlab-runner available:

Kubernetes Gitlab-Runner

This type of gitlab-runner runs on the Kubernetes cluster. To run the job, a custom container image is built from a k6 image with some small tuning added. This type of runner is only used if the target Virtual User (VU) count is 500 or fewer.

Gitlab Runner VMs

This type of gitlab-runner runs on virtual machine resources. It uses the shell executor to run jobs. K6 binaries and other tools are installed with the help of ansible-playbook. This type of runner is only used if the target Virtual User (VU) count is greater than 500.

We ran several tests with many scenarios on a single VM with Gitlab Runner installed. The final specification and result were:

n1-standard-64 (64 Cores, 240GB RAM)
64,510 Virtual Users (VUs), equivalent to a maximum of 101,540 Requests Per Second (RPS)

Config sysctl:

The result: the maximum number of VUs matches the maximum number of ports that the operating system can open. We set the port range to 1024–65535, which leaves only about 64,510 ports. This value matches the highest VU count we reached. If we raise the VU count above 64,510, the error rate rises too. So to create more VUs, we have to increase the number of VMs.

Since our target is 240K RPS and each VM can create about 64K VUs, we estimated that we need around 6 VMs. We made it 8 for a more convenient size.
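The sizing reasoning above can be checked with a few lines of arithmetic. This is only a sketch: it treats the 240K target as a VU count (which is how the final test is configured), and the variable names are illustrative. It computes a floor on the number of workers; the estimate of about 6 VMs in the text presumably adds headroom on top of that floor, which is an assumption on our part.

```javascript
// Worked arithmetic for the single-VM limit described above.
// Port range taken from the article: 1024-65535.
const portRangeStart = 1024;
const portRangeEnd = 65535;

// Upper bound on simultaneously usable source ports (inclusive range).
const availablePorts = portRangeEnd - portRangeStart + 1;

// Figures reported in the article.
const observedMaxVusPerVm = 64510;
const targetVus = 240000;
const chosenWorkers = 8;

// Fewest workers that keep every VM at or below the observed ceiling.
const minWorkers = Math.ceil(targetVus / observedMaxVusPerVm);

// Load per worker with the number of VMs the team actually chose.
const vusPerWorker = targetVus / chosenWorkers;

console.log({ availablePorts, minWorkers, vusPerWorker });
```

The inclusive port range holds 64,512 ports, just above the 64,510 VUs observed. With 8 workers, each VM carries 30,000 VUs, less than half of the single-VM ceiling.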

Based on experiment data, we are implementing distributed testing using K6.

Service to be targeted for testing

Before we increase RPS by raising the VU count, we need a service that can handle whatever RPS we set during load testing. In other words, we need a target for K6 to test against. We decided to build a dedicated service as the K6 load testing target, to reduce unpredictable variables while the load test runs.

This service is built on the following stack:

Google Cloud Load Balancer (GCLB)
Cloud CDN (content delivery network)
Google Cloud Storage (GCS)
Logs Sink (for monitoring and logging metrics)

This service serves only a single HTML page with a dummy interface. Even so, we had to set up a complex stack to ensure the service survives heavy testing.

How Distributed Testing using K6 Works at GovTech

Before we decided to use distributed testing, we tried many approaches to our problem: producing more than 200,000 requests per second (RPS).

Design of Distributed Testing using K6

We had two options for solving this problem:

K6 VM Based

After simulating load tests with K6 on a single VM, we decided to move to distributed K6 to get the most out of the tool and match the needs of the many users of GovTech applications. This option creates 2 or more VMs with K6 and supporting tools already installed, ready to perform the testing.

Architecture

In this option, there is a coordinator VM with gitlab-runner installed. When a GitLab pipeline is triggered from the K6 repository, the pipeline notifies the coordinator VM that a new load test is starting. The coordinator VM then creates new K6 Worker VMs, each with a public IP, from the image template (the K6 golden image). After the load test is complete, the K6 Worker VMs are removed.

K6 Kubernetes based

This option creates either a Kubernetes node pool only, or a Kubernetes cluster together with a node pool. It is based on https://github.com/grafana/k6-operator/. K6 engineers highly recommend this option.

Architecture

In this option, the GitLab Runner and the K6 Operator are installed on a specific Kubernetes cluster. When the GitLab pipeline is triggered, it notifies the gitlab runner pod that a new load test is underway. The runner then applies a new deployment that instructs the K6 operator to spawn jobs from custom resources.

Once the load test is completed, the K6 operator destroys the K6 pods and the node pool is scaled down.

After comparing the 2 options using the table above, we chose distributed testing on multiple K6 VMs (the K6 VM Based option).

Support Tools

Distributed testing with K6 requires several supporting tools, including:

CI/CD jobs are implemented to facilitate the addition and removal of K6 Worker VMs

This job creates 8 K6 worker VMs. It is executed from the K6 coordinator VM.

start-vm-group:
  tags:
    - k6-main
  stage: build
  script:
    - gcloud compute instance-groups managed resize k6-worker --size=8
    - gcloud compute instance-groups managed wait-until --stable k6-worker
  rules:
    - if: '$BUILD_DISTRIBUTED_VM == "start"'

This job is responsible for removing the K6-worker VMs. It is executed from the K6 coordinator VM.

stop-vm-group:
  tags:
    - k6-main
  stage: build
  script:
    - gcloud compute instance-groups managed resize k6-worker --size=0
    - gcloud compute instance-groups managed wait-until --stable k6-worker
  rules:
    - if: '$BUILD_DISTRIBUTED_VM == "stop"'

CI/CD jobs are implemented to facilitate the distribution mechanism of K6

These jobs use the parallel keyword of GitLab CI/CD. Here is a snippet of the code:

k6-parallel:
  extends: .distributed-rule
  stage: test
  tags:
    - k6-main
  before_script:
    - readarray -t vm < <(gcloud compute instances list --project distributed-testing --zones asia-southeast2-a --format "value(name)" | grep k6-worker)
    - ./tools/getenv.sh
    - source k6.vars
  script:
    - gcloud compute ssh ${vm[$CI_NODE_INDEX-1]} --command "k6 run --execution-segment ${SEGMENT} --execution-segment-sequence ${SEQUENCE} --out json=target/reports/report.json --no-thresholds $SCRIPT_PATH"
    - gcloud compute ssh ${vm[$CI_NODE_INDEX-1]} --command 'split -l 100000 target/reports/report.json target/reports/split/report_'
    - gcloud compute ssh ${vm[$CI_NODE_INDEX-1]} --command 'node tools/summary-parser.mjs'
  parallel: 8

The number 8 in parallel splits the job into eight parallel jobs. GitLab exposes this number to each job as CI_NODE_TOTAL, together with the job's own position as CI_NODE_INDEX, and the distribution script called in the before_script step uses both.

Pipeline GitLab Example:

Distributed Script

This script generates two values in the following format.

The execution segment, which is different for each job:

0:1/8 1/8:2/8 ... 7/8:1

The execution segment sequence, which is the same for all jobs:

0,1/8,2/8,3/8,4/8,5/8,6/8,7/8,1

The values above are generated with a parallel number of eight (8), as in the previous CI/CD job. Here is the content of the script, which is called in the before_script step under the filename tools/getenv.sh:

#!/usr/bin/env bash

get_segment(){
  if [ $(( $CI_NODE_INDEX-1 )) -eq 0 ]; then
    a=0
    b="$CI_NODE_INDEX/$CI_NODE_TOTAL"
  else
    a="$(( CI_NODE_INDEX-1))/$CI_NODE_TOTAL"
    if [ $CI_NODE_INDEX -eq $CI_NODE_TOTAL ]; then
      b=1
    else
      b="$CI_NODE_INDEX/$CI_NODE_TOTAL"
    fi
  fi
  echo "'$a:$b'"
}

get_sequence(){
  for i in $(seq 0 $CI_NODE_TOTAL); do
    if [ "$i" = "0" ]; then
      # the first always 0
      echo -n "'$i"
      echo -n ","
    elif [ "$i" = "$CI_NODE_TOTAL" ]; then
      echo "1'"
    else
      # the rest
      echo -n $i/$CI_NODE_TOTAL
      # delimiter
      echo -n ","
    fi
  done
}

echo "export SEGMENT=$(get_segment);" > k6.vars
echo "export SEQUENCE=$(get_sequence);" >> k6.vars
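For readers who find the shell arithmetic hard to follow, the same logic can be sketched in JavaScript. This is an illustrative port, not part of the authors' pipeline: the function names are made up, and nodeIndex and nodeTotal stand in for GitLab's CI_NODE_INDEX (1-based) and CI_NODE_TOTAL.

```javascript
// JavaScript sketch of the logic in tools/getenv.sh (illustrative only).

// Boundary number i of `total` equal parts, written as the shell script writes it:
// the first boundary is "0", the last is "1", everything between is "i/total".
function boundary(i, total) {
  if (i === 0) return '0';
  if (i === total) return '1';
  return `${i}/${total}`;
}

// Value for --execution-segment: the slice of the test owned by one job.
function executionSegment(nodeIndex, nodeTotal) {
  return `${boundary(nodeIndex - 1, nodeTotal)}:${boundary(nodeIndex, nodeTotal)}`;
}

// Value for --execution-segment-sequence: all boundaries, identical for every job.
function executionSegmentSequence(nodeTotal) {
  const points = [];
  for (let i = 0; i <= nodeTotal; i += 1) {
    points.push(boundary(i, nodeTotal));
  }
  return points.join(',');
}

for (let index = 1; index <= 8; index += 1) {
  console.log(index, executionSegment(index, 8));
}
console.log(executionSegmentSequence(8));
```

One small difference from the shell version: with a single job the sketch prints the segment as 0:1, whereas the shell script would print 0:1/1. Both describe the whole test.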

CI/CD Jobs for Collect Test Result in each K6 Worker VM

These jobs run after the test finishes. We copy summary_distributed.json, summary.json, and summary.html from each K6 worker VM to the K6 coordinator VM. These files are set as artifacts of this job, so the next jobs can use them directly to generate the final HTML reports.

get-report:
  stage: after-test
  tags:
    - k6-main
  before_script:
    - readarray -t vm < <(gcloud compute instances list --format "value(name)" | grep k6-worker)
  script:
    - >-
      for i in ${!vm[@]}; do
        echo "machine $i is ${vm[$i]}";
        mkdir -p target public;
        gcloud compute scp ${vm[$i]}:~/target/summary_distributed.json target/summary-${vm[$i]}.json;
        gcloud compute scp ${vm[$i]}:~/target/summary.json target/${vm[$i]}-summary.json;
        gcloud compute scp ${vm[$i]}:~/public/summary.html public/${vm[$i]}-summary.html;
      done
  artifacts:
    when: always
    paths:
      - target/
      - public/
    expire_in: 1 week

Scripts for Processing the Testing Results

The main challenge in processing the test results is to merge all results (JSON) from the workers, convert them into 2 kinds of readable HTML reports, and keep the test result data accurate. As mentioned before, our distributed.js code generates 2 kinds of JSON, summary.json and summary_distributed.json. Each is used for a different purpose.

1. Benc-uk Report

This HTML report consumes summary-${vm[$i]}.json, which is the summary_distributed.json collected from each machine. We need to merge the JSON values from every machine into a single summary. We merge the reports using this node package, which we created specifically for this case and published so other people can use it directly.

Every value is calculated and divided by the number of machines, then stored in bencuk_merged_report.json. We use the following K6 script, bencuk-html-generator.js, to generate the HTML result.

// Copyright 2022 Kementerian Pendidikan dan Kebudayaan Republik Indonesia

import { htmlReport } from 'https://raw.githubusercontent.com/benc-uk/k6-reporter/main/dist/bundle.js';

// eslint-disable-next-line no-restricted-globals
const initData = JSON.parse(open('../../target/bencuk_merged_report.json'));

export function handleSummary(data) {
  const modifiedData = initData;
  return {
    'public/merged-ben-cuk-summary-report.html': htmlReport(modifiedData),
  };
}

export default function triggeringK6() {
  console.log('Triggering html converter. .');
}
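To make the merge step that produces bencuk_merged_report.json more concrete, here is a minimal sketch of how per-worker summaries could be combined. It is not the authors' node package: the function names, the sample data, and the merge rules are all assumptions, and it assumes each summary follows the k6 handleSummary shape with a type and a values object per metric. Unlike the plain per-machine averaging described above, this sketch sums counters, since each worker only sends its share of the traffic; whether the authors' package does the same is not stated.

```javascript
// Illustrative merge of per-worker k6 summary objects. This is NOT the
// authors' node package; the input shape and merge rules are assumptions.

function sum(numbers) {
  return numbers.reduce((total, n) => total + n, 0);
}

// Merge the `values` of one metric, given its k6 metric type.
function mergeMetricValues(type, valuesList) {
  const pick = (key) => valuesList.map((v) => v[key]).filter((n) => typeof n === 'number');
  if (type === 'counter') {
    // Each worker sends its own share, so totals and per-second rates add up.
    return { count: sum(pick('count')), rate: sum(pick('rate')) };
  }
  if (type === 'rate') {
    const passes = sum(pick('passes'));
    const fails = sum(pick('fails'));
    const samples = passes + fails;
    return { passes, fails, rate: samples === 0 ? 0 : passes / samples };
  }
  // Trends and gauges: keep the true extremes, average everything else.
  const merged = {};
  for (const key of Object.keys(valuesList[0])) {
    const numbers = pick(key);
    if (numbers.length === 0) continue;
    if (key === 'min') merged[key] = Math.min(...numbers);
    else if (key === 'max') merged[key] = Math.max(...numbers);
    else merged[key] = sum(numbers) / numbers.length; // approximation for avg and percentiles
  }
  return merged;
}

function mergeSummaries(summaries) {
  const metrics = {};
  for (const name of Object.keys(summaries[0].metrics)) {
    const present = summaries.map((s) => s.metrics[name]).filter(Boolean);
    metrics[name] = {
      ...present[0],
      values: mergeMetricValues(present[0].type, present.map((m) => m.values)),
    };
  }
  return { ...summaries[0], metrics };
}

// Two made-up worker summaries, for illustration only.
const workerA = { metrics: {
  http_reqs: { type: 'counter', values: { count: 100, rate: 10 } },
  http_req_duration: { type: 'trend', values: { min: 1, max: 9, avg: 4 } },
} };
const workerB = { metrics: {
  http_reqs: { type: 'counter', values: { count: 300, rate: 30 } },
  http_req_duration: { type: 'trend', values: { min: 2, max: 20, avg: 6 } },
} };

const mergedSummary = mergeSummaries([workerA, workerB]);
console.log(JSON.stringify(mergedSummary.metrics, null, 2));
```

Averaging percentiles across workers is only an approximation; exact percentiles would need the raw samples from every worker.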

2. JMeter-Like Report

The JMeter-like report consumes the report.json from each machine (we split the JSON reports into multiple files because of their size). This summary parser processes the raw report to produce the required JSON structure. Once the summary report, ${vm[$i]}-summary.json, is created, we collect it from each machine and merge the files into the final summary.json using this node package. This is the 2nd node package we created.

To generate the JMeter-like HTML report, summary.json needs one last processing step. We use this html generator to convert the JSON into the respective HTML report.

Here is the job that generates the 2 HTML outputs in our GitLab pipeline:

pages:
  stage: process-qa-report
  script: |
    if [ "$RUNNER_TYPE" == "distributed" ]; then
      npm ci
      node tools/report/report-merger.mjs target/summary-*.json > target/bencuk_merged_report.json
      k6 run tools/report/bencuk-html-generator.js
      node tools/report/report-merger.mjs target/*-summary.json > target/summary.json
      node tools/html-generator.mjs
    else
      node tools/html-generator.mjs
    fi
  artifacts:
    paths:
      - target/
      - public/
    expire_in: 1 week
    when: always

Run Distributed Testing using K6

After the tools were prepared, the team proceeded with distributed testing. The final specifications for this testing are as follows:

K6 Runner VM: 1 VM n1-standard-2 (2 Cores, 7.5GB RAM). This VM serves as the GitLab Runner and sends K6 commands to each K6 Worker.
K6 Worker VMs: 8 (eight) VMs n1-standard-16 (16 Cores, 60GB RAM)
Testing target: internal K6 target service
Virtual Users (VU): 240,000
Configuration file: distributed.js, with the same content as in the previous section
iterations: 100
maxDuration: '90s'

This is the architecture of K6 distributed testing by GovTech Edu

Testing Result and Conclusion

From the monitoring system, we get metrics about our VM performance:

We ran several tests with fixed VU counts and specifications. Here is the average result:

From the Benc-uk template report, we get the total number of requests from distributed K6, around 23.9 million. The report also gives details of the requests, for example the average, 90th percentile, and 95th percentile durations (in milliseconds).

From this report, we see that the checks passed for all requests. We have 240K VUs, with 300K requests per second. The report then continues with data received and data sent.

The report above uses the JMeter-like template. Our focus is the throughput report. It gives the average number of requests per second and per minute, as well as the peak requests per second in a 60s interval.

Finally, the graph above represents the load limits successfully reached by distributed K6. The data shows that the target of 200K RPS has been exceeded. The RPS obtained from 240,000 VUs, 100 iterations, and a maximum duration of 90 seconds is 426,000.
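As a rough sanity check on these figures, the options in distributed.js bound the throughput that the test can produce. The sketch below assumes that every iteration completes and that each iteration issues exactly one request followed by the 0.5 s sleep, as in the script shown earlier; the variable names are illustrative.

```javascript
// Sanity check of the reported throughput, using the options in distributed.js.
const vus = 240000;
const iterationsPerVu = 100;
const sleepPerIterationSeconds = 0.5;
const maxDurationSeconds = 90;

// Each iteration issues exactly one http.get.
const totalRequests = vus * iterationsPerVu;

// Every VU sleeps at least this long in total, so the run cannot finish sooner.
const minRunSeconds = iterationsPerVu * sleepPerIterationSeconds;

// Bounds on the average RPS if every iteration completes (assumption).
const maxAverageRps = totalRequests / minRunSeconds;
const minAverageRps = totalRequests / maxDurationSeconds;

console.log({
  totalRequests,
  minRunSeconds,
  maxAverageRps,
  minAverageRps: Math.round(minAverageRps),
});
```

The expected total of 24 million requests is consistent with the roughly 23.9 million reported, and the 300K and 426K RPS figures both fall between the two bounds of about 267K and 480K.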

With these results, the distributed K6 testing has met the research targets. The outcomes of this testing can now be put to use for other purposes.

Lessons learned from using K6 for distributed testing

The selection of K6 went through considerations and comparisons with various other load testing tools.

According to the official K6 website, K6 supports load tests of the kind we planned. With distributed K6, we can meet load testing needs of up to 240K RPS, as supported by the simulations we ran against our internal API, which we treated as a testing playground. All the API endpoints we hit were tested using the GET method, resulting in the following output.

In the final results of the distributed K6 simulation on the internal API endpoint, the built-in K6 report did not make it easy for readers to analyze the outcome. We therefore looked for other sources and used specific libraries to generate the report in a different way. We obtained such a report from the repository https://github.com/benc-uk/k6-reporter, which provides K6 HTML reports for comparison, as follows.

When we applied this reporting to The National Assessment load testing, generating a K6 HTML report proved quite challenging because of the default handling of query parameters in the summary JSON report. The same endpoint called with different query parameters appeared in the report as multiple endpoints. The K6 HTML report is also large and takes a considerable amount of time to generate, depending on the conditions of the tested application. In our case, with approximately 30K RPS and a relatively high error rate, the K6 HTML report could not handle the reporting load. The benc-uk K6 report is therefore necessary for a general overview. However, the benc-uk report is not very detailed; more detail would make in-depth analysis with it easier.
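One common way to keep query parameters from multiplying endpoints in a report is to give every request a stable name. The sketch below is a hypothetical helper, not something the article says the team used; the function name and sample URLs are made up. In k6, the resulting string can be passed as the name tag of a request, which k6 then uses instead of the full URL when grouping metrics.

```javascript
// Hypothetical helper: derive a stable name for a request by dropping
// the query string and fragment from its URL.
function endpointName(url) {
  return url.split('#')[0].split('?')[0];
}

// Made-up URLs: two calls to the same endpoint plus one to another endpoint.
const urls = [
  'https://example.com/items?page=1',
  'https://example.com/items?page=2',
  'https://example.com/users?id=7',
];

// Three URLs collapse into two distinct endpoint names.
const distinctNames = [...new Set(urls.map(endpointName))];
console.log(distinctNames);

// In a k6 script the name could be passed as a request tag, for example:
//   http.get(url, { tags: { name: endpointName(url) } });
```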

We approach problems like this with distribution and autoscaling, which includes both vertical and horizontal scaling.

We first push vertical scaling to the maximum to find out what a single VM can do, increasing the VM size (CPU, RAM) until we find a stable value. We then scale horizontally, using as many VMs as we need to reach our target.

This flow can serve as a standard approach to research: start with a simple configuration, then try more complex procedures.

Epilogue

In the realm of GovTech, we have embarked on a transformative journey that we hope will inspire others. Our relentless efforts have harnessed the potential of load testing tools, particularly the modified K6, to achieve remarkable milestones. As we proudly reflect on our accomplishments, we aspire to inspire individuals and organizations alike to tackle testing challenges with audacious goals, embracing the idea of very high request rates. By sharing our experiences and lessons learned, we hope to ignite a passion for pushing boundaries and driving innovation in the testing world, ultimately fostering a community that constantly strives for excellence.

About the writers

Hamnah Suhaeri

Engineering Manager at GovTech Edu. She works in engineering as a software engineer in test. With over 5 years of experience in quality engineering, she has developed a keen interest in the managerial side of Quality Engineering platforms, or Core QA, at GovTech Edu.

Damarananta

Test Architect at GovTech Edu. He has worked in Quality Engineering since 2016 and has experience at various multinational companies such as OVO, Bukalapak, and Kompas.

Estu Fardani

Cloud Platform Engineer at GovTech Edu. He is part of the cloud operation and security teams. For this project he joined the research team to optimize distributed testing with K6 at GovTech. Estu has worked with the cloud since 2015. He contributes to open-source projects such as BlankOn and openSUSE and often speaks at local and international open-source community events.