Upgrade to K8s 1.27+ to Take Advantage of Argo Workflows Performance Improvements

When using Argo Workflows at scale, you may encounter performance issues. A large number of workflows and workflow tasks can cause Kubernetes API requests to be rate limited. Whilst there are a number of configuration changes you can make to Argo Workflows to improve performance, you are still ultimately at the mercy of the Kubernetes API server.

In this post, I’ll briefly highlight some of the key performance improvements in Kubernetes 1.27 that allowed us to run Argo Workflows at scale, as well as walk you through the outcomes of the performance tests we ran.

Testing Argo Workflows at Scale

We ran a test to compare the performance of Argo Workflows on Kubernetes 1.26 and 1.27. We created a simple workflow that started 50 pods in parallel, each of which slept for a random amount of time between 120 and 150 seconds. We then invoked this workflow 150 times in parallel.
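A workflow of this shape could look roughly like the following sketch. It is an illustration, not the exact manifest from our test: the function name, the template names, the `scale-test-` prefix and the `bash:5.2` image are all assumptions. The script only writes the manifest to a file; it does not touch a cluster.

```shell
#!/bin/sh
# Sketch: write an Argo Workflow manifest that fans out to 50 parallel pods,
# each sleeping for a random 120-150 seconds.
# All names and the container image below are illustrative assumptions.
set -eu

# write_workflow_manifest PATH
write_workflow_manifest() {
  # The quoted 'EOF' stops this shell from expanding $((...)) while writing;
  # the arithmetic is evaluated later by bash inside each pod.
  cat > "$1" <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: scale-test-
spec:
  entrypoint: fan-out
  templates:
    - name: fan-out
      steps:
        - - name: sleep
            template: random-sleep
            withSequence:
              count: "50"
    - name: random-sleep
      container:
        image: bash:5.2
        command: [bash, -c]
        # RANDOM % 31 yields 0..30, so the sleep lasts 120..150 seconds.
        args: ["sleep $((120 + RANDOM % 31))"]
EOF
}

# Example: write_workflow_manifest scale-test-workflow.yaml
```

The `withSequence` loop is what produces the 50 parallel pods: all iterations belong to a single step, so Argo starts them together rather than one after another.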

This results in 7,500 pods (150 workflows × 50 pods each) being requested as quickly as possible. We strongly recommend that you do not run this test in a cluster used for any production workloads.
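Submitting one manifest many times at once can be sketched as a small shell function. This is an assumption about how such a script might look, not the script we used: the function name, the `ARGO_BIN` override, the `argo` namespace default and the manifest file name in the example are hypothetical. It relies only on the Argo CLI's `argo submit <file> -n <namespace>` command.

```shell
#!/bin/sh
# Sketch: submit the same workflow manifest many times concurrently.
# WARNING: do not point this at a cluster running production workloads.
set -eu

# submit_in_parallel MANIFEST [COUNT] [NAMESPACE]
# COUNT defaults to 150; NAMESPACE defaults to "argo" (an assumption).
submit_in_parallel() {
  manifest=$1
  count=${2:-150}
  namespace=${3:-argo}
  i=1
  while [ "$i" -le "$count" ]; do
    # Each submission runs in the background so they are issued back to back.
    # ARGO_BIN is a hypothetical override, handy for dry runs (ARGO_BIN=echo).
    "${ARGO_BIN:-argo}" submit "$manifest" -n "$namespace" &
    i=$((i + 1))
  done
  # Block until every background submission has returned.
  wait
}

# Example: submit_in_parallel scale-test-workflow.yaml 150 argo
```

Setting `ARGO_BIN=echo` before calling the function prints the commands instead of running them, which is a cheap way to check the loop before aiming it at a real cluster.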

Our testing was admittedly unscientific but is indicative of what you should expect in your own cluster.

We first ran the test, driven by a bash script, against an EKS 1.26 cluster using Argo Workflows v3.4.8.

Argo Workflows had no performance tuning whatsoever. We used the Cluster Autoscaler to provision additional nodes as they were required.

When we ran this test on an EKS 1.26 cluster, we observed that after approximately 2,500 pods were scheduled, the Kubernetes API became unresponsive to basic queries. The Workflow Controller logs indicated that it was unable to query the Kubernetes API, and the Kubernetes API server logs indicated that it was rate limiting requests. Ultimately, we temporarily lost administrative access to the cluster while the workflows eventually ran and cleaned themselves up.

When we repeated the test on an EKS 1.27 cluster, all 7,500 pods were scheduled without any issues. The Kubernetes API server logs indicated that it was not rate limiting requests and we could continue performing other administrative tasks on the cluster.
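If you want to check for this kind of rate limiting in your own cluster, one option is to look at the API server's metrics rather than its logs. The sketch below assumes the throttling comes from Kubernetes API Priority and Fairness, which exposes the `apiserver_flowcontrol_rejected_requests_total` counter. The function name is hypothetical, and this is not how we collected the observations above.

```shell
#!/bin/sh
# Sketch: total the API Priority and Fairness rejection counters found in a
# kube-apiserver /metrics dump read from standard input.
set -eu

# sum_rejected_requests < metrics.txt
sum_rejected_requests() {
  # Match only sample lines for this metric (not "# HELP" / "# TYPE" lines);
  # the sample value is the last whitespace-separated field on each line.
  awk '/^apiserver_flowcontrol_rejected_requests_total[{ ]/ { total += $NF }
       END { printf "%d\n", total }'
}

# Example: kubectl get --raw /metrics | sum_rejected_requests
```

Comparing the total before and after a test run shows whether the API server rejected requests during that run. A total that does not change suggests it was not throttling through this mechanism.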

Conclusion

The performance improvements introduced in Kubernetes 1.27 can help you run Argo Workflows at scale without impacting the performance of your cluster. If you are using Argo Workflows to run complex workflows, we recommend upgrading to Kubernetes 1.27 or later.

Are your data pipelines scalable and reliable?

Operating data pipelines at scale doesn't have to be unreliable and costly. Put an end to the stress of unreliable data pipelines and data engineering backlogs and turn data into revenue-boosting insights. Pipekit can help.

Pipekit is a self-serve data platform that configures Argo Workflows on your infrastructure to offer simplicity and efficiency when it comes to data workflows. Achieve higher scalability for your data pipelines while significantly reducing your cloud spend. Our platform is designed to align your data infrastructure seamlessly with your full-stack infrastructure, all on Kubernetes.

Try out Pipekit for free today - pipekit.io/signup
