Top 10 Argo Workflows Examples
April 12, 2022
In this post, we’ll run 10 Argo Workflows that will help you automate experiments, reproduce environments, and manage code.
We all know how Argo Workflows makes it easy to orchestrate parallel jobs on Kubernetes. While it’s most often associated with data processing and ETL projects, it’s useful for a lot more! These 10 workflows will change the way you see this Kubernetes orchestrator.
Let’s dive in!
Argo Workflows Setup
If you don't currently have a workflow running, I suggest you create your first Argo Workflow to understand what we'll discuss in this post. To do so, follow the instructions here to create a local Argo Workflows deployment on your cluster. I also suggest using k3d for your local Kubernetes control plane; this tutorial uses a k3d cluster named argo, which you can create in your own environment with a command such as k3d cluster create argo.
Now let's jump into looking at our first example!
1. Enhancing Your Workflow Using Parameters
Argo uses custom resource definitions stored in YAML files to manage its deployments. So there's no need to learn new specs to manage your infrastructure; you can follow the same pattern you use in your Kubernetes and Kustomize scripts, which helps you remain consistent. Below we can see how to use parameters in your workflows. Passing parameters is handy when your configuration depends on runtime values, such as access tokens, that you'll only know after other components have been created.
In our template, the parameter message will have the default value of Message string default value. However, this value can be overridden at runtime by passing the -p flag to argo submit (for example, -p message="a new message").
We can validate the output from the Argo Workflows Logs UI. (By default, you can access the UI at https://localhost:2746/ if you followed the port-forwarding instructions while creating your cluster.)
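A minimal sketch of a workflow that matches this description might look like the following. The parameter name message and its default value come from the text above; the generateName, the template name, and the docker/whalesay image are assumptions chosen for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: arguments-parameters-   # assumed name
spec:
  entrypoint: whalesay
  arguments:
    parameters:
    # Default value, used when nothing is passed at submit time
    - name: message
      value: Message string default value
  templates:
  - name: whalesay                      # assumed template name
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      # The workflow-level argument is bound to this input by name
      args: ["{{inputs.parameters.message}}"]
```

Submitting this file without -p prints the default string; submitting it with -p message="a new message" prints the overridden value instead.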
2. Pulling Images From Your Secured Repository
One of the features I like when automating an ecosystem is rotating the access keys that control access to my services. This is useful when your company hosts its container images in private container repositories. Argo Workflows helps you achieve this with native support for Kubernetes secrets. In our example, the secret docker-registry-secret provides the credentials used to pull the image docker/whalesay:latest.
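As a rough sketch, a workflow that pulls its image with that secret could be written as below. The secret name and image come from the text; the generateName and template name are assumptions, and the secret itself is assumed to already exist in the workflow's namespace as a docker-registry type secret.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: image-pull-secrets-     # assumed name
spec:
  entrypoint: whalesay
  # Credentials the pods use when pulling container images
  imagePullSecrets:
  - name: docker-registry-secret
  templates:
  - name: whalesay                      # assumed template name
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
```

Because the workflow only references the secret by name, you can rotate the credentials stored in it without touching the workflow definition.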
3. Using Sidecar Containers
One of my favorite things to do is to use sidecars while starting my pods. Kubernetes sidecars are useful helpers that can handle recurring tasks, such as syncing your Git repositories, as shown here. Argo Workflows has this covered with neat support for sidecar containers out of the box.
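The sketch below is modeled on the sidecar example that ships with the Argo Workflows repository, so treat the image tags and the polling loop as assumptions rather than the exact listing for this post. It runs NGINX with its default configuration as a sidecar, while the main container polls it over localhost until it responds; a reverse proxy setup would additionally need its own NGINX configuration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: sidecar-nginx-
spec:
  entrypoint: sidecar-nginx-example
  templates:
  - name: sidecar-nginx-example
    container:
      image: appropriate/curl           # assumed image that provides curl
      command: [sh, -c]
      # Poll the sidecar until it answers, then print the response
      args: ["until curl -sf http://127.0.0.1/ > /tmp/out; do echo sleep && sleep 1; done && cat /tmp/out"]
    # Sidecars run in the same pod and share its network namespace
    sidecars:
    - name: nginx
      image: nginx:1.13                 # assumed tag
      command: [nginx, -g, "daemon off;"]
```

Argo stops the sidecar automatically once the main container finishes, so the workflow step can complete.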
To deploy it, save the code above as sidecar-nginx.yml and submit it with the command argo submit -n argo sidecar-nginx.yml --watch. As a result, you'll deploy an NGINX reverse proxy sidecar instance.
4. Archiving Your Current Workflow State on Persistent Storage
Workflow Archive is a nice feature that Argo Workflows provides so you can have previous workflow states stored on a relational database (Postgres or MySQL for now). However, Argo's archive won't keep detailed execution logs; you'll need to configure an artifact repository, like MinIO, to do so.
To use the archive feature, you'll first need to configure your Argo server's persistent storage option. Following this link will help you with the authentication piece required for the Argo archive; then base your configuration on this file. Both need to be configured correctly on your Argo server before you can benefit from this feature. Once that's done, you can store your workflow logs by setting archiveLocation.archiveLogs on a template, as demonstrated below.
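A minimal sketch of a workflow that turns on log archiving for one template is shown here. It assumes the artifact repository is already configured on the cluster, and the names are illustrative. In this sketch the archiveLocation block sits on the template; a workflow-wide alternative is the archiveLogs field directly under spec.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: archive-location-       # assumed name
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay                      # assumed template name
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
    # Save this step's container logs to the configured artifact repository
    archiveLocation:
      archiveLogs: true
```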
5. Passing a Git Repository as an Input Artifact
Another cool feature that Argo Workflows provides out of the box is the ability to sync your Git repository without the need for extra sidecars or init containers. The code below connects to the https://github.com/argoproj/argo-workflows.git repository. For authentication, you can choose between HTTP and SSH. In the first template, git-clone, you'll need to use the combination of the usernameSecret and passwordSecret Kubernetes secrets to access a URL in its HTTP format. You can see an example of an HTTP Git configuration in the code below.
Argo Workflows also supports SSH connectivity (e.g., git@github.com:argoproj/argo-workflows.git). In that case, the URL must be in SSH format, and you provide the sshPrivateKeySecret Kubernetes secret instead of the usernameSecret and passwordSecret ones.
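Here is a sketch of what the git-clone template could look like over HTTP. The repository URL, the template name, and the usernameSecret and passwordSecret fields come from the text; the secret name github-creds, its keys, the artifact name, the mount path, and the container image are hypothetical values for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: input-artifact-git-     # assumed name
spec:
  entrypoint: git-clone
  templates:
  - name: git-clone
    inputs:
      artifacts:
      - name: argo-source               # assumed artifact name
        path: /src                      # where the clone is placed in the container
        git:
          repo: https://github.com/argoproj/argo-workflows.git
          # Hypothetical secret holding the HTTP credentials
          usernameSecret:
            name: github-creds
            key: username
          passwordSecret:
            name: github-creds
            key: password
    container:
      image: alpine:3.7                 # assumed image
      command: [sh, -c]
      args: ["ls -la /src"]
```

For the SSH variant, the repo value would use the SSH URL format and the two secrets above would be replaced by a single sshPrivateKeySecret entry with the same name and key structure.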
6. Creating Directed Acyclic Graph Workflows
I feel the directed acyclic graph (DAG) is now getting the attention it deserves in the analytics domain, both because of how well it handles data processing workload steps in Apache Spark and because of its use as a common data orchestration pattern in Apache Airflow. With Argo Workflows, you get a Kubernetes-friendly interface instead of having to configure a Kubernetes executor for Airflow, which is less stable.
I suggest checking this link to learn more about how a DAG works. Below, you can see how Argo Workflows instantiates it.
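The following sketch is modeled on the DAG targets example from the Argo Workflows repository. The task names A through E and the target parameter match the text in this section; the dependency layout, the echo template, and the image are assumptions picked so that requesting targets B and E leaves out only D.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-target-
spec:
  entrypoint: dag-target
  arguments:
    parameters:
    - name: target
      value: E                          # default target when -p is not given
  templates:
  - name: dag-target
    dag:
      # Space-separated list of tasks to reach
      target: "{{workflow.parameters.target}}"
      tasks:
      - name: A
        template: echo
        arguments:
          parameters: [{name: message, value: A}]
      - name: B
        depends: "A"
        template: echo
        arguments:
          parameters: [{name: message, value: B}]
      - name: C
        depends: "A"
        template: echo
        arguments:
          parameters: [{name: message, value: C}]
      - name: D
        depends: "B && C"
        template: echo
        arguments:
          parameters: [{name: message, value: D}]
      - name: E
        depends: "C"
        template: echo
        arguments:
          parameters: [{name: message, value: E}]
  - name: echo
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7                 # assumed image
      command: [echo, "{{inputs.parameters.message}}"]
```

With this layout, reaching B requires A, and reaching E requires C and A, so A, B, C, and E run while D is left out.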
You tell Argo which tasks to run through the target parameter, with the target names separated by spaces. Argo Workflows will execute only the targets you specify, plus every dependency it needs to run in order to reach them. In plain English, say we save our file as dag-targets.yml and execute it using the following command:
argo submit -n argo dag-targets.yml -p target="B E" --watch
It will skip only target D, as demonstrated below:
7. Execute Python Scripts
Containers already make it easy to manage runtime environments. So, it’s easy to build a Python container with the libraries and version you need for your Python-based workflow steps.
With Argo Workflows, you can call a Python script that’s already installed on the container by name, or pass in code via a source field in the workflow description. You can specify any valid code in the source block.
Here’s an example:
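A minimal sketch of a script template with inline Python is shown below. The image tag, the names, and the script body are assumptions for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: scripts-python-         # assumed name
spec:
  entrypoint: gen-random-int
  templates:
  - name: gen-random-int                # assumed template name
    script:
      image: python:alpine3.6           # assumed image; use one with your libraries
      command: [python]
      # The source block is saved to a file and passed to the command above
      source: |
        import random
        i = random.randint(1, 100)
        print(i)
```

Whatever the script writes to standard output is captured as the step's outputs.result, so later steps can consume it.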
8. Implementing a Retry Strategy
Sometimes, one or more targets need retry logic, and Argo Workflows lets you declare that retry strategy directly in the workflow definition.
In our example, the target retry-container will be retried up to three times when it finishes with an Error status on Kubernetes.
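As a sketch, the retry-container target could be defined as follows. The template name and the limit of three retries come from the text; the retryPolicy value, the image, and the randomly failing command are assumptions added so the behavior is easy to observe.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-container-
spec:
  entrypoint: retry-container
  templates:
  - name: retry-container
    retryStrategy:
      limit: "3"                        # maximum number of retries
      retryPolicy: "Always"             # assumed: retry both failed and errored steps
    container:
      image: python:alpine3.6           # assumed image
      command: ["python", -c]
      # Exits non-zero two times out of three on average
      args: ["import random; import sys; sys.exit(random.choice([0, 1, 1]))"]
```

If you leave out retryPolicy, only failed steps are retried, so set it explicitly when you also want errored steps covered.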
9. Adding Conditional Workflows
Conditional workflows are also among my favorites and are so simple to implement. You can deploy your architecture based on the return statuses of previous steps, which is very handy when orchestrating a set of containers. Argo Workflows lets you execute targets based on a boolean condition. Under the hood, it uses govaluate to evaluate these Go-style expressions.
So you'll be able to orchestrate your conditions in the same way you handle your Golang helpers in your Kubernetes ecosystem, which is another nice benefit of using Kubernetes CRDs.
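The sketch below follows the coin flip pattern from the Argo Workflows examples. All names, images, and the script body are assumptions; the point is the when field, which decides whether a step runs based on the previous step's output.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: conditional-            # assumed name
spec:
  entrypoint: coinflip
  templates:
  - name: coinflip
    steps:
    - - name: flip-coin
        template: flip-coin
    # These two steps share a group; each runs only if its condition holds
    - - name: heads
        template: print-result
        when: "{{steps.flip-coin.outputs.result}} == heads"
        arguments:
          parameters: [{name: message, value: "it was heads"}]
      - name: tails
        template: print-result
        when: "{{steps.flip-coin.outputs.result}} == tails"
        arguments:
          parameters: [{name: message, value: "it was tails"}]
  - name: flip-coin
    script:
      image: python:alpine3.6           # assumed image
      command: [python]
      source: |
        import random
        print("heads" if random.randint(0, 1) == 0 else "tails")
  - name: print-result
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7                 # assumed image
      command: [echo, "{{inputs.parameters.message}}"]
```

On each run, one of the two conditional steps executes and the other is reported as skipped.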
Saving the code above as cond.yml and executing it with argo submit -n argo cond.yml --watch will give the following output:
10. Managing Kubernetes Resources From Your Workflow
Argo Workflows can create Kubernetes components; this is very handy when you need to run temporary kubectl actions in a declarative way. This feature follows the same principle as inline scripts, and you can use it to deploy Kubernetes components or to apply patches to your environment. The difference is that the inline code here is a Kubernetes manifest written in YAML.
In effect, the feature lets you run kubectl actions directly from a workflow, which allows you to create, update, or delete any Kubernetes resource on your cluster using inline definitions from any Kubernetes API group.
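A minimal sketch of a resource template that creates a ConfigMap is shown below. The names and the ConfigMap contents are hypothetical, and the workflow's service account is assumed to have RBAC permission to create ConfigMaps in the namespace.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: k8s-resource-           # assumed name
spec:
  entrypoint: create-configmap
  templates:
  - name: create-configmap              # assumed template name
    resource:
      # Other actions include get, apply, delete, replace, and patch
      action: create
      # Inline Kubernetes manifest, written as you would for kubectl
      manifest: |
        apiVersion: v1
        kind: ConfigMap
        metadata:
          generateName: workflow-created-
        data:
          some: value
```

Swapping the manifest for any other resource kind, including custom resources, follows the same pattern.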
The advances we’ve seen in systems management and development give us many reasons to be optimistic. For instance, infrastructure as code allows you to have the same infrastructure on your scalable servers and your local workstation. Tools like Argo Workflows help us create scalable production-ready infrastructure on our local workstation, and that by itself is something to cherish.
With constant infrastructure requirement changes such as dynamic DNS, you need to adapt your deployments to a more modular approach. These workflows are the must-haves for any DevOps admin. But this list is only the beginning. I would highly suggest implementing these scripts in your development and data pipelines.
Reach out to Pipekit if you want to have them orchestrated seamlessly without the need for in-house capacity. Give your users the peace of mind to experiment and develop new features for your application at a better cost.
Special thanks to Eric Goebelbecker and Caelan Urquhart for help reviewing this post.
Until next time!