Everything You Need to Know About Argo Workflows

Your business thrives on data and your ability to process it quickly, efficiently, and effectively. And like everyone else, you need to process larger and larger volumes of data with each passing week, if not each day. Processing data sets in small batches—or worse, by hand—doesn't work anymore. You need the ability to process large quantities of data in parallel. You need tools like Kubernetes and Argo Workflows. 

In this post, we'll look at Argo Workflows and how it can help you. If you want to see Argo Workflows in action, book your personalized demo with us.

What Is Argo Workflows?

Argo Workflows is a workflow engine for Kubernetes (K8s) clusters. It's built on a custom resource definition (CRD), so it's container native, and you can run it on EKS, GKE, or any other K8s implementation. Argo Workflows is a CNCF incubating open-source project maintained by Intuit.

With Argo, each step in your workflow runs in a container. So, depending on how you describe the steps and their dependencies, it's easy to run them sequentially or in parallel. You define your workflow and those dependencies using Argo's Workflow spec, a YAML format that's easy to follow.


With Argo Workflows, you can run and scale pipelines for nearly any purpose. For example, many companies use it for machine learning, data processing, continuous integration/continuous deployment (CI/CD), and infrastructure automation. 

Let's look at a few examples and how easy they are to create and run.

Argo Workflows Examples

Getting Set Up

For many, the best way to learn how things work is to roll up their sleeves and get their hands dirty. All of the example workflows we'll cover here work, so you can follow along. 

You'll need a K8s cluster. If you're not familiar with setting up K8s, Docker Desktop comes with a convenient Kubernetes cluster built in. If you already have another Kubernetes cluster available, feel free to use that instead.

Once you have a cluster up and running, follow the Argo Quick Start guide, and you're ready to go.

Let's run some workflows! 

Hello, World!

In accordance with prevailing custom, let's say hello to the world. 

Here's the first example workflow from the Argo Core Concepts guide. This is a simple one-step workflow: 
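For reference, here's that spec, as it appears in the Argo documentation's classic hello-world example:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-    # Workflow name will be hello-world- plus a random suffix
spec:
  entrypoint: whalesay          # The template to start with
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["Hello, world!"]
```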

Let's run this before examining it line by line. 

First, save the YAML to a file named hello.yaml. Then, use the Argo CLI to submit it to your cluster. Assuming you installed Argo into a namespace named argo, here's the command:
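A typical invocation looks like this (the argo namespace comes from the quick start install; adjust it if yours differs):

```shell
# Submit the workflow and watch its progress in the terminal
argo submit -n argo --watch hello.yaml
```

The --watch flag is what makes the terminal redraw as the workflow progresses.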

(Screen refreshes several times)

Your terminal will refresh a few times before the workflow completes. Where's the message? We need to check the logs. 

argo logs -n argo @latest retrieves the logs from the most recent workflow in the argo namespace.

The Docker whale says hello! 

What happened in this workflow? Let's break it down. 

The first few lines identify the kind of document this file contains. An Argo workflow is a special type of K8s resource, so we need a document header to identify it. 

The only user-serviceable part here is the workflow name, defined by generateName: hello-world-.

The next block, the spec, defines the workflow. 

The first field is the entrypoint. This is the first step in the workflow. In this example, it's the one and only step. 

So, logically, the definition of whalesay follows. 

Templates are the basic building block of Argo Workflows. In this case, we have one: 
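Here it is, from the spec:

```yaml
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["Hello, world!"]
```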

This template runs a container: docker/whalesay. When K8s starts the container, it executes the cowsay command and passes in the listed args. So we get our "Hello, world!" message in the Docker logs.

That's a simple one-step job. What does running more than one job look like? 

Managing Multiple Steps With a DAG

Running a single step was a great intro, but the real power in Argo Workflows comes from managing multiple steps with multiple dependencies.


This workflow uses a directed acyclic graph (DAG) to establish dependencies between steps. While the name can be a little intimidating, DAGs are straightforward tools for establishing dependencies between steps in a workflow. 
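Here's a workflow along those lines. It's a reconstruction based on the walkthrough below (four tasks named First through Fourth, an alpine-based echo template, and the dependencies described later); the entrypoint name and parameter name are assumptions:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-hello-
spec:
  entrypoint: start
  templates:
  # A reusable template that echoes whatever message it's given
  - name: echo
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]
  # The DAG template: dependencies control execution order
  - name: start
    dag:
      tasks:
      - name: First
        template: echo
        arguments:
          parameters: [{name: message, value: First}]
      - name: Second
        dependencies: [First]
        template: echo
        arguments:
          parameters: [{name: message, value: Second}]
      - name: Third
        dependencies: [First]
        template: echo
        arguments:
          parameters: [{name: message, value: Third}]
      - name: Fourth
        dependencies: [Second, Third]
        template: echo
        arguments:
          parameters: [{name: message, value: Fourth}]
```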

Let's run this workflow. 
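Assuming you saved the spec as dag-hello.yaml (a filename chosen here for illustration), submit it the same way as before:

```shell
# Submit the DAG workflow and watch it run
argo submit -n argo --watch dag-hello.yaml
```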

Here are the output and the logs on my system: 

(Screenshot: workflow output and logs)

The output from the running workflow shows the four steps, and the logs reflect that Argo executed each step. The logs show that they ran in numerical order this time. As we'll see below, this won't always be the case. 

This workflow has two templates. The first is an alpine container that executes the echo shell command with a string passed in as an argument. So, each time this template is called, it will echo the text to the standard output, which will end up in the Docker logs. 

The next template is the DAG. 

It defines four tasks. Each task uses the echo template to send its name to the Docker logs. You can see how each task names the template it runs via the template field, and how tasks declare which tasks must finish first via the dependencies field. Templates are robust tools for implementing DRY in your workflows: if you have some code that you need to use more than once, put it in a template.

The working part of the DAG is in tasks Second, Third, and Fourth. Each has a dependencies field that tells Argo which tasks need to complete before it can run. Let's look at this graph as, well, a graph. We can do this in the Argo UI.

First, tell kubectl to forward the TCP port for the UI to the host operating system.
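With the quick start install, that looks something like this (assuming the argo-server service in the argo namespace):

```shell
# Forward the Argo UI's port to the host
kubectl -n argo port-forward svc/argo-server 2746:2746
```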

Then, point your browser at port 2746 on the Kubernetes host. For me, that’s http://genosha:2746. You may have to tell your browser to ignore that the site isn’t secure, since it’s not running HTTPS.

Click the workflows icon.

(Screenshot: the workflows list in the Argo UI)

Find the dag-hello-XXXX workflow, click on it, and then click the graph button.

(Screenshot: the workflow graph view)

You’ll see a graphic representation of your workflow.

The lines represent how Argo executes the workflow. First must be completed successfully before Second and Third can run. Only after that will Fourth commence. 

Adding a Template

If you'll pardon the pun, let's take this template one step further. 

Let's add the whalesay template from the "Hello, world!" example and call it from tasks Second and Fourth.
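One way to do that is a parameterized whalesay template alongside echo, with the two tasks pointed at it. This is a sketch; the message parameter name is an assumption:

```yaml
  # New template, added to the templates list next to echo
  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

  # In the DAG, switch tasks Second and Fourth over to the new template
      - name: Second
        dependencies: [First]
        template: whalesay
        arguments:
          parameters: [{name: message, value: Second}]
      - name: Fourth
        dependencies: [Second, Third]
        template: whalesay
        arguments:
          parameters: [{name: message, value: Fourth}]
```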

The output is what we expect, although it's worth noting that this time Third finished executing before Second. Since there are no dependencies between them, there's no guarantee that Second will run first. The order of task definitions in the workflow is not important; only the dependencies count.

(Screenshot: the new workflow's output)

Argo Workflows for Your Pipelines

In this post, we covered Argo Workflows basics. You saw how to create a basic workflow with a single step. Then we covered how to use DAGs to define more complicated workflows with multiple steps that depend on being executed in the correct order. While we walked through the examples, you learned how Argo templates are defined and reused to make up workflows. 

Argo Workflows makes it easy to build complex workflows for processing large amounts of data quickly and efficiently. Put them to work on your data today! 
