
Everything You Need to Know About Argo Workflows

Your business thrives on data and your ability to process it quickly, efficiently, and effectively. And like everyone else, you need to process larger and larger volumes of data with each passing week, if not each day. Processing data sets in small batches—or worse, by hand—doesn't work anymore. You need the ability to process large quantities of data in parallel. You need tools like Kubernetes and Argo Workflows. 

In this post, we'll look at Argo Workflows and how it can help you. If you want to see Argo Workflows in action, book your personalized demo with us.

What Is Argo Workflows?

Argo Workflows is a workflow engine for Kubernetes (K8s) clusters. It's implemented as a custom resource definition (CRD), so it's container native, and you can run it on EKS, GKE, or any other K8s implementation. Argo Workflows is an open-source CNCF project maintained by Intuit and the wider Argo community.

With Argo, each step in your workflow runs in a container. So, depending on how you describe the steps and their dependencies, it's easy to run them sequentially or in parallel. You define your workflow and those dependencies using Argo's Workflow spec, a YAML format that's easy to follow.


With Argo Workflows, you can run and scale pipelines for nearly any purpose. For example, many companies use it for machine learning, data processing, continuous integration/continuous deployment (CI/CD), and infrastructure automation. 

Let's look at a few examples and how easy they are to create and run.

{% cta-1 %}

Argo Workflows Examples

Getting Set Up

For many, the best way to learn how things work is to roll up their sleeves and get their hands dirty. All of the example workflows we'll cover here work, so you can follow along. 

You'll need a K8s cluster. If you're not familiar with setting up K8s, Docker Desktop comes with a convenient Kubernetes cluster built in. If you already have another Kubernetes cluster available, feel free to use that instead.

Once you have a cluster up and running, follow the Argo Quick Start guide, which installs Argo Workflows into a namespace called {% c-line %}argo{% c-line-end %}, and you're ready to go.

Let's run some workflows! 

Hello, World!

In accordance with prevailing custom, let's say hello to the world. 

Here's the first example workflow from the Argo Core Concepts guide. This is a simple one-step workflow: 
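apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # new type of K8s resource
metadata:
  generateName: hello-world-    # prefix for the workflow's name
spec:
  entrypoint: whalesay          # invoke the whalesay template
  templates:
  - name: whalesay              # name of the template
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]

(This is the whalesay example from the Argo docs; if the version there has drifted slightly, either copy works.)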

Let's run this before examining it line by line. 

First, save the YAML to a file named {% c-line %}hello.yaml{% c-line-end %}. Then, use the Argo CLI to submit it to your cluster. Assuming you installed Argo into the {% c-line %}argo{% c-line-end %} namespace, here's the command:
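{% c-line %}argo submit -n argo --watch hello.yaml{% c-line-end %}

The {% c-line %}--watch{% c-line-end %} flag keeps the workflow's status on screen and updates it until the run finishes.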


Your terminal will refresh a few times before the workflow completes. Where's the message? We need to check the logs. 

{% c-line %}argo logs -n argo @latest{% c-line-end %} retrieves the logs from the most recent workflow in the {% c-line %}argo{% c-line-end %} namespace.

The Docker whale says hello! 

What happened in this workflow? Let's break it down. 

The first few lines identify the kind of document this file contains. An Argo workflow is a special type of K8s resource, so we need a document header to identify it. 

The only user-serviceable part here is the workflow name defined by {% c-line %}generateName: hello-world-{% c-line-end %}. 
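Here's that header:

apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # this file describes an Argo Workflow resource
metadata:
  generateName: hello-world-    # K8s appends a random suffix to this prefix

Because the workflow uses {% c-line %}generateName{% c-line-end %} instead of {% c-line %}name{% c-line-end %}, every submission gets its own unique name.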

The next block, the spec, defines the workflow. 

The first field is the {% c-line %}entrypoint{% c-line-end %}. It tells Argo which template to run first. In this example, that's the one and only step.
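In the spec, that's a single line:

spec:
  entrypoint: whalesay          # run the whalesay template first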

So, logically, the definition of {% c-line %}whalesay{% c-line-end %} follows. 

Templates are the basic building blocks of Argo Workflows. In this case, we have one:
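  templates:
  - name: whalesay              # the template the entrypoint points to
    container:
      image: docker/whalesay    # image to run
      command: [cowsay]         # command to execute in the container
      args: ["hello world"]     # arguments passed to cowsay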

This template runs a container, in this case {% c-line %}docker/whalesay{% c-line-end %}. When K8s starts the container, it executes the {% c-line %}cowsay{% c-line-end %} command and passes in the listed {% c-line %}args{% c-line-end %}, so we get our "Hello, world!" message in the Docker logs.

That's a simple one-step job. What does running more than one job look like? 

Managing Multiple Steps With a DAG

Running a single step was a great intro, but the real power in Argo Workflows comes from managing multiple steps with multiple dependencies.


This workflow uses a directed acyclic graph (DAG) to establish dependencies between steps. The name can be a little intimidating, but a DAG is just a list of tasks plus rules about which ones have to finish before others can start.
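Here's the spec. It follows the shape of the DAG example in the Argo docs; the entrypoint name and the way each task's message is passed in as a parameter are just one reasonable way to wire it up:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-hello-      # matches the dag-hello-XXXX name you'll see later
spec:
  entrypoint: start             # the DAG template defined below
  templates:
  - name: echo                  # prints whatever message it's given
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]
  - name: start
    dag:
      tasks:
      - name: First
        template: echo
        arguments:
          parameters: [{name: message, value: First}]
      - name: Second
        dependencies: [First]            # waits for First
        template: echo
        arguments:
          parameters: [{name: message, value: Second}]
      - name: Third
        dependencies: [First]            # also waits for First
        template: echo
        arguments:
          parameters: [{name: message, value: Third}]
      - name: Fourth
        dependencies: [Second, Third]    # waits for both
        template: echo
        arguments:
          parameters: [{name: message, value: Fourth}]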

Let's run this workflow. 
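Save it as {% c-line %}dag-hello.yaml{% c-line-end %} (any filename works) and submit it the same way: {% c-line %}argo submit -n argo --watch dag-hello.yaml{% c-line-end %}.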

Here are the output and the logs on my system: 

[Screenshot: the argo submit output and the workflow logs]

The output from the running workflow shows the four steps, and the logs reflect that Argo executed each step. The logs show that they ran in numerical order this time. As we'll see below, this won't always be the case. 

This workflow has two templates. The first is an {% c-line %}alpine{% c-line-end %} container that executes the {% c-line %}echo{% c-line-end %} shell command with a string passed in as an argument. So, each time this template is called, it will echo the text to the standard output, which will end up in the Docker logs. 
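Here it is on its own:

  - name: echo
    inputs:
      parameters:
      - name: message           # the text to print
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]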

The next template is the DAG. 

It defines four {% c-line %}tasks{% c-line-end %}. Each task uses the {% c-line %}echo{% c-line-end %} template to send its name to the Docker log: a task names the template it runs in its {% c-line %}template{% c-line-end %} field and lists what it must wait for in its {% c-line %}dependencies{% c-line-end %} field. This is also where templates show their value as tools for keeping your workflows DRY. If you have some code that you need to use more than once, put it in a template.
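Here's {% c-line %}Second{% c-line-end %}, for example:

      - name: Second
        dependencies: [First]   # don't start until First has succeeded
        template: echo          # the template this task runs
        arguments:
          parameters: [{name: message, value: Second}]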

The working part of the DAG is in tasks {% c-line %}Second{% c-line-end %}, {% c-line %}Third{% c-line-end %}, and {% c-line %}Fourth{% c-line-end %}. Each has a {% c-line %}dependencies{% c-line-end %} field that tells Argo which tasks need to complete before it can run. Let's look at this graph as, well, a graph. We can do this in the Argo UI.

First, tell {% c-line %}kubectl{% c-line-end %} to forward the TCP port for the UI to the host operating system.
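With the Quick Start install, that looks like this (adjust the service name if you installed Argo differently):

{% c-line %}kubectl -n argo port-forward service/argo-server 2746:2746{% c-line-end %}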

Then, point your browser at port 2746 on the Kubernetes host. For me, that’s http://genosha:2746. You may have to tell your browser to ignore that the site isn’t secure, since it’s not running HTTPS.

Click the workflows icon.


Find the {% c-line %}dag-hello-XXXX{% c-line-end %} workflow, click on it, and then click the graph button.


You’ll see a graphic representation of your workflow.

The lines represent how Argo executes the workflow. {% c-line %}First{% c-line-end %} must be completed successfully before {% c-line %}Second{% c-line-end %} and {% c-line %}Third{% c-line-end %} can run. Only after that will {% c-line %}Fourth{% c-line-end %} commence. 

{% related-articles %}

Adding a Template

If you'll pardon the pun, let's take this template one step further. 

Let's add the {% c-line %}whalesay{% c-line-end %} template from the "Hello, world!" example and call it from tasks {% c-line %}Second{% c-line-end %} and {% c-line %}Fourth{% c-line-end %}.
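One way to wire that up is to give {% c-line %}whalesay{% c-line-end %} the same {% c-line %}message{% c-line-end %} input parameter the {% c-line %}echo{% c-line-end %} template uses, then point the two tasks at it:

  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

Then, in the DAG, only the {% c-line %}template{% c-line-end %} field of the two tasks changes:

      - name: Second
        dependencies: [First]
        template: whalesay      # was: echo
        arguments:
          parameters: [{name: message, value: Second}]
      - name: Fourth
        dependencies: [Second, Third]
        template: whalesay      # was: echo
        arguments:
          parameters: [{name: message, value: Fourth}]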

The output is what we expect, although it's worth noting that this time {% c-line %}Third{% c-line-end %} finished executing before {% c-line %}Second{% c-line-end %}. Since there are no dependencies between them, there's no guarantee that {% c-line %}Second{% c-line-end %} will run first. The order in the workflow definition doesn't matter; only the dependencies count.

[Screenshot: workflow logs showing Third completing before Second]



Argo Workflows for Your Pipelines

In this post, we covered Argo Workflows basics. You saw how to create a basic workflow with a single step. Then we used a DAG to define a more complicated workflow with multiple steps that have to run in the correct order. Along the way, you learned how Argo templates are defined and reused to build up workflows.

Argo Workflows makes it easy to build complex workflows for processing large amounts of data quickly and efficiently. Put it to work on your data today!

Are your data pipelines scalable and reliable?

Operating data pipelines at scale doesn't have to be unreliable and costly. Put an end to the stress of unreliable data pipelines and data engineering backlogs and turn data into revenue-boosting insights. Pipekit can help.

Pipekit is a self-serve data platform that configures Argo Workflows on your infrastructure to offer simplicity and efficiency when it comes to data workflows. Achieve higher scalability for your data pipelines while significantly reducing your cloud spend. Our platform is designed to align your data infrastructure seamlessly with your full-stack infrastructure, all on Kubernetes.

Try out Pipekit for free today - pipekit.io/signup
