Prefect vs. Argo Workflows
December 6, 2022
6 min read
Prefect or Argo Workflows? This side-by-side comparison will help you determine which workflow platform is right for your data pipelines.
Data science has moved out of research and into operations. Today, companies are building data pipelines and running them alongside their continuous integration/continuous delivery (CI/CD) systems. Most of these data pipelines start with a workflow orchestrator. Two of the more popular choices are Prefect and Argo Workflows. Which one of these workflow orchestrators is the best fit for you?
In this post, we'll look at Prefect vs. Argo Workflows. We'll compare their features and how easy they are to get started with, and help you decide which is best for your data pipelines.
What is a Workflow Orchestrator?
You use an orchestrator to start, stop, and organize tasks. It has tools for defining a set of related steps, establishing relationships between them, and scheduling their execution.
You can use an orchestrator for many different applications, and Prefect and Argo Workflows are two examples that aren't limited to data pipelines. For example, Argo Workflows is flexible enough for organizations to use it for CI/CD. Prefect's API makes it possible to orchestrate any Python code.
Let's look at Prefect vs. Argo Workflows. Which workflow orchestrator is best for your data pipelines? How do these popular tools approach the problem of coordinating tasks?
Defining Tasks Dependencies with DAGs
Running tasks in a specific order is one thing. Running them based on dependencies — how they relate to each other — is more complicated, but it's what you need to organize a data pipeline properly. Argo Workflows and Prefect model these dependencies as Directed Acyclic Graphs (DAGs). DAGs are:
- Directed because tasks flow in one, and only one, direction
- Acyclic because they contain no cycles; a task can never loop back to an earlier one
- Graphs because they represent the relationships between tasks
A DAG illustrates tasks and execution flow with vertices and lines. Picture four numbered vertices, with arrows between them representing the workflow: the workflow must execute task #1 first. Then it can execute tasks #2 and #3 in parallel. When both of those tasks are complete, the system can run task #4.
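That ordering is exactly a topological traversal of the graph. As a toy sketch (not how either orchestrator is implemented), Python's standard-library graphlib can compute which tasks are ready to run at each step:

```python
from graphlib import TopologicalSorter

# Dependencies from the four-task DAG above: tasks 2 and 3 depend on
# task 1, and task 4 depends on both 2 and 3.
dag = TopologicalSorter({2: {1}, 3: {1}, 4: {2, 3}})
dag.prepare()

while dag.is_active():
    batch = dag.get_ready()   # tasks whose dependencies have all finished
    print(sorted(batch))      # [1], then [2, 3], then [4]
    dag.done(*batch)          # mark the batch complete
```

Each batch returned by `get_ready` is a set of tasks the orchestrator could schedule in parallel.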
Prefect and Argo Workflows both support DAGs, but in slightly different ways. We'll look at this further below.
Prefect vs. Argo Workflows
Argo Workflows Overview
You can run Argo Workflows on any Kubernetes (K8s) system, including Docker Desktop, GCP, and AWS. It's remarkably easy to install a development system: download the manifest onto a Kubernetes cluster, and it's ready to run workflows without any further modifications. A production system requires more configuration, of course.
Argo Workflow steps are containers: each step runs as its own container. You can either build your own self-contained images or specify an existing image and the arguments to pass to it. Argo Workflows integrates cleanly with Kubernetes, defining workflows as a custom resource. Because each step is a container, your workflows aren't limited by the number of workers in your K8s cluster, and any work you've already packaged as a container plugs straight into a workflow. You'll see how below in the example workflows section.
You define your workflows via two methods. Argo's native interface is a human-readable YAML DSL for defining workflows. You can access all of Argo's features via YAML, including a powerful templating feature for defining repetitive tasks. Or, you can use the Couler Python API to integrate Argo Workflows into your codebase.
Argo Workflows has a UI for starting and stopping workflows, checking status, and viewing logs. It installs as a separate pod in your K8s cluster.
Prefect Overview
Prefect takes a different approach: instead of integrating with a container runtime, it's a Python-based workflow executor. Prefect Core Engine is an open-source Python library, and Prefect Orchestration comes as either an open-source server or a commercial cloud tool with a free tier. To run Prefect, you install the Python library and then either stand up a local orchestration server or sign up for the cloud product. The server works with agents to execute tasks, and there are agents for Docker, Kubernetes, Amazon ECS, and local processes.
Prefect's tasks are functions, so anything you can model as a Python function, Prefect can add to a workflow.
Prefect Core Engine only has command-line management tools. Prefect Cloud and Prefect Server have a UI for managing workflows.
Basic Workflow Definitions
Let's compare workflows and see how definitions differ between Prefect vs. Argo Workflows.
First, here's Argo Workflows' "Hello, World" example from their introductory documentation.
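It looks like this (a close reproduction of the example in Argo's docs):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-   # Argo appends a unique suffix to the name
spec:
  entrypoint: whalesay         # the template to run first
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
```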
The first four lines are metadata that identify the file as a workflow and give it a name.
The spec tag starts the workflow specification. It defines a single template that serves as the workflow's only task. It uses Docker's whalesay image to start a container that says "hello world" to the Kubernetes log by running the cowsay command with a string argument.
This YAML file demonstrates how easy it is to load a Docker container and pass arguments to it as a workflow step. You can load any image available to your K8s cluster by name, and you can pass in the name of any command and optional arguments.
Next, we can run the same example using Couler instead of Argo's YAML DSL.
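In Couler, the same workflow looks roughly like this (a sketch based on Couler's documented API; the `argo` namespace is an assumption about your cluster's setup):

```python
import couler.argo as couler
from couler.argo_submitter import ArgoSubmitter

# run_container adds a single container step to the workflow,
# mirroring the image/command/args trio from the YAML example.
couler.run_container(
    image="docker/whalesay",
    command=["cowsay"],
    args=["hello world"],
)

# Submit the generated workflow to the Argo controller in the
# (assumed) "argo" namespace.
submitter = ArgoSubmitter(namespace="argo")
couler.run(submitter=submitter)
```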
Couler's run_container makes it easy to specify the image, the command, and the arguments, much as we did in the YAML example.
Prefect also uses a "Hello, World" example in its introductory documentation.
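At the time of writing, that example (using Prefect 1.x's functional API; the flow name here is illustrative) looks roughly like this:

```python
from prefect import task, Flow

@task
def say_hello():
    print("Hello, World!")

# The Flow context manager collects tasks and their dependencies.
with Flow("hello-flow") as flow:
    say_hello()

# Runs the flow locally; with Prefect Server or Cloud, you would
# register the flow instead and let an agent pick it up.
flow.run()
```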
This example uses Prefect's functional API. It defines a simple function and passes it to a Flow, which acts as a container for tasks. The Flow is where the orchestration work happens, and it's what you coordinate with your infrastructure.
Now, let's look at a DAG using Argo's YAML and Prefect.
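Here's a sketch of such a workflow in Argo's YAML DSL. The template, task, and parameter names (random, sum, num, and so on) are illustrative, not taken from Argo's docs:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-sum-
spec:
  entrypoint: main
  templates:
  - name: random
    container:
      image: python:alpine3.6
      command: [python, -c]
      # Write a random number to a file that Argo captures as an output.
      args: ["import random; open('/tmp/num.txt', 'w').write(str(random.randint(0, 100)))"]
    outputs:
      parameters:
      - name: num
        valueFrom:
          path: /tmp/num.txt
  - name: sum
    inputs:
      parameters:
      - name: a
      - name: b
    container:
      image: python:alpine3.6
      command: [python, -c]
      args: ["print({{inputs.parameters.a}} + {{inputs.parameters.b}})"]
  - name: main
    dag:
      tasks:
      - name: rand-1
        template: random
      - name: rand-2
        template: random
      - name: total
        # Runs only after both random tasks complete.
        dependencies: [rand-1, rand-2]
        template: sum
        arguments:
          parameters:
          - name: a
            value: "{{tasks.rand-1.outputs.parameters.num}}"
          - name: b
            value: "{{tasks.rand-2.outputs.parameters.num}}"
```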
This workflow generates two numbers in parallel and then sums them.
This file defines two templates. Both use the python:alpine3.6 image to run a short Python script. The first generates a random number; the second sums two arguments.
The workflow itself is defined under the dag field. It calls the random template twice. Then the third task has a dependencies field that lists the first two tasks by name; it captures the output of those tasks and sums them.
Declaring this workflow as a dag and filling in the dependencies field does all of the work. Argo runs the tasks in the correct order and schedules the random tasks in parallel.
The Couler code would look similar since the individual steps need to be Python scripts in containers.
Here's a similar workflow in Prefect.
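A sketch using Prefect 1.x's functional API; the function and flow names follow the description above, and LocalDaskExecutor is one of several executors capable of running tasks in parallel:

```python
import random

from prefect import task, Flow
from prefect.executors import LocalDaskExecutor

@task
def random_num():
    return random.randint(0, 100)

@task
def sum_numbers(a, b):
    return a + b

# Calling tasks inside the Flow context builds the DAG: the two
# random_num calls are independent, and sum_numbers depends on both.
with Flow("dag-sum") as flow:
    a = random_num()
    b = random_num()
    total = sum_numbers(a, b)

# The executor lets the independent tasks run in parallel.
flow.run(executor=LocalDaskExecutor())
```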
It's native Python, so the random_num and sum_numbers functions run in place.
But we have to run the flow with an executor that can manage parallel tasks, which means additional system setup. So while the code is more concise, it takes more configuration to run correctly.
Deployment and Ease of Use
Argo Workflows is container-native and relies heavily on K8s for managing workflows. You install it by selecting the manifest that suits your situation, editing it for your specific needs, and applying the manifest to your K8s cluster.
For advanced scheduling and parallel tasks in Prefect, you can use a Dask cluster, a Prefect Server, or both, and both require additional configuration and installation. For Prefect Server, you can use docker-compose if you're running it on a single node. Prefect Server needs a PostgreSQL database for persistence.
For running Prefect jobs on a Kubernetes cluster, you need the Kubernetes Agent and have to add additional flow configuration to take advantage of K8s features in the workflow.
Each step in an Argo workflow is a container, so K8s manages each workflow for you, including scheduling parallel steps. Prefect is more complicated, since you need to register a separate flow definition in your code and configure an executor for parallelism.
Prefect vs. Argo Workflows: You Decide
In this post, we compared Argo Workflows and Prefect side-by-side. We started with the basics of workflow orchestration and what it takes to coordinate data pipelines. Then we compared Argo Workflows' container-native characteristics to Prefect's Python API and various orchestration tools. Parallel execution works in Argo out of the box, while Prefect requires additional configuration even with a Kubernetes cluster.
While Prefect and Argo Workflows share many essential features, they don't take the same approach to running your pipelines. Prefect is Python-native, but it requires more infrastructure and configuration work. Argo Workflows lets you hit the ground running — especially if you're looking for a cloud-native option.
Which one is better for your data pipelines? Try one and see!