Prefect vs. Argo Workflows

Data Science has moved out of research and into operations. Today, companies are building data pipelines and running them alongside their continuous integration/continuous delivery (CI/CD) systems. Most of these data pipelines start with a workflow orchestrator. Two of the more popular choices are Prefect and Argo Workflows. Which one of these workflow orchestrators is the best fit for you?

In this post, we'll look at Prefect vs. Argo Workflows. We'll compare their features and how easy they are to get started with, and we'll help you decide which is best for your data pipelines.

What is a Workflow Orchestrator?

You use an orchestrator to start, stop, and organize tasks. It has tools for defining a set of related steps, establishing relationships between them, and scheduling their execution.

You can use an orchestrator for many different applications, and Prefect and Argo Workflows are two examples that aren't limited to data pipelines. For example, Argo Workflows is flexible enough for organizations to use it for CI/CD. Prefect's API makes it possible to orchestrate any Python code.

Let's look at Prefect vs. Argo Workflows. Which workflow orchestrator is best for your data pipelines? How do these popular tools approach the problem of coordinating tasks?

Defining Task Dependencies with DAGs

Running tasks in a specific order is one thing. Running them based on dependencies — how they relate to each other — is more complicated, but it's what you need to organize a data pipeline properly. Argo Workflows and Prefect model these dependencies as Directed Acyclic Graphs (DAGs). DAGs are:

  • {% c-line %}Directed{% c-line-end %} because tasks flow in one, and only one, direction
  • {% c-line %}Acyclic{% c-line-end %} because they don't cycle; there are no loops
  • {% c-line %}Graphs{% c-line-end %} because they represent the relationships between tasks

A DAG illustrates tasks and execution flow with vertices and edges. Picture a simple example with four numbered vertices connected by arrows.

In that example, the workflow must execute task #1 first. Then it can execute tasks #2 and #3 in parallel. When both of those tasks are complete, the system can run task #4.

Prefect and Argo Workflows both support DAGs, but in slightly different ways. We'll look at this further below.

{% cta-1 %}

Prefect vs. Argo Workflows

Argo Workflows Overview

You can run Argo Workflows on any Kubernetes (K8s) system, including Docker Desktop, GCP, and AWS. It's remarkably easy to install a development system: download the manifest onto a Kubernetes cluster, and it's ready to run workflows without any further modifications. A production system requires more configuration, of course.
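For a development install, that amounts to a couple of kubectl commands. The release URL and version below are illustrative, so check the Argo Workflows releases page for the current manifest:

```shell
# Create a namespace for Argo Workflows and apply the quick-start manifest
kubectl create namespace argo
kubectl apply -n argo -f \
  https://github.com/argoproj/argo-workflows/releases/download/v3.5.5/quick-start-minimal.yaml
```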

Argo Workflows steps are containers, which is to say each step exists as its own container. You can either create your own self-contained containers or specify an existing container image and the arguments to pass to it. Argo Workflows integrates cleanly with Kubernetes and runs as a custom resource definition. Because each step is a container, your workflows aren't limited by a fixed pool of worker processes; Kubernetes schedules each step as its own pod.

Also, any work that you've already defined as a container plugs into the workflow. You'll see how below in the example workflows section.

You can define workflows in two ways. Argo's native interface is a human-readable YAML DSL, and you can access all of Argo's features through it, including a powerful templating feature for defining repetitive tasks. Alternatively, you can use the Hera Python API to integrate Argo Workflows into your codebase.

Argo Workflows has a UI for starting and stopping workflows, checking status, and viewing logs. It installs as a separate pod in your K8s cluster.

Prefect Overview

Prefect takes a different approach: rather than integrating with a container runtime, it's a Python-based workflow executor. Prefect Core Engine is an open-source Python library, and Prefect Orchestration is either an open-source server or a commercial cloud tool with a free tier. To run Prefect, you install its Python library and then either run a local orchestration server or sign up for the cloud product.

The server works with agents to execute tasks, and there are agents for Docker, Kubernetes, Amazon ECS, and local processes.

Prefect's tasks are functions. So anything you can model as a Python function, Prefect can add to a workflow.

Prefect Core Engine only has command-line management tools. Prefect Cloud and Prefect Server have a UI for managing workflows.

Basic Workflow Definitions

Let's compare workflows and see how definitions differ between Prefect vs. Argo Workflows.

First, here's Argo Workflows' {% c-line %}Hello, World{% c-line-end %} example from the project's introductory documentation.
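A minimal version of that manifest looks roughly like this; the generateName and template name follow the standard whalesay example from the Argo docs:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-   # Argo appends a random suffix to name each run
spec:
  entrypoint: whalesay         # the template to run first
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
```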

The first four lines are metadata that identify the file as a workflow and give it a name.

The {% c-line %}spec{% c-line-end %} field starts the workflow specification. It defines a single {% c-line %}template{% c-line-end %} that serves as the workflow's only task. It uses Docker's whalesay image to start a container that prints "hello world" to the pod's log by running the {% c-line %}cowsay{% c-line-end %} command with a string argument.

This YAML file demonstrates how easy it is to load a Docker container and pass arguments to it as a workflow step. You can load any image available to your K8s cluster by name, and you can pass in the name of any command and optional arguments.

Next, we can run the same example using Hera instead of Argo's YAML DSL.
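A sketch of the same workflow using Hera's v5 API might look like the following; {% c-line %}to_yaml(){% c-line-end %} renders the manifest locally, while {% c-line %}create(){% c-line-end %} would submit it to a configured cluster:

```python
from hera.workflows import Container, Workflow

# The same hello-world workflow, defined in Python with Hera
with Workflow(generate_name="hello-world-", entrypoint="whalesay") as w:
    Container(
        name="whalesay",
        image="docker/whalesay",
        command=["cowsay"],
        args=["hello world"],
    )

# Render the equivalent Argo YAML; w.create() would submit it to a cluster instead
print(w.to_yaml())
```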

Hera makes it easy to specify the image, the command, and the arguments in Python instead of YAML. Hera also has full parity with Argo Workflows features as of version 5.

Prefect also uses a "Hello, World" example in its introductory documentation.
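With Prefect's first-generation functional API, the example looks roughly like this (the flow name here is arbitrary):

```python
from prefect import task, Flow

@task
def say_hello():
    print("Hello, World!")

# The Flow block collects tasks and their dependencies
with Flow("hello-flow") as flow:
    say_hello()

flow.run()  # execute the flow locally
```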

This is an example of Prefect's functional API. It defines a simple function and passes it to a Flow, which acts as a container for tasks. The Flow is where the orchestration work happens, and it's what you coordinate with your infrastructure.

DAG Workflows

Now, let's look at a DAG using Argo's YAML and Prefect.
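Here's a sketch of the Argo version; the template and task names are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-sum-
spec:
  entrypoint: main
  templates:
  - name: random                      # prints a random number to stdout
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        import random
        print(random.randint(1, 100))
  - name: sum                         # prints the sum of its two parameters
    inputs:
      parameters:
      - name: a
      - name: b
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        print({{inputs.parameters.a}} + {{inputs.parameters.b}})
  - name: main
    dag:
      tasks:
      - name: rand-a
        template: random
      - name: rand-b
        template: random
      - name: sum                     # waits for both random tasks
        dependencies: [rand-a, rand-b]
        template: sum
        arguments:
          parameters:
          - name: a
            value: "{{tasks.rand-a.outputs.result}}"
          - name: b
            value: "{{tasks.rand-b.outputs.result}}"
```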

This workflow generates two numbers in parallel and then sums them.

This file defines two templates. Both use the {% c-line %}python:alpine3.6{% c-line-end %} image to run a short Python script. The first generates a random number; the second sums two arguments.

The workflow follows the {% c-line %}dag{% c-line-end %} field. It calls the {% c-line %}random{% c-line-end %} template twice. Then the third task has a {% c-line %}dependencies{% c-line-end %} field that specifies the first two tasks by name. It captures the output of those tasks and sums them.

Declaring this workflow as a {% c-line %}dag{% c-line-end %} and specifying the {% c-line %}dependencies{% c-line-end %} fields does all of the work. Argo will run these tasks in the correct order and schedule the random tasks in parallel.

The Hera code would look similar since the individual steps need to be Python scripts in containers.

Here's a similar workflow in Prefect.
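A sketch with Prefect's first-generation API follows; the function and flow names are illustrative:

```python
import random

from prefect import task, Flow
from prefect.executors import LocalDaskExecutor

@task
def random_num():
    return random.randint(1, 100)

@task
def sum_numbers(a, b):
    print(a + b)

# Prefect infers the DAG from how task results are passed between tasks
with Flow("dag-sum") as flow:
    a = random_num()
    b = random_num()
    total = sum_numbers(a, b)

# A Dask-backed executor lets the two random_num tasks run in parallel
flow.run(executor=LocalDaskExecutor())
```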

It's in native Python, so the {% c-line %}random_num{% c-line-end %} and {% c-line %}sum_numbers{% c-line-end %} functions run in place.

But we have to define the code as running in a flow capable of parallel execution, and it has to be run with an Executor that can manage parallel tasks. So while the code is more concise, it requires additional system setup to run correctly.

Deployment and Ease of Use

Argo Workflows is container-native and relies heavily on K8s to manage workflows. You install it by selecting the manifest that suits your situation, editing it for your specific needs, and applying it to your K8s cluster.

For advanced scheduling and parallel tasks in Prefect, you can use a Dask cluster, a Prefect Server, or both, and each requires additional installation and configuration. For Prefect Server, you can use docker-compose if you're running it on a single node. Prefect Server needs a PostgreSQL database for persistence.

For running Prefect jobs on a Kubernetes cluster, you need the Kubernetes Agent and have to add additional flow configuration to take advantage of K8s features in the workflow.
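With Prefect's first-generation CLI, that setup looks roughly like this when pointed at a self-hosted Prefect Server; flags and versions may differ in your environment:

```shell
# Point the CLI at a self-hosted server backend and start it
# (docker-compose with a PostgreSQL container under the hood)
prefect backend server
prefect server start

# Generate a manifest for the Kubernetes Agent and deploy it into the cluster
prefect agent kubernetes install --rbac | kubectl apply -f -
```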

Each step in an Argo workflow is a container, so K8s manages each workflow for you, including scheduling parallel steps. Prefect is more complicated because you need to register a separate flow definition in your code.

{% related-articles %}

Prefect vs. Argo Workflows: You Decide

In this post, we compared Argo Workflows and Prefect side-by-side. We started with the basics of workflow orchestration and what it takes to coordinate data pipelines. Then we compared Argo Workflows' container-native characteristics to Prefect's Python API and various orchestration tools.

Parallel execution works in Argo out of the box, while Prefect requires additional configuration even with a Kubernetes cluster.

While Prefect and Argo Workflows share many essential features, they don't take the same approach to running your pipelines. Prefect is Python-native, but it requires more infrastructure and configuration work. Argo Workflows lets you hit the ground running — especially if you're looking for a cloud-native option.

Which one is better for your data pipelines? Try one and see!

Are your data pipelines scalable and reliable?

Operating data pipelines at scale doesn't have to be unreliable and costly. Put an end to the stress of unreliable data pipelines and data engineering backlogs and turn data into revenue-boosting insights. Pipekit can help.

Pipekit is a self-serve data platform that configures Argo Workflows on your infrastructure to offer simplicity and efficiency when it comes to data workflows. Achieve higher scalability for your data pipelines while significantly reducing your cloud spend. Our platform is designed to align your data infrastructure seamlessly with your full-stack infrastructure, all on Kubernetes.

Try out Pipekit for free today - pipekit.io/signup

