Flyte vs. Argo Workflows
March 24, 2023
6 min read
Flyte vs. Argo Workflows. Which one is best? Here's a head-to-head comparison of Flyte's data-first and Argo's workflow-first approaches.
Data has evolved from mission-critical information you need to protect into pipelines you need to automate and manage at scale. Building and maintaining the systems to accomplish that costs money, effort, and personnel. But you don't have to build expensive custom infrastructure for MLOps. You can harness the power of the cloud with Kubernetes (K8s) and Argo Workflows or Flyte. Both of these workflow orchestrators can help you build data pipelines that scale. But which one is better for you?
This article will look at Flyte vs. Argo Workflows. Both solutions harness the power of K8s to build scalable workflows for data processing and machine learning. While they share many powerful traits, they also take a different approach to solving the same problem.
What Is a Workflow Automation Platform?
Orchestrating a pipeline involves a lot more than simply starting and stopping processes. Your pipeline has steps, and each step has dependencies. Some stages simply need data and configuration information. Others rely on earlier steps in the workflow, and your orchestrator can't run them until the others have completed their work. You can run some steps in parallel, while you need to run others one at a time.
Your automation platform manages these rules and dependencies for you. Argo Workflows and Flyte orchestrate your workflows with K8s and know how to take advantage of the container platform for large-scale automation. They can run parallel tasks when possible and serial tasks when necessary.
Let's look at Flyte vs. Argo Workflows and see which one is the best fit for your data processing requirements.
Flyte vs. Argo Workflows
Both workflow automation platforms are container-native and open-source under the Apache 2.0 license. But that's where most of the similarities end.
Lyft released Flyte to the public in 2019. Since then, it's become a Linux Foundation graduate project. It boasts a large feature set, and its integrations and deployments have expanded around the world. Lyft and Spotify both run it in production.
You write your Flyte workflows using a combination of Python code and a DSL. The DSL has task, node, and workflow annotations for decorating Python code to build your data pipelines.
Flyte provides Helm charts for building K8s clusters and supplies documentation for deployment to AWS EKS and GCP GKE. Instead of using K8s for local development, Flyte will run with k3s in a limited sandbox environment.
Similar to Flyte, you can run Argo Workflows on any Kubernetes (K8s) system. Argo integrates cleanly with Kubernetes as a custom resource definition so that you can install it on any cluster, or you can customize your own deployment with Helm charts, too. Argo is a Cloud Native Computing Foundation (CNCF) hosted project.
Argo's workflow steps are Kubernetes pods. You define your workflows by specifying an image with an optional command and arguments. So, you can build custom images or use defaults like Alpine to run your code. Since the steps are containers, you can use Argo to run any kind of pipeline, including continuous integration/continuous deployment.
You define your workflows using Argo's native YAML or the Couler Python API. The YAML markup has access to all of Argo's workflow features, including representing dependencies between steps, limiting container resources, and mounting container volumes.
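As a sketch of those features, a two-step DAG with a dependency, a parameterized template, and a resource limit might look like this (the template and parameter names here are illustrative, not from Argo's docs):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-example-
spec:
  entrypoint: main
  templates:
  - name: main
    dag:
      tasks:
      - name: extract
        template: echo
        arguments:
          parameters:
          - name: message
            value: extracting
      - name: transform
        dependencies: [extract]    # runs only after extract completes
        template: echo
        arguments:
          parameters:
          - name: message
            value: transforming
  - name: echo                     # a reusable, parameterized template
    inputs:
      parameters:
      - name: message
    container:
      image: alpine:3.18
      command: [echo, "{{inputs.parameters.message}}"]
      resources:
        limits:
          memory: 64Mi             # cap the container's memory
```

The `dependencies` field is all it takes to express ordering; Argo runs independent tasks in parallel automatically.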
Let's take a look at a basic workflow for both platforms.
Opening tutorials with "Hello, World!" is a venerated tradition, so let's start there.
First, here's Argo Workflows' Hello, World example in YAML. It's part of their introductory documentation. It prints "hello world" to the container logs using the cowsay command.
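A version of that example:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-   # Argo appends a unique suffix per run
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
```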
Argo's workflows start with metadata defining the API version, the type of document, and a name for the workflow. Argo uses the name as a prefix to generate a unique ID for each workflow run.
The `spec` section starts the workflow definition. This workflow has a single step named whalesay. It loads Docker's whalesay image and runs the cowsay command with "hello world." This step illustrates how easy it is to run a container in Argo. With three lines of code, it pulls the image and runs it with custom arguments.
Each step in a workflow is a template. Templates are reusable and act similar to functions. You define them once, and they can accept parameters and return values to other steps in your workflow.
Next, we can run the same example using Couler instead of Argo's YAML DSL.
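A sketch of that example using Couler's Python API (this assumes Couler is installed and a running Argo cluster is reachable, so it is not runnable standalone):

```python
import couler.argo as couler
from couler.argo_submitter import ArgoSubmitter

# Define a single-step workflow that runs the whalesay image.
couler.run_container(
    image="docker/whalesay",
    command=["cowsay"],
    args=["hello world"],
)

# Submit the workflow to the Argo cluster for execution.
submitter = ArgoSubmitter()
couler.run(submitter=submitter)
```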
Couler's run_container pulls and runs a container using syntax similar to Argo's YAML. Then the code submits the couler object to the cluster for execution.
Now, let's look at Flyte's example from their user guide.
The @task annotation indicates that say_hello is a workflow task, while @workflow defines the workflow. By calling the task inside the workflow, you've defined a complete workflow.
Flyte is type aware and uses Python type hints to build tasks and workflows with the correct input and output types.
Advantages and Trade-Offs
Argo Workflows' focus is on the workflow itself. Each line of code, whether written in Argo's native YAML or Couler's API, is about the steps in the workflow, the data passed between them, and the dependencies between each step. The markup implements an expression syntax for passing data between steps, branching based on the output from steps, and capturing errors.
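For example, Argo's documentation includes a coin-flip workflow that branches on a step's output using a `when` expression; a condensed version:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: coinflip-
spec:
  entrypoint: coinflip
  templates:
  - name: coinflip
    steps:
    - - name: flip-coin
        template: flip-coin
    - - name: heads
        template: heads
        # Branch on the previous step's output.
        when: "{{steps.flip-coin.outputs.result}} == heads"
  - name: flip-coin
    script:
      image: python:alpine3.18
      command: [python]
      source: |
        import random
        print(random.choice(["heads", "tails"]))
  - name: heads
    container:
      image: alpine:3.18
      command: [echo, "it was heads"]
```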
Argo's YAML is a different language from Python, R, or whichever language your team uses for processing data. But, its one-and-only focus is on orchestrating workflows, and it does that well. Also, since Argo runs containers, you can write your data processing code in any language you wish and run it on any operating system supported by Docker. You can easily mix different languages, operating systems, and versions in the same pipeline, and introduce updates to your pipeline one step at a time.
Flyte is data first. You write your code in Python and tie it together with Flyte's Python-like DSL. So if you want to stay in Python, Flyte feels more comfortable. You can turn any callable function into a task and plug it into any workflow. But you still have to model your tasks and workflows in Flyte's DSL and create more advanced artifacts like launch plans if you want more control over how Flyte runs your tasks. So you're still working in a DSL, and it's commingled with your data processing code instead of separated by the Docker container.
Flyte dispatches jobs to K8s on your behalf in containers that you don't manage. In return for staying within Python (and the DSL), you relinquish some control over how your pipelines are run compared to Argo.
Flyte vs. Argo Workflows: Which One?
In this post, we compared Flyte and Argo Workflows. Both workflow orchestrators are open-source and container-native. They run on Kubernetes, taking full advantage of K8s' ability to scale for large processing tasks and using containerization to make efficient use of hardware resources. But they take very different approaches to building and managing data pipelines. Argo gives you tools to manage your workflows and run them inside containers. Flyte integrates workflows into your code and creates containers for you, outside of your control.
Which system is better for you? Now that you understand the fundamental differences, you can decide. Get started building your data pipelines today!