
How to Install Argo Workflows on AWS, GCP, and Azure

Argo Workflows, a container-native workflow orchestration tool, can orchestrate parallel jobs on Kubernetes on any cloud platform. A workflow is a sequence of steps, often a complex one, in which each step performs a specific task and the steps together accomplish a larger action. An orchestration tool like Argo Workflows simplifies this process by automating and managing many workflows at once.

This tool can prove useful in a number of use cases, such as machine learning, data processing, and running CI/CD pipelines natively on Kubernetes.

In this article, you’ll learn more about Argo Workflows and what it can do. You’ll learn how to set it up in Kubernetes in a cloud provider like AWS, Azure, or GCP, as well as how to create and submit workflows in Argo.

What Is Argo Workflows?

Argo is a Cloud Native Computing Foundation (CNCF)-hosted project that enables you to programmatically author, schedule, and monitor workflows. Argo implements workflows as `CustomResourceDefinitions` (CRDs). Treating workflows as code makes them perfect for GitOps pipelines.

With Argo Workflows, each step is a container. Containerizing steps frees you up to develop language-agnostic workflows, meaning you’re not limited to specific programming languages. You can model complex workflows as directed acyclic graphs (DAGs), capturing the dependencies between steps and sharing artifacts between them.

Argo Workflows allows you to create and run cloud-scale, compute-intensive workflows such as those used in machine learning or big data processing. They’re made up of polyglot, composable tasks that deal with huge amounts of data. With Argo Workflows, these workflows can automatically scale vertically and horizontally.

Setting Up Argo Workflows

To submit workflows in Argo, you’ll first need a Kubernetes cluster. Then install Argo Workflows and the associated tools that handle workflow interactions.

Setting Up Kubernetes and Kubectl

One option is to deploy, run, and manage a Kubernetes cluster yourself. You can also opt for a managed environment: Amazon Web Services (AWS) offers a managed Kubernetes solution called EKS, Google Cloud Platform (GCP) offers GKE, and Azure offers AKS.

To create a managed Kubernetes cluster, you’ll need an account with the cloud provider of your choice. You can manually deploy a cluster through the provider’s portal, use a provider-specific solution like AWS CloudFormation, or use a cloud-agnostic Infrastructure as Code (IaC) tool like Terraform or Pulumi. Follow the documentation of your chosen provider for details.

All communication with Kubernetes goes through its API server, and there are several ways to access it. To work with the cluster from the command line, you’ll need to install kubectl, a command line tool for running commands against Kubernetes clusters. To configure kubectl’s access to the cluster, each cloud provider offers a way to export the kubeconfig.

Check your cloud provider’s documentation for how to set up kubectl with EKS, GKE, or AKS.

Once kubectl is installed and configured, you can test it by accessing your cluster to get information, as shown in the code snippet below:
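
```bash
# List the nodes in the cluster; the names and versions will differ in your environment
kubectl get nodes
```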

Output:

The sample cluster above has three nodes. The control plane holds all the administrative Kubernetes resources, while the worker nodes run the workloads.


Installing Argo Workflows CLI

With a Kubernetes cluster up and running, you can deploy Argo Workflows and submit your first workflow. To submit, watch, and list workflows, you’ll need to install its CLI. You can download the latest Argo CLI version from the releases page.
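For example, on a Linux amd64 machine the installation typically looks like the following (the version shown is only an example; substitute the release you downloaded):

```bash
# Download the CLI from the releases page (example version; adjust as needed)
curl -sLO https://github.com/argoproj/argo-workflows/releases/download/v3.4.4/argo-linux-amd64.gz

# Unzip the binary, make it executable, and move it onto your PATH
gunzip argo-linux-amd64.gz
chmod +x argo-linux-amd64
sudo mv argo-linux-amd64 /usr/local/bin/argo

# Verify the installation
argo version
```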

Installing Argo Workflows

Argo Workflows has several components. Your production setup should factor in elements like scaling, disaster recovery, high availability, and security. The fundamental components you need are the Workflow Controller, which runs your workflows; the Argo Server, which exposes the API and UI; and an Artifact Repository for storing artifacts passed between steps.

You’ll need to deploy these components in a Kubernetes namespace, which provides logical isolation of resources within a cluster. Use the following command to create the Argo namespace:
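
```bash
kubectl create namespace argo
```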

Aside from the infrastructure components, you’ll also need to set up the CRDs, Service Accounts, `ClusterRoles`, and `RoleBindings`.

Argo Workflows needs these resources configured to function properly. Each Argo Workflows release ships manifests that provide the necessary configuration. Although Argo Workflows is cloud agnostic, your cluster configuration may require extra permissions on some cloud platforms. For example, on GKE, you will likely need to grant your account permission to create new `ClusterRoles`.

The components will be installed as deployments and exposed as services. There are several options for the Artifact Repository. One of them is MinIO, which you can install as a deployment. You can create a file named `minio.yaml` and paste the code below into that file:
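The manifest below is a minimal sketch meant for testing only: it runs a single MinIO instance with example credentials, whereas the official quick-start manifests ship a more complete MinIO setup.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
  namespace: argo
  labels:
    app: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: minio/minio
        command: [minio, server, /data]
        env:
        # Example credentials for testing only; never hard-code secrets in production
        - name: MINIO_ROOT_USER
          value: admin
        - name: MINIO_ROOT_PASSWORD
          value: password
        ports:
        - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: minio
  namespace: argo
spec:
  selector:
    app: minio
  ports:
  - port: 9000
    targetPort: 9000
```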

To create these resources, run:
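
```bash
kubectl apply -n argo -f minio.yaml
```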

In the same way, you can deploy the Workflow Controller:
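(Assuming you’ve saved the Workflow Controller manifest from the release as `workflow-controller.yaml`; the file name is only illustrative.)

```bash
kubectl apply -n argo -f workflow-controller.yaml
```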

Finally, deploy Argo Server:
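(Again assuming the Argo Server manifest from the release is saved locally, here as `argo-server.yaml`.)

```bash
kubectl apply -n argo -f argo-server.yaml
```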

Alternatively, you can use one of the several configurations available in the Argo Workflows GitHub repository. The Argo Workflows team keeps these configurations up to date, so you don’t have to create your own deployments from scratch.

These configurations will deploy all required resources simultaneously. However, they are not suitable for production because they contain hard-coded passwords. The snippet below shows kubectl deploying a minimal, quick-start configuration:
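For example (the exact URL and version depend on the release you choose; check the repository for the current path):

```bash
# Replace v3.4.4 with the release version you want to install
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo-workflows/v3.4.4/manifests/quick-start-minimal.yaml
```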

Once it’s deployed, check that all components are up and running:
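
```bash
kubectl get pods -n argo
```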

Output:

To quickly access the UI, you can port-forward to the service:
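
```bash
kubectl -n argo port-forward svc/argo-server 2746:2746
```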

Then access it at https://localhost:2746.

There are other ways to access the UI, depending on your setup.

Submitting a Workflow

Now you’re ready to deploy a workflow. The Argo CLI provides complete workflow management. Using the CLI, you can submit, list, get information, print logs, and delete workflows. The following configuration represents a simple workflow, illustrating its main components:
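A minimal example, along the lines of the hello-world sample in the Argo Workflows documentation, looks like this (saved here as `hello-world.yaml`, a name chosen only for illustration):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # the Workflow CRD
metadata:
  generateName: hello-world-    # workflow names are generated from this prefix
spec:
  entrypoint: whalesay          # the template the workflow starts with
  templates:
  - name: whalesay              # each step runs as a container
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
```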

A workflow can have several parts: `kind` specifies the Workflow CRD, `entrypoint` specifies the template the workflow starts with, and `name` names a template.

With the workflow defined, you will submit it from the CLI:
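
```bash
argo submit -n argo --watch hello-world.yaml
```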

Output:

And list the workflow:
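
```bash
argo list -n argo
```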

Output:

Or get detailed information about it:
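
```bash
# @latest refers to the most recently submitted workflow
argo get -n argo @latest
```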

Checking logs is simple:
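
```bash
argo logs -n argo @latest
```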

Output:

You can also navigate through the UI and obtain the same information.

[Screenshot: Argo main interface]
[Screenshot: Argo workflow information]
[Screenshot: Argo workflow logs]


The following example demonstrates how to pass an artifact from one step to the next. The workflow comprises two steps: the first sends the output of a command to a file, which the second step consumes and prints. This example is taken from the Argo Workflows documentation:
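The manifest below reproduces that example approximately; check the documentation that matches your Argo Workflows version for the current form. Save it as, say, `artifact-passing.yaml`:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    # Step 1: generate an artifact from a command's output
    - - name: generate-artifact
        template: whalesay
    # Step 2: consume the artifact produced by step 1 and print it
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      - name: hello-art
        path: /tmp/hello_world.txt
  - name: print-message
    inputs:
      artifacts:
      - name: message
        path: /tmp/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]
```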

You can now submit the workflow:
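
```bash
argo submit -n argo --watch artifact-passing.yaml
```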

Output:

As before, you can check the workflow status and logs using the CLI.
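For example:

```bash
argo get -n argo @latest
```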

Output:
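
```bash
argo logs -n argo @latest
```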

Output:

You can also use the UI to check your workflow:

[Screenshot: Passing artifacts in an Argo workflow]

You can now start exploring your Argo Workflows installation!

If you want to learn more about how to access and properly secure the Argo Workflows UI, check out this article about installing Argo Workflows in production environments.

Conclusion

Workflow orchestration can be complex. It involves different components and artifacts, and it can encompass different paths for the steps involved. Argo Workflows aims to make modeling, scheduling, and tracking complex workflows simpler by leveraging Kubernetes and being cloud agnostic. This makes it an attractive solution for running compute-intensive workflows.

However, deploying, configuring, and maintaining Argo Workflows in a production environment and scaling it across several clusters for increased workloads can be daunting. It can take a significant amount of time to set up the infrastructure, the workflows, and the configuration.

Pipekit can solve that problem. It’s a control plane for Argo Workflows that enables you to develop and run large, complex workflows. With Pipekit, you’ll be able to trigger workflows, collect logs, and manage secrets. It allows you to maintain pipelines across multiple environments and multiple clusters.

To learn more about Pipekit, sign up for the waitlist.

