How to Install Argo Workflows on AWS, GCP, and Azure
April 4, 2022
12 min read
Learn how to install and run Argo Workflows on each of the top managed K8s providers: AWS, Azure, and GCP. We'll cover the details of how to get up and running with Argo Workflows for each cloud provider.
Argo Workflows, a container-native workflow orchestration tool, can orchestrate parallel jobs on Kubernetes in any cloud platform. A workflow is an often-complex sequence of steps, each performing a specific task, that together accomplish a larger goal. An orchestration tool like Argo Workflows can automate and manage multiple workflows, simplifying this process.
This tool can prove useful in a number of use cases, such as machine learning or data processing tasks or running CI/CD pipelines natively on Kubernetes.
In this article, you’ll learn more about Argo Workflows and what it can do. You’ll learn how to set it up in Kubernetes in a cloud provider like AWS, Azure, or GCP, as well as how to create and submit workflows in Argo.
What Is Argo Workflows?
Argo is a Cloud Native Computing Foundation (CNCF)-hosted project that enables you to programmatically author, schedule, and monitor workflows. Argo implements workflows as `CustomResourceDefinitions` (CRDs). Treating workflows as code makes them perfect for GitOps pipelines.
With Argo Workflows, each step is a container. Containerizing steps frees you up to develop language-agnostic workflows, meaning you’re not limited to specific programming languages. You can model complex workflows as directed acyclic graphs (DAGs), so that you can capture the dependencies between steps and share artifacts between them.
Argo Workflows allows you to create and run cloud-scale, compute-intensive workflows such as those used in machine learning or big data processing. They’re made up of polyglot, composable tasks that deal with huge amounts of data. With Argo Workflows, these workflows can automatically scale vertically and horizontally.
Setting Up Argo Workflows
To submit workflows in Argo, you’ll first need a Kubernetes cluster. Then install Argo Workflows and the associated tools that handle workflow interactions.
Setting Up Kubernetes and Kubectl
One option to deploy, run, and manage a Kubernetes cluster is to create one yourself. You can also opt for a managed environment. Amazon Web Services (AWS) offers a managed Kubernetes solution called EKS, Google Cloud Platform (GCP) offers GKE, and Azure offers AKS.
To create a managed Kubernetes cluster, you’ll need an account with the cloud provider of your choice. You can manually deploy a cluster through the provider portal, or you can use solutions offered by the cloud provider like CloudFormation. You can also consider cloud-agnostic Infrastructure as Code (IaC) solutions like Terraform or Pulumi. Follow the documentation of your chosen provider for details.
All communication with Kubernetes is handled through its API server. There are several ways to access the API server. To facilitate operations, you’ll need to install kubectl, a command line tool for running commands against Kubernetes clusters. To configure kubectl’s access to the cluster, each cloud provider provides a way to export the kubeconfig.
Check the cloud providers’ documentation to see how to set up kubectl:
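Each provider's CLI can generate the kubeconfig entry for you. The commands below are the standard ones for EKS, GKE, and AKS; the cluster, region, zone, and resource group names are placeholders you'd replace with your own, and each assumes the corresponding CLI is installed and authenticated:

```shell
# EKS: add the cluster's credentials to your kubeconfig
aws eks update-kubeconfig --name my-cluster --region us-east-1

# GKE: fetch credentials for the cluster
gcloud container clusters get-credentials my-cluster --zone us-central1-a

# AKS: merge the cluster's credentials into ~/.kube/config
az aks get-credentials --resource-group my-group --name my-cluster
```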
Once kubectl is installed and configured, you can test it by accessing your cluster to get information, as shown in the code snippet below:
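A simple smoke test is listing the cluster's nodes. The output below is illustrative; your node names and versions will differ:

```shell
kubectl get nodes

# NAME       STATUS   ROLES    AGE   VERSION
# node-001   Ready    <none>   10m   v1.22.6
# node-002   Ready    <none>   10m   v1.22.6
# node-003   Ready    <none>   10m   v1.22.6
```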
The sample cluster above has three worker nodes, which run your workloads. In a managed cluster, the provider hosts the control plane, which holds all the administrative Kubernetes resources.
Installing Argo Workflows CLI
With a Kubernetes cluster up and running, you can deploy Argo Workflows and submit your first workflow. To submit, watch, and list workflows, you’ll need to install its CLI. You can download the latest Argo CLI version from the releases page.
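For example, on Linux you can download and install the CLI as shown below. The version is pinned here for reproducibility; check the releases page for the latest one, and adjust the platform suffix (e.g. `darwin-amd64`) for your system:

```shell
# Download the Argo CLI binary (Linux amd64 shown; pick your platform)
ARGO_VERSION=v3.3.1
curl -sLO "https://github.com/argoproj/argo-workflows/releases/download/${ARGO_VERSION}/argo-linux-amd64.gz"

# Unpack, make executable, and move it onto your PATH
gunzip argo-linux-amd64.gz
chmod +x argo-linux-amd64
sudo mv argo-linux-amd64 /usr/local/bin/argo

# Verify the installation
argo version
```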
Installing Argo Workflows
Argo Workflows has several components. Your production setup should factor in elements like scaling, disaster recovery, high availability, and security. The fundamental components you need are:
- Argo server: exposes a UI for workflows and the API required to work with Argo
- Workflow controller: manages workflows
- Artifact repository: passes artifacts between jobs in a workflow
You’ll need to deploy these components in a Kubernetes namespace, or a logical isolation of resources in a cluster. Use the following command to create the Argo namespace:
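```shell
kubectl create namespace argo
```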
Argo Workflows needs these components configured correctly in order to function. Each Argo Workflows release has associated manifests that provide the necessary configurations. Although Argo Workflows is cloud agnostic, you may need to enable extra permissions on some cloud platforms, depending on your cluster configuration. For example, on GKE, you will likely need to grant your account permission to create new `ClusterRoles`.
The components will be installed as deployments and exposed as services. There are several options for the Artifact Repository. One of them is MinIO, which you can install as a deployment. You can create a file named `minio.yaml` and paste the code below into that file:
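A minimal MinIO manifest might look like the sketch below. The hard-coded credentials are placeholders suitable only for development; use Kubernetes secrets in any real setup:

```yaml
# Minimal MinIO deployment and service for development use.
# MINIO_ROOT_USER / MINIO_ROOT_PASSWORD are placeholder credentials.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio
  template:
    metadata:
      labels:
        app: minio
    spec:
      containers:
      - name: minio
        image: minio/minio
        args: ["server", "/data"]
        env:
        - name: MINIO_ROOT_USER
          value: admin
        - name: MINIO_ROOT_PASSWORD
          value: password
        ports:
        - containerPort: 9000
---
apiVersion: v1
kind: Service
metadata:
  name: minio
spec:
  selector:
    app: minio
  ports:
  - port: 9000
    targetPort: 9000
```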
To create the resource you’ll run:
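```shell
kubectl apply -n argo -f minio.yaml
```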
In the same way, you can deploy the Workflow Controller:
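Assuming you've saved a Workflow Controller manifest (for example, adapted from the manifests in the Argo Workflows repository) as `workflow-controller.yaml`:

```shell
kubectl apply -n argo -f workflow-controller.yaml
```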
Finally, deploy Argo Server:
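Again assuming a local manifest file, here named `argo-server.yaml`:

```shell
kubectl apply -n argo -f argo-server.yaml
```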
Alternatively, you can leverage one of the several configurations available in the Argo Workflows GitHub repository. The Argo Workflows team keeps these configurations up to date, so you don’t have to create your own deployments from scratch.
These configurations will deploy all required resources simultaneously. However, they are not suitable for production because they contain hard-coded passwords. The snippet below shows kubectl deploying a minimal, quick-start configuration:
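The command below applies the quick-start manifest straight from the repository, pinned to a release tag (`v3.3.1` shown here; check the releases page for the latest version):

```shell
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo-workflows/v3.3.1/manifests/quick-start-minimal.yaml
```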
Once it’s deployed, check that all components are up and running:
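```shell
kubectl get pods -n argo

# You should see the Argo server, the workflow controller, and the
# artifact repository (MinIO in this setup) in the Running state.
```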
To quickly access the UI, you can port-forward to the service:
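```shell
kubectl -n argo port-forward svc/argo-server 2746:2746
```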
Then access it at https://localhost:2746.
There are other ways to access the UI, depending on your setup.
Submitting a Workflow
Now you’re ready to deploy a workflow. The Argo CLI provides complete workflow management. Using the CLI, you can submit, list, get information, print logs, and delete workflows. The following configuration represents a simple workflow, illustrating its main components:
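The manifest below is the canonical hello-world example from the Argo Workflows documentation, with a single template that runs one container:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow                  # the Workflow CRD
metadata:
  generateName: hello-world-    # each submission gets a unique generated name
spec:
  entrypoint: whalesay          # the template to invoke first
  templates:
  - name: whalesay              # the template's name
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
```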
A workflow manifest has several key fields: _kind_ specifies the Workflow CRD; _entrypoint_ specifies the template the workflow invokes first; and _name_ names a template.
With the workflow defined, you will submit it from the CLI:
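Assuming the manifest above is saved as `hello-world.yaml`:

```shell
argo submit -n argo --watch hello-world.yaml
```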
And list the workflow:
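```shell
argo list -n argo
```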
Or get detailed information about it:
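The `@latest` alias refers to the most recently submitted workflow:

```shell
argo get -n argo @latest
```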
Checking logs is simple:
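```shell
argo logs -n argo @latest
```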
You can also navigate through the UI and obtain the same information.
The following example demonstrates how to pass an artifact from one step to the next. The workflow comprises two steps. The first will send the output of a command to a file that the second step will consume and print. This example was taken from the argo-workflows documentation:
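This is the artifact-passing example from the Argo Workflows documentation: the `whalesay` template writes a file and exposes it as an output artifact, and the `print-message` template takes it as an input artifact and prints it:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: whalesay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          # bind the first step's output artifact to the second step's input
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      # expose the generated file as an output artifact
      - name: hello-art
        path: /tmp/hello_world.txt
  - name: print-message
    inputs:
      artifacts:
      # the consumed artifact is mounted at this path
      - name: message
        path: /tmp/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]
```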
You can now submit the workflow:
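Assuming the manifest is saved as `artifact-passing.yaml`:

```shell
argo submit -n argo --watch artifact-passing.yaml
```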
As before, you can check the workflow status and logs using the CLI.
You can also use the UI to check your workflow:
You can now start exploring your Argo Workflows installation!
If you want to learn more about how to access and properly secure the Argo Workflows UI, check out this article about installing Argo Workflows in production environments.
Workflow orchestration can be complex. It involves different components and artifacts, and it can encompass different paths for the steps involved. Argo Workflows aims to make modeling, scheduling, and tracking complex workflows simpler by leveraging Kubernetes and being cloud agnostic. This makes it an attractive solution for running compute-intensive workflows.
However, deploying, configuring, and maintaining Argo Workflows in a production environment and scaling it across several clusters for increased workloads can be daunting. It can take a significant amount of time to set up the infrastructure, the workflows, and the configuration.
Pipekit can solve that problem. It’s a control plane for Argo Workflows that enables you to develop and run large, complex workflows. With Pipekit, you’ll be able to trigger workflows, collect logs, and manage secrets. It allows you to maintain pipelines across multiple environments and multiple clusters. Book your demo with Pipekit here.
To learn more about Pipekit, sign up for the waitlist.