How to Clean Up Pods and Save Logs with Argo Workflows
April 28, 2022
Using a proper garbage collection strategy for your Argo Workflows ensures that pods get deleted efficiently to maximize your cluster's performance.
Argo Workflows is an open source, container-native workflow engine hosted by the CNCF. It makes it easy to automate and manage complex workflows on Kubernetes and suits a range of use cases, including machine learning, ETL and data analytics, and data streaming pipelines. Because it is implemented as a Kubernetes custom resource definition (CRD), it is also easy to deploy and manage.
Argo Workflows defines multi-step workflows as a sequence of tasks, where each step in the workflow runs as a container. This article explains how to clean up the pods a workflow creates once it has run, and how to save any logs generated during workflow execution to an S3 bucket.
Pod Garbage Collection
Garbage collection is necessary to clean up Kubernetes cluster resources. By default, the pods a workflow creates are not deleted when the workflow completes. If you run many complex workflows on the same cluster, these leftover pods pile up and put unnecessary load on the Kubernetes API server. Garbage collection strategies let Argo Workflows delete these pods for you.
Argo Workflows supports four pod garbage collection strategies:
- OnPodCompletion: Deletes each pod as soon as it completes, whether it succeeded or failed.
- OnPodSuccess: Deletes each pod as soon as it completes successfully. Pods that fail are kept.
- OnWorkflowCompletion: Deletes all the pods in the workflow once the whole workflow has finished, even if the workflow failed.
- OnWorkflowSuccess: Deletes all the pods in the workflow once the whole workflow has run successfully. If the workflow fails, its pods are kept.
The first two strategies act at the pod level and the last two at the workflow level. A pod-level strategy deletes each pod as soon as it finishes (depending on the strategy and the pod's execution status) and doesn’t wait for the results of the remaining pods in the workflow. A workflow-level strategy deletes no pods until the entire workflow is done, and then acts according to the strategy and the workflow's final status.
Which strategy to use depends on the use case. While you are still developing a workflow, for example, OnPodSuccess keeps the pods of failed steps around so you can debug them. Once you’re confident the workflow runs correctly and want to deploy it to production, you can switch to OnPodCompletion to free up resources faster. For more complex workflows, a workflow-level strategy such as OnWorkflowSuccess keeps every pod of a failed run, so you can inspect each step in detail.
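To make that reasoning concrete, here is a small shell sketch that writes the podGC stanza for a Workflow spec, choosing OnPodSuccess during development and OnPodCompletion in production. The ARGO_ENV variable, its dev and prod values, and the podgc-snippet.yaml file name are hypothetical conveniences for this example; only the spec.podGC.strategy field and the strategy names come from Argo Workflows.

```shell
#!/usr/bin/env bash
# Sketch: pick a pod garbage collection strategy per environment and write the
# matching podGC stanza. ARGO_ENV and podgc-snippet.yaml are hypothetical names.
set -euo pipefail

ARGO_ENV="${ARGO_ENV:-dev}"

case "$ARGO_ENV" in
  dev)
    # Keep failed pods around so a broken step can be inspected.
    STRATEGY="OnPodSuccess"
    ;;
  prod)
    # Delete every finished pod, failed or not, to free resources quickly.
    STRATEGY="OnPodCompletion"
    ;;
  *)
    echo "unknown ARGO_ENV: $ARGO_ENV (expected dev or prod)" >&2
    exit 1
    ;;
esac

# podGC sits directly under the Workflow's spec.
cat > podgc-snippet.yaml <<EOF
spec:
  podGC:
    strategy: ${STRATEGY}
EOF

cat podgc-snippet.yaml
```

If you save the script as, say, choose-podgc.sh, running ARGO_ENV=prod ./choose-podgc.sh writes strategy: OnPodCompletion instead.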
Benefits of Storing Logs in S3
For complex workflows, connecting to the cluster and checking the logs of each pod is not a practical way to debug issues during workflow execution. Some developers on the team may not even have direct access to a production Kubernetes cluster for security reasons. Storing logs from different pods in a centralized location makes it simpler for them to debug any issues. In addition, once a garbage collection strategy deletes a pod, its logs are gone too, so it makes sense to save them elsewhere, preferably in cloud object storage such as Amazon S3.
Keeping logs for a set retention period also helps you analyze how your workflow behaved when it ran into issues and work out what changed between executions.
The same storage location can also hold the output of each step so that the next step can use it as input. These files are called input and output artifacts, and they are critical components of a workflow because its steps are interconnected and depend on one another.
How to Clean Up Pods and Save Logs to an S3 Bucket
You’ll run all the commands in this section from an AWS CloudShell terminal. Open CloudShell and run the following commands to install kubectl and verify that the cluster nodes are up and running:
If the cluster is ready (as in our case), run the following commands to install the Argo CLI:
Now run the following commands to install Argo Workflows:
Go to the AWS S3 console and create a bucket. This bucket will store the workflow logs and artifacts.
Next, attach the following inline policy to the IAM role associated with the cluster nodes so that the nodes can access the S3 bucket (replace the bucket name with your own):
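A minimal sketch of such a policy, generated with a shell heredoc so the bucket name is filled in for you. It assumes a hypothetical bucket called my-argo-logs-bucket (override it with ARGO_BUCKET) and grants what an S3 artifact repository generally needs: listing the bucket and reading and writing its objects. Treat the action list as an assumption and check it against your own security requirements.

```shell
#!/usr/bin/env bash
# Sketch: write an inline IAM policy that lets the cluster nodes use the bucket.
# ARGO_BUCKET defaults to a hypothetical name; set it to your real bucket.
set -euo pipefail

ARGO_BUCKET="${ARGO_BUCKET:-my-argo-logs-bucket}"

cat > s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListArtifactBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${ARGO_BUCKET}"
    },
    {
      "Sid": "ReadWriteArtifacts",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::${ARGO_BUCKET}/*"
    }
  ]
}
EOF

echo "wrote s3-policy.json for bucket ${ARGO_BUCKET}"
```

Bucket-level actions (ListBucket, GetBucketLocation) apply to the bucket ARN, while object actions apply to the /* object ARN, which is why the policy has two statements. If you prefer the AWS CLI to the IAM console, aws iam put-role-policy --role-name <node-role-name> --policy-name argo-artifacts-s3 --policy-document file://s3-policy.json attaches the file as an inline policy; the role and policy names are placeholders.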
Edit the Argo Workflows config map with the kubectl edit cm -n argo workflow-controller-configmap command and add the following data so that Argo Workflows uses the S3 bucket as its artifact repository. Again, replace the bucket name with your own:
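One way the data section could look, again written out by a shell sketch. The bucket name and the us-east-1 region are assumptions to replace. archiveLogs: true is what tells Argo Workflows to save each step's main container logs to the repository, and useSDKCreds: true makes it take credentials from the AWS SDK default chain, which includes the node IAM role extended by the inline policy above, so no access keys need to be stored in the cluster.

```shell
#!/usr/bin/env bash
# Sketch: build the data section that makes the bucket the default artifact
# repository. The ARGO_BUCKET and ARGO_REGION defaults are assumptions.
set -euo pipefail

ARGO_BUCKET="${ARGO_BUCKET:-my-argo-logs-bucket}"
ARGO_REGION="${ARGO_REGION:-us-east-1}"

cat > artifact-repo-patch.yaml <<EOF
data:
  artifactRepository: |
    # Also save each step's main container logs as an artifact.
    archiveLogs: true
    s3:
      bucket: ${ARGO_BUCKET}
      endpoint: s3.amazonaws.com
      region: ${ARGO_REGION}
      # No access keys: use the AWS SDK default credential chain,
      # which resolves to the node IAM role.
      useSDKCreds: true
EOF

cat artifact-repo-patch.yaml
```

You can paste the data block into the editor opened by kubectl edit or, with a recent kubectl, apply the file directly with kubectl -n argo patch configmap workflow-controller-configmap --type merge --patch-file artifact-repo-patch.yaml.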
Create a workflow file named workflow.yaml with the following definition:
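A sketch of a workflow.yaml that matches the two-step description in this section, modeled on the artifact-passing pattern from the Argo Workflows examples. The step and template names and the alpine:3.15 image are illustrative assumptions; the artifact name hello-argo and the file paths follow the text.

```shell
#!/usr/bin/env bash
# Sketch: write a two-step workflow that passes a file between steps as an artifact.
# Step/template names and the alpine image tag are illustrative choices.
set -euo pipefail

cat > workflow.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-argo-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: generate
        template: write-message
    - - name: consume
        template: print-message
        arguments:
          artifacts:
          # Feed the first step's output artifact into the second step.
          - name: message
            from: "{{steps.generate.outputs.artifacts.hello-argo}}"

  - name: write-message
    container:
      image: alpine:3.15
      command: [sh, -c]
      args: ["echo 'hello Argo' > /tmp/hello_argo.txt"]
    outputs:
      artifacts:
      # Uploaded to the S3 artifact repository as hello-argo.tgz.
      - name: hello-argo
        path: /tmp/hello_argo.txt

  - name: print-message
    inputs:
      artifacts:
      - name: message
        path: /tmp/message
    container:
      image: alpine:3.15
      command: [sh, -c]
      args: ["cat /tmp/message"]
EOF

echo "wrote workflow.yaml"
```

Only the artifact name hello-argo carries over into S3, where it becomes the hello-argo.tgz file name; the step and template names are arbitrary.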
Run the workflow with the argo -n argo submit workflow.yaml --watch command.
The workflow definition has two steps. The first step writes a message to /tmp/hello_argo.txt and saves that file as an output artifact. The second step fetches the artifact, stores it at /tmp/message, and prints its contents; this step has no output artifact.
If you now check the S3 bucket, you’ll see a folder with the same name as your workflow. Your artifact is stored inside it as a file named hello-argo.tgz, after the name given to the output artifact in the workflow definition.
If you download and extract this gzipped tar archive, you get a file named hello_argo.txt containing the message “hello Argo”. Next, run the argo -n argo logs <workflow_name> command to see the workflow output, replacing the <workflow_name> placeholder with the name of your workflow:
Now run the kubectl get pods -n argo command. You’ll see that the pods created by the workflow are still there, with a “Completed” status.
To clean up the pods, modify the workflow definition to add a pod garbage collection strategy with the podGC parameter:
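The podGC key sits directly under spec. As a minimal, self-contained sketch, here is a one-step workflow with it set; OnPodCompletion is an assumed choice, and the same two podGC lines can be added under spec in workflow.yaml.

```shell
#!/usr/bin/env bash
# Sketch: a one-step workflow with pod garbage collection enabled.
# OnPodCompletion is an assumed choice; any of the four strategies fits here.
set -euo pipefail

cat > workflow-gc.yaml <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-argo-gc-
spec:
  entrypoint: main
  # Delete each pod as soon as it finishes, whether it succeeded or failed.
  podGC:
    strategy: OnPodCompletion
  templates:
  - name: main
    container:
      image: alpine:3.15
      command: [sh, -c]
      args: ["echo 'hello Argo'"]
EOF

echo "wrote workflow-gc.yaml"
```

After argo -n argo submit workflow-gc.yaml --watch finishes, kubectl get pods -n argo should stop listing that workflow's pod shortly after it completes.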
Rerun the workflow and check the pods again. This time, no new pods with the “Completed” status remain, because the garbage collection strategy deleted them.
Although the example workflow in this tutorial was very simple, it shows that setting up S3 as an artifact repository for your Argo workflows is straightforward. An artifact repository not only stores logs for later analysis and debugging but also lets you pass artifacts between the steps of a workflow.
You should also now know why garbage collection strategies are important and how to set them up for Argo Workflows. Using a proper garbage collection strategy ensures that pods get deleted efficiently so that the cluster doesn’t get cluttered.
Pipekit is the control plane for Argo Workflows, orchestrating complex workflows and letting you quickly set up your data pipelines and scale them seamlessly. To discover how easy it is to set up a data pipeline with Pipekit, have a look at the Pipekit documentation.