Intro to Backtesting in R: A Comprehensive Tutorial

Backtesting is a time machine for your trading strategies. You use it to test your ideas with historical market data, so you can evaluate how they would have performed. Imagine being able to go back in time and test your strategies in different market conditions before you risk any real money. That's the power of backtesting.

R is one of the best tools you have for backtesting. It's a programming language designed from the ground up for manipulating numbers and analyzing data.

This post will show you, step by step, how to use R for backtesting.

What is Backtesting?

Backtesting is testing a trading strategy with historical data in order to evaluate its performance and determine if it would have been profitable in the past. This process allows traders to simulate real-world scenarios and assess the effectiveness of their strategies before putting money at risk.

It's a crucial step in developing a successful trading strategy, as it helps traders identify potential weaknesses and make informed decisions about the viability of their ideas. With backtesting, traders can fine-tune their strategies and make data-driven decisions, improving their chances for success.

Why Use R for Backtesting?

R is a flexible platform for testing your strategies and seeing how they would have performed in the past. With its vast libraries and packages, R makes it easy to load and manipulate data, calculate performance metrics, and visualize results. R gives you the tools you need to take your backtesting to the next level and achieve your financial goals. 

Why Not Use R for Backtesting?

While R is a highly versatile and powerful tool, it may not be the best option for everyone. One potential drawback is that R has a steep learning curve for those who are not familiar with programming or statistical analysis.

Some traders may find that other software or platforms are better suited to their specific needs or preferences, particularly if they have specific requirements for data analysis or visualization. 

R also lends itself to code that isn't easy to reuse. You have to work around this if you want to create a set of backtesting tools.


Prerequisites

The code in this tutorial is simple, but you'll need a basic understanding of programming and R to follow it. You'll also need to understand basic trading strategies, because this tutorial stays focused on R and backtesting.

One of R's biggest strengths is the Comprehensive R Archive Network (CRAN). It hosts a wealth of useful packages to help you build your trading strategies and backtesting system. To follow this tutorial, you'll need two packages: quantmod and PerformanceAnalytics.

I'll be using RStudio to write and execute the code, so I'll share screenshots from there. But you can run the code on the command line or use a different IDE, such as Visual Studio Code, PyCharm, or IntelliJ.

Install the quantmod and PerformanceAnalytics packages in your R environment, and load them at the top of your source file:
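The original listing isn't reproduced in this text, so here is a minimal sketch of that setup. The install line is commented out because it only needs to run once per R environment.

```r
# One-time install from CRAN (uncomment if the packages are missing).
# install.packages(c("quantmod", "PerformanceAnalytics"))

library(quantmod)              # also attaches xts, zoo, and TTR
library(PerformanceAnalytics)  # performance charts and metrics
```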

Now, let's write some code. 


Intro to Backtesting With R

You can break the backtesting process down into four essential steps: 

  1. Get historical pricing data.
  2. Use the data to create technical indicators.
  3. Apply the indicators to the prices.
  4. Backtest the strategy to see how it performs.

In order to make this code reusable, we'll create a function for each step. 

Download Pricing Data

Quantmod makes downloading your pricing data easy with the getSymbols() function. It supports several sources, including Yahoo Finance, FRED, and OANDA. We'll use the default: Yahoo.

After downloading the prices, we'll graph them. 

This function accepts a ticker and optional starting and ending dates for the data. If you omit the start and stop dates, it retrieves prices for 2010 - 2013.
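A minimal sketch of such a function follows. The name get_prices and its argument names are assumptions made for illustration, not the original listing. The default dates mirror the 2010 - 2013 range described above.

```r
library(quantmod)

# Hypothetical helper: download daily prices for one ticker and chart them.
get_prices <- function(ticker, from = "2010-01-01", to = "2013-12-31") {
  # auto.assign = FALSE makes getSymbols() return the data instead of
  # creating a variable named after the ticker.
  prices <- getSymbols(ticker, src = "yahoo", from = from, to = to,
                       auto.assign = FALSE)
  # Draw high/low/close bars; barChart() adds a volume panel by default.
  barChart(prices, bar.type = "hlc", name = ticker)
  prices
}

# Usage (needs network access to Yahoo Finance):
# prices <- get_prices("CSCO")
# head(prices)
```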

getSymbols() returns an xts object, a time-indexed matrix, with column names built from the symbol.

So, if you call the function with "CSCO" for the ticker, the columns are CSCO.Open, CSCO.High, CSCO.Low, CSCO.Close, CSCO.Volume, and CSCO.Adjusted. Here are the first few entries for CSCO:

Data frame with the relevant pricing fields with entries for CSCO

Since we're using functions and not relying on R's environment to pass values to library functions, we use {% c-line %}auto.assign = FALSE{% c-line-end %}. This tells getSymbols() to return the data instead of saving it in a variable named after the ticker, so we can assign it to the {% c-line %}prices{% c-line-end %} variable ourselves. Then, we use {% c-line %}barChart{% c-line-end %} to build a graph.

Image of a barChart

This graph illustrates how well R and quantmod work with financial data. By specifying {% c-line %}bar.type = "hlc"{% c-line-end %}, we told barChart to find the High, Low, and Close columns in the data and use them to build the graph. It also uses the Volume column to build a secondary chart by default.

The function returns the prices object so we can use it to calculate the technical indicators.

Create Indicators

Now, we'll calculate a simple moving average (SMA) for our technical indicator. This is a very simple indicator that probably won't yield a great result, but our focus is on backtesting, not trading strategy. 

Quantmod gives us the SMA() function for calculating the average. It comes from the TTR package, which quantmod loads for you. TTR includes an impressive set of functions that you can mix and match to form your own strategy.

SMA() needs a series of prices and the number of periods to average over.

getSymbols() returned an xts object, but it unhelpfully embedded the ticker name in the column names. If we were calling SMA() in a one-off script for CSCO, we would refer to the closing prices by name, as {% c-line %}prices$CSCO.Close{% c-line-end %}.

But we want a function that works with any ticker. Quantmod has OHLC transformation tools for this. By wrapping prices in {% c-line %}Cl(){% c-line-end %}, we hand SMA() the closing-price column, whatever it is called.

So we don't have to know the ticker name in advance.
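To see the difference without downloading anything, here is a small sketch on made-up data. The DEMO ticker and its prices are invented for illustration; only the column-naming pattern matches what getSymbols() produces.

```r
library(quantmod)

# Synthetic prices with ticker-prefixed column names.
dates <- as.Date("2020-01-01") + 0:4
demo <- xts(cbind(DEMO.Open  = c(10, 11, 12, 13, 14),
                  DEMO.High  = c(11, 12, 13, 14, 15),
                  DEMO.Low   = c(9, 10, 11, 12, 13),
                  DEMO.Close = c(10.5, 11.5, 12.5, 13.5, 14.5)),
            order.by = dates)

# One-off script: the ticker is hard-coded in the column name.
by_name <- demo$DEMO.Close

# Reusable: Cl() finds the close column whatever the ticker is.
by_helper <- Cl(demo)

colnames(by_helper)  # "DEMO.Close"
```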

After calculating the averages, the next line replaces any {% c-line %}NA{% c-line-end %} entries with zero, so the graphing and signal code doesn't throw any errors.
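The calculation and the NA replacement can be sketched as follows. The function name get_indicators and the default of 20 periods are assumptions for illustration. The demo data is invented so the example runs offline.

```r
library(quantmod)  # attaches TTR, which provides SMA()

# Hypothetical helper: n-period simple moving average of the closing price.
get_indicators <- function(prices, n = 20) {
  sma <- SMA(Cl(prices), n = n)
  # The first n - 1 rows have no full window, so SMA() returns NA there.
  sma[is.na(sma)] <- 0
  sma
}

# Made-up closing prices for a quick check.
demo <- xts(cbind(DEMO.Close = c(1, 2, 3, 4, 5)),
            order.by = as.Date("2020-01-01") + 0:4)
demo_sma <- get_indicators(demo, n = 3)
as.numeric(demo_sma)  # 0 0 2 3 4
```

Keep in mind that a zero average during the warm-up rows sits below every real price, so any signal built on those rows deserves a second look.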

Finally, we graph the moving average against the prices. This code is another example of how R can complicate code reuse.

First, we render a new graph with the prices:

This is the same call we used to graph the prices. Next, we want to add the SMA to the chart. Quantmod has tools that make this easy.

Here's how to use {% c-line %}addTA(){% c-line-end %} to add a line to your graph: 

This will add the indicator as a graph with a green line and "Simple Moving Average" as a caption.

Quantmod has similar functions that will apply the formula to the data in the graph for you, too. So, you could use addSMA() instead, but this is less efficient and could slow down backtesting with large datasets.

Also, none of these calls to add data to an existing graph work inside of a function. When called from inside a function, {% c-line %}addXXX(){% c-line-end %} can't find its required context and the call silently fails. Wrapping it inside a call to {% c-line %}plot(){% c-line-end %} fixes the issue.
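Put together, the charting step might look like the sketch below. The function name plot_indicators and its arguments are assumptions for illustration. The green line and caption follow the description above.

```r
library(quantmod)

# Hypothetical helper: chart the prices, then overlay the moving average.
plot_indicators <- function(prices, sma, ticker) {
  # Start a fresh chart with the price bars and the default volume panel.
  barChart(prices, bar.type = "hlc", name = ticker)
  # addTA() does not draw when called inside a function, so plot its result.
  plot(addTA(sma, col = "green", legend = "Simple Moving Average"))
}

# Usage, given prices and an SMA series for the same dates:
# plot_indicators(prices, sma, "CSCO")
```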

Here's the graph with the moving average added under the volume graph:

Create Trading Signals

Now, we compare the prices to the moving average to create trading signals. In this implementation, the code uses whether the price is higher or lower than the corresponding average to create a signal.

Replace this with your strategy.
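A minimal sketch of that comparison follows. The function name get_signals and the encoding (1 for long when the close is above its average, -1 for short otherwise) are assumptions, since the original listing isn't reproduced here. The demo values are invented.

```r
library(quantmod)

# Hypothetical helper: 1 when the close is above the average, -1 otherwise.
get_signals <- function(prices, sma) {
  ifelse(Cl(prices) > sma, 1, -1)
}

# Made-up closes and averages for a quick check.
dates <- as.Date("2020-01-01") + 0:3
demo <- xts(cbind(DEMO.Close = c(10, 12, 11, 9)), order.by = dates)
demo_sma <- xts(c(0, 11, 11.5, 10), order.by = dates)
demo_signals <- get_signals(demo, demo_sma)
as.numeric(demo_signals)  # 1 1 -1 -1
```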

Run the Strategy

Next, use the signals to apply your trading strategy to the historical data.

The resulting set has a column named {% c-line %}Lag.1{% c-line-end %}. We rename it to the ticker so it will render better in the graph.
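One way to sketch this step is to multiply each day's return by the previous day's signal. The function name run_strategy, the use of simple (discrete) returns, and dropping the leading NA row are assumptions for illustration.

```r
library(quantmod)

# Hypothetical helper: daily strategy returns from prices and signals.
run_strategy <- function(prices, signals, ticker) {
  # Simple daily returns of the closing price; the first row is NA.
  daily <- ROC(Cl(prices), type = "discrete")
  # Lag() shifts the signal by one day, so we trade on yesterday's signal.
  # Its output column is named Lag.1.
  strategy <- na.omit(Lag(signals, 1) * daily)
  # Rename the column to the ticker so chart legends read well.
  colnames(strategy) <- ticker
  strategy
}

# Made-up data: a steady 10% daily rise with an always-long signal.
dates <- as.Date("2020-01-01") + 0:2
demo <- xts(cbind(DEMO.Close = c(10, 11, 12.1)), order.by = dates)
demo_signals <- xts(c(1, 1, 1), order.by = dates)
demo_returns <- run_strategy(demo, demo_signals, "DEMO")
```

Lagging the signal matters: without it, the strategy would trade on information it could not have had at the time.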


Backtesting

Finally, you have the pieces you need to run a backtest.

Here's where R and its libraries do all the heavy lifting for you. The PerformanceAnalytics package has charts that display how your strategy performs against the data.
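As a sketch of what that call can look like, the helper below draws the package's performance summary chart. The function name show_performance is an assumption. Returning the cumulative return is an extra added here for a quick numeric check, not something the original describes.

```r
library(PerformanceAnalytics)

# Hypothetical helper: chart a series of strategy returns.
show_performance <- function(strategy_returns) {
  # Cumulative return, per-period return, and drawdown panels in one chart.
  charts.PerformanceSummary(strategy_returns)
  # Also hand back the total compounded return without printing it.
  invisible(Return.cumulative(strategy_returns))
}

# Usage, given an xts series of simple returns:
# show_performance(strategy_returns)
```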


Here's the result for CSCO:

Image of CSCO results


Test Run

Let's put the functions together and run them against TSLA in 2020: 
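The original listing isn't reproduced here, so the sketch below condenses the four steps into one hypothetical function, sma_backtest, that works on any price series. The 20-period default and the 1 / -1 signal encoding are assumptions. The TSLA call is left commented out because it needs network access.

```r
library(quantmod)
library(PerformanceAnalytics)

# Hypothetical end-to-end helper: indicator, signals, and strategy returns.
sma_backtest <- function(prices, ticker, n = 20) {
  closes <- Cl(prices)
  sma <- SMA(closes, n = n)
  sma[is.na(sma)] <- 0                      # warm-up rows have no average
  signals <- ifelse(closes > sma, 1, -1)    # long above the average, else short
  # Trade on yesterday's signal; drop the first row, which has no return.
  strategy <- na.omit(Lag(signals, 1) * ROC(closes, type = "discrete"))
  colnames(strategy) <- ticker
  strategy
}

# Offline check on made-up prices that rise 10% per day.
demo <- xts(cbind(DEMO.Close = c(10, 11, 12.1, 13.31)),
            order.by = as.Date("2020-01-01") + 0:3)
demo_result <- sma_backtest(demo, "DEMO", n = 2)

# Usage for TSLA in 2020 (needs network access to Yahoo Finance):
# tsla <- getSymbols("TSLA", from = "2020-01-01", to = "2020-12-31",
#                    auto.assign = FALSE)
# charts.PerformanceSummary(sma_backtest(tsla, "TSLA"))
```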

Here's the indicators graph:

Indicators graph

Here are the results: 

TSLA performance results on a graph


Wrapping Up Backtesting With R

In this post, we wrote code for backtesting a trading strategy based on a simple moving average, but we structured the code in a way that makes it easy to plug in different strategies.

You can take it from here. Get started with backtesting with R today! Once you've put together a set of tests, Pipekit can help you use Kubernetes to run them in parallel.
