Intro to Backtesting in R: A Comprehensive Tutorial
March 24, 2023
Learn how to use R for backtesting in trading and the steps involved, from downloading pricing data to testing a strategy.
Backtesting is a time machine for your trading strategies. You use it to test your ideas with historical market data, so you can evaluate how they would have performed. Imagine being able to go back in time and test your strategies in different market conditions before you risk any real money. That's the power of backtesting.
R is one of the best tools you have for backtesting. It's a programming language designed from the ground up for manipulating numbers and analyzing data.
This post will show you, step by step, how to use R for backtesting.
What is Backtesting?
Backtesting is testing a trading strategy with historical data in order to evaluate its performance and determine if it would have been profitable in the past. This process allows traders to simulate real-world scenarios and assess the effectiveness of their strategies before putting money at risk.
It's a crucial step in developing a successful trading strategy, as it helps traders identify potential weaknesses and make informed decisions about the viability of their ideas. With backtesting, traders can fine-tune their strategies and make data-driven decisions, improving their chances for success.
Why Use R for Backtesting?
R is a flexible platform for testing your strategies and seeing how they would have performed in the past. Its large collection of packages makes it easy to load and manipulate data, calculate performance metrics, and visualize results.
Why Not Use R for Backtesting?
While R is a highly versatile and powerful tool, it may not be the best option for everyone. One potential drawback is that R has a steep learning curve for those who are not familiar with programming or statistical analysis.
Some traders may find that other software or platforms are better suited to their specific needs or preferences, particularly if they have specific requirements for data analysis or visualization.
R also lends itself to code that isn't easy to reuse. You have to work around this if you want to create a set of backtesting tools.
While the code in this tutorial is simple, you'll need a basic understanding of programming and R to follow it. You'll also need to understand basic trading strategies. To keep this tutorial focused on R and backtesting, I won't explain them here.
One of R's biggest strengths is the Comprehensive R Archive Network (CRAN). It hosts a wealth of useful packages to help you build your trading strategies and backtesting system. To follow this tutorial, you'll need two packages: quantmod and PerformanceAnalytics.
I'll be using RStudio to write and execute the code, so I'll share screenshots from there. But you can run the code on the command line or use a different IDE, such as Visual Studio Code, PyCharm, or IntelliJ.
Install the quantmod and PerformanceAnalytics packages in your R environment, and load them at the top of your source file.
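As a minimal sketch, the setup might look like this. The requireNamespace() check is an optional convenience so the script only installs what's missing; plain install.packages() calls work too.

```r
# Install the two packages from CRAN only if they are not already present.
for (pkg in c("quantmod", "PerformanceAnalytics")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

# Load the packages at the top of the script.
library(quantmod)              # also attaches xts, zoo, and TTR
library(PerformanceAnalytics)
```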
Now, let's write some code.
Intro to Backtesting With R
You can break the backtesting process down into four essential steps:
- Get historical pricing data.
- Use the data to create technical indicators.
- Apply the indicators to the prices.
- Backtest the strategy to see how it performs.
In order to make this code reusable, we'll create a function for each step.
Download Pricing Data
Quantmod makes downloading your pricing data easy with the getSymbols() function. It supports several sources, including Yahoo Finance, FRED, and OANDA. We'll use the default, Yahoo.
After downloading the prices, we'll graph them.
This function accepts a ticker and optional start and end dates for the data. If you omit the dates, it retrieves prices for 2010 through 2013.
getSymbols() returns an xts time series (not a plain data frame) whose column names are prefixed with the ticker symbol.
So, if you call the function with "CSCO" for the ticker, the result has the columns CSCO.Open, CSCO.High, CSCO.Low, CSCO.Close, CSCO.Volume, and CSCO.Adjusted, indexed by date.
Since we're using functions and not relying on R's environment to pass values to library functions, we set auto.assign = FALSE. By default, getSymbols() creates a variable named after the ticker in the calling environment. With auto.assign = FALSE, it returns the data instead, and we assign it to the prices variable. Then, we use barChart() to build a graph.
This graph illustrates how well R and quantmod work with financial data. By specifying bar.type = "hlc", we told barChart() to use the High, Low, and Close columns to build the graph. It also uses the Volume column to build a secondary chart by default.
The function returns the prices object so we can use it to calculate the technical indicators.
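Here's a minimal sketch of what such a download-and-chart function could look like. The name get_prices and the default dates are assumptions chosen to match the description above, not the only way to write it.

```r
library(quantmod)

# Hypothetical helper: download daily prices for one ticker and chart them.
# The default dates are an assumption based on the 2010 to 2013 range above.
get_prices <- function(ticker, start = "2010-01-01", end = "2013-12-31") {
  # auto.assign = FALSE returns the data instead of creating a global variable.
  prices <- getSymbols(ticker, src = "yahoo", from = start, to = end,
                       auto.assign = FALSE)
  # bar.type = "hlc" draws high-low-close bars; a volume panel is added by default.
  barChart(prices, bar.type = "hlc", name = ticker)
  prices
}

# Example (needs an internet connection):
# prices <- get_prices("CSCO")
# head(prices)
```

Returning prices as the last expression lets the next step consume the data without touching the global environment.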
Now, we'll calculate a simple moving average (SMA) for our technical indicator. This is a very simple indicator that probably won't yield a great result, but our focus is on backtesting, not trading strategy.
The TTR package, which quantmod loads automatically, gives us the SMA() function for calculating the average. TTR includes an impressive set of indicator functions that you can mix and match to form your own strategy.
SMA() needs a series of prices and the number of periods to average over.
getSymbols() returned an xts object, but it unhelpfully embedded the ticker name in the column names. If we were calling it in a one-off script for CSCO, we would pass prices$CSCO.Close to SMA() directly.
But we want a function that works with any ticker. Quantmod has OHLC transformation functions for this. By wrapping prices in Cl(), we hand SMA() only the closing-price column, whatever ticker prefix it carries.
So we don't have to know the ticker name in advance.
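To see this in isolation, here's a small sketch using made-up prices. The "FAKE" ticker and its values are purely illustrative, not market data.

```r
library(quantmod)

# Synthetic OHLC data with ticker-prefixed column names, like getSymbols() output.
dates <- as.Date("2020-01-01") + 0:4
fake <- xts(cbind(c(10, 11, 12, 13, 14),
                  c(12, 13, 14, 15, 16),
                  c( 9, 10, 11, 12, 13),
                  c(11, 12, 13, 14, 15)),
            order.by = dates)
colnames(fake) <- c("FAKE.Open", "FAKE.High", "FAKE.Low", "FAKE.Close")

closes <- Cl(fake)            # finds the "Close" column without knowing the ticker
sma3   <- SMA(closes, n = 3)  # first two values are NA: not enough history yet
```

The leading NA values in sma3 are the reason the tutorial's function fills them in before charting.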
After calculating the averages, the next line replaces any NA entries with zero so that the graphing and signal code don't throw errors.
Finally, we graph the moving average against the prices. This code is another example of how R can complicate code reuse.
First, we render a new graph with the prices:
This is the same call we used to graph the prices. Next, we want to add the SMA to the chart. Quantmod has tools that make this easy.
Here's how to use addTA() to add a line to your graph:
This will add the indicator as a graph with a green line and "Simple Moving Average" as a caption.
Quantmod has similar functions that will apply the formula to the data in the graph for you, too. So, you could use addSMA() instead, but this is less efficient and could slow down backtesting with large datasets.
Also, none of these calls to add data to an existing graph work inside of a function. When called from inside a function, addXXX() can't find its required context, and the call silently fails. Wrapping it inside a call to plot() fixes the issue.
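Putting these pieces together, an indicator function might look something like the sketch below. The name get_indicators, the 20-period default, and the chart switch are all assumptions added for illustration.

```r
library(quantmod)

# Hypothetical helper: n-period SMA of the closing price, optionally charted.
# n = 20 is an assumed default window.
get_indicators <- function(prices, n = 20, chart = TRUE) {
  sma <- SMA(Cl(prices), n = n)
  sma[is.na(sma)] <- 0   # warm-up rows have no average yet
  if (chart) {
    barChart(prices, bar.type = "hlc")
    # Inside a function, addTA() only draws when wrapped in plot().
    plot(addTA(sma, col = "green", legend = "Simple Moving Average"))
  }
  sma
}
```

Setting chart = FALSE skips the drawing, which is handy when you only need the numbers.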
Here's the graph with the moving average added under the volume graph:
Create Trading Signals
Now, we compare the prices to the moving average to create trading signals. In this implementation, the code uses whether the closing price is higher or lower than the corresponding average to create a signal.
Replace this with your strategy.
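As a rough sketch, a signal function could look like this. The name get_signals and the choice of 1 for long and -1 for short are assumptions; a long-only strategy would use 0 instead of -1.

```r
library(quantmod)

# Hypothetical helper: 1 (long) when close > indicator, otherwise -1 (short).
# Assumes prices and indicator cover the same dates in the same order.
get_signals <- function(prices, indicator) {
  above <- coredata(Cl(prices)) > coredata(indicator)
  signal <- xts(ifelse(above, 1, -1), order.by = index(prices))
  colnames(signal) <- "signal"
  signal
}
```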
Run the Strategy
Next, use the signals to apply your trading strategy to the historical data.
The resulting set has a column named Lag.1. We rename it to the ticker so it will render better in the graph.
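One way to sketch this step is shown below. The function name get_strategy_returns is an assumption, and so is the use of type = "discrete" in ROC(), which I chose so the output is simple (not log) returns.

```r
library(quantmod)

# Hypothetical helper: per-period strategy returns from prices and a signal.
get_strategy_returns <- function(ticker, prices, signal) {
  # Lag the signal by one bar so each return uses the previous bar's signal.
  returns <- Lag(signal) * ROC(Cl(prices), type = "discrete")
  # Rename the column to the ticker so chart legends read well.
  colnames(returns) <- ticker
  returns
}
```

Lagging the signal avoids look-ahead bias: a signal computed from today's close can only be traded on the next bar.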
Finally, you have the pieces you need to run a backtest.
Here's where R and its libraries do all the heavy lifting for you. The PerformanceAnalytics package has charts that display how your strategy performs against the data.
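A minimal sketch of the backtest step, assuming you already have a series of strategy returns, might look like this. The name run_backtest and the returned cumulative figure are my additions for illustration.

```r
library(quantmod)
library(PerformanceAnalytics)

# Hypothetical helper: chart a return series and report its cumulative return.
run_backtest <- function(returns, title = "SMA strategy") {
  returns <- na.omit(returns)   # drop the leading NA from the lagged signal
  # Draws cumulative return, per-period return, and drawdown panels.
  charts.PerformanceSummary(returns, main = title)
  invisible(Return.cumulative(returns))
}
```

charts.PerformanceSummary() treats its input as simple returns and compounds them geometrically by default.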
Here's the result for CSCO:
Let's put the functions together and run them against TSLA in 2020:
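For reference, here's a compact end-to-end sketch that chains the four steps in one function. The name backtest_sma, the 20-period window, the long/short signal, and the optional prices argument are all assumptions, not a definitive implementation.

```r
library(quantmod)
library(PerformanceAnalytics)

# Hypothetical end-to-end helper; n = 20 is an assumed SMA window.
# Pass `prices` to reuse data you already have instead of downloading it.
backtest_sma <- function(ticker, start, end, n = 20, prices = NULL) {
  if (is.null(prices)) {
    prices <- getSymbols(ticker, from = start, to = end, auto.assign = FALSE)
  }
  close <- Cl(prices)
  sma <- SMA(close, n = n)
  sma[is.na(sma)] <- 0                                # warm-up rows
  # 1 = long when the close is above the average, -1 = short otherwise.
  signal <- xts(ifelse(coredata(close) > coredata(sma), 1, -1),
                order.by = index(close))
  # Trade on the previous bar's signal.
  returns <- Lag(signal) * ROC(close, type = "discrete")
  colnames(returns) <- ticker
  charts.PerformanceSummary(na.omit(returns), main = ticker)
  invisible(returns)
}

# Example (needs an internet connection):
# backtest_sma("TSLA", "2020-01-01", "2020-12-31")
```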
Here's the indicators graph:
Here are the results:
Wrapping Up Backtesting With R
In this post, we wrote code for backtesting a trading strategy based on a simple moving average, and we structured the code in a way that makes it easy to plug in different strategies.
You can take it from here. Get started with backtesting with R today! Once you've put together a set of tests, Pipekit can help you use Kubernetes to run them in parallel.