So, you’ve launched your A/B test and the data is rolling in. You’re starting to see some great uplift for your KPI, and your other metrics are providing great insight for subsequent A/B tests. So how long should you let the A/B test go for? On the one hand, cut it short and it could turn out that your data hadn’t yet stabilized and that clear win you saw wasn’t actually a win. Let it run too long, and you might lose out on the traffic still seeing the losing variation, or unrelated site changes may begin to influence the A/B test in question. The answer to this question depends on a couple of factors. These are some guidelines to help you determine when it’s time to end your A/B test.
When Your A/B Test Reaches Statistical Significance
One guiding answer to whether your A/B test has run long enough is to evaluate the statistical significance of your key results, and to make sure they remain significant for two to three days. Statistical significance is displayed in your SiteSpect performance matrix as a percentage, and indicates the confidence that a result is driven by the experience rather than by chance. For example, let’s say you A/B test a blue button versus a green button. The blue button sees 51 clicks, and the green button sees 49 clicks. Depending on your traffic levels, that difference is likely due to chance rather than the button color actually affecting user behavior. In this case, you would let the test continue to run until you had a higher level of confidence that one variation actually did increase clicks on the button. In other words, until your key metric values reach statistical significance. Remember that statistical significance is calculated and reached independently for each value (Uniques, Totals, etc.) of each Metric. The smaller the actual delta between control and variation, the more samples of that Metric Value will be required to reach statistical significance.
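To make the button example concrete, here is a minimal sketch of a standard two-proportion z-test applied to those 51 versus 49 clicks. The visitor counts (1,000 per variation) are hypothetical, and SiteSpect's own significance calculation may differ; this just shows why such a small delta fails to reach confidence.

```python
import math

def two_proportion_z(clicks_a, visitors_a, clicks_b, visitors_b):
    """Standard two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = clicks_a / visitors_a
    p_b = clicks_b / visitors_b
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p_value

# 51 vs. 49 clicks out of a hypothetical 1,000 visitors per variation
z, p = two_proportion_z(51, 1000, 49, 1000)
confidence = (1 - p) * 100
print(f"z = {z:.2f}, confidence = {confidence:.0f}%")  # far below 90%
```

With numbers this close, the confidence level lands nowhere near the 90% threshold discussed below, which is exactly why the test needs to keep running.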
For an A/B test to be most useful, you need to wait until your KPI and other important metrics reach at least 90% significance. At this point, you can be reasonably confident that your winning variation is actually winning. However, there are other confidence indicators you can also look to. SiteSpect also provides the Z-Score for each metric, which measures how many standard deviations the observed result is from the mean. A Z-Score of ±1 means there is roughly a 32% probability that the result occurred due to chance. A Z-Score of ±2 puts that probability at about 5%, and ±3 indicates roughly a 0.3% probability that the difference in results is due to chance. This score can help complete the picture of how confident you should be in the results from your A/B test.
Managing Low Traffic
If you’re dealing with a lower traffic site, you might have additional factors to consider since you won’t reach significance as quickly. This can be tricky, especially if your A/B test needs to occur on a timeline. In these cases, you may make the informed decision to end the A/B test at a lower significance level, though I wouldn’t recommend relying on anything below 80% significance. This is especially true if the decision is binary, in other words, whether the experience is better or worse than the control, rather than judging between multiple variations. Alternatively, you may need to weigh the importance of your metrics, and rely more heavily on micro-conversions rather than KPIs deep in the funnel that will naturally get less data.
For example, in the A/B test above comparing a blue versus a green button, you may be measuring the following: button clicks, form submissions, page views, scroll depth, and time on page. It may be the case that your metric for form submissions never reaches significance, but your metric for button clicks does reach 80% significance. At this point, it would be a good idea to end the A/B test and create a follow-up experiment to get additional data. Your next A/B test may focus on elements of the form the button leads to, or add in metrics that tend to get more traffic.
Managing High Traffic
When your site or your A/B test receives a very high amount of traffic, you may find you reach significance very quickly, so it is critical to run the test through multiple business cycles. For example, if your A/B test has run from Tuesday through Thursday and your metrics have reached significance, that data only reflects midweek traffic. Users may behave very differently on the weekends, for example. Most high traffic sites should let A/B tests run for at least two weeks, and often it makes sense to release a variation to a lower percentage of traffic for a longer period. You’ll want your important metrics to reach significance and maintain it for at least three to five days.
Alternatively, you might be confident in your data after two weeks, but still be tempted to let your A/B test run a little longer to see what happens. However, here you face a couple of risks. First, if you have a clear win, why miss out on the traffic that is still seeing the losing variation? Second, once an experience is no longer new to a user, the difference in conversion will begin to wear off. You may risk artificially lowering the significance of your results by letting your A/B test run for too long.
In Summary, It Depends…
The most accurate answer to the question, “How long should I run my A/B test,” is, “It depends.” These are the factors that you should consider to determine whether or not you should end your A/B test:
1. What level of significance have your KPI and other metrics reached? Are you comfortable with this level of significance?
Ending an A/B test with your KPI at 80% significance may make a lot of sense, especially if you plan to run follow-up A/B tests, you have lower traffic for this experience, or the A/B test has been running for over a month.
2. How long has your A/B test run already, and do you have the time strategically to let it go longer?
Can you strategically let the A/B test run for another week, or another several weeks? Are any of your important metrics suffering? If the A/B test isn’t hurting any of them, and you have the leeway to let it continue, go ahead and see how it plays out a little bit longer.
3. Are you planning to run additional follow-up A/B tests to further refine the experience?
If this is the case, you may opt to end the A/B test sooner rather than later so that you can apply your learnings to the next experience. This may end up helping you make the most of your traffic.
While there is no one-size-fits-all answer here, working through these factors should give you the confidence you need to determine whether it’s time to end your A/B test. The most important thing to remember is to look at the whole user story, including segmenting your data by device and dollar value metrics, and not just focus on one or two metrics. This can help you answer not just what metrics have changed, but why.
To learn more about SiteSpect, visit our website.
About Paul Terry
Paul Terry is a SiteSpect Consultant in Customer Support, guiding SiteSpect users on the road to optimization. He has over 15 years of experience in optimization, testing, and personalization. He is based in Duluth, Georgia.