At SiteSpect, we say that every user tells a story and every test paints a picture. A successful variation not only validates the hypothesis, but hopefully paints a better picture for business KPIs. We see a lift in Orders at “statistical significance” and… yay! However, if you base your test findings solely on a lift at significance in your most key of KPIs, you’re likely missing much of the story those users are trying to tell you.
Every Face Tells a Story – Capture Them
To understand the user’s story, your test should include Metrics for each step in that story: landing page through thank-you page, and every page or significant mouse click in between. For example, a customer flow on a retail site might look something like: Home > Search > Product List Page > Quick View > Product Page > Add to Cart > Cart > Checkout Step 1 > Checkout Step 2 > Order Confirmation. In that flow, you should include a Metric for each step — plus other significant landing pages, browse pages, each Add to Cart button location, Order AOV, and Order Items. These Metrics are easy to track — in SiteSpect you can drag and drop them into subsequent tests or make them Default Metrics for every new test you create.
Each A/B test will also likely contain one or more Metrics which pertain to the specific variation(s) being tested. If you’re testing a new Call to Action, be sure to measure that action, as well as alternative actions that may change in tandem.
Another category of Metrics that can be useful in determining behavior change is scrolling. If you’re testing new content on the page, it is important to know whether the user scrolled to see it. Page percentage and viewport scrolling can be easily measured across variations to determine whether tested content has been consumed.
Our best testers include between 20 and 80 Metrics in every test. Remember, most SiteSpect Metrics add absolutely zero weight to pages or processing.
We always measure the user’s story in this level of detail because it allows us to see not only what behaviors changed, but why they changed. We also want to be able to detect unexpected changes in the user’s story — these are often the source of new insights. Unexpected learnings are often the engine of test iterations or entirely new tests, and can also help affirm test findings or detect bugs in test or Metric setup.
How to Make Sense of it All
The new user story must make sense. A/B test analytics is the process of understanding not only where the story changed, but why. Questions to guide your analysis should include: How did the user story change in the variation? Where did the change start? How far did it go? Sure, your variation’s bigger and better positioned search box caused more product searches and more product page views, but why then did conversion decrease or stay flat? Look at how browse usage changed. Could it be that increased search usage decreased browse usage, which converts better? Perhaps it’s time to start testing the search algorithm and result relevance. And look at that increase in homepage views! That could mean that users are getting lost or frustrated. Overall pageviews, visits per user, and time per visit are Metrics built into every SiteSpect test. These are also important clues to why a variation performed the way it did.
The Good and the Bad
Rich testing data also allows for comprehensive segmentation. Look at users who did what you wanted them to do. How did the variation affect them? How did their journey change? Compare that with the group of users who didn’t do what you wanted. What common aspects of their journey — browser, referrer, device, path, history, etc. — might have negatively affected their outcome? As you segment groups of users for analysis, be sure to note that statistical significance is recalculated on the smaller data samples. Lifts that either lose statistical significance or are reduced to low sample sizes through segmentation should be weighed accordingly in the new segment.
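To make that recalculation concrete, here is a minimal sketch of why the same relative lift can be significant on the full sample but not in a 10% segment. It assumes a simple two-proportion z-test on conversion rates; the numbers and the `z_test_lift` helper are hypothetical illustrations, not SiteSpect’s actual statistics engine.

```python
from math import sqrt
from statistics import NormalDist

def z_test_lift(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test (hypothetical helper).

    Returns (relative_lift, two_sided_p_value) for variation B vs. control A.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided
    return (p_b - p_a) / p_a, p_value

# Full sample: 5.0% vs. 5.5% conversion on 20,000 users each
lift, p = z_test_lift(1000, 20000, 1100, 20000)         # significant at p < 0.05

# A 10% segment with the exact same rates: significance is lost
lift_seg, p_seg = z_test_lift(100, 2000, 110, 2000)     # p well above 0.05
```

The relative lift is identical in both calls (10%); only the sample size changes, and with it the p-value. That is why a lift that looks solid overall should be treated with caution once you slice down to a segment.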
A critical part of test analytics is outlier segmentation. Remember, the lift (positive or negative) is a change in the Metric average per visit or user. That average can be affected in outsized ways by a single user or group of users. Isolating those outliers that obstruct our view and judging the results on more “normal” users will give us a better sense of the predictive value of lifts, and can also help identify test setup problems or artifacts.
Say a variation is showing flat or negative lifts in KPIs. This could be representative, or it could be due to a specific browser or device type that is responding negatively — and differently — from other users. It’s always a best practice, especially when results are negative or flat, to segment by browser and device type to ensure the averages are relatively uniform and there isn’t some ugly group of users obstructing our view.
Money Money Money
When the Metric is reporting a monetary value, users’ contributions to the average can vary much more widely than with other Metrics. Users may click on the ATC button between zero and, say, five or six times. But the order value may range from $5.00 to $500.00. And it wouldn’t be unreasonable for one or a few users to spend $5,000.00 – $10,000.00. These outliers can skew the average greatly, and should be isolated through segmentation. We normally recommend segmenting out orders more than three standard deviations from the mean, thus measuring the lift in AOV on a group of more “average” purchases.
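As a rough sketch of that three-standard-deviation rule: the order values below and the `trim_outliers` helper are made up for illustration (this is not a SiteSpect API), but they show how a single whale order can drag the raw AOV far away from a typical purchase.

```python
from statistics import mean, stdev

def trim_outliers(values, k=3.0):
    """Drop values more than k standard deviations from the mean.

    Hypothetical helper for illustration; a single, non-iterative pass.
    """
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) <= k * sigma]

# 30 typical orders plus one $8,000 whale
orders = [48, 52, 55, 45, 60, 38] * 5 + [8000]

aov_raw = mean(orders)                     # skewed far above a typical order
aov_trimmed = mean(trim_outliers(orders))  # close to the typical order value
```

One caveat: with very small samples, a single extreme value inflates the standard deviation enough to hide itself from a three-sigma cut, so you need a reasonable volume of orders (or an iterative trim) before this behaves as expected.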
Web Analytics in 3D
I’ve heard that A/B Testing is like Web Analytics in 3D. Traditional analytics looks at past users under a microscope. We can measure the behaviors in their stories, and determine whether or not they converted, or when and where they finished their journey. A/B Testing looks at active users under a microscope, and measures how they respond to different experiences. This added dimension of variable motion gives unique insights into the elasticities of marketing techniques as well as the effectiveness of our tools for browsing, finding, and converting. Employing liberal measurement and comparing entire stories across variations — good stories, bad stories, and ugly ones too — allows us to not only declare a winning variation, but to gain a better understanding of our users, our site, and how they work together to accomplish our common goals.