
Multivariate Testing 101: A Scientific Method Of Optimizing Design


In a previous article on Smashing Magazine, I described A/B testing and various resources related to it. I have also covered the basics of multivariate testing in the past; in this post, I'll go deeper into the technical details of multivariate testing, which is similar to A/B testing but has crucial differences.

In a multivariate test, a Web page is treated as a combination of elements (including headlines, images, buttons and text) that affect the conversion rate. Essentially, you decompose a Web page into distinct units and create variations of those units. For example, if your page is composed of a headline, an image and accompanying text, then you would create variations for each of them. To illustrate the example, let’s assume you make the following variations:

  • Headline: headline 1 and headline 2
  • Text: text 1 and text 2
  • Image: image 1 and image 2

The scenario above has three variables (headline, text and image), each with two versions. In a multivariate test, your objective is to see which combination of these versions achieves the highest conversion rate. By combinations, I mean one of the eight (2 × 2 × 2) versions of the Web page that we’ll come up with when we combine variations of the sections:

  • Headline 1 + Text 1 + Image 1
  • Headline 1 + Text 1 + Image 2
  • Headline 1 + Text 2 + Image 1
  • Headline 1 + Text 2 + Image 2
  • Headline 2 + Text 1 + Image 1
  • Headline 2 + Text 1 + Image 2
  • Headline 2 + Text 2 + Image 1
  • Headline 2 + Text 2 + Image 2

In multivariate testing, you split traffic between these eight different versions of the page and see which combination produces the highest conversion rate — just like in A/B testing, where you split traffic between two versions of a page.
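To make the mechanics concrete, here is a minimal Python sketch (the section and variation names are just the placeholders from the example above) that enumerates the full set of combinations:

```python
from itertools import product

# Variations for each section of the page (from the example above).
sections = {
    "headline": ["Headline 1", "Headline 2"],
    "text":     ["Text 1", "Text 2"],
    "image":    ["Image 1", "Image 2"],
}

# The combinations are the Cartesian product of the variations:
# 2 x 2 x 2 = 8 versions of the page.
for i, combo in enumerate(product(*sections.values()), start=1):
    print(f"Combination {i}: " + " + ".join(combo))
```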

Getting Started With Multivariate Testing

To create your first multivariate test, start by choosing a tool or framework that supports it. You can use one of the tools listed in the section “Tools” at the end of this article. Please note that not all A/B testing tools support multivariate testing, so make sure your tool of choice allows it.

Once you’ve decided which tool to use, choose which sections to include in the test. As you know, a Web page can contain tens or hundreds of different sections (footer, headline, sidebar, log-in form, navigation buttons, etc.). You cannot include all of these sections in the test; creating variations for all of them would be an enormous task (and, as you’ll read below, the traffic requirements for the test will grow exponentially with each new section). Narrow it down to the few sections of the page that you think are most important to the conversion goal.

The following parts of a page (listed in order of importance) are typically included in a multivariate test:

  • Headline and heading,
  • Call-to-action buttons (color, text, size, placement),
  • Text copy (content, length, size),
  • Image (type, placement, size),
  • Form length.

The Difference Between A/B Testing And Multivariate Testing

Conceptually, the two techniques are similar, but there are crucial differences. First and foremost, the traffic requirements are different. As I said, the number of combinations that need to be tested grows exponentially in a multivariate test. You can test three or four versions in an A/B test and tens or hundreds of versions in a multivariate test. Clearly, then, a lot of traffic — and time — is required to arrive at meaningful results.

For example, if you have three sections with three variations each, the number of combinations is 27. Add another section with three variations, and the total number of combinations jumps to 81. If you want meaningful results, you can’t keep adding sections to the test. Be selective. A good rule is to limit the total number of combinations to 25 or fewer.

[Image: Use A/B testing for large-scale changes, not to refine or optimize existing designs. Image by Meet the Chumbeques]

Another difference is in how these techniques are used. A/B testing is usually reserved for large radical changes (such as completely changing a landing page or displaying two different offers). Multivariate testing is used to refine and optimize an existing design. For the mathematically inclined, A/B testing is used to optimize for a global optimum, while multivariate testing is used to optimize for a local optimum.

One advantage of multivariate testing over A/B split testing is that it can tell you which part of the page is most influential on conversion goals. Say you’re testing the headline, text and image on your landing page. How do you know which part has the most impact? Most multivariate testing tools will give you a metric in their reports, called the “impact factor,” that tells you which sections influence the conversion rate and which don’t. You don’t get this information from A/B testing because all sections are lumped into one variation.

Types Of Multivariate Tests

Based on how you distribute traffic to your combinations, there are several types of multivariate tests (MVT):

Full factorial testing
This is the kind people generally refer to when they talk about multivariate testing. By this method, one distributes website traffic equally among all combinations. If there are 16 combinations, each one will receive one-sixteenth of all the website traffic. Because each combination gets the same amount of traffic, this method provides all of the data needed to determine which particular combination and section performed best. You might discover that a certain image had no effect on the conversion rate, while the headline was most influential. Because the full factorial method makes no assumptions with regard to statistics or the mathematics of testing, I recommend it for multivariate testing.
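As an illustration of what an equal split can look like in practice, here is a minimal sketch; it assumes a stable visitor ID is available, and the hashing scheme is my own invention rather than any particular tool's:

```python
import hashlib

def assign_combination(visitor_id: str, n_combinations: int) -> int:
    """Deterministically map a visitor to one of the combinations.

    Hashing the visitor ID spreads traffic (approximately) equally
    across combinations, and a returning visitor always sees the
    same combination.
    """
    digest = hashlib.md5(visitor_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_combinations

# Example: a 16-combination full factorial test.
print(assign_combination("visitor-42", 16))  # some value in 0..15
```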

[Image: Record and compare the resulting traffic for each tested version. Image by ItoWorld]

Partial or fractional factorial testing
As the name suggests, in this method only a fraction of all combinations are exposed to website traffic. The conversion rate for unexposed combinations is inferred from the ones that were included in the test. For example, if there are 16 combinations, then traffic is split among only eight of those. For the remaining eight, we get no conversion data, and hence we need to resort to fancy mathematics (with a few assumptions) for insight. For obvious reasons, I don’t recommend this method: even though there are fewer traffic requirements for partial factorial testing, the method forces too many assumptions. No matter how advanced the mathematics are, hard data is always better than inference.
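To give a flavor of the inference involved, here is a toy illustration (my own, not any specific tool's method): under the assumption that section effects are additive, the conversion rate of an unexposed combination can be estimated from the exposed ones with a simple linear model.

```python
import numpy as np

# Toy example: 2 sections x 2 variations each = 4 combinations,
# but only 3 of them were exposed to traffic.
# Columns of X: [intercept, headline is variant 2, image is variant 2]
X = np.array([
    [1, 0, 0],  # Headline 1 + Image 1
    [1, 1, 0],  # Headline 2 + Image 1
    [1, 0, 1],  # Headline 1 + Image 2
])
observed_rates = np.array([0.10, 0.14, 0.12])

# Fit the main effects by least squares. This assumes the sections
# don't interact -- the key (and risky) assumption of fractional designs.
beta, *_ = np.linalg.lstsq(X, observed_rates, rcond=None)

# Infer the rate of the unexposed combination: Headline 2 + Image 2.
estimate = beta @ np.array([1, 1, 1])
print(f"Estimated conversion rate: {estimate:.2f}")  # 0.16
```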

Taguchi testing
This is the most esoteric method of all. A quick Google search reveals a lot of tools claiming to cut your testing time and traffic requirements drastically with Taguchi testing. Some might disagree, but I believe the Taguchi method is a bit of a sham; it’s a set of heuristics, not a theoretically sound method. It was originally used in the manufacturing industry, where specific assumptions were made in order to decrease the number of combinations needing to be tested for QA and other experiments. These assumptions are not applicable to online testing, so there is no need to do any Taguchi testing. Stick to the other methods.

Do’s And Don’ts

I have observed hundreds of multivariate tests, and I have seen many people make the same mistakes. Here is some practical advice, direct from my experience.

Don’ts

  • Don’t include a lot of sections in the test.
    Every section you add multiplies the number of combinations to test (a section with two variations doubles it). For example, if you’re testing a headline and an image, then there are a total of four combinations (2 × 2). Add a button to the test, and there are suddenly eight combinations (2 × 2 × 2). The more combinations, the more traffic you’ll need to get significant results. (A quick way to count the combinations is sketched below.)
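Counting the combinations before you commit to a test takes one line; a quick sketch (the section names are placeholders):

```python
from math import prod

# Total combinations = product of the number of variations per section.
variations_per_section = {"headline": 2, "image": 2, "button": 2}
print(prod(variations_per_section.values()))  # 8
```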

Do’s

  • Do preview all combinations.
    In multivariate testing, variations of a section (image, headline, button, etc.) are combined to create page variations. One of the combinations might be odd-looking or, worse, illogical or incompatible. For example, one combination might put together a headline that says “$15 off” and a button that says “Free subscription.” Those two messages are incompatible. Detect and remove incompatibilities at the preview stage.
  • Do decide which sections are most worthy of inclusion in the test.
    In a multivariate test, not all sections will have an equal impact on the conversion rate. For example, if you include a headline, a call-to-action button and a footer, you might come to realize that footer variations have little impact, and that headline and call-to-action variations produce winning combinations. You get a powerful section-specific report. Below is a sample report from Visual Website Optimizer. Notice how the button has more impact (91%) than the headline (65%):

    [Image: sample section-specific report from Visual Website Optimizer]

  • Do estimate the traffic needed for significant results.
    Before testing, get a clear idea of how much traffic you’ll need in order to get statistically significant results. I’ve seen people add tens of sections to a page that gets just 100 visitors per day. Significant results from such a test would take months to accumulate. I suggest using a calculator, such as this A/B split and multivariate testing duration calculator, to estimate how much traffic your test will require. If it’s more than what’s acceptable, reduce some sections.
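For a rough back-of-the-envelope version of such a calculator, here is a sketch based on the standard two-proportion sample-size formula (the numbers in the example are made up, and real tools may use different statistics):

```python
import math

def test_duration_days(baseline_rate: float, min_detectable_lift: float,
                       n_combinations: int, visitors_per_day: float) -> float:
    """Rough days needed for significance at 95% confidence and 80% power."""
    z_alpha, z_beta = 1.96, 0.84  # two-sided alpha = 0.05; power = 0.8
    p1 = baseline_rate
    p2 = baseline_rate * (1 + min_detectable_lift)
    delta = p2 - p1
    # Visitors needed per combination (two-proportion z-test):
    n = ((z_alpha * math.sqrt(2 * p1 * (1 - p1))
          + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) / delta) ** 2
    return n * n_combinations / visitors_per_day

# Example: 2% baseline conversion rate, hoping to detect a 20% lift,
# 8 combinations, 500 visitors per day -- roughly 316 days.
print(f"{test_duration_days(0.02, 0.20, 8, 500):.0f} days")
```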

Case Studies

A lot of A/B testing case studies are on the Web, but unfortunately, finding multivariate test case studies is still difficult. So, I scoured the Internet and compiled relevant ones.

Software Download Case Study: downloads increased by 60%
This is one multivariate test I did to compare different versions of headlines and links. In the end, one of the variations resulted in a more than 60% increase in downloads.


Microsoft Multivariate Testing Case Study
This presentation details the variations that were tested for this website and the ultimate winner.

SiteSpect Case Studies
This page presents a dozen multivariate testing case studies from large companies using multivariate testing and behavioral targeting to optimize their sites.

Maxymiser Case Studies
Another set of multivariate testing case studies.

Look Inside a 1,024-Recipe Multivariate Experiment
YouTube did a gigantic multivariate test in 2009. It can afford to do tests with a thousand-plus combinations because it has sufficient traffic.

Multivariate testing of an email newsletter
An agency tested color and text on the call-to-action button of its email newsletter. The best-performing button achieved the highest CTR: 60%.

Multivariate Testing Tools And Resources

Tools

Google Website Optimizer
A free basic multivariate testing tool by Google. It’s great if you want to test the waters before investing money in multivariate testing. The downside? You’ll need to tag different sections of the Web page with JavaScript, which can be cumbersome. It’s also prone to error and forces you to rely on others (like the technology department) for implementation.

Visual Website Optimizer (Disclaimer: I am the developer of this tool)
The main advantage of this paid tool is that you can create a multivariate test visually in a WYSIWYG editor by choosing different sections of the page. You can then run the test without having to tag sections individually (although a snippet of code is required in the header). The tool includes heat map and click map reports.

WhichMVT
A website that publishes user reviews of all of the multivariate testing tools available on the market. If you are planning to adopt a multivariate testing tool for your organization, do your research on this website.

Enterprise testing tools
Omniture’s Test&Target, Autonomy’s Optimost, Vertster, Webtrends’ Optimize, and SiteSpect.

Resources

Expert Guide to Multivariate Testing Success, by Jonathan Mendez
A series of blog posts detailing different aspects of multivariate testing.

Fail Faster With Multivariate Testing (PDF)
An excellent free mini-guide to multivariate testing.

Online Testing Vendor Landscape
A commercial report by Forrester that compares the various testing vendors out there.

Lessons Learned from 21 Case Studies in Conversion Rate Optimization
This article discusses ideas for conversion rate optimization detailed through different case studies.



Multivariate Testing in Action: Five Simple Steps to Increase Conversion Rates


The attention span on the Web has been decreasing ever since Google arrived and changed the rules of the game. Now, with millions of results available on any topic imaginable, the window to grab a visitor’s attention has shrunk significantly (in 2002, the BBC reported it to be about 9 seconds). Picture yourself browsing the Web: do you go out of your way to read the text, look at all the graphics, and try to thoroughly understand what the page is about? The answer is most likely a straight “no.” Bombarded with information from all around, we have become spoiled kids, not paying enough attention to what a Web page wants to tell us.

We make snap decisions on whether to engage with a website based on whatever we can make out in the first few (milli)seconds. The responsibility for making a good first impression lies with designers and website owners. Given that the window of opportunity to persuade a visitor is really small, most designs (probably including yours) do a sub-optimal job because the designer in you thinks in terms of aesthetics. However, most websites do not exist just to impress visitors. Most websites exist to make a sale. Whether it is to get visitors to subscribe to the blog feed, or to download a trial, every website ultimately exists to make a sale of some kind.

In this post we will talk about how to tweak a website for generating more sales, downloads, membership (or any other business goal) in a scientific manner, using A/B split and multivariate testing. Like everything else science-related, this article will explore a step-by-step, reproducible method for increasing your conversion rate (the percentage of visitors converted to customers). Also, you may be interested in the Ultimate Guide to A/B Testing that was published earlier here, on Smashing Magazine.

Step 1. Identify a Challenge

How do you get website visitors to notice your offering, and then get them to act on it? I wanted to answer that million-dollar question for a software download page on my personal homepage. That page had all the right ingredients: product name, product description, testimonials, awards, ratings and a prominent download link. Yet only 40% of the visitors downloaded the free software. Note that almost all traffic to that page was targeted, arriving either through a Google search or via a relevant referring website. So, why didn’t the remaining 60% of visitors download the software? Fixing that leaky bucket was my challenge.

Key point: Clearly identify the goals of your website (or a particular Web page).

In my case, the desired action is to have visitors download the software and the challenge is to increase the download rate from 40% to as high as possible. Some of the most common challenges which can be solved using A/B split testing are:

  • Improving sign-up rate, reducing bounce rate, increasing newsletter subscriptions,
  • Increasing number of leads collected from landing page, increasing whitepaper or software trial downloads and
  • Optimizing purchases and sales, converting a higher percentage of visitors to customers.

It is entirely possible that your website may be serving multiple purposes. An example would be a blog where the challenge is to get more subscribers and to increase visitor engagement (in terms of number of comments). In that case, the best strategy is to tackle one (clearly defined) challenge at a time.

[Image: Quick overview of A/B testing]

Step 2. The Hypothesis

The next step is to make a list of hypotheses for the low conversion rate (percentage of visitors taking the desired action). Agreed, it is tough to come up with exact reasons (that is why we are calling them hypotheses) for a low conversion rate, but there are three excellent resources to help you:

1) You: Yes, you! Though it is hard not to fall in love with one’s own website, it is now time to be extremely self-critical. Try to step into your visitors’ shoes and ask yourself: is your Web page compelling enough to engage a visitor with no background knowledge of your offering? Remember that unlike you, your visitors don’t wake up in the morning saying, “Oh wow, this thing is fantastic!” Being critical towards your own website is an excellent way to improve it.

2) Web analytics data: Another source for getting a list of improvement ideas is your analytics tool. Specifically, data on referral sources and search keywords can provide interesting insights. For example, a lot of visitors may be arriving on your webpage by searching for keywords which you haven’t even thought about. In that case, your visitors may leave the website mistakenly thinking that your offer is not what they were searching for. Addressing such cases can increase the conversion rate.

3) Usability testing: Getting independent feedback from a usability test will always surprise you! Perhaps you will discover that visitors are not even aware that you are offering something on the page. In that case, a great idea would be to test the color and size of a prominent call-to-action. If you don’t have a large budget for usability testing, try out affordable services such as Feedback Army or UserTesting.

Key point: Determine what influences conversion rate.

Take feedback from others but evaluate your Web page honestly, and jot down a list of ideas on what could be affecting conversions. For my software download Web page, I had a hypothesis that the download rate was low primarily due to two reasons: a) a lot of visitors didn’t notice the download link and b) many didn’t know that the software is free to download.

My guess was that a normal visit went something like this: a visitor arrives at the website, sees a bunch of text, looks around for the download link, somehow misses it (possibly due to the uniformity in color of the headings), and finally leaves the website. Those who notice the download link probably don’t go to the trouble of reading the text, where it says “… is a freeware…”, so they assume that the software is a trial or a demo.

The kinds of hypotheses you may have at this step:

  • Maybe your sign-up form is too long, and a shorter version would increase the total number of sign-ups?
  • Maybe your “Free Trial” button isn’t noticeable; would a larger button lead to more downloads?
  • Maybe your headline contains a lot of industry acronyms, or is too generic?
  • Maybe your landing page has no obvious next step, leading to a high bounce rate?

Step 3. A/B or Multivariate Testing?

Once your list of possible reasons for the low conversion rate is ready, it is time to crank your brain once again and come up with different ideas for addressing those reasons. In this step, you create multiple versions of each factor you identified in the last step. For the “Sign Up” case, for example, the different versions might be:

  • Form variations: Minimal form with just two fields; form not asking for an email address; multi-step form; long form.
  • Submit button variations: “Submit” or “Sign Up for Free” or “Instant Signup” or even “Sign Up Now!”

If you are skeptical that such minor changes can make a significant impact on conversions, read the case study in which 37Signals increased sign-ups by 30% by testing a simple headline change. Also read how Dustin Curtis increased his Twitter followers by 173% by simply changing the link text to “You should follow me on Twitter.”

A/B Split Testing

In A/B testing (also known as split testing), you vary only one element on the page at a time. This element may be any part of the Web page critical to conversions (e.g. button color, size, ad copy headline). Contrast this to multivariate testing, where multiple different elements are tested at a time. However, A/B tests are simpler and easier to implement than multivariate tests.

Multivariate Testing

In multivariate testing, you identify the different sections/factors on a page that affect the conversion rate. Different variations of those factors are created, which are then combined to produce multiple different versions of the website. Multivariate tests take more time than A/B tests to show results, but they are more likely to produce better results.

Key point: Create variations.

Conducting Tests

Coming back to the challenge of increasing downloads for the software page: I used my own tool, Visual Website Optimizer, which provides a visual interface for creating variations, but you could use other tools as well. An obvious way to make visitors notice the download link is to make the download section the most prominent part of the page. In the page design, the “Download” heading’s size and color blended into the rest of the page, which resulted in people missing the download link.

For the multivariate test, I selected two factors on the page for creating variations: the “Download” heading in the sidebar and the “PDFProducer” download link below it. The focus of the test was to observe the effect of the word “free” and the effect of highlighting the download section. Here are the variations I came up with for this test:

For the “Download” headline:

  • “Download” in red
  • “Download for Free” in red
  • “Download” in the default color, but a larger font size

For the “PDFProducer” link:

  • “PDFProducer” in the default color, but a larger font size
  • “PDFProducer” in red

In a multivariate test, different variations are combined to produce multiple versions of the Web page. In this case, combining the above variations (plus the default version of each section) automatically produced a total of 12 (4 × 3) different versions, each with a unique combination of “Download” heading and “PDFProducer” link (variation 1 is the control, or default, variation).

[Image: Different versions of the download section used in the multivariate test]

For definition’s sake, because I have combined variations of two different sections, the test is called a multivariate test. If I had just varied a single section, say the “Download” heading, the test would have been called an A/B split test.

Key point: Define the goal of the test.

Every test has a goal against which the performance of different versions is measured. In this case, the goal was the number of downloads. Other types of goals may be sign-ups, purchases, clicks, leads, page views, or bounce rate. It is important to define the goal which is closest to your business objectives. For example, an eCommerce store optimizing for sales shouldn’t define clicking on the “Add to Basket” button as a goal. Rather, it should define the goal as a visit to the “Thank you” page after a purchase is completed.

Step 4. Running the Test and Analyzing Results

What an A/B split or multivariate test does is simple: whenever a visitor arrives on your Web page, it displays a randomly chosen version of the page. In other words, your traffic gets equally distributed amongst the different versions. The performance of the different versions is tracked against the conversion goal(s) defined for the test. For example, in my case the goal was increasing the number of downloads; each time a visitor downloaded the software, Visual Website Optimizer tracked which version of the page had been shown to that visitor. Setting up the test in this tool helped here, as I could select the sections, make variations in a WYSIWYG editor, and immediately preview how they would look live on the page.

After a large number of visitors have been included in the test, different versions are compared to see which one of them performed the best and how much improvement (over the default) it achieved.
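Conceptually, the bookkeeping behind this is tiny. Here is a minimal sketch of it (my own illustration, not Visual Website Optimizer's actual implementation):

```python
from collections import defaultdict

impressions = defaultdict(int)  # visitors shown each combination
conversions = defaultdict(int)  # goal completions per combination

def record_visit(combination_id: int) -> None:
    impressions[combination_id] += 1

def record_conversion(combination_id: int) -> None:
    # Called when a visitor completes the goal, e.g. downloads the software.
    conversions[combination_id] += 1

def report() -> None:
    for combo in sorted(impressions):
        rate = conversions[combo] / impressions[combo]
        print(f"Combination {combo}: {rate:.1%} conversion rate")
```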

Key point: Analyze the results.

After running the test for about 4 weeks, I had results for my software download test. Can you guess which variation resulted in maximum downloads? Any guesses on how much improvement I was able to achieve over the existing 40% download conversion rate?

Hold your breath, here are the results:

#  | Details                                                 | Conversion rate | % Improvement | Confidence*
1  | Default combination (control)                           | 39.4%           | -             | -
10 | “Download for Free” in red, default “PDFProducer” link  | 63.2%           | 60%           | 99%
9  | “Download” in big font, “PDFProducer” link in red       | 56.5%           | 43.3%         | 98%
12 | “Download for Free” in red, “PDFProducer” link in red   | 54.2%           | 37.7%         | 95%
…  | …                                                       | …               | …             | …
2  | “Download” as default, “PDFProducer” in big font        | 41.3%           | 4.76%         | 56%

Note: % improvement over default is calculated as 100*(Variation % – Control %)/(Control %)
# refers to the combination number as described in the screenshot above
Confidence*: Statistical confidence in beating the default combination.

You can observe that the headline “Download for Free” in red pushed the download conversion rate from 39% to 63%, a whopping increase of 60%. Having “Download” in a large font size (combined with a red link color) also showed a positive improvement (43%) over the default. Of all the results, the top three are statistically significant at a 95% or higher confidence level, which means I could safely implement the winning versions on the Web page and see a permanent increase in downloads. Also note that even the worst-performing combination showed about a 4% improvement over the control, though that result is not statistically significant.

A common concern is that the test results may not be reliable and that the improvement seen may be due to chance. It is, therefore, important to understand different parameters that influence reliability:

  • Number of visitors: the higher the number of visitors, the more reliable the results. You can use tools such as a split test duration calculator to estimate how many visitors will be required for your test.
  • Conversion rate: in general, pages with a low conversion rate (say, 1–2%) take much longer to produce statistically significant results than pages with a higher conversion rate (say, 40–50%).
  • Difference in performance: a test in which the variations differ widely in performance (say, by more than 10%) is always more reliable than one where the difference is extremely small (0.5% or so).

It is important either to use a tool which automatically crunches the reliability of the results for you or to use an online calculator to gauge the confidence in the results. Implementing unreliable results can actually decrease performance. The exact mathematics behind split-testing reliability analysis can be read in the 20bits article Statistical Analysis and A/B Testing, or in my blog article Mathematics of A/B testing.
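If you want to check such numbers yourself, here is a minimal sketch of the usual two-proportion z-test (the visitor counts in the example are made-up placeholders, not the actual figures from my test):

```python
from math import erf, sqrt

def confidence_vs_control(control_conversions: int, control_visitors: int,
                          variation_conversions: int, variation_visitors: int) -> float:
    """One-sided confidence that the variation beats the control
    (two-proportion z-test with a pooled standard error)."""
    p1 = control_conversions / control_visitors
    p2 = variation_conversions / variation_visitors
    pooled = (control_conversions + variation_conversions) / \
             (control_visitors + variation_visitors)
    se = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / variation_visitors))
    z = (p2 - p1) / se
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

# Made-up example: 394 conversions out of 1,000 visitors (control)
# vs. 632 out of 1,000 (variation).
improvement = (0.632 - 0.394) / 0.394 * 100
print(f"Improvement: {improvement:.1f}%")                     # 60.4%
print(f"Confidence: {confidence_vs_control(394, 1000, 632, 1000):.1%}")
```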

Step 5. Learn From the Test Results

Irrespective of whether the test finds improved versions of your page, every test ends with a good amount of learning. Here are the key takeaways from my test:

  • The word “Free” is a very powerful attention-grabber. You are doing a sub-optimal job if you offer something for free and don’t make that super-obvious on the page.
  • The best location for advertising your “Free” offer is near (or on) a call to action. In this case, “Download for Free” is displayed quite close to the download link itself.
  • This brings us to the next important point: why not make the word “free” clickable? I am sure that if I had analyzed the location of clicks on the page, I would have found a lot of visitors clicking on the “Download for Free” headline, only to realize it is not a link. I should definitely have tested a version with a clickable headline.
  • The color red matters, but only when combined with other elements, such as “Free” (or other effective call-to-action text). Red may bring attention to your call to action, but if the text is not persuasive, the visitor will probably not take any action.
  • The size of your call to action also matters. A larger size tells the visitor that you consider this particular section (in this case, downloading the application) more important than the other parts of the page.

Even if you don’t remember any of the points above, please take home one key point: don’t replicate the suggestions above without testing them on your own website! Every website is unique, and every conversion goal is different. While generic observations about the effect of the word “Free,” the color red, and the size of your call to action make logical sense, it is always wise to confirm their effectiveness by setting up a quick test.

A/B split testing holds a lot of potential for positively impacting a company’s revenue and profits. In spite of that, surprisingly, adoption of testing is not that high. If you haven’t done any A/B split tests yet, why is that so? If you have done A/B split or multivariate tests in the past, please share your experiences in the comments below so that others can get to know real-world examples.





In Defense Of A/B Testing


Recently, A/B testing has come under (unjust) criticism from different circles on the Internet. Even though this criticism contains some relevant points, the basic argument against A/B testing is flawed. It seems to confuse the A/B testing methodology with a specific implementation of it (e.g. testing red vs. green buttons and other trivial tests). Let’s look at different criticisms that have surfaced on the Web recently and see why they are unfounded.


Argument #1: A/B Testing And The Local Minimum

Jason Cohen, in his post titled Out of the Cesspool and Into the Sewer: A/B Testing Trap, argues that A/B testing produces a local minimum, while the goal should be to get to the global minimum. For those who don’t understand the difference between a local and a global minimum (or maximum), think of the conversion rate as a function of the different elements on your page. It’s like a region in space where every point represents a variation of your page; the lower a point is in that space, the better it is. To borrow an example from Jason, here is the issue with the local vs. global minimum:

[Image: a local minimum vs. the global minimum]

As even Jason acknowledges in his post, this argument isn’t really about A/B testing, because the same methodology could be used to test radical changes to get to the global minimum. So, calling it an A/B testing trap is unfair, because the problem doesn’t have anything to do with A/B testing. Rather, the argument uncovers the futility of testing small changes.

So, if A/B testing is not the culprit, is the real issue the local minimum? No, even the theory of discounting local minima is flawed. The image above shows a very simple one-dimensional fitness landscape. You can imagine the x-axis as the background color and the y-axis as the bounce rate. Jason’s argument goes something like this: if you tested dozens of shades of blue, you might decrease your bounce rate, but if you tried something completely different (such as yellow), you might achieve the absolute lowest bounce rate possible on your page.

There are two problems with this argument…

1. You Never Know for Sure Whether You’ve Found the Global Minimum (or Maximum)


The global minimum (or absolute best) exists only in theory. Let’s continue with the example of an extreme yellow background giving you the global minimum in bounce rate. Upon further testing, what if you found that no background color at all gave you a lower bounce rate? Or better yet, that a background full of lolcat images gave you an even lower bounce rate? The point is, unless you have reduced the bounce rate to 0% (or raised the conversion rate to 100%), you can never be confident that you have indeed achieved the global optimum.

There is another way to determine whether you have found the global optimum: by exhausting all possibilities. Theoretically, if your page didn’t contain anything other than background color (and you couldn’t even add the background image because, well, your boss hates it), then you could cycle through all background colors available and see which one gave you the lowest bounce rate. In exhausting all possibilities, the color that gives you the lowest bounce rate should be the one that is absolutely the best. This brings us to the second point…

2. It’s Not Just About the Background Color, My Friend

When optimizing a Web page, you can vary literally hundreds or thousands of variables (background color being just one of them). Headline, copy, layout, page length, video, text color and images are just a few such variables. Your goal for the page (in terms of conversion or bounce rate) is determined by all of these variables. This means that the fitness landscape (as seen in the images above) is not one-dimensional and never as simple as it appears. In reality, it is multi-dimensional, with a ton of variables affecting the minima and maxima:

[Image: a multi-dimensional fitness landscape]

Again, imagine the peaks as your conversion rate (or bounce rate) and the different dimensions as the variables on your page (only two are shown here, but in reality there are hundreds). Unlike in the one-dimensional case, exhausting all possibilities in a real-world scenario (i.e. in conversion optimization) is impossible. So, you are never guaranteed to have found the global maximum (or minimum). Lesson to be learned: embrace local minima.

Argument #2: A/B Tests Trivial Changes

Rand Fishkin of SEOMoz posted an article titled Don’t Fall Into the Trap of A/B Testing Minutiae, in which he reiterates Jason’s argument not to waste time testing small elements on a page (headline, text, etc.). His main argument is that getting to the local maximum (by testing trivial changes) takes too much energy and time to be worthwhile. See the image below, reproduced from his blog but modified a little to make the point:

[Image: trivial A/B tests vs. radical redesigns]

The first point to make is that the opportunity cost is not the time required to run the test (which is weeks) but rather the time needed to set up the test (which is minutes). Once you have set up the test, it is pretty much automated, so you risk only the time spent setting it up. If an investment of 15 minutes to set up a button-color test ultimately yields a 1.5% improvement in your conversion rate, what’s wrong with that?

Many A/B testing tools (including Visual Website Optimizer—disclaimer: my start-up) make setting up small tests a no-brainer. They also monitor your test in the background, so if it isn’t a winner, it is automatically paused. What’s the risk then of doing such trivial tests? I see only the upside: increased sales and conversions.

To make his point, Rand gives the example of a recent Basecamp home page redesign, by which Basecamp managed to increase its conversion rate by 14%. Can you imagine the kind of effort that went into such a redesign (compared to a button-color test)? In fact, because the fitness landscape is multi-dimensional (and very complicated), a total redesign has a much higher probability of performing worse. A complex design can go wrong in many more ways than a simple button color can. Because we never hear of case studies of redesigns gone wrong (hello survivorship bias), we shouldn’t conclude that testing radical changes is a better approach than testing minutiae (especially because radical changes require a huge investment in effort and time compared to small red vs. blue tests).

With a local minimum (or maximum), you at least know for sure that you are increasing your conversion rate, which leads directly to increased profit. This isn’t to say that we should give up on the hunt for the global optimum. The global optimum is like world peace: incredibly hard to achieve, but we have to keep moving in its direction. Lesson to be learned: the ideal strategy is a mix of both small (red vs. blue) tests and radical redesign tests. By jumping across the mountains in the conversion-rate fitness landscape, you ensure that you are constantly seeking better conversion rates.
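To make that mixed strategy concrete, here is a toy simulation (my own illustration, not taken from any of the posts discussed): hill-climbing on a bumpy one-dimensional conversion landscape, with occasional random jumps playing the role of radical redesigns.

```python
import math
import random

def conversion_rate(x: float) -> float:
    """A made-up bumpy landscape: several local peaks, one global peak."""
    return math.sin(3 * x) * math.exp(-0.1 * (x - 5) ** 2)

def optimize(steps: int = 2000, jump_prob: float = 0.05) -> float:
    x = random.uniform(0, 10)
    for _ in range(steps):
        if random.random() < jump_prob:
            candidate = random.uniform(0, 10)     # radical redesign
        else:
            candidate = x + random.gauss(0, 0.1)  # small A/B-style tweak
        if conversion_rate(candidate) > conversion_rate(x):
            x = candidate                          # keep the winner
    return x

random.seed(1)
best = optimize()
print(f"best x = {best:.2f}, rate = {conversion_rate(best):.3f}")
```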

Argument #3: A/B Testing Stifles Creativity

Jeff Atwood compares the movie Groundhog Day to (surprise, surprise) A/B testing and concludes that because the protagonist failed in the movie, A/B testing must also fail. Stripped of all (non-)comparisons, Jeff suggests that A/B testing lacks empathy and stifles creativity. He goes on to cite a tweet by Nathan Bowers:

A/B testing is like sandpaper. You can use it to smooth out details, but you can’t actually create anything with it.

Who ever claimed that A/B testing is good for creating anything? Creation happens in the mind, not in a tool. The same flawed reasoning could be applied to a paint brush:

A paint brush is like a stick with some fur. You can use it to poke your cat, but you can’t really create anything with it.

A/B testing, like a paint brush, is a tool, and like all tools, it has its properties and limitations. It doesn’t dictate what you can test; hence, it doesn’t limit your creativity. A/B testing or not, you can apply the full range of your creativity and empathy to coming up with a new design for your website. It is up to you whether to go with your gut and implement it on the website immediately or to take a more scientific approach and determine whether the new design converts better than the existing one. Lesson learned: A/B testing is a tool, not a guidebook for design.

Summary

To reiterate the lessons learned from the three arguments above:

  • Because you can never be sure of having found the global minimum, embrace local minima. Testing trivial changes takes only a few minutes of setup, and the potential payoff far outweighs the cost of those minutes.
  • Constantly explore the best ways to increase your conversion rate by performing both trivial tests and radical redesign tests at regular intervals.
  • A/B testing is a tool and does not kill your imagination (in fact, you need your imagination most when designing variations).
  • Lastly, don’t feel guilty about performing A/B testing.




