A/B testing

"A/B testing" is a shorthand for a simple randomized controlled experiment, in which a number of samples (e.g. A and B) of a single vector-variable are compared.

The following example illustrates an A/B test with a single variable: Suppose a company has a customer database of 2,000 people and decides to create an email campaign with a discount code in order to generate sales through its website.

The company creates two versions of the email, each with a different call to action (the part of the copy which encourages customers to do something; in the case of a sales campaign, to make a purchase) and a different identifying promotional code.

If, however, the aim of the test had been to see which email would generate the higher click-through rate (that is, the share of recipients who actually click through to the website after receiving the email), then the results might have been different.

For example, even though more of the customers receiving code B1 accessed the website, because the call to action did not state the end date of the promotion, many of them may have felt no urgency to make an immediate purchase.
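
To make the two metrics concrete, here is a minimal sketch with hypothetical counts showing how variant B1 can win on click-through rate while A1 wins on purchases:

```python
# Illustrative comparison of the two email variants on two different
# metrics. All counts are hypothetical.

recipients = 1000                    # emails sent per variant
clicks = {"A1": 90, "B1": 120}       # recipients who clicked through
purchases = {"A1": 50, "B1": 35}     # recipients who redeemed the code

for variant in ("A1", "B1"):
    ctr = clicks[variant] / recipients
    purchase_rate = purchases[variant] / recipients
    print(f"{variant}: click-through {ctr:.1%}, purchases {purchase_rate:.1%}")

# B1 wins on click-through rate while A1 wins on purchases, so the
# "better" email depends on which metric the test was designed around.
```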

In order to optimize revenue, they tested dozens of different hyperlink hues to see which color users tended to click on more.[12]

A/B tests are sensitive to variance; they require a large sample size in order to reduce standard error and produce a statistically significant result.
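
As a rough illustration of that sample-size requirement, the sketch below uses the standard normal-approximation formula for a two-proportion test; the 5% baseline rate and one-point lift are assumed values, not from the source:

```python
# A rough per-group sample size estimate for a two-proportion A/B test,
# using the standard normal-approximation formula.
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)        # two-sided significance level
    z_beta = z(power)                 # desired statistical power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p1 - p2) ** 2

# Detecting a lift from a 5% to a 6% conversion rate already needs
# thousands of users per variant; smaller effects need far more.
print(round(sample_size_per_group(0.05, 0.06)))  # ~8,158 per group
```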

In applications where active users are abundant, such as popular online social media platforms, obtaining a large sample size is trivial.

However, using a technique developed at Microsoft called Controlled-experiment Using Pre-Experiment Data (CUPED), variance from before the experiment start can be taken into account so that fewer samples are required to produce a statistically significant result.
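
A minimal sketch of the CUPED idea on simulated data: each user's in-experiment metric Y is adjusted by their correlated pre-experiment metric X, which shrinks variance by roughly a factor of (1 - corr(X, Y)^2) without changing the mean. The specific metrics and distributions below are assumptions for illustration:

```python
# CUPED adjustment sketch on simulated per-user metrics.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(10, 3, n)              # pre-experiment activity per user
y = 0.8 * x + rng.normal(0, 1, n)     # in-experiment metric, correlated with x

# theta = Cov(X, Y) / Var(X), the usual CUPED coefficient.
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())  # same mean as y, much smaller variance

print(f"var(y)       = {y.var():.2f}")
print(f"var(y_cuped) = {y_cuped.var():.2f}")
```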

In December 2018, representatives with experience in large-scale A/B testing from thirteen different organizations (Airbnb, Amazon, Booking.com, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, Yandex, and Stanford University) summarized the top challenges in a SIGKDD Explorations paper.[15]

The challenges can be grouped into four areas: analysis; engineering and culture; deviations from traditional A/B tests; and data quality.[16]

Experimentation with advertising campaigns, which has been compared to modern A/B testing, began in the early twentieth century.[18]

Modern statistical methods for assessing the significance of sample data were developed separately in the same period.[4]

A/B testing has been claimed by some to be a change in philosophy and business strategy in certain niches, though the approach is identical to a between-subjects design, which is commonly used in a variety of research traditions.[21][22][23]

A/B testing as a philosophy of web development brings the field into line with a broader movement toward evidence-based practice.[25]

On an e-commerce website, the purchase funnel is typically a good candidate for A/B testing, since even marginal decreases in drop-off rates can represent a significant gain in sales.
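
A small worked example, with made-up funnel rates, of why marginal drop-off improvements matter:

```python
# Illustrative funnel arithmetic (all rates hypothetical): trimming
# drop-off at a single stage by two percentage points noticeably
# lifts completed purchases.

def completed(visitors, stage_rates):
    for rate in stage_rates:
        visitors *= rate
    return visitors

visitors = 100_000
baseline = [0.40, 0.50, 0.30]   # product page -> cart -> checkout
improved = [0.40, 0.52, 0.30]   # variant B reduces cart drop-off slightly

a = completed(visitors, baseline)
b = completed(visitors, improved)
print(f"A: {a:.0f} purchases, B: {b:.0f} purchases (+{b / a - 1:.1%})")
```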

A/B testing (especially useful for digital goods) is an excellent way to find out which price point and offering maximize total revenue.[28]
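
For instance, comparing two hypothetical price points by revenue per visitor, rather than by conversion rate alone, might look like this sketch (all figures assumed):

```python
# The cheaper price may convert better yet earn less overall, so price
# tests are usually judged on revenue per visitor. Numbers are made up.

def revenue_per_visitor(price, conversions, visitors):
    return price * conversions / visitors

visitors = 5000  # per variant
rpv_a = revenue_per_visitor(9.99, conversions=400, visitors=visitors)
rpv_b = revenue_per_visitor(14.99, conversions=310, visitors=visitors)
print(f"A: ${rpv_a:.3f}/visitor, B: ${rpv_b:.3f}/visitor")
# A converts at 8.0% vs B's 6.2%, but B yields more revenue per visitor.
```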

For example, Obama's team tested four distinct buttons on their website that led users to sign up for newsletters.

Example of A/B testing on a website. By randomly serving visitors two versions of a website that differ only in the design of a single button element, the relative efficacy of the two designs can be measured.
Example of an HTTP router performing A/B testing.
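
A minimal sketch of such a router, assuming a hypothetical X-User-Id header and a 50/50 split; deployments typically hash a stable identifier like this so returning visitors keep seeing the same variant:

```python
# Sketch of an HTTP router that deterministically assigns each visitor
# to variant A or B by hashing a user identifier. The header name and
# the even split are illustrative assumptions.
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer

def assign_variant(user_id: str) -> str:
    # Hash to a stable bucket in [0, 100) so assignment is repeatable.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "A" if bucket < 50 else "B"

class ABRouter(BaseHTTPRequestHandler):
    def do_GET(self):
        # Fall back to the client IP when no user id header is present.
        user_id = self.headers.get("X-User-Id", self.client_address[0])
        variant = assign_variant(user_id)
        body = f"serving page variant {variant}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ABRouter).serve_forever()
```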