
Bayesian A/B Testing

Table of Contents

1. Bayesian A/B Testing
2. Loss between A & B
3. Stopping Criterion
4. Why Bayesian A/B testing?

1. Bayesian A/B Testing

• In the Bayesian approach to A/B testing, the setup is the same as in the frequentist approach; what differs is how we interpret the results.
• As a reminder, Bayes' rule is

P(H | D) = \frac{P(D | H) \cdot P(H)}{P(D)}

• The idea is that we want to know the probability of a hypothesis being true given some data.
– P(H | D) → posterior
– P(D | H) → likelihood → probability of the data given the hypothesis
– P(H) → prior
– P(D) = \sum_{h} P(D | H_{h}) P(H_{h}) → evidence: the total probability of the data, summed over all the hypotheses (in our case, the null and alternative hypotheses).
• The likelihood term

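P(D | H) = \prod_{i} p^{x_{i}} (1 - p)^{1 - x_{i}}

where p is the click-through probability proposed by hypothesis H, and x_{i} is 1 if visitor i clicked and 0 otherwise.

• The prior term

P(H) = \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)}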

→ Beta distribution (e.g. a distribution over click-through probabilities, CTP)
– Note: When \alpha = \beta = 1, we have an uninformative (uniform) prior.
• Note: The posterior and the prior are the same type of distribution, so the posterior is also a Beta distribution.
• Note: We interpret the posterior parameters as defining a distribution rather than a single point estimate.
• In the Bayesian approach we compare the two posterior distributions of the treatment and control groups (and not just two numbers, as in the frequentist approach).
• Here, we want to know the probability that group B is greater than group A.
– We examine the Beta distributions produced by group B and group A.
– We sample from the posteriors of A and B and compute the percentage of draws in which B > A.
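As a rough illustration of the sampling step, the sketch below uses only Python's standard library. It relies on the standard conjugate update, in which a Beta(\alpha, \beta) prior combined with the observed clicks and non-clicks gives a Beta(\alpha + clicks, \beta + non-clicks) posterior. The function names `beta_posterior` and `prob_b_beats_a`, the click counts, and the number of draws are all assumptions made for this example.

```python
import random


def beta_posterior(clicks, visitors, prior_alpha=1.0, prior_beta=1.0):
    """Return the (alpha, beta) parameters of the Beta posterior.

    With a Beta(prior_alpha, prior_beta) prior and a Bernoulli likelihood,
    the posterior is Beta(prior_alpha + clicks, prior_beta + non-clicks).
    """
    return prior_alpha + clicks, prior_beta + (visitors - clicks)


def prob_b_beats_a(post_a, post_b, n_draws=20_000, seed=0):
    """Estimate P(B > A) by sampling both posteriors and comparing draws."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_draws):
        draw_a = rng.betavariate(*post_a)
        draw_b = rng.betavariate(*post_b)
        if draw_b > draw_a:
            wins += 1
    return wins / n_draws


# Hypothetical counts, chosen only for illustration.
post_a = beta_posterior(clicks=100, visitors=1000)  # control
post_b = beta_posterior(clicks=125, visitors=1000)  # treatment
print(post_a, post_b)
print(f"P(B > A) is roughly {prob_b_beats_a(post_a, post_b):.3f}")
```

More draws make the estimate more stable; the fixed seed is there only to keep the sketch reproducible.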

2. Loss between A & B

• Let's say we ended up choosing B over A, even though B was actually worse than A. How would this be possible?
– The posterior distributions of A and B can overlap slightly.
– That overlap means there's a chance that we pick B while its true value is actually below A's.
– So what we really want is to sample the differences between A and B, which gives us the loss from choosing B over A in the case that B is actually worse.
• Multiplying that loss by the probability that A > B gives us the expected loss (EL).
– Assuming B > A when it's not →

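EL = P(A > B) \cdot \mathbb{E}[\mathrm{CTP}_{A} - \mathrm{CTP}_{B} | A > B] = \mathbb{E}[\max(\mathrm{CTP}_{A} - \mathrm{CTP}_{B}, 0)]

– Note: Expected loss is in the same unit as our metric.
• Another way to look at it is through the expected gain (EG).
– Assuming B > A when it is →

EG = P(B > A) \cdot \mathbb{E}[\mathrm{CTP}_{B} - \mathrm{CTP}_{A} | B > A] = \mathbb{E}[\max(\mathrm{CTP}_{B} - \mathrm{CTP}_{A}, 0)]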
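One way to estimate both quantities is by Monte Carlo: draw pairs from the two posteriors and average max(CTP_A - CTP_B, 0) for the loss and max(CTP_B - CTP_A, 0) for the gain. The sketch below assumes Beta posteriors given as (alpha, beta) pairs; the function name `expected_loss_and_gain` and the example parameters are hypothetical.

```python
import random


def expected_loss_and_gain(post_a, post_b, n_draws=20_000, seed=0):
    """Monte Carlo estimates of the expected loss and gain of launching B.

    post_a and post_b are (alpha, beta) pairs for the Beta posteriors.
    Loss averages max(CTP_A - CTP_B, 0); gain averages max(CTP_B - CTP_A, 0).
    """
    rng = random.Random(seed)
    loss_total = 0.0
    gain_total = 0.0
    for _ in range(n_draws):
        ctp_a = rng.betavariate(*post_a)
        ctp_b = rng.betavariate(*post_b)
        loss_total += max(ctp_a - ctp_b, 0.0)
        gain_total += max(ctp_b - ctp_a, 0.0)
    return loss_total / n_draws, gain_total / n_draws


# Hypothetical posteriors: uniform prior plus 100/1000 and 125/1000 clicks.
loss, gain = expected_loss_and_gain((101, 901), (126, 876))
print(f"expected loss of choosing B: {loss:.5f}")
print(f"expected gain of choosing B: {gain:.5f}")
```

Because max(d, 0) - max(-d, 0) = d, the gain minus the loss equals the average posterior difference CTP_B - CTP_A, which is a quick sanity check on the two estimates.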

3. Stopping Criterion

• If the expected loss is less than some threshold that we don't care about, then we stop.
– As the experiment goes on, if the two groups are actually different, the expected loss of choosing the better group shrinks.
– This is because the two posterior distributions overlap less and less: each one narrows around its group's true value as data accumulates.
– So the probability of making a mistake and launching the wrong one grows smaller and smaller.
– The threshold should be something that we're comfortable with losing.

4. Why Bayesian A/B testing?

• Generally, it's easier to interpret the results.
• If I tell you the p-value is 0.05 and the CI is some range, that is informative, but more often we want to know the probability that experience B is better than experience A.
– With the Bayesian approach we can answer this question directly.
• Also, we often need to collect fewer samples to reach a launch decision.
– This means faster experiments and faster improvements.
• One tool that takes this approach, in addition to the frequentist tools, is Visual Website Optimizer (VWO).
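Returning to the stopping criterion from section 3, the sketch below simulates an experiment in batches and stops once the expected loss of the leading group drops below a threshold. Everything here is an assumption for illustration: the function names, the simulated true rates (unknown in a real test), the batch size, and the threshold of 0.0005, which is in the same unit as the metric.

```python
import random


def expected_loss(post_keep, post_drop, rng, n_draws=2_000):
    """Average of max(dropped - kept, 0) over posterior draws."""
    total = 0.0
    for _ in range(n_draws):
        total += max(rng.betavariate(*post_drop) - rng.betavariate(*post_keep), 0.0)
    return total / n_draws


def run_until_threshold(rate_a, rate_b, threshold, batch=500, max_batches=200, seed=0):
    """Simulate an experiment and stop once the expected loss is small enough.

    rate_a and rate_b are the simulated true click-through probabilities;
    in a real test they are unknown and only the click counts are observed.
    """
    rng = random.Random(seed)
    # [alpha, beta] for each group, starting from a uniform Beta(1, 1) prior.
    post = {"A": [1, 1], "B": [1, 1]}
    rates = {"A": rate_a, "B": rate_b}
    for n_batches in range(1, max_batches + 1):
        for group in ("A", "B"):
            clicks = sum(rng.random() < rates[group] for _ in range(batch))
            post[group][0] += clicks
            post[group][1] += batch - clicks
        # Expected loss of launching each group over the other.
        loss_b = expected_loss(post["B"], post["A"], rng)
        loss_a = expected_loss(post["A"], post["B"], rng)
        winner, loss = ("B", loss_b) if loss_b < loss_a else ("A", loss_a)
        if loss < threshold:
            return winner, n_batches * batch, loss
    # No decision reached within the visitor budget.
    return None, max_batches * batch, min(loss_a, loss_b)


winner, visitors_per_group, loss = run_until_threshold(0.10, 0.12, threshold=0.0005)
print(winner, visitors_per_group, f"{loss:.6f}")
```

A smaller threshold makes the rule more conservative and typically requires more visitors before it stops.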