Analysis of 115 A/B Tests: Average Lift is 4%, Most Lack Statistical Power

What can you learn from 115 publicly available A/B tests? Usually, not much, since in most cases you would be looking at case studies with very basic data about what was tested and the outcome of the A/B test. Confidence intervals, p-values and other measurements of uncertainty will often be missing, and when present they […]

Risk vs. Reward in A/B Tests: A/B Testing as Risk Management

What is the goal of A/B testing? How long should I run a test for? Is it better to run many quick tests, or one long one? How do I know when it is a good time to stop testing? How do I choose the significance threshold for a test? Is there something special about 95%? […]

Futility Stopping Rules in AGILE A/B Testing

In this article we continue our examination of the AGILE statistical approach to A/B testing with a more in-depth look into futility stopping, or stopping early for lack of positive effect (lack of superiority). We’ll cover why such rules are helpful and how they help boost the ROI of A/B testing, why a rigorous statistical rule […]