Page 12 – Ecommerce and Digital Conversion Resources

Magic Dozen Workshop

The Wizard of Ads are putting on a marketing workshop in Seattle this September 18th and there are still slots open. If you’re an owner-operator of a local or regional business that advertises — or that you think ought to advertise… And if you’re not a…

Radio’s Missing Creative Revolution

The Creative Revolution in advertising came in the late 50s and 60s, courtesy of Bill Bernbach and his agency, Doyle Dane Bernbach. To understand what was so revolutionary, it’ll help to review what advertising was like in the early to mid 50s: Text domina…

Why Radio NEEDS Theatre of the Mind

I’ve slammed radio pretty hard for forgetting about and basically forsaking Theatre of the mind. So it’s fair to ask: what’s so darn important about Theatre of the Mind? First, I believe that Theatre of the Mind is radio’s secret weapon — both for cont…

Radio’s Ugly Baby

Radio Ink has had a lively discussion lately between radio executives and agency owners about their frustrations in dealing with each other. It’s a worthwhile conversation, but mostly for how what’s not being admitted is warping the dialogue. Here’s wh…

Political Advertisers Discover the Power of the Genesis Story

Few advertising strategies work as well as telling your Genesis (or Origin) Story. I’ve seen that time and again in working with local business clients in a wide variety of industries and I’ve written about Origin Stories in some depth. My marketing me…

The Persuasive Power of Privileged Moments

Check out this video: Ok, now check out a similar technique: Or how about the same general technique used for another cause: And just so you can really see the pattern, how about this: In every case, the ad’s effectiveness comes from what Cialdini calls Pre-Suasion and the creation of “Privileged Moments.” Cialdini’s overall thesis is that the […]

Data & Business Impact with Feras Alhlou

A few months ago I had the opportunity to chat with my friend and work partner Feras Alhlou, Co-Founder and Principal Consultant at E-Nor & Co-Author of Google Analytics Breakthrough. Feras and I have known each other for almost 10 years, and it is…

Here are the questions we discussed; check out the answers in the video below. I have also added some of my favorite highlights from the interview after the video.

[01:05] What's the process that you use to make sense out of data?
[02:41] During this process, what do you actually do when you start working with data?
[04:07] When analyzing data, how can we make sure that we are looking at the context to understand what is happening around us?
[07:24] How can Data Studio and better data visualizations help companies make more data-driven decisions?

We believe analytics is a business process. We start with an audit, both from the business side and the technical side - we want to engage the stakeholders to understand how to measure what matters most to the business. Once we have the data in place, we go to the reporting layer - how do we report on this data? Then, we start to be able to analyze the data and find some actionable insights. Last, we can move to testing and personalization - that's when you really can have an impact on the business. Read more about E-Nor's Optimization Framework

There's a whole lot of data these days, right? Life used to be simple for marketers: one device, a few channels - now there's data everywhere, mobile, social, web, and of course backend data. I think one of the first things we need to do is to understand the context around that data, focusing on the following:

The integrity of the data: is it clean, was it collected properly, is it raw or aggregated? Understand the data collection, how the data was put together.

Having a set of metadata (information about the data): if you're looking at Google Analytics metrics, it helps to know more about the user. For example, if you have a subscription-based model: Is it a premium user? Is it a standard user? Having that additional data gives a whole lot of context to the person who's consuming that data.

I would definitely advise having a data road map. Start with what you own: web and mobile analytics data. Then, start augmenting reports with basic social data; maybe you can get a little bit into the qualitative aspect with that. And last but not least, a great product that was recently introduced by Google is the Surveys product. There are surveys we can do on our own properties to understand the voice of the customer. But you can also use it to do market research - it used to be expensive and cumbersome, but now you can easily run a Google survey and do a lot of targeting.

And here are Feras and me having fun in the Google Analytics studio!

Daniel Waisberg and Feras Alhlou

Data That Matters: Maternal Mortality Trends

I have always appreciated the work of the Bill and Melinda Gates Foundation; it is really amazing to see people working so hard to make the world a better place. But I was left speechless when I opened their new report: GoalKeepers 2017. It tells the stories behind the data to help "accelerate progress in the fight against poverty by helping to diagnose urgent problems, identify promising solutions, measure and interpret key results, and spread best practices".

First and foremost, the goals themselves are superb - I can't think of more important issues to fight for. But I was also impressed by the information design, which is spotless. They used the right medium for each piece of information: text, images, videos, animations and charts. The report is engaging and, before you realize it, you have spent an hour going through it. So I was touched both as a person who cares about what is happening around me and as a professional appreciating good work.

Interestingly, a few months ago I was looking for some data to build a sample report, and I chose the maternal mortality dataset from UNICEF's data portal. I built the report and used it, but didn't take the time to publish it - ever heard of procrastination? :-)

In this article I will provide more context into GoalKeepers 2017 using publicly available UNICEF data on maternal mortality. I'll start with some words about the GoalKeepers 2017 report - then, I'll discuss some of the steps I used to create my report and the insights I learned from the data.

Stories behind the data: maternal mortality in Ethiopia

One of the highlights that I found particularly interesting in GoalKeepers 2017 was the maternal mortality case study, focusing on how Ethiopia is fighting this terrible issue. Here is how Bill and Melinda define it.

"If you were trying to invent the most efficient way to devastate communities and put children in danger, you would invent maternal mortality." Bill and Melinda Gates

Most people would agree that mothers are probably the most important pillar for a child (I'm a father, and I think fathers are important too, but as my mom always says: "you will never be a mother!"). So it is devastating to learn that in 2015, UNICEF registered 302,530 maternal deaths due to complications from pregnancy or childbirth - 168.7 deaths per 100,000 live births. And remember that a mother's death does not mean just one child left motherless: she may already have several children when it happens.

However, as GoalKeepers 2017 shows, we've made some great progress, and the trends look good. In their case study, they show how Ethiopia is taking giant steps in its fight against maternal mortality, and the chart they used is simple and powerful: mortality went from 843 to 357 per 100,000 from 1990 to 2015 - that's great!

maternal mortality ethiopia

But in order to understand our global status better, it is important to put more context into the mix: what's happening around the world? And how does Ethiopia compare to other places?

Maternal Mortality around the world

To have a better understanding of how both Ethiopia and the world in general are progressing, I took a deeper look at the maternal mortality dataset from UNICEF's statistics website. The data is publicly available, well organized, and it seems trustworthy. I downloaded the xlsx file and formatted it for Data Studio using this spreadsheet; then, I imported it to Data Studio (learn how).

Below you'll find my data visualization embedded; scroll down to read some of my conclusions based on the data.

I know, the horizontal bar chart goes on forever! But I think it gives an interesting perspective.

Disclosure: I do not pretend to be a specialist in global health, and my knowledge about the efforts in the area is minimal. The insights below are based on the data only - I'm assuming UNICEF publishes accurate and unbiased data. With that said, I hope it will help people better understand the status and trends of maternal mortality around the world.

Here are my insights on maternal mortality based on UNICEF's data.

Amazing progress - but not solved: out of 183 countries in the data, only 13 are worse off in 2015 compared to 1990. The trajectory is mostly good - globally, we saw a decrease from 339 to 168 in maternal mortality rate, an average decrease of 44%. For context, Ethiopia's rate decreased by 71%, significantly better than the average. However, it is clear from the map that Africa is bleeding, with Sierra Leone losing 1,360 mothers per 100,000 live births - that's very bad.
United States and South Africa have alarming trends: both countries are among the top 10 countries in the 'getting worse' table (sorted by 1990-2015 % change) - in absolute terms, South Africa had 1,500 deaths and the USA 550, which is a lot of loss. Even though they don't have the highest rates, it is quite alarming to see the negative trends and absolute numbers. For more on the USA trend check this article, which discusses possible reasons and links to more in-depth analyses.
Cambodia and Turkey up-and-to-the-right, but still a lot of deaths: both countries have shown great progress, appearing in the top 10 'getting better' table - but they still need a big push, especially Cambodia.

I think those are interesting points to think about as we continue fighting this horrible issue - the more data (and analyses) we have, the more prepared we will be. If you are looking for a place to start, UNICEF has a lot of interesting datasets in their data portal. Let's help make the world a better place!

Embedding Google Data Studio Visualizations

Last year I wrote about the Marvel vs. DC war on the big screen. It was super fun to merge two of my passions (data visualization and comics) in one piece. It started with my curiosity to understand what all those movies are amounting to, and I think i…

One of the things that annoyed me was that I had to link to the interactive visualization, so readers couldn't see the amazing charts in my article (!) - I ended up including static screenshots with some insights explained through text. While some people clicked through to play with the data, I suspect many just read the piece and went away, which is suboptimal - when I publish a story, my goal is to allow readers to interact with it quickly and effectively.

I am extremely excited that now Google Data Studio allows users to embed reports in any online environment, which empowers us to create an improved experience for telling stories with data. This feature will be an essential tool for data journalists and analysts to effectively share insights with their audiences.

A year has passed since I did the Marvel vs. DC visualization, so I thought it was time to update it (5 new movies!) and share some insights on how to use Data Studio report embedding to create effective data stories.

Enable embedding

The first step to embed reports is a pretty important one: enable embedding! This is quite simple to do:

Open the report and click on File (top left)
Click on Embed report
Check Enable embedding and choose the width and height of your iframe (screenshot below)

Google data studio enable embedding

Please note that the embedding will work only for people who have access to the report. If the report is supposed to be publicly available, make sure that you make it viewable to everyone. If the report should be visible only to people in a group, then make sure to update your sharing settings accordingly. Read more about sharing reports on this help center article.

But how do you make sure you are choosing the right sizes? Read on...

Choosing the right visualization sizes

Needless to say, people access websites on all possible device categories and platforms, and we have little control over that. But we do have control over how we display information on different screens. The first obvious recommendation (and hopefully all the Interweb agrees with me) - make your website responsive! I am assuming you have already done that.

On Online Behavior, the content area is 640px wide, so the choice is pretty obvious when Data Studio asks me the width I want for my iframe - make sure you know the width of the content area where the iframe will be embedded. Also, since you want the visualizations to resize as the page responds to the screen size, set your Display mode to Fit to width (option available on Page settings).

Without further ado, here is the full Marvel vs. DC visualization v2!

I personally think the full dataviz looks pretty good when reading on a desktop; I kept it clean and short. However, as your screen size decreases, even though the report iframe will resize the image, it will eventually get too small to read. In addition, I often like to develop my stories by intertwining charts and text to make them more digestible. So here is an alternative to embedding the whole thing...

Breaking down your dataviz into digestible insights

As I mentioned, sometimes you want to show one chart at a time. In this case, you might want to create separate versions of your visualization. Below I broke down the full dataviz into small chunks. Note that you will find three different pages in the iframe below, one per chart (see the navigation at the bottom of the report).

Right now, you can't embed only one page, which means that if you want to show a specific chart that lives on page 2 of a report you would need to create a new report, but that's a piece of cake :-)

I am looking forward to seeing all the great visualizations that will be created and embedded throughout the web - why not partner with our data to create insightful stories? Let's make our blogs and newspapers more interesting to read :-) Happy embedding!

BONUS: Data Studio is the referee in the Marvel vs. DC fight!

As I was working on my dataviz, I asked my 10yo son (also a comic enthusiast) to create something that I could use to represent it. He created the collage / drawing below, and I think it is an amazing visual description of my work :-)

Data Studio referee

Statistical Design in Online A/B Testing

A/B testing is the field of digital marketing with the highest potential to apply scientific principles, as each A/B experiment is a randomized controlled trial, very similar to ones done in physics, medicine, biology, genetics, etc. However, common advice and part of the practice in A/B testing are lagging by about half a century when compared to modern statistical approaches to experimentation.

There are major issues with the common statistical approaches discussed in most A/B testing literature and applied daily by many practitioners. The three major ones are:

Misuse of statistical significance tests
Lack of consideration for statistical power
Significant inefficiency of statistical methods

In this article I discuss each of the three issues listed above in some detail and propose a solution inspired by clinical randomized controlled trials, which I call the AGILE statistical approach to A/B testing.

1. Misuse of Statistical Significance Tests

In most A/B testing content, when statistical tests are mentioned they inevitably discuss statistical significance in some fashion. However, in many of them a major constraint of classical statistical significance tests, e.g. the Student’s T-test, is simply not mentioned. That constraint is the fact that you must fix the number of users you will need to observe in advance.

Before going deeper into the issue, let’s briefly discuss what a statistical significance test actually is. In most A/B tests it amounts to an estimation of the probability of observing a result equal to or more extreme than the one we observed, due to the natural variance in the data that would happen even if there is no true positive lift.

Below is an illustration of the natural variance, where 10,000 random samples are generated from a Bernoulli distribution with a true conversion rate at 0.50%.

Natural Variance
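The natural variance described above can be reproduced with a few lines of code. This is a minimal sketch, assuming that the illustration tracks the running conversion rate over 10,000 simulated users; the function name, the checkpoint spacing and the seed are my own choices, not something taken from the original chart.

```python
import random


def running_conversion_rate(true_rate=0.005, users=10_000, step=1_000, seed=7):
    """Simulate Bernoulli users and record the observed rate at regular checkpoints."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    conversions = 0
    checkpoints = []
    for i in range(1, users + 1):
        if rng.random() < true_rate:  # one Bernoulli trial per user
            conversions += 1
        if i % step == 0:
            checkpoints.append((i, conversions / i))
    return checkpoints


for seen, rate in running_conversion_rate():
    print(f"{seen:>6} users: observed rate {rate:.2%}")
```

Even though the true rate never changes, the observed rate wanders around 0.50%, which is exactly the noise a significance test has to account for.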

In an A/B test we randomly split users into two or more arms of the experiment, thus eliminating confounding variables, which allows us to establish a causal relationship between the observed effect and the changes we introduced in the tested variants. If after observing a number of users we register a conversion rate of 0.62% for the tested variant versus 0.50% for the control, that means that we either observed a rare (5% probability) event, or there is in fact some positive difference (lift) between the variant and control.

In general, the less likely we are to observe a particular result, the more likely it is that what we are observing is due to a genuine effect, but applying this logic requires knowledge that is external to the statistical design so I won’t go into details about that.

The above statistical model comes with some assumptions, one of which is that you observe the data and act on it at a single point in time. For statistical significance to work as expected we must adhere to a strict application of the method where you declare you will test, say, 20,000 users per arm, or 40,000 in total, and then do a single evaluation of statistical significance. If you do it this way, there are no issues. Approaches like “wait till you have 100 conversions per arm” or “wait till you observe XX% confidence” are not statistically rigorous and will probably get you in trouble.
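To make the single evaluation concrete, here is a minimal sketch of a one-sided two-proportion z-test applied once, at the pre-declared sample size. It assumes 20,000 users per arm and the 0.50% versus 0.62% conversion rates mentioned earlier (100 and 124 conversions); the pooled z-test and the one-sided alternative are my assumptions for illustration, not necessarily the exact test behind the figures in the text.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_value(conv_control, n_control, conv_variant, n_variant):
    """P-value for 'variant is better than control' under a pooled z-test."""
    rate_control = conv_control / n_control
    rate_variant = conv_variant / n_variant
    pooled = (conv_control + conv_variant) / (n_control + n_variant)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    z = (rate_variant - rate_control) / std_error
    return 1 - NormalDist().cdf(z)  # probability of a result at least this extreme


# Hypothetical counts: 0.50% and 0.62% of 20,000 users per arm.
p = one_sided_p_value(100, 20_000, 124, 20_000)
print(f"one-sided p-value: {p:.3f}")
```

Under these assumptions the p-value lands close to the 5% mark. The important part is that the function is called exactly once, after all 40,000 users have been observed.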

However, in practice, tests can take several weeks to complete, and multiple people look at the results weekly, if not daily. Naturally, when results look overly positive or overly negative they want to take quick action. If the tested variant is doing poorly, there is pressure to stop the test early to prevent losses and to redirect resources to more promising variants. If the tested variant is doing great early on, there is pressure to suspend the test, call the winner and implement the change so the perceived lift can be converted to revenue quicker. I believe there is no A/B testing practitioner who will deny these realities.

These pressures lead to what is called data peeking or data-driven optional stopping. The classical significance test offers no error guarantees if it is misused in such a manner, resulting in illusory findings, both in terms of direction of result (false positives) and in the magnitude of the achieved lift. The reason is that peeking results in an additional dimension in the test sample space. Instead of estimating the probability of a single false detection of a winner at a single point in time, the test would actually need to estimate the probability of a single false detection at multiple points in time.

If the conversion rates were constant that would not be an issue. But since they vary without any interventions, the cumulative data varies as well, so adjustments to the classical test are required in order to calculate the error probability when multiple analyses are performed. Without those adjustments, the actual error rate will be inflated significantly compared to the nominal or reported error rate. To illustrate: peeking only 2 times results in more than twice the actual error vs the reported error. Peeking 5 times results in 3.2 times larger actual error vs the nominal one. Peeking 10 times results in 5 times larger actual error probability versus nominal error probability. This has been known to statistical practitioners since as early as 1969 and has been verified time and again.
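The inflation from peeking is easy to check with a small simulation. The sketch below is a rough illustration under my own assumptions: A/A tests with no true difference, equally spaced looks, a one-sided 5% threshold, and a normal approximation in which each batch of data contributes one standardized increment. It is not the procedure used to derive the multiples quoted above.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rate(looks, alpha=0.05, sims=20_000, seed=1):
    """Share of simulated A/A tests that cross the unadjusted threshold at any look."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    hits = 0
    for _ in range(sims):
        total = 0.0
        for k in range(1, looks + 1):
            total += rng.gauss(0.0, 1.0)  # standardized contribution of one data batch
            if total / sqrt(k) > z_crit:  # z-statistic on the cumulative data
                hits += 1  # a "winner" is declared although no lift exists
                break
    return hits / sims


for looks in (1, 2, 5, 10):
    print(f"{looks:>2} looks: actual error rate {false_positive_rate(looks):.3f}")
```

With a single look the simulated error rate stays near the nominal 5%, and it climbs with every additional look. The exact multiples depend on the setup (one- or two-sided test, spacing of the looks), so this sketch should not be expected to match the figures in the text digit for digit.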

If one fails to fix the sample size in advance or if one is performing multiple statistical significance tests as the data accrues, then we have a case of GIGO, or Garbage In, Garbage Out.

2. Lack of Consideration for Statistical Power

In a review of 7 influential books on A/B testing published between 2008 and 2014 we found only 1 book mentioning statistical power in a proper context, but even there the coverage was superficial. The remaining 6 books didn’t even mention the notion. From my observations, the situation is similar when it comes to most articles and blog posts on the topic.

But what is statistical power and why is it important for A/B experiments? Statistical power is defined as the probability of detecting a true lift equal to or larger than a given minimum, with a specified statistical significance threshold. Hence the more powerful a test, the larger the probability that it will detect a true lift. I often use “test sensitivity” and “chance to detect effect” as synonyms, as I believe these terms are more accessible for non-statisticians while reflecting the true meaning of statistical power.

Running a test with inadequately low power means you won’t be giving your variant a real chance at proving itself, if it is in fact better. Thus, running an under-powered test means that you spend days, weeks and sometimes months planning and implementing a test, but then fail to get an adequate appraisal of its true potential, in effect wasting all the invested resources.

What’s worse, a false negative can be erroneously interpreted as a true negative, meaning you will think that a certain intervention doesn’t work while in fact it does, effectively barring further tests in a direction that would have yielded gains in conversion rate.

Power and Sample Size

Power and sample size are intimately tied: the larger the sample size, the more powerful (or sensitive) the test is, in general. Let’s say you want to run a proper statistical significance test, acting on the results only once the test is completed. To determine the sample size, you need to specify four things: the historical baseline conversion rate (say 1%), the statistical significance threshold (say 95%), the power (say 90%), and the minimum effect size of interest.
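The link between power and sample size can be sketched with the usual normal approximation for two proportions. The 1% baseline and the 95% threshold come from the example above; the 10% relative lift, the one-sided test and the function name are assumptions added purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power(n_per_arm, baseline, relative_lift, alpha=0.05):
    """Approximate power of a one-sided two-proportion z-test."""
    normal = NormalDist()
    p1 = baseline
    p2 = baseline * (1 + relative_lift)  # conversion rate if the lift is real
    std_error = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)
    # Probability that the observed difference clears the significance threshold.
    return normal.cdf((p2 - p1) / std_error - normal.inv_cdf(1 - alpha))


# Hypothetical minimum effect of interest: a 10% relative lift on a 1% baseline.
for n in (10_000, 50_000, 100_000, 200_000):
    print(f"{n:>7} users per arm: power {power(n, 0.01, 0.10):.0%}")
```

Reading the output from top to bottom shows the point of the paragraph: the same lift that is almost invisible with a small sample becomes very likely to be detected once the sample is large enough.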

Last time I checked, many of the free statistical calculators out there won’t even allow you to set the power and in fact silently operate at 50% power, or a coin toss, which is abysmally low for most applications. If you use a proper sample size calculator for the first time you will quickly discover that the required sample sizes are more prohibitive than you previously thought and hence you need to compromise either with the level of certainty, or with the minimum effect size of interest, or with the power of the test. Here are two you could start with, but you will find many more in R packages, GPower, etc:

Making decisions about the 3 parameters you control (certainty, power and minimum effect size of interest) is not always easy. What makes it even harder is that you remain bound to that one look at the end of the test, so the choice of parameters is crucial to the inferences you will be able to make at the end. What if you chose too high a minimum effect, resulting in a quick test that was, however, unlikely to pick up on small improvements? Or too low an effect size, resulting in a test that dragged for a long time, when the actual effect was much larger and could have been detected much quicker? The correct choice of those parameters becomes crucial to the efficiency of the test.

3. Inefficiency of Classical Statistical Tests in A/B Testing Scenarios

Classical statistics inefficiency

Classical tests are good in some areas of science like physics and agriculture, but have been replaced with a newer generation of testing methods in areas like medical science and bio-statistics. The reason is two-fold. On one hand, since the hypotheses in those areas are generally less well defined, the parameters are not so easily set and misconfigurations can easily lead to over- or under-powered experiments. On the other hand, ethical and financial incentives push for interim monitoring of data and for early stopping of trials when results are significantly better or significantly worse than expected.

Sounds a lot like what we deal with in A/B testing, right? Imagine planning a test for a 95% confidence threshold and 90% power to detect a 10% relative lift from a baseline of 2%. That would require 88,000 users per test variant. If, however, the actual lift is 15%, you could have run the test with only 40,000 users per variant, or with just 45% of the initially planned users. In this case if you were monitoring the results you’d want to stop early for efficacy. However, the classical statistical test is compromised if you do that.
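The sample sizes in this example can be approximated with the standard fixed-sample formula for comparing two proportions. This is a sketch under my own assumptions (normal approximation, one-sided test, unpooled variance); the function name is hypothetical and a dedicated calculator may use a slightly different formula.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.90):
    """Users needed per variant for a one-sided two-proportion z-test."""
    normal = NormalDist()
    p1 = baseline
    p2 = baseline * (1 + relative_lift)  # rate under the minimum effect of interest
    z_sum = normal.inv_cdf(1 - alpha) + normal.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z_sum ** 2 * variance / (p2 - p1) ** 2)


planned = sample_size_per_arm(0.02, 0.10)  # 10% relative lift from a 2% baseline
needed = sample_size_per_arm(0.02, 0.15)   # what a 15% true lift would have required
print(f"planned: {planned:,} per variant")
print(f"needed:  {needed:,} per variant ({needed / planned:.0%} of planned)")
```

Under these assumptions the two results come out at roughly 88,000 and 40,000 users per variant, in line with the figures above, which suggests those figures are based on a one-sided test.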

On the other hand, if the true lift is in fact -10%, that is, whatever we did in the tested variant actually lowers the conversion rate, a person looking at the results would want to stop the test way before reaching the 88,000 users it was planned for, in order to cut the losses and to maybe start working on the next test iteration.

What if the test looked like it would convert at -20% initially, prompting the end of the test, but that was just a hiccup early on and the tested variant was actually going to deliver a 10% lift long-term?

The AGILE Statistical Method for A/B Testing

Questions and issues like these prompted me to seek better statistical practices and led me to the medical testing field where I identified a subset of approaches that seem very relevant for A/B testing. That combination of statistical practices is what I call the AGILE statistical approach to A/B testing.

I’ve written an extensive white-paper on it called “Efficient A/B Testing in Conversion Rate Optimization: The AGILE Statistical Method”. In it I outline current issues in conversion rate optimization, describe the statistical foundations for the AGILE method and describe the design and execution of a test under AGILE as an easy step-by-step process. Finally, the whole framework is validated through simulations.

The AGILE statistical method addresses misuses of statistical significance testing by providing a way to perform interim analysis of the data while keeping false positive errors under control. It happens through the application of so-called error-spending functions, which result in a lot of flexibility to examine data and make decisions without having to wait for the pre-determined end of the test.
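To give a feel for what an error-spending function does, here is a minimal sketch using the Kim-DeMets power family, one commonly used choice in group-sequential designs. Whether the white paper uses this family, and with which exponent, is not stated here, so the function, the exponent and the look schedule below are all assumptions for illustration.

```python
def alpha_spent(information_fraction, alpha=0.05, rho=3.0):
    """Cumulative false positive budget available at a given share of the planned sample."""
    if not 0.0 <= information_fraction <= 1.0:
        raise ValueError("information_fraction must be between 0 and 1")
    return alpha * information_fraction ** rho  # Kim-DeMets power family


# Hypothetical schedule: five equally spaced interim analyses.
previous = 0.0
for fraction in (0.2, 0.4, 0.6, 0.8, 1.0):
    cumulative = alpha_spent(fraction)
    print(f"{fraction:.0%} of sample: cumulative {cumulative:.4f}, "
          f"spent at this look {cumulative - previous:.4f}")
    previous = cumulative
```

With an exponent above 1, very little of the error budget is spent on early looks and the full 5% is only reached at the planned end of the test. Turning these budgets into actual stopping boundaries requires numerical integration, which this sketch does not attempt.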

Statistical power is fundamental to the design of an AGILE A/B test, so there is no way around it: it must be taken into proper consideration.

AGILE also offers very significant efficiency gains, ranging from an average of 20% to 80%, depending on the magnitude of the true lift when compared to the minimum effect of interest for which the test is planned. This speed improvement is an effect of the ability to perform interim analysis. It comes at a cost since some tests might end up requiring more users than the maximum that would be required in a classical fixed-sample test. Simulation results as described in my white paper show that such cases are rare. The added significant flexibility in performing analyses on accruing data and the average efficiency gains are well worth it.

Another significant improvement is the addition of a futility stopping rule, as it allows one to fail fast while having a statistical guarantee for false negatives. A futility stopping rule means you can abandon tests that have little chance of being winners without the need to wait for the end of the study. It also means that claims about the lack of efficacy of a given treatment can be made to a level of certainty, permitted by the test parameters.

Ultimately, I believe that with this approach the statistical methods can finally be aligned with the A/B testing practice and reality. Adopting it should contribute to a significant decrease in illusory results for those who were misusing statistical tests for one reason or another. The rest of you will appreciate the significant efficiency gains and the flexibility you can now enjoy without sacrifices in terms of error control.
