How Netflix does A/B Testing / by Gavin Lau

Have you ever wondered why Netflix has such a great streaming experience? Do you want to learn how they completed their homepage plus other UI layout redesigns through A/B testing? If so, then this article is for you!

I’ll start by sharing my takeaways from a Designers+Geeks event I attended last week at Yelp. The two great speakers, Anna Blaylock and Navin Iyengar, both product designers at Netflix, walked through insights gleaned from their years of A/B testing on tens of millions of Netflix members, and showed some relevant examples from the product to help attendees think about their own designs.

Photo from the presentation

Experimentation

I really liked this first slide of the presentation and think it’s smart to use an image from the TV show “Breaking Bad” to explain the concept of experimentation!

Photo from the presentation

The Scientific Method

Photo from the presentation

Hypothesis

In science, a hypothesis is an idea or explanation that you then test through study and experimentation. In design, a theory or guess can also be called a hypothesis.

Photo from the presentation

The basic idea of a hypothesis is that there is no predetermined outcome. It is something that can be tested, and those tests can be replicated.

“The general concept behind A/B testing is to create an experiment with a control group and one or more experimental groups (called “cells” within Netflix) which receive alternative treatments. Each member belongs exclusively to one cell within a given experiment, with one of the cells always designated the “default cell”. This cell represents the control group, which receives the same experience as all Netflix members not in the test.” — Netflix blog
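
To make the idea of cells more concrete, here is a minimal Python sketch of one common way to place each member in exactly one cell of an experiment. The hashing scheme, the `assign_cell` helper, and the cell and experiment names are my own assumptions for illustration; the talk and the Netflix blog quote do not describe how Netflix actually allocates members.

```python
import hashlib
from typing import Sequence


def assign_cell(member_id: str, experiment: str, cells: Sequence[str]) -> str:
    """Deterministically map a member to exactly one cell of an experiment.

    Hypothetical helper: hashing the member id together with the experiment
    name is a common bucketing technique, not Netflix's documented method.
    """
    key = f"{experiment}:{member_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # The same member and experiment always produce the same index.
    return cells[int(digest, 16) % len(cells)]


# The first cell plays the role of the "default cell" (the control group).
cells = ["cell_1_default", "cell_2", "cell_3"]
print(assign_cell("member-42", "homepage-redesign", cells))
```

Because the assignment depends only on the member and the experiment name, a member sees the same cell on every visit, which is what "belongs exclusively to one cell" requires.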

Here’s how A/B testing is done at Netflix: as soon as a test is live, they track specific metrics of importance, such as streaming hours and retention. Once the participants have generated enough data to draw meaningful conclusions, they evaluate the efficacy of each variation and pick a winner.
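
As a rough sketch of what "picking a winner" can look like, the snippet below compares retention between a control cell and a test cell with a two-proportion z-test. The choice of test, the `compare_retention` name, and all of the numbers are my own assumptions; the presentation did not say which statistical method Netflix uses.

```python
from math import sqrt
from statistics import NormalDist


def compare_retention(control_retained, control_total, test_retained, test_total):
    """Two-sided two-proportion z-test. Returns (absolute lift, p-value)."""
    p_control = control_retained / control_total
    p_test = test_retained / test_total
    # Pooled retention rate under the assumption that both cells behave the same.
    pooled = (control_retained + test_retained) / (control_total + test_total)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_total + 1 / test_total))
    z = (p_test - p_control) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_test - p_control, p_value


# Made-up numbers: 5,000 members per cell.
lift, p_value = compare_retention(4000, 5000, 4120, 5000)
print(f"lift={lift:.3f}, p-value={p_value:.4f}")
```

A small p-value suggests the difference in retention is unlikely to be noise, which is one reasonable basis for declaring a winning cell.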

Image from the presentation

Photo from the presentation that shows the hypothesis process

Experiment

Experimentation is the act of experimenting. Many companies, like Netflix, run experiments to generate user data. It is important to take the time and effort to organize an experiment properly, so that both the type and the amount of data are sufficient to answer the questions of interest as efficiently as possible.
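
One way to check that the amount of data will be sufficient is to estimate, before the test starts, how many members each cell needs. The sketch below uses the standard normal-approximation formula for comparing two proportions; the `members_per_cell` name, the default significance level and power, and the example rates are assumptions of mine, not figures from the talk.

```python
from math import ceil
from statistics import NormalDist


def members_per_cell(baseline_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate members needed per cell for a two-sided two-proportion test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    effect = expected_rate - baseline_rate
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Made-up example: detecting a move in retention from 80% to 82%.
print(members_per_cell(0.80, 0.82))
```

The smaller the change you hope to detect, the more members each cell needs, which is why subtle design tweaks call for large audiences.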

You have probably noticed that the featured show on the Netflix homepage seems to change whenever you log in. These changes are all part of Netflix’s complex experiments to get you to watch their shows.

Homepage when I logged in the 1st time

Image from the presentation: the House of Cards page when seen as a signed-out user

Homepage when I logged in the 2nd time

Homepage when I switched the account user name

Homepage when I switched the account user name to Kids

Homepage when I’m not signed in

The idea of A/B testing is to present different content to different user groups, gather their reactions, and use the results to shape future strategy. According to this blog post written by Netflix engineer Gopal Krishnan:

If you don’t capture a member’s attention within 90 seconds, that member will likely lose interest and move onto another activity. Such failed sessions could at times be because we did not show the right content or because we did show the right content but did not provide sufficient evidence as to why our member should watch it.

Netflix ran an experiment back in 2013 to see whether they could create a few artwork variants that would increase the audience for a title. Here is the result:

Image from Netflix blog

It was an early signal that members are sensitive to artwork changes. It was also a signal that there were better ways they could help Netflix members find the types of stories they were looking for within the Netflix experience.

Netflix later created a system that automatically grouped artwork that had different aspect ratios, crops, touch-ups, and localized title treatments but shared the same background image. They replicated the experiment on other TV shows to track relative artwork performance. Here are some examples:

Image from Netflix blog, the two marked images significantly outperformed all others.

Image from Netflix blog, the last marked images significantly outperformed all others.
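
To give a feel for what tracking relative artwork performance might involve, here is a small Python sketch that compares plays per impression across artwork variants. The metric, the function names, and the counts are all made up by me for illustration; the Netflix blog has the details of what they actually measured.

```python
def take_rates(impressions, plays):
    """Return plays divided by impressions for each artwork variant."""
    return {variant: plays[variant] / impressions[variant] for variant in impressions}


def best_variant(impressions, plays):
    """Return the variant with the highest plays-per-impression rate."""
    rates = take_rates(impressions, plays)
    return max(rates, key=rates.get)


# Made-up counts for one title with a default image and two variants.
impressions = {"default": 10000, "variant_a": 10000, "variant_b": 10000}
plays = {"default": 500, "variant_a": 560, "variant_b": 610}

rates = take_rates(impressions, plays)
for variant, rate in rates.items():
    relative = rate / rates["default"] - 1
    print(f"{variant}: {rate:.3f} ({relative:+.1%} vs default)")
print("best:", best_variant(impressions, plays))
```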

Check out these two blog posts to learn more about Netflix A/B testing:

What I learned

A/B testing is the most reliable way to learn about user behavior. As designers, we should think about our work through the lens of experimentation.

Image from the presentation: your instinct is not always right

  1. When and why to use A/B testing
    Once you have a design in production, use A/B testing to tweak the design and target two key metrics: retention and revenue. By A/B testing changes throughout the product and tracking users over time, you can see whether a change improves retention or increases revenue. If it does, make it the default. In this way, A/B testing can be used to continuously improve business metrics.
  2. Are your users finding or doing the one thing you want them to find or do?
    In my experience, users often cannot complete a task as fast as you expect, and sometimes they can’t even find a button you put on a page. The reasons vary: the design might not be intuitive enough; the color might not be vibrant enough; the user might not be tech savvy; or they might not know how to make a decision because there are too many options on one page.
  3. Are your intuitions correct?
    Sadly, when it comes to user behavior, our intuitions can be wrong, and the only way to find out is through A/B testing. It is the best way to validate whether one UX design is more effective than another. At work, our consumer product team has proved that through A/B testing on our real estate website. For example, they wanted to figure out whether a design change could improve the registration rate for users who clicked on a Google Ad. They created a few different experimental designs and tested them. They thought the design that hides only the property image would win, but found that the design that hides both the property image and the price got the highest conversion rate.
  4. Explore the boundaries
    The best ideas come from exploring many ideas. At work, our product team works collaboratively across many different projects. With so many parties involved (from designers to product managers to developers), we get to explore the boundaries together. Some of the best ideas come from the developers or the product managers after they test out our prototypes.
  5. Observe what people do, not what they say
    When talking to users, it’s important to keep this in mind: people often say one thing but do another. I conducted a few user testing sessions this week and have one perfect example to show you why. I had one user test out a Contacts list view prototype and asked him if he usually sorts or filters his Contacts. He said no, because he wouldn’t need to do so. However, when he discovered the new filters dropdown menu, he was amazed by how convenient it is to sort and filter by multiple options at a time, and immediately asked when it would roll out in production.
  6. Use data to estimate size of opportunity
    • It’s always about the whys
    • Data can help shape ideas
    • Check whether any A/B tests are in conflict
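
For the last point, a back-of-the-envelope calculation is often enough to estimate the size of an opportunity before investing in a test. The sketch below is my own illustration: the `opportunity_size` helper, the traffic figure, the baseline registration rate, the hoped-for lift, and the value per registration are all made-up assumptions.

```python
def opportunity_size(monthly_users, baseline_rate, relative_lift, value_per_conversion):
    """Estimate extra conversions and extra value per month from a hoped-for lift."""
    extra_conversions = monthly_users * baseline_rate * relative_lift
    return extra_conversions, extra_conversions * value_per_conversion


extra, value = opportunity_size(
    monthly_users=200_000,      # made-up monthly traffic to the page
    baseline_rate=0.04,         # made-up current registration rate
    relative_lift=0.10,         # hoped-for 10% relative improvement
    value_per_conversion=25.0,  # made-up value of one registration
)
print(f"{extra:.0f} extra registrations, worth about {value:.0f} per month")
```

If the estimate comes out tiny even under optimistic assumptions, the idea may not be worth a test slot; if it comes out large, that is a good reason to dig into the whys.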

Isn’t it so fun to be a UI and UX designer? :) Getting to know your users is the most exciting part of the design process! There is no finished design, only many chances to iterate, improve the design, and give our users the best experience possible! I enjoy the opportunity to make subtle tweaks for our users, measure their reactions, and work with the product team to figure out the next steps.

Source: https://uxdesign.cc/how-netflix-does-a-b-t...