Airbnb is an online two-sided marketplace that matches people who rent out their homes (‘hosts’) with people who are looking for a place to stay (‘guests’). We use controlled experiments to learn and make decisions at every step of product development, from design to algorithms, and they are equally important in shaping the user experience.

While the basic principles behind controlled experiments are relatively straightforward, using experiments in a complex online ecosystem like Airbnb during fast-paced product development can lead to a number of common pitfalls. Some, like stopping an experiment too soon, are relevant to most experiments. Others, like the issue of introducing bias at the marketplace level, only become relevant in a more specialized application like Airbnb. We hope that by sharing the pitfalls we’ve experienced and learned to avoid, we can help you design and conduct better, more reliable experiments for your own application.

Why Experiments?

Experiments provide a clean and simple way to make causal inference. It’s often surprisingly hard to tell the impact of something you do by simply doing it and seeing what happens, as illustrated in Figure 1.

Figure 1 — It’s hard to tell the effect of this product launch.


The outside world often has a much larger effect on metrics than product changes do. Users can behave very differently depending on the day of week, the time of year, the weather (especially in the case of a travel company like Airbnb), or whether they learned about the website through an online ad or found the site organically. Controlled experiments isolate the impact of the product change while controlling for the aforementioned external factors. In Figure 2, you can see an example of a new feature that we tested and rejected this way. We thought of a new way to select what prices you want to see on the search page, but users ended up engaging less with it than the old filter, so we did not launch it.

Figure 2 — The new price filter we tested; users engaged with it less than the old one.
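To make this concrete, here is a small self-contained simulation (illustrative only, with made-up numbers) of the problem in Figure 1: a naive before/after comparison around a launch absorbs day-of-week seasonality, while a concurrent randomized control group experiences the same external conditions and so recovers the true effect.

```python
# A minimal sketch (not Airbnb's code) of why pre/post launch comparisons
# mislead: day-of-week seasonality swamps a small true effect, while a
# concurrent randomized control group cancels those factors out.
import random

random.seed(7)

TRUE_LIFT = 0.02          # the product change adds 2 points of conversion
BASE = 0.10               # baseline conversion rate
WEEKLY = [0.00, -0.01, -0.01, 0.00, 0.01, 0.03, 0.03]  # day-of-week swing

def conversions(day, rate, n=2000):
    """Simulate n visitors on a given day, with day-of-week seasonality."""
    p = rate + WEEKLY[day % 7]
    return sum(random.random() < p for _ in range(n)), n

def rate(obs):
    return sum(c for c, _ in obs) / sum(n for _, n in obs)

# Naive pre/post: the launch lands mid-week, so the weekend swing pollutes
# the difference and inflates the apparent lift.
pre = [conversions(d, BASE) for d in range(3)]                   # Mon-Wed, old
post = [conversions(d, BASE + TRUE_LIFT) for d in range(3, 7)]   # Thu-Sun, new
print(f"pre/post estimate:   {rate(post) - rate(pre):+.3f}")     # inflated

# Controlled experiment: both groups run over the same days, so the
# day-of-week effect hits control and treatment equally and cancels.
ctrl = [conversions(d, BASE) for d in range(7)]
trt = [conversions(d, BASE + TRUE_LIFT) for d in range(7)]
print(f"experiment estimate: {rate(trt) - rate(ctrl):+.3f}")     # near +0.020
```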

When you test a single change like this, the methodology is often called A/B testing or split testing. This post will not go into the basics of running an A/B test. There are a number of companies that provide out-of-the-box solutions for basic A/B tests, and a couple of bigger tech companies have open-sourced their internal systems for others to use. See Cloudera’s Gertrude, Etsy’s Feature, and Facebook’s PlanOut, for example.
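For a flavor of the evaluation step such tools perform, here is a minimal sketch of the textbook two-proportion z-test, the significance check behind most basic A/B analyses. The counts are invented, and this is not code from any of the frameworks above.

```python
# A minimal, self-contained two-proportion z-test -- the significance check
# at the heart of most basic A/B testing tools. Counts are made up.
from math import erfc, sqrt

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))   # two-sided tail probability
    return p_b - p_a, z, p_value

# Example: control converts 500/10000, treatment 570/10000.
diff, z, p = two_proportion_ztest(500, 10_000, 570, 10_000)
print(f"lift={diff:+.4f}  z={z:.2f}  p={p:.4f}")
```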

The case of Airbnb

At Airbnb we have built our own A/B testing framework to run experiments; an upcoming blog post will cover the details of its implementation. A couple of features of our business make experimentation more involved than a routine button-color change, and that’s why we decided to create our own testing framework.
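As a taste of what any framework like this has to do, the sketch below shows the common deterministic, hash-based assignment pattern: hashing a subject id together with the experiment name gives each user a stable bucket per experiment, independent across experiments. The function and experiment names are hypothetical; this illustrates the general technique, not Airbnb’s actual implementation.

```python
# A common assignment pattern for in-house experiment frameworks (a sketch,
# not Airbnb's implementation): hash the subject id with the experiment name
# so each user lands in a stable bucket, independently across experiments.
import hashlib

def assign(subject_id: str, experiment: str, treatment_pct: int = 50) -> str:
    """Deterministically assign a subject to 'control' or 'treatment'."""
    digest = hashlib.md5(f"{experiment}:{subject_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # stable bucket in [0, 100)
    return "treatment" if bucket < treatment_pct else "control"

# The same user always gets the same bucket for a given experiment...
assert assign("user_42", "price_filter_v2") == assign("user_42", "price_filter_v2")
# ...but assignments are effectively independent across experiments.
print(assign("user_42", "price_filter_v2"), assign("user_42", "new_search_ranking"))
```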

First, users can browse without logging in or signing up, which makes it harder to tie a user to their actions. People often switch devices (between web and mobile) in the midst of booking. Also, bookings can take a few days to confirm, so we need to wait for those results. Finally, successful bookings often depend on available inventory and the responsiveness of hosts, factors outside our control.
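As a concrete illustration of the first point, the sketch below shows one common fallback: bucket experiments on a first-party cookie id until the user id is known, and stitch the two together at login. Everything here (names, ids, the in-memory mapping) is hypothetical and only illustrates the problem, not Airbnb’s actual identity handling.

```python
# A sketch (illustrative only) of the identity problem: logged-out visitors
# are known only by a browser cookie id, so assignment must fall back to it,
# and a later login has to stitch that cookie id to the real user id.
from typing import Optional

cookie_to_user: dict[str, str] = {}   # filled in when a visitor logs in

def experiment_id(cookie_id: str, user_id: Optional[str] = None) -> str:
    """Pick the id experiments bucket on: user id if known, else cookie id."""
    if user_id is not None:
        cookie_to_user[cookie_id] = user_id   # stitch for later sessions
        return user_id
    return cookie_to_user.get(cookie_id, cookie_id)

# Logged-out browsing on a laptop buckets on the cookie...
print(experiment_id("cookie_abc"))             # -> cookie_abc
# ...after login we can tie that activity to the user...
print(experiment_id("cookie_abc", "user_42"))  # -> user_42
# ...but a fresh phone session starts over with a new, unlinked cookie id.
print(experiment_id("cookie_xyz"))             # -> cookie_xyz
```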