Contextual MAB
- The quality of a recommender system is often measured by A/B testing.
- We can use MAB as an alternative to A/B testing.
- The contextual bandit problem is a generalization of the multi-armed bandit that extends the model by making actions conditional on the state of the environment.
- Unlike the classical multi-armed bandit, it addresses the problem of identifying the most appropriate content at the best time for individual users.
- Contextual information may be used to group users by shared features via classification or clustering techniques, on the assumption that users in the same cluster tend to behave similarly while users in different clusters behave significantly differently.
- Here’s the formal definition of contextual MAB:
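
The definition itself is missing from these notes. As a stand-in, the following is one common textbook formulation, not necessarily the one originally shown here. The symbols (context space $\mathcal{X}$, arm set $\mathcal{A}$ with $K$ arms, horizon $T$, policy class $\Pi$) are notation assumed for this sketch.

```latex
\begin{aligned}
&\text{For each round } t = 1, \dots, T: \\
&\quad 1.\ \text{the environment reveals a context } x_t \in \mathcal{X}; \\
&\quad 2.\ \text{the learner picks an arm } a_t \in \mathcal{A} = \{1, \dots, K\}; \\
&\quad 3.\ \text{the learner observes the reward } r_t(a_t) \text{ of the chosen arm only.} \\[6pt]
&\text{Regret against the best policy } \pi : \mathcal{X} \to \mathcal{A} \text{ in a class } \Pi: \\
&\quad R(T) = \max_{\pi \in \Pi} \mathbb{E}\left[\sum_{t=1}^{T} r_t(\pi(x_t))\right]
            - \mathbb{E}\left[\sum_{t=1}^{T} r_t(a_t)\right]
\end{aligned}
```

The classical multi-armed bandit is the special case in which the context carries no information, so every policy reduces to always choosing a single fixed arm.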

- A common real-world contextual bandit example is a news recommendation system.
- Given a set of presented news articles, a reward is determined by the click-through behavior of the user.
- If the user clicks on an article, the reward is 1; otherwise it is 0.
- Click-through rate (CTR) is used to determine the selection and placement of ads within the news recommendation application.
- Now suppose rewards are determined by CTR in conjunction with metadata about the user (e.g., age and gender), so recommendations can be further personalized.
- Take a 65-year-old female and an 18-year-old male, for example, both of whom read news articles on their mobile devices.
- The recommendations for these two users should reflect their contrasting profiles.
- It wouldn’t make sense to show ads for retirement plans or clothing stores for mature women (e.g., Talbots) to the 18-year-old male.
- Contextual information may also include geographic location, time of day, day of the week, and season.
- Suppose geographic location metadata is available for the 18-year-old male through his mobile device. He is close to the University of Texas at Austin’s main campus and has expressed an interest in skate and surf shops via click-through behavior. With this contextual information about the user, the application should show ads for skate and surf shops near his current location (e.g., Tyler’s).
- If it’s the beginning of the semester, say in September or January, ads for college textbook stores (e.g., University Coop) should be generated for this user, since it’s highly likely that he is a college student shopping for textbooks.
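
The news and ad example above can be sketched in Python. This is a minimal illustration under stated assumptions, not a method these notes prescribe: it uses epsilon-greedy with a separate reward estimate for each (context, arm) pair, which is only one simple way to make actions depend on context. The class name `ContextualEpsilonGreedy`, the helpers `simulate` and `best_arm`, the context labels, and all click probabilities are hypothetical and invented for the example.

```python
import random
from collections import defaultdict


class ContextualEpsilonGreedy:
    """Epsilon-greedy bandit with one mean-reward estimate per (context, arm)."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> times shown
        self.values = defaultdict(float)  # (context, arm) -> mean observed reward

    def select(self, context):
        # Explore a random arm with probability epsilon, otherwise exploit
        # the arm with the highest estimated reward for this context.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.values[(context, arm)])

    def update(self, context, arm, reward):
        # Incremental mean update for the (context, arm) pair that was shown.
        key = (context, arm)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


def simulate(rounds=20000, seed=0):
    # Hypothetical click probabilities, made up for illustration only.
    click_prob = {
        ("65_female", "retirement_plans"): 0.30,
        ("65_female", "skate_shop"): 0.02,
        ("65_female", "textbooks"): 0.05,
        ("18_male", "retirement_plans"): 0.01,
        ("18_male", "skate_shop"): 0.25,
        ("18_male", "textbooks"): 0.10,
    }
    contexts = ["65_female", "18_male"]
    arms = ["retirement_plans", "skate_shop", "textbooks"]
    env = random.Random(seed + 1)  # separate RNG for the simulated users
    bandit = ContextualEpsilonGreedy(arms, epsilon=0.1, seed=seed)
    for _ in range(rounds):
        context = env.choice(contexts)  # a user arrives
        arm = bandit.select(context)    # an ad is shown
        # Binary reward: 1 for a click, 0 otherwise.
        reward = 1 if env.random() < click_prob[(context, arm)] else 0
        bandit.update(context, arm, reward)
    return bandit


def best_arm(bandit, context):
    """Arm with the highest estimated click rate for the given context."""
    return max(bandit.arms, key=lambda arm: bandit.values[(context, arm)])


if __name__ == "__main__":
    learned = simulate()
    for ctx in ("65_female", "18_male"):
        print(ctx, "->", best_arm(learned, ctx))
```

Keeping a separate table per context only works when the number of distinct contexts is small. With richer metadata such as location, time of day, or season, a model that shares information across contexts (for example, a linear model over context features) would likely be a better fit.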
A/B Testing vs. MAB
- For most modern internet companies, wherever there is a metric that can be measured (e.g., time spent on a page, click-through rate, conversion to sale), there is almost always a randomized trial behind the scenes, with the goal of identifying an alternative website design that improves on the default design.