A while back I saw this tweet from a UX consultant that struck a nerve: “I don’t trust designers who don’t want their designs a/b tested. They’re not interested in knowing if they were wrong.”
I responded quickly in the short staccato bursts afforded by the 140-character limit while I was (at the time) jamming on some key designs needed by the Citrix CEO for his keynote the following day. Oops! Perhaps not the best time to engage in Twitter arguments ;-) For a long time now I’ve wanted to elaborate beyond Twitter on why (I think) many designers (certainly myself and most of my closest peers) do not love A/B testing.
And believe me, it has nothing to do with a lack of interest in being proven wrong.
As a former UI design consultant for two of the most famously data-driven internet firms, Netflix and LinkedIn, I totally understand the arguments for A/B testing and its commercial value at mercilessly incremental levels: literally nickels & dimes of marginal revenue across millions of clicks, etc. Yep, I get all that. Massive scale, long-tail value chains, and so forth. Tons of money! I get it.
However, as I responded via tweets: as a designer, I must defend and assert aesthetic integrity as much as I can, while keeping in mind key business metrics and technical limits. And, quite frankly, the first victims of A/B testing are beauty, elegance, charm, and grace. Instead we get an unsightly pastiche of uneven incrementalism, lacking any kind of holistic cohesiveness, with nothing suggestive of a bold, vivid, nuanced vision that inspires users. A perplexing mashup of visuals, behaviors, and IA/navigation that leaves one gasping for air. It is the implicit charter of a high-quality design team (armed with user researchers and content strategists!) to propose something significantly better that users may not be able to imagine themselves, since they are so conditioned by mediocre mainstream design. (Look up Paul Rand’s infamous quote.)
So why don’t many designers like A/B testing? I think it’s mainly the following:
* A/B testing may only be as effective as the designs being tested, which may or may not be high-quality solutions. Users are not always the best judges of high-quality design. That’s why you hire expert designers with seasoned skills, experience, judgment, and, yes, the conviction to make a call as to what’s better overall.
* As with any usability test, you gotta question the motives behind participants’ answers and reactions. Too often, though, biz/tech folks treat A/B test results as “the truth” rather than as a data point to be debated. Healthy skepticism is always warranted in any testing, and uncovering the rationale behind a metric is vital.
* A/B testing is typically used for tightly focused comparisons of granular interface elements, which can produce poor pastiches when the winners of separate tests are stitched together.
* How do you A/B test novel interaction models, conceptual paradigms, or visual styles that may vary wildly from what came before? (By the way, visuals and interactions have a two-way rapport; they inform each other and can’t be separated. See Mike Kruzeniski’s talks.) Would you A/B test the Wii or Dyson or Prius or iPhone? Against what???
* A/B testing locks you into just two comparative options, an exclusively binary (and thus limited) way of thinking. What about C or D or Z, or some other alternative? What if there are elements of A and B that could blend together to form another option? Avenues for generative design exploration are shut down by looking at only A and only B.
* Finally, A/B testing can undermine a strong, unified, cohesive design vision by just “picking what the user says”. A designer (and team) should have an opinion at the table and be willing to defend it, not simply cave in to a simplistic math test for interfaces.
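As an aside, it’s worth seeing how thin the “truth” behind a two-variant winner can be. The sketch below uses a standard two-proportion z-test with made-up numbers (not any particular product’s data) to show that a lift which looks decisive on a dashboard can be statistically indistinguishable from noise:

```python
import math

def ab_z_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for a simple A/B comparison.

    conv_* are conversion counts, n_* are sample sizes.
    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF, via math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical numbers: variant B "wins" 5.6% to 5.2% over 10,000 visitors each,
# yet the test cannot rule out pure chance (p > 0.05).
z, p = ab_z_test(520, 10_000, 560, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Even when the p-value does clear the conventional 0.05 bar, it answers only “did B beat A on this one metric?”, which is exactly the narrow, binary framing the bullets above criticize.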
Ultimately, no A/B test proves a design “wrong”. Designs can’t be “proven” wrong, only shown to be in need of improvement or further iteration. Therein lies the real flaw of the original statement: the assumption that designs are either “right” or “wrong” is inaccurate. Instead, designs are “better” or “worse” depending on the audience, context, and purpose, not to mention business strategy. Designers (and researchers) seasoned in the craft of software understand this deeply.
A/B test results perpetuate a falsely comforting myth that designs can be graded like a math test, in which there’s a single right answer. Certainly this myth soothes the nerves of an anxious exec about to make a multi-million-dollar bet on the company’s future :-) Wanna relieve anxiety? Take Prozac. Wanna achieve top-quality design results? Then assert confidence in a rigorous creative process as promoted (and well articulated) by Adam Richardson, Luke Williams, Jon Kolko, and others, as well as in your design team. Because if you hired top-quality designers and researchers, with a sensible PM and a skilled engineering team, you more than likely have a pretty darn good product on your hands.
At the end of the day, A/B testing should NOT be used as a litmus test of a design or a designer. It’s a single data point, that’s all. It can be compelling, no doubt, but its level of impact and value varies per product, company, and market. And just like Roe v. Wade, which has become an unfair litmus test for Supreme Court candidates (as part of a greater political-media circus, a whole separate issue), using A/B testing this way only polarizes things and makes the vetting of a design or designer unnecessarily difficult. You also risk dissuading top-quality design talent from joining the team’s cause for good, beautiful, useful designs that improve the human condition. After all, isn’t that what we are all fighting for? A/B testing is simply one tool, not something by which to judge the character or quality of a professional (or her work) striving to do what’s right with integrity.