April 16, 2014

What it (Really) Takes: 3 absolutely indispensable elements for digital optimization

The internet is full of compelling examples of “winning” A/B tests that promise marked conversion or revenue growth. Similarly, there are countless articles about what it takes — from executive sponsorship and cultural change to the critical roles — to develop a successful testing and optimization program. We are active participants in these discussions and have contributed to both. The test “case studies” and the conversations about culture are both necessary for inspiring more confidence in the practice of A/B testing, in particular, and digital optimization, in general.

But let me say that these conversations are increasingly reductive narratives that do a disservice to the long-term work and benefit of continuous testing and optimization. Yes, you need winning tests. Yes, those are good. Yes, you need organizational buy-in. Yes, you need a testing tool that works for your company. Yes, you need skilled front-end development resources, skilled analysts, experience designers and product managers. Those are all table stakes. All of them. Every testing program requires them, but they are not the elements most correlated with a program’s success.

The critical elements that are most necessary to achieve long-term success in testing and optimization are not the sexy ones. They are the behind-the-scenes guts and logic of the operation. They require tremendous thought and human talent to develop, but, ideally, they run quietly and effectively behind the scenes. Without them, you may have a great number of winning A/B tests. You may have executive support. You may have a great team. But you will (a) only be scratching the surface of disruptive success and (b) always be one tough question from a CEO, CFO or CTO away from an existential crisis for your testing and optimization program.

So, what are these three critical, infrequently discussed elements? They are:

  1. A replicable process for capturing and prioritizing the most relevant testable hypotheses.
  2. A transparent process and forum for identifying, discussing and mitigating key program risks.
  3. A simple, honest view of program health, success and investment.

Let’s unpack these.

Element #1, a replicable process for capturing and prioritizing the most relevant, testable hypotheses, first requires a method for capturing hypotheses. To “capture” hypotheses, you must first know where to look, then how to recognize them and, finally, where to hold them once captured. I have previously written about where to find relevant test hypotheses, but, suffice it to say, your historical analytics data, user tests, customer feedback and product roadmap are rich sources to be synthesized in accordance with your KPIs.

Even if you know where to look, I’d suggest that many of your co-workers (even the very bright ones) may not recognize a hypothesis when they see one. Strongly supported opinions, approved projects and long-held beliefs, assumed to be true, are more often than not simply hypotheses begging to be tested. Scratch the surface of a marketing plan or your consumer research and you are sure to find them. The great hypotheses are often the ones that people seem to accept as fact and are resistant to test. That is often precisely where disruption lives.

Articulating the hypotheses, though, is only part of the work. Harvesting this tribal knowledge — recording it, appending to it, commenting on it — is also necessary to sustain the program’s energy and intelligence. You may use a shared drive or a SaaS tool (like UserVoice) for this. We have worked with several. None are perfect, but all are better than a Word or Excel doc because of their collaborative features. The point is that all of those ideas need to be captured, tracked and stored for posterity in a place that is easily accessed, searchable and indexable.

And finally, these ideas need to be prioritized via a simple and practical logic that ensures the most relevant and impactful ideas surface from the pack. While everybody loves to test the “low-hanging fruit,” there is massive opportunity cost to dabbling and avoiding the hypotheses that challenge impending feature launches or long-held beliefs. A good prioritization methodology accounts for the real cost (effort), the opportunity cost of NOT testing (relevance), the forecasted impact AND the speed at which the hypothesis can be validated. Every organization will have its own (likely similar) approach to prioritization, which is as much art as it is science. It’s critical, though, that your process is transparent and has a scoring methodology that you can point to as a dispassionate means for ending debates and moving forward. At his Opticon talk this week, my co-founder, Ryan Garner, will talk a bit about how we score and prioritize test ideas.
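
To make that concrete, here is a minimal sketch of what a transparent, dispassionate scoring function might look like. The four inputs mirror the factors named above (effort, relevance, impact, speed); the 1-to-5 scales, the equal weights and the `priority_score` function itself are illustrative assumptions, not the model Ryan will present at Opticon.

```python
# Illustrative only: a simple weighted score for ranking test hypotheses.
# Each factor is rated 1 (low) to 5 (high); effort counts against the idea.

def priority_score(effort, relevance, impact, speed,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Return a sortable score for a hypothesis.

    effort    -- real cost to design, build and run the test
    relevance -- opportunity cost of NOT testing (e.g. it blocks a launch)
    impact    -- forecasted impact on the KPI if the hypothesis proves true
    speed     -- how quickly the hypothesis can be validated
    """
    w_effort, w_relevance, w_impact, w_speed = weights
    return (w_effort * (6 - effort)   # invert effort: less work scores higher
            + w_relevance * relevance
            + w_impact * impact
            + w_speed * speed)

backlog = [
    {"hypothesis": "A shorter checkout form lifts completion", "factors": (2, 4, 4, 5)},
    {"hypothesis": "A homepage hero video lifts engagement",   "factors": (4, 2, 3, 2)},
]
backlog.sort(key=lambda h: priority_score(*h["factors"]), reverse=True)
for h in backlog:
    print(round(priority_score(*h["factors"]), 1), "-", h["hypothesis"])
```

Whatever your exact factors and weights, the point is that the scoring is written down and applied the same way to every idea, so the backlog order is something you can defend rather than debate.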

Element #2: a transparent process and forum for identifying, discussing and mitigating key program risks. On the surface, this might seem like project management 101. And perhaps this inclusion only underscores the point that A/B testing and optimization require tremendous project management. But I bring up risk mitigation not simply to highlight the test-by-test dependencies but also to surface the opinions and deep-seated fears that threaten to completely derail the momentum (or existence) of an optimization program. Perfectly fair questions related to site performance impact, the injection of “outside code” into production, MVP-style (quick and imperfect) creative, false positives and inflated lift projections will assuredly crop up. They can’t be handled with “kid gloves.” They warrant respect but not dread or fear. They need to be addressed head on. Each of them is a legitimate risk that can be mitigated. Frequently, we see testing programs dabble endlessly with small tweaks and low-risk tests, only to be dismayed by very mixed results. Why so much dabbling? My hypothesis is that those programs are afraid of confronting (perhaps more senior) executives’ fears that:

  • Testing will slow down website performance
  • Testing will break the website
  • Testing will slow down product development
  • Testing will introduce sub-par design experiences
  • Testing will confuse or alienate customers
  • Testing results are not credible

Go on. Keep going. It’s healthy to get it out. Each one of these perceived risks is absolutely legitimate and absolutely able to be addressed, better understood and mitigated or completely remediated. There are also risks associated with each individual test, but I will stick to the highest level for this post. The same suggestion applies, though: identify risks and mitigation plans thoroughly. It takes only one well-placed cynic in the organization, who feels like their concerns were not addressed, to throw a program under the bus and/or into defensiveness.

We suggest that, before you dive into your very first tests, as you are codifying the mission and process for your program, you capture and address all of the most deeply held, “scary” risks. But don’t stop there. In reviewing your program health and value (element #3, below), you should continue to set aside space to identify new risks and commit to thoughtful mitigation plans. While it is unlikely that anybody in your organization is dead set against testing, you should not be surprised at how fears can fester if they are not aired.
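
If it helps to make the “forum” part tangible, here is a minimal sketch of a shared risk register, assuming one flat record per program-level risk. The field names and the example entry are illustrative, not a prescribed template or our own tooling.

```python
# Illustrative only: a flat record for each program-level risk so that
# concerns are written down, owned and revisited rather than left to fester.
from dataclasses import dataclass

@dataclass
class ProgramRisk:
    risk: str          # the fear, stated plainly
    raised_by: str     # who voiced it (credit the cynic)
    mitigation: str    # the concrete plan to address or disprove it
    status: str        # e.g. "open", "mitigated", "accepted"

register = [
    ProgramRisk(
        risk="Testing will slow down website performance",
        raised_by="CTO",
        mitigation="Benchmark page-load impact of the testing snippet and "
                   "review it at every program read-out",
        status="open",
    ),
]

# Review the open items at every program health review.
for r in register:
    if r.status == "open":
        print(f"[{r.status.upper()}] {r.risk} (raised by: {r.raised_by})")
```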

In their Opticon talk, Ryan and Jessica will get very real about questions, risks and concerns that you are likely to confront as you push this boulder up the mountain.

Element #3 is your scorecard. It is, quite simply, how you and your organization measure the value of your testing and optimization program. The scorecard should be easy to maintain, easy to access and easy to interpret. A shared Google spreadsheet will most certainly suffice, though a pretty Keynote or PowerPoint version might serve you well in executive read-outs.

The scorecard should, at a minimum, document the following (a rough sketch of one scorecard row follows the list):

  • What is the status of the approved hypotheses?
  • Which tests have been completed?
  • What was the level of effort/investment in the execution of the completed tests?
  • Of those tests, what were the test metrics?
  • Did the hypothesis prove true for the test metric?
  • What percentage of your completed tests showed the hypothesis to be true?
  • Was there observed revenue lift during the test period for the winning variation?
  • If so, how much?
  • Did any of the results, in proving false or inconclusive, help you avoid opportunity costs (for example, by stopping a planned change)?
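
Here is a minimal sketch of how a single scorecard row might be laid out in a flat, spreadsheet-style structure. The field names and the example rows are illustrative assumptions, not a template we prescribe.

```python
# Illustrative only: one row per approved hypothesis, flat enough to live
# in a shared Google spreadsheet.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScorecardRow:
    hypothesis: str                         # the approved hypothesis, verbatim
    status: str                             # e.g. "backlog", "running", "completed"
    effort_hours: Optional[float] = None    # investment in execution
    test_metric: Optional[str] = None       # the metric the test was judged on
    proved_true: Optional[bool] = None      # did the hypothesis hold for that metric?
    observed_revenue_lift: Optional[float] = None   # during the test period only
    opportunity_cost_avoided: Optional[str] = None  # e.g. a planned change stopped

rows = [
    ScorecardRow("A shorter checkout form lifts completion", "completed",
                 effort_hours=60, test_metric="checkout completion rate",
                 proved_true=True, observed_revenue_lift=12500.0),
    ScorecardRow("A homepage hero video lifts engagement", "completed",
                 effort_hours=90, test_metric="pages per visit",
                 proved_true=False),
]

completed = [r for r in rows if r.status == "completed"]
win_rate = sum(r.proved_true is True for r in completed) / len(completed)
print(f"Completed tests: {len(completed)}, hypotheses proved true: {win_rate:.0%}")
```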

Those questions, in my opinion, are the most basic items to track. You could certainly contemplate more derived metrics around velocity, ROI, etc. But I would strongly caution against the temptation to simply annualize and sum up revenue lift. Such blunt estimates, which often conflate prediction with fact, are bandied about in blogs and at conferences. If you ask A/B testers how much revenue lift they have created and they answer with a specific figure, staggering or modest, you can be assured that they have fallen into your trap. We have seen companies brag about lift that is greater than their actual annual revenue. And all it takes to deflate these claims are the following questions:

  • What is the margin of error for your calculation?
  • How did you forecast out annual lift based on your limited data set?
  • Did you retest that precise test again and again to validate the observed lift?
  • How have you accounted for other market and product changes that might positively or negatively impact revenue since you launched your winning variation in production?

Somebody in your organization may force you to claim a single revenue lift figure. Resist it as long as you can. Focus on “observed” results. If still pushed, forecast longer-term benefits for individual tests with margins of error. If you are pushed all the way to the edge of the plank, turn around, take off the blindfold and, before you jump, ask: (a) do we believe that our tests are leading to smarter and more successful digital product development, and (b) are we confident that, if we sustain the current testing and optimization program, we will be more profitable than the alternative? To be sure, there is as much art and blind faith as there is data and dogma in the world of optimization.
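
To anchor the “margins of error” point, here is a minimal sketch of reporting observed lift as an interval rather than a single annualized number. It uses a standard normal-approximation confidence interval for the difference between two conversion rates; the numbers and the `lift_confidence_interval` function are illustrative, not our actual reporting method.

```python
import math

def lift_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """Approximate 95% confidence interval for the absolute difference in
    conversion rate between control (A) and a winning variation (B)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff - z * se, diff + z * se

# Example: 2,000 conversions out of 50,000 visitors on control
# vs. 2,200 out of 50,000 on the winning variation.
low, high = lift_confidence_interval(2000, 50000, 2200, 50000)
print(f"Observed absolute lift: 0.40 points (95% CI: {low:.2%} to {high:.2%})")
```

The honest headline is the interval observed during the test period, not a point estimate annualized into “revenue created.” The latter is exactly the trap described above.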

I have no doubt that the “big winners” will continue to grab headlines. Similarly, I have no doubt that organizational change, in the interest of being data-driven, will be uniformly promoted and applauded. I say “hooray” to both. I support and applaud all of us behind the winners and the change. But, in order to sustain both endeavors, I’d suggest it is critical to understand and adopt the elements that enable scale and reduce friction. And, in our experience, the elements that best predict success and help avoid existential crises are:

  1. The hypothesis development & prioritization process
  2. The risk identification and mitigation process
  3. The program scorecard

Go get ’em!