A bit of history of usability testing – and why it’s not so expensive any more
Early models – academic research; validation
Early usability testing (in the 1980s, mostly for software interfaces) had its origins in two traditions that called for large numbers of participants (and therefore a lot of money). First, it was modeled on academic psychology studies, where having enough people to reach statistical validity was critical. Second, the goal was usually “validation” – testing once at the end, just before release, with the aim of showing that all was well.
But those were not the right models
We quickly learned that those were not the right models.
It’s not research. You’re not trying to write an academic paper; you’re trying to launch a usable web site.
And it’s not about validation. Waiting until just before release caused enormous frustration. Products always turned out to have serious problems. Development teams learned what was wrong – but only when it was too late to fix the problems.
The focus of usability testing then shifted. The key words for usability testing now are:
- early
- often
- iterative
(John Gould and colleagues at IBM gave us a wonderful case study of doing it the right way back in 1987 with their article on the 1984 Olympic Message System. Unfortunately, for years after that, most usability testing was still being done on the “once, at the end” model.)
The new model: small-scale and “discount”
When you have a series of usability tests, as your prototype moves through design and development, you can do each test cycle with only a few people. You accumulate a large number of people over the series of tests. Jakob Nielsen introduced the notion of “discount usability engineering” in the early 1990s.
Just how many participants you need in each test cycle is still a matter of controversy. A 2007 paper by Lindgaard and Chattratichart suggests that what matters most isn’t how many people you involve in your usability test but how many different tasks they try.
The new model: looking for and diagnosing problems
Another difference is the focus of usability testing. Of course, we want to know what is working well, so that we keep those aspects and build on them. But the greatest benefit of usability testing is finding out what is not working well. Usability testing helps you find and understand the problems people are likely to have with your web site. You watch and listen to a few people have problems so that you can fix those problems – and thus keep hundreds (thousands? millions?) of other people from having those same problems.
The new model: “quick and clean” formative evaluation
Some people call this small-scale, small-price usability testing “quick and dirty.” I prefer to call it “quick and clean.” It’s actually a well-established social science technique: formative evaluation. In formative evaluation, you create a prototype, try it out with people, fix it, try it out again, and so on – while it’s still being developed. That’s our current model of usability testing. And it works.
I agree that one should test early and often, and that many of these tests should be “quick & dirty” (i.e., cheap, and thus with few users). However, our research (in a paper soon to be published) shows that a large sample size does turn up important issues that a small sample size can miss.
There are several reasons for running a test with a large sample size if you can possibly afford it:
1. Opinion-based matters: With only a handful of testers, if, for example, 2 mention disliking the colour scheme, do you change it? You need a much larger sample size to know if their views are common and strongly held. (I didn’t just make up this example: Web Mystery Shoppers has tested sites where the colour scheme WAS a significant factor.)
2. Variety: If you only test a few users, chances are that you will miss some of the technical problems with the site. With a large number of users, testing from their own computers, you uncover problems with certain computer configurations that may be significant and need attention. (When Windows XP was still fairly new we discovered that a major bank’s website kept returning errors when XP users tried to conduct banking transactions online.)
3. Politics: In many organizations (especially big ones), a large sample size is often needed for political reasons. It is much harder for the top execs to ignore the views of 100+ site users than the opinion of a handful. Often the web manager has a pretty good idea of what needs doing, but needs external validation to win internal support. Sad but true.
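As a rough illustration of the first point (opinion-based matters), here is a minimal sketch of how uncertain a proportion is when it comes from a handful of testers. It is not taken from the research mentioned above. It assumes a standard Wilson score interval at about 95% confidence, the function name `wilson_interval` is made up for this example, and the "40 of 100" figure is hypothetical, chosen only to contrast with "2 of 5".

```python
import math


def wilson_interval(count, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion count/n.

    Illustrative helper only; z=1.96 is the usual normal quantile
    for a two-sided 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = count / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Same observed share (40%) disliking the colour scheme, two sample sizes.
# The 40-of-100 case is hypothetical, for comparison only.
for disliked, testers in [(2, 5), (40, 100)]:
    low, high = wilson_interval(disliked, testers)
    print(f"{disliked} of {testers} dislike it: plausibly {low:.0%} to {high:.0%} of all users")
```

Under these assumptions, 2 of 5 testers is compatible with anything from a small minority to a clear majority of users sharing the view, while the same share in a sample of 100 narrows the range considerably.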
Tema Frank