By Bob Holmes in San Francisco

A COMPUTER program that analyses word usage can do just as good a job of grading essays as an experienced human marker. “I’m not ready to claim it’s foolproof, but I am ready to claim that it’s approximately as foolproof as a human,” says Thomas Landauer, a cognitive scientist at the University of Colorado at Boulder, who led the team that developed the program. Landauer unveiled the team’s creation last week at a meeting of the American Educational Research Association in San Diego.

The computer first “learns” about the subject of the essays it is to mark by scanning relevant passages from course textbooks. It looks for statistical patterns of words occurring together, and then calculates the extent to which different written passages share these patterns.

To grade an essay, the computer compares it with a set of sample essays of varying quality that have been marked by hand. “We assign a grade to it based on a weighted average of how similar it is to the sample essays. If it’s very similar to an essay that got a mark of 90, then it will probably get close to a 90,” says Peter Foltz, a cognitive psychologist at New Mexico State University in Las Cruces, who helped develop the technique.

This approach is more subtle than programs that simply look for matching words, the way most Internet search engines work. Provided the program has first scanned a sufficiently wide range of texts, it can easily recognise that an essay about doctors, for instance, is similar to one that talks about physicians. Landauer says the Net search engine Excite uses essentially the same technology.

The program performs well. In tests with essays by 94 university undergraduates on the anatomy and function of the human heart, the grades assigned by two trained essay readers showed a correlation of 0.77 on a scale from −1 for total disagreement to 1 for exact agreement. The program’s grades gave a correlation of 0.68 to one reader and 0.77 to the other.
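The similarity-weighted grading step described above can be sketched in a few lines. This is illustrative only: it stands in plain word-count vectors and cosine similarity for the richer word co-occurrence statistics the article describes, and the sample essays, marks, and grading formula details are invented for the example.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words vector (word -> count): a crude stand-in for the
    statistical word-usage patterns the real program extracts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(n * b[w] for w, n in a.items() if w in b)
    norm = lambda v: sqrt(sum(n * n for n in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

def grade(essay, samples):
    """Grade an essay as the similarity-weighted average of the marks
    that human readers gave to a set of sample essays."""
    sims = [(cosine(vectorize(essay), vectorize(text)), mark)
            for text, mark in samples]
    total = sum(s for s, _ in sims)
    if total == 0:
        return None  # resembles no sample: set aside for a human marker
    return sum(s * m for s, m in sims) / total

# Hypothetical hand-marked sample essays on the human heart:
samples = [("the heart pumps blood through arteries and veins", 90),
           ("the heart is an organ", 50)]
result = grade("blood is pumped by the heart through arteries", samples)
```

As the article suggests, an essay sharing more vocabulary with the 90-mark sample ends up graded closer to 90 than to 50.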
The program also contains several checks to foil cheaters, most of which the researchers will not describe. But Landauer notes that students cannot gain high marks simply by packing their essays full of technical terms they do not understand. “You can’t get the right combination of words by just listing them,” he says.

Essays that show little resemblance to any of the samples are set to one side by the grading program. These unusual essays, which are likely to be either brilliantly original or stunningly stupid, can then be graded by hand.

Landauer admits that his program cannot judge an essay’s literary quality, so it will never please those who believe that students whose understanding of their subject is accompanied by an elegant use of language should receive higher marks.

America’s largest supplier of standardised tests, including the admissions tests used by most US universities, is also developing a program along similar lines. “When we score essays we have up to three readers doing the scoring,” says Lawrence Frase, head of cognitive and instructional science at the Educational Testing Service in Princeton, New Jersey. “If our computer system works as well as a human, why not have the computer replace one human?”
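The set-aside step could work along these lines: any essay whose best similarity to every hand-marked sample falls below some cutoff is routed to a human reader. A minimal sketch, again using plain word-count vectors; the 0.2 threshold and the sample texts are invented placeholders, not figures from the researchers.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Word-count vector for a text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(n * b[w] for w, n in a.items() if w in b)
    norm = lambda v: sqrt(sum(n * n for n in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

def set_aside(essay, samples, threshold=0.2):
    """True if the essay resembles none of the hand-marked samples
    closely enough for automatic grading (threshold is illustrative)."""
    best = max(cosine(vectorize(essay), vectorize(text))
               for text, _ in samples)
    return best < threshold

# Hypothetical hand-marked sample essays on the human heart:
samples = [("the heart pumps blood through arteries and veins", 90),
           ("the heart is an organ", 50)]
```

An off-topic essay scores a low best similarity and is flagged for hand grading; an on-topic one passes through to automatic scoring.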