Yejin Choi has devised a statistical technique for spotting clusters of fake hotel reviews. The technique can be used to unmask the increasingly common practice of writing phony reviews to talk up a property or product or to run down a competitor.
Fake reviewers “might think that it was a perfect crime,” says Choi, a SUNY Stony Brook assistant professor. “The truth is, they distorted the shape of the review scores of their own hotels, and that leaves a footprint of the deceptive activity, and the more they do it, the stronger it becomes.”
Review scores typically follow a distribution shaped roughly like the letter J: a relatively high number of one-star reviews, fewer twos, threes, and fours, and then a large number of five-star reviews. This reflects consumers' tendency to buy what they like and therefore like what they buy. It also reflects the fact that consumers are less likely to write a review when a purchase simply meets expectations and more likely to write one when their experience was extremely positive or extremely negative, explains Paul Pavlou, an associate professor at the Fox School of Business at Temple University who studies online commerce. This typical distribution is distorted when phony reviews are added to the mix.
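To make the J shape concrete, here is a minimal Python sketch of a crude shape check. It is not Choi's method; the function names and the exact rule (five stars most common, one star more common than any middle rating) are assumptions that follow the description above literally.

```python
from collections import Counter


def star_histogram(ratings):
    """Count how many reviews fall on each star level, 1 through 5."""
    counts = Counter(ratings)
    return [counts.get(star, 0) for star in range(1, 6)]


def looks_j_shaped(ratings):
    """Crude heuristic (an assumption, not the published test):
    five-star reviews are the most common, and one-star reviews
    outnumber each of the two-, three-, and four-star counts."""
    one, two, three, four, five = star_histogram(ratings)
    return five > one and one > max(two, three, four)


# Hypothetical ratings for one hotel.
ratings = [1] * 20 + [2] * 5 + [3] * 6 + [4] * 10 + [5] * 60
print(star_histogram(ratings))
print(looks_j_shaped(ratings))
```

A hotel whose histogram departs sharply from this shape would not be proven fraudulent by such a check; it would only be a candidate for closer inspection.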
Choi and her team began the search for the telltale signatures left by fake reviewers by identifying reviewers who had written at least 10 reviews, spaced more than a day or two apart, and whose ratings tended to stay close to the average for all hotels. They deemed these to be reliable reviewers who weren’t engaging in phony promotions.
They compared the ratings from the reliable reviewers to those by one-time reviewers to detect the hotels that had large discrepancies between these two sets of reviewers. Those hotels were labeled suspicious. Other telltale clues Choi focused on were the ratio of positive to negative reviews and sudden bursts of reviewing activity that might tip off a marketing campaign.
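The comparison described above can be sketched in Python roughly as follows. This is a simplified illustration, not the team's implementation: the (reviewer, hotel, stars) tuple layout, the function names, and the thresholds `max_gap`, `window`, and `threshold` are all assumptions, and the sketch skips the timing and near-average criteria used to pick reliable reviewers.

```python
from collections import defaultdict
from statistics import mean


def split_reviewers(reviews, min_reviews=10):
    """Return (reliable, one_time) sets of reviewer ids.
    Simplification: reliability is judged by review count alone."""
    counts = defaultdict(int)
    for reviewer, _hotel, _stars in reviews:
        counts[reviewer] += 1
    reliable = {r for r, n in counts.items() if n >= min_reviews}
    one_time = {r for r, n in counts.items() if n == 1}
    return reliable, one_time


def suspicious_hotels(reviews, min_reviews=10, max_gap=1.0):
    """Map each flagged hotel to the gap between the mean rating of
    one-time reviewers and the mean rating of reliable reviewers."""
    reliable, one_time = split_reviewers(reviews, min_reviews)
    by_hotel = defaultdict(lambda: {"reliable": [], "one_time": []})
    for reviewer, hotel, stars in reviews:
        if reviewer in reliable:
            by_hotel[hotel]["reliable"].append(stars)
        elif reviewer in one_time:
            by_hotel[hotel]["one_time"].append(stars)
    flagged = {}
    for hotel, groups in by_hotel.items():
        # Only hotels reviewed by both groups can be compared.
        if groups["reliable"] and groups["one_time"]:
            gap = mean(groups["one_time"]) - mean(groups["reliable"])
            if abs(gap) > max_gap:
                flagged[hotel] = gap
    return flagged


def has_burst(review_days, window=7, threshold=5):
    """True if any span shorter than `window` days holds at least
    `threshold` reviews. Days are integer day numbers."""
    days = sorted(review_days)
    start = 0
    for end in range(len(days)):
        # Shrink the window until it spans fewer than `window` days.
        while days[end] - days[start] >= window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False
```

For example, if a reviewer with ten three-star reviews rates a hotel 3 while two one-time reviewers both rate it 5, `suspicious_hotels` reports a gap of 2 stars for that hotel under the default `max_gap` of 1.0.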
To validate her theory, Choi referred to work she had done earlier with computer scientist Jeff Hancock of Cornell University, in which they had hired people to write phony hotel reviews to be analyzed by a machine-learning algorithm for textual clues. By applying her new method to the previously generated fake reviews, Choi achieved a 72% success rate in detecting clusters of fake reviews.
“It’s really unlikely some random strategy would achieve 72 percent accuracy,” Choi noted, but she admitted that it’s difficult to be absolutely positive that any particular review is actually phony. However, the approach would be more reliable in detecting a set of reviews that contains a significant number of phony reviews.
Her technique can “pinpoint where the densities of false reviews are for any given hotel,” said Choi.
As more people rely on user reviews to guide their online travel and other buying decisions, finding a way to minimize the impact of fake reviews has become more commercially valuable. The US Federal Trade Commission has even begun fining marketers who employ fake reviews.
A travel site could use Choi’s algorithm to correct its average review scores and minimize the distortions created by phony reviews. It can also be paired with other approaches, such as textual analysis, for more reliable results.
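The article does not say how such a correction would work. One plausible approach, shown here purely as an assumption, is a weighted average that discounts ratings from reviewers who have not been judged reliable. The function name `corrected_average` and the default `untrusted_weight` of 0.25 are hypothetical.

```python
def corrected_average(reviews, untrusted_weight=0.25):
    """Weighted mean of star ratings.

    `reviews` is a list of (stars, trusted) pairs. Trusted ratings
    count fully; the rest are scaled by `untrusted_weight`.
    This weighting scheme is an illustrative assumption."""
    weighted_sum = 0.0
    total_weight = 0.0
    for stars, trusted in reviews:
        weight = 1.0 if trusted else untrusted_weight
        weighted_sum += weight * stars
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no reviews with nonzero weight")
    return weighted_sum / total_weight


# Hypothetical hotel: two trusted three-star reviews and
# two five-star reviews from unverified one-time reviewers.
reviews = [(3, True), (3, True), (5, False), (5, False)]
print(corrected_average(reviews))
print(corrected_average(reviews, untrusted_weight=1.0))
```

Setting `untrusted_weight` to 1.0 recovers the plain average, so the parameter controls how strongly the displayed score leans toward reliable reviewers.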
Yejin Choi received a BS in computer science and engineering at Seoul National University, and a PhD in computer science at Cornell. She spent the summer of 2009 as a research intern at Yahoo! Research. She has over three years of work experience at Microsoft and LG Electronics Research Center. She joined the faculty of the State University of New York at Stony Brook in September of 2010.