The Internet has become a tremendous world-wide bazaar where massive amounts of information are constantly disseminated and consumed. Much of this shared information is textual, e.g., blogs, tweets and various discussion fora. Among these, community reviewing has carved out an important niche. This category includes well-known sites such as Yelp, CitySearch, UrbanSpoon and TripAdvisor.

Some recent work has shown that many contributors to community reviewing sites accumulate enough authored content to support a stylometric profile built from rather simple features (e.g., digram frequency). A stylometric profile allows probabilistic linkage among reviews written by the same person. This could be used to link reviews from different accounts (within a site or across sites) operated by the same user. On the one hand, tracking authors of spam reviews can be viewed as a useful service. On the other hand, the ease of highly accurate linkage between different accounts is disconcerting and ultimately detrimental to privacy.
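
To make the idea of a digram-based profile concrete, the following is a minimal sketch, not the method used in the cited work. It assumes that digrams are adjacent letter pairs taken over the lowercased letter stream (spaces and punctuation dropped), that a profile is the vector of relative digram frequencies, and that two profiles are compared by cosine similarity. The function names `digram_profile`, `cosine_similarity` and `most_similar_author` are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def digram_profile(text):
    """Relative frequencies of adjacent letter pairs (digrams) in text.

    Assumption: non-letters are dropped before pairing, so digrams can
    span word boundaries.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values())
    return {d: n / total for d, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity of two sparse frequency vectors (dicts)."""
    dot = sum(v * q.get(d, 0.0) for d, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def most_similar_author(review, profiles):
    """Name of the known profile closest to the review's digram profile.

    `profiles` maps an account name to a profile built with
    digram_profile() from that account's existing reviews.
    """
    target = digram_profile(review)
    return max(profiles, key=lambda name: cosine_similarity(target, profiles[name]))
```

In this sketch, an anonymous review would be attributed to whichever account's profile scores highest. A realistic linkage study would need far more text per author, and possibly a different similarity measure, than this toy comparison suggests.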

We consider both sides of this debate equally valid and do not take a position. However, we believe that the privacy argument deserves consideration, which motivates this project. Its goals are to:

  1. Explore and measure linkability of reviews.
  2. Develop techniques that mitigate review linkage.
