[PYTHON] Algorithm for finding collusion spam reviewers

Introduction

On online shopping and restaurant review sites, some reviewers collude and post fake reviews so that ratings become unreasonably high or low. I want to find such spam reviewers.

This time, I implemented Fraud Eagle, an algorithm presented in 2013 at an international conference, the AAAI Conference on Weblogs and Social Media (paper: …org/ocs/index.php/ICWSM/ICWSM13/paper/viewFile/5981/6338).

Review data

In Fraud Eagle, consider the review graph as shown in the figure below.

graph2.png

In other words, the people who post reviews (reviewers) and the targets of the reviews (products) are the vertices of the graph, and each review is represented by an edge between a reviewer and a product. The review itself can be text or a number of stars, but it must be possible to judge whether it is positive or negative. Here, a review takes a value between 0 and 1: 0.5 or more is treated as positive, and anything below 0.5 as negative.
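
As a small illustration of the positive/negative rule above, here is a minimal sketch. `stars_to_score` and `is_positive` are hypothetical helpers written for this article, not part of the package, and the linear mapping from stars to a 0 to 1 score is only one possible assumption.

```python
def stars_to_score(stars, max_stars=5):
    """Map a star rating in 1..max_stars linearly onto [0, 1].

    Hypothetical helper; the linear mapping is an assumption.
    """
    if max_stars < 2:
        raise ValueError("max_stars must be at least 2")
    if not 1 <= stars <= max_stars:
        raise ValueError("stars must be between 1 and max_stars")
    return (stars - 1) / (max_stars - 1)


def is_positive(score):
    """Apply the rule from the text: 0.5 or more is positive."""
    return score >= 0.5


for stars in range(1, 6):
    score = stars_to_score(stars)
    label = "positive" if is_positive(score) else "negative"
    print(stars, score, label)
```

With this mapping a 3-star review lands exactly on 0.5 and therefore counts as positive; a different mapping may be more appropriate for a given site.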

How to use

The rgmining-fraud-eagle package created for this article can be installed from PyPI.

pip install --upgrade rgmining-fraud-eagle

This installs a package called fraud_eagle, so create an instance of its ReviewGraph class. Fraud Eagle takes one parameter that must be greater than 0 and less than 0.5. The optimal value depends on the data set, but here we use the midpoint of the range, 0.25.

import fraud_eagle as feagle

graph = feagle.ReviewGraph(0.25)
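
Because the parameter must lie strictly between 0 and 0.5, it can help to check it before building a graph. The following is a minimal sketch; `check_epsilon` and the name `epsilon` are hypothetical choices made here, and the candidate grid is just one way to prepare several values to compare on a data set.

```python
def check_epsilon(epsilon):
    """Return epsilon unchanged if 0 < epsilon < 0.5, else raise ValueError.

    Hypothetical helper; not part of the fraud_eagle package.
    """
    if not 0 < epsilon < 0.5:
        raise ValueError("epsilon must satisfy 0 < epsilon < 0.5, got %r" % (epsilon,))
    return epsilon


# Evenly spaced candidates strictly inside (0, 0.5), for trying several
# parameter values on the same data set.
candidates = [check_epsilon(k / 20) for k in range(1, 10)]
print(candidates)
```

A checked value can then be passed to the constructor in the same way as above, for example feagle.ReviewGraph(check_epsilon(0.25)).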

Next, add reviewers, products, and reviews to the graph. The graph shown in the figure above can be built as follows.

reviewers = [graph.new_reviewer("reviewer-{0}".format(i)) for i in range(2)]
products = [graph.new_product("product-{0}".format(i)) for i in range(3)]
graph.add_review(reviewers[0], products[0], 0.2)
graph.add_review(reviewers[0], products[1], 0.9)
graph.add_review(reviewers[0], products[2], 0.6)
graph.add_review(reviewers[1], products[0], 0.1)
graph.add_review(reviewers[1], products[1], 0.7)

Here, reviewers and products are created with the new_reviewer and new_product methods of ReviewGraph, and each review is added with the `add_review` method.
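
When reviews come as a list of records rather than hand-written calls, the same three methods can be driven from a loop. This is a minimal sketch: `load_reviews` is a hypothetical helper, and `RecordingGraph` is a stand-in that only records calls so the example runs without the package installed. With the real package, an instance of ReviewGraph would be passed instead.

```python
def load_reviews(graph, records):
    """Add (reviewer_name, product_name, score) records to a review graph.

    `graph` only needs new_reviewer, new_product and add_review,
    the methods used in the example above.
    """
    reviewers = {}
    products = {}
    for reviewer_name, product_name, score in records:
        if not 0 <= score <= 1:
            raise ValueError("score must be between 0 and 1")
        # Create each reviewer and product once, on first appearance.
        if reviewer_name not in reviewers:
            reviewers[reviewer_name] = graph.new_reviewer(reviewer_name)
        if product_name not in products:
            products[product_name] = graph.new_product(product_name)
        graph.add_review(reviewers[reviewer_name], products[product_name], score)
    return reviewers, products


class RecordingGraph:
    """Stand-in for ReviewGraph that only records calls (not the real class)."""

    def __init__(self):
        self.reviews = []

    def new_reviewer(self, name):
        return ("reviewer", name)

    def new_product(self, name):
        return ("product", name)

    def add_review(self, reviewer, product, score):
        self.reviews.append((reviewer, product, score))


records = [
    ("reviewer-0", "product-0", 0.2),
    ("reviewer-0", "product-1", 0.9),
    ("reviewer-0", "product-2", 0.6),
    ("reviewer-1", "product-0", 0.1),
    ("reviewer-1", "product-1", 0.7),
]
demo_graph = RecordingGraph()
demo_reviewers, demo_products = load_reviews(demo_graph, records)
print(len(demo_reviewers), len(demo_products), len(demo_graph.reviews))
```

The records reproduce the two reviewers, three products, and five reviews of the example above.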

After building the graph, call the `update` method of ReviewGraph repeatedly until the updates converge. Fraud Eagle uses an algorithm called loopy belief propagation, and one call to update corresponds to one iteration. Since update returns the maximum amount of change in that iteration, stop iterating once this value becomes sufficiently small.

print("Start iterations.")
max_iteration = 10000
for i in range(max_iteration):

    # Run one iteration.
    diff = graph.update()
    print("Iteration %d ends. (diff=%s)" % (i + 1, diff))

    # Treat the run as converged once the maximum change falls below 10^-5.
    if diff < 10**-5:
        break
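
The loop above can also be wrapped in a function that reports whether convergence was actually reached or the iteration limit was hit first. This is a minimal sketch: `run_until_converged` is a hypothetical helper that relies only on update returning the maximum change, and `HalvingGraph` is a stand-in used so the example runs without the package.

```python
def run_until_converged(graph, tolerance=1e-5, max_iterations=10000):
    """Call graph.update() until its return value drops below tolerance.

    Returns (iterations_run, last_diff, converged).
    """
    diff = float("inf")
    for i in range(max_iterations):
        diff = graph.update()
        if diff < tolerance:
            return i + 1, diff, True
    # The limit was reached without the change becoming small enough.
    return max_iterations, diff, False


class HalvingGraph:
    """Stand-in whose update() reports half the previous change each call."""

    def __init__(self):
        self.diff = 1.0

    def update(self):
        self.diff /= 2
        return self.diff


iterations, last_diff, converged = run_until_converged(HalvingGraph())
print(iterations, last_diff, converged)
```

Returning the converged flag makes it possible to tell a real convergence apart from simply running out of iterations, which the plain loop does not distinguish.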

Finally, retrieve the analysis results. The reviewer object returned by the new_reviewer method has an attribute called `anomalous_score`. This attribute takes a value from 0 to 1 and indicates how anomalous the reviewer is, that is, how likely the reviewer is to be a spammer.

for r in graph.reviewers:
    print(r.name, r.anomalous_score)
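
To turn the scores into a short list of suspects, reviewers can be filtered and sorted by `anomalous_score`. This is a minimal sketch: `rank_suspects` and the 0.5 threshold are hypothetical, the `Reviewer` namedtuple is a stand-in for the objects in graph.reviewers, and the sample scores are invented for illustration, not output of the algorithm.

```python
from collections import namedtuple

# Stand-in exposing the two attributes used above (name, anomalous_score).
Reviewer = namedtuple("Reviewer", ["name", "anomalous_score"])


def rank_suspects(reviewers, threshold=0.5):
    """Return reviewers with anomalous_score >= threshold, highest first."""
    suspects = [r for r in reviewers if r.anomalous_score >= threshold]
    return sorted(suspects, key=lambda r: r.anomalous_score, reverse=True)


# Invented sample values, for illustration only.
sample = [
    Reviewer("reviewer-0", 0.12),
    Reviewer("reviewer-1", 0.87),
    Reviewer("reviewer-2", 0.55),
]
ranked = rank_suspects(sample)
for r in ranked:
    print(r.name, r.anomalous_score)
```

A suitable threshold depends on the data set, so the default here should be treated only as a starting point.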

Also, the product object returned by the new_product method has an attribute called summary. Its value is the average of the review scores given to the product, weighted by each reviewer's `anomalous_score` (that is, an average that discounts reviews from reviewers with a large `anomalous_score`).

for p in graph.products:
    print(p.name, p.summary)
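
To make the idea of the weighted average concrete, here is a minimal sketch. It assumes a weight of 1 - anomalous_score for each review, which is my assumption for illustration; the exact weighting used inside the package may differ. `weighted_summary` is a hypothetical helper and the sample numbers are invented.

```python
def weighted_summary(reviews):
    """Average review scores, down-weighting anomalous reviewers.

    `reviews` is an iterable of (score, anomalous_score) pairs.
    Assumption: each review is weighted by 1 - anomalous_score.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for score, anomalous_score in reviews:
        weight = 1.0 - anomalous_score
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        raise ValueError("every reviewer has anomalous_score 1; nothing to average")
    return weighted_sum / total_weight


# Two trusted low reviews and one fully distrusted high review (invented values).
sample_reviews = [(0.2, 0.0), (0.3, 0.0), (1.0, 1.0)]
plain_mean = sum(score for score, _ in sample_reviews) / len(sample_reviews)
print(plain_mean, weighted_summary(sample_reviews))
```

In this sample the plain mean is about 0.5, while the weighted value is about 0.25, because the review from the fully distrusted reviewer contributes nothing.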

Summary

I implemented the Fraud Eagle algorithm for finding colluding spam reviewers. I have also published artificial data for evaluating this type of spam reviewer detection algorithm. How to use the artificial data is summarized in "Data set for evaluation of spam reviewer detection algorithm". I hope this will be helpful for those who are working on countermeasures against spam reviews.
