I implemented collaborative filtering (recommendation) with Redis and Python

What we will build

Let's create the "People who bought this product also bought these products" feature that often appears on Amazon.


Underlying techniques and implementation approach

This kind of feature is generally called a recommendation feature. There are two main ways to implement recommendations: "collaborative filtering" and "content-based filtering".

Content-based filtering attaches attribute tags to products in advance. For example, to recommend products for "The Old Man and the Sea" (Hemingway) from the example above, you could tag each book with its author; other books written by Hemingway would then be displayed as recommendations.
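As a rough illustration of the tagging idea above, here is a minimal sketch. The catalog contents and the function name `recommend_by_attribute` are assumptions made up for this example, not part of the implementation in this article.

```python
# Hypothetical catalog: product title -> attribute tags (illustrative data only).
catalog = {
    "The Old Man and the Sea": {"author": "Hemingway"},
    "A Farewell to Arms": {"author": "Hemingway"},
    "The Great Gatsby": {"author": "Fitzgerald"},
}


def recommend_by_attribute(title, attribute):
    """Return the other products that share the given attribute value with `title`."""
    value = catalog[title][attribute]
    return sorted(
        other
        for other, tags in catalog.items()
        if other != title and tags.get(attribute) == value
    )


print(recommend_by_attribute("The Old Man and the Sea", "author"))
# → ['A Farewell to Arms']
```

Note that this approach needs no purchase history at all, only the tags, which is the main difference from collaborative filtering.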

Collaborative filtering recommends products that were bought by other customers who bought this product.

This time, we will implement "collaborative filtering".

I use Redis and Python.

Redis is a key-value store (KVS). This article uses the Redis Sorted Set data type.

Installing Redis

MacPorts: http://blog.katsuma.tv/2010/03/start_redis.html
Homebrew: http://qiita.com/items/3d2a2fc683ae19302071

Why Redis

Computing the recommended products on every request is not realistic in terms of computational cost, so they need to be computed in advance and recorded in a form that is easy to retrieve. (Any store that makes recording and retrieval easy would work just as well as Redis.)

What is Sorted Set?

A collection of unique members, each with a score, that Redis keeps sorted by score automatically as data is added.
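To make the behavior concrete without a running server, here is a small in-memory stand-in written for this explanation. `TinySortedSet` is a hypothetical class, not part of redis-py; it only imitates the ordering of the `ZADD` and `ZREVRANGE` commands and assumes non-negative indices.

```python
class TinySortedSet(object):
    """In-memory stand-in for a Redis Sorted Set (illustration only)."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, mapping):
        # Adding an existing member simply updates its score.
        self._scores.update(mapping)

    def zrevrange(self, start, stop):
        # Highest score first; members with equal scores fall back to
        # reverse lexicographic order, as Redis does for ZREVRANGE.
        ordered = sorted(
            self._scores,
            key=lambda member: (self._scores[member], member),
            reverse=True,
        )
        # Like Redis, `stop` is inclusive (non-negative indices assumed).
        return ordered[start:stop + 1]


s = TinySortedSet()
s.zadd({"A": 0.2, "B": 0.5})
s.zadd({"D": 1.0 / 3})
print(s.zrevrange(0, 1))
# → ['B', 'D']
```

The caller never sorts anything explicitly: inserting in any order and reading the top N is all that is needed, which is why this data type suits precomputed recommendations.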


Implementation of collaborative filtering

Collaborative filtering can be implemented once the similarity of every other product to product X can be expressed as a numeric value.


Similarity formula

There are many similarity measures, but the Jaccard index is a common choice. With the sample data used in this article, the similarity of product A to product X is 1/5. The 1 is the number of customers who purchased both product X and product A (the size of the intersection). The 5 is the total number of customers who purchased either product X or product A (the size of the union).
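Written out as a formula, with X = {1, 3, 5} and A = {2, 4, 5} being the sets of customer IDs that bought each product in this article's sample data, the calculation looks like this:

```latex
J(X, A) = \frac{|X \cap A|}{|X \cup A|}
        = \frac{|\{5\}|}{|\{1, 2, 3, 4, 5\}|}
        = \frac{1}{5} = 0.2
```

The value ranges from 0 (no customers in common) to 1 (exactly the same customers).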


Sample data used this time

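Customer IDs that purchased each product (the same data as in the code below):

Product X: 1, 3, 5
Product A: 2, 4, 5
Product B: 1, 2, 3
Product C: 2, 3, 4, 7
Product D: 3
Product E: 4, 6, 7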

Implementation

# -*- coding: utf-8 -*-
from __future__ import absolute_import
from __future__ import unicode_literals


def jaccard(e1, e2):
    """
    Calculate the Jaccard index.
    :param e1: list of int
    :param e2: list of int
    :rtype: float
    """
    set_e1 = set(e1)
    set_e2 = set(e2)
    return float(len(set_e1 & set_e2)) / float(len(set_e1 | set_e2))


def get_key(k):
    return 'JACCARD:PRODUCT:{}'.format(k)

# IDs of the customers who purchased each product (e.g. product X was bought by customers 1, 3, 5)
product_x = [1, 3, 5]
product_a = [2, 4, 5]
product_b = [1, 2, 3]
product_c = [2, 3, 4, 7]
product_d = [3]
product_e = [4, 6, 7]

# Product data
products = {
    'X': product_x,
    'A': product_a,
    'B': product_b,
    'C': product_c,
    'D': product_d,
    'E': product_e,
}

# Connect to Redis (assumes redis-py 3.0 or later, Python 3, and a local Redis server)
import redis
r = redis.Redis(host='localhost', port=6379, db=10, decode_responses=True)

# Calculate the Jaccard index for every product pair and record it in one Redis Sorted Set per product
for key in products:
    base_customers = products[key]
    for key2 in products:
        if key == key2:
            continue
        target_customers = products[key2]
        # Calculate the Jaccard index
        j = jaccard(base_customers, target_customers)
        # Record in the Redis Sorted Set (redis-py 3.0+ takes a {member: score} mapping)
        r.zadd(get_key(key), {key2: j})

# Example 1: people who bought product X also bought these products.
print(r.zrevrange(get_key('X'), 0, 2))
# > ['B', 'D', 'A']

# Example 2: people who bought product E also bought these products.
print(r.zrevrange(get_key('E'), 0, 2))
# > ['C', 'A', 'X']

Let's look at the values in Redis

[Image: r.png, the values stored in Redis]

Let's check

Products B, D, and A are recommended for those who bought product X. Checking by hand, their similarities to X are 0.5, 0.33, and 0.2 respectively, so the recommendations look correct.
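The same check can be done without Redis. The following sketch recomputes the similarities to product X in plain Python from the sample data; the variable names `scores` and `ranking` are just for this example.

```python
def jaccard(e1, e2):
    """Jaccard index of two lists of customer IDs."""
    s1, s2 = set(e1), set(e2)
    return len(s1 & s2) / float(len(s1 | s2))


# Sample data from this article: product -> customer IDs
products = {
    'X': [1, 3, 5],
    'A': [2, 4, 5],
    'B': [1, 2, 3],
    'C': [2, 3, 4, 7],
    'D': [3],
    'E': [4, 6, 7],
}

# Similarity of every other product to product X
scores = {
    name: jaccard(products['X'], customers)
    for name, customers in products.items()
    if name != 'X'
}

# Highest similarity first
ranking = sorted(scores, key=lambda name: (scores[name], name), reverse=True)
for name in ranking[:3]:
    print(name, round(scores[name], 2))
# → B 0.5
# → D 0.33
# → A 0.2
```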


Problems with this method

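Every product is compared against every other product, so the amount of calculation grows quadratically with the number of products, and each comparison gets more expensive as the number of customers grows. With a realistic number of customers and products it quickly becomes impractical.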

Solution

Let's build an inverted index, following Amazon's approach: http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf
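One way to read the inverted-index idea is sketched below. This is my own minimal interpretation, not the algorithm from the linked paper verbatim: index purchases by customer, then count co-purchases only for product pairs that actually share a customer, so pairs with nothing in common are never touched. The names `by_customer`, `co_counts`, and `similarity` are assumptions for this example.

```python
from collections import defaultdict
from itertools import combinations

# Sample data from this article: product -> customer IDs
purchases = {
    'X': [1, 3, 5],
    'A': [2, 4, 5],
    'B': [1, 2, 3],
    'C': [2, 3, 4, 7],
    'D': [3],
    'E': [4, 6, 7],
}

# Inverted index: customer ID -> set of products that customer bought
by_customer = defaultdict(set)
for product, customers in purchases.items():
    for customer in customers:
        by_customer[customer].add(product)

# Count shared customers only for product pairs bought by the same customer
co_counts = defaultdict(int)
for bought in by_customer.values():
    for p, q in combinations(sorted(bought), 2):
        co_counts[(p, q)] += 1


def similarity(p, q):
    """Jaccard index derived from the co-purchase counts."""
    shared = co_counts.get(tuple(sorted((p, q))), 0)
    union = len(set(purchases[p])) + len(set(purchases[q])) - shared
    return shared / float(union)


print(similarity('X', 'B'))
# → 0.5
```

Here the union size comes from |P| + |Q| - |P ∩ Q|, so no set operations over customer lists are needed at query time. The pair (E, X) never appears in `co_counts` because no customer bought both.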
