A memo on implementing recommendations in Python

Introduction

I need to use recommendations in my work, so I researched Python recommendation libraries and summarized what I found as a memo. The focus is a brief introduction to each library rather than an explanation of the algorithms, so please refer to other resources as needed.

I had never worked with recommendations before, but these days I feel I need to study them seriously.

crab

Homepage: http://muricoca.github.io/crab/
GitHub: https://github.com/muricoca/crab

This was the first library I found when looking for collaborative filtering implementations in Python. It reportedly supports both item-based and user-based collaborative filtering, but the last update to master on GitHub was 4 years ago, so it does not seem to be actively used anymore. It also did not work well in a modern environment because of its dependencies on other libraries.

Slides from the SciPy 2011 conference presentation: http://conference.scipy.org/scipy2011/slides/caraciolo_crab_recommendation.pdf
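Since crab no longer runs easily, here is a minimal, library-free sketch of what user-based collaborative filtering computes: find the users whose ratings most resemble the target user's, then predict a rating as their similarity-weighted average. The toy data and the names `cosine` and `predict_user_based` are hypothetical, and this is not crab's API. For simplicity it uses plain cosine similarity over co-rated items, without mean-centering.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 3, "b": 1, "c": 2, "d": 3},
    "carol": {"a": 4, "b": 3, "c": 4, "d": 5},
    "dave": {"b": 3, "c": 1, "d": 5},
}


def cosine(u, v):
    """Cosine similarity of two users, computed over the items both rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_user_based(ratings, user, item, k=2):
    """Predict `user`'s rating of `item` from the k most similar users who rated it."""
    neighbors = [
        (cosine(ratings[user], row), row[item])
        for other, row in ratings.items()
        if other != user and item in row
    ]
    neighbors.sort(reverse=True)  # most similar users first
    top = neighbors[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # nobody comparable rated this item
    return sum(sim * r for sim, r in top) / total


print(predict_user_based(ratings, "alice", "d"))
```

Item-based filtering follows the same pattern with the roles of users and items swapped: compare item rating vectors, then average the user's own ratings of the most similar items.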

python-recsys

Homepage: http://ocelma.net/software/python-recsys/build/html/index.html
GitHub: https://github.com/ocelma/python-recsys

It supports collaborative filtering with singular value decomposition (SVD) and with neighborhood-based algorithms. A computed model can be saved to a file and reused, and there are many methods for evaluation, so unless you are pursuing accuracy, this is the easiest library to use.
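As an illustration of the kind of accuracy metric such evaluation typically reports, here is a sketch of RMSE and MAE computed over held-out ratings. The function names and the data are hypothetical; this is not python-recsys's own evaluation API.

```python
import math


def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


# Hypothetical held-out ratings paired with a model's predictions.
held_out = [(4.2, 4), (3.1, 3), (1.8, 2), (4.9, 5)]
print(f"RMSE={rmse(held_out):.3f}  MAE={mae(held_out):.3f}")
```

RMSE squares each error before averaging, so it penalizes a few large misses more heavily than MAE does.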

However, python-recsys does not support Nonnegative Matrix Factorization (NMF), which has become mainstream in recent years, so if you want to use NMF, you should implement it yourself with nimfa, described below.

For this project I also needed to compute the similarity between items, so I chose python-recsys.
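Item-to-item similarity can be sketched as follows, assuming ratings stored as a dict of dicts: each item becomes a vector of the ratings users gave it, and items are compared with cosine similarity, treating unrated entries as zero. This is a hypothetical, library-free illustration, not how python-recsys computes similarity internally.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 3, "b": 1, "c": 2, "d": 3},
    "carol": {"a": 4, "b": 3, "c": 4, "d": 5},
    "dave": {"b": 3, "c": 1, "d": 5},
}


def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def item_cosine(x, y):
    """Cosine similarity of two item vectors; a missing rating counts as 0."""
    dot = sum(x[u] * y[u] for u in set(x) & set(y))
    norm_x = math.sqrt(sum(r * r for r in x.values()))
    norm_y = math.sqrt(sum(r * r for r in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def most_similar(ratings, item, n=2):
    """Return the n items most similar to `item` as (similarity, item) pairs."""
    items = item_vectors(ratings)
    scored = [
        (item_cosine(items[item], vec), other)
        for other, vec in items.items()
        if other != item
    ]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:n]


print(most_similar(ratings, "a"))
```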

nimfa

Homepage: http://nimfa.biolab.si
GitHub: https://github.com/marinkaz/nimfa

There does not seem to be a recommendation library based on NMF, which has become popular in recent years, but nimfa provides the matrix factorization routines that such an implementation depends on, so building one with it seems feasible without much difficulty. It implements quite a rich set of algorithms: there are more than 10 factorization methods alone.
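To show the kind of factorization nimfa provides, here is a pure-Python sketch of NMF with Lee and Seung's multiplicative updates for the Frobenius-norm objective. It is not nimfa's API, and all names are hypothetical. As an assumption suited to rating data, it uses a masked (weighted) variant: a mask `M` marks observed ratings, so missing entries are ignored instead of being fitted as zeros. Predicted scores for missing entries are then read off the product `W H`.

```python
import random


def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def masked_product(M, W, H):
    """Elementwise M * (W H): the reconstruction restricted to observed entries."""
    WH = matmul(W, H)
    return [[m * x for m, x in zip(mrow, row)] for mrow, row in zip(M, WH)]


def masked_error(V, M, W, H):
    """Squared reconstruction error over observed entries (M[i][j] == 1) only."""
    WH = matmul(W, H)
    return sum(M[i][j] * (V[i][j] - WH[i][j]) ** 2
               for i in range(len(V)) for j in range(len(V[0])))


def nmf(V, M, rank=2, iters=200, eps=1e-9, seed=0):
    """Factor V ~ W H with W, H >= 0 via masked multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(rank)]
    MV = [[M[i][j] * V[i][j] for j in range(m)] for i in range(n)]
    for _ in range(iters):
        # H <- H * (W^T (M*V)) / (W^T (M*WH)); eps avoids division by zero
        Wt = transpose(W)
        num, den = matmul(Wt, MV), matmul(Wt, masked_product(M, W, H))
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * ((M*V) H^T) / ((M*WH) H^T), using the updated H
        Ht = transpose(H)
        num, den = matmul(MV, Ht), matmul(masked_product(M, W, H), Ht)
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H


# Hypothetical 4 users x 4 items; 0 marks a missing rating.
V = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
]
M = [[1 if r > 0 else 0 for r in row] for row in V]
W, H = nmf(V, M, rank=2)
print(round(matmul(W, H)[1][1], 2))  # score for user 1 on item 1, which was unrated
```

Because every update multiplies by a ratio of nonnegative quantities, `W` and `H` stay nonnegative without any explicit projection step.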

References on NMF:
Matrix Factorization Techniques for Recommender Systems: http://www2.research.att.com/~volinsky/papers/ieeecomputer.pdf
Basics of nonnegative matrix factorization (NMF) and its application to data/signal analysis: http://www.kecl.ntt.co.jp/icl/signal/sawada/mypaper/829-833_9_02.pdf
Non-negative Matrix Factorization: http://d.hatena.ne.jp/a_bicky/20100325/1269479839

Spark + MLlib

MLlib - Collaborative Filtering

Spark's MLlib also includes a recommendation implementation, so if your data is too large to handle by scaling up a single machine and you need to distribute the processing, you should use it. MLlib implements matrix factorization with a technique called Alternating Least Squares (ALS), and a Python API is provided as well.
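The idea behind ALS can be sketched on a single machine: fix the item factors and solve a small regularized least-squares problem for each user, then fix the user factors and do the same for each item, and repeat. This pure-Python version is only an illustration under those assumptions; MLlib runs the alternation distributed across a cluster, and in PySpark you would use its own `ALS` class (for example `pyspark.mllib.recommendation.ALS`) rather than anything below. The names `solve`, `update`, `als`, and `predict` are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b (small, dense) by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def update(fixed, observed, rank, lam):
    """Minimize sum (r - x.y)^2 + lam*|x|^2 over observed (idx, r), y = fixed[idx].

    lam > 0 keeps the normal equations solvable even with no observations.
    """
    A = [[lam if i == j else 0.0 for j in range(rank)] for i in range(rank)]
    b = [0.0] * rank
    for idx, r in observed:
        y = fixed[idx]
        for i in range(rank):
            b[i] += r * y[i]
            for j in range(rank):
                A[i][j] += y[i] * y[j]
    return solve(A, b)


def als(ratings, n_users, n_items, rank=2, iters=10, lam=0.1, seed=0):
    """Explicit-feedback ALS on (user, item, rating) triples with integer ids."""
    rng = random.Random(seed)
    V = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    U = []
    for _ in range(iters):
        U = [update(V, by_user[u], rank, lam) for u in range(n_users)]  # items fixed
        V = [update(U, by_item[i], rank, lam) for i in range(n_items)]  # users fixed
    return U, V


def predict(U, V, u, i):
    return sum(a * b for a, b in zip(U[u], V[i]))


# Hypothetical ratings as (user, item, rating); user 2 has not rated item 3.
data = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1),
        (2, 0, 1), (2, 1, 1), (2, 2, 5), (3, 1, 1), (3, 2, 4), (3, 3, 4)]
U, V = als(data, n_users=4, n_items=4, rank=2, iters=20)
print(round(predict(U, V, 2, 3), 2))
```

Each half-step is an ordinary least-squares problem with a closed-form solution, which is what makes the method easy to parallelize: every user (or item) can be solved independently.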

References:
Spark and Matrix Factorization: http://stanford.edu/~rezab/slides/reza_codeneuro.pdf
Implementation of a recommendation system on Dataproc using Spark's MLlib: http://qiita.com/kndt84/items/b975ac9e6552f5289ec9

Summary

If you want to implement recommendations in Python with minimal effort, I think python-recsys is the quickest way. However, it does not support NMF, which is popular these days, so if you want to use NMF, I think it is better to implement it with nimfa.

Also, if you need to handle more data than you can manage by scaling up a single machine, Spark's MLlib provides a recommendation implementation with a Python API, so I think it is the better choice. I have tested it separately and will cover it in another article.
