[PYTHON] Recommendation tutorial using association analysis (concept)

About this article

I wrote this article because there were few tutorials on recommendations that are implemented with sample data.
Recommendations can be built with machine learning and other methods, but this article explains how to build them with statistics-based methods.
I will explain using Python and an open dataset.

What is a recommendation?

What is a recommendation in the first place? It is **the provider recommending products, services, etc. that the customer may be interested in**.
Below is an example from ●MAZON: on the product page of a certain "comforter", **"comforter cover"** and **"mattress"** are shown as recommended products.

[Image: ●MAZON product page for a comforter, with a comforter cover and a mattress shown as recommended products]

A "comforter", a "comforter cover", and a "mattress" certainly seem related, and some people probably buy them together.
This is exactly the aim: by making recommendations, you can make customers aware of other products that tend to be "bought together".

Types of recommendations

Recommendations can be broadly divided into **"content-based"** and **"transaction-based"**.

[Image: content-based vs. transaction-based recommendations]

Each has its advantages and disadvantages, but because the two can be combined, each can compensate for the other's weaknesses.

Recommendations that utilize association analysis

Recommendations that use association analysis, the theme of this article, belong to the **"transaction-based recommendations"** type above.
Recommending a "comforter cover" and a "mattress" for a "comforter" in the earlier ●MAZON example is also a transaction-based recommendation.


**"Transaction-based recommendations" basically surface products that are "bought together".**

What is association analysis?

Association analysis uncovers relationships between products X and Y, for example, "when product X is bought, product Y is likely to be bought at the same time (or next)". This is exactly what a recommendation needs.

Association analysis is a statistical approach, and its details and theory are very well organized on the site linked here. Please refer to that site for the details; this article explains only the concepts, without the theory.

Association analysis method

There are two methods for assessing relevance in association analysis:

  1. **The method using confidence**
  2. **The method using lift**

In this article, we will use method 2, **the method using the lift value**. Since method 2 builds on method 1, I will explain method 1 first.
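For reference, the quantities behind both methods have standard textbook definitions, shown below. Here $N$ is the total number of transactions and $N(\cdot)$ is the number of transactions containing all the listed products; this notation is introduced for illustration and is not taken from the linked site.

```latex
\begin{aligned}
\mathrm{support}(Y) &= \frac{N(Y)}{N} \\
\mathrm{confidence}(X \Rightarrow Y) &= \frac{N(X, Y)}{N(X)} \\
\mathrm{lift}(X \Rightarrow Y) &= \frac{\mathrm{confidence}(X \Rightarrow Y)}{\mathrm{support}(Y)}
\end{aligned}
```

A lift above 1 means Y is bought with X more often than Y's overall popularity alone would predict; a lift of exactly 1 means the two purchases are statistically independent.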

1. Concept of confidence

Simply put, it is a way to find a product Y that is bought at the same time (or next) when a product X is bought. See the figure below.

[Image: purchase data of the customers who bought a comforter]

As a small example, suppose we have extracted the data of the customers who bought the comforter, as shown above.
Of these 6 customers, after purchasing the comforter, 2 bought a comforter cover, 2 bought mineral water, and 1 bought another product.
Looking at this result alone, **the comforter cover and mineral water are the products most related to the comforter.**

This is the idea behind **confidence**.
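The confidence calculation for the comforter example can be sketched in Python as follows. The baskets are hypothetical: they mirror the counts in the text (2 comforter covers, 2 mineral waters, 1 other product), the sixth customer, whom the text does not account for, is assumed to have bought only the comforter, and "bought next" is simplified to "in the same basket". The `confidence` helper is an illustrative name, not a library function.

```python
# Hypothetical baskets for the 6 comforter buyers, mirroring the counts in the text.
# Assumption: the sixth customer bought nothing besides the comforter.
comforter_buyers = [
    {"comforter", "comforter cover"},
    {"comforter", "comforter cover"},
    {"comforter", "mineral water"},
    {"comforter", "mineral water"},
    {"comforter", "other product"},
    {"comforter"},
]

def confidence(transactions, x, y):
    """Share of transactions containing x that also contain y: N(x, y) / N(x)."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0  # no transaction contains x, so the rule is undefined; treat as 0
    return sum(1 for t in with_x if y in t) / len(with_x)

for y in ["comforter cover", "mineral water", "other product"]:
    print(f"confidence(comforter -> {y}) = {confidence(comforter_buyers, 'comforter', y):.2f}")
# confidence(comforter -> comforter cover) = 0.33
# confidence(comforter -> mineral water) = 0.33
# confidence(comforter -> other product) = 0.17
```

The comforter cover and mineral water tie, which matches the conclusion above: confidence alone cannot tell them apart.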

2. Concept of lift

There is one caveat to the confidence approach in 1. See the figure below.

[Image: the same six customers with additional purchase data]

Suppose we have collected a little more purchase data from these six customers.
In this example, mineral water is a popular product to begin with and is frequently bought regardless of the comforter. In that case, it is questionable to say that mineral water is strongly related specifically to the comforter.

On the other hand, since mineral water is bought so often anyway, you might argue that it should be recommended alongside product X as well.
That is not necessarily a bad idea, but personally I would avoid it for the following reasons:

  1. There is no need to recommend popular products, because customers already know about them.
  2. From the customer's point of view, such a product feels odd as a recommendation.

**With the idea of lift, we can exclude such popular products and find related products Y that are specific to product X.**
In practice, as described on the site linked here, the flow is to calculate the lift value from the transaction data and then use it to derive highly related product pairs (X, Y).
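The lift idea can be sketched in a few lines of Python. This is a minimal illustration, not the code from the implementation article: the six comforter baskets follow the counts in the text (the sixth assumed to contain only the comforter), the remaining baskets are invented so that mineral water is popular overall, and `support`, `lift`, and `recommend` are hypothetical helper names. Lift uses the standard formula lift(X → Y) = support(X, Y) / (support(X) × support(Y)).

```python
# Hypothetical transaction data for all customers.
transactions = [
    # the 6 comforter buyers, mirroring the counts in the text
    {"comforter", "comforter cover"},
    {"comforter", "comforter cover"},
    {"comforter", "mineral water"},
    {"comforter", "mineral water"},
    {"comforter", "other product"},
    {"comforter"},
    # invented customers without a comforter: mineral water is popular overall
    {"mineral water"},
    {"mineral water"},
    {"mineral water"},
    {"mineral water"},
    {"mineral water", "snacks"},
    {"snacks"},
]

def support(transactions, items):
    """Fraction of all transactions that contain every product in `items`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if items <= t) / len(transactions)

def lift(transactions, x, y):
    """lift(x -> y) = support({x, y}) / (support({x}) * support({y}))."""
    denom = support(transactions, {x}) * support(transactions, {y})
    return support(transactions, {x, y}) / denom if denom else 0.0

def recommend(transactions, x, min_lift=1.0):
    """Products bought with x whose lift exceeds min_lift, highest lift first."""
    candidates = set().union(*transactions) - {x}
    scored = [(y, lift(transactions, x, y)) for y in candidates]
    return sorted((s for s in scored if s[1] > min_lift), key=lambda s: (-s[1], s[0]))

for y in ["comforter cover", "mineral water"]:
    print(f"lift(comforter -> {y}) = {lift(transactions, 'comforter', y):.2f}")
# lift(comforter -> comforter cover) = 2.00
# lift(comforter -> mineral water) = 0.57
print(recommend(transactions, "comforter"))
# [('comforter cover', 2.0), ('other product', 2.0)]
```

Both products have the same confidence among comforter buyers, but mineral water falls below a lift of 1 because it is popular overall, so it drops out of the recommendations. "Other product" also reaches a lift of 2.0 from a single purchase, which is why practical implementations usually combine lift with a minimum support threshold.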

Tutorial using sample data

I have added a follow-up article here.
