Implemented Matrix Factorization (python)

I implemented Matrix Factorization, which is the basis of the recommended model, in python.

What is a recommender system?

It is a system that predicts the preference for items and presents the most suitable items to specific users. Examples of implementation as a service include Amazon's "Recommended products for you" and Youtube's "Recommended videos".

What is Matrix Factorization (MF)?

An algorithm often used in recommendation.

Based on the evaluation value information for the user's item, the evaluation value of the item that has not been evaluated is predicted, and the item that is likely to have a high evaluation value is recommended.

First, there is a matrix (A) consisting of m users and n items, and each value in it is the evaluation value for an item of a certain user. This matrix (A) is decomposed into a user matrix (U) consisting of users and latent variables (P) and an item matrix (I) consisting of items and latent variables (Q).

The value obtained by taking the inner product of the user matrix and the item matrix is used as the predicted value. The parameter is updated many times until the error between the predicted value and the correct answer value falls below the threshold value or the number of repetitions exceeds a certain number.

Model implementation / data reading

We adopted https://grouplens.org/datasets/movielens/100k/ as the data set.

import numpy as np
import random
import pandas as pd
import math

class MFx(object):
	#Set each variable
	def __init__(self,K=20,alpha=1e-6,beta=0.0):
		self.K=K
		self.alpha=alpha
		self.beta=beta

	def fit(self,X,n_user,n_item,n_iter=100):
		self.R=X.copy()
		self.samples=X.copy()

		#Set initial value for latent factor variable
		self.user_factors = np.random.rand(n_user,self.K)
		self.item_factors = np.random.rand(n_item,self.K)

		#stochastic gradient descent
		self.loss=[]
		for i in range(n_iter):
			self.sgd()
			mse=self.mse()
			self.loss.append((i,mse))

	def sgd(self):
		np.random.shuffle(self.samples)
		for user,item,rating in self.samples:
			err=rating-self.predict_pair(user,item)

			#update parameter
			self.user_factors[user] += self.alpha*(err*self.item_factors[item]-self.beta*self.user_factors[user])
			self.item_factors[item] += self.alpha*(err*self.user_factors[user]-self.beta*self.item_factors[item])

	def mse(self):
		predicted = self.predict(self.R)
		error=np.hstack((self.R,np.array(predicted).reshape(-1,1)))
		error=np.sqrt(pow((error[:,2]-error[:,3]),2).mean())
		return error

	#Evaluation value prediction for a certain item by a certain user (inner is the inner product of vectors)
	def predict_pair(self,user,item):
		return np.inner(self.user_factors[user],self.item_factors[item])

	def predict(self,X):
		rate=[]
		for row in X:
			rate.append(self.predict_pair(row[0],row[1]))
		return rate

	def get_full_matrix(self):
		return np.inner(self.user_factors,self.item_factors)

#Data read
def load_ml100k():
	samples=pd.read_csv('data/ml-100k/u.data',sep='\t',header=None)

	samples=samples.iloc[:,:3]
	samples.columns=['user','item','rate']

	samples['user']=samples['user']-1
	samples['item']=samples['item']-1

	return samples

#Make the dataset an array type
df = np.array(load_ml100k())

#Delete the same value and count the number of users and items
n_user=np.unique(df[:,0]).max()+1
n_item=np.unique(df[:,1]).max()+1
n_rate=np.unique(df[:,2]).max()

#Random order
random.shuffle(df)
#80% of the total size is for training and the rest is for testing
train_size=int(df.shape[0]*0.8)
train_df=df[:train_size]
test_df=df[train_size:]

#MF
MF=MFx(K=20,alpha=0.01,beta=0.5)
MF.fit(train_df,n_user,n_item,n_iter=10)

Evaluation method

The following two evaluations were performed on the recommendation accuracy. --Evaluation of predicted value --Ranking evaluation

Mean squared error

Find the squared error between the predicted value and the correct answer value, and calculate the average. The lower the value, the more accurate the recommendation.

#Predict the evaluation value of items in test data
pre=MF.predict(test_df)
#f=np.array(pre)
#print(type(test_df))
#Create a matrix with the predicted value after the evaluation value of the test data
ret1=np.hstack((test_df, np.array(pre).reshape(-1, 1)))
#Calculate the mean square error between the predicted value and the actual evaluation value
print("pred")
print(np.sqrt(pow((ret1[:,2]-ret1[:,3]),2).mean()))

NDCG Create a ranking based on the correct answer value and a ranking based on the predicted value. Compare the two rankings and measure how close the predicted ranking is to the correct ranking. The range of NDCG values is 0 to 1, and the closer it is to 1, the better the ranking is generated.

def get_dcg(f):
    dcg=0.0
    s=1
    if type(f[0])==float:
        f=[f]
    for i in f:
        if s==1:
            dcg+=i[0]
            s+=1
        else:
            dcg+=i[0]/math.log2(s)
            s+=1
    return dcg
    
def get_ndcgs(pre):
    user=[]
    x=[]
    for i in np.argsort(pre[:,0]):
        user.append(pre[i][0])
        x.append(pre[i])
    users=list(set(user))
    columns=['user','item','real','pred']
    y=pd.DataFrame(data=pre, columns=columns, dtype='float')
    y.set_index("user",inplace=True)
    y_real=y.sort_values(['user','real'],ascending=[True, False])   
    y_pred=y.sort_values(['user','pred'],ascending=[True, False])

    ndcg=0
    fff=users
    for ii in fff:
        fr=y_real.loc[ii,"real":"pred"].values.tolist()
        fp=y_pred.loc[ii,"real":"pred"].values.tolist()
        ndcg+=get_dcg(fp)/get_dcg(fr)
    return ndcg/len(fff)

print("NDCG")
print(get_ndcgs(ret1[:,:4]))

Recommended Posts

Implemented Matrix Factorization (python)
[Python] Matrix operation
Implemented SimRank in Python
Python Not Implemented Error
I implemented Python Logging
Implemented Shiritori in Python
Python Math Series ② Matrix Multiplication
[Python 3] Prime factorization in 14 lines
Implemented SMO with Python + NumPy
Matrix multiplication in python numpy
Transposed matrix in Python standard
Sudoku solver implemented in Python 3
6 Ball puzzle implemented in python
Recommender system using matrix factorization [Unsupervised learning with python Chapter 10]
Matrix representation with Python standard input
Python
Implemented file download with Python + Bottle
Draw a scatterplot matrix in python
Widrow-Hoff learning rules implemented in Python
Non-negative Matrix Factorization (NMF) with scikit-learn
Implemented label propagation method in Python
Implemented Perceptron learning rules in Python
Implemented in 1 minute! LINE Notify in Python
A simple HTTP client implemented in Python
[Python] Matrix multiplication processing time using NumPy
[Python] [pandas] How is pd.DataFrame (). T implemented?
Implemented in Python PRML Chapter 7 Nonlinear SVM
VBA user tried using Python / R: Matrix
I implemented Cousera's logistic regression in Python
[Python] Region Covariance: Covariance matrix and computer vision
Find and check inverse matrix in Python
Create Excel file with Python + similarity matrix
Implemented in Python PRML Chapter 5 Neural Networks
Implemented Stooge sort in Python3 (Bubble sort & Quicksort)
Visualization of NMF (Nonnegative Matrix Factorization) Learning
Implemented in Python PRML Chapter 1 Bayesian Inference
Implemented Python wrapper for Qiita API v2
CheckIO (Python)> Non-unique Elements> I implemented it
Comparison of matrix transpose speeds with Python
Matrix Convolution Filtering-Reinventor of Python Image Processing-