[PYTHON] I tried to visualize the Beverage Preference Dataset by tensor decomposition.

This article is the day 9 entry of the Furukawa Lab Advent Calendar. It was written by a student at Furukawa Lab as part of their studies, so some of the content may be ambiguous or imprecisely worded.

Introduction

I wanted to write an article introducing the Beverage Preference Data Set, but the program is not fully working yet, so I will update this article from time to time.

Beverage Preference Data Set

The Beverage Preference Data Set is real relational data published by Furukawa Laboratory. Please refer to the link for the detailed terms of use.

The data come from a survey in which 604 users rated 14 types of beverages in each of 11 situations.

In other words, it is relational data observed for combinations of elements from three sets: (person) x (beverage) x (situation).

Import

The steps to import the Beverage Preference Data Set are as follows. The download_file and zip_extract functions are borrowed from the article "Python Tips: I want to download a zip file from the Internet and use it".

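import pandas as pd
import numpy as np

# download_file and zip_extract are the borrowed helper functions mentioned above;
# they must be defined before this code runs.
filename = download_file('http://www.brain.kyutech.ac.jp/~furukawa/beverage-e/BeveragePreferenceDataset.zip')
zip_extract(filename)
# The file is whitespace-delimited with no header row.
# sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent pandas versions.
df = pd.read_table('./BeveragePreferenceDataset/Beverage604.txt', header=None, sep=r'\s+')
df.shape
# (8456, 11)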
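The borrowed helpers are not reproduced in this article. The following is a minimal sketch of what they might look like, using only the standard library. The function bodies and the optional dest_dir parameter are assumptions for illustration, not the code from the referenced article.

```python
import os
import urllib.request
import zipfile


def download_file(url, dest_dir="."):
    """Download `url` into `dest_dir` and return the local file path.

    Assumed behavior: the local name is the last path component of the URL.
    """
    filename = os.path.join(dest_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, filename)
    return filename


def zip_extract(filename, dest_dir="."):
    """Extract every member of the zip archive `filename` into `dest_dir`."""
    with zipfile.ZipFile(filename) as zf:
        zf.extractall(dest_dir)
```

With these definitions, calling download_file(url) followed by zip_extract(filename) leaves the extracted files under the current directory, which is what the pd.read_table call above expects.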

Convert this DataFrame into third-order tensor data.


# Each user occupies 14 consecutive rows (one per beverage);
# the 11 columns are the situations.
X = np.zeros((604, 14, 11))
for i in range(X.shape[0]):
    start = i * 14
    X[i] = df.iloc[start:start + 14].values
X.shape
# (604, 14, 11)
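Because the rows are already grouped by user, the same tensor can probably be obtained with a single reshape. The sketch below checks this on a small synthetic table, not on the real data; the sizes, the random values, and the names df_small, X_loop, and X_reshaped are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real table: 4 users x 3 beverages rows, 2 situation columns.
n_users, n_beverages, n_situations = 4, 3, 2
rng = np.random.default_rng(0)
df_small = pd.DataFrame(
    rng.integers(1, 6, size=(n_users * n_beverages, n_situations))
)

# Loop version, same logic as in the article.
X_loop = np.zeros((n_users, n_beverages, n_situations))
for i in range(n_users):
    start = i * n_beverages
    X_loop[i] = df_small.iloc[start:start + n_beverages].values

# Reshape version: row-major order keeps each user's rows together.
X_reshaped = df_small.to_numpy(dtype=float).reshape(
    n_users, n_beverages, n_situations
)

print(np.array_equal(X_loop, X_reshaped))
```

For the real data the equivalent call would be df.to_numpy(dtype=float).reshape(604, 14, 11), assuming the file really is ordered user by user as the loop implies.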

Tensor decomposition

CP decomposition

An earlier article, "Tensor decomposition with PyTorch (CP decomposition)", already covers CP decomposition, so I will explain it only briefly.

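CP decomposition is a straightforward generalization of matrix factorization. It approximates the third-order tensor $ X $ by a sum of $ R $ rank-one terms, each of which is the outer product ($ \circ $) of three vectors $ u_r $, $ v_r $, $ w_r $. For this dataset the three vectors have lengths 604 (users), 14 (beverages), and 11 (situations).

$$ X \approx \sum_{r=1}^{R} u_r \circ v_r \circ w_r $$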
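To make the formula concrete, here is a minimal NumPy sketch that fits the three factor matrices with alternating least squares (ALS). This is an assumption for illustration only: the article does not show its fitting code, and the referenced article uses PyTorch rather than ALS. The names cp_als, reconstruct, and relative_error, and the synthetic tensor X_demo, are hypothetical.

```python
import numpy as np


def cp_als(X, rank, n_iter=100, seed=0):
    """Fit a CP model with `rank` components to a third-order tensor by ALS.

    Returns U (I x R), V (J x R), W (K x R); their columns play the role
    of the vectors u_r, v_r, w_r in the formula.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    U = rng.standard_normal((I, rank))
    V = rng.standard_normal((J, rank))
    W = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Each line is the least-squares solution for one factor
        # with the other two held fixed.
        U = np.einsum('ijk,jr,kr->ir', X, V, W) @ np.linalg.pinv((V.T @ V) * (W.T @ W))
        V = np.einsum('ijk,ir,kr->jr', X, U, W) @ np.linalg.pinv((U.T @ U) * (W.T @ W))
        W = np.einsum('ijk,ir,jr->kr', X, U, V) @ np.linalg.pinv((U.T @ U) * (V.T @ V))
    return U, V, W


def reconstruct(U, V, W):
    """Sum of outer products: X_hat[i, j, k] = sum_r U[i, r] * V[j, r] * W[k, r]."""
    return np.einsum('ir,jr,kr->ijk', U, V, W)


def relative_error(X, U, V, W):
    """Frobenius norm of the residual divided by the norm of X."""
    return np.linalg.norm(X - reconstruct(U, V, W)) / np.linalg.norm(X)


# Synthetic tensor built from two components (not the beverage data).
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2))
B = rng.standard_normal((5, 2))
C = rng.standard_normal((4, 2))
X_demo = reconstruct(A, B, C)

U, V, W = cp_als(X_demo, rank=2, n_iter=200)
print(relative_error(X_demo, U, V, W))
```

For the beverage tensor one would call something like cp_als(X, rank=2) and then plot the two columns of U and of V against each other, which is roughly what the R=2 result in this article shows.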

[Figure: result for R=1]

[Figure: result for R=2]

The user factors U are scattered in an elliptical shape, and among the beverage factors V only two beverages appear to differ from the rest.

Summary

I would also like to try HOSVD and Tucker decomposition when I have time. This time I used a linear tensor decomposition method, but there is also Tensor SOM, which corresponds to a nonlinear tensor decomposition. If you are interested, please try the viewer linked below.

TensorSOM3 Viewer (beverage data), Japanese version
