[PYTHON] Change the data frame of pandas purchase data (id x product) to a dictionary

Convert purchase data to a dictionary without using `to_dict`.

`to_dict` didn't give me the structure I wanted, so I wrote the conversion myself. I wanted to process purchase data with collaborative filtering, which was awkward to do on a data frame. I also wanted to try recommendation logic like the one in Programming Collective Intelligence, so I needed to convert the data frame I had into a dictionary.

```python
# coding: utf-8

import pandas as pd
from collections import defaultdict

# 'shouhin' means "product"
df = pd.DataFrame({'id': ['a', 'a', 'b', 'b', 'c'], 'shouhin': ['x', 'y', 'y', 'z', 'x']})
```

This gives the following data:

```
  id shouhin
0  a       x
1  a       y
2  b       y
3  b       z
4  c       x
```

The goal is to turn this into a dictionary like the one below:

```
{'a': ['y', 'x'], 'b': ['y', 'z'], 'c': ['x']}
```

First, create a `defaultdict` whose default value is an empty dict. Then iterate over the rows with `df.values` (which returns a `numpy.ndarray`) and build a nested dictionary of the form `{id: {product: value}}`.

```python
tempdic = defaultdict(dict)

for d in df.values:
    # d is one row: d[0] is the id, d[1] is the product
    tempdic[d[0]][d[1]] = 1.0  # any value works; only the keys are used later
```
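With a score of 1.0 per purchase, `tempdic` has the same `{person: {item: score}}` nested shape that the recommendation examples in Programming Collective Intelligence use for their preference data. A minimal sketch for inspecting that structure, assuming Python 3.7+ so the printed order follows insertion order; unpacking each row into `user, item` (names chosen here for illustration) replaces the `d[0]`/`d[1]` indexing:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'b', 'b', 'c'],
                   'shouhin': ['x', 'y', 'y', 'z', 'x']})

# Nested dict {id: {product: score}}; every purchase is scored 1.0
tempdic = defaultdict(dict)
for user, item in df.values:  # each row of the 2-D array unpacks into two values
    tempdic[user][item] = 1.0

# Convert the outer defaultdict to a plain dict for inspection
nested = dict(tempdic)
print(nested)  # {'a': {'x': 1.0, 'y': 1.0}, 'b': {'y': 1.0, 'z': 1.0}, 'c': {'x': 1.0}}
```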

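Then turn each inner dictionary into a list of its keys. (`list()` is needed in Python 3, where `keys()` returns a view rather than a list.)

```python
dic = {k: list(tempdic[k].keys()) for k in tempdic}
```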

Printing `dic` gives the expected result (the order of keys and list elements may differ depending on the Python version):

```
{'a': ['y', 'x'], 'c': ['x'], 'b': ['y', 'z']}
```
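A variant sketch (not the approach used above), assuming Python 3.7+ so that dictionaries keep insertion order: it skips the nested dictionary and appends each product straight to a per-id list while iterating the two columns with `zip`. The name `dic_lists` is illustrative.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'id': ['a', 'a', 'b', 'b', 'c'],
                   'shouhin': ['x', 'y', 'y', 'z', 'x']})

# Walk the id and product columns in parallel and collect products per id
dic_lists = defaultdict(list)
for user, item in zip(df['id'], df['shouhin']):
    dic_lists[user].append(item)

dic = dict(dic_lists)
print(dic)  # {'a': ['x', 'y'], 'b': ['y', 'z'], 'c': ['x']}
```

One difference worth knowing: in the nested-dict version a product bought twice by the same id is stored once, because it becomes a dictionary key, while this list version keeps the duplicate.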


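Converting the lists to sets makes it easy to find the products two ids have in common, and therefore to calculate the Jaccard coefficient. For example, `set(dic['a']) & set(dic['b'])` returns:

```
{'y'}
```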
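The Jaccard coefficient of two sets is the size of their intersection divided by the size of their union. A minimal sketch on top of `dic`; the helper name `jaccard` is hypothetical, and returning 0.0 for two empty purchase histories is an assumption, since the coefficient is undefined in that case:

```python
def jaccard(items_a, items_b):
    """Jaccard coefficient: |intersection| / |union| of two product collections."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        return 0.0  # assumption: undefined for two empty sets, treated as 0.0
    return len(a & b) / len(union)


# The dictionary built above (list order is irrelevant once converted to sets)
dic = {'a': ['y', 'x'], 'b': ['y', 'z'], 'c': ['x']}

print(jaccard(dic['a'], dic['b']))  # 0.3333333333333333  ({'y'} over {'x', 'y', 'z'})
print(jaccard(dic['a'], dic['c']))  # 0.5                 ({'x'} over {'x', 'y'})
print(jaccard(dic['b'], dic['c']))  # 0.0                 (nothing in common)
```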


Instead of `df.values`, you could loop over row numbers and fetch each row with `df.iloc[row_number]`, but that is much slower. Purchase data is usually large, so a slow step here quickly becomes a serious bottleneck.
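A rough sketch for checking the speed difference on your own machine, assuming a hypothetical larger table made by repeating the five sample rows 1,000 times; the function names are illustrative. No timings are quoted because they depend on the environment.

```python
import timeit
from collections import defaultdict

import pandas as pd

# Hypothetical larger purchase table: the 5 sample rows repeated 1,000 times
df = pd.DataFrame({'id': ['a', 'a', 'b', 'b', 'c'] * 1000,
                   'shouhin': ['x', 'y', 'y', 'z', 'x'] * 1000})


def build_with_values(frame):
    tempdic = defaultdict(dict)
    for user, item in frame.values:  # one NumPy array, iterated row by row
        tempdic[user][item] = 1.0
    return dict(tempdic)


def build_with_iloc(frame):
    tempdic = defaultdict(dict)
    for i in range(len(frame)):
        row = frame.iloc[i]  # constructs a new Series for every row
        tempdic[row['id']][row['shouhin']] = 1.0
    return dict(tempdic)


# Both approaches build the same dictionary...
assert build_with_values(df) == build_with_iloc(df)

# ...so only the speed differs; compare the two timings on your machine
print('values:', timeit.timeit(lambda: build_with_values(df), number=3))
print('iloc:  ', timeit.timeit(lambda: build_with_iloc(df), number=3))
```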

There may also be a way to do everything in a single pass with `while` loops and `if` statements, but here too I prioritized speed and avoided that kind of approach.