[PYTHON] Create a summary table by product and time by processing the data extracted from a certain POS system

Introduction

I received a request from several acquaintances that "data extracted from POS is difficult to handle, so please aggregate it", so I made a script to convert it to data that is easy to handle with Python.

Check the received data

Open it with a text editor and check it.

zitem.csv


0000,0000,0000,20160000,201600000000,1000
Menu code / PLU code,Own code,Product name,Categories,Categories名称,sub/set/Comment Parent menu code,Parent menu name,unit price,Free 1 chord code,Free 1 aggregate name,Free 2 tabulation code,Free 2 aggregate name,Free 3 chord code,Free 3 aggregate name,Takeout classification,Takeout classification名称,原unit price,GP code,GP name,DP code,DP name,Time zone start time,Time zone end time,Time zone 01 Number of appearances (points),Time zone start time,Time zone end time,Time zone 02 Number of appearances (points),Time zone start time,Time zone end time,Time zone 03 Number of appearances (points),Time zone start time,Time zone end time,Time zone 04 Number of appearances (points),Time zone start time,Time zone end time,Time zone 05 Number of appearances (points),Time zone start time,Time zone end time,Time zone 06 Number of appearances (points),Time zone start time,Time zone end time,Time zone 07 Number of appearances (points),Time zone start time,Time zone end time,Time zone 08 Number of appearances (points),Time zone start time,Time zone end time,Time zone 09 Number of appearances (points),Time zone start time,Time zone end time,Time zone 10 appearances (points),Time zone start time,Time zone end time,Time zone 11 appearances (points),Time zone start time,Time zone end time,Time zone 12 appearances (points),Time zone start time,Time zone end time,Time zone 13 appearances (points),Time zone start time,Time zone end time,Time zone 14 appearances (points),Time zone start time,Time zone end time,Time zone 15 appearances (points),Time zone start time,Time zone end time,Time zone 16 appearances (points),Time zone start time,Time zone end time,Time zone 17 appearances (points),Time zone start time,Time zone end time,Time zone 18 appearances (points),Time zone start time,Time zone end time,Time zone 19 appearances (points),Time zone start time,Time zone end time,Time zone 20 appearances (points),Time zone start time,Time zone end time,Time zone 21 appearances (points),Time zone start time,Time zone end time,Time zone 22 appearances (points),Time zone start time,Time zone end time,Time zone 23 appearances (points),Time zone start time,Time zone end time,Time zone 24 appearances (points),Number of Employers,Number of promotions,Data by product 1,Data by product 2,Data by product 3,Time zone 01 value discount points,Time zone 01 value discount amount,Time zone 01 Promo number,Time zone 01 reserve,Time zone 02 value discount points,Time zone 02 value discount amount,Time zone 02 Promo number,Time zone 02 reserve,Time zone 03 value discount points,Time zone 03 value discount amount,Time zone 03 Promo number,Time zone 03 reserve,Time zone 04 value discount points,Time zone 04 value discount amount,Time zone 04 Promo number,Time zone 04 reserve,Time zone 05 value discount points,Time zone 05 value discount amount,Time zone 05 Promotion number,Time zone 05 reserve,Time zone 06 value discount points,Time zone 06 value discount amount,Time zone 06 Promo number,Time zone 06 reserve,Time zone 07 value discount points,Time zone 07 value discount amount,Time zone 07 Promo number,Time zone 07 reserve,Time zone 08 value discount points,Time zone 08 value discount amount,Time zone 08 Promo number,Time zone 08 reserve,Time zone 09 value discount points,Time zone 09 value discount amount,Time zone 09 Promo number,Time zone 09 reserve,Time zone 10 value discount points,Time zone 10 value discount amount,Time zone 10 Promos,Time zone 10 spare,Time zone 11 value discount points,Time zone 11 value discount amount,Time zone 11 Promos,Time zone 11 reserve,12-value discount for time zones,12-value discount for time zone,Time zone 12 Promos,Time zone 12 reserve,Time zone 13 value discount points,Time zone 13 value discount amount,Time zone 13 Promos,Time zone 13 reserve,Time zone 14 value discount points,Time zone 14 value discount amount,Time zone 14 Promos,Time zone 14 spare,Time zone 15 value discount points,Time zone 15 value discount amount,Time zone 15 Promos,Time zone 15 spare,Time zone 16 value discount points,Time zone 16 value discount amount,Time zone 16 Promos,Time zone 16 reserve,Time zone 17 value discount points,Time zone 17 value discount amount,Time zone 17 Promos,Time zone 17 reserve,Time zone 18 value discount points,Time zone 18 value discount amount,Time zone 18 Promos,Time zone 18 reserve,Time zone 19 value discount points,Time zone 19 value discount amount,Time zone 19 Promos,Time zone 19 reserve,Time zone 20 value discount points,Time zone 20 value discount amount,Time zone 20 Promos,Time zone 20 reserve,21 time zone discount points,Time zone 21 value discount amount,Time zone 21 Promo number,Time zone 21 reserve,22 time zone discount points,Time zone 22 value discount amount,Time zone 22 Promos,Time zone 22 reserve,23 time zone discount points,Time zone 23 value discount amount,Time zone 23 Promo number,Time zone 23 reserve,Time zone 24-value discount points,Time zone 24-value discount amount,Time zone 24 Promos,Time zone 24 spare
0000000000000,0000000000000,Sample 1,00,Main,0000,                    ,+001200,01,Free 1,01,Free 2,01,Free 3,0,Food and drink in the store,+000000,01,Meal,01,water,0000,0059,+0000,0100,0159,+0000,0200,0259,+0000,0300,0359,+0000,0400,0459,+0000,0500,0559,+0000,0600,0659,+0000,0700,0759,+0000,0800,0859,+0000,0900,0959,+0000,1000,1059,+0000,1100,1159,+0000,1200,1259,+0000,1300,1359,+0000,1400,1459,+0000,1500,1559,+0000,1600,1659,+0000,1700,1759,+0002,1800,1859,+0000,1900,1959,+0000,2000,2059,+0000,2100,2159,+0000,2200,2259,+0000,2300,2359,+0000,+0000,+0000,+0000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000
0000000000000,0000000000000,Test sample,00,Main,0000,                    ,+000400,01,Free 1,01,Free 2,01,Free 3,0,Food and drink in the store,+000000,01,Meal,01,water,0000,0059,+0000,0100,0159,+0000,0200,0259,+0000,0300,0359,+0000,0400,0459,+0000,0500,0559,+0000,0600,0659,+0000,0700,0759,+0000,0800,0859,+0000,0900,0959,+0000,1000,1059,+0000,1100,1159,+0000,1200,1259,+0000,1300,1359,+0000,1400,1459,+0000,1500,1559,+0000,1600,1659,+0000,1700,1759,+0000,1800,1859,+0000,1900,1959,+0000,2000,2059,+0000,2100,2159,+0000,2200,2259,+0000,2300,2359,+0000,+0000,+0000,+0000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000,+0000,+0000000,+0000,+0000
(Hereafter omitted)

... Certainly it is difficult to use this as it is (-_-;)

When I asked a little about the contents and requests of the data, --The menu code has 13 digits, but 0 to 9999 is used. --Product names that were not sold even if the menu code was set are not included in CSV --I want the number of items sold by product from 0:00 to 23:00 --This file is output daily and created daily --Finally, I want to combine files from multiple days

Output CSV data format

The header is the menu code, product name, and unit price, and organizes the number of products that have appeared for each hour thereafter.

output.csv


Menu code,0,1,2,〜,9999
Product name,Sample 1,Sample 2,Sample 3,〜,Sample 10000
unit price,100,200,300,〜10000
2016/1/1 0:00,1,2,3,〜,10
2016/1/1 1:00,4,5,6,〜,0
2016/1/1 2:00,6,2,9,〜,20
(Omission)
2016/1/1 23:00,1,2,3,〜,10

Loading the library

python


import pandas as pd
import os
import datetime

Read file

Since the file name is fixed, set the file name and get the date data from the creation date and read it.

python


#file name
filename = "zitem.csv"    

#Date data acquisition
dt = datetime.datetime.fromtimestamp(os.stat(filename).st_mtime)
y = dt.year
m = dt.month
d = dt.day

#Data read
df = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=1)

Get product name and price

Regarding the product name and price, it is said that there are times when it is included in the CSV data and there are times when it is not included, so in consideration of collecting multiple data in the future, prepare a frame for the time being and the part with the data I will fill it up.

python


cnt = 10001

names = [""] * cnt
prices = [0] * cnt

for i in range(len(df)):
    key = int(df.loc[i][0]) + 1
    names[key] = df.loc[i][2]
    prices[key] = int(df.loc[i][7])

Since it is 0 to 9999, it is 10000 kinds of data, but since date and time data is added, it is made into 10001 data array.

Get sales

It seems that the data of the number of sales is included in every 3 columns from the 23rd column, so the data for 24 hours is acquired every 3 columns.

python


x = 23

datas = []

for i in range(24):
    items = [0] * cnt
    items[0] = datetime.datetime(y,m,d,i,0).strftime("%Y-%m-%d %H:%M")
    for j in range(len(df)):
        key = df.loc[j][0] + 1
        num = int(df.loc[j][x])
        items[key] = num
    datas.append(items)
    x += 3

File output

Save the acquired information in a CSV file.

python


f = open('output.csv', 'w')

sName = ""
sPrice = ""

for i in range(len(names)):
    sName += names[i] + ","
    sPrice += str(prices[i]) + ","

f.write(sName + "\n" + sPrice + "\n")

for i in range(len(datas)):
    data = datas[i]
    sData = ""
    for j in range(len(data)):
        sData += str(data[j]) + ","
    f.write(sData + "\n")
    
f.close()

Save the code up to this point as "pos.py" and execute it.

python


$ python pos.py

did it!

All source code posted

pos.py


import pandas as pd
import os
import datetime

#file name
filename = "zitem.csv"    

#Date data acquisition
dt = datetime.datetime.fromtimestamp(os.stat(filename).st_mtime)
y = dt.year
m = dt.month
d = dt.day

#Data read
df = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=1)

#Get the data you need

cnt = 10001

names = [""] * cnt
prices = [0] * cnt

for i in range(len(df)):
    key = int(df.loc[i][0]) + 1
    names[key] = df.loc[i][2]
    prices[key] = int(df.loc[i][7])

x = 23

datas = []

for i in range(24):
    items = [0] * cnt
    items[0] = datetime.datetime(y,m,d,i,0).strftime("%Y-%m-%d %H:%M")
    for j in range(len(df)):
        key = df.loc[j][0] + 1
        num = int(df.loc[j][x])
        items[key] = num
    datas.append(items)
    x += 3

#File output

f = open('output.csv', 'w')

sName = ""
sPrice = ""

for i in range(len(names)):
    sName += names[i] + ","
    sPrice += str(prices[i]) + ","

f.write(sName + "\n" + sPrice + "\n")

for i in range(len(datas)):
    data = datas[i]
    sData = ""
    for j in range(len(data)):
        sData += str(data[j]) + ","
    f.write(sData + "\n")
    
f.close()

at the end

There must be a better way!

Recommended Posts

Create a summary table by product and time by processing the data extracted from a certain POS system
Try to create a battle record table with matplotlib from the data of "Schedule-kun"
Create a dictionary by searching the table using sqlalchemy
Create a data frame from the acquired boat race text data
[numpy] Create a moving window matrix from multidimensional time series data
The first time a programming beginner tried simple data analysis by programming
Prepare a high-speed analysis environment by hitting mysql from the data analysis environment
Extract and plot the latest population data from the PDF data provided by the city
Create a correspondence table between EC number and Uniprot entry from enzyme.dat
[Python] Create a list of date and time (datetime type) for a certain period
Python --Read data from a numeric data file and find the multiple regression line.
Create a simple reception system with the Python serverless framework Chalice and Twilio
Aggregate steps by day from iPhone healthcare data to create a CSV file