I used Rakuten Ichiba's API to output product information matching a keyword to CSV.
I used the "Rakuten Product Search API": Rakuten Web Service: Rakuten Product Search API (version: 2017-07-06) | API List
I used Jupyter Notebook as the development environment. For a large-scale tool or one you want to run regularly, another editor may be a better fit, but for a small one-shot tool like this, Jupyter Notebook is very convenient: you can build the script up bit by bit, running each piece as you go.
The libraries used are requests and pandas: requests to call the API, and pandas to manipulate the retrieved data and output the CSV.
This was done as a price survey for selling agricultural products. The plan is to analyze the acquired information and use it for decision-making later; this post covers only the acquisition step.
There are various direct-sales sites, but Rakuten Ichiba is familiar, has a large number of products, and offers an API, so the data is easy to acquire.
To use the API, you must first create an app on Rakuten's developer page and obtain an app ID before writing any script.
On the Rakuten Developers site (Rakuten Web Service: API List), create an app from "+ Issue App ID" at the upper right. Using the app ID obtained here in your own script lets you access and pull information from Rakuten Ichiba.
APIs are also available for other Rakuten services (Rakuten Travel, Rakuten Recipes, etc.), not just Rakuten Ichiba. I would like to try them if I get the chance.
This time, we will acquire product information using the potato variety name "Make-in" (May Queen) as the search keyword.
First, import the required libraries.
import requests
import numpy as np
import pandas as pd
NumPy is imported because I want to use it later (for the row index); there is no problem if you leave it out.
Next, a script that hits the API to get information.
REQUEST_URL = "https://app.rakuten.co.jp/services/api/IchibaItem/Search/20170706"
APP_ID = "<Enter the app ID obtained from Rakuten's site here>"

serch_keyword = 'Make-in'

serch_params = {
    "format": "json",
    "keyword": serch_keyword,
    "applicationId": [APP_ID],
    "availability": 0,
    "hits": 30,
    "page": 1,
    "sort": "-updateTimestamp",
}

response = requests.get(REQUEST_URL, serch_params)
result = response.json()
Now the information is available as a list of dicts via result['Items']. This time, 30 products were acquired (the value specified by "hits": 30 in serch_params; 30 is also the maximum that can be acquired in one request). Furthermore, result['Items'][2]['Item'], for example, gives the third item (index 2) of the acquired list as a dict.
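To make the nesting concrete, here is a minimal mock of the response shape described above (hypothetical data, not actual API output):

```python
# Hypothetical stand-in for the JSON returned by the API: a top-level
# "Items" list, where each element wraps the product dict under "Item".
result = {
    "Items": [
        {"Item": {"itemName": "Potato A", "itemPrice": 500}},
        {"Item": {"itemName": "Potato B", "itemPrice": 700}},
        {"Item": {"itemName": "Potato C", "itemPrice": 600}},
    ]
}

print(len(result["Items"]))                       # → 3 items in this mock
print(result["Items"][2]["Item"]["itemName"])     # → Potato C (index 2 = third item)
```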
Taking a quick look at the script: REQUEST_URL is the request URL listed in Rakuten Web Service: Rakuten Product Search API (version: 2017-07-06) | API List, and APP_ID holds the app ID obtained earlier from Rakuten's developer page.
Specifying the string you want to search in serch_keyword searches for products matching that keyword. It would also be easy to accept user input here with Python's input() function.
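As a sketch of that idea (this helper is not in the original script), the keyword could be taken from user input, with a fallback when nothing is entered:

```python
# Hypothetical helper: read the search keyword from the user, falling back
# to a default when the input is empty or whitespace-only.
def get_keyword(prompt='Search keyword: ', default='Make-in'):
    keyword = input(prompt).strip()
    return keyword or default

# Usage (interactive):
# serch_keyword = get_keyword()
```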
serch_params is a dict holding the parameters sent with the request; the details are listed in the "Input Parameters" section of Rakuten Web Service: Rakuten Product Search API (version: 2017-07-06) | API List. Among these parameters, applicationId (the app ID) is required, and at least one of keyword, shopCode, itemCode, or genreId must also be given. Since I want to get product information by search keyword, I specified the earlier serch_keyword for keyword.
For example, "page": 1 specifies which page of results to acquire, so it should be easy to acquire a large amount of product information across multiple pages by looping this number with a for statement.
By the way, the dict obtained by hitting the API contains, as its keys and values, the items listed in the "Output Parameters" section of Rakuten Web Service: Rakuten Product Search API (version: 2017-07-06) | API List.
For example, specifying a key like result['Items'][2]['Item']['itemName'] gives you that product's name.
The information acquired at this stage contains extra fields and is inconvenient to handle as is, so I will build dicts that contain only what is needed. The data needed this time are itemName, itemPrice, itemCaption, shopName, shopUrl, and itemUrl (later I realized the postage flag postageFlag would also be useful, but it is not reflected in the script below).
#Loop over the results to build a dict per item
item_key = ['itemName', 'itemPrice', 'itemCaption', 'shopName', 'shopUrl', 'itemUrl']

item_list = []
for i in range(len(result['Items'])):
    tmp_item = {}
    item = result['Items'][i]['Item']
    for key, value in item.items():
        if key in item_key:
            tmp_item[key] = value
    item_list.append(tmp_item)
Now you can get a list containing dict type product information.
What I got stuck on here was having to append with the copy() method, as in item_list.append(tmp_item.copy()). If you reuse a single dict object and call item_list.append(tmp_item) without copying, the list stores multiple references to the same dict, so every entry ends up holding one product's data, and you will spend days scratching your head over it. (In the version above, tmp_item is re-created inside the loop each iteration, so plain append also works there.)
The following article helped me.
This behavior seems worth understanding properly, so I would like to summarize it separately.
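The pitfall can be demonstrated in a few lines, independent of the API code:

```python
# Reusing ONE dict and appending it stores references, not snapshots:
# mutating the shared dict changes every element of the list.
tmp = {}
shared = []
for i in range(3):
    tmp['n'] = i
    shared.append(tmp)      # same object appended three times
print(shared)               # → [{'n': 2}, {'n': 2}, {'n': 2}]

# Fix 1: append a copy of the current state.
tmp = {}
copied = []
for i in range(3):
    tmp['n'] = i
    copied.append(tmp.copy())
print(copied)               # → [{'n': 0}, {'n': 1}, {'n': 2}]

# Fix 2 (what the loop above does): create a fresh dict each iteration.
fresh = []
for i in range(3):
    tmp = {'n': i}
    fresh.append(tmp)
print(fresh)                # → [{'n': 0}, {'n': 1}, {'n': 2}]
```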
Once you have a list of dicts, the rest is not difficult; basic pandas operations are enough. Create a DataFrame and format it a little to make it easier to use.
#Create a data frame
items_df = pd.DataFrame(item_list)

#Change the order of columns
items_df = items_df.reindex(columns=['itemName', 'itemPrice', 'itemCaption', 'itemUrl', 'shopName', 'shopUrl'])

#Change column names and row numbers: readable column names, and row numbers as serial numbers starting from 1
items_df.columns = ['Product name', 'Product price', 'Product description', 'Product URL', 'Store name', 'Store URL']
items_df.index = np.arange(1, len(items_df) + 1)
Output the created data frame to a csv file.
items_df.to_csv('./rakuten_mayqueen.csv')
The argument to the df.to_csv() method is the save-destination path (directory and file name). This time, I used a relative path to create the CSV file directly under the directory containing this script.
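One optional tweak, not in the original script: if Excel garbles non-ASCII text in the CSV, writing with the utf-8-sig encoding usually fixes it. A minimal sketch with hypothetical data and a hypothetical file name:

```python
import pandas as pd

# Two hypothetical rows standing in for the real search results.
items_df = pd.DataFrame(
    [{'Product name': 'Potato A', 'Product price': 500},
     {'Product name': 'Potato B', 'Product price': 700}],
    index=[1, 2],
)

# 'utf-8-sig' prepends a BOM so Excel auto-detects the encoding and shows
# non-ASCII column names correctly (plain to_csv writes BOM-less UTF-8).
items_df.to_csv('./rakuten_mayqueen_sample.csv', encoding='utf-8-sig')
```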
Now, let's open the output data in Excel or a spreadsheet app.
I was able to get it nicely!
For the time being, I was able to get product information from Rakuten Ichiba and output it to CSV. The plan from here:
**(1) Data collection and shaping** Collect as much data as needed and shape it into a usable form.
**(2) Analysis and decision-making on the collected data** Attempt reasonable pricing using the data as judgment material (decision-making).
So next time, I would like to tackle slightly more complicated data collection and shaping.