I became interested in the GoToEat campaign when the "Torikizoku marathon" became a hot topic. This time, I would like to use Python scraping and mapping techniques to map the stores covered by the campaign. Reference: the GoToEat campaign site of the Ministry of Agriculture, Forestry and Fisheries.
I collected the information by scraping Tabelog's list of campaign target stores.
At first glance, I couldn't find any wording on the site prohibiting scraping ...
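Incidentally, robots.txt can also be checked programmatically with Python's standard library. A minimal sketch (this was not part of my scraping script):

```python
import urllib.robotparser

# Ask tabelog.com's robots.txt whether the campaign list pages may be fetched
rp = urllib.robotparser.RobotFileParser()
rp.set_url('https://tabelog.com/robots.txt')
rp.read()
print(rp.can_fetch('*', 'https://tabelog.com/go-to-eat/list/'))
```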
I will omit the full scraping source code this time and only give an overview. The information to acquire is ['Store name', 'Genre', 'Review rate', 'Minimum budget', 'Phone number', 'Smoking information', 'Address', 'Prefecture', 'Area code', 'URL'].
import requests
from bs4 import BeautifulSoup
import time
import pandas as pd
from pprint import pprint


def scrape_page(soup, df, cols):
    for title in soup.find_all('div', class_='list-rst__rst-name'):
        # Get the store name
        name = title.find('a').text.strip()
        # Get the store URL
        shop_url = title.find('a').get('href')
        # Get the genre
        genre = title.find('span', class_='list-rst__area-genre cpy-area-genre').text
        genre = genre.replace('/', '').replace(' ', '').replace('、', ',')
        print('Obtaining information on "' + name + '" ...')
        # Scrape each shop's own page (from the URL above) for the details
        res_shop = requests.get(shop_url)
        soup_shop = BeautifulSoup(res_shop.content, 'html.parser')
        time.sleep(0.1)
        # Get the prefecture and area code from the URL path
        prefecture = shop_url.split('/')[3]
        area = shop_url.split('/')[4]
        # Get the address
        address = soup_shop.find('p', class_='rstinfo-table__address').text.strip()
        # Get the review rate -- wrapped in try because parsing can fail
        try:
            rate = float(soup_shop.find('span', class_='rdheader-rating__score-val-dtl').text.strip())
        except ValueError:
            rate = '-'
        # Get the minimum budget
        budget = soup_shop.find('em', class_='gly-b-dinner').text.strip()
        budget = budget.split('~')[0].replace('¥', '').replace(',', '')
        budget = int(budget)
        # Get the phone number
        phone_num = soup_shop.find('p', class_='rstinfo-table__tel-num-wrap').text.strip()
        # Get the smoking information
        smoke = soup_shop.find('p', class_='p-input-form__line').text.strip()
        # Collect the values for this shop in a list
        shop_datas = [name, genre, rate, budget, phone_num, smoke,
                      address, prefecture, area, shop_url]
        # Build a one-row dataframe from the list and concatenate
        df_shop = pd.DataFrame([shop_datas], columns=cols, index=None)
        df = pd.concat([df, df_shop], sort=False)
    return df
def job():
    url = 'https://tabelog.com/go-to-eat/list/?page='
    page_cnt = 1
    cols = ['name', 'genre', 'rate', 'budget', 'tel', 'smoke', 'address', 'prefectures', 'area', 'url']
    df = pd.DataFrame(columns=cols, index=None)
    # Scrape only the first list page for now
    url = url + str(page_cnt)
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'html.parser')
    df = scrape_page(soup, df, cols)
    pprint(df.columns.values)
    pprint(df.head().values)


job()
output
Obtaining information on "Charcoal-grilled meat Hiyoriya" ...
Obtaining information on "Charcoal-grilled meat / Korean food KollaBo Ebisu store" ...
Obtaining information on "Miraku" ...
Obtaining information on "Wine Bar. Dipunto Shibuya Jinnan" ...
Obtaining information on "Kobe beef steak cooking Yuzuki Hana charcoal grill" ...
Obtaining information on "Italian Alberta KARASUMA of fish and vegetables" ...
Obtaining information on "Ningyocho Wine Bar" ...
Obtaining information on "Basashi and Motsunabe Izakaya Kyushu Komachi Private Room All-you-can-drink Kanayama 2-chome" ...
Obtaining information on "Adult Sake Bar Irori" ...
Obtaining information on "Hidden Bunch Machida" ...
Obtaining information on "Aloha Table Yokohama Bay Quarter" ...
Obtaining information on "Beer Garden Terrace Iidabashi" ...
Obtaining information on "Takkanmari University" ...
Obtaining information on "Private room steak & Italian Dining VT Ebisu store" ...
Obtaining information on "Mountain lodge bar beast" ...
Obtaining information on "Genkatsugu" ...
Obtaining information on "Kyomachi Koishigure Shinjuku Main Building" ...
Obtaining information on "GINTO Ikebukuro" ...
Obtaining information on "Satsuma Jidori and Private Izakaya Izakaya Shinjuku" ...
"Meat wholesaler 25-Obtaining information on "89" ...
array(['name', 'genre', 'rate', 'budget', 'tel', 'smoke', 'address',
'prefectures', 'area', 'url'], dtype=object)
array([['Charcoal grilled meat\u3000 Hiyori', 'Grilled meat,Izakaya,steak', 3.2, 4000, '050-5869-1319',
'All seats can be smoked', '1 Nakayamatedori, Chuo-ku, Kobe City, Hyogo Prefecture-7-5 S Building 2F', 'hyogo', 'A2801',
'https://tabelog.com/hyogo/A2801/A280101/28032248/'],
['Charcoal-grilled meat / Korean cuisine KollaBo Ebisu store', 'Grilled meat,Korean cuisine,Izakaya', 3.09, 3000,
'050-5571-4836', 'All seats are non-smoking', '1 Ebisu Nishi, Shibuya-ku, Tokyo-14-1 Sunrise Building\u30002F',
'tokyo', 'A1303',
'https://tabelog.com/tokyo/A1303/A130302/13178659/'],
['Miraku', 'Izakaya,Seafood / seafood,Bowl of rice topped with sashimi', 3.59, 3000, '050-5595-5384', 'Separate smoke',
'1 Kyomachi, Kokurakita-ku, Kitakyushu City, Fukuoka Prefecture-6-28', 'fukuoka', 'A4004',
'https://tabelog.com/fukuoka/A4004/A400401/40001071/'],
['Wine bar. Dipunto Shibuya Jinnan', 'Bar Bar,Izakaya,Italian', 3.08, 2000,
'050-5589-7383', 'Separate smoke', '1 Jinnan, Shibuya-ku, Tokyo-20-17 B1F', 'tokyo', 'A1303',
'https://tabelog.com/tokyo/A1303/A130301/13186934/'],
['Kobe beef steak cooking Yuzuki Hana charcoal grill', 'steak,Kappou / Small dishes,Grilled meat', 3.1, 10000,
'050-5594-6576', 'All seats are non-smoking', '4 Kano-cho, Chuo-ku, Kobe-shi, Hyogo-8-19 Parfun Building 2F',
'hyogo', 'A2801',
'https://tabelog.com/hyogo/A2801/A280101/28050226/']],
dtype=object)
I was able to get the information from one page successfully. After that, you can get everything by incrementing the page count in the script above and looping over the prefectures (it took quite a while...).
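A rough sketch of that pagination loop is below; the stopping condition and the one-second sleep are my own illustrative choices rather than the exact code I ran, and the per-prefecture loop would wrap this in the same way:

```python
# Illustrative sketch: crawl list pages until one comes back empty
def job_all_pages():
    base_url = 'https://tabelog.com/go-to-eat/list/?page='
    cols = ['name', 'genre', 'rate', 'budget', 'tel', 'smoke',
            'address', 'prefectures', 'area', 'url']
    df = pd.DataFrame(columns=cols, index=None)
    page_cnt = 1
    while True:
        res = requests.get(base_url + str(page_cnt))
        soup = BeautifulSoup(res.content, 'html.parser')
        # Stop once a page no longer contains any store entries
        if not soup.find_all('div', class_='list-rst__rst-name'):
            break
        df = scrape_page(soup, df, cols)
        page_cnt += 1
        time.sleep(1)  # be polite to the server
    return df
```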
I used folium for mapping as usual, but nothing can be mapped without latitude and longitude in addition to the address. After a lot of searching, I found a site that makes the conversion easy: if you upload a CSV file containing the addresses, you can download a file with latitude and longitude columns appended.
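For reference, dumping the scraped DataFrame to a CSV for that upload might look like the line below (tabelog.csv is a name I chose; Shift_JIS matches the encoding used to read the result back):

```python
# Save the scraped data so it can be uploaded to the geocoding site
df.to_csv('tabelog.csv', encoding='shift_jis', index=False)
```

I saved the downloaded result as tabelog_geo.csv and checked the contents of the data.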
df = pd.read_csv('tabelog_geo.csv', encoding='shift_jis')
print('Number of records: {}'.format(len(df)))
print('-------column---------')
pprint(df.columns.values)
print('-------head values----')
pprint(df.head().values)
output
Number of records: 15020
-------column---------
array(['name', 'genre', 'rate', 'budget', 'tel', 'smoke', 'address',
'url', 'prefectures', 'area', 'LocName', 'fX', 'fY', 'iConf',
'iLvl'], dtype=object)
-------head values----
array([['Horse meat bar bounce Madamachi Mita store', 'Bar Bar,Izakaya,Italian', 3.51, 3000, '050-5869-1861',
'All seats are non-smoking', '5 Shiba, Minato-ku, Tokyo-22-5 Harada Building 1F',
'https://tabelog.com/tokyo/A1314/A131402/13143781/', 'tokyo',
'A1314', 'Tokyo/Minato-ku/Turf/5-chome/No. 22', 139.74643999999998,
35.648109999999996, 5.0, 7.0],
['Dorado', 'bistro,French,Western food', 3.64, 5000, '050-5596-1243', 'All seats are non-smoking',
'3 Sakae, Naka-ku, Nagoya-shi, Aichi-10-14 Pivot Lion Building 2F',
'https://tabelog.com/aichi/A2301/A230102/23029870/', 'aichi',
'A2301', 'Aichi prefecture/Nagoya city/Naka-ku/Sannomaru/4-chome', 136.90649, 35.18425, 5.0, 6.0],
['Izakaya Ryunosu', 'Izakaya,Seafood / seafood,Gyoza', 3.16, 3000, '050-5456-3379',
'All seats are non-smoking', '2 Saiwaicho, Chitose City, Hokkaido-5-1',
'https://tabelog.com/hokkaido/A0107/A010701/1024501/',
'hokkaido', 'A0107', 'Hokkaido/Chitose/Yukimachi/2-chome/Address 5', 141.64700000000002,
42.822959999999995, 5.0, 7.0],
['Kushikatsu Dengana Yokohama Minamisaiwai', 'Izakaya,Deep-fried skewers, skewers,hormone', 3.01, 0, '050-5597-9448',
'-', '2 Minamisaiwai, Nishi-ku, Yokohama-shi, Kanagawa-8-20SFBuilding2F',
'https://tabelog.com/kanagawa/A1401/A140101/14079956/',
'kanagawa', 'A1401', 'Kanagawa Prefecture/Naka-gun/Ninomiya Town/Ninomiya', 139.25673999999998,
35.30523, 5.0, 5.0],
['Charcoal-grilled meat / Korean food KollaBo Shinbashi store', 'Izakaya,Grilled meat,Korean cuisine', 3.06, 3000,
'050-5592-3837', 'Separate smoke', '2 Shinbashi, Minato-ku, Tokyo-8-17 Sanken Building 1F',
'https://tabelog.com/tokyo/A1301/A130103/13197832/', 'tokyo',
'A1301', 'Tokyo/Minato-ku/Shimbashi/2-chome/No. 8', 139.7563, 35.66747, 5.0, 7.0]],
dtype=object)
The longitude and latitude are stored in the fX and fY columns, respectively.
I was curious how the Tabelog rates are distributed, so I plotted them with seaborn: a histogram and a violin plot.
import matplotlib.pyplot as plt
import seaborn as sns


# Histogram generation
def plot_hist(df):
    # Some rates could not be obtained and are registered as 0, so exclude them
    df = df[df['rate'] != 0]
    sns.set()
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.distplot(
        df['rate'], bins=10, color='blue', label='rate',
        kde=True)
    ax.set_title('histogram --tabelog rate')
    plt.show()


# Violin plot generation
def plot_violin(df):
    df = df[df['rate'] != 0]
    sns.set()
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.violinplot(x=df['rate'].values, color='lightgreen')
    ax.set_title('violin_plot --tabelog rate')
    plt.show()
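A minimal driver for the two plots, reusing the geocoded CSV loaded earlier:

```python
df = pd.read_csv('tabelog_geo.csv', encoding='shift_jis')
plot_hist(df)
plot_violin(df)
```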
This was my first time using a violin plot, and I was surprised at how unintentionally suggestive the shape turns out to be. That aside, it can be read that stores with a rate of roughly 3.4 or higher count as highly rated within the overall distribution.
Mapping all of the stores would mean a huge number of pins, so I narrow the data down before pinning it on the map. This time: the genre contains "Italy", the maximum budget is 10,000 yen, the Tabelog rate is 3.2 or higher, and the prefecture is Tokyo. Also, since the earlier plot suggested that a rate of 3.4 or higher is good, stores rated 3.4 or higher are highlighted in green.
import folium
import pandas as pd


def param_input(df):
    # Filtering parameters -- use '' to leave a parameter unspecified
    param_genre = 'Italy'        # str
    param_budget = 10000         # int, maximum budget
    param_rate = 3.2             # float, minimum rate
    param_prefecture = 'tokyo'   # str, half-width alphabet
    param_address = ''           # str
    # Narrow down df according to each parameter
    if param_genre != '':
        df = df[df['genre'].str.contains(param_genre)]
    if param_budget != '':
        df = df[df['budget'] <= param_budget]
    if param_rate != '':
        df = df[df['rate'] >= param_rate]
    if param_prefecture != '':
        df = df[df['prefectures'] == param_prefecture]
    if param_address != '':
        df = df[df['address'].str.contains(param_address)]
    return df


def make_folium_map(df):
    # Center the map on the very first record
    start_loc = df[['fY', 'fX']].values[0]
    m = folium.Map(location=start_loc, tiles='Stamen Terrain', zoom_start=12)
    for data in df[['fY', 'fX', 'name', 'rate', 'budget', 'smoke']].values:
        lat, lon, name, rate, budget, smoke = data
        # Build the tooltip text for the pin
        pin_text = '{name}<br>{rate}<br>{budget}<br>{smoke}' \
            .format(name=name,
                    rate='Evaluation: ' + str(rate),
                    budget=str(budget) + ' yen ~',
                    smoke=smoke)
        # Set the icon color according to the rate: above 3.4 is shown in green
        if rate > 3.4:
            icon_color = 'green'
        else:
            icon_color = 'blue'
        # Put a pin on the map
        folium.Marker([lat, lon],
                      tooltip=pin_text,
                      icon=folium.Icon(color=icon_color)).add_to(m)
    m.save('tabelog_map.html')


df = pd.read_csv('tabelog_geo.csv', encoding='shift_jis')
df = param_input(df)
make_folium_map(df)
result
Zoom in and hover the cursor over a pin to see the store information.
I was able to map everything into an html file. It would be even better if I could publish it on the web using heroku, but I'm still experimenting with that, so for now I'll just publish a screenshot.
It might be interesting to discover new places by generating a random number and letting it decide which store to go to. After all, I tend to end up at the stores I always go to.
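For instance, a quick pick with pandas' sample, operating on the filtered DataFrame from above:

```python
# Pick one store at random from the filtered DataFrame
choice = df.sample(n=1)
print(choice[['name', 'genre', 'rate', 'budget', 'url']].values[0])
```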
I also made a heat map, referring to [this site](https://hikiniku11029.hatenablog.com/entry/2019/06/18/folium%E3%81%A7%E3%83%92%E3%83%BC%E3%83%88%E3%83%9E%E3%83%83%E3%83%97%E3%81%AE%E4%BD%9C%E3%82%8A%E6%96%B9%E3%82%92%E3%81%8A%E3%81%BC%E3%81%88%E3%81%9F). Areas with highly rated stores glow brightly.
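A minimal sketch of such a heat map using folium's HeatMap plugin; weighting each point by its rate and the map center and radius are my own choices:

```python
from folium.plugins import HeatMap

# Reload the full geocoded data (not the filtered subset) and drop rate 0
df_all = pd.read_csv('tabelog_geo.csv', encoding='shift_jis')
df_all = df_all[df_all['rate'] != 0]
# Each point is [lat, lon, weight]; the tabelog rate serves as the weight
heat_data = df_all[['fY', 'fX', 'rate']].values.tolist()
m = folium.Map(location=[35.68, 139.76], tiles='Stamen Terrain', zoom_start=12)
HeatMap(heat_data, radius=10).add_to(m)
m.save('tabelog_heatmap.html')
```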