1. Motivation

**"Use scraping to find genuinely great restaurants, regardless of their Tabelog score!"**

I often use Tabelog when eating out, but **a restaurant's Tabelog score stays low while it has few reviews**. The site's own explanation says as much (https://tabelog.com/help/score/):

As an index that reflects the voices of users, the score increases as more high evaluations are collected from users who have an influence. For example, if the degree of influence is the same, a store with 100 5-point evaluations will get a higher score than a store with only 2 5-point evaluations.
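The passage above does not give the actual formula, so the following is only a sketch of one generic way such behavior can arise: a Bayesian (shrinkage) average, which pulls a restaurant's mean rating toward a prior value until enough ratings accumulate. The prior mean of 3.0 and the prior weight of 20 are assumptions chosen for illustration, not Tabelog's parameters, and reviewer influence is ignored.

```python
def shrunk_score(ratings, prior_mean=3.0, prior_weight=20):
    """Bayesian average: the real ratings plus `prior_weight` pseudo-ratings at `prior_mean`.

    Illustrative only; this is NOT Tabelog's scoring formula.
    """
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))


few = shrunk_score([5.0] * 2)     # two 5-point ratings
many = shrunk_score([5.0] * 100)  # one hundred 5-point ratings
print(round(few, 2), round(many, 2))  # prints: 3.18 4.67
```

With identical 5-point ratings, the restaurant with 100 of them ends up far above the one with only 2, which is the pattern the quoted explanation describes.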

That means **there must be restaurants that have only just opened, and so have few reviews, yet offer excellent food and service**.
If you rely on ranking searches as I do, you will never come across them.
So I will make full use of scraping to pick out restaurants that are in fact highly rated shortly after opening.
I get to eat good food, and people are bound to say, "You know about this place? Nice!"

2. Implemented code

Python code for collecting data from Tabelog

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup
import math
import time

root_url = 'https://tabelog.com/'
res = requests.get(root_url)
soup = BeautifulSoup(res.content, 'html.parser')
a = soup.find_all('a', class_='rsttop-area-search__target js-area-swicher-target')

area_url = {}
for area in a:
    area_dict = {}
    splitted = area.get('data-swicher-area-list').split('"')
    for i in range((len(splitted)-1)//8):
        area_dict[splitted[i*8+3]] = splitted[i*8+7]
    area_url[area.get('data-swicher-city').split('"')[3]] = area_dict

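visit_areas = ['Shibuya / Ebisu / Daikanyama']  # translated here; on the live site this key is presumably the Japanese area name
url_dict = {}
for visit_area in visit_areas:
    url = area_url['Tokyo'][visit_area]  # likewise, the prefecture key is presumably Japanese on the live site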
    time.sleep(1)
    res = requests.get(root_url + url[1:])
    soup = BeautifulSoup(res.content, 'html.parser')
    a = soup.find_all('a', class_='c-link-arrow')
    for area in a:
        href = area['href']
        if href[-21:-8]!=url:
            continue
        else:
            url_dict[area.text] = href

max_page = 20
restaurant_data = []
for area, url in url_dict.items():
    time.sleep(1)
    res_area = requests.get(url)
    soup_area = BeautifulSoup(res_area.content, 'html.parser')
    rc_count = int(soup_area.find_all('span', class_='list-condition__count')[0].text)
    print('There are ' + str(rc_count) + ' restaurants in ' + area)
    n_pages = min(math.ceil(rc_count/20), max_page, 60)  # 20 restaurants per list page, capped at max_page and at 60
    for i in range(1, n_pages+1):
        print('Processing...  ' + str(i) + '/' + str(n_pages))
        url_rc = url + 'rstLst/RC/' + str(i) + '/?Srt=D&SrtT=nod'
        time.sleep(1)  # wait before each request, as in the rest of the script
        res_rc = requests.get(url_rc)
        soup_rc = BeautifulSoup(res_rc.content, 'html.parser')
        for rc_div in soup_rc.find_all('div', class_='list-rst__wrap js-open-new-window'):
            rc_name = rc_div.find('a', class_='list-rst__rst-name-target cpy-rst-name').text
            rc_url = rc_div.find('a', class_='list-rst__rst-name-target cpy-rst-name')['href']
            rc_score = rc_div.find('span', class_='c-rating__val c-rating__val--strong list-rst__rating-val')
            if rc_score is None:
                rc_score = -1.
            else:
                rc_score = float(rc_score.text)
            rc_review_num = rc_div.find('em', class_='list-rst__rvw-count-num cpy-review-count').text
            if rc_review_num != ' - ':
                page = 1
                score = []
                while True:
                    rc_url_pg = rc_url + 'dtlrvwlst/COND-2/smp0/?smp=0&lc=2&rvw_part=all&PG=' + str(page)
                    time.sleep(1)
                    res_pg = requests.get(rc_url_pg)
                    soup_pg = BeautifulSoup(res_pg.content, 'html.parser')
                    if "I can't find the page I'm looking for" in soup_pg.find('title').text:  # translated; the live site's "page not found" title is in Japanese
                        break
                    try:
                        station = soup_pg.find_all('dl', class_='rdheader-subinfo__item')[0].find('span', class_='linktree__parent-target-text').text
                    except (IndexError, AttributeError):  # element missing: fall back to the plain-text variant
                        try:
                            station = soup_pg.find_all('dl', class_='rdheader-subinfo__item')[0].find('dd', class_='rdheader-subinfo__item-text').text.replace('\n','').replace(' ','')
                        except (IndexError, AttributeError):
                            station = ''
                    genre = '/'.join([genre_.text for genre_ in soup_pg.find_all('dl', class_='rdheader-subinfo__item')[1].find_all('span', class_='linktree__parent-target-text')])
                    price = soup_pg.find_all('dl', class_='rdheader-subinfo__item')[2].find('p', class_='rdheader-budget__icon rdheader-budget__icon--dinner').find('a', class_='rdheader-budget__price-target').text
                    score = score + [score_.next_sibling.next_sibling.text for score_ in soup_pg.find_all('span', class_='c-rating__time c-rating__time--dinner')]
                    page += 1
                    if page == math.ceil(int(rc_review_num)/100)+1:
                        break
                restaurant_data.append([area, rc_count, rc_name, rc_url, rc_score, rc_review_num, station, genre, price, score])
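
The rows collected in `restaurant_data` can then be loaded into a pandas DataFrame to compute each restaurant's average review score. The following is a minimal sketch: the column names are my own labels for the ten values appended per restaurant, the sample rows are made up, and the `'-'` placeholder for a review without a rating is an assumption about what the scraped text may contain.

```python
import pandas as pd

# Labels (assumed) for the ten values appended per restaurant, in the same order.
COLUMNS = ['area', 'rc_count', 'name', 'url', 'tabelog_score',
           'review_num', 'station', 'genre', 'price', 'review_scores']


def mean_review_score(scores):
    """Average the scraped rating strings, skipping anything that is not a number."""
    values = []
    for s in scores:
        try:
            values.append(float(s))
        except (TypeError, ValueError):
            continue  # e.g. a '-' placeholder (assumed) for a review without a rating
    return sum(values) / len(values) if values else float('nan')


def to_frame(restaurant_data):
    """Build a DataFrame from the scraped rows and add the mean review score."""
    df = pd.DataFrame(restaurant_data, columns=COLUMNS)
    df['review_num'] = df['review_num'].astype(int)
    df['mean_review_score'] = df['review_scores'].apply(mean_review_score)
    return df


# Made-up rows, only to show the shape of the data.
sample = [
    ['Shibuya', 2, 'Restaurant A', 'https://example.com/a/', 3.05, '3',
     'Shibuya', 'Italian', 'JPY 3,000-3,999', ['4.5', '4.0', '-']],
    ['Shibuya', 2, 'Restaurant B', 'https://example.com/b/', -1.0, '1',
     'Ebisu', 'Bar', 'JPY 2,000-2,999', ['3.5']],
]
df = to_frame(sample)
print(df[['name', 'tabelog_score', 'review_num', 'mean_review_score']])
```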

For an explanation of the code and further details, see my other article, "Scraping and Tabelog: I want to find a good restaurant! (Work)". Please have a look if you are interested.
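The area lookup at the top of the script recovers names and URLs by splitting the `data-swicher-area-list` attribute on double quotes and reading the tokens at positions `8*i + 3` and `8*i + 7`. That pattern fits a JSON-like list of objects with two string fields each. The sketch below shows why on a made-up value and compares it with decoding the JSON directly; the key names `name` and `url` and the sample paths are assumptions, since the real attribute content is not shown in this article.

```python
import json

# Hypothetical attribute value: a JSON list of objects with two string fields.
raw = '[{"name":"Area A","url":"/tokyo/A0001/"},{"name":"Area B","url":"/tokyo/A0002/"}]'


def parse_by_split(text):
    """The script's approach: in each group of 8 quote-delimited tokens,
    token 3 is the first value and token 7 is the second value."""
    tokens = text.split('"')
    return {tokens[i * 8 + 3]: tokens[i * 8 + 7] for i in range((len(tokens) - 1) // 8)}


def parse_by_json(text, key_field='name', value_field='url'):
    """Alternative: decode the JSON and read the two fields by name (names assumed)."""
    return {item[key_field]: item[value_field] for item in json.loads(text)}


print(parse_by_split(raw))
print(parse_by_json(raw))
```

Both functions return the same dictionary for this input. The split-based version depends on the exact field count and on values containing no double quotes, whereas the JSON version only works if the attribute really is valid JSON with known key names.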

3. So, what were the results?

Plot and visualize data

Here are the results! The data cover the 400 most recently opened restaurants in the Shibuya, Ebisu, and Daikanyama area.

The vertical axis is the Tabelog score, and the horizontal axis is the mean review score.
Markers are color-coded by review count: red for restaurants with few reviews, shading toward black as the number of reviews increases.
Some restaurants have not yet been given a Tabelog score; they are plotted at a score of 2.9.
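A plot along these lines can be sketched with matplotlib as follows. The function name, the linear red-to-black color scale, and the sample values are assumptions for illustration; the plotting code actually used for the figure is not shown in this article.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap


def plot_scores(mean_review, tabelog, review_num, missing_value=2.9):
    """Scatter of mean review score (x) against Tabelog score (y), colored by review count."""
    # The scraper stores -1 when a restaurant has no Tabelog score yet; draw those at 2.9.
    y = [missing_value if t < 0 else t for t in tabelog]
    cmap = LinearSegmentedColormap.from_list('red_to_black', ['red', 'black'])
    fig, ax = plt.subplots(figsize=(6, 5))
    points = ax.scatter(mean_review, y, c=review_num, cmap=cmap, alpha=0.7)
    fig.colorbar(points, ax=ax, label='number of reviews')
    ax.set_xlabel('mean review score')
    ax.set_ylabel('Tabelog score')
    return fig, ax


# Made-up values, only to exercise the function.
fig, ax = plot_scores(mean_review=[4.5, 3.2, 3.8, 4.1],
                      tabelog=[3.05, 3.40, -1.0, 3.09],
                      review_num=[2, 60, 1, 15])
fig.savefig('tabelog_scatter.png', dpi=150)
```

With a plain linear scale, a handful of restaurants with very many reviews can make every other marker look red, so capping or log-scaling the review count may be worth trying.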

Implication

Here is what the scatter plot tells us.

**The mean review score and the Tabelog score are positively correlated**, and the correlation is especially clear for restaurants with many reviews (marker color close to black).
Restaurants with few reviews (marker color close to red) generally have low Tabelog scores, but their mean review scores vary widely.
**There really are hidden gems whose Tabelog score is low only because they have few reviews, yet whose reviews are very good!** They are the restaurants in the lower-right area of the scatter plot.
When you look at the pages of these restaurants, you do find very positive reviews.
However, such reviews may be marketing by the restaurant itself, so use your judgment before actually going. In some cases the new Shibuya restaurant is a sister branch of one that is already open elsewhere with a high Tabelog score, and those seem quite reliable.
Sorry, but I don't want to appear to be using Tabelog data for anything other than private purposes, so I won't list specific restaurant names here.
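The two observations above, a correlation that depends on review count and a cluster of candidates in the lower right, can be checked with a few lines of pandas. This is a sketch on made-up numbers: the column names, the cutoff of 10 reviews, and the score thresholds are all hypothetical and would need tuning on real data.

```python
import pandas as pd

# Made-up numbers; the column names are placeholders.
df = pd.DataFrame({
    'tabelog_score':     [3.04, 3.09, 3.05, 3.45, 3.60, 3.30, 3.52, 3.07],
    'mean_review_score': [4.6,  3.1,  4.4,  3.6,  3.9,  3.3,  3.7,  2.9],
    'review_num':        [2,    3,    4,    80,   150,  40,   95,   1],
})

FEW_REVIEWS = 10  # hypothetical cutoff between "few" and "many" reviews

# Pearson correlation between the two scores, separately for each group.
few = df[df['review_num'] < FEW_REVIEWS]
many = df[df['review_num'] >= FEW_REVIEWS]
corr_few = few['tabelog_score'].corr(few['mean_review_score'])
corr_many = many['tabelog_score'].corr(many['mean_review_score'])
print('corr (few reviews): ', corr_few)
print('corr (many reviews):', corr_many)

# "Lower right" of the scatter plot: few reviews, high mean review score, low Tabelog score.
candidates = df[(df['review_num'] < FEW_REVIEWS)
                & (df['mean_review_score'] >= 4.0)  # hypothetical threshold
                & (df['tabelog_score'] < 3.1)]      # hypothetical threshold
print(candidates)
```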

4. Deeper consideration

From here on, I'll dig a little deeper, purely out of curiosity.

Score distribution

Looking at the scatter plot, I noticed that the distribution of Tabelog scores is skewed, with restaurants clustered at particular scores, so I looked at the distribution itself. The clustering at particular scores stands out, but so does the fact that some scores have extremely few restaurants:

3.04 (41 cases) → 3.05 (7 cases)
3.09 (83 cases) → 3.10 (10 cases)
3.29 (20 cases) → 3.30 (2 cases)
3.34 (40 cases) → 3.35 (7 cases)
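
Drops like these can be located automatically by counting restaurants per score and comparing adjacent 0.01 steps. The sketch below uses only the eight counts quoted above, since the full distribution is not reproduced here; the `ratio` and `min_count` parameters are hypothetical tuning knobs.

```python
from collections import Counter


def score_counts(scores):
    """Count restaurants per score, keyed in hundredths (3.04 -> 304) to avoid float keys."""
    return Counter(round(s * 100) for s in scores)


def sharp_drops(counts, ratio=0.5, min_count=5):
    """Return (score, count, next_score, next_count) wherever the count falls below
    `ratio` times the previous count between adjacent 0.01 steps."""
    drops = []
    for key in sorted(counts):
        if key + 1 not in counts:
            continue  # neighbor not observed: skip rather than assume it is zero
        here, nxt = counts[key], counts[key + 1]
        if here >= min_count and nxt < here * ratio:
            drops.append((key / 100, here, (key + 1) / 100, nxt))
    return drops


# Only the eight counts quoted above.
quoted = {3.04: 41, 3.05: 7, 3.09: 83, 3.10: 10, 3.29: 20, 3.30: 2, 3.34: 40, 3.35: 7}
scores = [s for s, n in quoted.items() for _ in range(n)]
drops = sharp_drops(score_counts(scores))
for drop in drops:
    print(drop)
```

On these counts the function reports the four transitions listed above: 3.04 to 3.05, 3.09 to 3.10, 3.29 to 3.30, and 3.34 to 3.35.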

I looked at the relationship with the number of reviews, but found nothing that explains this. Perhaps the key lies in data I did not collect this time, such as the number of days since opening. There is also a rumor that the annual membership fee a restaurant pays to Tabelog caps its score, which may play a part. Seen another way, if such a cap is one reason some restaurants have a low Tabelog score despite high review scores, then the approach introduced here helps you find **restaurants that have not become more popular than they need to be**.

5. Summary

I was able to collect data from Tabelog by scraping.
The Tabelog score and the mean review score are correlated, but there certainly are restaurants whose review scores are high while their Tabelog score is low.
I'll go and eat at those restaurants to see how good they really are! (lol)
This goes beyond the original purpose, but with a little more time spent working out how the scoring works, it seems possible to derive **an efficient marketing strategy for restaurants**. Maybe another time, when I feel like it.
