I became interested in the GoToEat campaign when the "Torikizoku marathon" became a hot topic. This time, I would like to use Python scraping and mapping techniques to map the stores covered by the campaign. Reference: the GoToEat campaign site of the Ministry of Agriculture, Forestry and Fisheries.
I collected the information by scraping Tabelog's list of campaign target stores.
At first glance, I couldn't find any wording on the site prohibiting scraping ...
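Incidentally, robots.txt can also be checked programmatically with Python's standard library. A minimal sketch (this was not part of my scraping script):

```python
import urllib.robotparser

# Ask tabelog.com's robots.txt whether the campaign list pages may be fetched
rp = urllib.robotparser.RobotFileParser()
rp.set_url('https://tabelog.com/robots.txt')
rp.read()
print(rp.can_fetch('*', 'https://tabelog.com/go-to-eat/list/'))
```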
I will omit the full scraping source code this time and only give an overview. The information to acquire is ['Store name', 'Genre', 'Review rate', 'Minimum budget', 'Phone number', 'Smoking information', 'Address', 'Prefecture', 'Area code', 'URL'].
import requests
from bs4 import BeautifulSoup
import time
import pandas as pd
from pprint import pprint


def scrape_page(soup, df, cols):
    for title in soup.find_all('div', class_='list-rst__rst-name'):
        # Get the store name
        name = title.find('a').text.strip()
        # Get the store URL
        shop_url = title.find('a').get('href')
        # Get the genre
        genre = title.find('span', class_='list-rst__area-genre cpy-area-genre').text
        genre = genre.replace('/', '').replace(' ', '').replace('、', ',')
        print('Obtaining information on "' + name + '" ...')
        # Scrape each shop's own page (from the URL above) for the details
        res_shop = requests.get(shop_url)
        soup_shop = BeautifulSoup(res_shop.content, 'html.parser')
        time.sleep(0.1)
        # Get the prefecture and area code from the URL path
        prefecture = shop_url.split('/')[3]
        area = shop_url.split('/')[4]
        # Get the address
        address = soup_shop.find('p', class_='rstinfo-table__address').text.strip()
        # Get the review rate -- wrapped in try because parsing can fail
        try:
            rate = float(soup_shop.find('span', class_='rdheader-rating__score-val-dtl').text.strip())
        except ValueError:
            rate = '-'
        # Get the minimum budget
        budget = soup_shop.find('em', class_='gly-b-dinner').text.strip()
        budget = budget.split('~')[0].replace('¥', '').replace(',', '')
        budget = int(budget)
        # Get the phone number
        phone_num = soup_shop.find('p', class_='rstinfo-table__tel-num-wrap').text.strip()
        # Get the smoking information
        smoke = soup_shop.find('p', class_='p-input-form__line').text.strip()
        # Collect the values for this shop in a list
        shop_datas = [name, genre, rate, budget, phone_num, smoke,
                      address, prefecture, area, shop_url]
        # Build a one-row dataframe from the list and concatenate
        df_shop = pd.DataFrame([shop_datas], columns=cols, index=None)
        df = pd.concat([df, df_shop], sort=False)
    return df
def job():
    url = 'https://tabelog.com/go-to-eat/list/?page='
    page_cnt = 1
    cols = ['name', 'genre', 'rate', 'budget', 'tel', 'smoke', 'address', 'prefectures', 'area', 'url']
    df = pd.DataFrame(columns=cols, index=None)
    # Scrape only the first list page for now
    url = url + str(page_cnt)
    res = requests.get(url)
    soup = BeautifulSoup(res.content, 'html.parser')
    df = scrape_page(soup, df, cols)
    pprint(df.columns.values)
    pprint(df.head().values)


job()
output
Obtaining information on "Charcoal-grilled meat Hiyoriya" ...
Obtaining information on "Charcoal-grilled meat / Korean food KollaBo Ebisu store" ...
Obtaining information on "Miraku" ...
Obtaining information on "Wine Bar. Dipunto Shibuya Jinnan" ...
Obtaining information on "Kobe beef steak cooking Yuzuki Hana charcoal grill" ...
Obtaining information on "Italian Alberta KARASUMA of fish and vegetables" ...
Obtaining information on "Ningyocho Wine Bar" ...
Obtaining information on "Basashi and Motsunabe Izakaya Kyushu Komachi Private Room All-you-can-drink Kanayama 2-chome" ...
Obtaining information on "Adult Sake Bar Irori" ...
Obtaining information on "Hidden Bunch Machida" ...
Obtaining information on "Aloha Table Yokohama Bay Quarter" ...
Obtaining information on "Beer Garden Terrace Iidabashi" ...
Obtaining information on "Takkanmari University" ...
Obtaining information on "Private room steak & Italian Dining VT Ebisu store" ...
Obtaining information on "Mountain lodge bar beast" ...
Obtaining information on "Genkatsugu" ...
Obtaining information on "Kyomachi Koishigure Shinjuku Main Building" ...
Obtaining information on "GINTO Ikebukuro" ...
Obtaining information on "Satsuma Jidori and Private Izakaya Izakaya Shinjuku" ...
"Meat wholesaler 25-Obtaining information on "89" ...
array(['name', 'genre', 'rate', 'budget', 'tel', 'smoke', 'address',
'prefectures', 'area', 'url'], dtype=object)
array([['Charcoal grilled meat\u3000 Hiyori', 'Grilled meat,Izakaya,steak', 3.2, 4000, '050-5869-1319',
'All seats can be smoked', '1 Nakayamatedori, Chuo-ku, Kobe City, Hyogo Prefecture-7-5 S Building 2F', 'hyogo', 'A2801',
'https://tabelog.com/hyogo/A2801/A280101/28032248/'],
['Charcoal-grilled meat / Korean cuisine KollaBo Ebisu store', 'Grilled meat,Korean cuisine,Izakaya', 3.09, 3000,
'050-5571-4836', 'All seats are non-smoking', '1 Ebisu Nishi, Shibuya-ku, Tokyo-14-1 Sunrise Building\u30002F',
'tokyo', 'A1303',
'https://tabelog.com/tokyo/A1303/A130302/13178659/'],
['Miraku', 'Izakaya,Seafood / seafood,Bowl of rice topped with sashimi', 3.59, 3000, '050-5595-5384', 'Separate smoke',
'1 Kyomachi, Kokurakita-ku, Kitakyushu City, Fukuoka Prefecture-6-28', 'fukuoka', 'A4004',
'https://tabelog.com/fukuoka/A4004/A400401/40001071/'],
['Wine bar. Dipunto Shibuya Jinnan', 'Bar Bar,Izakaya,Italian', 3.08, 2000,
'050-5589-7383', 'Separate smoke', '1 Jinnan, Shibuya-ku, Tokyo-20-17 B1F', 'tokyo', 'A1303',
'https://tabelog.com/tokyo/A1303/A130301/13186934/'],
['Kobe beef steak cooking Yuzuki Hana charcoal grill', 'steak,Kappou / Small dishes,Grilled meat', 3.1, 10000,
'050-5594-6576', 'All seats are non-smoking', '4 Kano-cho, Chuo-ku, Kobe-shi, Hyogo-8-19 Parfun Building 2F',
'hyogo', 'A2801',
'https://tabelog.com/hyogo/A2801/A280101/28050226/']],
dtype=object)
I was able to get the information from one page successfully. After that, you can get everything by incrementing the page count in the script above and looping over the prefectures (it took quite a while...).
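A rough sketch of that pagination loop is below; the stopping condition and the one-second sleep are my own illustrative choices rather than the exact code I ran, and the per-prefecture loop would wrap this in the same way:

```python
# Illustrative sketch: crawl list pages until one comes back empty
def job_all_pages():
    base_url = 'https://tabelog.com/go-to-eat/list/?page='
    cols = ['name', 'genre', 'rate', 'budget', 'tel', 'smoke',
            'address', 'prefectures', 'area', 'url']
    df = pd.DataFrame(columns=cols, index=None)
    page_cnt = 1
    while True:
        res = requests.get(base_url + str(page_cnt))
        soup = BeautifulSoup(res.content, 'html.parser')
        # Stop once a page no longer contains any store entries
        if not soup.find_all('div', class_='list-rst__rst-name'):
            break
        df = scrape_page(soup, df, cols)
        page_cnt += 1
        time.sleep(1)  # be polite to the server
    return df
```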
I used folium for mapping as usual, but nothing can be mapped without latitude and longitude in addition to the address. After a lot of searching, I found a site that makes the conversion easy: if you upload a CSV file containing the addresses, you can download a file with latitude and longitude columns appended.
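For reference, dumping the scraped DataFrame to a CSV for that upload might look like the line below (tabelog.csv is a name I chose; Shift_JIS matches the encoding used to read the result back):

```python
# Save the scraped data so it can be uploaded to the geocoding site
df.to_csv('tabelog.csv', encoding='shift_jis', index=False)
```

I saved the downloaded result as tabelog_geo.csv and checked the contents of the data.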
df = pd.read_csv('tabelog_geo.csv', encoding='shift_jis')
print('Number of records: {}'.format(len(df)))
print('-------column---------')
pprint(df.columns.values)
print('-------head values----')
pprint(df.head().values)
output
Number of records: 15020
-------column---------
array(['name', 'genre', 'rate', 'budget', 'tel', 'smoke', 'address',
'url', 'prefectures', 'area', 'LocName', 'fX', 'fY', 'iConf',
'iLvl'], dtype=object)
-------head values----
array([['Horse meat bar bounce Madamachi Mita store', 'Bar Bar,Izakaya,Italian', 3.51, 3000, '050-5869-1861',
'All seats are non-smoking', '5 Shiba, Minato-ku, Tokyo-22-5 Harada Building 1F',
'https://tabelog.com/tokyo/A1314/A131402/13143781/', 'tokyo',
'A1314', 'Tokyo/Minato-ku/Turf/5-chome/No. 22', 139.74643999999998,
35.648109999999996, 5.0, 7.0],
['Dorado', 'bistro,French,Western food', 3.64, 5000, '050-5596-1243', 'All seats are non-smoking',
'3 Sakae, Naka-ku, Nagoya-shi, Aichi-10-14 Pivot Lion Building 2F',
'https://tabelog.com/aichi/A2301/A230102/23029870/', 'aichi',
'A2301', 'Aichi prefecture/Nagoya city/Naka-ku/Sannomaru/4-chome', 136.90649, 35.18425, 5.0, 6.0],
['Izakaya Ryunosu', 'Izakaya,Seafood / seafood,Gyoza', 3.16, 3000, '050-5456-3379',
'All seats are non-smoking', '2 Saiwaicho, Chitose City, Hokkaido-5-1',
'https://tabelog.com/hokkaido/A0107/A010701/1024501/',
'hokkaido', 'A0107', 'Hokkaido/Chitose/Yukimachi/2-chome/Address 5', 141.64700000000002,
42.822959999999995, 5.0, 7.0],
['Kushikatsu Dengana Yokohama Minamisaiwai', 'Izakaya,Deep-fried skewers, skewers,hormone', 3.01, 0, '050-5597-9448',
'-', '2 Minamisaiwai, Nishi-ku, Yokohama-shi, Kanagawa-8-20SFBuilding2F',
'https://tabelog.com/kanagawa/A1401/A140101/14079956/',
'kanagawa', 'A1401', 'Kanagawa Prefecture/Naka-gun/Ninomiya Town/Ninomiya', 139.25673999999998,
35.30523, 5.0, 5.0],
['Charcoal-grilled meat / Korean food KollaBo Shinbashi store', 'Izakaya,Grilled meat,Korean cuisine', 3.06, 3000,
'050-5592-3837', 'Separate smoke', '2 Shinbashi, Minato-ku, Tokyo-8-17 Sanken Building 1F',
'https://tabelog.com/tokyo/A1301/A130103/13197832/', 'tokyo',
'A1301', 'Tokyo/Minato-ku/Shimbashi/2-chome/No. 8', 139.7563, 35.66747, 5.0, 7.0]],
dtype=object)
The longitude and latitude are stored in the fX and fY columns, respectively.
I was curious how the Tabelog rates are distributed, so I plotted them with seaborn: a histogram and a violin plot.
import matplotlib.pyplot as plt
import seaborn as sns


# Histogram generation
def plot_hist(df):
    # Some rates could not be obtained and are registered as 0, so exclude them
    df = df[df['rate'] != 0]
    sns.set()
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.distplot(
        df['rate'], bins=10, color='blue', label='rate',
        kde=True)
    ax.set_title('histogram --tabelog rate')
    plt.show()


# Violin plot generation
def plot_violin(df):
    df = df[df['rate'] != 0]
    sns.set()
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.violinplot(x=df['rate'].values, color='lightgreen')
    ax.set_title('violin_plot --tabelog rate')
    plt.show()
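A minimal driver for the two plots, reusing the geocoded CSV loaded earlier:

```python
df = pd.read_csv('tabelog_geo.csv', encoding='shift_jis')
plot_hist(df)
plot_violin(df)
```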
This was my first time using a violin plot, and I was surprised at how unintentionally suggestive the shape turns out to be. That aside, it can be read that stores with a rate of roughly 3.4 or higher count as highly rated within the overall distribution.
Mapping all of the stores would mean a huge number of pins, so I narrow the data down before pinning it on the map. This time: the genre contains "Italy", the maximum budget is 10,000 yen, the Tabelog rate is 3.2 or higher, and the prefecture is Tokyo. Also, since the earlier plot suggested that a rate of 3.4 or higher is good, stores rated 3.4 or higher are highlighted in green.
import folium
import pandas as pd


def param_input(df):
    # Filtering parameters -- use '' to leave a parameter unspecified
    param_genre = 'Italy'        # str
    param_budget = 10000         # int, maximum budget
    param_rate = 3.2             # float, minimum rate
    param_prefecture = 'tokyo'   # str, half-width alphabet
    param_address = ''           # str
    # Narrow down df according to each parameter
    if param_genre != '':
        df = df[df['genre'].str.contains(param_genre)]
    if param_budget != '':
        df = df[df['budget'] <= param_budget]
    if param_rate != '':
        df = df[df['rate'] >= param_rate]
    if param_prefecture != '':
        df = df[df['prefectures'] == param_prefecture]
    if param_address != '':
        df = df[df['address'].str.contains(param_address)]
    return df


def make_folium_map(df):
    # Center the map on the very first record
    start_loc = df[['fY', 'fX']].values[0]
    m = folium.Map(location=start_loc, tiles='Stamen Terrain', zoom_start=12)
    for data in df[['fY', 'fX', 'name', 'rate', 'budget', 'smoke']].values:
        lat, lon, name, rate, budget, smoke = data
        # Build the tooltip text for the pin
        pin_text = '{name}<br>{rate}<br>{budget}<br>{smoke}' \
            .format(name=name,
                    rate='Evaluation: ' + str(rate),
                    budget=str(budget) + ' yen ~',
                    smoke=smoke)
        # Set the icon color according to the rate: above 3.4 is shown in green
        if rate > 3.4:
            icon_color = 'green'
        else:
            icon_color = 'blue'
        # Put a pin on the map
        folium.Marker([lat, lon],
                      tooltip=pin_text,
                      icon=folium.Icon(color=icon_color)).add_to(m)
    m.save('tabelog_map.html')


df = pd.read_csv('tabelog_geo.csv', encoding='shift_jis')
df = param_input(df)
make_folium_map(df)
result
Zoom in and hover the cursor over a pin to see the store information.
I was able to map everything into an html file. It would be even better if I could publish it on the web using heroku, but I'm still experimenting with that, so for now I'll just publish a screenshot.
It might be interesting to discover new places by generating a random number and letting it decide which store to go to. After all, I tend to end up at the stores I always go to.
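For instance, a quick pick with pandas' sample, operating on the filtered DataFrame from above:

```python
# Pick one store at random from the filtered DataFrame
choice = df.sample(n=1)
print(choice[['name', 'genre', 'rate', 'budget', 'url']].values[0])
```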
I also made a heat map, referring to [this site](https://hikiniku11029.hatenablog.com/entry/2019/06/18/folium%E3%81%A7%E3%83%92%E3%83%BC%E3%83%88%E3%83%9E%E3%83%83%E3%83%97%E3%81%AE%E4%BD%9C%E3%82%8A%E6%96%B9%E3%82%92%E3%81%8A%E3%81%BC%E3%81%88%E3%81%9F). Areas with highly rated stores glow brightly.
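A minimal sketch of such a heat map using folium's HeatMap plugin; weighting each point by its rate and the map center and radius are my own choices:

```python
from folium.plugins import HeatMap

# Reload the full geocoded data (not the filtered subset) and drop rate 0
df_all = pd.read_csv('tabelog_geo.csv', encoding='shift_jis')
df_all = df_all[df_all['rate'] != 0]
# Each point is [lat, lon, weight]; the tabelog rate serves as the weight
heat_data = df_all[['fY', 'fX', 'rate']].values.tolist()
m = folium.Map(location=[35.68, 139.76], tiles='Stamen Terrain', zoom_start=12)
HeatMap(heat_data, radius=10).add_to(m)
m.save('tabelog_heatmap.html')
```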