[PYTHON] I want to easily find a delicious restaurant

Have you ever run into something like this?

I tried searching on my smartphone for somewhere to eat in an unfamiliar town, but ...

- It's hard to pick out highly rated restaurants from Tabelog's list view.
- With Google Maps, you have to click each search result one by one to read its reviews.
- Review sites rarely carry negative opinions, so it's hard to tell whether a place is really good.

Even if you pull out your smartphone and try to find a good restaurant in town, it is hard to spot the highly rated places at a glance. I want to know the highly rated shops around my current location! In situations like that, the highly rated shop map introduced in this article comes in handy.

What to do

By scraping, we collect the information on the shops featured in Tabelog's 100 Famous Shops (Hyakumeiten). We then import that information into Google My Maps to create a map of highly rated shops.

Before scraping

Extracting data from the Web and making it structured data that can be analyzed is called scraping, but there are some points to note.

Consider copyright

If the collected data contains copyrighted material, you must respect the copyright. Do not hand data collected by scraping to others or build a business on top of it. Copying for personal, private use, on the other hand, appears to be permitted.

Do not put an excessive load on the data acquisition destination

In many cases, collecting structured data means sending many requests to someone else's web server. Be aware that firing off a large number of requests in a short period can bring that server down. There has even been a past case in which someone was arrested over this despite having no malicious intent.
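As a minimal illustration of that idea (the script below simply calls sleep(5) between requests), a fetch helper that forces a pause after every request might look like the following sketch; the 5-second interval and the example.com URLs are arbitrary placeholders:

import time
import requests

REQUEST_INTERVAL = 5  # seconds between requests; an arbitrary, conservative value

def polite_get(url, interval=REQUEST_INTERVAL):
    """Fetch a URL, then pause so that consecutive calls stay spaced out."""
    response = requests.get(url, timeout=10)
    time.sleep(interval)
    return response

# Example: fetching a few pages with at least 5 seconds between requests.
for page in ["https://example.com/a", "https://example.com/b"]:
    r = polite_get(page)
    print(page, r.status_code)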

Check the terms of use

Read the terms of use carefully, since scraping may be prohibited. Websites also provide a text file called robots.txt for controlling crawlers. I won't go into the details of reading it here, but before scraping, check the robots.txt of the target website and follow what it says.
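For reference, a minimal way to check robots.txt from Python is the standard library's urllib.robotparser. The sketch below assumes award.tabelog.com (the host used later in this article) is the right place to look, and whether scraping is acceptable still depends on the site's actual terms:

from urllib import robotparser

# Point the parser at the target site's robots.txt and load it.
rp = robotparser.RobotFileParser()
rp.set_url("https://award.tabelog.com/robots.txt")
rp.read()

# True means a generic crawler ("*") is allowed to fetch this path.
print(rp.can_fetch("*", "https://award.tabelog.com/hyakumeiten/tonkatsu"))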

Check the safety yourself

I have only touched on these points briefly, and since this is my own summary it is quite possibly incomplete. Do your own research and make sure there are no problems before you scrape.

Implementation

The implementation uses Python and the BeautifulSoup library. They are the most common combination in scraping articles, so there is plenty of reference material and they are easy to handle even for beginners.

main.py


import requests
from bs4 import BeautifulSoup
from time import sleep
import pandas

# List page of Tabelog's 100 Famous Shops (Hyakumeiten) for the tonkatsu genre
url = "https://award.tabelog.com/hyakumeiten/tonkatsu"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")
# Links to the individual shop pages
shoplinks = soup.find_all("a", class_="list-shop__link-page")

rowdata = []
for shoplink in shoplinks:
    url = shoplink.get("href")
    rshop = requests.get(url)
    soup = BeautifulSoup(rshop.content, "html.parser")
    print("------------------")
    print(url)

    # Shop name: <div class="rdheader-rstname"> > <h2> > <span>
    shopname = soup.find("div", class_="rdheader-rstname")
    if shopname is not None:
        shopname = shopname.select_one("h2 span")
    if shopname is not None:
        shopname = shopname.get_text().strip()
    print(shopname)

    # Address
    address = soup.find("p", class_="rstinfo-table__address")
    if address is not None:
        address = address.get_text()
    print(address)

    # Rating score
    point = soup.find("span", class_="rdheader-rating__score-val-dtl")
    if point is not None:
        point = point.get_text().strip()
    print(point)

    # Regular holiday (keep only the first 10 characters)
    regholiday = soup.find("dd", class_="rdheader-subinfo__closed-text")
    if regholiday is not None:
        regholiday = regholiday.get_text().strip()[0:10]
    print(regholiday)

    rowdata.append([shopname, address, point, regholiday, url])
    # Wait 5 seconds so as not to put a load on the server
    sleep(5)

print(rowdata)

# Write the collected rows out as a CSV file
df = pandas.DataFrame(
    rowdata, columns=["shopname", "address", "point", "regular holiday", "url"])
df.to_csv("tonkatsu" + ".csv", index=False)

This code targets the 100 Famous Shops 2019 pages. When I checked the Hyakumeiten 2020 pages, I could not get the data because the structure of the web pages had changed.

Run

To run it, just install Python (together with the requests, beautifulsoup4 and pandas libraries) and execute this script. Running it produces the data of the tonkatsu shops selected for the 100 Famous Shops as a CSV file. If you want the gyoza (dumpling) data instead, change the end of the URL to gyoza and the tonkatsu on the last line to gyoza, as illustrated below.
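As a concrete sketch of that change, deriving both the list-page URL and the CSV file name from a single genre variable keeps the two edits in sync; "gyoza" is the slug mentioned above, and any other slug is an assumption about Tabelog's URL scheme:

# Genre slug used in the Hyakumeiten URL; "gyoza" comes from the article,
# any other slug is an assumption about Tabelog's URL scheme.
genre = "gyoza"

# Replaces the hard-coded URL at the top of main.py
url = "https://award.tabelog.com/hyakumeiten/" + genre

# Replaces the hard-coded file name on the last line of main.py
csv_name = genre + ".csv"

print(url)       # https://award.tabelog.com/hyakumeiten/gyoza
print(csv_name)  # gyoza.csv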

Import to my map

All that remains is to import the generated CSV into Google My Maps. After importing the CSV files for all genres, the 100 Famous Shops should be plotted as shown below.

(map.jpg: the 100 Famous Shops plotted on Google My Maps)

Summary

In this way, scraping let us extract the shop information featured in Tabelog's 100 Famous Shops as structured data and put it on a map. My Maps can also be viewed on Android, so you can instantly find the nearest of the 100 Famous Shops from your current location.

Remaining issues

This is a script I wrote quite a while ago, and looking at it again the code is rough enough that being a Python beginner is no excuse. I actually also have a script that takes the genre from a command-line argument, but it is not in a state I would be comfortable publishing (a rough sketch of that idea follows the list below). If the following points are addressed, I may publish it on GitHub.

- Make it an executable file
- Logging
- Read the URL from a config file
- Repository structure based on The Hitchhiker's Guide to Python
- Tests
- Design review
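For reference, taking the genre from a command-line argument could look roughly like the sketch below. This is only an illustration of the approach, not the unpublished script itself, and the file name genre_scrape.py is made up:

# genre_scrape.py -- a rough sketch, not the unpublished script itself.
# It only shows how a genre slug could be read from the command line and
# turned into the list-page URL and output file name used in main.py.
import argparse

def build_targets(genre):
    """Return (list_page_url, csv_filename) for a Hyakumeiten genre slug."""
    url = "https://award.tabelog.com/hyakumeiten/" + genre
    return url, genre + ".csv"

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Scrape one Hyakumeiten genre")
    parser.add_argument("genre", help="genre slug, e.g. tonkatsu or gyoza")
    args = parser.parse_args()

    url, csv_name = build_targets(args.genre)
    print(url, csv_name)
    # From here, run the same scraping loop as in main.py with this url,
    # and pass csv_name to df.to_csv() at the end.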

References

https://docs.pyq.jp/column/crawler.html
https://www.cric.or.jp/qa/hajime/hajime8.html
https://ja.wikipedia.org/wiki/%E5%B2%A1%E5%B4%8E%E5%B8%82%E7%AB%8B%E4%B8%AD%E5%A4%AE%E5%9B%B3%E6%9B%B8%E9%A4%A8%E4%BA%8B%E4%BB%B6
