[PYTHON] Scraping the Excel file of the list of stores handling regional coupons

This post scrapes the links to the Excel files listing stores that accept regional coupons, then tallies the nationwide list by industry and prefecture.

Scraping

import datetime
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://biz.goto.jata-net.or.jp/couponlist.html"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
}

r = requests.get(url, headers=headers)
r.raise_for_status()

soup = BeautifulSoup(r.content, "html.parser")

area = {}

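# Each download button is an <a> whose aria-label names the region and the "as of" date.
# The pattern accepts an ASCII or full-width "(" and any non-digit separators between
# year, month and day, since the label text on the page may be in Japanese.
pattern = re.compile(r"_(.+?)[(（](\d{4})\D+(\d{1,2})\D+(\d{1,2})")

for a in soup.select("section.download_couponlist div.button_area > a"):

    s = a.get("aria-label", "")
    m = pattern.search(s)
    if m is None:
        continue  # not a region download link

    # group 1 = region, groups 2-4 = year, month, day
    year, month, day = map(int, m.group(2, 3, 4))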

    area[m.group(1)] = {
        "link": urljoin(url, a.get("href")),
        "date": datetime.date(year, month, day),
    }

area
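Because the page is fetched over the network, the label parsing is easiest to check offline. The sketch below moves the same regular expression into a hypothetical helper, `parse_label`, and runs it on made-up `aria-label`/`href` pairs shaped like the translated pattern; the sample labels, file paths and the helper name are assumptions, not values taken from the live page. It also illustrates why a pattern written with ASCII parentheses around the date misbehaves: those parentheses become an extra capture group, so `m.group(2)` would hold the whole date text rather than the year, and `int()` would fail on it.

```python
import datetime
import re
from urllib.parse import urljoin

BASE_URL = "https://biz.goto.jata-net.or.jp/couponlist.html"

# Region after "_", then an ASCII or full-width "(", then year / month / day
# separated by any non-digit text. Only 4 capture groups, so group(2) is the year.
LABEL_RE = re.compile(r"_(.+?)[(（](\d{4})\D+(\d{1,2})\D+(\d{1,2})")


def parse_label(label, href, base_url=BASE_URL):
    """Hypothetical helper: return (region, {"link", "date"}) or None if no match."""
    m = LABEL_RE.search(label or "")
    if m is None:
        return None
    year, month, day = map(int, m.group(2, 3, 4))
    return m.group(1), {
        "link": urljoin(base_url, href),  # resolve relative hrefs against the page URL
        "date": datetime.date(year, month, day),
    }


# Made-up aria-label / href pairs (assumed shapes, not scraped data).
samples = [
    ("List of stores handling regional coupons_Hokkaido(2020Year10Month1As of the date)",
     "files/hokkaido.xlsx"),
    ("List of stores handling regional coupons_Nationwide（2020Year10Month15As of the date）",
     "/files/all.xlsx"),
    ("Unrelated link", "other.html"),
]

area = {}
for label, href in samples:
    parsed = parse_label(label, href)
    if parsed is not None:
        region, info = parsed
        area[region] = info

print(area)
```

The unrelated link has no `_`, so it is skipped instead of raising `AttributeError` on a `None` match, which mirrors the `continue` in the scraping loop.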

Data wrangling

import pandas as pd

# area.keys()
# dict_keys(['Hokkaido', 'Tohoku region', 'Kanto region', 'Chubu region', 'Kinki', 'Chugoku region', 'Shikoku region', 'Kyushu / Okinawa region', 'Nationwide'])

df0 = pd.read_excel(area["Nationwide"]["link"])

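# 1/0 flag per coupon type; na=False treats a blank "Coupon type" as "not this type"
df0["Paper coupon"] = df0["Coupon type"].str.contains("paper", case=False, na=False).astype(int)
df0["Electronic coupon"] = df0["Coupon type"].str.contains("electronic", case=False, na=False).astype(int)

# "Industry" looks like "<code>.<name>"; split on the first "." only
industries = df0["Industry"].str.split(".", n=1, expand=True).rename(columns={0: "Industry code", 1: "Industry name"})

industries["Industry code"] = industries["Industry code"].astype(int)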

df1 = pd.concat([df0, industries], axis=1)

pd.crosstab(df1["Industry name"], df1["Prefectures"])
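Output (number of listed stores by industry and prefecture):

| Industry name | Mie Prefecture | Kyoto | Saga Prefecture | Hyogo Prefecture | Hokkaido | Chiba | Wakayama Prefecture | Saitama | Oita Prefecture | Osaka | Nara Prefecture | Miyagi Prefecture | Miyazaki Prefecture | Toyama Prefecture | Yamaguchi Prefecture | Yamagata Prefecture | Yamanashi Prefecture | Gifu Prefecture | Okayama Prefecture | Iwate Prefecture | Shimane Prefecture | Hiroshima Prefecture | Tokushima Prefecture | Ehime Prefecture | Aichi Prefecture | Niigata Prefecture | Tokyo | Tochigi Prefecture | Okinawa Prefecture | Shiga Prefecture | Kumamoto Prefecture | Ishikawa Prefecture | Kanagawa Prefecture | Fukui Prefecture | Fukuoka Prefecture | Fukushima Prefecture | Akita | Gunma Prefecture | Ibaraki Prefecture | Nagasaki Prefecture | Nagano Prefecture | Aomori Prefecture | Shizuoka Prefecture | Kagawa Prefecture | Kochi Prefecture | Tottori Prefecture | Kagoshima Prefecture |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Other | 476 | 762 | 143 | 906 | 1641 | 823 | 269 | 706 | 449 | 1114 | 207 | 420 | 164 | 378 | 281 | 363 | 338 | 461 | 257 | 334 | 220 | 496 | 120 | 189 | 1069 | 651 | 2446 | 502 | 569 | 253 | 320 | 495 | 1188 | 334 | 644 | 622 | 184 | 595 | 445 | 274 | 1216 | 187 | 1322 | 218 | 168 | 200 | 400 |
| Other transportation services | 0 | 4 | 0 | 6 | 11 | 6 | 0 | 2 | 1 | 2 | 2 | 0 | 1 | 6 | 0 | 3 | 2 | 1 | 1 | 4 | 0 | 3 | 2 | 5 | 12 | 7 | 6 | 0 | 4 | 3 | 15 | 2 | 5 | 1 | 1 | 5 | 0 | 5 | 0 | 4 | 11 | 1 | 2 | 7 | 5 | 0 | 2 |
| Convenience stores, supermarkets | 619 | 896 | 200 | 1445 | 2566 | 1611 | 356 | 1772 | 448 | 2909 | 355 | 787 | 425 | 401 | 293 | 313 | 376 | 702 | 575 | 518 | 235 | 851 | 250 | 562 | 2552 | 660 | 5666 | 513 | 1139 | 494 | 587 | 436 | 2898 | 363 | 1403 | 489 | 394 | 511 | 767 | 371 | 688 | 593 | 1100 | 297 | 286 | 232 | 876 |
| Sports | 38 | 19 | 5 | 63 | 76 | 75 | 14 | 48 | 10 | 34 | 9 | 22 | 14 | 9 | 11 | 9 | 18 | 26 | 18 | 14 | 6 | 19 | 8 | 13 | 29 | 40 | 46 | 52 | 41 | 27 | 13 | 14 | 48 | 7 | 38 | 24 | 8 | 46 | 43 | 10 | 115 | 9 | 60 | 8 | 8 | 6 | 17 |
| Watching sports | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 32 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Fitness (sports gyms, etc.) | 0 | 2 | 1 | 2 | 4 | 1 | 1 | 1 | 2 | 7 | 0 | 0 | 3 | 0 | 1 | 0 | 1 | 3 | 0 | 3 | 1 | 9 | 0 | 1 | 8 | 0 | 16 | 0 | 9 | 0 | 1 | 1 | 5 | 1 | 6 | 3 | 2 | 0 | 0 | 0 | 5 | 0 | 8 | 0 | 1 | 0 | 1 |
| Car rental | 51 | 72 | 22 | 109 | 225 | 163 | 36 | 151 | 36 | 173 | 27 | 73 | 36 | 26 | 41 | 44 | 18 | 43 | 63 | 43 | 28 | 64 | 15 | 37 | 224 | 83 | 302 | 59 | 147 | 31 | 43 | 41 | 185 | 27 | 167 | 63 | 48 | 40 | 48 | 54 | 50 | 55 | 60 | 34 | 14 | 31 | 91 |
| Experience-based activities | 26 | 96 | 9 | 37 | 151 | 44 | 20 | 24 | 11 | 33 | 12 | 13 | 18 | 6 | 11 | 8 | 45 | 36 | 15 | 24 | 6 | 24 | 17 | 9 | 33 | 22 | 133 | 63 | 454 | 20 | 27 | 21 | 58 | 13 | 30 | 10 | 2 | 37 | 9 | 8 | 110 | 17 | 101 | 8 | 32 | 3 | 76 |
| Theaters, viewing halls, movie theaters, theaters | 0 | 4 | 3 | 3 | 6 | 8 | 0 | 2 | 1 | 4 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2 | 2 | 2 | 2 | 4 | 2 | 23 | 1 | 8 | 1 | 2 | 3 | 4 | 3 | 5 | 1 | 3 | 2 | 2 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 1 |
| Retail (souvenirs, etc.) | 764 | 1794 | 441 | 2116 | 2456 | 1341 | 280 | 1528 | 702 | 2874 | 393 | 1117 | 309 | 556 | 477 | 456 | 386 | 869 | 673 | 423 | 309 | 1001 | 270 | 450 | 2273 | 754 | 5273 | 637 | 886 | 602 | 645 | 849 | 2262 | 473 | 1918 | 682 | 356 | 636 | 596 | 545 | 1167 | 458 | 1554 | 390 | 300 | 197 | 651 |
| Cultural facilities (art museums, museums, etc.) | 3 | 21 | 1 | 18 | 18 | 7 | 5 | 1 | 16 | 6 | 3 | 10 | 0 | 11 | 7 | 11 | 22 | 21 | 22 | 6 | 8 | 6 | 3 | 10 | 18 | 17 | 13 | 12 | 3 | 8 | 7 | 17 | 30 | 8 | 5 | 9 | 4 | 4 | 2 | 11 | 31 | 4 | 24 | 5 | 11 | 5 | 4 |
| Marine transportation | 9 | 8 | 1 | 5 | 14 | 4 | 5 | 1 | 5 | 8 | 0 | 7 | 0 | 3 | 3 | 0 | 0 | 0 | 5 | 1 | 3 | 36 | 2 | 7 | 6 | 5 | 12 | 0 | 25 | 5 | 3 | 1 | 3 | 1 | 8 | 0 | 2 | 0 | 0 | 14 | 2 | 11 | 13 | 18 | 7 | 1 | 14 |
| Logistics (home delivery, etc.) | 0 | 2 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 31 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 2 |
| Air transport | 0 | 1 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| Tourist facilities (amusement parks, zoos, hot spring facilities, tourist farms, etc.) | 71 | 50 | 10 | 75 | 89 | 50 | 17 | 38 | 65 | 47 | 13 | 45 | 13 | 20 | 16 | 26 | 52 | 48 | 20 | 32 | 13 | 28 | 18 | 21 | 49 | 42 | 36 | 42 | 43 | 28 | 43 | 25 | 53 | 23 | 22 | 48 | 21 | 54 | 24 | 13 | 119 | 16 | 103 | 18 | 10 | 16 | 18 |
| Amusement facilities (Internet cafes, manga cafes, etc.) | 1 | 2 | 0 | 5 | 20 | 5 | 0 | 7 | 0 | 14 | 2 | 7 | 1 | 1 | 3 | 0 | 0 | 1 | 3 | 0 | 0 | 3 | 1 | 0 | 5 | 5 | 48 | 3 | 4 | 1 | 2 | 0 | 14 | 0 | 7 | 2 | 0 | 2 | 4 | 0 | 2 | 0 | 2 | 3 | 3 | 0 | 1 |
| Railroad | 8 | 26 | 0 | 14 | 5 | 12 | 8 | 7 | 0 | 17 | 1 | 44 | 0 | 32 | 2 | 21 | 4 | 18 | 9 | 31 | 1 | 6 | 9 | 14 | 94 | 49 | 59 | 8 | 19 | 8 | 6 | 9 | 28 | 5 | 43 | 31 | 14 | 10 | 27 | 5 | 53 | 17 | 69 | 29 | 8 | 2 | 8 |
| Restaurant (alcoholic beverages provided) | 275 | 728 | 41 | 312 | 576 | 487 | 51 | 195 | 63 | 884 | 124 | 103 | 40 | 132 | 163 | 54 | 74 | 218 | 115 | 53 | 43 | 328 | 47 | 57 | 719 | 227 | 1578 | 151 | 214 | 128 | 136 | 232 | 522 | 199 | 255 | 110 | 82 | 75 | 98 | 61 | 537 | 40 | 513 | 90 | 92 | 26 | 104 |
| Restaurant (no alcoholic beverages provided) | 76 | 105 | 13 | 117 | 96 | 111 | 15 | 62 | 23 | 132 | 33 | 26 | 11 | 30 | 42 | 14 | 15 | 77 | 29 | 20 | 15 | 62 | 18 | 18 | 165 | 56 | 130 | 43 | 36 | 38 | 49 | 70 | 99 | 32 | 59 | 23 | 11 | 25 | 25 | 17 | 122 | 19 | 125 | 58 | 24 | 7 | 25 |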
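The wrangling steps can be tried without downloading the Excel file. The sketch below builds a tiny hypothetical frame with the same column names the code expects (`Prefectures`, `Industry`, `Coupon type`); the row values are invented, and the real sheet's values may be Japanese, so the `"paper"`/`"electronic"` substrings are assumptions to adjust. It uses `n=1` in `str.split` so an industry name that itself contains a period (such as `etc.`) stays in one column.

```python
import pandas as pd

# Hypothetical rows in the shape the code expects (not real data):
# "Industry" is "<code>.<name>", "Coupon type" is free text and may be missing.
df0 = pd.DataFrame({
    "Prefectures": ["Tokyo", "Tokyo", "Osaka", "Osaka"],
    "Industry": ["1.Retail (souvenirs, etc.)", "2.Railroad", "1.Retail (souvenirs, etc.)", "3.Car rental"],
    "Coupon type": ["Paper coupon / Electronic coupon", "Electronic coupon", "Paper coupon", None],
})

# 1/0 flags; na=False makes a missing coupon type count as "not this type".
df0["Paper coupon"] = df0["Coupon type"].str.contains("paper", case=False, na=False).astype(int)
df0["Electronic coupon"] = df0["Coupon type"].str.contains("electronic", case=False, na=False).astype(int)

# Split on the first "." only, so "Retail (souvenirs, etc.)" is not cut at "etc."
industries = df0["Industry"].str.split(".", n=1, expand=True).rename(
    columns={0: "Industry code", 1: "Industry name"}
)
industries["Industry code"] = industries["Industry code"].astype(int)

df1 = pd.concat([df0, industries], axis=1)

# Count rows (stores) per industry name and prefecture; absent pairs become 0.
table = pd.crosstab(df1["Industry name"], df1["Prefectures"])
print(table)
```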
