[PYTHON] Scraping: Save website locally

A memorandum, for backup purposes only: insurance against something like a solar electromagnetic storm, a solar magnetic storm, or a global power outage. I do this because microCMS has no backup feature.

Code

import os
from urllib.request import urlretrieve

#URL for each article category
#base_url = "https://benzoinfojapan.org/patients-article/"
#base_url = "https://benzoinfojapan.org/doctors-article/"
base_url = "https://benzoinfojapan.org/medias-article/"

#Save destination file name prefix
#prefix = "patients-article"
#prefix = "doctors-article"
prefix = "medias-article"

num = 1

#Set the while condition num <= X to each category's article upper limit.
#The values below are current as of October 2020:
#  patients: 10
#  doctors:  26
#  medias:   13
#       ↓↓
while num <= 13:
    print("Download started")
 
    #Directory where HTML files are saved (created if it does not exist)
    save_dir = os.path.dirname(os.path.abspath(__file__)) + "/html/"
    os.makedirs(save_dir, exist_ok=True)

    url = base_url + str(num)

    #Destination file path
    num_str = str(num)
    save_file = save_dir + prefix + num_str + ".html"

    urlretrieve(url, save_file)

    #The doctors-article category is missing article 12, so skip it
    if num != 11:
        num += 1
    else:
        num += 2

How to use

Run the above code three times, changing the parameters for each of the three categories.

The only changes are the `base_url`, the `prefix`, and the loop's upper limit; the commented-out lines in the code show the values for the other two categories.
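If you would rather not edit and rerun the script three times, the three runs can be folded into one loop. This is my own consolidation, not part of the original post; the upper limits and the skipped doctors article (number 12, per the skip logic in the code above) are taken from the script.

```python
import os
from urllib.request import urlretrieve

#(base URL, file prefix, upper limit, article numbers missing on the site)
#per category; the upper limits are the October 2020 values from the post.
CATEGORIES = [
    ("https://benzoinfojapan.org/patients-article/", "patients-article", 10, ()),
    ("https://benzoinfojapan.org/doctors-article/", "doctors-article", 26, (12,)),
    ("https://benzoinfojapan.org/medias-article/", "medias-article", 13, ()),
]

def article_targets(base_url, prefix, upper, skipped=()):
    """Yield (url, filename) pairs for one category, skipping missing articles."""
    for num in range(1, upper + 1):
        if num in skipped:
            continue
        yield base_url + str(num), prefix + str(num) + ".html"

def download_all(save_dir):
    """Fetch every article in every category into save_dir."""
    os.makedirs(save_dir, exist_ok=True)
    for base_url, prefix, upper, skipped in CATEGORIES:
        for url, name in article_targets(base_url, prefix, upper, skipped):
            urlretrieve(url, os.path.join(save_dir, name))

#To run:
#download_all(os.path.dirname(os.path.abspath(__file__)) + "/html/")
```

Keeping the URL/filename generation separate from the download call also makes it easy to check what will be fetched before hitting the site.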

Result

Each page is saved as an HTML file on your local drive.

That's all.
