A story about bulk-downloading the past exam PDFs of the Fundamental Information Technology Engineer Examination with Python

Overview

Download the past exam PDFs of the Fundamental Information Technology Engineer Examination (FE) in bulk using Python's urllib package.

Details

The past questions of the Fundamental Information Technology Engineer Examination are published on the official IPA website (https://www.jitec.ipa.go.jp/1_04hanni_sukiru/_index_mondai.html). However, the questions and answers are posted separately for each exam session, so you have to visit each session's page and download the files one by one. To save that trouble, this article downloads the questions and answers all at once with Python's urllib package.
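As a minimal sketch of the single-file step that everything else builds on: `urllib.request.urlretrieve` saves a URL to a local path, and a failed HTTP request raises `urllib.error.HTTPError`, which is a subclass of `urllib.error.URLError`. The helper name `fetch_pdf` is hypothetical and is not used elsewhere in this article.

```python
import urllib.error
import urllib.request


def fetch_pdf(url, filename):
    """Save url to filename; return True on success, False on failure."""
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.URLError as err:  # also catches HTTPError (e.g. 404)
        print(f"Error: {filename} ({err})")
        return False
    return True
```

Calling `fetch_pdf(url, "some_name.pdf")` for each URL and collecting the calls that return False gives a list of files that still need attention.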

Look up the URL

Looking at the past-question pages, the URLs for the 2015 spring exam, for example, are as follows.

- Morning exam
  - Questions: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_am_qs.pdf
  - Answers: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_am_ans.pdf
- Afternoon exam
  - Questions: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_pm_qs.pdf
  - Answers: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_pm_ans.pdf
  - Commentary: https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2015h27_1/2015h27h_fe_pm_cmnt.pdf

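From this, it can be seen that the URL of each past-question PDF has the following structure:

https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_[Western calendar year][Japanese calendar year]_[1 OR 2]/[Western calendar year][Japanese calendar year][h OR a]_fe_[am OR pm]_[qs OR ans OR cmnt].pdf

Here the Japanese calendar year is written like h27 (Heisei 27); 1 and h denote the spring exam, and 2 and a denote the autumn exam.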
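The pattern can be written down as a small function. This is only a sketch of the regular case: the name `build_url` and its parameter names are made up for illustration, and the Japanese calendar year is computed as the Western year minus 1988, which holds only for Heisei years.

```python
BASE_URL = "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_"
SEASON_LETTER = {1: "h", 2: "a"}  # 1: spring, 2: autumn


def build_url(year, season, session, kind):
    """Build a past-question PDF URL from the regular pattern.

    year:    Western calendar year, e.g. 2015
    season:  1 (spring) or 2 (autumn)
    session: "am" or "pm"
    kind:    "qs", "ans" or "cmnt"
    """
    nendo = f"{year}h{year - 1988}"  # Heisei years only, e.g. 2015 -> 2015h27
    return (f"{BASE_URL}{nendo}_{season}/"
            f"{nendo}{SEASON_LETTER[season]}_fe_{session}_{kind}.pdf")


# Prints the 2015 spring afternoon commentary URL listed above.
print(build_url(2015, 1, "pm", "cmnt"))
```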

Implementation

I wrote the code with a lot of nested for statements, without thinking too hard about it.

kakomon.py


import urllib.error
import urllib.request

def download():
    # Common (leading) part of every URL
    urlbase = "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_"

    # Season number -> letter used in file names (1: spring, 2: autumn)
    season = {1: "h", 2: "a"}

    # Download the 2009-2019 PDFs (questions, answers, commentary)
    for y in range(2009, 2020):
        nendo = str(y) + "h" + str(y - 1988)  # Example: 2009h21
        for s in range(1, 3):
            for t in ["am", "pm"]:
                if t == "pm":  # Commentary exists only for the afternoon exam
                    try:
                        url = urlbase + nendo + "_" + str(s) + "/" + nendo + season[s] + "_fe_" + t + "_cmnt.pdf"
                        filename = nendo + season[s] + "_fe_" + t + "_cmnt.pdf"
                        urllib.request.urlretrieve(url, filename)
                    except urllib.error.HTTPError:
                        print("Error: " + filename)  # Show file names that could not be downloaded
                for qa in ["qs", "ans"]:
                    try:
                        url = urlbase + nendo + "_" + str(s) + "/" + nendo + season[s] + "_fe_" + t + "_" + qa + ".pdf"
                        filename = nendo + season[s] + "_fe_" + t + "_" + qa + ".pdf"
                        urllib.request.urlretrieve(url, filename)
                    except urllib.error.HTTPError:
                        print("Error: " + filename)  # Show file names that could not be downloaded

if __name__ == "__main__":
    download()

Running the above downloads the PDF files and prints the following error messages (as of December 30, 2019).

Error: 2011h23h_fe_am_qs.pdf
Error: 2011h23h_fe_am_ans.pdf
Error: 2011h23h_fe_pm_cmnt.pdf
Error: 2011h23h_fe_pm_qs.pdf
Error: 2011h23h_fe_pm_ans.pdf
Error: 2019h31a_fe_am_qs.pdf
Error: 2019h31a_fe_am_ans.pdf
Error: 2019h31a_fe_pm_cmnt.pdf
Error: 2019h31a_fe_pm_qs.pdf
Error: 2019h31a_fe_pm_ans.pdf

These errors have the following two causes.

- Due to the Great East Japan Earthquake, there was no spring exam in 2011, and a **special exam** was held instead.
- Because of the change of era from Heisei to Reiwa, the autumn 2019 exam was held under the new era name, so its file names use r01 instead of h31.

To obtain the past questions for the sessions that produce errors, that is, the exceptions to the URL pattern the program assumes, you need to download them manually or rewrite the program. For example, I used the following program.

kakomon_revised.py


import urllib.error
import urllib.request

def download():
    # Maps the common (leading) part of each URL to the common part of the file name,
    # for the 2011 special exam and the 2019 autumn exam
    urlbase = {"https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2011h23_1/2011h23tokubetsu_fe_":"2011h23tokubetsu_fe_",
               "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_2019h31_2/2019r01a_fe_":"2019r01a_fe_"}

    # Download the PDFs (questions, answers, commentary) for the 2011 special exam and the 2019 autumn exam
    for u in urlbase:
        for t in ["am", "pm"]:
            if t == "pm":  # Commentary exists only for the afternoon exam
                try:
                    url = u + t + "_cmnt.pdf"
                    filename = urlbase[u] + t + "_cmnt.pdf"
                    urllib.request.urlretrieve(url, filename)
                except urllib.error.HTTPError:
                    print("Error: " + filename)  # Show file names that could not be downloaded
            for qa in ["qs", "ans"]:
                try:
                    url = u + t + "_" + qa + ".pdf"
                    filename = urlbase[u] + t + "_" + qa + ".pdf"
                    urllib.request.urlretrieve(url, filename)
                except urllib.error.HTTPError:
                    print("Error: " + filename)  # Show file names that could not be downloaded

if __name__ == "__main__":
    download()
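The two scripts could also be folded into one by treating the two irregular sessions as overrides of the regular file-name prefix. The following is a minimal sketch of that idea, not the code actually used for this article: the names `OVERRIDES`, `exam_files`, and `download_all` are hypothetical, and the override values are copied from the URLs in kakomon_revised.py.

```python
import urllib.error
import urllib.request

BASE = "https://www.jitec.ipa.go.jp/1_04hanni_sukiru/mondai_kaitou_"
SEASON = {1: "h", 2: "a"}  # 1: spring, 2: autumn

# Irregular file-name prefixes keyed by (year, season number),
# taken from the URLs used in kakomon_revised.py.
OVERRIDES = {
    (2011, 1): "2011h23tokubetsu",  # special exam held instead of spring 2011
    (2019, 2): "2019r01a",          # autumn 2019, named with r01 instead of h31
}

# Commentary (cmnt) exists only for the afternoon exam.
KINDS = (("am", ("qs", "ans")), ("pm", ("qs", "ans", "cmnt")))


def exam_files(first_year=2009, last_year=2019):
    """Yield (url, filename) for every expected PDF."""
    for year in range(first_year, last_year + 1):
        nendo = f"{year}h{year - 1988}"  # directory name part, e.g. 2009h21
        for s, letter in SEASON.items():
            prefix = OVERRIDES.get((year, s), nendo + letter)
            for session, kinds in KINDS:
                for kind in kinds:
                    filename = f"{prefix}_fe_{session}_{kind}.pdf"
                    yield f"{BASE}{nendo}_{s}/{filename}", filename


def download_all():
    """Download every file, printing the names that could not be fetched."""
    for url, filename in exam_files():
        try:
            urllib.request.urlretrieve(url, filename)
        except urllib.error.HTTPError:
            print("Error: " + filename)
```

Keeping URL generation in `exam_files` separate from the download loop makes it possible to inspect the list of URLs without touching the network; calling `download_all()` then performs the actual downloads.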

Digression

The afternoon exam changes starting in Reiwa 2 (2020): among the programming languages, COBOL will reportedly be abolished and Python added. The number of questions, the number of questions to answer, the point allocation, and so on will also change.

Reference

- Download files on the Web with Python (Qiita)
