[PYTHON] UpNext2 Development Record #2: Implementing traffic information API GET / file saving, and pytest-mock

This is a continuation of the UpNext2 development record. This time I write the actual code and tests in the VS Code / Python CI environment set up in the previous article. In particular, there are few examples online of how to use mock_open with pytest, and I struggled with it, so I hope this write-up turns out to be helpful.

This article covers mocking API calls with pytest and pytest-mock, mocking file writes with mock_open, covering multiple conditional branches with mark.parametrize, testing exception handling with side_effect, and workarounds for the pitfalls of relative imports. The code under test makes actual API calls to the Tokyo Public Transport Open Data Challenge.

The environment assumed in this article is Python 3.8.3, pytest 5.4.2 (plugins: cov-2.9.0, mock-3.1.1), and VS Code 1.46.0 with the Python extension v2020.5.86806.

The code created this time is the commit of June 14, 2020 at https://github.com/toast-uz/UpNext2/tree/develop.

1. Project structure

Based on the environment set up last time, I create odpt_dump.py and its test. odpt_dump.py uses the dump API of the Tokyo Public Transport Open Data Challenge to download several traffic information files and save them as-is in the local local_data/odpt_dump folder.

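Based on the paths that appear throughout this article, the relevant part of the project tree presumably looks like this (a reconstruction, not an exact listing):

UpNext2/
    .vscode/
        settings.json
    .pyvenv/            (virtual environment)
    local_data/
        odpt_dump/      (downloaded JSON files are saved here)
    preprocess/
        src/
            __init__.py
            config_secret.py
            odpt_dump.py
        tests/
            test_odpt_dump.py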

2. Overview of the main module

Below is the source of the main module. I will explain each of the parts marked with an "Explanation" comment.

odpt_dump.py


import requests

try:  # Explanation a1
    from . import config_secret
except ImportError:
    import config_secret

query_string = ('https://api-tokyochallenge.odpt.org/api/v4/odpt:{}.json'
                '?acl:consumerKey={}')  # Explanation a2
save_path = 'local_data/odpt_dump/{}.json'


def get_and_save(rdf_type):
    url = query_string.format(rdf_type, config_secret.apikey)
    print('Getting {}...'.format(rdf_type), end='', flush=True)
    try:
        response = requests.get(url)
        response.raise_for_status()  # Explanation a3
        with open(save_path.format(rdf_type), 'wb') as save_file:
            save_file.write(response.content)  # Explanation a4
        print('done.')
    except Exception as e:  # Explanation a5
        print('fail, due to: {}'.format(e))
        raise


if __name__ == '__main__':
    for rdf_type in [  # Explanation a6
            'Calendar',
            'Operator',
            'Station',
            'StationTimetable',
            'TrainTimetable',
            'TrainType',
            'RailDirection',
            'Railway']:
        get_and_save(rdf_type)

Explanation a1: Workaround for the pitfalls of Python relative imports

config_secret.py, which is in the same folder as odpt_dump.py, defines the API key for the Tokyo Public Transport Open Data Challenge issued to each developer. I think this project structure is fairly standard, but whether the import succeeds or fails differs between running the program and running the tests, depending on how the import is written.

How it is run               | import with "." (relative) | import without "."
Run the program as a file   | Failure (*1)               | Success
Run the program as a module | Success                    | Failure (*2)
Run the tests               | Success                    | Failure (*2)

*1: ImportError: attempted relative import with no known parent package
*2: ModuleNotFoundError: No module named 'config_secret'

To run the program as a module, specify preprocess.src.odpt_dump (that is, python -m preprocess.src.odpt_dump).

The orthodox approach is probably to unify on the relative import (with the dot) and always run the program as a module. However, running it directly as a file is easier, so I took a slightly tricky approach: both import styles are written one after the other, and the code switches to the second one when an ImportError is caught.

Explanation a2: Splitting a string with parentheses to elegantly satisfy the 79-character line limit

PEP 8 has a rule of at most 79 characters per line, and flake8 reports an error when it is violated. When splitting a long string literal, rather than using a backslash continuation or concatenation with +, it is elegant to wrap adjacent string literals in parentheses as done here. Note that this is not a tuple (there is no comma).
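
As a minimal sketch of this technique (the URL below is a placeholder, not the real endpoint):

# Adjacent string literals inside parentheses are joined into a single string.
url_template = ('https://example.com/api/v1/{}.json'
                '?key={}')
assert url_template == 'https://example.com/api/v1/{}.json?key={}'

# Adding a comma would create a tuple instead, which is a common mistake.
not_a_string = ('https://example.com/api/v1/{}.json',
                '?key={}')
assert isinstance(not_a_string, tuple)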

Explanation a3: Raising HTTP errors as exceptions with requests

When handling a requests response, it is smart to turn HTTP error statuses into exceptions. The Response object has a built-in method raise_for_status() for exactly that purpose: it raises requests.exceptions.HTTPError for 4xx and 5xx responses.

Explanation a4: with open is the basic pattern

Basically, open for reading and writing files is written with a with statement so that the file is closed implicitly.

Explanation a5: Catching exceptions in one place

Not only HTTP errors but every exception along the way, including file errors, is caught here. Since it re-raises immediately, the handler could be omitted, but it gives the feeling of handling things properly. Lol

Explanation a6: Looping in the main routine

In main, the get_and_save process above is repeated over the multiple files to be downloaded. The for-in over a list is the most basic way to write a loop in Python.

Actually, at first I wrote everything directly in main without splitting it up. When it comes to testing, though, it is important to keep main minimal and split the logic into reasonably sized classes and modules.

3. Overview of the test module

Below is the test module. I will explain each of the parts marked with an "Explanation" comment.

test_odpt_dump.py


from preprocess.src import odpt_dump
import pytest
import requests

http404_msg = '404 Not Found'


def _mock_response(mocker, is_normal):
    mock_resp = mocker.Mock()  # Explanation b1
    mock_resp.raise_for_status = mocker.Mock()
    if not is_normal:
        mock_resp.raise_for_status.side_effect = requests.exceptions.HTTPError(
            http404_msg)  # Explanation b2
    mock_resp.status_code = 200 if is_normal else 404  # Explanation b3
    mock_resp.content = b'TEST'
    return mock_resp


@pytest.mark.parametrize('is_normal', [  # Explanation b4
    True,
    False,
])
def test_get_and_save(mocker, is_normal):
    mock_resp = _mock_response(mocker, is_normal)
    mocker.patch('requests.get').return_value = mock_resp  # Explanation b5

    mock_file = mocker.mock_open()
    mocker.patch('builtins.open', mock_file)  # Explanation b6

    with pytest.raises(Exception) as e:  # Explanation b7
        odpt_dump.get_and_save('Dummy')
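        # If get_and_save did not raise, the bare raise below raises a
        # RuntimeError, so pytest.raises is still satisfied in the normal case.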
        raise

    if (not is_normal) and (str(e.value) == http404_msg):  # Explanation b8
        return

    assert mock_file.call_count == 1  # Explanation b9
    assert mock_file().write.call_args[0][0] == mock_resp.content


if __name__ == '__main__':
    pytest.main(['-v', __file__])

Explanation b1: Mocking the requests.Response object

With pytest-mock, this single line creates a mock object, equivalent to unittest.mock's Mock/MagicMock. When the test runs and the target is patched, the real object or function is replaced with the mock, and execution flows into the behavior defined on the mock. This was my first time using a mock, and it felt like a devilishly mysterious mechanism, conceptually similar to an API hook.

Explanation b2: Raising an exception from the mock with side_effect

When you mock an object, its attributes can simply be given fake values, but a method has to be stubbed by attaching yet another mock as the callable. Of course, attributes and methods that the code under test never uses do not need to be faked at all. It is only a mock, so you only have to fake the parts that are actually visible to the code.

In this case, raise_for_status() is the mock target. And when you want the mocked call to raise an exception, rather than just return a value, you use side_effect.
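
A minimal sketch of the difference between return_value and side_effect (the values are illustrative only):

from unittest import mock

import requests

resp = mock.Mock()
resp.json.return_value = {'ok': True}  # a plain pseudo return value
resp.raise_for_status.side_effect = requests.exceptions.HTTPError(
    '404 Not Found')  # raised when the mocked method is called

assert resp.json() == {'ok': True}
try:
    resp.raise_for_status()  # raises instead of returning
except requests.exceptions.HTTPError as e:
    print(e)  # -> 404 Not Found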

Explanation b3: Changing mock behavior with the test parameter

Since is_normal is a test parameter, it is easy to change the attributes of the mock object based on its value, as done here. On the other hand, if you want to change the mock's behavior based on the arguments it receives when the test runs, you need side_effect. It is confusing, but these are two different things.
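
For the latter, a minimal sketch would look like this (the URLs and logic are made up for illustration and are not part of the actual test):

from unittest import mock

def _fake_response(url):
    # The returned mock depends on the argument the mock was called with.
    resp = mock.Mock()
    resp.status_code = 200 if 'Calendar' in url else 404
    return resp

fake_get = mock.Mock(side_effect=_fake_response)

assert fake_get('https://example.com/odpt:Calendar.json').status_code == 200
assert fake_get('https://example.com/odpt:Station.json').status_code == 404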

Explanation b4: Switching test parameters with @pytest.mark.parametrize

@pytest.mark.parametrize lets you repeat the test execution while switching the test parameters. It is smarter than implementing the switching with loops or ifs inside a single test, and each parametrized run is treated as an independent test, which also makes it easier to work with in VS Code.

It is also possible to switch over multiple parameters at once by writing the parameter sets as tuples, as sketched below.
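
A minimal sketch of that form (the parameter names and values are made up for illustration):

import pytest

@pytest.mark.parametrize('rdf_type, expected_ok', [
    ('Calendar', True),
    ('NoSuchType', False),
])
def test_example(rdf_type, expected_ok):
    # Each tuple becomes one independent test run.
    assert isinstance(rdf_type, str)
    assert isinstance(expected_ok, bool)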

Explanation b5: Patching requests.get

This hooks the call to requests.get and replaces its return value, the requests.Response object, with the mock.

Explanation b6: Patching open

The open function is replaced with the special mock mock_open. Many examples on the net patch open as '__main__.open', but that did not work here; it needs to be patched as 'builtins.open'.
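
For reference, mock_open can also fake the read side via read_data. A minimal sketch, assuming the mocker fixture from pytest-mock (the file name and contents are made up):

def test_read_with_mock_open(mocker):
    mock_file = mocker.mock_open(read_data='TEST')
    mocker.patch('builtins.open', mock_file)

    with open('dummy.json') as f:
        assert f.read() == 'TEST'

    mock_file.assert_called_once_with('dummy.json')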

Explanation b7: Exception handling in pytest

pytest's specification is that if an exception is raised during test execution and not handled, the test stops there and is treated as a failure. For this code, however, stopping with an exception is exactly the intended behavior. So the test has to describe the expected exception explicitly, in order to tell whether the exception was raised as intended or whether an unintended exception occurred.

For that, you write a with pytest.raises(...) statement and execute the exception-raising code under test inside its with scope.
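
The basic pattern, shown as a standalone minimal sketch (the function is made up for illustration):

import pytest

def divide(a, b):
    return a / b

def test_divide_by_zero():
    with pytest.raises(ZeroDivisionError) as excinfo:
        divide(1, 0)
    # The caught exception is available outside the block as excinfo.value.
    assert 'division by zero' in str(excinfo.value)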

Explanation b8: Post-processing when an exception occurs

Outside the with block, determine whether the exception occurred as intended. Here we check that the 404 error exception set up via the test parameter and the mock is the one that actually occurred.

Explanation b9: Normal result check

In the case where no exception occurred, check that the write to the file (replaced by the mock) succeeded. A nice feature of pytest is that you check test results with plain assert statements.

4. Overview of .vscode/settings.json

This section mainly explains settings.json, where the test settings are written. pytestArgs is the list of command-line options used when VS Code launches pytest.

settings.json


{
    "python.pythonPath": ".pyvenv/bin/python",
    "python.testing.pytestArgs": [
        "-o",
        "junit_family=xunit1",
        "--cov=preprocess/src",
        "--cov-branch",
        "--cov-report=term-missing",
        "preprocess",
    ],
    "python.testing.unittestEnabled": false,
    "python.testing.nosetestsEnabled": false,
    "python.testing.pytestEnabled": true,
    "python.linting.flake8Enabled": true,
    "python.linting.enabled": true
}

-o junit_family=xunit1 suppresses a warning that otherwise appears (related to the pytest v5 series, if I remember correctly). --cov displays coverage, --cov-branch additionally checks conditional branch coverage, and --cov-report=term-missing clarifies the untested parts by line number.

These detailed options are not exposed in the VS Code settings UI, so when you want to change the pytest settings, you have to edit this file directly.

5. Test results

All tests ran and passed, and everything is covered except for the ImportError fallback in odpt_dump.py and the main routine.

============================= test session starts ==============================
(snip)
collected 2 items

preprocess/tests/test_odpt_dump.py ..                                    [100%]

(snip)
---------- coverage: platform darwin, python 3.8.3-final-0 -----------
Name                              Stmts   Miss Branch BrPart  Cover   Missing
-----------------------------------------------------------------------------
preprocess/src/__init__.py            0      0      0      0   100%
preprocess/src/config_secret.py       1      0      0      0   100%
preprocess/src/odpt_dump.py          22      4      4      1    73%   13-14, 35->36, 36-45
-----------------------------------------------------------------------------
TOTAL                                23      4      4      1    74%

============================== 2 passed in 0.46s ===============================

(snip) indicates parts of the output omitted in the middle.
