[PYTHON] Complete understanding of numpy.pad functions

Target audience

This article is for readers who are not comfortable with the numpy.pad function that shows up when studying convolutional neural networks (CNNs) in deep learning. It breaks the official documentation down piece by piece and restates each part in plain terms.

Table of contents

- What is the pad function
- About the first argument
- About the second argument
  - For 1D
  - For 2D
  - For 4D
- About the third argument
- Operation with `im2col`

What is the pad function?

The pad function that appears in CNN code behaves in a rather confusing way. Because padding is not their main topic, most books show only a line like

pad_example.py


x = np.pad(x, [(0, 0), (0, 0), (pad, pad), (pad, pad)], "constant")

and say little more than that writing it this way is enough. So this article dissects the function thoroughly. The official documentation gives the signature as

numpy.pad(array, pad_width, mode='constant', **kwargs)

Let's look at each argument in turn.

About the first argument

First, let's take a look at the official documentation.

array : array_like of rank N The array to pad.

In plain terms:

array: an array of rank N, or something array-like. The array to pad.

Rank is a technical term from linear algebra, but here it is fine to read it as the number of dimensions. For more information, see the references at the end of this article and [here](https://deepage.net/features/numpy-rank.html).

For now, that is all you need to know: pass the array you want to pad.
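As a small illustration of what "array-like" means here, the sketch below passes plain Python lists instead of an ndarray. The variable names are only for illustration, and it assumes a numpy version in which `mode` defaults to `'constant'`, as in the signature quoted earlier.

```python
import numpy as np

# array-like input: a plain list is converted to an ndarray before padding
padded_list = np.pad([1, 2, 3], 1)
print(padded_list)        # [0 1 2 3 0]
print(type(padded_list))  # <class 'numpy.ndarray'>

# The rank (number of dimensions) of the result matches the input
nested = [[1, 2], [3, 4]]
padded_nested = np.pad(nested, 1)
print(np.ndim(nested), padded_nested.ndim)  # 2 2
```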

About the second argument

Well, the problem is the second argument.

pad_width : {sequence, array_like, int} Number of values padded to the edges of each axis. ((before_1, after_1), ..., (before_N, after_N)) unique pad widths for each axis. ((before, after),) yields same before and after pad for each axis. (pad,) or int is a shortcut for before = after = pad width for all axes.

In plain terms:

pad_width: {sequence, array-like, integer} The number of values padded onto the edges of each axis. ((before_1, after_1), ..., (before_N, after_N)): specifies a separate padding width (before_i, after_i) for each axis. ((before, after),): specifies the same (before, after) padding width for every axis. (pad,) or an integer: a shortcut for before = after = pad on all axes.

The meaning is hard to grasp from the text alone, so let's try it in code.

pad_example.py


import numpy as np


x_1d = np.arange(1, 3 + 1)
print(x_1d)

In the case of one dimension

Let's start with a one-dimensional array and try each form from the documentation in turn. The first form is

((before_1, after_1), ..., (before_N, after_N))

pad_example.py


print(np.pad(x_1d, ((1, 1))))
print(np.pad(x_1d, ((2, 1))))
print(np.pad(x_1d, ((1, 2))))
pad_1d_tuple_i.png

This one is fairly intuitive. Since the array is one-dimensional, only one tuple can be given: $0$ is padded `before_1` times on the left of the array and `after_1` times on the right. By the way, the argument is written here as if it were a nested tuple, but Python treats `((1, 1))` exactly like the plain tuple `(1, 1)`.
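The point about parentheses can be checked directly. The following is a minimal sketch using the same `x_1d` as the article; the names `left_heavy` and `nested` are introduced here only for illustration.

```python
import numpy as np

x_1d = np.arange(1, 3 + 1)

# Parentheses alone do not make a tuple; the trailing comma does
print(((1, 1)) == (1, 1))    # True
print(((1, 1),) == (1, 1))   # False: this one really is nested

# 2 zeros on the left, 1 zero on the right
left_heavy = np.pad(x_1d, (2, 1))
print(left_heavy)            # [0 0 1 2 3 0]

# For a 1D array the genuinely nested form gives the same result
nested = np.pad(x_1d, ((2, 1),))
print(np.array_equal(left_heavy, nested))  # True
```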

Next, let's try the form

((before, after),)

pad_example.py


print(np.pad(x_1d, ((1, 1),)))
print(np.pad(x_1d, ((2, 1),)))
print(np.pad(x_1d, ((1, 2),)))
pad_1d_tuple.png

The results are the same. This time the argument really is passed as a nested tuple.

Finally, let's try

(pad,) or an integer

pad_example.py


print(np.pad(x_1d, (1,)))
print(np.pad(x_1d, (2,)))
print(np.pad(x_1d, 1))
print(np.pad(x_1d, 2))
pad_1d_pad_int.png

The specified number of $0$s is added at both ends. With this form, the padding width is always the same on both sides.

In the case of 2D

Next, let's try a two-dimensional array.

pad_example.py


x_2d = np.arange(1, 3*3 + 1).reshape(3, 3)
print(x_2d)

print(np.pad(x_2d, ((1, 1), (2, 2))))
print(np.pad(x_2d, ((2, 2), (1, 1))))
print(np.pad(x_2d, ((1, 2), (1, 2))))
print(np.pad(x_2d, ((2, 1), (1, 2))))

print(np.pad(x_2d, ((1, 1),)))
print(np.pad(x_2d, ((1, 2),)))
print(np.pad(x_2d, ((2, 1),)))
print(np.pad(x_2d, ((2, 2),)))

print(np.pad(x_2d, (1,)))
print(np.pad(x_2d, (2,)))
print(np.pad(x_2d, 1))
print(np.pad(x_2d, 2))

Result of ((before_i, after_i)):
pad_2d_tuple_i_1122.png pad_2d_tuple_i_2211.png pad_2d_tuple_i_1212.png pad_2d_tuple_i_2112.png

Result of ((before, after),):
pad_2d_tuple_11.png pad_2d_tuple_12.png pad_2d_tuple_21.png pad_2d_tuple_22.png

Result of (pad,):
pad_2d_pad_1.png pad_2d_pad_2.png

Result of an integer:
pad_2d_int_1.png pad_2d_int_2.png

In the 2D case, the first tuple pads along the 1st dimension, adding rows at the top and bottom, and the second tuple pads along the 2nd dimension, adding columns on the left and right. Other than that, it works the same as in one dimension.
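Since the screenshots are not reproduced here, the following sketch prints one of the cases as text. It reuses the article's `x_2d`; the names `padded` and `same_pair` are only for illustration.

```python
import numpy as np

x_2d = np.arange(1, 3*3 + 1).reshape(3, 3)

# 1 row above and below (axis 0), 2 columns left and right (axis 1)
padded = np.pad(x_2d, ((1, 1), (2, 2)))
print(padded)
# [[0 0 0 0 0 0 0]
#  [0 0 1 2 3 0 0]
#  [0 0 4 5 6 0 0]
#  [0 0 7 8 9 0 0]
#  [0 0 0 0 0 0 0]]

# ((before, after),) applies the same pair to every axis:
# 1 before and 2 after on both rows and columns
same_pair = np.pad(x_2d, ((1, 2),))
print(same_pair.shape)  # (6, 6)
```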

In the case of 4 dimensions

As you may have guessed, I will skip 3D and experiment in 4D. It is recommended to uncomment the calls one at a time and run them, because the output becomes very long vertically and hard to read.

pad_example.py


def print_4darray(x):
    first, second, third, fourth = x.shape
    x_str_size = len(str(np.max(x)))
    for i in range(first):
        for k in range(third):
            for j in range(second):
                str_size = len(str(np.max(x[i, j, k, :])))
                if x_str_size != str_size:
                    add_size = "{: " +str(x_str_size - str_size)+ "d}"
                    np.set_printoptions(
                        formatter={'int': add_size.format})
                else:
                    np.set_printoptions()
                print(x[i, j, k, :], end=" ")
            print()
        print()

x_4d = np.arange(1, 3*3*3*3 + 1).reshape(3, 3, 3, 3)
print_4darray(x_4d)

print_4darray(np.pad(x_4d, ((1, 1), (2, 2), (0, 0), (0, 0))))
print_4darray(np.pad(x_4d, ((0, 0), (0, 0), (2, 2), (1, 1))))
print_4darray(np.pad(x_4d, ((1, 1), (0, 0), (2, 2), (0, 0))))
print_4darray(np.pad(x_4d, ((0, 0), (1, 1), (0, 0), (2, 2))))
print_4darray(np.pad(x_4d, ((0, 0), (1, 1), (2, 2), (0, 0))))
print_4darray(np.pad(x_4d, ((1, 1), (0, 0), (0, 0), (2, 2))))

#print_4darray(np.pad(x_4d, ((1, 1),)))
#print_4darray(np.pad(x_4d, ((1, 2),)))
#print_4darray(np.pad(x_4d, ((2, 1),)))
#print_4darray(np.pad(x_4d, ((2, 2),)))

#print_4darray(np.pad(x_4d, (1,)))
#print_4darray(np.pad(x_4d, (2,)))
#print_4darray(np.pad(x_4d, 1))
#print_4darray(np.pad(x_4d, 2))
Result of `np.pad(x_4d, ((1, 1), (2, 2), (0, 0), (0, 0)))`: pad_4d_tuple_i_11220000.png
Result of `np.pad(x_4d, ((0, 0), (0, 0), (2, 2), (1, 1)))`: pad_4d_tuple_i_00002211.png
Result of `np.pad(x_4d, ((1, 1), (0, 0), (2, 2), (0, 0)))`: pad_4d_tuple_i_11002200.png
Result of `np.pad(x_4d, ((0, 0), (1, 1), (0, 0), (2, 2)))`: pad_4d_tuple_i_00110022.png
Result of `np.pad(x_4d, ((0, 0), (1, 1), (2, 2), (0, 0)))`: pad_4d_tuple_i_00112200.png
Result of `np.pad(x_4d, ((1, 1), (0, 0), (0, 0), (2, 2)))`: pad_4d_tuple_i_11000022.png
Result of `np.pad(x_4d, ((1, 1),))`: pad_4d_tuple_11.png
Result of `np.pad(x_4d, ((1, 2),))`: pad_4d_tuple_12.png
Result of `np.pad(x_4d, ((2, 1),))`: pad_4d_tuple_21.png
Result of `np.pad(x_4d, ((2, 2),))`: pad_4d_tuple_22.png
Result of `np.pad(x_4d, (1,))`: pad_4d_pad_1.png
Result of `np.pad(x_4d, (2,))`: pad_4d_pad_2.png
Result of `np.pad(x_4d, 1)`: pad_4d_int_1.png
Result of `np.pad(x_4d, 2)`: pad_4d_int_2.png
As you can see, in this printed layout the four dimensions are padded in the order vertical, horizontal, vertical, horizontal. For the higher dimensions, what gets added is a whole array of zeros rather than a single number.
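When the printed output is too long to inspect, checking shapes is often enough. The sketch below, under the assumption that only the shape and the position of the original data matter, verifies that each axis grows by before + after. The names `pad_width`, `padded`, and `expected_shape` are introduced here for illustration.

```python
import numpy as np

x_4d = np.arange(1, 3*3*3*3 + 1).reshape(3, 3, 3, 3)
pad_width = ((1, 1), (2, 2), (0, 0), (0, 0))

padded = np.pad(x_4d, pad_width)

# Each axis grows by before + after
expected_shape = tuple(n + before + after
                       for n, (before, after) in zip(x_4d.shape, pad_width))
print(padded.shape, expected_shape)  # (5, 7, 3, 3) (5, 7, 3, 3)

# Padding along the 1st dimension adds whole all-zero 3D blocks
print(np.all(padded[0] == 0))  # True

# The original data sits untouched in the middle
print(np.array_equal(padded[1:-1, 2:-2], x_4d))  # True
```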

As for the `print_4darray` function, it loops over the 1st, 3rd, and 2nd dimensions in that order and prints each slice along the 4th dimension with `print`. Passing `end=" "` makes `print` emit a space instead of a line break. The bare `print()` calls then add line breaks where needed, and `np.set_printoptions` controls the whitespace in the output. I wrote it because numpy's default output is hard to read for 4D arrays.

By the way, when you run the code, some of the output will probably not fit on the screen. Each image here is several screenshots stitched together. I also widened the cell width of Jupyter Notebook.

About the third argument

Let's look at the third argument too. The documentation is long, so I will go through it part by part.

mode : str or function, optional One of the following string values or a user supplied function.

mode: an optional argument that takes either a string or a function. Pass one of the strings below, or a function of your own.

Nothing is difficult about the argument itself. User-supplied functions are described later.

Description of `constant`

‘constant’ (default) Pads with a constant value.

constant (default): Pads with a constant value ($0$ unless you specify otherwise).

With its default value, `constant` is zero padding. This is the mode used in the `im2col` function. It is also the mode used in all the examples in the section on the second argument, so I will omit an example here.

Description of `edge`

‘edge’ Pads with the edge values of array.

edge: Pads with the values at the edges of the array.

An example makes this clear.

pad_example.py


print(np.pad(x_2d, 1, "edge"))
pad_2d_edge.png

This is an example of padding a two-dimensional array. The $3 \times 3$ block in the center is the original array. Each padded element takes the value of the first original element you meet when moving from it toward the center vertically, horizontally, or diagonally.
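In text form, the result looks as follows. The second half is one way to think about `edge` mode, not how numpy implements it: every padded position reads the original array at the nearest valid index. The helper names `edge_padded` and `idx` are introduced here for illustration.

```python
import numpy as np

x_2d = np.arange(1, 3*3 + 1).reshape(3, 3)
edge_padded = np.pad(x_2d, 1, "edge")
print(edge_padded)
# [[1 1 2 3 3]
#  [1 1 2 3 3]
#  [4 4 5 6 6]
#  [7 7 8 9 9]
#  [7 7 8 9 9]]

# Equivalent view: clamp out-of-range indices to the nearest valid index
idx = np.clip(np.arange(-1, 4), 0, 2)   # [0 0 1 2 2]
print(np.array_equal(edge_padded, x_2d[np.ix_(idx, idx)]))  # True
```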
Description of `linear_ramp`

‘linear_ramp’ Pads with the linear ramp between end_value and the array edge value.

linear_ramp: Pads with a linear ramp between the end value (end_value) and the edge value of the array.

Even restated, this is hard to understand. Let's actually run it.

pad_example.py


print(np.pad(x_2d, 3, "linear_ramp"))
pad_2d_linear_ramp.png

Even after running it, the result is hard to read at first glance, so let's divide it into blocks.

pad_2d_linear_ramp.png

I cannot work out the behavior of the red blocks directly, but the relationship between the blocks of the other colors is visible. Take the light blue block as an example. Looking vertically or horizontally from the $3$, the values run $3, 2, 1, 0$ toward the end value $0$. You can picture it as taking 4 evenly spaced points on $0 \le x \le 3$ and truncating each to an integer. Here the points are exactly $0, 1, 2, 3$, so they appear as they are. Now look at the green block. The same rule applies: 4 evenly spaced points on $0 \le x \le 7$ are $0, 2.\dot{3}, 4.\dot{6}, 7$, and $0, 2, 4, 7$ appears. The remaining elements of each block are filled in diagonal bands based on the values determined this way. The value at the outer edge is $0$.
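The "evenly spaced points" picture can be checked on a one-dimensional array. The sketch below uses a hypothetical input `[3, 6]`, chosen so that every ramp value is an exact integer and no truncation is involved; `np.linspace` is used only to illustrate the spacing.

```python
import numpy as np

x = np.array([3, 6])
ramp = np.pad(x, 3, "linear_ramp")   # the end value defaults to 0
print(ramp)  # [0 1 2 3 6 4 2 0]

# Left side: 3 padded values stepping from the end value 0 toward the edge value 3
left = np.linspace(0, 3, num=3, endpoint=False)
print(left)  # [0. 1. 2.]

# Right side: the same idea from 0 toward the edge value 6, in reverse order
right = np.linspace(0, 6, num=3, endpoint=False)[::-1]
print(right)  # [4. 2. 0.]
```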
Description of `maximum`

‘maximum’ Pads with the maximum value of all or part of the vector along each axis.

maximum: Pads with the maximum value of all or part of the vector along each axis.

The intent is clear enough: pad with the maximum value. However, it is unexpectedly complicated.

pad_2d_maximum.png

It may look odd at first, but once you know the rule it makes sense.

pad_2d_maximum.png

The purple block is the original array, and padding is added around it. First, the vertical and horizontal padding values are filled with the maximum value within each block. After all of those are done, the corner values are filled with the maximum value within their block.
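The call that produced this figure is not shown in the text. Assuming it follows the same pattern as the other modes, that is `np.pad(x_2d, 1, "maximum")`, the two-step rule can be reproduced by hand as in the sketch below. The names `col_max`, `row_max`, `step1`, and `step2` are introduced here for illustration.

```python
import numpy as np

x_2d = np.arange(1, 3*3 + 1).reshape(3, 3)
max_padded = np.pad(x_2d, 1, "maximum")
print(max_padded)
# [[9 7 8 9 9]
#  [3 1 2 3 3]
#  [6 4 5 6 6]
#  [9 7 8 9 9]
#  [9 7 8 9 9]]

# By hand: pad the 1st dimension with column maxima first ...
col_max = x_2d.max(axis=0, keepdims=True)             # shape (1, 3)
step1 = np.concatenate([col_max, x_2d, col_max], axis=0)
# ... then pad the 2nd dimension with the maxima of the already padded rows
row_max = step1.max(axis=1, keepdims=True)            # shape (5, 1)
step2 = np.concatenate([row_max, step1, row_max], axis=1)
print(np.array_equal(max_padded, step2))  # True
```

The corners come out of the second step, which is why they are determined only after the vertical and horizontal padding is in place.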
Explanation of `mean`

‘mean’ Pads with the mean value of all or part of the vector along each axis.

mean Pads with the average value of all or part of the vectors for each axis.

This works on the same principle as `maximum`. The only difference is that the mean is used instead of the maximum.

pad_example.py


print(np.pad(x_2d, 1, "mean"))
pad_2d_mean.png
Description of `median`

‘median’ Pads with the median value of all or part of the vector along each axis.

median: Pads with the median value of all or part of the vector along each axis.

It works on the same principle as `maximum` and `mean`. There is nothing special to mention.

pad_example.py


print(np.pad(x_2d, 1, "median"))
pad_2d_median.png
Description of `minimum`

‘minimum’ Pads with the minimum value of all or part of the vector along each axis.

minimum Pads with the minimum value of all or part of the vector for each axis.

It pads on the same principle as `maximum`.

pad_example.py


print(np.pad(x_2d, 1, "minimum"))
pad_2d_minimum.png
Explanation of `reflect`

‘reflect’ Pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.

reflect: Pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.

The meaning is not at all clear from the text. Let's actually run it.

pad_example.py


print(np.pad(x_2d, 2, "reflect"))
pad_2d_reflect_2.png

This is hard to read, although a pattern is half visible. Let's divide it into blocks as before.

pad_2d_reflect_2.png

How about now? The padding is symmetric about the values located between blocks of the same color. Focusing on the light blue block, $1$ is the center and the $4, 7$ below it are reflected into the padding. In the green block, $3$ is the center and the $1, 2$ to its left are reflected into the padding. In the red block, $5, 6, 8, 9$ are reflected into the padding with $1$ as the center.
`symmetric` description

‘symmetric’ Pads with the reflection of the vector mirrored along the edge of the array.

symmetric: Pads with the reflection of the vector mirrored along the edge of the array.

The wording is similar to `reflect`. Let's see how it differs.

pad_example.py


print(np.pad(x_2d, 2, "symmetric"))
pad_2d_symmetric.png

The difference from `reflect` is whether the edge value of the original array is included in the reflection. `symmetric` repeats the edge value in the padding, while `reflect` uses it only as the mirror point and does not repeat it.
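The difference is easiest to see side by side on a one-dimensional array. This is a minimal sketch with a hypothetical input `[1, 2, 3]`; the names `reflected` and `symmetric` are only for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])

# reflect: the edge values 1 and 3 act as mirrors and are not repeated
reflected = np.pad(x, 2, "reflect")
print(reflected)   # [3 2 1 2 3 2 1]

# symmetric: the mirror sits just outside the array, so the edge values repeat
symmetric = np.pad(x, 2, "symmetric")
print(symmetric)   # [2 1 1 2 3 3 2]
```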
`wrap` description

‘wrap’ Pads with the wrap of the vector along the axis. The first values are used to pad the end and the end values are used to pad the beginning.

wrap: Pads with the wrap of the vector along the axis. The first values are used to pad the end, and the end values are used to pad the beginning.

This leaves me with nothing but question marks. Let's run it.

pad_example.py


print(np.pad(x_2d, 2, "wrap"))
pad_2d_wrap.png

Seen like this, it is obvious: the array is simply repeated, without the mirroring that `reflect` does. I wish the official documentation just said that.
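A one-dimensional sketch shows the repetition directly. Comparing against `np.tile` is just one way to picture it, not a statement about how numpy implements `wrap`; the input `[1, 2, 3]` and the names `wrapped` and `tiled` are introduced here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
wrapped = np.pad(x, 2, "wrap")
print(wrapped)  # [2 3 1 2 3 1 2]

# Same values as repeating the array and cutting out a window
tiled = np.tile(x, 3)                        # [1 2 3 1 2 3 1 2 3]
print(np.array_equal(wrapped, tiled[1:8]))   # True
```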
Description of `empty`

‘empty’ Pads with undefined values. New in version 1.17.

empty: Pads with undefined values. Added in numpy version 1.17.

This mode only reserves memory for the padding and does not initialize it, much like the `numpy.empty` function. Let's check it anyway.

pad_example.py


import numpy as np


print(np.pad(np.arange(1, 3*3+1).reshape(3, 3), 2, "empty"))
print(np.pad(np.arange(1, 3*3+1).reshape(3, 3), 5, "empty"))
pad_2d_empty_2.png pad_2d_empty_5.png

I ran this in a freshly created notebook (not just a new cell). In my environment, the output looks like the above. With the `empty` mode, padding widths up to $4$ happened to print $0$s, while a width of $5$ or more printed undefined values, probably whatever was left in the allocated memory. Since the values are undefined, this behavior should not be relied on. The image for padding width $5$ shows only part of the output, because the whole thing would not fit in one screenshot.
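Because the padded values are undefined, the only things that can be checked reliably are the shape and the original data in the middle. The following sketch does just that; it assumes numpy 1.17 or later, where the `empty` mode exists.

```python
import numpy as np

x_2d = np.arange(1, 3*3 + 1).reshape(3, 3)
padded = np.pad(x_2d, 2, "empty")

print(padded.shape)                               # (7, 7)
print(np.array_equal(padded[2:-2, 2:-2], x_2d))   # True

# The border holds whatever was in memory, so fill it before reading it
border = np.ones(padded.shape, dtype=bool)
border[2:-2, 2:-2] = False
padded[border] = 0
print(padded.sum() == x_2d.sum())                 # True
```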
Description of `<function>`

<function> Padding function, see Notes.

Notes New in version 1.7.0. For an array with rank greater than 1, some of the padding of later axes is calculated from padding of previous axes. This is easiest to think about with a rank 2 array where the corners of the padded array are calculated by using padded values from the first axis.

The padding function, if used, should modify a rank 1 array in-place. It has the following signature:

padding_func(vector, iaxis_pad_width, iaxis, kwargs) where

vector: ndarray A rank 1 array already padded with zeros. Padded values are vector[:iaxis_pad_width[0]] and vector[-iaxis_pad_width[1]:].

iaxis_pad_width: tuple A 2-tuple of ints, iaxis_pad_width[0] represents the number of values padded at the beginning of vector where iaxis_pad_width[1] represents the number of values padded at the end of vector.

iaxis: int The axis currently being calculated.

kwargs: dict Any keyword arguments the function requires.

<function> Padding function. See note.

Notes: Added in numpy version 1.7.0. For an array of rank greater than 1, some of the padding of later axes is calculated from the padding of earlier axes. This is easiest to picture with a two-dimensional array, where the corner elements of the padded array are calculated from the values already padded along the first axis.

The padding function, if used, must modify a one-dimensional array in place. It has the following signature:

padding_func(vector, iaxis_pad_width, iaxis, kwargs) where the arguments are

vector: ndarray A one-dimensional array already padded with zeros. The padded values are vector[:iaxis_pad_width[0]] and vector[-iaxis_pad_width[1]:].

iaxis_pad_width: tuple A 2-tuple of integers. iaxis_pad_width[0] is the number of values padded at the beginning of the vector, and iaxis_pad_width[1] is the number of values padded at the end.

iaxis: int The axis currently being processed.

kwargs: dict Any keyword arguments the function requires.

In other words, if you define a function with the signature of `padding_func` and pass it in, it is used to do the padding. Let's try it with the code from the official documentation.

pad_example.py


def pad_with(vector, pad_width, iaxis, kwargs):
    pad_value = kwargs.get('padder', 10)
    vector[:pad_width[0]] = pad_value
    vector[-pad_width[1]:] = pad_value

print(np.pad(x_2d, 2, pad_with))
print(np.pad(x_2d, 2, pad_with, padder=100))
pad_2d_function.png

`vector`, `pad_width`, and `iaxis` are passed in automatically. If you want to supply other arguments, pass them as keyword arguments to `numpy.pad` and retrieve them inside the padding function (`pad_value = kwargs.get('padder', 10)`).
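To see the order in which the axes are processed, here is a sketch of a hypothetical padding function, `pad_with_axis_id`, that writes a different value for each axis. It is not from the official documentation. It also guards against a zero width, because `vector[-0:]` selects the whole vector rather than an empty slice.

```python
import numpy as np

def pad_with_axis_id(vector, iaxis_pad_width, iaxis, kwargs):
    """Hypothetical padder: fill with 10 for axis 0, 20 for axis 1, and so on."""
    before, after = iaxis_pad_width
    fill = 10 * (iaxis + 1)
    if before > 0:
        vector[:before] = fill
    if after > 0:  # guard: vector[-0:] would select the whole vector
        vector[-after:] = fill

x_2d = np.arange(1, 3*3 + 1).reshape(3, 3)
by_axis = np.pad(x_2d, 1, pad_with_axis_id)
print(by_axis)
# [[20 10 10 10 20]
#  [20  1  2  3 20]
#  [20  4  5  6 20]
#  [20  7  8  9 20]
#  [20 10 10 10 20]]

# One-sided padding leaves the data intact thanks to the guard
one_sided = np.pad(x_2d, ((1, 0), (0, 0)), pad_with_axis_id)
print(one_sided.shape)  # (4, 3)
```

The corners hold 20 because the second axis is processed after the first and overwrites what the first pass wrote there.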
Description of keyword argument `stat_length`

stat_length: sequence or int, optional Used in ‘maximum’, ‘mean’, ‘median’, and ‘minimum’. Number of values at edge of each axis used to calculate the statistic value. ((before_1, after_1), … (before_N, after_N)) unique statistic lengths for each axis. ((before, after),) yields same before and after statistic lengths for each axis. (stat_length,) or int is a shortcut for before = after = statistic length for all axes. Default is None, to use the entire axis.

stat_length: Sequence or integer, optional. Used with maximum, mean, median, and minimum. The number of values at the edge of each axis used to calculate the statistic. ((before_1, after_1), … (before_N, after_N)) specifies the statistic length individually for each axis. ((before, after),) uses the same before and after statistic lengths for every axis. (stat_length,) or an integer is a shortcut for before = after = statistic length on all axes. The default is None, which uses the entire axis.

It is long, but it makes sense if you read it carefully. `maximum`, `mean`, `median`, and `minimum` each compute a statistic and pad with that value. This keyword argument lets you specify how many values the statistic is taken over (I will call this the statistic width).

pad_example.py


print(np.pad(x_2d, 1, "maximum", stat_length=2))
pad_2d_maximum_stat_length=2.png

Comparing this with the output of plain `maximum`, you can see that the maximum is now taken over $2$ values instead of all $3$.

pad_2d_maximum_stat_length=2.png
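The effect is easier to follow in one dimension. The sketch below uses a hypothetical input `[1, 5, 2, 3]`, chosen so that the maximum of the whole array differs from the maximum of the last two values.

```python
import numpy as np

x = np.array([1, 5, 2, 3])

# Default: the statistic is taken over the entire axis, so both sides get 5
whole = np.pad(x, 1, "maximum")
print(whole)    # [5 1 5 2 3 5]

# stat_length=2: left uses max(1, 5) = 5, right uses max(2, 3) = 3
partial = np.pad(x, 1, "maximum", stat_length=2)
print(partial)  # [5 1 5 2 3 3]
```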
Description of the keyword argument `constant_values`

constant_values: sequence or scalar, optional Used in ‘constant’. The values to set the padded values for each axis. ((before_1, after_1), ... (before_N, after_N)) unique pad constants for each axis. ((before, after),) yields same before and after constants for each axis. (constant,) or constant is a shortcut for before = after = constant for all axes. Default is 0.

constant_values: Sequence or scalar, optional. Used with constant. Sets the values used for padding on each axis. ((before_1, after_1), ... (before_N, after_N)) sets the padding constants individually for each axis. ((before, after),) sets the same before and after constants for every axis. (constant,) or a constant is a shortcut for before = after = constant on all axes. The default is $0$.

This one is easy to understand. `constant` fills with $0$ by default, but you can freely choose the fill value.

pad_example.py


print(np.pad(x_2d, 1, "constant", constant_values=((-1, -2),)))
pad_2d_constant_constant_values=((-1, -2),).png
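A one-dimensional sketch makes the before and after constants easy to tell apart. The input `[1, 2, 3]` and the names `two_values` and `one_value` are introduced here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])

# 2 values before filled with -1, 1 value after filled with -2
two_values = np.pad(x, (2, 1), "constant", constant_values=(-1, -2))
print(two_values)  # [-1 -1  1  2  3 -2]

# A single scalar is used on both sides
one_value = np.pad(x, (2, 1), "constant", constant_values=9)
print(one_value)   # [9 9 1 2 3 9]
```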
Explanation of keyword argument `end_values`

end_values: sequence or scalar, optional Used in ‘linear_ramp’. The values used for the ending value of the linear_ramp and that will form the edge of the padded array. ((before_1, after_1), ... (before_N, after_N)) unique end values for each axis. ((before, after),) yields same before and after end values for each axis. (constant,) or constant is a shortcut for before = after = constant for all axes. Default is 0.

end_values: Sequence or scalar, optional. Used with linear_ramp. Sets the value at which the linear ramp ends, which becomes the outer edge of the padded array. ((before_1, after_1), ... (before_N, after_N)) sets the end values individually for each axis. ((before, after),) sets the same before and after end values for every axis. (constant,) or a constant is a shortcut for before = after = constant on all axes. The default is $0$.

This one is also easy to understand. With `linear_ramp`, you can change the end value to something other than $0$.

pad_example.py


print(np.pad(x_2d, 3, "linear_ramp", end_values=((-1, -2), (-3, -4))))
pad_2d_linear_ramp_end_value.png

The result looks like this.
Explanation of keyword argument `reflect_type`

reflect_type: {‘even’, ‘odd’}, optional Used in ‘reflect’, and ‘symmetric’. The ‘even’ style is the default with an unaltered reflection around the edge value. For the ‘odd’ style, the extended part of the array is created by subtracting the reflected values from two times the edge value.

reflect_type: `even` or `odd`, optional. Used with reflect and symmetric. `even` is the default style, an unaltered reflection around the edge value. In the `odd` style, the padded part of the array is created by subtracting the reflected values from twice the edge value.

Hmm, let's just try it.

pad_example.py


print(np.pad(x_2d, 2, "reflect", reflect_type="odd"))
pad_2d_reflect_odd.png

This result makes sense when compared against the explanation.

pad_2d_reflect_odd.png

Focusing on the light blue block, it is the same as `even` (the default) in that it is centered on $1$, but the padding values are completely different. Let's calculate according to the explanation, which says the values are determined by subtracting the reflected values from twice the edge value. Subtracting the reflected values $7, 4$ from $2$, which is twice the edge value $1$, gives $-5, -2$, which matches the output. The same holds for the red block: subtracting the reflected values $2, 1$ from $6$, which is twice the edge value $3$, gives $4, 5$.
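The same arithmetic can be checked in one dimension. In this sketch, with a hypothetical input `[1, 2, 3]`, the `odd` padding is recomputed from the `even` padding using "twice the edge value minus the reflected value".

```python
import numpy as np

x = np.array([1, 2, 3])

even = np.pad(x, 2, "reflect")                     # default reflect_type="even"
odd = np.pad(x, 2, "reflect", reflect_type="odd")
print(even)  # [3 2 1 2 3 2 1]
print(odd)   # [-1  0  1  2  3  4  5]

# Left padding: 2 * (left edge value) - reflected values
left = 2 * x[0] - even[:2]
# Right padding: 2 * (right edge value) - reflected values
right = 2 * x[-1] - even[-2:]
print(left, right)  # [-1  0] [4 5]
```

With `odd`, the padding continues the trend of the data through the edge value instead of folding it back.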
By the way, the return value of the `numpy.pad` function is a new array, not a view.

pad_example.py


print(np.pad(x_2d, 2, "constant").base)
#The output will be None.

The `base` attribute is None when the array owns its own memory. If the array is a view that shares memory with another array, `base` is that other array.

Operation with `im2col`

By the way, another article in this series explains the `im2col` function thoroughly, and the code there contains the following part:

im2col.py


    pad_zero = (0, 0)
    
    O_h = int(np.ceil((I_h - F_h + 2*pad_ud)/stride_ud) + 1)
    O_w = int(np.ceil((I_w - F_w + 2*pad_lr)/stride_lr) + 1)
    
    pad_ud = int(np.ceil(pad_ud))
    pad_lr = int(np.ceil(pad_lr))
    pad_ud = (pad_ud, pad_ud)
    pad_lr = (pad_lr, pad_lr)
    images = np.pad(images, [pad_zero, pad_zero, pad_ud, pad_lr], \
                    "constant")

You already know what the pad function is doing here. Since the entries for the 1st and 2nd dimensions are pad_zero, those dimensions get no padding; the 3rd and 4th dimensions are padded by pad_ud and pad_lr, respectively. Note that the outer container does not have to be a tuple; a list works as well. The 1st and 2nd dimensions are the batch and the channels, and the 3rd and 4th dimensions are the image data, so only the area around each image is padded.
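As a quick check of this reading, the sketch below pads a hypothetical batch of shape (2, 3, 4, 4) and confirms that only the image dimensions grow. The batch contents and the padding widths 1 and 2 are made up for illustration and are not taken from the `im2col` code.

```python
import numpy as np

# Hypothetical batch: 2 images, 3 channels, 4x4 pixels
images = np.ones((2, 3, 4, 4))
pad_zero = (0, 0)
pad_ud = (1, 1)   # 1 row above and below each image
pad_lr = (2, 2)   # 2 columns left and right of each image

padded = np.pad(images, [pad_zero, pad_zero, pad_ud, pad_lr], "constant")
print(padded.shape)  # (2, 3, 6, 8)

# The images themselves are unchanged; only a zero border was added
print(np.array_equal(padded[:, :, 1:-1, 2:-2], images))  # True
print(padded.sum() == images.sum())                      # True

# A tuple of tuples gives the same result as the list
same = np.pad(images, (pad_zero, pad_zero, pad_ud, pad_lr), "constant")
print(np.array_equal(padded, same))                      # True
```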

In conclusion

The pad function is deep ...

References

- [Meaning of rank of matrix (equivalent definition of 8 ways)](https://mathtrain.jp/matrixrank#:~:text=%E3%83%A9%E3%83%B3%E3%82%AF%EF%BC%88%E9%9A%8E%E6%95%B0%EF%BC%8Crank%EF%BC%89%E3%81%A8,%E3%82%92%E5%8F%82%E7%85%A7%E3%81%97%E3%81%A6%E4%B8%8B%E3%81%95%E3%81%84%EF%BC%89%E3%80%82)
- How to use the linalg.matrix_rank function to find the rank with NumPy

Deep learning series

- Introduction to Deep Learning ~ Basics ~
- Introduction to Deep Learning ~ Coding Preparation ~
- Introduction to Deep Learning ~ Forward Propagation ~
- Introduction to Deep Learning ~ Backpropagation ~
- List of activation functions (2020)
- Thorough understanding of im2col
- Complete understanding of numpy.pad function