[Python] Not copied even though I called copy()? Misconceptions and failures around deep copying

Posted: 2020/9/13

Introduction

This article covers reference assignment, shallow copies, and deep copies. Several articles on the topic already exist, but this one also describes the situation in which I noticed my mistake, plus a few other things I found while investigating.

I didn't understand shallow and deep copies, and assumed that reference assignment = shallow copy and `.copy()` = deep copy. After investigating the failure described below, I learned that there are actually three kinds of assignment.

Three kinds of assignment

Reference assignment

a = [1,2]
b = a
b[0] = 100
print(a)  # [100, 2]
print(b)  # [100, 2]

Here, rewriting `b` also rewrites `a`. This is **reference assignment** [^d1]. Since `a` and `b` refer to the same object, rewriting one makes it look as if the other was rewritten too.

Let's check the object IDs with `id()`.

print(id(a))  # 2639401210440
print(id(b))  # 2639401210440
print(a is b)  # True

The ids are the same, so `a` and `b` are the same object.

[^d1]: I call this reference assignment, but I couldn't find that phrase used for Python on the internet. "Passing by reference" is a term for function arguments, and I couldn't find a better way to say it, so I went with "reference assignment".
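As a small sketch of why the shared object matters: changing the object through one name is visible through the other name, while binding a name to a new object is not. The variable names are only illustrative.

```python
a = [1, 2]
b = a            # b is a second name for the same list object

b.append(3)      # mutation: changes the one shared object
print(a)         # [1, 2, 3]

b = [0, 0]       # rebinding: b now names a different list
print(a)         # [1, 2, 3] (unchanged)
print(a is b)    # False
```

Only in-place changes (item assignment, `append`, and so on) travel through a shared reference.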

Shallow copy

So what if you want to treat `b` as a separate object from `a`? I usually use `.copy()`.

a = [1,2]
b = a.copy()  #Shallow copy
b[0] = 100
print(a, id(a))  # [1, 2] 1566893363784
print(b, id(b))  # [100, 2] 1566893364296
print(a is b)  # False

`a` and `b` are properly separated. This is a **shallow copy**. There are other ways to make a shallow copy.

A shallow copy of the list


# Shallow copies of a list
import copy
a = [1, 2]

b = a.copy()        # using .copy()
b = copy.copy(a)    # using the copy module
b = a[:]            # using a slice
b = [x for x in a]  # using a list comprehension
b = list(a)         # using list()

A shallow copy of the dictionary


# Shallow copies of a dictionary
import copy
a = {"hoge": 1, "piyo": 2}

b = a.copy()         # using .copy()
b = copy.copy(a)     # using the copy module
b = dict(a.items())  # rebuilding from the pairs returned by items()

Deep copy?

Now let's make a **deep copy**.

import copy

a = [1,2]
b = copy.deepcopy(a)  #Deep copy
b[0] = 100
print(a, id(a))  # [1, 2] 2401980169416
print(b, id(b))  # [100, 2] 2401977616520
print(a is b)  # False

The result is the same as with a shallow copy.

Called `copy()`, but not copied

But what about the following example?

a = [[1,2], [3,4]]  # Changed to a nested list
b = a.copy()
b[0][0] = 100
print(a)  # [[100, 2], [3, 4]]
print(b)  # [[100, 2], [3, 4]]

I made `a` on the first line a two-dimensional list. I made a copy, yet `a` was rewritten as well. What is different from the previous example?

Mutable objects

Python has mutable (modifiable) objects and immutable (unmodifiable) objects. Roughly classified [^m2] [^m3]:

Mutable: list, dict, numpy.ndarray [^m1], etc.
Immutable: int, str, tuple, etc.

In the example above, a list is placed inside a list. In other words, a mutable object is placed inside a mutable object. Now let's check whether they are the same object.

print(a is b, id(a), id(b))
# False 2460506990792 2460504457096

print(a[0] is b[0], id(a[0]), id(b[0]))
# True 2460503003720 2460503003720

The outer lists `a` and `b` are different objects, but the inner lists `a[0]` and `b[0]` are the same object. In other words, modifying `b[0]` in place (for example `b[0][0] = 100`) also changes `a[0]`.

So this behavior comes from using a shallow copy even though the object contains a mutable object [^m4]. A **deep copy** is what you use in such a case.

[^m1]: An ndarray can apparently also be made immutable (https://note.nkmk.me/python-numpy-ndarray-immutable-read-only/)
[^m2]: https://hibiki-press.tech/python/data_type/1763 (Major built-in mutable, immutable, iterable)
[^m3]: https://gammasoft.jp/blog/python-built-in-types/ (Python built-in data type classification table (mutable, etc.))

[^m4]: The Python documentation describes these as "compound objects (objects that contain other objects, like lists or class instances)". So, to be precise, the cause is a shallow copy of a compound object. As described in another section, the same thing happens when a list is placed inside an immutable tuple.

Solution: deep copy

1. Use a deep copy

Use a **deep copy**.

import copy

a = [[1,2], [3,4]]
b = copy.deepcopy(a)
b[0][0] = 100
print(a)  # [[1, 2], [3, 4]]
print(b)  # [[100, 2], [3, 4]]

Let's see if they are the same object.

print(a is b, id(a), id(b))
# False 2197556646152 2197556610760

print(a[0] is b[0], id(a[0]), id(b[0]))
# False 2197556617864 2197556805320

print(a[0][1] is b[0][1], id(a[0][1]), id(b[0][1]))
# True 140736164557088 140736164557088

Only the last comparison is True. `b[0][1]` is an `int`, which is immutable, so this is not a problem: assigning a new value binds a different object instead of modifying the shared one [^k1].

Other than that, the mutable objects have different ids, so you can see that they have been copied.
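The point about shared immutable objects can be checked with a short sketch. `copy.deepcopy` returns the same object for an `int`, but since assignment replaces the reference rather than modifying the `int`, the original structure is unaffected. The name `shared_before` is only illustrative.

```python
import copy

a = [[1, 2], [3, 4]]
b = copy.deepcopy(a)

# The int object is shared between the two structures.
shared_before = a[0][1] is b[0][1]
print(shared_before)   # True

# Assignment puts a different int object into b's inner list;
# the shared int itself is never modified.
b[0][1] = 999
print(a)               # [[1, 2], [3, 4]]
print(b)               # [[1, 999], [3, 4]]
```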

[^k1]: https://atsuoishimoto.hatenablog.com/entry/20110414/1302750443 (Mystery of is operator) Python apparently reuses immutable objects to save memory.

2. Use numpy.ndarray

This is a little different from the main topic and more involved, so I moved it toward the end. See the numpy.ndarray section near the end (Solution #2).

The code where I noticed the mistake

I'll post almost the same code as when I noticed the mistake. I created the following data.

import numpy as np

a = {"data":[
        {"name": "img_0.jpg", "size":"100x200", "img": np.zeros((100,200))},
        {"name": "img_1.jpg", "size":"100x100", "img": np.zeros((100,100))},
        {"name": "img_2.jpg", "size":"150x100", "img": np.zeros((150,100))}],
    "total_size": 5000
}

In this way, I created data with nested mutable objects: a list in a dictionary, dictionaries in that list, and an image (ndarray) in each dictionary. I then created another dictionary for JSON export, omitting only `img` from it.

After that, when I tried to retrieve `img` from the original dictionary, I got a `KeyError`. I wondered for a while why, since I should have made a copy, and then realized that the dictionaries inside the list might be the same objects.

# The code that caused the problem
data = a["data"].copy()  # This is wrong (shallow copy: the inner dicts are shared)
for i in range(len(data)):
    del data[i]["img"]  # Remove img from the dictionary
b = {"data":data, "total_size":a["total_size"]}  # New dictionary

img_0 = a["data"][0]["img"]  # KeyError even though I never touched a
# KeyError: 'img'

The easiest fix is to switch to a deep copy, as in `data = copy.deepcopy(a["data"])`, but then the images that will be deleted anyway get copied too, which may cost memory and execution time.

Therefore, instead of deleting unneeded data from a copy of the original, I think it is better to extract only the data that is needed.

# Code rewritten to extract only the necessary data
data = []
for d in a["data"]:
    new_dict = {}
    for k in d.keys():
        if k == "img":  # Skip only img
            continue
        new_dict[k] = d[k]  # Note: this is not a copy of the value
    data.append(new_dict)
b = {"data":data, "total_size":a["total_size"]}  # New dictionary

img_0 = a["data"][0]["img"]  # Works

I only used the copied data for JSON export, so the code above is fine. If you want to rewrite the copied data and it contains mutable objects, you have to use deepcopy.
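The same extraction can be sketched more compactly with a dict comprehension. This is only an illustration: to stay self-contained it uses a `bytearray` as a stand-in for the image array (an assumption, not the original data), and shortens the list to two entries.

```python
import json

# Stand-in data: bytearray plays the role of the image array (assumption).
a = {
    "data": [
        {"name": "img_0.jpg", "size": "100x200", "img": bytearray(4)},
        {"name": "img_1.jpg", "size": "100x100", "img": bytearray(4)},
    ],
    "total_size": 5000,
}

# Build new inner dicts that leave out "img"; the original dicts are untouched.
data = [{k: v for k, v in d.items() if k != "img"} for d in a["data"]]
b = {"data": data, "total_size": a["total_size"]}

text = json.dumps(b)          # b holds only JSON-serializable values
print(text)
print("img" in a["data"][0])  # True
```

As in the loop version, the remaining values are not copied, so this is suitable for read-only use such as export.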

Shallow copy and deep copy

As the examples above show, the following is copied:

Shallow copy: only the target object itself
Deep copy: the target object plus all mutable objects contained in it

For more information, see the Python documentation for copy (https://docs.python.org/ja/3/library/copy.html). I think it is worth reading once.
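To put the three kinds of assignment side by side, here is a minimal sketch on one nested dictionary. For each result it prints whether the outer object and the inner list are the same objects as in the original. The dictionary contents are made up for illustration.

```python
import copy

a = {"nums": [1, 2], "label": "x"}

ref = a                    # reference assignment
shallow = copy.copy(a)     # new outer dict, same inner list
deep = copy.deepcopy(a)    # new outer dict, new inner list

for name, obj in [("ref", ref), ("shallow", shallow), ("deep", deep)]:
    print(name, obj is a, obj["nums"] is a["nums"])
# ref True True
# shallow False True
# deep False False
```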

Execution speed verification

I created a dictionary `a` containing text and measured the execution speed of shallow and deep copies.

import copy
import time
import numpy as np

def test1(a):
    start = time.time()
    # b = a
    # b = a.copy()
    # b = copy.copy(a)
    b = copy.deepcopy(a)
    process_time = time.time()-start
    return process_time

a = {i:"hogehoge"*100 for i in range(10000)}
res = []
for i in range(100):
    res.append(test1(a))
print(np.average(res)*1000, np.min(res)*1000, np.max(res)*1000)

Result

| Processing | Average (ms) | Minimum (ms) | Maximum (ms) |
| --- | --- | --- | --- |
| b = a | 0.0 | 0.0 | 0.0 |
| a.copy() | 0.240 | 0.0 | 1.00 |
| copy.copy(a) | 0.230 | 0.0 | 1.00 |
| copy.deepcopy(a) | 118 | 78.0 | 414 |

This is a rough measurement, so it is not very reliable (the 0.0 entries are probably below the resolution of the timer). Still, the difference between a shallow copy and a deep copy is clearly large. So it seems better to choose based on the data and how it is used, for example using a shallow copy for data that will not be rewritten.
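For a somewhat more robust measurement, the standard `timeit` module can be used in place of `time.time()`. The following is only a sketch under assumptions: the dictionary is smaller than in the test above (1000 entries) so it runs quickly, and the `b = a` entry mostly measures the overhead of calling a function, so the numbers are not comparable with the table.

```python
import copy
import timeit

a = {i: "hogehoge" * 100 for i in range(1000)}

candidates = {
    "b = a": lambda: a,
    "a.copy()": lambda: a.copy(),
    "copy.copy(a)": lambda: copy.copy(a),
    "copy.deepcopy(a)": lambda: copy.deepcopy(a),
}

results = {}
for label, func in candidates.items():
    # repeat() returns the total seconds for each run of `number` calls;
    # the fastest run is the least disturbed by other processes.
    best = min(timeit.repeat(func, number=20, repeat=5))
    results[label] = best / 20 * 1000  # milliseconds per call
    print(f"{label:18s} {results[label]:.4f} ms")
```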

Other verification, etc.

Copying a user-defined class

import copy
class Hoge:
    def __init__(self):
        self.a = [1,2,3]
        self.b = 3

hoge = Hoge()
# hoge_copy = hoge.copy()  # Error: there is no copy method
hoge_copy = copy.copy(hoge)  # Shallow copy
hoge_copy.a[1] = 10000
hoge_copy.b = 100
print(hoge.a)  # [1, 10000, 3] (rewritten)
print(hoge.b)  # 3 (not rewritten)

Even for your own class, a shallow copy is not enough if an attribute holds a mutable object.
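For comparison, a sketch of the same class with `copy.deepcopy`, which also copies the list held in the attribute:

```python
import copy

class Hoge:
    def __init__(self):
        self.a = [1, 2, 3]
        self.b = 3

hoge = Hoge()
hoge_copy = copy.deepcopy(hoge)  # The list in .a is copied as well
hoge_copy.a[1] = 10000
print(hoge.a)       # [1, 2, 3]
print(hoge_copy.a)  # [1, 10000, 3]
```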

Copying a tuple

This is the case where mutable objects are placed inside a tuple.

import copy
a = ([1,2],[3,4])
b = copy.copy(a)  # Shallow copy
print(a)  # ([1, 2], [3, 4])
b[0][0] = 100  # This is allowed
print(a)  # ([100, 2], [3, 4]) (rewritten)
b[0] = [100,2]  # Raises TypeError: a tuple element cannot be reassigned

Since tuples are immutable, their values cannot be rewritten, but the mutable objects contained in tuples can be rewritten. In this case as well, the shallow copy does not copy the objects inside.
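One detail worth checking here: for a tuple, `copy.copy` does not even create a new outer object, because the copy module treats tuples as immutable and returns the same tuple. `copy.deepcopy` builds a new tuple when the contents are mutable. A short sketch:

```python
import copy

a = ([1, 2], [3, 4])

shallow = copy.copy(a)
print(shallow is a)   # True: the very same tuple is returned

deep = copy.deepcopy(a)
print(deep is a)      # False: the inner lists were copied, so a new tuple is built

deep[0][0] = 100
print(a)              # ([1, 2], [3, 4])
print(deep)           # ([100, 2], [3, 4])
```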

About list.copy()

I wondered what copying a list with `b = a.copy()` actually does, so I took a look at the Python source code.

cpython/Objects/listobject.c, line 812 (quoted from the master branch as of 9/11/2020; the position may have changed)

/*[clinic input]
list.copy
Return a shallow copy of the list.
[clinic start generated code]*/

static PyObject *
list_copy_impl(PyListObject *self)
/*[clinic end generated code: output=ec6b72d6209d418e input=6453ab159e84771f]*/
{
    return list_slice(self, 0, Py_SIZE(self));
}

As the comment says,

Return a shallow copy of the list.

it is a shallow copy. The implementation below it just calls list_slice, so it appears to simply take a slice, like `b = a[0:len(a)]`.
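From the Python side, the equivalence can be observed with a short sketch: both forms give a new outer list whose elements are the same objects. The variable names are illustrative.

```python
a = [[1, 2], [3, 4]]

by_method = a.copy()
by_slice = a[0:len(a)]

print(by_method == by_slice)   # True: same contents
print(by_method is by_slice)   # False: two separate outer lists
print(all(x is y for x, y in zip(by_method, by_slice)))  # True: shared elements
```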

Solution #2: numpy.ndarray

This is a little off the main topic, but if you work with multidimensional arrays, you can also use NumPy's ndarray instead of a list. Some care is needed, however.

import numpy as np
import copy

a = [[1,2],[3,4]]
a = np.array(a)  #Convert to ndarray
b = copy.copy(a)
# b = a.copy()  #This is also possible
b[0][0] = 100
print(a)
# [[1 2]
# [3 4]]
print(b)
# [[100   2]
# [  3   4]]

As you can see, `copy.copy()` and `.copy()` work fine, but with a slice, writing to `b` rewrites the original array, just as with the nested list. This comes from the difference between a copy and a view in NumPy.

Reference: https://deepage.net/features/numpy-copyview.html (An easy-to-understand explanation of NumPy copy and view)

#When using slices
import numpy as np

a = [[1,2], [3,4]]
a = np.array(a)
b = a[:]  # Slice (= creates a view)
# b = a[:,:]  # Same result
# b = a.view()  # Same result
b[0][0] = 100
print(a)
# [[100   2]
# [  3   4]]
print(b)
# [[100   2]
# [  3   4]]
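Whether two arrays share the same data can be checked directly, without writing to them. The sketch below uses `np.shares_memory` and the `base` attribute; it assumes NumPy is installed and that the array was created with `np.array`, so that it owns its data.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

view = a[:]      # Slicing creates a view onto the same buffer
dup = a.copy()   # ndarray.copy() allocates a new buffer

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, dup))   # False
print(view.base is a)             # True: the view refers back to a
print(dup.base is None)           # True: the copy owns its data
```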

Also, in this case the result of comparing with `is` differs from the list case.

import numpy as np

def check(a, b):
    print(id(a[0]), id(b[0]))
    print(a[0] is b[0], id(a[0])==id(b[0]))

#When slicing the list
a = [[1,2],[3,4]]
b = a[:]
check(a,b)
# 1778721130184 1778721130184
# True True

#When ndarray is sliced (view is created)
a = np.array([[1,2],[3,4]])
b = a[:]
check(a,b)
# 1778722507712 1778722507712
# False True

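As the last line shows, the ids are the same, but the comparison with `is` is False. So be careful: even if you check object identity with the `is` operator and get False, the data may still be rewritten.

When I look up the `is` operator, it says that it returns whether the ids are the same [^n1] [^n2] [^n3], which seems to contradict this result. The explanation is object lifetime rather than anything special about `is`: indexing an ndarray with `a[0]` creates a new view object each time, and an id is only guaranteed to be unique among objects that exist at the same time. In `id(a[0])` and `id(b[0])` the temporary views are discarded right away, so the same id can be reused, whereas in `a[0] is b[0]` both views exist at once and are different objects.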
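One way to look into this is to keep both row objects alive while comparing them. Indexing an ndarray creates a new view object each time, and ids are only guaranteed to be distinct for objects that exist simultaneously. The sketch below assumes NumPy is installed; `row_a` and `row_b` are illustrative names.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = a[:]

row_a = a[0]   # Keep both view objects alive at the same time
row_b = b[0]

print(row_a is row_b)                  # False: two distinct view objects
print(id(row_a) == id(row_b))          # False while both are alive
print(np.shares_memory(row_a, row_b))  # True: they still view the same data
print(a[0] is a[0])                    # False: each indexing makes a new view
```

So for ndarrays, `is` answers whether two names refer to the same Python object, not whether they share data; `np.shares_memory` is the check for the latter.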

The execution environment is Python 3.7.4 and NumPy 1.16.5 + MKL.

[^n3]: https://docs.python.org/ja/3/reference/expressions.html#is (6.10.3. Identity comparisons) The official reference also says "An object's identity is determined using the id() function."

Conclusion

I've never had a problem with .copy () so far, so I didn't care about copying at all. It's very scary to think that some of the code I've written so far may have been unexpectedly rewritten.

The mutable-object problem in Python also comes up with default function arguments. If you are not aware of it, you may produce unintended data without noticing, so please check the following if this is new to you.

http://amacbee.hatenablog.com/entry/2016/12/07/004510 (Python passing by value and passing by reference)
https://qiita.com/yuku_t/items/fd517a4c3d13f6f3de40 (Default argument value should be immutable)

# It is better not to use a mutable object as a default argument

def hoge(data=[1,2]): ...   # Bad example
def hoge(data=None): ...    # Good example 1
def hoge(data=(1,2)): ...   # Good example 2

# This also happens
a = [[0,1]] * 3
print(a)  # [[0, 1], [0, 1], [0, 1]]
a[0][0] = 3
print(a)  # [[3, 1], [3, 1], [3, 1]] (all have been rewritten)
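Both pitfalls can be reproduced in a few lines. In the sketch below, `append_bad` and `append_good` are hypothetical function names used only for illustration. The default list is created once when the function is defined, so every call shares it. Likewise, `[[0, 1]] * 3` repeats a reference to one inner list, while a list comprehension creates a new inner list each time.

```python
def append_bad(item, data=[]):     # The same default list is reused across calls
    data.append(item)
    return data

def append_good(item, data=None):  # A fresh list is created on each call
    if data is None:
        data = []
    data.append(item)
    return data

first = append_bad(1)
second = append_bad(2)
print(second)            # [1, 2]
print(first is second)   # True

print(append_good(1))    # [1]
print(append_good(2))    # [2]

rows = [[0, 1] for _ in range(3)]  # Three independent inner lists
rows[0][0] = 3
print(rows)              # [[3, 1], [0, 1], [0, 1]]
```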

References

[1] https://qiita.com/Kaz_K/items/a3d619b9e670e689b6db (About Python copy and deepcopy)
[2] https://www.python.ambitious-engineer.com/archives/661 (copy module shallow copy and deep copy)
[3] https://snowtree-injune.com/2019/09/16/shallow-copy/ (Python ♪ Next, let's remember by reason "pass by reference" "shallow copy" "deep copy")
[4] https://docs.python.org/ja/3/library/copy.html (copy --- shallow copy and deep copy operations)
