[PYTHON] Understanding "deep copy" and "shallow copy" with the smallest example

Deep copy and shallow copy...

These terms often cause confusion.

In this article, I take the smallest example in which the difference between the copy methods becomes visible, and use it to show how a program behaves under each of them.

Disclaimer

This section is a disclaimer. If you are not interested, skip ahead to the "What are you doing?" section.

I use the term "pass-by-sharing"

In this article, I use "pass-by-sharing" as the single name for what is variously called "pass-by-sharing", "call-by-sharing", "shared passing", or "passing a reference by value".

Please note that I am not recommending this term. I simply needed one name for this article so that the specifications of several languages can be compared.

I also use "pass by ..." for assignment

In this article, terms such as "pass by reference" and "pass by sharing" are also used to describe how a variable is assigned.

Normally these terms are used for passing arguments to a function, but the same concepts apply to assignment, so I use them as they are.

(This is my personal opinion, but I do not see much benefit in restricting "pass by ..." to function arguments. I think it is fine to use it for assignment as well, so that is what I do in this article.)
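As a small illustration of why the same vocabulary fits both cases, the following Python sketch performs the same two operations (mutate, then rebind) once through a function parameter and once through a plain assignment. The function name `mutate_and_rebind` and the variable names are made up for this example.

```python
def mutate_and_rebind(items):
    # Hypothetical helper for illustration only.
    items.append("mutated")   # mutates the object the caller also refers to
    items = ["rebound"]       # rebinds the local name; the caller is unaffected
    return items


original = ["start"]
returned = mutate_and_rebind(original)
print(original)  # ['start', 'mutated']
print(returned)  # ['rebound']

# The same two operations through a plain assignment behave the same way.
alias = original
alias.append("via alias")     # visible through original
alias = ["rebound"]           # only the name alias changes
print(original)  # ['start', 'mutated', 'via alias']
```

In both cases the mutation is visible through the original name and the rebinding is not.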

Things that do not create a new entity are not called "shallow copies" here

A search turns up many articles that say "a shallow copy copies only a reference or pointer and does not create a new entity".

In this article, "shallow copy" means the case where a new entity is created, but something inside it still refers to the same entity as before the copy.

The English Wikipedia and the official Python documentation use the term in the same way.

(This is my personal opinion, but at the very least the case of "a new entity is created, yet it still refers to parts of the copy source" should be called a shallow copy, so I think the explanation "a shallow copy is something that does not create a new entity" is incorrect. There would be no contradiction if "things that do not create a new entity are **also** called shallow copies", but I think that is wrong as well.)

What are you doing?

Assign a "copy, in some sense" of the variable `a` to the variable `b`, modify `b` in several ways, and then check the contents of `a`.

Rather than comparing only the two kinds of copy, we compare four ways of assigning:

- Assignment by reference
- Assignment by sharing
- Assigning a shallow copy
- Assigning a deep copy

The meaning of each term is explained below, together with the code.

What to do with the smallest example

assignment.dart


void main() {
  var a = [
    ["did deep copy"]
  ];

  // Here, assign "a copy of a, in some sense" to b

  b[0][0] = "did shallow copy";
  b[0] = ["did pass-by-sharing"];
  b = [
    ["did pass-by-reference"]
  ];
  print(a[0][0]);
}

I only wanted to show what the processing does, and I did not want the discussion to depend on a specific language, so I wrote it in Dart, which seems to have relatively few users. (Though perhaps more recently...?)

The processing is as follows:

  1. Assign a list of lists of strings to the variable `a`.
  2. Assign a copy of `a`, in some sense, to the variable `b`.
  3. Assign a different string to the 0th element of the 0th element of `b`.
  4. Assign a different list of strings to the 0th element of `b`.
  5. Assign a different list of lists of strings to `b`.
  6. Print the 0th element of the 0th element of `a`.

Let's see why this makes a difference.
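As a rough sketch, the six steps can be wrapped in a Python function whose only parameter is the way `b` is obtained from `a`. The name `run_experiment` and its parameter `make_b` are invented for this illustration. Python has no assignment by reference, so only three of the four variants can be tried this way.

```python
import copy


def run_experiment(make_b):
    """Run the six steps; make_b decides how b is obtained from a."""
    a = [["did deep copy"]]           # step 1
    b = make_b(a)                     # step 2: "a copy in some sense"
    b[0][0] = "did shallow copy"      # step 3
    b[0] = ["did pass-by-sharing"]    # step 4
    b = [["did pass-by-reference"]]   # step 5: rebinds the name b only
    return a[0][0]                    # step 6


results = {
    "sharing": run_experiment(lambda a: a),
    "shallow copy": run_experiment(copy.copy),
    "deep copy": run_experiment(copy.deepcopy),
}
for name, value in results.items():
    print(f"{name}: {value}")
# sharing: did pass-by-sharing
# shallow copy: did shallow copy
# deep copy: did deep copy
```

The string that survives in `a` names the assignment method that was used, which is the point of the example.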

When passing by reference

If the assignment `b = a` is by reference, `b` holds a "reference" that points to `a` itself, and every subsequent operation on `b` also affects `a`.

It is easier to follow if you think of `a` as having been given the alias `b`.

In this case, the code above behaves as follows.

assignment.dart


void main() {
  var a = [
    ["did deep copy"]
  ];

  // Here, assign a to b by reference

  // Everything below is equivalent to doing the same thing to a
  b[0][0] = "did shallow copy";
  b[0] = ["did pass-by-sharing"];
  b = [
    ["did pass-by-reference"]
  ];
  print(a[0][0]); // did pass-by-reference
}

Because of the final statement

  b = [
    ["did pass-by-reference"]
  ];

the content of `a` becomes a list containing a list with this string.

Therefore, the output is did pass-by-reference.

Incidentally, this string means "assigned by reference".

When passing by sharing

Strictly speaking, pass-by-sharing is not only a term for how something is passed.

In a language that uses pass-by-sharing, what a variable holds is not the value itself but a reference to that value. (This is the case for objects in many languages, such as Java, Python, and JavaScript.)

So when `b = a` is executed, the "reference" stored in `a` is copied and stored in `b` as well.

This is why pass-by-sharing is sometimes called "passing a reference by value".

In this case, the code above behaves as follows.

assignment.dart


void main() {
  // a holds a reference to this nested list
  var a = [
    ["did deep copy"]
  ];

  // Here, assign a to b by sharing

  // This affects a, because a and b hold references that point to the same entity.
  b[0][0] = "did shallow copy";

  // This also affects a
  b[0] = ["did pass-by-sharing"];

  // Here b stores a new reference that points to a new entity, so this does not affect a.
  b = [
    ["did pass-by-reference"]
  ];
  print(a[0][0]); // did pass-by-sharing
}

The final statement

  b = [
    ["did pass-by-reference"]
  ];

assigns to `b` a reference that points to a different entity, [["did pass-by-reference"]], so it does not affect `a`.

Therefore, the output is did pass-by-sharing.

Incidentally, this string means "assigned by sharing".
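In Python, the fact that a plain assignment copies only the reference can be checked with the `is` operator and the built-in `id`. The following is a minimal sketch; the strings are arbitrary.

```python
a = [["did deep copy"]]
b = a                        # copies the reference, not the list

print(b is a)                # True: both names refer to one list
print(id(a) == id(b))        # True

b[0][0] = "changed through b"
print(a[0][0])               # changed through b

b = [["something else"]]     # b now holds a different reference
print(b is a)                # False
print(a[0][0])               # changed through b (rebinding b left a alone)
```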

For shallow copy

We continue to talk about languages in which a variable holds not the value itself but a reference to that value. (Java, Python, JavaScript, and so on.)

In languages where a variable holds the value itself (such as C++), there is no shallow copy in this sense. (I think it can be reproduced with pointers or references.)

A shallow copy, as the name implies, makes a copy.

However, where the contents are "references", those references are copied as they are.

Let's see what that means in code.

assignment.dart


void main() {
  // a holds a reference to this nested list
  var a = [
    ["did deep copy"]
  ];

  // Here, make a shallow copy of a and assign it to b

  // a and b hold references that point to different entities.
  // However, both entities hold the same "reference" as their 0th element,
  // so this affects a.
  b[0][0] = "did shallow copy";

  // The 0th element of b is replaced with a reference to something different.
  // This does not affect a.
  b[0] = ["did pass-by-sharing"];

  // Here b stores a new reference that points to a new entity, so this does not affect a either.
  b = [
    ["did pass-by-reference"]
  ];
  print(a[0][0]); // did shallow copy
}

`a` and its shallow copy are different entities. If you print `id` in Python or `hashCode` in Dart, you can confirm that `a` and `b` point to different entities.

However, their contents are copied as they are.

More precisely, "references" pointing to the same things have been copied.

So if you rewrite what a contained reference points to, both `a` and `b` are affected.

That is this statement:

  b[0][0] = "did shallow copy";

On the other hand, rewriting the contents of `b` itself does not affect `a`.

That is this statement:

  b[0] = ["did pass-by-sharing"];

This does not affect `a`. The result is did shallow copy.
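The identity relationships described here can be checked directly in Python. This is a minimal sketch using `copy.copy` from the standard library: the outer lists are different objects, while their 0th elements are initially the same object.

```python
import copy

a = [["did deep copy"]]
b = copy.copy(a)

print(b is a)            # False: the outer list is a new entity
print(b[0] is a[0])      # True: the inner list is still shared

b[0][0] = "did shallow copy"     # writes into the shared inner list
print(a[0][0])           # did shallow copy

b[0] = ["did pass-by-sharing"]   # replaces a slot of b's own outer list
print(b[0] is a[0])      # False
print(a[0][0])           # did shallow copy
```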

For deep copy

Like a shallow copy, a deep copy makes a copy, but it also looks at the values that the contents refer to and copies them as well.

If those values in turn contain "references", it follows them and copies what they refer to as well.

This is repeated until everything reachable through references has been copied.

As a result, `a` and `b` no longer share anything.

Operations performed on one do not affect the other at all.

assignment.dart


void main() {
  // a holds a reference to this nested list
  var a = [
    ["did deep copy"]
  ];

  // Here, assign a deep copy of a to b. After this, operations on b do not affect a.

  b[0][0] = "did shallow copy";
  b[0] = ["did pass-by-sharing"];
  b = [
    ["did pass-by-reference"]
  ];
  print(a[0][0]); // did deep copy
}

Since nothing in `a` has changed, the initial value did deep copy is printed.
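The "follow every reference and copy it too" idea can be sketched as a short recursive function. This is only an illustration, under the assumption that the data consists of nested lists of immutable values. The name `naive_deep_copy` is made up, and unlike the standard `copy.deepcopy` this sketch does not handle other container types or cyclic references.

```python
def naive_deep_copy(value):
    """Hypothetical helper: recursively copy nested lists."""
    if isinstance(value, list):
        # Build a new list and copy each element in turn.
        return [naive_deep_copy(item) for item in value]
    # Immutable values such as strings can safely be shared.
    return value


a = [["did deep copy"], ["x", ["y"]]]
b = naive_deep_copy(a)

print(b == a)               # True: same contents
print(b is a)               # False
print(b[0] is a[0])         # False: inner lists are copied too
print(b[1][1] is a[1][1])   # False: and so are lists nested deeper

b[0][0] = "changed"
print(a[0][0])              # did deep copy
```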

Let's try it in real languages

Now let's see how assignment actually works in several languages.

JavaScript

In JavaScript, a plain assignment passes by sharing.

assignment.js



const a = [["did deep copy"]];

let b = a;

b[0][0] = "did shallow copy";
b[0] = ["did pass-by-sharing"];
b = [["did pass-by-reference"]];
console.log(a[0][0]); // did pass-by-sharing

If you do not want to share the entity and it is an array, you can use slice to make a copy.

assignment.js


const a = [["did deep copy"]];

let b = a.slice();

b[0][0] = "did shallow copy";
b[0] = ["did pass-by-sharing"];
b = [["did pass-by-reference"]];
console.log(a[0][0]); // did shallow copy

The copy in this case is a shallow copy. Making a deep copy takes some extra work.

Objects, like arrays, are shared by a plain assignment. To copy one, you can do the following.

assignment.js


const a = { x: { y: "did deep copy" } };

let b = Object.assign({}, a); // With b = a here, the object would be shared instead

b.x.y = "did shallow copy";
b.x = { y: "did pass-by-sharing" };
b = { x: { y: "did pass-by-reference" } };
console.log(a.x.y); // did shallow copy

Python

Next is Python.

assignment.py


a = [['did deep copy']]

b = a

b[0][0] = 'did shallow copy'
b[0] = ['did pass-by-sharing']
b = [['did pass-by-reference']]
print(a[0][0]) # did pass-by-sharing

In Python, too, a plain assignment shares the object.

Python has a module called copy that lets you explicitly make shallow and deep copies.

See the documentation: copy --- Shallow and deep copy operations

You can make a shallow copy with copy.copy.

assignment.py


import copy

a = [['did deep copy']]

b = copy.copy(a)

b[0][0] = 'did shallow copy'
b[0] = ['did pass-by-sharing']
b = [['did pass-by-reference']]
print(a[0][0])  # did shallow copy
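For lists specifically, Python has several built-in spellings that also produce a shallow copy: slicing with `a[:]`, the `list(a)` constructor, and the `a.copy()` method. The sketch below checks that each one creates a new outer list while still sharing the inner list; the dictionary `copies` is just a convenience for this example.

```python
a = [["did deep copy"]]

copies = {
    "a[:]": a[:],
    "list(a)": list(a),
    "a.copy()": a.copy(),
}

for label, b in copies.items():
    # New outer list, same inner list: a shallow copy.
    print(label, b is a, b[0] is a[0])
# a[:] False True
# list(a) False True
# a.copy() False True
```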

You can make a deep copy with copy.deepcopy.

assignment.py


import copy

a = [['did deep copy']]

b = copy.deepcopy(a)

b[0][0] = 'did shallow copy'
b[0] = ['did pass-by-sharing']
b = [['did pass-by-reference']]
print(a[0][0])  # did deep copy

Objects other than lists can be copied with this module as well. It's convenient.
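As a sketch of what that looks like for an instance of a user-defined class, the example below copies an object both ways. The class `Box` and its attribute `items` are hypothetical and exist only for this illustration.

```python
import copy


class Box:
    """Hypothetical class used only for illustration."""

    def __init__(self, items):
        self.items = items


a = Box([["did deep copy"]])
shallow = copy.copy(a)
deep = copy.deepcopy(a)

print(shallow is a)               # False: a new Box was created
print(shallow.items is a.items)   # True: its attribute is still shared
print(deep.items is a.items)      # False: the attribute was copied too

shallow.items[0][0] = "did shallow copy"
print(a.items[0][0])              # did shallow copy
print(deep.items[0][0])           # did deep copy
```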

Dart

assignment.dart


void main() {
  var a = [
    ["did deep copy"]
  ];

  var b = a;

  b[0][0] = "did shallow copy";
  b[0] = ["did pass-by-sharing"];
  b = [
    ["did pass-by-reference"]
  ];
  print(a[0][0]); // did pass-by-sharing
}

In Dart, too, a plain assignment shares the object.

C++

C++ behaves very differently from the languages so far.

In C++, a plain assignment does not share anything.

In C++, what a variable holds is not a "reference" but the "value itself".

Because the value itself is copied, the copy has no connection to its source from that point on.

Whatever you do to the copy, the source is not affected.

It behaves exactly like a **deep copy**.

assignment.cpp


#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    vector<vector<string>> a{vector<string>{"did deep copy"}};

    vector<vector<string>> b = a;

    b[0][0] = "did shallow copy";
    b[0] = vector<string>{"did pass-by-sharing"};
    b = vector<vector<string>>{vector<string>{"did pass-by-reference"}};
    cout << a[0][0] << endl; // did deep copy
}

C++ also lets you assign by reference.

With a reference, every operation performed through the new name affects the original.

assignment.cpp


#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    vector<vector<string>> a{vector<string>{"did deep copy"}};

    vector<vector<string>> &b = a;

    b[0][0] = "did shallow copy";
    b[0] = vector<string>{"did pass-by-sharing"};
    b = vector<vector<string>>{vector<string>{"did pass-by-reference"}};
    cout << a[0][0] << endl; // did pass-by-reference
}

Conclusion

As we have seen, writing out this small piece of processing makes it clear what is actually assigned at the time of assignment.

Once you know that, you should be less likely to be caught out by unexpected behavior.

There is also a related article that compares the behavior when passing arguments to a function with the behavior on assignment, so please read it if you are interested.

Understand passing by value / shared / by reference with the smallest example (5 lines)

If you find a mistake in this article, I would be very grateful if you could point it out. Thank you.
