Python list is not a list

Python is popular, isn't it? One of the strengths of Python is list. It's a generous list that doesn't choose the type of object you put inside, but as the title suggests, it's not exactly a ** list **. This time, I will explain about that with the CPython source code.

First of all, what is a list?

[List](https://ja.wikipedia.org/wiki/%E3%83%AA%E3%82%B9%E3%83%88_(%E6%8A%BD%E8%B1%A1%E3%] 83% 87% E3% 83% BC% E3% 82% BF% E5% 9E% 8B))) is an "ordered data container". In short, it's a collection of data in order.

I think that many people who read this article are Pythonista, so I would like to implement it in Python, but since Python can not implement a strict list, I will write a minimal list program in C ++. I don't want to talk about C ++, so I don't use templates and make a list of ints.

class List {
private:
    int base;  //Hold value
    int *next; //Address of the following value
public:
    /*
     *append and various methods
     */
}

Only this. It's easy? This is a ** unidirectional list **, and as the name implies, it is a list whose references are ** unidirectional **. On the other hand, ** bidirectional list ** is a list that has the address of the previous value and can be referred to backwards. If you want to refer to the next value, just refer to the data at the next position. In other words, it may be a little different from the list you imagine, but ** the data is not in a row in memory **.

Then what is an array?

I hope you found the list in the previous chapter. So what is an array (https://ja.wikipedia.org/wiki/%E9%85%8D%E5%88%97)? The simple answer is ** just a data string **. I will explain it because it is a little misleading and not accurate.

As explained in the previous chapter, the list was ** a column of data holding the address of the next data **. Arrays, on the other hand, are ** data arranged in memory **.

Let me give you a simple example. Suppose you have an array of alphabets starting with data ʻAat address 0 in memory. (Data size is 1 for simplicity) Then there should beB next to address 1 in memory, that is, ʻA. Next to it is C, next to it is D, and so on, the array is ** all data in a straight line in memory without exception **.

This is the difference between a list and an array. And from here let's look at Python's list implementation.

Finally read CPython

As you know, CPython is the official processing system of Python implemented in C language. The source code is currently (2020/04/10) latest. (But the built-in source code doesn't change that often, so it doesn't have to be tightly matched.)

Where is the implementation of list

Looking at the CPython source code, I think that many people are overwhelmed by the various directories. The basic object definition is in ʻInclude / cpython / xxxobject.h`, so let's take a look there. Then, I think there is such a definition.

Include/cpython/listobject.h


typedef struct {
    PyObject_VAR_HEAD
    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     * list.sort() temporarily sets allocated to -1 to detect mutations.
     *
     * Items must normally not be NULL, except during construction when
     * the list is not yet visible outside the function that builds it.
     */
    Py_ssize_t allocated;
} PyListObject;

You need to know PyObject to read the Python source code. A PyObject is the underlying object for all objects, and all Python objects are provided as extensions to this object. So this code

"Hmm? I'm saying a list, but I don't have the address for the next element!" That's right.

** Python lists are arrays. ** ** Say again ** Python lists are arrays. ** **

"But what makes arrays and lists so happy and sad after all?" So, from here, I will explain not the technical difference but how it will change concretely.

Random access speed

Random access, as the name implies, means accessing random locations in a data structure. Here, we will introduce the difference in random access between arrays and lists.

Random access to the array

If you want to do random access on an array, just add the position from the beginning of the element you want to access to the beginning address. In the previous alphabet example, there is ʻA at address 0, so to access the 6th G, just add 6 to the position of ʻA, that is, 0. If the name of the alphabet array is alphabet, such access can be written in C language as follows.

char *alphabet = {'A', 'B', 'C', ... , 'X', 'Y', 'Z'};
char a = *(alphabet + 0); //A position is 0
char g = *(alphabet + 6); //G position is 6

And this can also be written in C language.

char a = alphabet[0];
char g = alphabet[6];

Isn't it a familiar shape? Yes, it's the same subscript access as Python. The form name [i] means ** get the element ahead of ** name ʻi` **.

What is inside Python doing during random access? Let's take a look at the CPython source code again. Let's take a look at Include / cpython / listobject.h, which is the same as before.

Include/cpython/listobject.h


/* Cast argument to PyTupleObject* type. */
#define _PyList_CAST(op) (assert(PyList_Check(op)), (PyListObject *)(op))

#define PyList_GET_ITEM(op, i) (_PyList_CAST(op)->ob_item[i])
#define PyList_SET_ITEM(op, i, v) (_PyList_CAST(op)->ob_item[i] = (v))
#define PyList_GET_SIZE(op)    Py_SIZE(_PyList_CAST(op))
#define _PyList_ITEMS(op)      (_PyList_CAST(op)->ob_item)

On the 4th line, a macro called PyList_GET_ITEM is defined. Looking at the contents, (_PyList_CAST (op)-> ob_item [i]) was! Subscript access! It turns out that it is also accessed by subscripts inside Python. By the way, one line below that is a macro that assigns a value to the specified position, but it is also accessed by subscripts here as well.

Random access to the list

So what happens when you randomly access a list? The list does not have elements in a row, so you need to follow the addresses of the next elements one by one. Repeat 6 times for B after ʻA, CafterB, and so on, and you will reach G`. It's less efficient than an array by any means.

As you can see, ** lists are not good at random access, arrays are good ** data structures, and arrays are adopted in Python. (I don't know why I named it list)

Finally

I hope you've seen that the internal implementation of Python's list is actually an array, and that's what makes it grateful. If you have any questions, questions, or suggestions for mistakes, please feel free to comment. If you have any questions, you can come to Twitter. Thank you very much

Recommended Posts

Python list is not a list
Python round is not strictly round
What is a python map?
python> check NoneType or not> if a == None:> if a is None:
[Python] list
[Python] What is a zip function?
[Python] What is a with statement?
[python] Manage functions in a list
python / Make a dict from a list.
python note: when easy_install is not available
Python basics: list
[Python] Name Error: name'urlparse' is not defined
[Python] How to convert a 2D list to a 1D list
Display a list of alphabets in Python 3
Hash in Perl is a dictionary in Python
What is a dog? Python installation volume
[python] Get a list of instance variables
[python] [meta] Is the type of python a type?
What is python
Python> Comprehension / Comprehension> List comprehension
Python is instance
[Python] Get a list of folders only
Python list manipulation
About February 02, 2020 * This is a Python article.
What is Python
[Introduction to Python] What is the difference between a list and a tuple?
Why Python slicing is represented by a colon (:)
[Python] List Comprehension Various ways to create a list
How to clear tuples in a list (Python)
Python Pandas is not suitable for batch processing
Is sys.settrace, a python genius feature, another language?
[python] Create a list of various character types
Python log is not output with docker-compose up
Make a copy of the list in Python
View drug reviews using a list in Python
python memo-"if not A and B" was "if (not A) and B"
A simple to-do list created with Python + Django
Launch a shell while a Python script is running
A python amateur tries to summarize the list ②
Sort list elements in a specified order in Python
Judge whether it is a prime number [Python]
[Python] Manipulating elements in a list (array) [Sort]
python Boolean operation does not return a Boolean value
Tell me what a conformal map is, Python!
Sorted list in Python
python int is infinite
Python Exercise 2 --List Comprehension
What is a distribution?
Python> list> extend () or + =
A * algorithm (Python edition)
[Python] Take a screenshot
Create a Python module
[Python] What is Pipeline ...
Python Not Implemented Error
What is a terminal?
A python lambda expression ...
Python list comprehension speed
Filter List in Python
What is a hacker?
Daemonize a Python process
Python3 List / dictionary memo