Python data structure and internal implementation ~ List ~

Introduction

I use Qiita a lot, but this is my first post! Nice to meet you!

There are many useful articles about Python, but I have the impression that there aren't many articles that touch on the internal implementation of Python, so I'm motivated to be able to explain various data structures in conjunction with the internal implementation. This time I will write about Python list.

About this article

This is an article about how Python lists work. But it's impossible to write how all the methods in the list work, so mainly

――What kind of data structure is a list? Isn't it an array? --What is the internal implementation of the list type? --What is a variable length array? What are the rules for changing the size?

I wrote an article that can solve such questions.

Target audience

--People who have the above questions --People who want to know a little more by reading Python introductory books and tutorials --People who want to know more but are tired of reading official documentation or cpython code

data structure

Let's see what kind of data structure the Python list type is.

review

What is a list?

>>> x = [1,2,3]
>>> x.append(4)
>>> x
[1, 2, 3, 4]

This is the familiar one.

x.sort()
x.append(element)
x.clear()
x.extend(element)
x.index(element)
x.insert(element)
x.pop()
x.remove(element)
x.reverse()

There are many methods like this (miscellaneous).

Linked lists and arrays

It's a basic story, so if you know it, please skip it.

array_vs_linkedlist.png

The array is

--Secure a continuous area in memory and put elements in it --It takes O (1) to access the element and O (N) to insert or delete the element.

Linked list (unidirectional)

--One node has an element and a pointer to the next element --It takes O (N) to access the element and O (1) to insert or delete the element.

There is a feature.

data structure of list

list is a standard data structure and is one of the sequence types. Since it is named a list, some people may think that it is implemented using a linked list, but Python lists are implemented as variable length arrays (dynamic arrays). So ** Python list is an array **. The name is confusing. ..

A list is a contiguous array with references to other objects. The structure at the top of the list (PyListObject) has a pointer and length to this array. Looking at the actual cpython code (ʻInclude / cpython / listobject.h`) (comment omitted)

listobject.h


typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

It is defined like this. The list is of type PyListObject and the elements in the list are of type PyObject and are represented internally. (PyObject is the base type for all Python object types.) ʻOb_item is an array of pointers to the elements in the list, and ʻallocated is the allocated size.

Untitled Diagram (2).png

Elements can have different data types in the same list, as long as they are PyObject.

x = [1,"a",[1,2,3]]

Variable length array

I talked about Python lists being variable length arrays. Variable length arrays resize the referenced array as elements are added or removed. However, it does not change the size of the array every time. It feels good to decide when to increase the size and its size.

Resize

The process when changing the size of the array to add a new element is as follows.

array_expansion.png

If there is no free space, reserve a new space, copy all the current elements, and add a new element.

growth factor How much to grow when the array is full depends on the ** growth factor ** (about how many times the size of the existing array is multiplied). ** growth factor ** depends on the language. (For example Python is 1.125, C is 2)

For example, when the growth factor is 2, the size (capacity) will be as follows when elements are added to the array in order. DinamicArray.png

When deleting an element, the reduction is similar to the enlargement.

Computational complexity

Since the operation to expand the array copies all the elements, the amount of calculation is O (k) when the current number of elements is k.

Now consider that the complexity of list.append (element) is O (1). Conclusion Considering the amount of calculation, this is O (1).

When adding n elements to an empty array in order, if the growth factor is 2, the amount of calculation will be

\begin{align}
O(n+2+2^2+2^3+\cdots+2^{logn}) \\
= O(n+2\times2^{logn})\\
= O(n)
\end{align}

Therefore, the amount of calculation when adding n elements is O (n). Therefore, the complexity of list.append (element) is O (1).

View implementation

Python's growth factor is 1.125, but let's see how to grow it concretely. Operations related to list are described in ʻObjects / listobject.c`. The important part of the function to be resized is as follows.

listobject.c


static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * Add padding to make the allocated size multiple of 4.
     * The growth pattern is:  0, 4, 8, 16, 24, 32, 40, 52, 64, 76, ...
     * Note: new_allocated won't overflow because the largest possible value
     *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
     */
    new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
}
    new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;

It's hard to understand the mask calculation in the latter half, isn't it?

In short, multiply the current size by $ \ frac {9} {8} $ and add a little. Certainly the growth factor is 1.125.

Summary

--Python list is an array --list is internally of type PyListObject and has pointers to PyObjects. --Python growth factor is 1.125

References

--cpython git repository -python Japanese documentation -Expert Python Programming

Recommended Posts

Python data structure and internal implementation ~ List ~
Python internal structure
Python data structure and operation (Python learning memo ③)
[Python tutorial] Data structure
data structure python push pop
List of Python libraries for data scientists and data engineers
Python list and tuples and commas
Python list comprehensions and generators
Maxout description and implementation (Python)
[Python] Chapter 04-01 Various data structures (list creation and element retrieval)
Solve the spiral book (algorithm and data structure) with python!
Difference between list () and [] in Python
Hashing data in R and Python
Picture book data structure algorithm Python
[Python] list
Data pipeline construction with Python and Luigi
Implementation module "deque" in queue and Python
[Python] Chapter 04-03 Various data structures (multidimensional list)
Python comprehension (list and generator expressions) [additional]
[Python] Chapter 04-04 Various data structures (see list)
Difference between append and + = in Python list
PointNet theory and implementation (point cloud data)
[Python] Chapter 04-02 Various data structures (list manipulation)
Easily graph data in shell and Python
[Python Iroha] Difference between List and Tuple
Compress python data and write to sqlite
Exchange encrypted data between Python and C #
Make a decision tree from 0 with Python and understand it (4. Data structure)
Python basics: list
[Python] Precautions when retrieving data by scraping and putting it in the list
Python variables and data types learned in chemoinformatics
Data analysis python
Receive and display HTML form data in Python
[Python] Swapping rows and columns in Numpy data
[Python] How to read data from CIFAR-10 and CIFAR-100
[Python] Conversion memo between time data and numerical data
list and sum
list and numpy
Implementation of TRIE tree with Python and LOUDS
Python> Comprehension / Comprehension> List comprehension
List of Python code to move and remember
Try importing MLB data on Mac and Python
Explanation of edit distance and implementation in Python
cv2 functions and data types (OpenCV python bindings)
Python list manipulation
[python] Read data
What to use for Python stacks and queues (speed comparison of each data structure)
Full-width and half-width processing of CSV data in Python
Implemented List and Bool in Python and SQLite3 (personal note)
Merge sort implementation / complexity analysis and experiment in Python
[# 2] Make Minecraft with Python. ~ Model drawing and player implementation ~
Logical symbols learned in marriage (and implementation examples in Python)
List of Python code used in big data analysis
[Python] How to sort dict in list and instance in list
Let's distinguish between data structure manipulation and logic code.
Investigate Java and python data exchange with Apache Arrow
Overview of generalized linear models and implementation in Python
[Python] Chapter 04-05 Various data structures (tuple creation and features)
Sorted list in Python
[python] Compress and decompress
Python Exercise 2 --List Comprehension