[PYTHON] I read the implementation of range (Objects / rangeobject.c)

I've been worried about "Python-ness" for the past year, and when I realized it, I was reading the implementation of range.

Let's use xrange instead of range! It was a super derailment from the story of the Python 2 era.

About CPython

Repository: https://hg.python.org/cpython Mirror: https://github.com/python/cpython (I'm not sure about Mercurial, so I'm git all below)

https://github.com/python/cpython/tree/master/Objects There are various interesting things in the line. All of them are long and unreadable.

Source files for various builtin objects

About range

It seems that there was a story that range and xrange are different in Python2. It seems that Python 3 doesn't have to worry about it. It seems that there is something called 2to3, and there is also a program that properly converts xrange to range. https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_xrange.py

This time https://github.com/python/cpython/blob/28b093620c3ae9be02449ea0cecd8b4d649c80d2/Objects/rangeobject.c I Read. There are 1246 lines, but the difficulty level is probably low.

Let's try

3〜4: include

The header file is https://github.com/python/cpython/tree/master/Include It is in.

What to read in rangeobject.c

Two of. The array of PyMemberDef seems to define the names, types and offsets of the members of the C structure.

rangeobject

6〜19: struct rangeobject

The structure rangeobject

have. It looks like a range. PyObject_HEAD and PyObject are described in Common Object Structures for all object types. Is an extension of PyObject.

The value PY_SSIZE_T_MAX appears in the comment. Py_ssize_t is a signed integer unlike ssize_t. ssize_t is mentioned in PEP 353 --Using ssize_t as the index type | Python.org.

21〜39: validate_step(PyObject *step)

Auxiliary function for validation steps. Step 0 is not suitable as a range, so it seems to be checking it. If no step is specified, it is regarded as 1.

41〜42: compute_range_length(PyObject *start, PyObject *stop, PyObject *step)

→ Line 151

44〜64: make_range_object(PyTypeObject *type, PyObject *start, PyObject *stop, PyObject *step)

Create a rangeobject from the arguments. It is the following range_new that really creates rangeobject, and make_range_object is called if there is no problem after checking various things with range_new. Since rangeobject is an object, it is an extension of PyObject. So, after creating PyObject, give the value of range (start etc.).

What is PyObject_New (type, typeobj)? (Type *) _PyObject_New (typeobj) [indicates](https://github.com/python/cpython/blob/28b093620c3ae9be02449ea0cecd8b4d649c80d2/Include/objimpl.h#L134 -L135), which creates PyObject by allocating memory. For more information, follow object.c. Abbreviation.

66〜129: range_new(PyTypeObject *type, PyObject *args, PyObject *kw)

Make various checks properly and make rangeobject. If an inappropriate value is passed, give up creating it, allocate the allocated memory, and return NULL. There was a comment like this.

/* XXX(nnorwitz): should we error check if the user passes any empty ranges?
   range(-10)
   range(0, -5)
   range(0, 5, -1)
*/

131〜139: PyDoc

Documents for range (stop) and range (start, stop [, step]) are written using a macro called PyDoc_STRVAR. PyDoc_STRVAR

#define PyDoc_VAR(name) static char name[]

To arrive at. It's a macro for inline documents.

141〜149: range_dealloc(rangeobject *r)

Release of members of struct rangeobject → Release of structure.

151〜227: compute_range_length(PyObject *start, PyObject *stop, PyObject *step)

Calculates and returns the number of items in range (lo, hi, step). The algorithm seems to be the same as get_len_of_range (lo, hi, step) starting from line 865, and the detailed story is written there. The difference is that the argument is PyObject and that PyObject is regarded as PyLong and the story goes on.

229〜233: range_length(rangeobject *r)

Returns the member length as Py_ssize_t. The PyLong_AsSsize_t used here seems to return Py_ssize_t from an int type object.

235〜248: compute_item(rangeobject *r, PyObject *i)

Calculate the value from start, i and step, and get the next value. The comment also says return r-> start + (i * r-> step). As it is.

250〜307: compute_range_item(rangeobject *r, PyObject *arg)

Those who call the above compute_item. Before that, i of compute_item (rangeobject * r, PyObject * i) is issued from arg (actually it seems to be PyLong) and the range is checked.

309〜319: range_item(rangeobject *r, Py_ssize_t i)

The side that calls the above compute_range_item. This takes care of function arguments and references.

321〜358: compute_slice(rangeobject *r, PyObject *_slice)

Make a slice of rangeobject. It also handles failures and releases references.

360〜409: range_contains_long(rangeobject *r, PyObject *ob)

In the next range_contains, range_index, range_count

PyLong_CheckExact(ob) || PyBool_Check(ob)

Is executed with the check mark. PyLong_CheckExact and PyBool_Check /boolobject.h#L12) is also a type-checking macro with the same meaning.

As for the contents, is ob simply within the range indicated by the range object (= does it exist)?

411〜419: range_contains(rangeobject *r, PyObject *ob)

(PyLong_CheckExact(ob) || PyBool_Check(ob))Then the above range_contains_Use long, otherwise_PySequence_IterSearchMake a judgment with.

421〜465: range_equals(rangeobject *r0, rangeobject *r1)

A function that compares two rangeobjects. A comment outlines where to look in order to make a decision.

/*
   if r0 is r1:
       return True
   if len(r0) != len(r1):
       return False
   if not len(r0):
       return True
   if r0.start != r1.start:
       return False
   if len(r0) == 1:
       return True
   return r0.step == r1.step
*/

467〜495: range_richcompare(PyObject *self, PyObject *other, int op)

The argument is PyObject, but there is a check to see if other is PyRange_Type. op points to the comparison operator (https://github.com/python/cpython/blob/28b093620c3ae9be02449ea0cecd8b4d649c80d2/Include/object.h#L927), and this function targets Py_NE and Py_EQ only.

Eventually call range_equals.

497〜550: range_hash(rangeobject *r)

Hash function for rangeobject. Return it with the type Py_hash_t. The values used to get the hash are len (), start and step. The processing is changed according to the value of len (r).

/* Hash function for range objects.  Rough C equivalent of

   if not len(r):
       return hash((len(r), None, None))
   if len(r) == 1:
       return hash((len(r), r.start, None))
   return hash((len(r), r.start, r.step))
*/

552〜570: range_count(rangeobject *r, PyObject *ob)

PyLong_CheckExact(ob) || PyBool_Check(ob)At the time of range_contains_long, otherwise_PySequence_Count with IterSearch.

572〜602: range_index(rangeobject *r, PyObject *ob)

PyLong_CheckExact(ob) || PyBool_Check(ob)At the time of range_contains_long, otherwise_PySequence_Search with IterSearch. In case of range_contains_long, after checking that it exists

idx = PyNumber_FloorDivide(tmp, r->step);

The location is calculated and put out.

604〜613: range_as_sequence

Register various functions according to PySequenceMethods. here

PySequenceMethods range_as_sequence
lenfunc sq_length range_length → line 229
ssizeargfunc sq_item range_item → line 309
objobjproc sq_contains range_contains → line 411

Only are registered.

615〜633: range_repr(rangeobject *r)

Output the output string in PyUnicode. When step is 1, step is omitted.

635〜641: range_reduce(rangeobject *r, PyObject *args)

It seems to be an auxiliary function of pickle. Py_BuildValue (\ _Py_BuildValue_SizeT Give the information of struct rangeobject to Python / modsupport.c # L441-L450)).

643〜662: range_subscript(rangeobject* self, PyObject* item)

item is If you go through PyIndex_Check (essentially int), you can use compute_range_item. If you go through PySlice_Check (essentially slice), you can use compute_slice. Returns the element. Otherwise it is an error.

665〜669: range_as_mapping

This time, it is registered according to PyMappingMethods. 2 out of 3 functions are implemented.

PyMappingMethods range_as_mapping
lenfunc mp_length range_length → line 229
binaryfunc mp_subscript range_subscript → line 643

671: range_iter(PyObject *seq)

→ Line 1076

672: range_reverse

→ Line 1134

674〜682: PyDoc

Documents for reverse, count and index.

684〜690: range_methods

PyMethodDef links functions on Python with functions implemented in C, argument information, and documentation. thing.

Here, the following functions are linked.

Built-in Implementation in C
__reversed__ range_reverse →1134
__reduce__ range_reduce →635
count range_count →552
index range_index →572

692〜697: range_members

PyMemberDef represents the name, type and offset of the members of the C structure. Here, it is an array of start, stop, and step information.

699〜738: PyRange_Type

A type definition called PyRange_Type. There are various setting items, but in the case of range, the following information is registered There is.

PyTypeObject PyRange_Type
const char *tp_name "range"
Py_ssize_t tp_basicsize sizeof(rangeobject)
destructor tp_dealloc (destructor)range_dealloc →141
reprfunc tp_repr (reprfunc)range_repr →615
PySequenceMethods *tp_as_sequence &range_as_sequence →604
PyMappingMethods *tp_as_mapping &range_as_mapping →665
hashfunc tp_hash (hashfunc)range_hash →497
getattrofunc tp_getattro PyObject_GenericGetAttr
unsigned long tp_flags Py_TPFLAGS_DEFAULT
const char *tp_doc range_doc
richcmpfunc tp_richcompare range_richcompare →467
getiterfunc tp_iter range_iter →1076
struct PyMethodDef *tp_methods range_methods →684
struct PyMemberDef *tp_members range_members →692
newfunc tp_new range_new →66

By the way, PyVarObject_HEAD_INIT seems to be related to defining the initial segment of PyObject.

rangeiterobject

740〜753: struct rangeiterobject

The definition is similar to rangeobject, but slightly different.

755〜764: rangeiter_next(rangeiterobject *r)

Get the next value by adding index * step to start. It is being cast to unsigned and returned to avoid overflow.

766〜770: rangeiter_len(rangeiterobject *r)

Calculate the length with len --index.

772〜773: PyDoc

length_hint_doc. It seems to be a private method that gives an estimate of the length of the list.

775〜802: rangeiter_reduce(rangeiterobject *r)

Function for pickle. It takes more time than range object because there is conversion to PyLong.

804〜817: rangeiter_setstate(rangeiterobject *r, PyObject *state)

Set the value in index.

819〜820: PyDoc

doc for reduce and setstate. Both seem to be pickle related.

822〜830: rangeiter_methods

Built-in Implementation in C
__length_hint__ rangeiter_len →766
__reduce__ rangeiter_reduce →775
__setstate__ rangeiter_setstate → 804

832〜863: PyRangeIter_Type

PyTypeObject PyRangeIter_Type
const char *tp_name "range_iterator"
Py_ssize_t tp_basicsize sizeof(rangeiterobject)
destructor tp_dealloc (destructor)PyObject_Del
getattrofunc tp_getattro PyObject_GenericGetAttr
unsigned long tp_flags Py_TPFLAGS_DEFAULT
getiterfunc tp_iter PyObject_SelfIter
iternextfunc tp_iternext (iternextfunc)rangeiter_next →755
struct PyMethodDef *tp_methods rangeiter_methods →822

I wonder why there are PyObject_Del and PyObject_DEL ...

865〜891: get_len_of_range(long lo, long hi, long step)

Returns the number of items in range (lo, hi, step).

    /* -------------------------------------------------------------
    If step > 0 and lo >= hi, or step < 0 and lo <= hi, the range is empty.
    Else for step > 0, if n values are in the range, the last one is
    lo + (n-1)*step, which must be <= hi-1.  Rearranging,
    n <= (hi - lo - 1)/step + 1, so taking the floor of the RHS gives
    the proper value.  Since lo < hi in this case, hi-lo-1 >= 0, so
    the RHS is non-negative and so truncation is the same as the
    floor.  Letting M be the largest positive long, the worst case
    for the RHS numerator is hi=M, lo=-M-1, and then
    hi-lo-1 = M-(-M-1)-1 = 2*M.  Therefore unsigned long has enough
    precision to compute the RHS exactly.  The analysis for step < 0
    is similar.
    ---------------------------------------------------------------*/

The comment description is longer than the code. The point is just dividing the section by the number of steps.

893〜915: fast_range_iter(long start, long stop, long step)

Make a rangeiter. If it is longer than C long type, NULL is returned as an error.

longrangeiterobject

917〜923: struct longrangeiterobject

This is PyObject.

925〜929: longrangeiter_len(longrangeiterobject *r, PyObject *no_args)

Use PyNumber_Subtract to subtract the index from len.

931〜958: longrangeiter_reduce(longrangeiterobject *r)

For pickle. Calculate stop and calculate the value by creating a range object with make_range_object.

960〜987: longrangeiter_setstate(longrangeiterobject *r, PyObject *state)

Put state in index.

989〜997: longrangeiter_methods

Built-in Implementation in C
__length_hint__ longrangeiter_len →925
__reduce__ longrangeiter_reduce →931
__setstate__ longrangeiter_setstate → 960

999〜1007: longrangeiter_dealloc(longrangeiterobject *r)

References are deleted in the order of struct member → longrangeiterobject.

1009〜1041: longrangeiter_next(longrangeiterobject *r)

After all, it is calculated by adding step * index to start.

1043〜1074: PyLongRangeIter_Type

PyTypeObject PyRangeIter_Type
const char *tp_name "longrange_iterator"
Py_ssize_t tp_basicsize sizeof(longrangeiterobject)
destructor tp_dealloc (destructor)longrangeiter_dealloc →999
getattrofunc tp_getattro PyObject_GenericGetAttr
unsigned long tp_flags Py_TPFLAGS_DEFAULT
getiterfunc tp_iter PyObject_SelfIter
iternextfunc tp_iternext (iternextfunc)longrangeiter_next →1009
struct PyMethodDef *tp_methods longrangeiter_methods →989

iter generation

1076〜1132: range_iter(PyObject *seq)

There is a pattern to make a rangeiter from rangeobject seq and a pattern to make a longrangeiter. The boundary is whether start, stop, and range can be represented by C long type.

1134〜1246: range_reverse(PyObject *seq)

This is also checked to see if it fits in a long. Since it is reverse, step also needs to check -step. The rest is generated by replacing start and stop well.

Impressions

It was a long time, but I feel like I somehow understood the atmosphere of CPython. I read a decent C language for the first time and learned it. (I just did Shoboi C as a procedural language at university)

So, after all, what is it like Python? The feeling of Python is difficult.

Recommended Posts

I read the implementation of range (Objects / rangeobject.c)
I read the implementation of golang channel
Read the implementation of ARM global timer
I read and implemented the Variants of UKR
I followed the implementation of the du command (first half)
I followed the implementation of the du command (second half)
I read the SHAP paper
I investigated the mechanism of flask-login!
I want to read the html version of "OpenCV-Python Tutorials" OpenCV 3.1 version
[Python] I thoroughly explained the theory and implementation of logistic regression
[Python] I thoroughly explained the theory and implementation of decision trees
I made AI think about the lyrics of Kenshi Yonezu (implementation)
I tried to summarize the frequently used implementation method of pytest-mock
I checked the options of copyMakeBorder of OpenCV
Othello-From the tic-tac-toe of "Implementation Deep Learning" (3)
I summarized the folder structure of Flask
I didn't know the basics of Python
I see, I will read the UNIX process
The Python project template I think of.
Read all the contents of proc / [pid]
Othello-From the tic-tac-toe of "Implementation Deep Learning" (2)
[Python] I thoroughly explained the theory and implementation of support vector machine (SVM)
[Python] Read the source code of Bottle Part 2
I tried cluster analysis of the weather map
Set the range of active strips to the preview range
I solved the deepest problem of Hiroshi Yuki.
Why the Python implementation of ISUCON 5 used Bottle
Organize the meaning of methods, classes and objects
Specifying the range of ruby and python arrays
I checked the list of shortcut keys of Jupyter
I tried to touch the API of ebay
I tried to correct the keystone of the image
Read the output of subprocess.Popen in real time
[Python] Read the source code of Bottle Part 1
Try the free version of Progate [Python I]
I checked the session retention period of django
I checked the processing speed of numpy one-dimensionalization
I touched some of the new features of Python 3.8 ①
I want to customize the appearance of zabbix
I tried using the image filter of OpenCV
I tried to predict the price of ETF
I tried to vectorize the lyrics of Hinatazaka46!
I tried to sort out the objects from the image of the steak set meal-④ Clustering
I took a look at the contents of sklearn (scikit-learn) (1) ~ What about the implementation of CountVectorizer? ~
[Trainer's Recipe] I touched the flame of the Python framework.
Template of python script to read the contents of the file
I want to grep the execution result of strace
Play with the UI implementation of Pythonista3 [Super Super Introduction]
I tried to summarize the basic form of GPLVM
I tried the MNIST tutorial for beginners of tensorflow.
The End of Catastrophic Programming # 04 "Mistaken Exception Catch Range"
I compared the identity of the images by Hu moment
Maybe I overestimated the impact of ShellShock on CGI
Visualize the range of interpolation and extrapolation with python
A python implementation of the Bayesian linear regression class
About testing in the implementation of machine learning models
I checked the output specifications of PyTorch's Bidirectional LSTM
Summarize the knowledge of reading Go's HTTP implementation ~ Slice ~
I checked out the versions of Blender and Python
I want to fully understand the basics of Bokeh
The performance of PHP was better than I expected