I tried adding post-increment to CPython: Implementation

Link list

All three parts plus the extra edition:

Overview and summary of adding post-increment to CPython
Implementation of CPython with post-increment
List of all changes when adding post-increment to CPython
Extra edition of adding post-increment to CPython

Implementation

It is finally time to actually implement it.

Build

Environment

Download Python 3.5.0

Download Python 3.5.0 (the latest version as of October 22, 2015, when this experiment started) from the official website and place it in an appropriate directory.

Build

$ cd Python-3.5.0/
$ CFLAGS="-O0" ./configure --prefix=$HOME/local

$ ./configure      
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
...
creating Modules/Setup.local
creating Makefile

$ make -j 8 && make install
gcc -pthread -c -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes    -Werror=declaration-after-statement   -I. -IInclude -I./Include    -DPy_BUILD_CORE -o Programs/python.o ./Programs/python.c
gcc -pthread -c -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes    -Werror=declaration-after-statement   -I. -IInclude -I./Include    -DPy_BUILD_CORE -o Parser/acceler.o Parser/acceler.c
...

$ ./local/bin/python3
Python 3.5.0 (default, Nov 12 2015, 16:48:32)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

...

It completed without any particular errors. According to Building and Using a Debug Version of Python (Python Extension Patterns 0.1.0 documentation), configuring with options like

$ ../configure CFLAGS='-DPy_DEBUG -DPy_TRACE_REFS' --with-pydebug

is supposed to make the build output extra information that makes debugging easier. I did find that such macros are defined in the source code, but in the end I never figured out how to make use of them.

Tools used for poking around

While writing this, I found a page called Technology for reading source code.

Fumbling!

Cut into tokens

For now, let's follow the flow with gdb to see how Python works. Start gdb with M-x gud-gdb in Emacs and have it run the following Python script to check the behavior.

test.py


a = 0
a += 1
a++

If you run this as-is,

  File "test.py", line 3
    a++
      ^
SyntaxError: invalid syntax

is displayed.

So let's set a breakpoint on the main function and trace the execution from there.

The main function is in Programs/python.c. It begins with a stretch of code manipulating the variable `argv_copy`, but that is just argument handling, so skipping past it you find the following around line 69:

Programs/python.c


    res = Py_Main(argc, argv_copy);
    for (i = 0; i < argc; i++) {
        PyMem_RawFree(argv_copy2[i]);
    }
    PyMem_RawFree(argv_copy);
    PyMem_RawFree(argv_copy2);
    return res;

Py_Main is obviously the suspicious one, so let's look inside it. It lives in Modules/main.c.

Following it, you find `sts = run_file(fp, filename, &cf);` on line 768; stepping in, you then find `run = PyRun_AnyFileExFlags(fp, filename_str, filename != NULL, p_cf);` on line 318.

Following the flow in this way, using the function names as clues:

Python/pythonrun.c, 68: PyRun_AnyFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags)
Python/pythonrun.c, 339: PyRun_SimpleFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags)
    → PyRun_FileExFlags(fp, filename, Py_file_input, d, d, closeit, flags);
Python/pythonrun.c, 396: PyRun_FileExFlags(fp, filename, Py_file_input, d, d, closeit, flags);

Following PyRun_FileExFlags, you find `mod = PyParser_ASTFromStringObject(str, filename, start, flags, arena);` inside it (line 916 of Python/pythonrun.c).

Besides having "Parser" in its name, mod is assigned NULL here and the if statement right after it does jump to exit;. Highly suspicious. Let's look inside.

Following Python/pythonrun.c: PyParser_ASTFromStringObject → Parser/parsetok.c: PyParser_ParseFileObject → Parser/parsetok.c: parsetok, you find the following starting at line 201.

    for (;;) {
        char *a, *b;
        int type;
        size_t len;
        char *str;
        int col_offset;

        type = PyTokenizer_Get(tok, &a, &b);
        if (type == ERRORTOKEN) {
            err_ret->error = tok->done;
            break;
        }

        ...

        len = b - a; /* XXX this may compute NULL - NULL */
        str = (char *) PyObject_MALLOC(len + 1);
        if (str == NULL) {
...

Since it is an infinite loop, my guess was that it processes the input one piece at a time. Running the loop while printing the value of str as a test, str takes, in order, the values "a", "=", "0", "", "a", "+=", "1", "", "a", "+", "+", "", and you can see that on line 263 PyParser_AddToken returns a value indicating an error.

Right after entering PyParser_AddToken I found the line `ilabel = classify(ps, type, str);`, and after that it appears to jump to the error-handling code depending on the value of `ilabel`.

Inside classify


static int
classify(parser_state *ps, int type, const char *str)
{
    grammar *g = ps->p_grammar;
    int n = g->g_ll.ll_nlabels;

    ...

    {
        label *l = g->g_ll.ll_label;
        int i;
        for (i = n; i > 0; i--, l++) {
            if (l->lb_type == type && l->lb_str == NULL) {
                D(printf("It's a token we know\n"));
                return n - i;
            }
        }
    }

    D(printf("Illegal token\n"));
    return -1;
}

At a glance, there seems to be a list of known tokens, and classify searches it for one matching the token it was given.

So the first improvement is to make the tokenizer read the increment "++" as a single token. If I can somehow add "++" to this list, that should clear this hurdle.

Going back, from `type = PyTokenizer_Get(tok, &a, &b);` in the parsetok function we reach Parser/tokenizer.c: `int PyTokenizer_Get(struct tok_state *tok, char **p_start, char **p_end)`, which in turn calls Parser/tokenizer.c: `static int tok_get(struct tok_state *tok, char **p_start, char **p_end)`.

Using the comments as a guide, skip over the parts that appear to handle newlines and numbers, and you can see that functions such as PyToken_TwoChars and PyToken_ThreeChars are called in the block starting with the comment `/* Check for two-character token */`. Looking inside them:

Parser/tokenizer.c


    switch (c1) {
    case '=':
        switch (c2) {
        case '=':               return EQEQUAL;
        }
        break;

The switch statement continues in this style, so adding a case here should make it recognize ++ as a token.
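To see the idea in isolation, here is a toy scanner of my own (not CPython's tokenizer) showing why "a++" currently comes out as a name followed by two separate '+' tokens: the two-character lookup simply has no entry for "++".

/* Toy illustration, not CPython code: a two-character lookup with no
 * entry for "++" splits it into two one-character OP tokens.  The
 * commented-out line is the planned addition. */
#include <stdio.h>
#include <ctype.h>

static const char *two_chars(char c1, char c2)
{
    if (c1 == '=' && c2 == '=') return "EQEQUAL";
    if (c1 == '+' && c2 == '=') return "PLUSEQUAL";
    /* if (c1 == '+' && c2 == '+') return "INCREMENT";   <- planned change */
    return NULL;
}

int main(void)
{
    const char *p = "a++";
    while (*p) {
        const char *tok;
        if (isalpha((unsigned char)*p)) {
            printf("NAME '%c'\n", *p++);            /* identifier */
        } else if ((tok = two_chars(p[0], p[1])) != NULL) {
            printf("%s\n", tok);                    /* two-char operator */
            p += 2;
        } else {
            printf("OP '%c'\n", *p++);              /* one-char operator */
        }
    }
    return 0;
}

As written it prints NAME 'a', OP '+', OP '+'; with the commented line enabled it prints NAME 'a', INCREMENT.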

First, try adding `INCREMENT` to the end of Include/token.h, where the `EQEQUAL` being returned here is defined.

Include/token.h


#define ERRORTOKEN    56
#define N_TOKENS    57

#ifdef DOSS_INCREMENT
#define INCREMENT 58
#endif

From here on, my changes are shown enclosed in #ifdef DOSS_INCREMENT ... #endif.

Furthermore, running `grep "EQEQUAL" -r -I` turns up an array of token names in Parser/tokenizer.c, so modify it as follows.

Parser/tokenizer.c


/* Token names */

const char *_PyParser_TokenNames[] = {
 ...
    "<ERRORTOKEN>",
    "<N_TOKENS>"
    #ifdef DOSS_INCREMENT
    ,"INCREMENT"
    #endif
 };

Parser/tokenizer.c



int
PyToken_TwoChars(int c1, int c2)
{
    switch (c1) {
    ...
    case '+':
        switch (c2) {
        case '=':               return PLUSEQUAL;
#ifdef DOSS_INCREMENT
        case '+':               return INCREMENT;
#endif
        }
        break;
    
(Omitted)

Add the newly defined `INCREMENT` to the switch statement as shown.

After making these changes, rebuilding, and tracing with gdb again, you can see that ++ is now indeed read as a single token. Note that to enable the #ifdef DOSS_INCREMENT ... #endif blocks, -DDOSS_INCREMENT=1 was added to CFLAGS when running make.

Of course, this change alone still causes the PyParser_AddToken function to return an error return value.

Our beloved Grammar file

By the way, googling "python grammar" revealed that a file called Grammar exists. The site is here. It says it contains "the full Python grammar, as it is read by the parser generator and used to parse Python source files". Clearly I have to read it.

Opening it, you find grammar definitions that look a lot like regular expressions; it seems to be an extended BNF notation. I couldn't quite believe that the grammar lived in a file that isn't C code, so I deleted the file and ran make. The build failed, so apparently it really is used. Looking near the power operator **, which resembles increment, I found something close to what I wanted, so I decided to imitate it and make my change there.

Grammar


#######################################
# #ifdef DOSS_INCREMENT
# Tips: We should not use Japanese here!!
#######################################
#factor: ('+'|'-'|'~') factor | power
#power: atom_expr ['**' factor]
# factor: power
factor: ['+'|'-'|'~'] inc ['**' factor]
inc: atom_expr ['++']
#######################################

The first two commented-out lines,

Grammar


factor: ('+'|'-'|'~') factor | power
power: atom_expr ['**' factor]

are the original rules, and the two lines below,

Grammar


factor: ['+'|'-'|'~'] inc ['**' factor]
inc: atom_expr ['++']

are what they became after the change. Note that the comment character in Grammar is #. Since I was only implementing increment, I thought allowing multiple leading signs on top of it would be asking for trouble, so the new rule allows at most one sign. Also, the ++ operator here binds more tightly than **.

This Grammar file turns out to help a lot later: its comments sketch a rough roadmap of what still needs to be done. The first is the following comment near the end of the file.

Grammar


# The reason that keywords are test nodes instead of NAME is that using NAME
# results in an ambiguity. ast.c makes sure it's a NAME.
# "test '=' test" is really "keyword '=' test", but we have no such token.
# These need to be in a single rule to avoid grammar that is ambiguous
# to our LL(1) parser. Even though 'test' includes '*expr' in star_expr,
# we explicitly match '*' here, too, to give it proper precedence.
# Illegal combinations and orderings are blocked in ast.c:
# multiple (test comp_for) arguements are blocked; keyword unpackings
# that precede iterable unpackings are blocked; etc.

It's long, but in summary it seems that ast.c also needs to be touched. There is also this at the beginning of Grammar:

Grammar


# NOTE WELL: You should also follow all the steps listed at
# https://docs.python.org/devguide/grammar.html

It also points to a useful site: this page on the official Python site has instructions for changing the grammar. It is, however, very terse.

Enter the abstract syntax tree

As instructed by Grammar, I decided to rewrite ast.c. It appears to build a tree structure, but since the return value of the function I was looking at is an int, it also seems to just be checking for syntax errors; maybe the tree has already been built by that point. I'm not sure, so I'll imitate the surrounding code and implement something increment-ish by feel. There is a knack to imitating here: since what I changed in Grammar were factor and power, I know I should probably modify or replace those two handlers and add one for inc. Beyond that, I investigated how the other unary operators +, -, ~, which are similar to ++, are handled. See here for the specific changes.
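The actual diff is in the list-of-changes article, but to give a feel for it, here is a minimal hypothetical sketch of what an ast_for_inc handler could look like, modeled on the neighbouring ast_for_factor / ast_for_power code. The UnaryOp(UInc, ...) call matches the line that shows up in the build error below; CHILD, NCH and ast_for_atom_expr are helpers that already exist in Python/ast.c, though their exact use here is my assumption.

/* Hypothetical sketch, not the author's exact change.  Modeled on the
 * existing ast_for_factor/ast_for_power handlers in Python/ast.c. */
static expr_ty
ast_for_inc(struct compiling *c, const node *n)
{
    /* inc: atom_expr ['++']   (the rule added to Grammar above) */
    expr_ty expression = ast_for_atom_expr(c, CHILD(n, 0));
    if (!expression)
        return NULL;

    if (NCH(n) == 1)            /* no '++': behave like a plain atom_expr */
        return expression;

    /* wrap the operand in a unary node tagged with the new UInc operator */
    return UnaryOp(UInc, expression, LINENO(n), n->n_col_offset, c->c_arena);
}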

Let's run make right away.

In file included from Python/ast.c:7:0:
Python/ast.c: In function ‘ast_for_inc’:
Python/ast.c:2492:22: error: ‘UInc’ undeclared (first use in this function)
       return UnaryOp(UInc, expression, LINENO(n), n->n_col_offset, c->c_arena);
                      ^
Include/Python-ast.h:503:49: note: in definition of macro ‘UnaryOp’
 #define UnaryOp(a0, a1, a2, a3, a4) _Py_UnaryOp(a0, a1, a2, a3, a4)
                                                 ^
Python/ast.c:2492:22: note: each undeclared identifier is reported only once for each function it appears in
       return UnaryOp(UInc, expression, LINENO(n), n->n_col_offset, c->c_arena);
                      ^
Include/Python-ast.h:503:49: note: in definition of macro ‘UnaryOp’
 #define UnaryOp(a0, a1, a2, a3, a4) _Py_UnaryOp(a0, a1, a2, a3, a4)
                                                 ^
make: *** [Python/ast.o]Error 1

I'm getting an error in ast.c. Indeed, I never defined UInc anywhere. At this point I went back and skimmed the useful devguide page mentioned earlier. Excerpting part of its checklist:

  • Grammar/Grammar: OK, you’d probably worked this one out :)
  • Parser/Python.asdl may need changes to match the Grammar. Run make to regenerate Include/Python-ast.h and Python/Python-ast.c.
  • Python/ast.c will need changes to create the AST objects involved with the Grammar change.
  • Parser/pgen needs to be rerun to regenerate Include/graminit.h and Python/graminit.c. (make should handle this for you.)
  • Python/symtable.c: This handles the symbol collection pass that happens immediately before the compilation pass.
  • Python/compile.c: You will need to create or modify the compiler_* functions to generate opcodes for your productions.

The first item is already done. The second one I didn't really understand; is it needed? ast.c has already been rewritten. I looked at pgen and symtable.c but couldn't find anything to change, so I skipped them for now. Then there is compile.c. A compiler, in Python, which is supposed to be an interpreted language...? Googling turned up a useful page explaining how the compiler works: Python really does compile, and you can even inspect the bytecode. After some more digging it turns out there is a Python Virtual Machine, and the bytecode is for it. We're getting into lower layers.

Rewriting compile.c

Sure enough, at the top of compile.c it says:

compile.c


/*
 * This file compiles an abstract syntax tree (AST) into Python bytecode.
 *

So I searched for words such as Unary, UAdd, and power, and added the increment handling wherever it seemed to fit naturally.

compile.c


    case UNARY_POSITIVE:
    case UNARY_NEGATIVE:
    case UNARY_NOT:
    case UNARY_INVERT:
#ifdef DOSS_INCREMENT
    case UNARY_INCREMENT:
#endif
        return 0;

compile.c


    case UAdd:
        return UNARY_POSITIVE;
    case USub:
        return UNARY_NEGATIVE;
#ifdef DOSS_INCREMENT
    case UInc:
        return UNARY_INCREMENT;
#endif

I changed it like this. Incidentally, the way compile.c uses macros is quite interesting; if I find the time I will write about it separately. Now, let's run make.

In file included from Python/ast.c:7:0:
Python/ast.c: In function ‘ast_for_inc’:
Python/ast.c:2492:22: error: ‘UInc’ undeclared (first use in this function)
       return UnaryOp(UInc, expression, LINENO(n), n->n_col_offset, c->c_arena);
                      ^
Include/Python-ast.h:503:49: note: in definition of macro ‘UnaryOp’
 #define UnaryOp(a0, a1, a2, a3, a4) _Py_UnaryOp(a0, a1, a2, a3, a4)
                                                 ^
Python/ast.c:2492:22: note: each undeclared identifier is reported only once for each function it appears in
       return UnaryOp(UInc, expression, LINENO(n), n->n_col_offset, c->c_arena);
                      ^
Include/Python-ast.h:503:49: note: in definition of macro ‘UnaryOp’
 #define UnaryOp(a0, a1, a2, a3, a4) _Py_UnaryOp(a0, a1, a2, a3, a4)
                                                 ^
Python/compile.c: In function ‘PyCompile_OpcodeStackEffect’:
Python/compile.c:877:7: error: ‘UNARY_INCREMENT’ undeclared (first use in this function)
  case UNARY_INCREMENT:
       ^
Python/compile.c:877:7: note: each undeclared identifier is reported only once for each function it appears in
Python/compile.c: In function ‘unaryop’:
Python/compile.c:2755:7: error: ‘UInc’ undeclared (first use in this function)
  case UInc:
       ^
Python/compile.c:2756:10: error: ‘UNARY_INCREMENT’ undeclared (first use in this function)
   return UNARY_INCREMENT;
          ^
make: *** [Python/ast.o]Error 1
make: *** Waiting for unfinished jobs....
make: *** [Python/compile.o]Error 1

It complained that UInc and UNARY_INCREMENT are not defined. Indeed, I have no memory of defining them.

Define UNARY_INCREMENT

Let's tackle UNARY_INCREMENT first. To find out where it should be defined, run

$ grep UNARY_INCREMENT . -r -I

which turns up `./Include/opcode.h:#define UNARY_POSITIVE 10`. Let's add the new opcode there, ad hoc.

opcode.h


#define UNARY_POSITIVE           10
#define UNARY_NEGATIVE           11
#define UNARY_NOT                12
#ifdef DOSS_INCREMENT
#define UNARY_INCREMENT          13
#endif
#define UNARY_INVERT             15

Define UInc

Next, to find out which file should define UInc, first check how the unary sign operator + is handled in ast.c.

ast.c


        case PLUS:
            return UnaryOp(UAdd, expression, LINENO(n), n->n_col_offset,
                           c->c_arena);

It looks like this. Jumping to the definition of `UnaryOp()` with a symbol jump, it turned out to be in Python-ast.h. As a rule, whenever you come across a new source file there is an explanatory comment at its beginning or end, so always take a look at it.

Python-ast.h


/* File automatically generated by Parser/asdl_c.py. */

This is written on the first line: the file itself is generated automatically, and asdl_c.py contains the code that does the generating. Let's look at where the generation actually happens; the commands run by make are in the Makefile.

$ grep asdl_c.py Makefile
ASDLGEN_FILES=    $(srcdir)/Parser/asdl.py $(srcdir)/Parser/asdl_c.py
ASDLGEN=    python3 $(srcdir)/Parser/asdl_c.py

It is indeed used. Next,

$ grep ASDLGEN Makefile
ASDLGEN_FILES=    $(srcdir)/Parser/asdl.py $(srcdir)/Parser/asdl_c.py
ASDLGEN=    python3 $(srcdir)/Parser/asdl_c.py
$(AST_H): $(AST_ASDL) $(ASDLGEN_FILES)
    $(ASDLGEN) -h $(AST_H_DIR) $(AST_ASDL)
$(AST_C): $(AST_H) $(AST_ASDL) $(ASDLGEN_FILES)
    $(ASDLGEN) -c $(AST_C_DIR) $(AST_ASDL)

shows where the generation is invoked. What actually feeds the generation seems to be AST_ASDL, so

$ grep AST_ASDL Makefile
AST_ASDL=    $(srcdir)/Parser/Python.asdl
$(AST_H): $(AST_ASDL) $(ASDLGEN_FILES)
    $(ASDLGEN) -h $(AST_H_DIR) $(AST_ASDL)
$(AST_C): $(AST_H) $(AST_ASDL) $(ASDLGEN_FILES)
    $(ASDLGEN) -c $(AST_C_DIR) $(AST_ASDL)

leads to Python.asdl. Let's take a look. Inside, you see words similar to those in Grammar. Searching for terms such as power and UAdd, you find the following line.

Python.asdl


    unaryop = Invert | Not | UAdd | USub

It seems to specify the type of unary operator. So let's add UInc to this.

Python.asdl


    -- #ifdef DOSS_INCREMENT
    -- unaryop = Invert | Not | UAdd | USub
    unaryop = Invert | Not | UAdd | USub | UInc
    -- #endif

In asdl the comment marker is apparently --. Let's run make. It builds! However, I could have sworn an error flashed by, so run it again as make | grep error.

/Modules/parsermodule.c: In function ‘validate_power’:
/Modules/parsermodule.c:2501:37: error: ‘power’ undeclared (first use in this function)
     int res = (validate_ntype(tree, power) && (nch >= 1)
                                     ^
/Modules/parsermodule.c:2501:37: note: each undeclared identifier is reported only once for each function it appears in
/Modules/parsermodule.c: In function ‘validate_node’:
/Modules/parsermodule.c:3390:16: error: ‘power’ undeclared (first use in this function)
           case power:
                ^

It looks like parsermodule.c also needs to be modified.

Changes to parsermodule.c

There are many functions named validate_*, and they seem to check that each node has the number and kinds of children defined by Grammar. Since I created the new inc rule in Grammar and reworked power, the corresponding changes have to be made here as well. There are a lot of them, so please check here; a rough sketch of the idea follows below.
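As an illustration only, a validator for the new inc rule might look something like this, modeled on the existing validate_* helpers. validate_ntype, NCH and CHILD exist in Modules/parsermodule.c; validate_atom_expr and the exact node shape are my assumptions, not the author's actual diff.

/* Hypothetical sketch of a validator for the new rule
 *     inc: atom_expr ['++']
 * modeled on the existing validate_* helpers in Modules/parsermodule.c. */
static int
validate_inc(node *tree)
{
    int nch = NCH(tree);
    int res = (validate_ntype(tree, inc)             /* node type matches    */
               && (nch == 1 || nch == 2)             /* operand, optional ++ */
               && validate_atom_expr(CHILD(tree, 0)));

    if (res && nch == 2)
        res = validate_ntype(CHILD(tree, 1), INCREMENT);
    return res;
}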

Once the changes get this far, both make and make install pass, so let's try running it.

$ ./python3
Python 3.5.0 (default, Nov 16 2015, 17:03:57)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> i=0
>>> i++
XXX lineno: 1, opcode: 101
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: unknown opcode
>>> +++++1
  File "<stdin>", line 1
    +++++1
     ^
SyntaxError: invalid syntax

Look at that: `i++` is now "grammatically" accepted. At execution time, though, it is rejected because no such opcode is registered, and indeed I haven't registered one. On the other hand, `+++++1` has become a SyntaxError. In stock Python it is a valid expression, but my Grammar change forbids it to match the increment implementation. Since it fails as a grammar error, the grammar rewrite is working as intended.

Implementation of "contents"

All that remains is the "contents" of ++. The plan is to find where the "unknown opcode" error is raised and define the increment behavior at that spot, tracing with gdb. gdb was started with run test.py, where test.py is:

i=0
i++

The error message seems to be printed from pythonrun.c:401, PyErr_Print().

pythonrun.c


    } else {
        /* When running from stdin, leave __main__.__loader__ alone */
        if (strcmp(filename, "<stdin>") != 0 &&
            set_main_loader(d, filename, "SourceFileLoader") < 0) {
            fprintf(stderr, "python: failed to set __main__.__loader__\n");
            ret = -1;
            goto done;
        }
        v = PyRun_FileExFlags(fp, filename, Py_file_input, d, d,
                              closeit, flags);
    }
    flush_io();
    if (v == NULL) {
        PyErr_Print();
        goto done;
    }
    Py_DECREF(v);

Since v == NULL the error message gets printed, so step into PyRun_FileExFlags(). Its return value comes from run_mod() at pythonrun.c:961, and following the chain further you eventually arrive at ceval.c, where the message is set on line 3429 as `PyErr_SetString(PyExc_SystemError, "unknown opcode");`. The file comment at the top of ceval.c is extremely brief:

ceval.c


/* Execute compiled code */

That is all it says. This is the heart of the Python Virtual Machine, the ALU of the VM, so it seems increment can be implemented by working in here. Casually skimming it, though, I couldn't find a foothold.

A little backtrack

Still, I did define the opcode in opcode.h; wondering what "unknown" is supposed to mean, I opened opcode.h again. On the first line,

opcode.h


/* Auto-generated by Tools/scripts/generate_opcode_h.py */

is written. So changing it directly was meaningless. Search the Makefile:

$ grep opcode.h Makefile -3
PGENOBJS=    $(POBJS) $(PGOBJS)

##########################################################################
# opcode.h generation
OPCODE_H_DIR=     $(srcdir)/Include
OPCODE_H_SCRIPT= $(srcdir)/Tools/scripts/generate_opcode_h.py
OPCODE_H=    $(OPCODE_H_DIR)/opcode.h
OPCODE_H_GEN=    python3  $(OPCODE_H_SCRIPT) $(srcdir)/Lib/opcode.py $(OPCODE_H)
#
##########################################################################

So opcode.h is generated from Lib/opcode.py. I added the new opcode there like this:

opcode.py


def_op('NOP', 9)
def_op('UNARY_POSITIVE', 10)
def_op('UNARY_NEGATIVE', 11)
def_op('UNARY_NOT', 12)
#ifdef DOSS_INCREMENT
def_op('UNARY_INCREMENT',13)
#endif
def_op('UNARY_INVERT', 15)

Missing function

Running make again, the error message evolved:

In file included from Python/ceval.c:891:0:
Python/ceval.c: In function ‘PyEval_EvalFrameEx’:
Python/opcode_targets.h:15:5: error: label ‘TARGET_UNARY_INCREMENT’ used but not defined
     &&TARGET_UNARY_INCREMENT,
     ^
gcc -pthread -c -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -O0 -g -DDOSS_INCREMENT=1   -Werror=declaration-after-statement   -I. -IInclude -I./Include    -DPy_BUILD_CORE -o Python/future.o Python/future.c
make: *** [Python/ceval.o]Error 1
make: *** Waiting for unfinished jobs....

As the error says, TARGET_UNARY_INCREMENT was automatically generated into opcode_targets.h; the problem is that the corresponding label is not defined in ceval.c. But searching for TARGET_UNARY_POSITIVE gives no hit in ceval.c either:

$ grep TARGET_UNARY_POSITIVE . -r -I
./Python/opcode_targets.h:    &&TARGET_UNARY_POSITIVE,

It is only used in opcode_targets.h...! Strange. Relaxing the search to just `UNARY_POSITIVE` in ceval.c,

ceval.c


            TARGET(UNARY_POSITIVE) {
                PyObject *value = TOP();
                PyObject *res = PyNumber_Positive(value);
                Py_DECREF(value);
                SET_TOP(res);
                if (res == NULL)
                    goto error;
                DISPATCH();
            }

gets a hit. So this is not a function... it's a macro! The definition of this wicked macro is near the top of the file, introduced by the following comment:

ceval.c


/* Computed GOTOs, or
       the-optimization-commonly-but-improperly-known-as-"threaded code"
   using gcc's labels-as-values extension
   (http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html).

As far as I can tell from the explanation, dispatching opcodes by goto through these macros is faster than selecting them with a switch statement. A glimpse of the ingenuity that goes into speeding up an interpreted language, and it significantly reduces the readability of the code.
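To see the trick in isolation, here is a tiny self-contained toy (my own illustration, not CPython code) that dispatches opcodes with GCC's labels-as-values extension, the same mechanism the TARGET()/DISPATCH() macros build on. Compiled with gcc, it prints acc = 4.

/* Minimal computed-goto dispatcher using GCC's labels-as-values extension. */
#include <stdio.h>

enum { OP_INC, OP_DOUBLE, OP_PRINT, OP_HALT };

int main(void)
{
    /* one label address per opcode */
    static void *dispatch[] = { &&op_inc, &&op_double, &&op_print, &&op_halt };
    int program[] = { OP_INC, OP_INC, OP_DOUBLE, OP_PRINT, OP_HALT };
    int pc = 0, acc = 0;

#define DISPATCH() goto *dispatch[program[pc++]]

    DISPATCH();                                   /* start executing */
op_inc:    acc += 1;                  DISPATCH();
op_double: acc *= 2;                  DISPATCH();
op_print:  printf("acc = %d\n", acc); DISPATCH();
op_halt:   return 0;
}

Each handler jumps straight to the next handler instead of returning to a central switch, which is exactly what ceval.c does for every bytecode instruction.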

So, let's create TARGET(UNARY_INCREMENT).

ceval.c


#ifdef DOSS_INCREMENT
            TARGET(UNARY_INCREMENT) {
                PyObject *left = TOP();
                PyObject *inv, *sum;
                // note that -(~x) == x+1 for all x
                inv = PyNumber_Invert(left);
                //Py_DECREF(left);
                if (inv == NULL)
                    goto error;
                sum = PyNumber_Negative(inv);
                Py_DECREF(inv);
                if (sum == NULL)
                    goto error;

                SET_TOP(sum);

                DISPATCH();
            }
#endif

TOP and SET_TOP appear to be accessors for the value stack. The Py_DECREF() macro is also called all over the place; it decrements an object's reference count, which is how CPython's reference-counting memory management releases objects.
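For reference, these are plain macros. Paraphrased from memory (so treat the exact forms as approximate; the real Py_DECREF also carries debug-build bookkeeping), they look roughly like this in Python/ceval.c and Include/object.h:

#define TOP()          (stack_pointer[-1])          /* peek at the value stack */
#define SET_TOP(v)     (stack_pointer[-1] = (v))    /* overwrite the top slot  */

/* simplified: decrement the reference count, free the object at zero */
#define Py_DECREF(op)                                   \
    do {                                                \
        if (--((PyObject *)(op))->ob_refcnt == 0)       \
            _Py_Dealloc((PyObject *)(op));              \
    } while (0)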

A sneaky definition

You may have noticed PyNumber_Invert() and PyNumber_Negative() above. To implement increment properly, it would probably be better to create a function called PyNumber_Increment(), or perhaps to build an object representing 1 and realize it with an addition instruction. I did go digging into where those functions are defined and tried various things, but it was complicated and, with time running out, I confess I settled for making clever use of the existing functions.

-(~x) == x+1 for all x

Using this fact, x + 1 can be produced from x with existing functions only. Rebuild with make and run it:

$ ./python3
Python 3.5.0 (default, Nov 16 2015, 18:50:21)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> i=0
>>> print(i++)
1
>>> i
0

It is recognized and it runs. The value of `i++` is now returned correctly; however, the value of `i` itself was not rewritten. Up to this point I had been defining increment by "naturally" following the existing code, and it seems that somewhere along the way the actual rewriting of the variable fell through the cracks.

eval and exec

I found the following meaningful comment in parsermodule.c:

parsermodule.c


/*  There are two types of intermediate objects we're interested in:
 *  'eval' and 'exec' types.  These constants can be used in the st_type
 *  field of the object type to identify which any given object represents.
 *  These should probably go in an external header to allow other extensions
 *  to use them, but then, we really should be using C++ too.  ;-)
 */

In other words, there are two kinds of intermediate objects: eval types and exec types, typified by `i + 1` and `i += 1` respectively. The problem is that `i++` has both of these characters at once.

Rewriting a variable is what `i += 1` does. Looking at how that is defined in Grammar, it is an augmented assignment (augassign), which lives in a completely different place from the + and - we have been following; already at the grammar stage it sits far away from what we changed. Tracing with the debugger, in the AugAssign case a STORE instruction is executed in ceval.c. Given the time constraints I wanted to avoid redoing everything from the grammar stage, and besides, if I modeled increment on +=, the variable would be rewritten but no value would be returned.

Overview, again

Here, once more, is the overall flow of Python:

  1. Divide the input string into tokens with the tokenizer
  2. Create a concrete syntax tree according to Grammar
  3. Create an abstract syntax tree with ast.c
  4. Convert the abstract syntax tree to bytecode with compile.c
  5. Execute bytecode on the VM with ceval.c

One last push

Because increment straddles eval and exec, it runs against Python's design and cannot be implemented cleanly. Still, if one of the stages above has to be changed, the most orthodox choice seems to be rewriting compile.c. So far, for the increment syntax tree it only emits opcodes that compute and return the value plus one; by additionally emitting a STORE instruction, the desired behavior should be obtainable.

From the notation `i++`, the operand of ++ is i. By copying the STORE emission from the AugAssign handling and adding it after the +1 operation, I was able to make this work.
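As a rough reconstruction of that idea (my own sketch, not the author's actual diff): for a plain name operand the compiler can emit the load, the increment opcode, and then a store, much as compiler_augassign does. VISIT, ADDOP, compiler_nameop and compiler_error are existing helpers in Python/compile.c; the function name and surrounding control flow here are assumptions.

/* Hypothetical sketch, modeled on compiler_augassign in Python/compile.c:
 * emit "load i; UNARY_INCREMENT; store i" for an expression like i++.
 * Only a plain name target is handled; everything else is rejected. */
static int
compiler_increment(struct compiler *c, expr_ty e)
{
    expr_ty operand = e->v.UnaryOp.operand;

    switch (operand->kind) {
    case Name_kind:
        VISIT(c, expr, operand);                       /* push current value */
        ADDOP(c, UNARY_INCREMENT);                     /* value + 1 on stack */
        return compiler_nameop(c, operand->v.Name.id, Store);  /* write back */
    default:
        return compiler_error(c, "illegal expression for increment");
    }
}

Note that a bare store like this leaves nothing on the stack, so if the expression is also supposed to produce a value, a DUP_TOP would be needed before the store.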
