All three times plus extra edition. Overview and summary of adding post-increment to CPython Implementation of CPython with post-increment List of all changes when adding post-increment to CPython Extra edition of adding post-increment to CPython
It is finally the stage to actually implement it.
Download Python 3.5.0 (latest version as of October 22, 2015 at the start of this experiment) from the Official Website and place it in an appropriate directory.
$ cd Python-3.5.0/
$ CFLAGS="-O0" ./configure --prefix=$HOME/local
$ ./configure
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
...
creating Modules/Setup.local
creating Makefile
$ make -j 8 && make install
gcc -pthread -c -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Werror=declaration-after-statement -I. -IInclude -I./Include -DPy_BUILD_CORE -o Programs/python.o ./Programs/python.c
gcc -pthread -c -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Werror=declaration-after-statement -I. -IInclude -I./Include -DPy_BUILD_CORE -o Parser/acceler.o Parser/acceler.c
...
$ ./local/bin/python3
Python 3.5.0 (default, Nov 12 2015, 16:48:32)
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
...
There was no particular error and it didn't stop. In Building and Using a Debug Version of Python-Python Extension Patterns 0.1.0 documentation,
$ ../configure CFLAGS='-DPy_DEBUG -DPy_TRACE_REFS' --with-pydebug
It seems that various information is output so that it is easier to debug if you add options like, and I found that such macros are defined in the source code, but I did not understand how to use it after all.
M-x gud-gdb
find . -name "hogehoge"
grep hogehoge -r . -I
ctags **/*.c **/*.h
While writing this, I found a page called Technology for reading source code.
For now, follow the movement with gdb to see how Python works.
Start gdb as M-x gud-gdb
in Emacs and read the following Python script to check the behavior.
test.py
a = 0
a += 1
a++
If you try to do this obediently
File "test.py", line 3
a++
^
SyntaxError: invalid syntax
Is displayed.
Therefore, set a breakpoint in the main function and follow the operation.
The main function is in Programs / python.c
.
At the beginning, the variable ʻargv_copy` continues to be like this, but since it is only processing the arguments, if you skip it, from the 69th line
Programs/python.c
res = Py_Main(argc, argv_copy);
for (i = 0; i < argc; i++) {
PyMem_RawFree(argv_copy2[i]);
}
PyMem_RawFree(argv_copy);
PyMem_RawFree(argv_copy2);
return res;
Can be found. Obviously suspicious, so let's look inside the Py_Main
function. It is in Modules / main.c
.
If you follow it, you will find sts = run_file (fp, filename, & cf);
on line 768, so when you step in, this time on line 318, run = PyRun_AnyFileExFlags (fp, filename_str, filename! = NULL, p_cf );
Is found.
If you follow the movement using the name as a clue in this way,
Python/pythonrun.c, 68: PyRun_AnyFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags) Python/pythonrun.c, 339: PyRun_SimpleFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags) PyRun_FileExFlags(fp, filename, Py_file_input, d, d, closeit, flags); Python/pythonrun.c, 396: PyRun_FileExFlags(fp, filename, Py_file_input, d, d, closeit, flags);
If you follow PyRun_FileExFlags, you will find mod = PyParser_ASTFromStringObject (str, filename, start, flags, arena);
in it (line 916 of Python / pythonrun.c
).
In addition to having "Parser" in the function name, NULL
is assigned to mod
here, and the if statement immediately after it does go to exit;
. It's ugly. Look inside.
If you follow Python / pythonrun.c: PyParser_ASTFromStringObject
→ Parser / parstok.c: PyParser_ParseFileObject
→ Parser / parstok.c: parsetok
, you will find the following part from the 201st line.
for (;;) {
char *a, *b;
int type;
size_t len;
char *str;
int col_offset;
type = PyTokenizer_Get(tok, &a, &b);
if (type == ERRORTOKEN) {
err_ret->error = tok->done;
break;
}
...
len = b - a; /* XXX this may compute NULL - NULL */
str = (char *) PyObject_MALLOC(len + 1);
if (str == NULL) {
...
Since it is an infinite loop, I guess that it is processing the input from one end, and when I turn the loop while displaying the value of str as a trial, str is "a", "=", "0" in order. , “”, “A”, “+ =”, “1”, “”, “a”, “+”, “+”, “”, and the return value indicating an error on line 263 PyParser_AddToken
You can see that it is returning.
As soon as I entered PyParser_AddToken
, I found the line ʻilabel = classify (ps, type, str);, and after that, it seems that I am jumping to the place to handle errors according to the value of ʻilabel
.
Inside classify
static int
classify(parser_state *ps, int type, const char *str)
{
grammar *g = ps->p_grammar;
int n = g->g_ll.ll_nlabels;
...
{
label *l = g->g_ll.ll_label;
int i;
for (i = n; i > 0; i--, l++) {
if (l->lb_type == type && l->lb_str == NULL) {
D(printf("It's a token we know\n"));
return n - i;
}
}
}
D(printf("Illegal token\n"));
return -1;
}
It is like that.
A quick glance, it seems that there is a list of tokens, from which you are looking for a matching token.
So, first, improve to read the increment "++" as one token. If I can somehow add "++" to this list, I think I can clear it.
Going back, from type = PyTokenizer_Get (tok, & a, & b);
in the parsetok function, Parser / tokenizer.c: int PyTokenizer_Get (struct tok_state * tok, char ** p_start, char ** p_end)
→ It comes with Parser / tokenizer.c: static int tok_get (struct tok_state * tok, char ** p_start, char ** p_end)
.
If you skip the part that seems to separate line breaks and numbers with reference to the comment, functions such as PyToken_TwoChars
and PyToken_ThreeChars
are called in the block starting with the comment / * Check for two-character token * /
You can see that it has been released. Looking inside these guys
Parser/tokenizer.c
switch (c1) {
case '=':
switch (c2) {
case '=': return EQEQUAL;
}
break;
The switch statement continues like this, and if you add it here, it will recognize ++
as a token.
First, try adding ʻINCREMENT to the end of ʻInclude / token.h
where this returning ʻEQEQUAL` is defined.
Include/token.h
#define ERRORTOKEN 56
#define N_TOKENS 57
#ifdef DOSS_INCREMENT
#define INCREMENT 58
#endif
Below, the changes are shown by enclosing them in #ifdef DOSS_INCREMENT
and #endif
.
Furthermore, if you try grep“ EQEQUAL ”-r -I
, there is an array of token names in Parser / tokenizer.c
, so modify it as follows.
Parser/tokenizer.c
/* Token names */
const char *_PyParser_TokenNames[] = {
...
"<ERRORTOKEN>",
"<N_TOKENS>"
#ifdef DOSS_INCREMENT
,"INCREMENT"
#endif
};
Parser/tokenizer.c
int
PyToken_TwoChars(int c1, int c2)
{
switch (c1) {
...
case '+':
switch (c2) {
case '=': return PLUSEQUAL;
#ifdef DOSS_INCREMENT
case '+': return INCREMENT;
#endif
}
break;
(Omitted)
Add the defined ʻINCREMENT` to the location of the switch statement.
If you do this and remake it and chase it with gdb, you can see that ++
is certainly read as one token. In addition, in order to enable #ifdef DOSS_INCREMNET
~ #endif
, -DDOSS_INCREMENT = 1
was added to CFLAGS and make was performed.
Of course, this change alone still causes the PyParser_AddToken
function to return an error return value.
By the way, when I googled with python grammar, it turned out that a file called Grammar existed. The site is here. It says "the full Python grammar, as it is read by the parser generator and used to parse Python source files". I have to read it.
When you open it, you will find a grammar definition that is very similar to a regular expression. It seems to be an extended BNF notation. I couldn't believe that the grammar was written in a file that wasn't in C, so I deleted this file and made it. Then I couldn't build. Apparently it is really used. When I looked near the power **, which is similar to increment, there was something close, so I decided to imitate and change it.
Grammar
#######################################
# #ifdef DOSS_INCREMENT
# Tips: We should not use Japanese here!!
#######################################
#factor: ('+'|'-'|'~') factor | power
#power: atom_expr ['**' factor]
# factor: power
factor: ['+'|'-'|'~'] inc ['**' factor]
inc: atom_expr ['++']
#######################################
The first two lines,
Grammar
factor: ('+'|'-'|'~') factor | power
power: atom_expr ['**' factor]
Originally, the two lines below,
Grammar
factor: ['+'|'-'|'~'] inc ['**' factor]
inc: atom_expr ['++']
Is after the change. Please note that the comment symbol is #. As long as increments are implemented, I thought that it would be bad if multiple codes were added, so I made it possible to add at most one sign. Also, the precedence of ++ operators is higher than **.
This Grammar file will help you a lot later. A rough route for the future was written in the commented out part. The first is the following comment at the end of the sentence.
Grammar
# The reason that keywords are test nodes instead of NAME is that using NAME
# results in an ambiguity. ast.c makes sure it's a NAME.
# "test '=' test" is really "keyword '=' test", but we have no such token.
# These need to be in a single rule to avoid grammar that is ambiguous
# to our LL(1) parser. Even though 'test' includes '*expr' in star_expr,
# we explicitly match '*' here, too, to give it proper precedence.
# Illegal combinations and orderings are blocked in ast.c:
# multiple (test comp_for) arguements are blocked; keyword unpackings
# that precede iterable unpackings are blocked; etc.
It's long, so in summary, it seems that you should also play with ast.c. Also, at the beginning of Grammar, there is something like this.
Grammar
# NOTE WELL: You should also follow all the steps listed at
# https://docs.python.org/devguide/grammar.html
It also introduces a useful site. This site, the official Python page, has instructions for adding new grammar. However, it's very simple.
I decided to rewrite ast.c as I was told in Grammar. It seems that it is making a tree structure, but seeing that the return value is an int type, it seems that it is just checking for syntax errors. The tree structure may have already been made. I'm not sure, so I'll imitate other code and implement something for incrementing in an atmosphere. There is a knack for simply imitating here. I changed it in Grammar because it is factor and power, so I know that I should probably change or delete these two and type inc. After that, we will investigate what kind of behavior is around +,-, ~ of the same unary operator as ++. See here for specific changes.
Let's make
immediately.
In file included from Python/ast.c:7:0:
Python/ast.c: In function ‘ast_for_inc’:
Python/ast.c:2492:22: error: ‘UInc’ undeclared (first use in this function)
return UnaryOp(UInc, expression, LINENO(n), n->n_col_offset, c->c_arena);
^
Include/Python-ast.h:503:49: note: in definition of macro ‘UnaryOp’
#define UnaryOp(a0, a1, a2, a3, a4) _Py_UnaryOp(a0, a1, a2, a3, a4)
^
Python/ast.c:2492:22: note: each undeclared identifier is reported only once for each function it appears in
return UnaryOp(UInc, expression, LINENO(n), n->n_col_offset, c->c_arena);
^
Include/Python-ast.h:503:49: note: in definition of macro ‘UnaryOp’
#define UnaryOp(a0, a1, a2, a3, a4) _Py_UnaryOp(a0, a1, a2, a3, a4)
^
make: *** [Python/ast.o]Error 1
I'm getting an error in ast.c. Certainly, I don't remember defining UInc. After this, I will take a look at the useful site that I saw before in Tekito. If you excerpt a part of the checklist,
- Grammar/Grammar: OK, you’d probably worked this one out :)
- Parser/Python.asdl may need changes to match the Grammar. Run make to regenerate Include/Python-ast.h and Python/Python-ast.c.
- Python/ast.c will need changes to create the AST objects involved with the Grammar change.
- Parser/pgen needs to be rerun to regenerate Include/graminit.h and Python/graminit.c. (make should handle this for you.)
- Python/symtable.c: This handles the symbol collection pass that happens immediately before the compilation pass.
- Python/compile.c: You will need to create or modify the compiler_* functions to generate opcodes for your productions.
I did it at the top. The second one doesn't make much sense. Do you need it? ast.c has been rewritten. I look at pgen and symbol, but I can't find anything to change. I will skip it for the time being. There seems to be something like compile.c. Compile with Python, which should be an interpreted language ...? I found a useful site when I googled it. It seems to be how to make a compiler. It looks like it's really compiling. You can also see bytecode. As a result of various investigations, it seems that there is a Python Virtual Machine, and it seems to be bytecode for it. It's getting lower layer.
Certainly compile.c has
compile.c
/*
* This file compiles an abstract syntax tree (AST) into Python bytecode.
*
What is written. Search for words such as Unary, UAdd, and power, and naturally add increments to fit around.
compile.c
case UNARY_POSITIVE:
case UNARY_NEGATIVE:
case UNARY_NOT:
case UNARY_INVERT:
#ifdef DOSS_INCREMENT
case UNARY_INCREMENT:
#endif
return 0;
compile.c
case UAdd:
return UNARY_POSITIVE;
case USub:
return UNARY_NEGATIVE;
#ifdef DOSS_INCREMENT
case UInc:
return UNARY_INCREMENT;
#endif
I changed it like this. By the way, how to use macros in compile.c is very interesting, so if you can afford it, I will explain it in hoge. Now let's make
.
In file included from Python/ast.c:7:0:
Python/ast.c: In function ‘ast_for_inc’:
Python/ast.c:2492:22: error: ‘UInc’ undeclared (first use in this function)
return UnaryOp(UInc, expression, LINENO(n), n->n_col_offset, c->c_arena);
^
Include/Python-ast.h:503:49: note: in definition of macro ‘UnaryOp’
#define UnaryOp(a0, a1, a2, a3, a4) _Py_UnaryOp(a0, a1, a2, a3, a4)
^
Python/ast.c:2492:22: note: each undeclared identifier is reported only once for each function it appears in
return UnaryOp(UInc, expression, LINENO(n), n->n_col_offset, c->c_arena);
^
Include/Python-ast.h:503:49: note: in definition of macro ‘UnaryOp’
#define UnaryOp(a0, a1, a2, a3, a4) _Py_UnaryOp(a0, a1, a2, a3, a4)
^
Python/compile.c: In function ‘PyCompile_OpcodeStackEffect’:
Python/compile.c:877:7: error: ‘UNARY_INCREMENT’ undeclared (first use in this function)
case UNARY_INCREMENT:
^
Python/compile.c:877:7: note: each undeclared identifier is reported only once for each function it appears in
Python/compile.c: In function ‘unaryop’:
Python/compile.c:2755:7: error: ‘UInc’ undeclared (first use in this function)
case UInc:
^
Python/compile.c:2756:10: error: ‘UNARY_INCREMENT’ undeclared (first use in this function)
return UNARY_INCREMENT;
^
make: *** [Python/ast.o]Error 1
make: ***Waiting for an incomplete job....
make: *** [Python/compile.o]Error 1
I was angry that UInc and UNARY_INCREMENT were not defined. Certainly I don't have a defined memory.
Let's solve it from UNARY_INCREMENT first. To find out where to define
$ grep UNARY_INCREMENT . -r -I
Then, ./Include/opcode.h:#define UNARY_POSITIVE 10
was found. Let's add it to ad hoc.
opcode.h
#define UNARY_POSITIVE 10
#define UNARY_NEGATIVE 11
#define UNARY_NOT 12
#ifdef DOSS_INCREMENT
#define UNARY_INCREMENT 13
#endif
#define UNARY_INVERT 15
Next, in order to find the file that should define UInc, first check how the unary operator (sign) + is called in ast.c.
ast.c
case PLUS:
return UnaryOp(UAdd, expression, LINENO(n), n->n_col_offset,
c->c_arena);
It looks like this. When I jumped to the definition source of ʻUnaryOp ()` with a symbolic link, it was in Python-ast.h. Basically, when you find a new source code, there is an explanation about the source code at the beginning or the end, so look at it without exception.
Python-ast.h
/* File automatically generated by Parser/asdl_c.py. */
This is written on the first line. This file itself seems to be automatically generated. In asdl_c.py, the code that seems to be automatically generated is written. Let's take a look at the actual automatic generation site. The command executed when make
is done is in the Makefile
.
$ grep asdl_c.py Makefile
ASDLGEN_FILES= $(srcdir)/Parser/asdl.py $(srcdir)/Parser/asdl_c.py
ASDLGEN= python3 $(srcdir)/Parser/asdl_c.py
It is certainly used. Next time,
$ grep ASDLGEN Makefile
ASDLGEN_FILES= $(srcdir)/Parser/asdl.py $(srcdir)/Parser/asdl_c.py
ASDLGEN= python3 $(srcdir)/Parser/asdl_c.py
$(AST_H): $(AST_ASDL) $(ASDLGEN_FILES)
$(ASDLGEN) -h $(AST_H_DIR) $(AST_ASDL)
$(AST_C): $(AST_H) $(AST_ASDL) $(ASDLGEN_FILES)
$(ASDLGEN) -c $(AST_C_DIR) $(AST_ASDL)
View the production site as. It seems that it is AST_ASDL that actually drives the generation, so
$ grep AST_ASDL Makefile
AST_ASDL= $(srcdir)/Parser/Python.asdl
$(AST_H): $(AST_ASDL) $(ASDLGEN_FILES)
$(ASDLGEN) -h $(AST_H_DIR) $(AST_ASDL)
$(AST_C): $(AST_H) $(AST_ASDL) $(ASDLGEN_FILES)
$(ASDLGEN) -c $(AST_C_DIR) $(AST_ASDL)
Then you will find Python.asdl. Let's take a look. If you look at it, you will see words like those you saw in Grammar. If you search for words such as power and UAdd, you will find the next line.
Python.asdl
unaryop = Invert | Not | UAdd | USub
It seems to specify the type of unary operator. So let's add UInc to this.
Python.asdl
-- #ifdef DOSS_INCREMENT
-- unaryop = Invert | Not | UAdd | USub
unaryop = Invert | Not | UAdd | USub | UInc
-- #endif
The comment seems to be--. Let's make
. I was able to build it! However, I feel like I glanced at an error ... So run it with make | grep error
.
/Modules/parsermodule.c: In function ‘validate_power’:
/Modules/parsermodule.c:2501:37: error: ‘power’ undeclared (first use in this function)
int res = (validate_ntype(tree, power) && (nch >= 1)
^
/Modules/parsermodule.c:2501:37: note: each undeclared identifier is reported only once for each function it appears in
/Modules/parsermodule.c: In function ‘validate_node’:
/Modules/parsermodule.c:3390:16: error: ‘power’ undeclared (first use in this function)
case power:
^
It looks like parsermodule.c also needs to be modified.
There are a lot of functions named validate, and it seems that they are checking whether the number and types of grammar arguments defined in Grammar are correct. I made a new inc with Grammar and turned off the power, so I have to make the corresponding changes here as well. There are many changes, so please check here.
If you change it so far, both make
and make install
will pass, so try running it for the time being.
$ ./python3
Python 3.5.0 (default, Nov 16 2015, 17:03:57)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> i=0
>>> i++
XXX lineno: 1, opcode: 101
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SystemError: unknown opcode
>>> +++++1
File "<stdin>", line 1
+++++1
^
SyntaxError: invalid syntax
How ʻi ++is now "grammatically" acceptable. However, it was played if such an opcode was not registered at the time of execution. Certainly not registered. On the other hand,
++++ + 1` became a Syntax Error. It's an acceptable expression in Python, but it's forbidden in Grammar to match the increment implementation. Since it is a grammar error, the grammar rewriting is successful.
All that remains is the "contents" of ++. The policy is to find out where the unknown opcode is issued and define the increment behavior in the definition source. Keep track of it using gdb. At this time, it was set as run test.py
, but the contents of test.py are as follows.
i=0
i++
The error message seems to be issued by pythonrun.c: 401, PyErr_Print ().
pythonrun.c
} else {
/* When running from stdin, leave __main__.__loader__ alone */
if (strcmp(filename, "<stdin>") != 0 &&
set_main_loader(d, filename, "SourceFileLoader") < 0) {
fprintf(stderr, "python: failed to set __main__.__loader__\n");
ret = -1;
goto done;
}
v = PyRun_FileExFlags(fp, filename, Py_file_input, d, d,
closeit, flags);
}
flush_io();
if (v == NULL) {
PyErr_Print();
goto done;
}
Py_DECREF(v);
Since v == NULL
, an error message is displayed, so go into PyRun_FileExFlags (). The return value of this function is pythonrun.c: 961, run_mod (), and it goes through ... Then you arrive at ceval.c, and you can see that the error message is set as PyErr_SetString (PyExc_SystemError," unknown opcode ");
on line 3429.
The explanation of ceval.c is very simple,
ceval.c
/* Execute compiled code */
It has become. The heart of the Python Virtual Machine, the ALU of the VM. So it seems that you can implement increment by playing around with it. However, I couldn't seem to get a clue even if I looked at it casually.
Even so, I defined the opcode in opcode.h, but wondering what it means to be unknown, and open opcode.h again. First line
opcode.h
/* Auto-generated by Tools/scripts/generate_opcode_h.py */
What was written. It didn't make sense to change it directly. Search for Makefile.
$ grep opcode.h Makefile -3
PGENOBJS= $(POBJS) $(PGOBJS)
##########################################################################
# opcode.h generation
OPCODE_H_DIR= $(srcdir)/Include
OPCODE_H_SCRIPT= $(srcdir)/Tools/scripts/generate_opcode_h.py
OPCODE_H= $(OPCODE_H_DIR)/opcode.h
OPCODE_H_GEN= python3 $(OPCODE_H_SCRIPT) $(srcdir)/Lib/opcode.py $(OPCODE_H)
#
##########################################################################
It seems that opcode.py is generated. I added it like this.
opcode.py
def_op('NOP', 9)
def_op('UNARY_POSITIVE', 10)
def_op('UNARY_NEGATIVE', 11)
def_op('UNARY_NOT', 12)
#ifdef DOSS_INCREMENT
def_op('UNARY_INCREMENT',13)
#endif
def_op('UNARY_INVERT', 15)
make
further evolved the error message.
In file included from Python/ceval.c:891:0:
Python/ceval.c: In function ‘PyEval_EvalFrameEx’:
Python/opcode_targets.h:15:5: error: label ‘TARGET_UNARY_INCREMENT’ used but not defined
&&TARGET_UNARY_INCREMENT,
^
gcc -pthread -c -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -O0 -g -DDOSS_INCREMENT=1 -Werror=declaration-after-statement -I. -IInclude -I./Include -DPy_BUILD_CORE -o Python/future.o Python/future.c
make: *** [Python/ceval.o]Error 1
make: ***Waiting for an incomplete job....
As per the error, TARGET_UNARY_INCREMENT was automatically generated in opcode_targets.h. The problem seems to be that the definition is not in ceval.c. But when I search for ceval.c with TARGET_UNARY_POSITIVE, there is no match.
$ grep TARGET_UNARY_POSITIVE . -r -I
./Python/opcode_targets.h: &&TARGET_UNARY_POSITIVE,
Only used here ...! This is strange. If you narrow down the conditions and search for ʻUNARY_POSITIVE` in ceval.c
ceval.c
TARGET(UNARY_POSITIVE) {
PyObject *value = TOP();
PyObject *res = PyNumber_Positive(value);
Py_DECREF(value);
SET_TOP(res);
if (res == NULL)
goto error;
DISPATCH();
Such a thing was a hit. This is not a function ... It's a macro! You can find the definition of the evil macro near the beginning. Such a wording.
ceval.c
/* Computed GOTOs, or
the-optimization-commonly-but-improperly-known-as-"threaded code"
using gcc's labels-as-values extension
(http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html).
As far as I read the explanation, it seems that it is faster to make full use of the goto statement using macros than to classify opcodes with switch statements. A glimpse of the ingenuity for speeding up the interpreter language. This significantly reduces the readability of the code.
So create TARGET (UNARY_INCREMENT)
.
ceval.c
#ifdef DOSS_INCREMENT
TARGET(UNARY_INCREMENT) {
PyObject *left = TOP();
PyObject *inv, *sum;
// note that -(~x) == x+1 for all x
inv = PyNumber_Invert(left);
//Py_DECREF(left);
if (inv == NULL)
goto error;
sum = PyNumber_Negative(inv);
Py_DECREF(inv);
if (sum == NULL)
goto error;
SET_TOP(sum);
DISPATCH();
}
#endif
TOP and SET_TOP seem to be access to the stack. Also, it seems that the Py_DECREF () macro is often called, but it is probably garbage collection.
You will see PyNumber_Invert ()
and PyNumber_Negative ()
above. Certainly, in order to implement increment legitimately, it may be better to create a function called PyNumber_Increment ()
. Or maybe I should have created an object that represents 1 and realized it with an addition instruction. However, I went to find out the definition source of the function and tried various things, but it was complicated and unclear, and there was a time constraint, so I confess that I decided to make good use of the existing function.
-(~x) == x+1 for all x
Using the fact, we succeeded in generating x + 1 from x with only existing functions. If you make
and move this,
$ ./python3
Python 3.5.0 (default, Nov 16 2015, 18:50:21)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> i=0
>>> print(i++)
1
>>> i
0
I recognized it and started to move. The value is now returned properly for ʻi ++. However, the value of ʻi
was not rewritten. Up until now, we have defined increments "naturally", but it seems that variable rewriting was leaked somewhere due to its naturalness.
I found the following meaningful comment in parsermodule.c:
parsermodule.c
/* There are two types of intermediate objects we're interested in:
* 'eval' and 'exec' types. These constants can be used in the st_type
* field of the object type to identify which any given object represents.
* These should probably go in an external header to allow other extensions
* to use them, but then, we really should be using C++ too. ;-)
*/
That is, it seems that there are two types, eval and exec. This is represented by ʻi + 1 and ʻi + = 1
. The problem now is that ++ i
has both of these properties.
Variable rewriting is done with ʻi + = 1`. If you look at how this is defined in Grammar, it's an Aug Assign, which is in a completely different location from the + and-that you've seen so far. From the grammar stage, it existed in a place completely different from the variable rewriting part. Looking back with the debugger, the STORE instruction was executed in ceval.c in AugAssign. Due to time constraints, I would like to avoid reviewing from the grammar stage, and above all, if I focus on + =, the variables may be rewritten but no value may be returned.
Here is the flow of Python.
Since eval and exec implement increments that are contrary to the Python design concept, they cannot be implemented neatly. However, if you choose to change one of the above options, the most orthodox thing seems to be rewriting compile.c. Until now, it only output opcodes that explicitly return a value of +1 for the syntax tree for increments. On the other hand, it is considered that the desired operation can be obtained by adding the STORE instruction.
From the notation ʻi ++, the operand of ++ should be i. I was able to execute this by copying the STORE instruction in the location corresponding to AugAssign, performing the
+ 1` operation, and then adding it.
Recommended Posts