An introduction to Python's AST (Abstract Syntax Tree) starting from just one line

Introduction

This is the day 20 article of the 2020 RevComm Advent Calendar. Day 19 was @metal-president's "For mobile team growth and KMM introduction".

My name is @rhoboro and I joined RevComm Inc. in November. In my previous job I mainly worked with GCP x Python, and in my current job I mainly work with AWS x Python. At RevComm, I work fully remotely from Onomichi in Hiroshima prefecture, so if you are interested in that way of working, please see this article.

Now let's get down to the main topic.

What is Python's AST (Abstract Syntax Tree)?

This is an introductory article for those who have never touched Python's AST (Abstract Syntax Tree). I will mainly cover what an AST is and what you can do once you understand it.

As the title says, let's start by typing a one-line command. Prepare the following module, schema.py, and then run the python3 command shown below it.

schema.py


# This class is taken from the tutorial below
# https://docs.python.org/ja/3/tutorial/classes.html
class MyClass:
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'

$ python3 -c 'import ast; print(ast.dump(ast.parse(open("schema.py").read()), indent=4))'

Running the command gives the following output (this result is from Python 3.9). If you compare it with schema.py above, you can see that it expresses the same thing as the source code, although it looks different. And if you think of Module, ClassDef, Expr, and so on as Python class names, the output also looks like a Python object being constructed.

Module(
    body=[
        ClassDef(
            name='MyClass',
            bases=[],
            keywords=[],
            body=[
                Expr(
                    value=Constant(value='A simple example class')),
                Assign(
                    targets=[
                        Name(id='i', ctx=Store())],
                    value=Constant(value=12345)),
                FunctionDef(
                    name='f',
                    args=arguments(
                        posonlyargs=[],
                        args=[
                            arg(arg='self')],
                        kwonlyargs=[],
                        kw_defaults=[],
                        defaults=[]),
                    body=[
                        Return(
                            value=Constant(value='hello world'))],
                    decorator_list=[])],
            decorator_list=[])],
    type_ignores=[])

As you may have noticed, this is a Python AST object. An AST (Abstract Syntax Tree) is the tree structure obtained by parsing source code, which is otherwise just a string. When Python runs a program, it goes through the following steps.

  1. The source code is parsed to generate an AST object
  2. The AST object is compiled into a code object
  3. The bytecode contained in the code object is executed
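The steps above can be reproduced by hand with the built-in compile() and exec(). This is only a minimal sketch; the source string and the "<example>" filename are made up for illustration.

```python
import ast

source = "x = 1 + 2"

# 1. Parse the source string into an AST object.
tree = ast.parse(source)

# 2. Compile the AST object into a code object (it holds the bytecode).
code = compile(tree, "<example>", "exec")

# 3. Execute the code object in a fresh namespace.
namespace = {}
exec(code, namespace)

print(type(tree).__name__)  # Module
print(type(code).__name__)  # code
print(namespace["x"])       # 3
```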

How to read AST in Python

By now, you should have a rough idea that **AST is an intermediate representation between source code and executable bytecode**. Let's take a closer look inside the AST object. First, I would like to introduce the tools needed for that.

Standard library ast module

The Python standard library has a convenient ast module for working with AST objects. The command I ran earlier used the following two helper functions. Each is described here in a single sentence, so please see the official documentation for details.

$ python3 -c 'import ast; print(ast.dump(ast.parse(open("schema.py").read()), indent=4))'

- ast.parse(): Parses the source passed to it and returns an AST object.
- ast.dump(): Dumps the tree structure of the AST object passed to it in an easy-to-read form.
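Written as a short script instead of a one-liner, the two calls look roughly like this. The source is inlined here as a string so the sketch does not depend on schema.py being present; note that the indent argument of ast.dump() is available from Python 3.9.

```python
import ast

# A shortened stand-in for the contents of schema.py.
source = '''
class MyClass:
    """A simple example class"""
    i = 12345
'''

tree = ast.parse(source)            # source string -> AST object
dumped = ast.dump(tree, indent=4)   # AST object -> readable string
print(dumped)
```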

Also, names such as ClassDef, Expr, and Assign in the command output above are all subclasses of the ast.AST class. A list of the defined subclasses can be found in the Abstract Grammar section of the official documentation (https://docs.python.org/ja/3/library/ast.html#abstract-grammar). There is a class for each symbol on the left side of the abstract grammar, and each constructor on the right side is a subclass of the symbol on its left.
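This relationship can be checked directly with issubclass(). The sketch below uses the grammar rules for stmt and expr as examples; the lineage() helper is a hypothetical name introduced only for this illustration.

```python
import ast

def lineage(cls):
    """Return the class names from cls up to (and including) ast.AST."""
    return [c.__name__ for c in cls.__mro__ if issubclass(c, ast.AST)]

# "stmt" is a left-hand symbol; ClassDef and Assign are its constructors.
print(issubclass(ast.ClassDef, ast.stmt))  # True
print(issubclass(ast.Assign, ast.stmt))    # True

# "expr" is a left-hand symbol; Name and Constant are its constructors.
print(issubclass(ast.Name, ast.expr))      # True
print(issubclass(ast.Constant, ast.expr))  # True

# Everything ultimately derives from ast.AST.
print(lineage(ast.Assign))
```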

Decipher the AST object

Now that you have everything you need, let's take a look at the AST. However, the output result above is too large, so here we will look at the AST object of the very simple Python source code, x = 1.

$ python3 -c 'import ast; print(ast.dump(ast.parse("x=1"), indent=4))'
Module(
    body=[
        Assign(
            targets=[
                Name(id='x', ctx=Store())],
            value=Constant(value=1))],
    type_ignores=[])

Module already appeared in the earlier output, so setting it aside, the node that represents x = 1 is Assign.

Assign(
    targets=[
        Name(id="x", ctx=Store())
    ],
    value=Constant(value=1)
)

Assign is, as the name implies, a node that represents an assignment. The left-hand side of the assignment is stored in targets, and the right-hand side is stored in value. [^1]
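To get a feel for why targets is a list, it helps to compare a chained assignment with tuple unpacking. The following is a small sketch; the variable names are chosen only for illustration.

```python
import ast

chained = ast.parse("a = b = 1").body[0]
unpacking = ast.parse("a, b = (1, 2)").body[0]

# A chained assignment has one target per "=".
chained_targets = [type(t).__name__ for t in chained.targets]
# Unpacking has a single target, which is a Tuple node holding the names.
unpacking_targets = [type(t).__name__ for t in unpacking.targets]

print(chained_targets)    # ['Name', 'Name']
print(unpacking_targets)  # ['Tuple']
```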

So the node representing x, the left-hand side of the assignment, is Name(id="x", ctx=Store()). The ctx argument of Name indicates whether the variable is being stored, loaded, or deleted, which corresponds to Store(), Load(), and Del() respectively. Since the 1 on the right-hand side is a constant, it is simply Constant(value=1).
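The same fields can be read as ordinary attributes, and the three ctx values can be observed by parsing small snippets. This is a minimal sketch; first_name_ctx() is a hypothetical helper written just for this example.

```python
import ast

def first_name_ctx(source):
    """Return the ctx class name of the first Name node found in source."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            return type(node.ctx).__name__
    raise ValueError("no Name node found")

assign = ast.parse("x = 1").body[0]
print(assign.targets[0].id)  # x
print(assign.value.value)    # 1

print(first_name_ctx("x = 1"))  # Store
print(first_name_ctx("x + 1"))  # Load
print(first_name_ctx("del x"))  # Del
```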

Now you can see that this AST object represents the statement x = 1. The AST object produced by the first command in this article can be read in the same way, by looking up each node in the ast module documentation one at a time.

Utilization of AST objects

As mentioned earlier, the AST object is an intermediate representation between the source code and the executable bytecode. At the same time, it is also a Python object, so it is easier to work with from a Python program than source code or code objects. That is why AST objects are used in many ways.

For example, ASTs are used in static analysis tools such as mypy and flake8, and pytest makes the assert statement more convenient by [rewriting the AST object of the assert statement](https://github.com/pytest-dev/pytest/blob/bd894e3065ba6fa13327ad5dfc94f2b6208cf0ff/src/_pytest/assertion/rewrite.py#L823). You can also build an AST object from a file that is not regular Python source code and run it as a Python object. [^2]
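As a rough idea of what static analysis over an AST looks like, here is a toy check that reports functions without a docstring. It is only a sketch and not how mypy or flake8 are actually implemented; the class name and sample source are made up for illustration.

```python
import ast

SOURCE = '''
def documented():
    """Has a docstring."""
    return 1

def undocumented():
    return 2
'''

class MissingDocstringFinder(ast.NodeVisitor):
    """Collect (name, line number) of functions that have no docstring."""

    def __init__(self):
        self.missing = []

    def visit_FunctionDef(self, node):
        if ast.get_docstring(node) is None:
            self.missing.append((node.name, node.lineno))
        # Keep walking so nested functions are checked as well.
        self.generic_visit(node)

finder = MissingDocstringFinder()
finder.visit(ast.parse(SOURCE))
print(finder.missing)
```

The source is never executed here. The check works purely on the tree, which is what makes this kind of analysis "static".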

You can also use ast.unparse(), added in Python 3.9, to generate source code from an AST object. Using this, I wrote a library called pydantic-generator, which builds an AST object from a JSON file and generates pydantic model classes. Feel free to try it out.
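The general idea of generating source code from an AST can be sketched as follows. This is not the implementation of pydantic-generator; the fields dictionary and the Model class name are hypothetical stand-ins for data that might come from a JSON file.

```python
import ast

# Hypothetical field spec, standing in for data read from a JSON file.
fields = {"name": "str", "age": "int"}

# Start from a parsed template so the ClassDef node is fully populated.
module = ast.parse("class Model:\n    pass")
class_def = module.body[0]

# Replace the placeholder body with one annotated attribute per field.
class_def.body = [
    ast.AnnAssign(
        target=ast.Name(id=field, ctx=ast.Store()),
        annotation=ast.Name(id=type_name, ctx=ast.Load()),
        value=None,
        simple=1,
    )
    for field, type_name in fields.items()
]
# Fill in line/column information for the hand-built nodes.
ast.fix_missing_locations(module)

generated = ast.unparse(module)  # requires Python 3.9 or later
print(generated)
```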

Note at the end

**Modifying AST objects can confuse users and other developers and cause unexpected behavior. Do not do it unless there is no other way.** [^3]
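To illustrate why this warning matters, here is a deliberately silly sketch that uses ast.NodeTransformer to turn every addition into a subtraction. The class name is made up for this example. Anyone reading the original source would expect 8, but the rewritten program produces 2.

```python
import ast

class SwapAddToSub(ast.NodeTransformer):
    """Rewrite every `a + b` into `a - b` (for demonstration only)."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

source = "result = 5 + 3"
tree = SwapAddToSub().visit(ast.parse(source))
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, "<rewritten>", "exec"), namespace)
print(namespace["result"])  # 2
```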

In conclusion

Like me, many people probably have the impression that AST is difficult. However, once you actually look into it, the documentation is only a single page, and the AST is quite simple because it corresponds one-to-one with the source code. Try looking at the AST objects of various modules with this handy one-liner.

$ python3 -c 'import ast; print(ast.dump(ast.parse(open("YourFile.py").read()), indent=4))'

Tomorrow's article is by @k_ishi from the research team. The 2020 RevComm Advent Calendar continues every day without a break, so stay tuned for tomorrow!

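[^1]: targets is a list because of chained assignments such as a = b = 1, where each target gets its own entry. Unpacking such as a, b = (1, 2) is represented by a single Tuple node inside targets.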
