I tried to make an original language, "PPAP Script", inspired by PPAP (Pen Pineapple Apple Pen), with Python

Yesterday's article (day 6) was seiketkm's "[I developed a Robophone app that came from the future](http://qiita.com/seiketkm/items/46992f933294a7668dba)". This is the day 7 article of the Tech-Circle Hands on Advent Calendar 2016.


This time, I would like to create an original language using PLY (lex + yacc), which is a Python lexical analysis / parsing library.

Speaking of original languages, TrumpScript, a programming language inspired by Donald Trump, was released a while ago. https://github.com/samshadwell/TrumpScript

TrumpScript has the following features.

And so on. It is a witty language that faithfully reproduces Donald Trump.

Therefore, this time, to rival TrumpScript, I'm going to create "[PPAPScript](https://github.com/sakaro01/PPAPScript.git)".

PPAP Script specification

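The specifications I came up with, each of which is implemented in a later section, are these: the program is started with "PPAP"; variable names may only be combinations of "pen", "pineapple" and "apple" (case is ignored); a variable declaration must be prefixed with "I_have_a" or "I_have_an"; and the output function is "Ah!".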

What is ply

Before implementing PPAPScript, I will explain the ply used this time. ply is a Python library that implements lex and yacc in Python and puts them together as a module.

Installation

ply can be installed with pip. It also supports Python 3.

$ pip install ply 

From here, I will explain the minimum usage in lex.py and yacc.py.

Explanation of lex.py

This is an explanation of lex.py, which is responsible for lexical analysis.

1. Import lex.

import ply.lex as lex 

2. Define the names of the tokens you want to recognize as a tuple in a variable called "tokens".

tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)

3. Define the lexical analysis rules as regular expressions.

There are two ways to define a rule. In both, the variable or function must be named t_ followed by the token name.

Definition of simple lexical analysis rules

t_PLUS   = r'\+'
t_MINUS  = r'-'
t_TIMES  = r'\*'
t_DIVIDE = r'/'
t_LPAREN = r'\('
t_RPAREN = r'\)'

Rules that run code during lexical analysis

Write the regular expression as the docstring, on the first line of the function body. A LexToken object for the matched text is always passed as the argument. In the following example, the token value that matches the regular expression is converted to int.

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t
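
To see what this rule does to a token, the function can be called by hand. The sketch below uses types.SimpleNamespace as a stand-in for PLY's LexToken, an assumption made only so that the snippet runs without building a lexer; in real use, lex creates the token and calls the function for you.

```python
from types import SimpleNamespace


def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)  # convert the matched text to an int
    return t


# Hypothetical stand-in for a LexToken: only the attributes used here.
fake_token = SimpleNamespace(type='NUMBER', value='42')
result = t_NUMBER(fake_token)

print(type(result.value).__name__, result.value)  # int 42
print(t_NUMBER.__doc__)  # the regular expression lives in the docstring
```

The second print shows why the docstring matters: it is where lex reads the pattern from.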

4. Skip unnecessary characters.

A special variable called t_ignore holds the characters that the lexer should skip. Spaces and tabs are skipped in the example below.

t_ignore = ' \t'

5. Define rules for discarding tokens.

A rule whose name starts with t_ignore_ matches text and then discards it. For example, a regular expression rule for comments can be defined in a variable called t_ignore_COMMENT.

t_ignore_COMMENT = r'\#.*'

6. Define error handling.

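The t_error function is called when the input matches none of the lexical rules. In that case t.value holds the rest of the input, so t.value[0] is the offending character.

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    # skip() takes the number of characters to skip, not the token
    t.lexer.skip(1)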

7. Build.

Build the lexer with lex.lex(). This completes the preparation for lexical analysis.

lex.lex()
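
As a rough mental model of what the built lexer does, the stdlib-only sketch below combines the rules above into one regular expression with named groups and scans a string with it. This is an illustration written for this article, not PLY's actual code: the names TOKEN_SPEC, IGNORED_CHARS and tokenize are hypothetical, and error handling is reduced to reporting and skipping one character.

```python
import re

# (token name, regular expression) pairs mirroring the rules above.
TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('PLUS', r'\+'),
    ('MINUS', r'-'),
    ('TIMES', r'\*'),
    ('DIVIDE', r'/'),
    ('LPAREN', r'\('),
    ('RPAREN', r'\)'),
    ('ignore_COMMENT', r'\#.*'),
]
IGNORED_CHARS = ' \t'  # plays the role of t_ignore

MASTER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (type, value) pairs; a hypothetical helper, not a PLY API."""
    pos = 0
    while pos < len(text):
        if text[pos] in IGNORED_CHARS:
            pos += 1
            continue
        match = MASTER.match(text, pos)
        if match is None:
            print("Illegal character '%s'" % text[pos])
            pos += 1  # skip one character and carry on
            continue
        pos = match.end()
        kind = match.lastgroup
        if kind.startswith('ignore_'):
            continue  # discarded, like a t_ignore_ rule
        value = int(match.group()) if kind == 'NUMBER' else match.group()
        yield kind, value


tokens_found = list(tokenize('3 + 4 * (2 - 1)  # comment'))
print(tokens_found)
```

The list that is printed contains nine tokens, from ('NUMBER', 3) to ('RPAREN', ')'); the spaces and the comment do not appear in it.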

Explanation of yacc.py

This is a description of yacc.py, which is responsible for parsing.

1. Import yacc.

import ply.yacc as yacc

2. Write the parsing rule.

The following example defines a syntax rule for addition.

def p_expression_plus(p):
    'expression : expression PLUS term'
    p[0] = p[1] + p[3]

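The rules for writing a definition are as follows.

The function name starts with p_, and the docstring holds the syntax rule. The non-terminal symbol being defined goes to the left of the colon; the terminal and non-terminal symbols it is built from go to the right.

def p_expression_minus(p):
    'expression : expression MINUS term'
    # non-terminal : non-terminal terminal non-terminal

Each symbol in the rule corresponds to an index of the argument p: p[0] is the left-hand side, and p[1] onward are the symbols on the right-hand side, in order.

def p_expression_minus(p):
    'expression : expression MINUS term'
    #  p[0]         p[1]     p[2] p[3]

    p[0] = p[1] - p[3]

The body of the function is the processing to run when the rule matches. For example, an assignment rule can store the value in a dictionary (here called names).

def p_statement_assign(p):
    """statement : NAME EQUALS expression"""
    names[p[1]] = p[3]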
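
The index mapping can be checked by calling a rule function by hand. In the sketch below a plain Python list stands in for the object that yacc passes as p. That is an assumption made purely for illustration; in a real parser, yacc builds p and fills p[1] onward with the values of the matched symbols.

```python
def p_expression_minus(p):
    'expression : expression MINUS term'
    #  p[0]         p[1]     p[2] p[3]
    p[0] = p[1] - p[3]


# Hypothetical stand-in for p: slot 0 receives the result, slots 1..3 hold
# the values of "expression", "MINUS" and "term" for the input "5 - 3".
p = [None, 5, '-', 3]
p_expression_minus(p)
print(p[0])  # 2
```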

3. Combine the syntax rules.

Similar syntax rules can be grouped together, as shown below.

def p_expression_binop(p):
    """expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression"""
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]
    elif p[2] == '*':
        p[0] = p[1] * p[3]
    elif p[2] == '/':
        p[0] = p[1] / p[3]
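
One caveat with a rule written this way is that the grammar is ambiguous: nothing in it says that 1 + 2 * 3 should be grouped as 1 + (2 * 3). PLY resolves this through a module-level variable named precedence, which lists groups of tokens from lowest to highest priority together with their associativity. A minimal table for the four operators above might look like the following; the ordering is the usual arithmetic convention and is not taken from the PPAPScript source.

```python
# Read by yacc.yacc() when it builds the parser.
# Later entries bind more tightly than earlier ones.
precedence = (
    ('left', 'PLUS', 'MINUS'),    # lowest priority, left-associative
    ('left', 'TIMES', 'DIVIDE'),  # higher priority, left-associative
)
```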

4. Define error handling.

As with lex, define an error function. p_error is called when the input does not match any syntax rule.

def p_error(p):
    print("Syntax error in input")

5. Parse.

Create a parser object with yacc.yacc() and parse with parser.parse(), passing the string you want to parse as the argument.

parser = yacc.yacc()
parser.parse(data)

Implement PPAP Script

The implementation is based on the example in the README of the ply repository. https://github.com/dabeaz/ply/blob/master/README.md

Start the program with "PPAP"

A started flag is managed in a wrapper around yacc.parse(): until "PPAP" is entered, input is not passed on to the parser.

# Started flag is true by "PPAP" command
has_started = False

def parse(data, debug=0):
    if data == "PPAP":
        global has_started
        has_started = True
        print("Started PPAPScript!")
        return

    if has_started:
        return yacc.parse(data, debug=debug)
    else:
        print('PPAPScript run by "PPAP" command.')
        return

The lexical rule for variable names (t_NAME) would otherwise catch "PPAP", so add a negative lookahead to its regular expression so that it does not match "PPAP".

def t_NAME(t):
    r"""(?!PPAP)[a-zA-Z_][a-zA-Z0-9_]*"""
    return t
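
The effect of the (?!PPAP) lookahead can be checked with the re module alone. The sketch below only demonstrates the pattern; the sample inputs are made up for illustration.

```python
import re

NAME_PATTERN = re.compile(r'(?!PPAP)[a-zA-Z_][a-zA-Z0-9_]*')

# An ordinary identifier matches in full.
print(NAME_PATTERN.match('pen').group())   # pen

# "PPAP" is rejected at position 0, so the name rule does not match it there.
print(NAME_PATTERN.match('PPAP'))          # None

# The lookahead only inspects the next four characters, so any name that
# merely starts with "PPAP" is rejected at that position as well.
print(NAME_PATTERN.match('PPAPpen'))       # None
```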

Only the combination of "pen", "pineapple" and "apple" can be used (case is ignored)

You could restrict variable names with the lex regular expression alone, but because I want to print a dedicated error message, the rule checks the name with the re module and handles the error itself.

import re

def t_NAME(t):
    r"""(?!PPAP)[a-zA-Z_][a-zA-Z0-9_]*"""
    pattern = re.compile(r'^(apple|pineapple|pen)+', re.IGNORECASE)
    if pattern.match(t.value):
        return t
    else:
        print("This variable name can't be used '%s'.\n "
              "Variable can use 'apple', 'pineapple', 'pen'." % t.value)
        # Returning nothing discards the token. The lexer has already
        # moved past the matched text, so no skip() call is needed.
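
Note that pattern.match only anchors the start of the string, so with the pattern above a name such as penguin also passes, because it begins with pen. If the intent is that the whole name consists of the three words, a stricter check could use fullmatch instead. The names WORDS and is_ppap_name below are hypothetical, and this is a sketch of one option, not the code used in the PPAPScript repository.

```python
import re

WORDS = re.compile(r'(apple|pineapple|pen)+', re.IGNORECASE)


def is_ppap_name(name):
    """Return True only if the entire name is built from the three words."""
    return WORDS.fullmatch(name) is not None


print(is_ppap_name('ApplePen'))       # True
print(is_ppap_name('PenPineapple'))   # True
print(is_ppap_name('penguin'))        # False: "guin" is left over
print(WORDS.match('penguin') is not None)  # True: match() accepts a prefix
```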

Be sure to add "I_have_a" or "I_have_an" to the variable declaration assignment.

Define it as a function (def) so that it takes priority during lexical analysis: lex tries function rules in the order in which they are defined, so this definition must come before t_NAME.

def t_DECLARE(t):
    r"""I_have_(an|a)"""
    return t
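
Why the order matters can be seen with the re module: the text I_have_a is also a valid match for the t_NAME pattern, so whichever rule is tried first wins. The sketch below imitates this "first matching rule wins" behaviour with a plain list; first_match is a hypothetical helper, not part of PLY.

```python
import re

DECLARE = ('DECLARE', re.compile(r'I_have_(an|a)'))
NAME = ('NAME', re.compile(r'(?!PPAP)[a-zA-Z_][a-zA-Z0-9_]*'))


def first_match(rules, text):
    """Return (token type, matched text) for the first rule that matches."""
    for token_type, pattern in rules:
        match = pattern.match(text)
        if match:
            return token_type, match.group()
    return None


source = 'I_have_a pen'
print(first_match([DECLARE, NAME], source))  # ('DECLARE', 'I_have_a')
print(first_match([NAME, DECLARE], source))  # ('NAME', 'I_have_a')

# "an" is listed before "a" in the pattern, so the longer form is matched.
print(first_match([DECLARE, NAME], 'I_have_an apple'))  # ('DECLARE', 'I_have_an')
```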

The output function is "Ah!"

Both the lex rule and the yacc rule are ordinary definitions.

def t_PRINT(t):
    r"""Ah!"""
    return t
def p_statement_print_expr(p):
    """statement : PRINT expression"""
    print(p[2])

Executing PPAP Script

The finished product is published in the following repository, so clone it.

$ git clone https://github.com/sakaro01/PPAPScript.git

Install ply.

$ pip install -r requirements.txt

Execute PPAPScript.

$ python ppapscript.py

Let's play with it interactively. (Currently only interactive mode is supported.)

Summary

Next time

The next article in the Tech-Circle Hands on Advent Calendar 2016 will be written by Koga Yuta, who joined the company at the same time as I did. It will probably be about robots. It might be interesting to apply this article to create an original robot command language.

reference
