DSL development using ANTLR 4.7.1

Preface

A user of Damascus, the Java-based automatic code generation tool I maintain, asked for more than the bundled templates for generation. They wanted to generate a template from existing source code and turn it into a custom template.

Regular expressions alone could not handle this well. While looking for a solution, I ended up solving it by creating a simple DSL with ANTLR, a compiler-compiler.

After some trial and error I got the DSL working, so in this article I will share my development flow. The sample project is my own tool, Damascus.

Understand the basic operation of ANTLR

ANTLR is a Java tool for building your own language processor (for example, a programming language). It automatically generates a lexical analyzer (lexer) and a parser from `*.g4` files written in an extended BNF notation.

I started out not knowing left from right, so I relied on "[The Definitive ANTLR 4 Reference](https://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference)" by Professor Terence Parr, the author of ANTLR (hereafter, the ANTLR book). At about 300 pages it is fairly thick, but it is written in plain English and the text is full of code examples. Sample code is also available on GitHub (remenska/Grammars), so I ran the samples as I read and got through the book in about two weeks.

At first I only skimmed the sample code and played with it, but I never really understood it that way, so I went back and read the ANTLR book from the beginning. In the end, that was the fastest route. There is a lot of information on the website and in ANTLR's GitHub repository, but it is fragmented and does not cover everything, so I strongly recommend reading all chapters of the [ANTLR book](https://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference) first.

Development environment

I usually use IntelliJ, but I found the ANTLR plugin for Eclipse to be the most usable, so I developed with the same project open in both IDEs. As of January 2018 the latest Eclipse release is Oxygen, so here is how to install the ANTLR plugin on it.

  1. Go to Help -> Eclipse Marketplace and search for ANTLR.
  2. Install ANTLR 4 IDE 0.3.6.
  3. Eclipse restarts after installation.
  4. From the top menu, choose Window -> Show View -> Other -> ANTLR 4.
  5. Select both Parse Tree and Syntax Diagram.
  6. Open the Parse Tree view in Eclipse (a pane should appear).
  7. Import Damascus into Eclipse as a project and open `DmscSrcParser.g4`.
  8. Double-click the `file` rule at the top of `DmscSrcParser.g4`. `DmscSrcParser::file` is shown in the Parse Tree pane and the diagram appears.

Development flow

Lexer / Parser design

Damascus has two grammar files, `DmscSrcParser.g4` and `DmscSrcLexer.g4`. I developed them in the following cycle.

Lexer design

Using the extended BNF notation, I wrote the token definitions in `DmscSrcLexer.g4` while checking the Syntax Diagram view of the ANTLR plugin installed above. If the syntax is incorrect, an error message is displayed. Writing token definitions from scratch is hard, so I started from the samples in the ANTLR book. Because this DSL was going to be what the ANTLR book calls an island language, built on tags, I used ModeTagsLexer.g4 and similar samples as the base.
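
To make the island-language idea more concrete, here is a minimal hand-written sketch in plain Java of what lexical modes do: text outside a tag is passed through as a single TEXT token, and a separate set of rules applies only between the tag delimiters. This is not Damascus's lexer and it does not use ANTLR. The `<dmsc ...>` tag shape, the token names, and the `IslandLexerSketch` class are all assumptions made for illustration. In a real grammar the same switch would be written in the `.g4` file with ANTLR's `mode`, `pushMode`, and `popMode`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written illustration of lexical modes; NOT generated by ANTLR. */
final class IslandLexerSketch {

    /** Token kinds (hypothetical names). */
    enum Kind { TEXT, TAG_OPEN, NAME, EQUALS, STRING, TAG_CLOSE }

    /** A token is a kind plus the matched text. */
    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    // Hypothetical tag prefix that starts an "island" inside ordinary source text.
    private static final String OPEN = "<dmsc";

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        boolean insideTag = false; // the current lexer mode
        int i = 0;
        while (i < input.length()) {
            if (!insideTag) {
                // Default mode: everything up to the next tag is plain TEXT.
                int next = input.indexOf(OPEN, i);
                int end = (next < 0) ? input.length() : next;
                if (end > i) {
                    tokens.add(new Token(Kind.TEXT, input.substring(i, end)));
                }
                if (next < 0) {
                    i = end;
                } else {
                    tokens.add(new Token(Kind.TAG_OPEN, OPEN));
                    insideTag = true; // comparable to pushMode in a grammar
                    i = next + OPEN.length();
                }
                continue;
            }
            // Tag mode: these rules exist only between "<dmsc" and ">".
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace is skipped only inside a tag
            } else if (c == '>') {
                tokens.add(new Token(Kind.TAG_CLOSE, ">"));
                insideTag = false; // comparable to popMode
                i++;
            } else if (c == '=') {
                tokens.add(new Token(Kind.EQUALS, "="));
                i++;
            } else if (c == '"') {
                int close = input.indexOf('"', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated string at " + i);
                }
                tokens.add(new Token(Kind.STRING, input.substring(i, close + 1)));
                i = close + 1;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.NAME, input.substring(start, i)));
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return tokens;
    }
}
```

Calling `IslandLexerSketch.tokenize("int a; <dmsc id=\"x\"> b > c")` yields seven tokens. The `>` in the trailing ` b > c` stays inside a TEXT token because the tag rules are not active in the default mode, which is the behavior that is awkward to get from regular expressions alone.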

Parser design

Using the token definitions designed above, I defined the parsing rules in `DmscSrcParser.g4`. It is hard to get the rules right from the start. If you find yourself writing very complicated parser rules, or you get stuck because a rule is not being applied as expected, go back and review the lexer's token definitions. If the lexer is well designed, the parser should be simple.

This stage takes repeated trial and error and fine adjustments, with the ANTLR book's samples as a reference. It is hard to write a large rule in one go. For this island language, I first wrote a small rule that only decides whether something is a DSL start tag. Once that worked, I tested the rules for the attributes inside the tag. The key, I think, is to test small rules locally and then combine them into larger ones.

Click the top-level rule in `DmscSrcParser.g4` (`file` in Damascus), paste the text you want to parse, and visually check in the Parse Tree pane that it is decomposed correctly in the diagram. Then adjust the parser and lexer while watching the syntax errors.



### Listener development
Now that the lexer and parser handle the syntax almost correctly, it is time to implement the listener. ANTLR can generate interfaces and base implementations for both the listener approach and the visitor approach. The visitor approach suits sequential processing such as an interpreter, but here everything is processed in one pass, so I developed it with a listener.
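
As a rough illustration of the listener approach, the sketch below shows the core idea in plain Java: a walker drives the traversal and fires enter and exit callbacks, and the listener only reacts to them. Everything here (`ListenerSketch`, `Node`, `TraceListener`, and the rule names) is hypothetical and hand-written. It is not the code ANTLR generates. In ANTLR itself the walk is performed by `ParseTreeWalker`, and the callbacks are `enterXxx`/`exitXxx` methods on the generated base listener.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written illustration of the listener pattern; NOT ANTLR's generated API. */
final class ListenerSketch {

    /** A minimal tree node standing in for a parse tree node (hypothetical). */
    static final class Node {
        final String rule;
        final List<Node> children = new ArrayList<>();

        Node(String rule, Node... kids) {
            this.rule = rule;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    /** Callbacks fired by the walker; the listener never drives the traversal. */
    interface Listener {
        void enter(Node node);
        void exit(Node node);
    }

    /** Depth-first walk: enter is called before the children, exit after them. */
    static void walk(Listener listener, Node node) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }

    /** Example listener that records the order of events in one pass. */
    static final class TraceListener implements Listener {
        final List<String> events = new ArrayList<>();

        @Override public void enter(Node node) { events.add("enter " + node.rule); }
        @Override public void exit(Node node) { events.add("exit " + node.rule); }
    }

    /** Walks the tree once and returns the recorded events. */
    static List<String> trace(Node root) {
        TraceListener listener = new TraceListener();
        walk(listener, root);
        return listener.events;
    }
}
```

A visitor would instead decide for itself which children to visit and could return a value from each node, which is why it fits interpreter-style evaluation better than a single pass over the whole tree.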

The `build.gradle` of [Damascus](https://github.com/yasuflatland-lf/damascus) defines a `generateGrammarSource` task. Running `gradle generateGrammarSource` at the project root generates the lexer and parser from the `*.g4` files. The implementation is then built by extending the generated `DmscSrcParserBaseListener`.

This is also the stage at which I write unit tests. JUnit would be fine, but the Spock test framework lets you build tests flexibly, so [Damascus](https://github.com/yasuflatland-lf/damascus) uses Spock.

## Summary
The biggest lesson was that designing the lexer and parser is the hardest part. In particular, the problems I hit while designing the parser usually came from a poorly designed lexer, and rewriting the lexer several times is what finally made the parser simple. It was a pleasant surprise that the parsing engine generated by ANTLR made it easy to handle complex syntax that regular expressions cannot.

This is more of a personal essay than a tutorial, but I hope it helps someone!

## Tips
### If the Lexer doesn't reload properly when you change it
The ANTLR plugin for Eclipse works well, but when I change the lexer frequently it sometimes fails to pick up the changed syntax. The following steps fixed it for me, so I am noting them for reference.

1. Run `gradle generateGrammarSource`, then run `gradle eclipse`.
2. If it still doesn't work, go to the directory containing the `*.g4` files (`/src/main/antlr`) and run `antlr4 DmscSrcLexer.g4; antlr4 DmscSrcParser.g4; javac Dms*.java`. Copy the generated `*.tokens` and `*.interp` files to `src/main/java/com/liferay/damascus/antlr/template`, run `gradle eclipse`, double-click `file` in `DmscSrcParser.g4`, and check the Parse Tree pane again.
3. If that doesn't work, try restarting Eclipse.

