The story of writing Java in Emacs

Hello, I'm Peter Russo. I will do my best in the governor election!! This is the day 21 article of Emacs Advent Calendar 2016. Recently I've become able to write Java in Emacs, and this time I'd like to talk about that.

Emacs and Java development environment

When it comes to Java development environments, using an IDE is the norm. These days that probably means Eclipse or IntelliJ.

An IDE is certainly useful, but it still has a heavyweight image. If Emacs is your main editor, that impression of heaviness is hard to shake off, and I suspect many people strongly wish they could develop Java in Emacs as well.

What are some ways to write Java in Emacs?

To name a few, you would use the following packages:

JDEE
Malabar-mode
Eclim(Emacs-eclim)
ENSIME

Long ago, just installing any of them was a struggle, but now all of them can be installed via MELPA and are very easy to try.

Now, let's look at what each package is like.

JDEE

Emacs has long had a huge CEDET-based package called JDEE. It has quite a lot of features. Now that it is registered on MELPA, the barrier to installation is considerably lower than before.

The repository was recently moved to GitHub and there are attempts to resume development, but much of the syntax introduced in Java 5 and later, such as generics and the enhanced for statement, is still unsupported. Also, since it is based on CEDET, maintaining it further will be difficult. The only build tool it originally supports is Ant, and the features it fundamentally supports are old as well. (Volunteers may have added various functions, but not all of them have been merged upstream.)

JDEE's features are old but numerous, so it seems some users still use it with their own customizations.

Malabar-mode

Malabar-mode supports Maven 3 and Java 6. Maven support was a big deal, and a decade ago this was the package that worked best.

Development stalled for a while here too, and resumed after the maintainer changed. As a result, the 1.x series was registered on MELPA, and the barrier to installing 1.x became very low.

The next version, 2.0, was also under development, but Malabar-mode itself relies on CEDET, and 2.0 requires the latest version of CEDET, so installation was quite difficult. Because of that, it often does not work stably, and development has now almost stopped.

CEDET-based parsers often fail to work in some situations. This is because CEDET is designed to parse during idle time and cache the results. As a result, completion may not be displayed correctly right after opening a file, or it may take a long time. (If there is no analysis result, it starts analyzing on the spot, but that is extremely slow.)

Eclim

Eclim was originally a way of writing Java in Vim, and Emacs-eclim is its port to Emacs. It is also registered on MELPA, so the barrier to installation has dropped considerably.

Many things that were missing compared with the original Eclim have since been implemented. In addition to completion, you can seamlessly use Eclipse-like features such as import optimization, refactoring, and running Ant and Maven tasks. Since it uses Eclipse's functionality, it supports generics and even Java 8. However, because Eclim exposes Eclipse's own functionality inside Emacs, some parts may be hard to understand unless you already know Eclipse.

As an aside, Visual Studio Code's Java integration is also based on Eclipse (JDT).

ENSIME (ENJIME)

This is also one of the current favorites. ENSIME has a strong image as a Scala development environment, but it already supports Java, and Emacs users can benefit from it. Another strength is that it supports many build tools. However, the initial setup takes time. I don't mean time and effort on your part. I mean that after running the command, you can go out for lunch at that distant restaurant you only feel like visiting once in a while.

The Java support is called ENJIME, and Java files work just by enabling ensime-mode. Completion works even with lambdas and the like, and it feels lightweight. However, although the basics are the same as ENSIME, there is currently no documentation, so if you use it only for Java, there may be parts you do not understand.

By the way, an ENSIME committer has asked me to cooperate.

Realistic choice

At present, the realistic choice seems to come down to two options: Eclim or ENSIME. However, Eclim requires Eclipse and is cumbersome to set up, and it essentially amounts to running Eclipse in the background. So why not just use Eclipse? That is how I feel.

As for ENJIME, the initial settings for each build tool are a bit of a hassle there too. The documentation is so sparse that it is unclear whether it can be customized, which also leaves a negative impression. The tremendous amount of time the setup takes may be a downside as well.

Development of new development environment

So, in the end, if no environment suits me (Representative Russo), I have no choice but to develop one. I had never developed anything in this area from scratch, so I thought it might be good to try, and I finally got off my backside and started development. By the way, this is my first time writing any serious elisp.

About design policy

First, let's sort out what I (Representative Russo) want.

Integration with the build tool (import settings such as source directories and the classpath)
Integration with the build tool without using a plugin (I want to reduce the amount of configuration)
Completion of classes, methods, etc.
Completion support for generics
Arguments and return values can be checked while completing
Seamless completion invocation (completion candidates are displayed automatically without pressing a key, as in IntelliJ)
Can compile
Notification when there is a problem with the compilation result
Can run unit tests

With the features listed above, it should be practical. So what kind of design is needed to realize them?

Client-server model method

The first major design policy is the client-server model, which reduces the processing on the client (Emacs, in this case). This approach is the result of learning from the mistakes of our predecessors, and ENJIME and others take it as well. The policy is to limit elisp to the display part as much as possible and not write a huge amount of elisp.

The reasons for writing as little elisp as possible are as follows.

elisp's limited expressiveness
elisp's processing speed
Parallel processing

Some madmen try to implement everything in elisp alone, which is a mistake. Looked at in isolation, elisp's processing speed is not that bad. However, that holds only for simple processing. The language specification and standard functions of elisp can hardly be called rich, so to express the processing you want you have to write a fair amount of code, and all of that code has to be executed. Using packages as libraries reduces the amount of code you write, but the processing itself does not go away. With only moderate execution speed, it will still feel slow. Also, elisp cannot do parallel processing properly. Now that multi-core is mainstream, this hurts. Therefore, I write little logic in elisp and implement the main functionality on the server side.

Of course, this approach also has disadvantages. The state of the buffer currently being edited exists only in Emacs's memory and is not visible to the server. Therefore, parsing cannot be done in real time. You have to go through steps such as saving once, writing the changes to a file, having the server analyze it, and returning the result. It is also unclear whether the saved source file is syntactically correct. In some cases, a half-written source file may be saved. I think this is a trade-off you can simply accept.

In this implementation, when a file is saved, the server analyzes the source file and manages its state on the server. If the source file is not syntactically correct, analysis stops on the spot and the result is not applied. Conversely, if you add a new variable, it will not be parsed until you save, and it will not be offered for completion. ENJIME and others work the same way. However, there are some painful parts. It becomes very hard to write code when you are completing a method chain, for example. Therefore, I will add processing to alleviate these problems.
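As a minimal sketch of the save-time behaviour described above (keep the last successful analysis, and ignore a save whose source does not parse), something like the following would do. The class name, the method names, and the analyzer function are all hypothetical and are not taken from Meghanada; the analyzer stands in for whatever parser the server really uses and returns an empty Optional when the source is not syntactically correct.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: server-side state that is only replaced by a successful analysis.
final class SavedSourceStateSketch<A> {

    // Last successful analysis result per source file path.
    private final Map<String, A> lastGood = new ConcurrentHashMap<>();

    // Stand-in for the real parser: empty means "source is not syntactically correct".
    private final Function<String, Optional<A>> analyzer;

    SavedSourceStateSketch(Function<String, Optional<A>> analyzer) {
        this.analyzer = analyzer;
    }

    /** Called when the editor saves a file. Returns true if the stored state was updated. */
    boolean onSave(String path, String source) {
        Optional<A> result = analyzer.apply(source);
        if (!result.isPresent()) {
            // Broken source: stop here and keep the previous analysis untouched.
            return false;
        }
        lastGood.put(path, result.get());
        return true;
    }

    /** State used to answer completion requests for the given file, if any exists yet. */
    Optional<A> stateFor(String path) {
        return Optional.ofNullable(lastGood.get(path));
    }
}
```

With this shape, a completion request issued after a half-written save is still answered from the previous good state, which is also why a newly added variable does not show up until a save that parses.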

Also, adopting a server model lets you freely choose the client. This means it can be ported to Vim, Atom, and VSCode with less effort. ENSIME and Eclim take exactly this approach, which makes supporting various editors very easy.

Development language

So in what language should the server be developed? Originally, one of the motivations for developing this was the release of the Gradle Tooling API. As the name implies, it lets you operate Gradle directly via an API. The initial setup of ENSIME and similar tools could not be done without applying a Gradle plugin, but that is no longer necessary. In that case there is only one answer: the development language is Java.

Java 8 VS Clojure

That said, there are many languages that run on the JVM, so writing in one of them might also be fine. I started writing in Clojure to get a feel for the prototype. However, startup was too slow, so I kept making the Clojure part thinner and thinner, and once Java exceeded 90% of the code, I rewrote everything in Java.

The server is also started with JVM options chosen with startup speed and memory consumption in mind.

-XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=50 -XX:+TieredCompilation -Xverify:none -Xms256m -Xmx2G

Cooperation with Build Tool

Developing in Java requires setting the classpath. Many projects use some existing build tool. It is very helpful if those settings are read and the classpath settings and output directories are picked up seamlessly.

Gradle Tooling API

These days, a tool that does not support Gradle will not even get a look. Gradle has an API that lets you connect to the Gradle Daemon from outside, send commands, and run tasks. Here, the server connects to Gradle and imports settings such as resolved dependencies and the classpath. The nice thing about this API is that if the project has the Gradle Wrapper, it also downloads Gradle itself. If the Wrapper for the Gradle version used by the project is committed to the project, Gradle itself is downloaded and the dependencies are resolved, all automatically.

The Maven difficulty problem

Maven is still very popular. It is also possible to embed Maven itself in the server. I actually tried this, but gave up because some components were not injected properly. (Internally it is built on DI, so it was hard to tell which class lives in which jar and what the initialization procedure is.) Also, in some cases a plugin may not work depending on the Maven version. Therefore, I adopted a hybrid approach to integration: parsing the POM and running the mvn command internally.
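To make the hybrid idea concrete, here is a rough sketch using only the JDK's XML parser. It is not Meghanada's actual code: the class and method names are hypothetical, and which mvn goal the server really runs is my assumption. The dependency:build-classpath goal with the mdep.outputFile property is used here only as one plausible way to let Maven itself resolve the classpath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of "parse the POM ourselves, ask mvn for the rest".
final class PomSketch {

    /**
     * Returns "groupId:artifactId:version" for every dependency element in the POM text.
     * Simplification: this also picks up entries under dependencyManagement and plugins,
     * and a version that is managed elsewhere comes back as an empty string.
     */
    static List<String> dependencies(String pomXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("dependency");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element dep = (Element) nodes.item(i);
                result.add(text(dep, "groupId") + ":" + text(dep, "artifactId")
                        + ":" + text(dep, "version"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse POM", e);
        }
    }

    /** Text of the first descendant element with the given tag name, or "" if absent. */
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    /**
     * Command line that asks Maven to write the resolved classpath to a file.
     * Assumption: the article does not say which goal is used; this is one option.
     */
    static List<String> classpathCommand(String outputFile) {
        return Arrays.asList("mvn", "dependency:build-classpath",
                "-Dmdep.outputFile=" + outputFile);
    }
}
```

The command list could be handed to ProcessBuilder, and the file it produces read back as the classpath. That keeps Maven's own resolution logic out of the server process, which is the point of the hybrid approach.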

Class loader problem

Now, to actually integrate with a project and provide completion and so on, we have to think about how to obtain information about the target classes. Java has a reflection API, so is that good enough?

The slow Reflection problem

People talk about field access and method invocation being slow with reflection, but what about enumeration? For example, I want to build a list of classes to show as completion candidates. Guava has a class called ClassPath for accessing classpath information, but calling getAllClasses on it is ridiculously slow. Loading into the class loader appears to be what slows it down. Smooth completion is difficult this way. Also, the server's own classes are loaded in the class loader. As a result, unrelated classes that are not in the project may appear as completion candidates.

ASM Reflector

I wrote my own reflector to solve the above problems. Since classes are not loaded into the class loader, only classes that actually belong to the project become completion candidates. The following approach is also taken to solve the speed problem.

Detect the project and compile it
Identify the jar file paths from the classpath, scan the class files in the jars in parallel, and create a class list
When creating the class list, record which class is in which jar
When producing completion candidates, identify the jar of the specified class, look inside the jar, and read members and so on with ASM
Store the result in a memory cache and a file cache. For classes in the project, also record the checksum of the corresponding source
From then on, completion candidates are read from the cache, and if the checksum has changed, the class is read again with ASM
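The last two steps (cache the result together with a checksum, and re-read only when the checksum changes) can be sketched roughly as follows. This is an illustration under assumptions, not the real implementation: the names are hypothetical, only the memory cache is shown, SHA-256 is my choice of checksum, and the reader function stands in for the ASM-based member reader.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: member cache that is invalidated by a source checksum.
final class MemberCacheSketch {

    private static final class Entry {
        final String checksum;
        final List<String> members;

        Entry(String checksum, List<String> members) {
            this.checksum = checksum;
            this.members = members;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    // Stand-in for the expensive read (the article uses ASM for this step).
    private final Function<String, List<String>> reader;

    MemberCacheSketch(Function<String, List<String>> reader) {
        this.reader = reader;
    }

    /** Returns cached members while the source checksum is unchanged, otherwise re-reads. */
    List<String> members(String className, String source) {
        String sum = checksum(source);
        Entry cached = cache.get(className);
        if (cached != null && cached.checksum.equals(sum)) {
            return cached.members;
        }
        List<String> fresh = reader.apply(source);
        cache.put(className, new Entry(sum, fresh));
        return fresh;
    }

    /** Hex-encoded SHA-256 of the text (assumed checksum; the article does not name one). */
    static String checksum(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A file cache would follow the same pattern, with the entry serialized to disk and keyed by class name.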

Huge number of classes

But how many completion candidates are there? Depending on the JDK, the Java standard library alone has about 9000 classes. Add the project's dependencies and the index grows to tens of thousands of classes. You need to parallelize as much as possible to handle such large numbers quickly.
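For the jar-scanning step in the list above, a minimal sketch of building the class list in parallel, while recording which jar each class came from, might look like this. The class and method names are hypothetical, and a plain parallel stream is assumed rather than whatever threading the real server uses. Nothing is loaded into a class loader; only entry names are read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.jar.JarFile;

// Hypothetical sketch: index class names to jars without loading any class.
final class JarClassIndexSketch {

    /** Maps each fully qualified class name to the jar that contains it. */
    static Map<String, Path> index(List<Path> jars) {
        Map<String, Path> classToJar = new ConcurrentHashMap<>();
        // One jar per task; the concurrent map collects results from all threads.
        jars.parallelStream().forEach(jar -> {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                jarFile.stream()
                        .map(entry -> entry.getName())
                        .filter(name -> name.endsWith(".class"))
                        .map(name -> name.substring(0, name.length() - ".class".length())
                                .replace('/', '.'))
                        // If two jars contain the same class, whichever thread gets there first wins.
                        .forEach(className -> classToJar.putIfAbsent(className, jar));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        return classToJar;
    }
}
```

Because the map remembers the jar for each class, a later completion request can open just that one jar and read the class's members.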

Meghanada

The implementation of all of the above is Meghanada. Most of it is implemented in the server written in Java, but the elisp front end is also reasonably large. The client/server model is modeled on irony-mode. When you open a file with meghanada-mode in Emacs, the server is started, Emacs connects to it, and Emacs exchanges commands with the server.

Completion is implemented as a company backend. Checking for compilation errors is implemented as a flycheck checker. Meghanada is designed so that the Emacs side does as little processing of its own as possible. Therefore, it can be used with the same operations and settings as for other languages.

Completion triggers are modeled on IntelliJ. Basically, it picks up keywords and so on and starts completion by itself, so you don't have to press a key to trigger it.

Ease of introduction

Meghanada is also designed to be easy to install. The server jar has been uploaded to bintray as a fat jar. If the elisp side finds no server jar locally, it downloads it automatically from bintray. All you have to do is restart Emacs and it should be ready to use. You don't have to compile anything as you do with irony.

Asynchronous processing

Some processing can be heavy. In anticipation of that, much of the communication is designed to be asynchronous. A frozen editor is a fatal problem. Completion uses company. company has an API that supports asynchronous backends, and I use it to keep Emacs from freezing.

Two connections

Meghanada opens two connections to the server. One is for sending and receiving commands, and the other is for streams. A stream is a mechanism for receiving output in real time. It is used in particular for checking console logs from JUnit runs, builds, and so on.

To be honest, I don't think you can do without streams, but the LSP adopted by VSCode doesn't have them.

Finally

That was a brief introduction to Meghanada. I started writing this project this year, hoping to release it within the year. I rewrote it several times before release, but I managed to release the first version around October.

At present, the source analysis part is my own implementation, but I am rewriting it to use the internal compiler API, as ENSIME does. I think this will allow full support for lambdas and method references.