Embedded Elasticsearch: Launch ES in test code

How do you unit test and integration test an application that uses Elasticsearch? There are various approaches, such as injecting a mock ES client or connecting to an externally running ES instance, but for Java applications there used to be a convenient option: Embedded Elasticsearch. Since ES is itself a Java application, the idea is to start an ES node inside the same JVM that runs the tests and use it as the ES instance for testing. Although it exists only for the tests, it is a real ES instance rather than a mock, so indexing and searching work normally, which made it very convenient. Elastic itself also provides a JUnit base class for tests called `ESIntegTestCase`, and I wrote an article about it a few years ago.

However, since version 5, Embedded Elasticsearch has become more and more difficult to use:

Elastic officially declared that Embedded Elasticsearch is not supported
Official plugins are no longer published to Maven Central
Elasticsearch itself has been split into modules, but some important modules are no longer published

At my company we had built an in-house test framework on top of Embedded Elasticsearch and used it extensively, so when upgrading from ES 2.x, a colleague and I had to do the tedious work of registering every module needed for testing, one by one, in our in-house Maven repository, grumbling all the way.

Other people apparently ran into similar problems, and when I checked the Elastic forum and elsewhere, the following alternatives were suggested:

Use the Elasticsearch Maven plugin
Run a test ES instance with Docker or similar
Use Gradle (there appears to be a way to do this as well)

In short, these approaches run a test ES instance in cooperation with the build tool. They do work, but they lose many of the benefits of Embedded Elasticsearch: being able to run test cases directly in the IDE without going through a build tool, and being able to control ES node settings and startup programmatically from the test code.

Even so, several libraries have been developed that provide something like Embedded Elasticsearch for ES 5.x and later.

`elasticsearch-cluster-runner` is a library for using Embedded Elasticsearch with ES 5.x and later. Its author has published the ES modules that are no longer uploaded to Maven Central to Maven Central themselves. Hats off to them.

`embedded-elasticsearch` takes a slightly different approach. Despite its name, this library does not run ES in the same JVM as the tests. Instead, for each test it runs the ES instance as an external process, using the following steps:

Download the Elasticsearch distribution package from the official repository
Extract the downloaded package
Create a configuration file
Install the plugins
Run `bin/elasticsearch` to launch an ES instance
Stop the ES process after the test completes

In other words, you get the same environment as when you download and run ES normally, while keeping the Embedded Elasticsearch-like convenience of controlling the ES node programmatically from within the test code.
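The external-process lifecycle from the steps above (launch `bin/elasticsearch`, stop it after the test) can be sketched with the JDK's `ProcessBuilder`. This is a hypothetical helper, not `embedded-elasticsearch`'s actual code. The class name and the 10-second stop timeout are assumptions, and downloading, extracting and waiting for the node to become ready are left out.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of embedded-elasticsearch: it only illustrates
// the "launch ES as an external process, stop it after the test" lifecycle.
class ExternalProcessNode {
    private final List<String> command;
    private final File workDir;
    private Process process;

    ExternalProcessNode(List<String> command, File workDir) {
        this.command = new ArrayList<>(command);
        this.workDir = workDir;
    }

    // Start the command (e.g. bin/elasticsearch in an extracted ES directory)
    // as a child process; its stdout/stderr are forwarded to this JVM's console.
    void start() throws IOException {
        process = new ProcessBuilder(command)
                .directory(workDir)
                .inheritIO()
                .start();
    }

    boolean isRunning() {
        return process != null && process.isAlive();
    }

    // Ask the process to exit, and force-kill it if it has not exited
    // within the (assumed) 10-second timeout.
    void stop() throws InterruptedException {
        if (process == null) {
            return;
        }
        process.destroy();
        if (!process.waitFor(10, TimeUnit.SECONDS)) {
            process.destroyForcibly().waitFor();
        }
    }
}
```

A real harness would also have to poll the HTTP port until ES answers before running any test, which is part of what the library takes care of for you.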

Each has its own advantages and disadvantages:

`elasticsearch-cluster-runner`: **faster startup**, richer functionality, starts multiple nodes by default
`embedded-elasticsearch`: **very few dependencies**

Embedded Elasticsearch, including `elasticsearch-cluster-runner`, starts ES in the same JVM, so every dependency ES needs is added to the classpath, and these may conflict with the dependencies of the application you want to test. `embedded-elasticsearch`, on the other hand, runs ES as an external process and therefore does not have this problem. The library itself is very lightweight, depending only on Jackson and a few commons libraries.

That was a long introduction, but in this article I will try using `embedded-elasticsearch` from JUnit.

Basic usage of `embedded-elasticsearch`

Dependency setting

Add `pl.allegro.tech:embedded-elasticsearch` to the dependencies. The latest version at the time of writing is 2.5.0, and it is recommended because a problem on Windows was fixed recently.

`pom.xml`


```xml
  <dependencies>
    ...
    <dependency>
      <groupId>pl.allegro.tech</groupId>
      <artifactId>embedded-elasticsearch</artifactId>
      <version>2.5.0</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-simple</artifactId>
      <version>1.7.21</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.junit.jupiter</groupId>
      <artifactId>junit-jupiter-engine</artifactId>
      <version>5.1.0</version>
      <scope>test</scope>
    </dependency>

  </dependencies>
```

`embedded-elasticsearch` uses SLF4J for logging, but it only declares `slf4j-api` as a dependency, so if your project has no suitable SLF4J binding, add one as well (`slf4j-simple` in the example above). This example uses JUnit 5 as the test framework, but the library works just as well with JUnit 4 or ScalaTest.

Test code

The test code looks like this.

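```java
import java.io.IOException;
import java.util.Collections;

import org.apache.commons.io.IOUtils;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

import pl.allegro.tech.embeddedelasticsearch.EmbeddedElastic;

public class TestEmbeddedES {

  private static EmbeddedElastic es = null;

  @BeforeAll
  public static void startES() throws Exception {
    es = EmbeddedElastic.builder()
      .withElasticVersion("6.2.4")
      .withSetting("discovery.zen.ping.unicast.hosts", Collections.emptyList()) // <- (*)
      .build();

    es.start();
  }

  @Test
  void check_es_status() throws IOException {
    System.out.println("HTTP port: " + es.getHttpPort());
    System.out.println("Transport port: " + es.getTransportTcpPort());

    String uri = "http://localhost:" + es.getHttpPort() + "/";
    // try-with-resources closes both the response and the client
    try (CloseableHttpClient client = HttpClientBuilder.create().build();
         CloseableHttpResponse res = client.execute(new HttpGet(uri))) {
      System.out.println(IOUtils.toString(res.getEntity().getContent(), "UTF-8"));
    }
  }

  @AfterAll
  public static void stopES() {
    // es stays null if the builder failed in startES()
    if (es != null) {
      es.stop();
    }
  }

}
```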

Before any test runs (`@BeforeAll`), create an `EmbeddedElastic` instance with the builder. Through the builder you can specify the ES version and configure settings, which are ultimately written to `config/elasticsearch.yml`. The setting marked (*) is a workaround that effectively disables Zen discovery by giving it an empty list of unicast hosts. Without it, if another ES node is already running on the same machine, the node started for testing may accidentally connect to it.
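Since the builder settings end up in `config/elasticsearch.yml`, here is an illustration of how a map of settings could be rendered as YAML lines. This is a hypothetical sketch: the class name and formatting rules are assumptions, not the library's implementation.

```java
import java.util.Collection;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: turn builder-style settings into elasticsearch.yml lines.
// Not embedded-elasticsearch's actual code.
class YamlSettings {
    static String render(Map<String, ?> settings) {
        StringBuilder yml = new StringBuilder();
        for (Map.Entry<String, ?> e : settings.entrySet()) {
            yml.append(e.getKey()).append(": ").append(format(e.getValue())).append('\n');
        }
        return yml.toString();
    }

    // Collections become YAML flow sequences, so an empty list is written as [].
    // Scalars are written unquoted, which is only safe for simple values.
    private static String format(Object value) {
        if (value instanceof Collection) {
            return ((Collection<?>) value).stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(", ", "[", "]"));
        }
        return String.valueOf(value);
    }
}
```

With this sketch, the (*) setting would become the line `discovery.zen.ping.unicast.hosts: []`.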

ES is started by calling `EmbeddedElastic#start()`. The port numbers for connecting to the running ES instance can be obtained with `getHttpPort()` and `getTransportTcpPort()`. In real tests, you would typically use these port numbers to create a client or to configure the service under test.

Remember to call `EmbeddedElastic#stop()` to stop ES after all the tests are done (`@AfterAll`).
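The test above talks to ES through Apache HttpClient. As a dependency-free alternative, the same kind of request can be made with the JDK's `HttpURLConnection`, given the port from `getHttpPort()`. This sketch swaps in that JDK class; the helper name `LocalEsHttp` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: GET a path from a local ES node's HTTP port using only the JDK.
class LocalEsHttp {
    static String get(int httpPort, String path) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://localhost:" + httpPort + path).toURL().openConnection();
        conn.setRequestMethod("GET");
        // HTTP error statuses (4xx/5xx) make getInputStream() throw an IOException.
        try (InputStream in = conn.getInputStream()) {
            // readAllBytes() requires Java 9 or later
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

In a test this might be called as `LocalEsHttp.get(es.getHttpPort(), "/_cluster/health")`.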

Basic operation

Running the test above produces log output like this:

```
[main] INFO p.a.t.e.ElasticSearchInstaller - Downloading https://artifacts.elastic.co/.../elasticsearch-6.2.4.zip to /var/folders/pl/l3n21...
[main] INFO p.a.t.e.ElasticSearchInstaller - Download complete
[main] INFO p.a.t.e.ElasticSearchInstaller - Installing Elasticsearch into /var/folders/pl/l3n21...
[main] INFO p.a.t.e.ElasticSearchInstaller - Done
[main] INFO p.a.t.e.ElasticSearchInstaller - Applying executable permissions on /var/folders/pl/l3n21.../elasticsearch-6.2.4/bin/elasticsearch-plugin
[main] INFO p.a.t.e.ElasticSearchInstaller - Applying executable permissions on /var/folders/pl/l3n21.../elasticsearch-6.2.4/bin/elasticsearch
[main] INFO p.a.t.e.ElasticSearchInstaller - Applying executable permissions on /var/folders/pl/l3n21.../elasticsearch-6.2.4/bin/elasticsearch.in.sh
[main] INFO p.a.t.e.ElasticServer - Waiting for ElasticSearch to start...
[EmbeddedElsHandler] INFO p.a.t.e.ElasticServer - [2018-04-20T00:40:05,110][INFO ][o.e.n.Node               ] [] initializing ...
...
```

As the log shows, `embedded-elasticsearch` first downloads the specified version of the ES distribution package from Elastic's server. Once the download completes, it extracts the package into a temporary directory, sets the executable permissions on the scripts, and then launches ES. The downloaded package is cached in the temporary directory, so as long as that directory is not cleaned up, the download is skipped from the second run onward:

```
[main] INFO p.a.t.e.ElasticSearchInstaller - Download skipped
[main] INFO p.a.t.e.ElasticSearchInstaller - Installing Elasticsearch into /var/folders/pl/l3n21...
[main] INFO p.a.t.e.ElasticSearchInstaller - Done
...
```
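The "download once, skip afterwards" behavior behind the `Download skipped` line can be sketched as follows. This is an illustration under stated assumptions, not the library's code. The `DownloadCache` and `Downloader` names are hypothetical. The actual fetching is delegated to whatever the caller plugs in, and the `.part` temporary file is a design choice of this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of "download once, skip afterwards"; the library's own
// implementation may differ.
class DownloadCache {
    // Whatever actually fetches the file (an HTTP client, etc.) is plugged in here.
    interface Downloader {
        void download(Path target) throws IOException;
    }

    // Returns true if a download happened, false if the cached file was reused.
    static boolean fetchIfAbsent(Path target, Downloader downloader) throws IOException {
        if (Files.exists(target)) {
            return false; // analogous to "Download skipped" in the log
        }
        Path dir = target.toAbsolutePath().getParent();
        Files.createDirectories(dir);
        // Download under a temporary name first, so an interrupted download is
        // never mistaken for a complete cached file on the next run.
        Path partial = dir.resolve(target.getFileName() + ".part");
        downloader.download(partial);
        Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
```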

However, by default the distribution package is extracted again every time a test class runs, so startup is slower than with `elasticsearch-cluster-runner`.
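If it is acceptable for several test classes to share one ES node, one possible way to pay the extraction and startup cost only once is to hold the node in a once-per-JVM holder. This is not a feature of `embedded-elasticsearch`; the `OncePerJvm` class is a hypothetical sketch. It only helps when the test classes actually run in the same JVM (not when the build tool forks a JVM per class), and tests must then clean up their own indices.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;

// Hypothetical holder: start an expensive resource once per JVM, share it across
// test classes, and stop it from a shutdown hook when the JVM exits.
final class OncePerJvm<T> {
    private final Callable<T> starter;
    private final Consumer<T> stopper;
    private T instance;

    OncePerJvm(Callable<T> starter, Consumer<T> stopper) {
        this.starter = starter;
        this.stopper = stopper;
    }

    // The first call starts the resource; later calls return the same instance.
    synchronized T get() throws Exception {
        if (instance == null) {
            T started = starter.call();
            instance = started;
            Runtime.getRuntime().addShutdownHook(new Thread(() -> stopper.accept(started)));
        }
        return instance;
    }
}
```

For ES, the starter would build an `EmbeddedElastic` with the builder, call `start()` and return it, and the stopper would call its `stop()`.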

Using plugins

Plugins to use with the ES instance can be specified with the `withPlugin(...)` method of `EmbeddedElastic.Builder`:

```java
  ...
  @BeforeAll
  public static void startES() throws Exception {

    es = EmbeddedElastic.builder()
      .withElasticVersion("6.2.4")
      .withPlugin("analysis-phonetic")
      .withPlugin("analysis-icu")
      .withSetting("discovery.zen.ping.unicast.hosts", Collections.emptyList())
      .build();
    ...
```

If plugins are specified, they are installed with the `bin/elasticsearch-plugin` command before ES starts:

```
...
[main] INFO p.a.t.e.ElasticSearchInstaller - > /var/folders/pl/l3n21.../elasticsearch-6.2.4/bin/elasticsearch-plugin install analysis-phonetic
-> Downloading analysis-phonetic from elastic
[=================================================] 100%   
-> Installed analysis-phonetic
[main] INFO p.a.t.e.ElasticSearchInstaller - > /var/folders/pl/l3n21.../elasticsearch-6.2.4/bin/elasticsearch-plugin install analysis-icu
-> Downloading analysis-icu from elastic
[=================================================] 100%   
-> Installed analysis-icu
...
```
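The log shows one `elasticsearch-plugin install <plugin>` call per plugin. The following hypothetical sketch reproduces that step with `ProcessBuilder`. The class name is made up, and it assumes a Unix-like layout (on Windows the script is `elasticsearch-plugin.bat`).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: run "bin/elasticsearch-plugin install <plugin>" once per
// plugin before ES itself is started. Assumes a Unix-like ES directory layout.
class PluginInstaller {
    static List<String> installCommand(Path esHome, String plugin) {
        String script = esHome.resolve("bin").resolve("elasticsearch-plugin").toString();
        return Arrays.asList(script, "install", plugin);
    }

    static void installAll(Path esHome, List<String> plugins) throws IOException, InterruptedException {
        for (String plugin : plugins) {
            Process p = new ProcessBuilder(installCommand(esHome, plugin)).inheritIO().start();
            int exit = p.waitFor();
            if (exit != 0) {
                throw new IOException("installing " + plugin + " failed with exit code " + exit);
            }
        }
    }
}
```

Because each call fetches the plugin again unless it is given a local file, this loop is where the repeated downloads come from.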

The problem here is that while the ES distribution package is cached after the first download, plugins are downloaded again on every test run. That is tolerable for a small plugin like the phonetic analysis plugin, but downloading a large one like the ICU plugin for every test is very painful.

Unfortunately, `embedded-elasticsearch` itself currently offers no specific solution to this problem. However, since `withPlugin(...)` simply runs `bin/elasticsearch-plugin` internally, its argument can be not only a plugin name but also a URL or a local file path. So in practice, if you want to use plugins in tests, you need to implement your own mechanism that downloads the plugin packages and caches them locally before starting `EmbeddedElastic`.
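A minimal sketch of such a mechanism, assuming a local cache directory: download each plugin zip once, then hand a `file:` URI to `withPlugin(...)`. The cache directory, file naming scheme, `PluginCache` class and `Downloader` interface are all hypothetical. The download source is deliberately left to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local plugin cache. Only the idea (download once, then pass a
// local file URI to withPlugin) comes from the text; everything else is assumed.
class PluginCache {
    interface Downloader {
        void download(String pluginName, String esVersion, Path target) throws IOException;
    }

    private final Path cacheDir;
    private final Downloader downloader;

    PluginCache(Path cacheDir, Downloader downloader) {
        this.cacheDir = cacheDir;
        this.downloader = downloader;
    }

    // Official plugins must match the ES version exactly, so the version is part of the key.
    String resolve(String pluginName, String esVersion) throws IOException {
        Path zip = cacheDir.resolve(pluginName + "-" + esVersion + ".zip");
        if (!Files.exists(zip)) {
            Files.createDirectories(cacheDir);
            downloader.download(pluginName, esVersion, zip);
        }
        return zip.toUri().toString(); // a file: URI pointing at the cached zip
    }
}
```

In a test class, the builder call might then look like `.withPlugin(cache.resolve("analysis-icu", "6.2.4"))`.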

So which library should I use?

It depends on which type of ES client your application uses. The Transport client currently depends on `elasticsearch-core` and many other ES modules. So once the `transport` module is added as a dependency, the classpath ends up not much different from when using `elasticsearch-cluster-runner`. In that case, I think `elasticsearch-cluster-runner`, which starts up quickly and is feature-rich, is a good choice.

On the other hand, if you use the REST client, which has few dependencies, and you do not want to add many more, `embedded-elasticsearch` can also be a good option.

That said, while writing this article I took another look at `elasticsearch-cluster-runner` and found it much more sophisticated. Elasticsearch itself also does not have so many dependencies that checking them again would be a chore, so I am realizing that there are fewer situations where `embedded-elasticsearch` is the right choice. This article covered `embedded-elasticsearch`, but next time I would like to write one that uses `elasticsearch-cluster-runner`.

Easy JUnit testing of Elasticsearch with embedded-elasticsearch (2018 edition)