Make Japanese text searchable with Elasticsearch

What we will do

Set up an Elasticsearch container with Docker and load data that includes Japanese text, so that the data can be searched in Japanese.

Set up a Docker container

Follow the official Elasticsearch documentation.

Pull the Docker image from Docker Hub.

docker pull

Write docker-compose.yml. The official example defines a cluster of three nodes, but this time we run only a single node.


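version: '2.2'
services:
  es01:
    image: elasticsearch:7.6.2  # assumed: the source omits this line; tag inferred from the 7.6.2 plugin version shown below
    container_name: es01
    environment:
      - node.name=es01  # assumed: must match cluster.initial_master_nodes
      - cluster.initial_master_nodes=es01
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
volumes:
  data01:
    driver: local
networks:
  elastic:
    driver: bridge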

Start the container by running the following command in the directory that contains docker-compose.yml.

docker-compose up
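Elasticsearch needs some time after the container starts before it accepts requests. The following is a minimal sketch of a readiness check, assuming the 9200 port mapping from the compose file above. The function name wait_for_es, its arguments, and the retry count are hypothetical helpers, not part of the original setup.

```shell
# Poll Elasticsearch until it answers, or give up after a number of attempts.
# wait_for_es and its defaults are illustrative; adjust them to your environment.
wait_for_es() {
  url="${1:-http://localhost:9200}"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -f: treat HTTP error responses (such as a health timeout) as failure
    if curl -sf "$url/_cluster/health?wait_for_status=yellow&timeout=5s" >/dev/null 2>&1; then
      echo "Elasticsearch is ready"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "Elasticsearch did not respond at $url" >&2
  return 1
}
```

For example, after starting the container in detached mode with `docker-compose up -d`, calling `wait_for_es` blocks until the node answers or the attempts run out.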

Install the Japanese analysis plugin

Install the official Japanese analysis plugin (kuromoji). First, open a shell in the running container.

docker exec -it <container ID> /bin/bash

(↓ inside the container)

sudo bin/elasticsearch-plugin install analysis-kuromoji

After installation, exit the container and restart it.

docker restart <container ID>

Back on the host, run the following:

curl "http://localhost:9200/_nodes/plugins?pretty"

If analysis-kuromoji appears in the output, the plugin is installed.

      "plugins" : [
        {
          "name" : "analysis-kuromoji",
          "version" : "7.6.2",
          "elasticsearch_version" : "7.6.2",
          "java_version" : "1.8",
          "description" : "The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.",
          "classname" : "org.elasticsearch.plugin.analysis.kuromoji.AnalysisKuromojiPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        }
      ]
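Beyond checking that the plugin is listed, the _analyze API can confirm that the kuromoji analyzer really splits Japanese text into words. The following is a small sketch, assuming the node from this article is reachable on localhost:9200. The sample text and the file name analyze_request.json are illustrative.

```shell
# Write the request body to a file so the Japanese text is not affected by shell quoting.
cat > analyze_request.json <<'EOF'
{
  "analyzer": "kuromoji",
  "text": "東京国立近代美術館"
}
EOF

# Ask Elasticsearch to tokenize the text with the kuromoji analyzer.
curl -s -H "Content-Type: application/json" -X POST \
  "http://localhost:9200/_analyze?pretty" \
  --data-binary @analyze_request.json \
  || echo "Elasticsearch is not reachable on localhost:9200" >&2
```

Each entry in the returned tokens array is one term produced by the analyzer. If the plugin is not loaded, the request returns an error instead of a token list.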

Create an index

Elasticsearch 7 no longer uses mapping types, so only an index is needed. Here we create an index named museum for searching museums.

curl -X PUT "http://localhost:9200/museum?pretty"

If the index name appears in the output of the following command, the index was created.

curl -X GET "http://localhost:9200/_cat/indices?v"
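The index above is created with default settings, so text fields would be analyzed with the standard analyzer rather than kuromoji. One way to put the plugin to use, sketched here as an assumption rather than as the original setup, is to map the name field to the kuromoji analyzer before any data is loaded. The field names come from the sample data in this article; the file name museum_mapping.json and the geo_point type for location are illustrative choices.

```shell
# Mapping for the museum index: analyze "name" with kuromoji and
# treat "location" ([longitude, latitude]) as a geo_point.
cat > museum_mapping.json <<'EOF'
{
  "properties": {
    "name":     { "type": "text", "analyzer": "kuromoji" },
    "location": { "type": "geo_point" }
  }
}
EOF

# Add the field mappings to the existing (still empty) index.
curl -s -H "Content-Type: application/json" -X PUT \
  "http://localhost:9200/museum/_mapping?pretty" \
  --data-binary @museum_mapping.json \
  || echo "Elasticsearch is not reachable on localhost:9200" >&2
```

The analyzer of an existing field cannot be changed afterwards, so this step has to run before the first document is indexed.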

Load data into the index

Use the following data as a sample and save it as a JSON file (sample.json). Each document is preceded by an action line, as the Bulk API requires, and the file must end with a newline.


{"index": {}}
{"pref_id": "13", "city_id": "13101", "name": "National Museum of Modern Art, Tokyo", "location": [139.7547383, 35.6905368]}
{"index": {}}
{"pref_id": "13", "city_id": "13106", "name": "Ueno Royal Museum", "location": [139.7747384, 35.7127347]}
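When the documents already exist one per line without action lines, the bulk file can be generated instead of edited by hand, since the Bulk API expects an action line such as {"index": {}} before every document. The following is a sketch under that assumption; docs.ndjson is a hypothetical input file, and only sample.json is a name used in this article.

```shell
# docs.ndjson (hypothetical): one JSON document per line, no action lines.
cat > docs.ndjson <<'EOF'
{"pref_id": "13", "city_id": "13106", "name": "Ueno Royal Museum", "location": [139.7747384, 35.7127347]}
EOF

# Print an action line before every non-empty document line.
# awk terminates each printed line with a newline, which the Bulk API requires.
awk 'NF { print "{\"index\": {}}"; print }' docs.ndjson > sample.json

cat sample.json
```

Because the index name is given in the request URL (/museum/_bulk), the action line can stay empty.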

In the command below, museum is the index created above and sample.json is the data file just created.

curl -H "Content-Type: application/json" -XPOST http://localhost:9200/museum/_bulk --data-binary @sample.json

If the following command returns the registered data, the import succeeded.

curl -X GET "http://localhost:9200/museum/_search?pretty"

Creating a query

Elasticsearch offers several kinds of document search (partial match, exact match, and so on); this time we use match_phrase, which returns documents that contain the specified phrase.


{
  "query": {"match_phrase": {"name": "National"}}
}


Save the query as a JSON file (sample_query.json here) and use it to search.

curl -H "Content-Type: application/json" -X GET http://localhost:9200/museum/_search --data-binary @sample_query.json

The search returns the documents that match the condition (output partially omitted).

          "pref_id": "13", 
          "city_id": "13103", 
          "name": "The National Art Center, Tokyo",  
          "location": [139.7263974, 35.6652779]
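For comparison, a match query analyzes the search text and returns documents that contain any of the resulting terms, while match_phrase requires the terms to appear next to each other and in the same order. The following is a sketch, assuming the museum index and the name field from this article; the file names and the search text are illustrative.

```shell
# match: any of the analyzed terms may appear in the name field.
cat > match_query.json <<'EOF'
{
  "query": { "match": { "name": "National Museum" } }
}
EOF

# match_phrase: the analyzed terms must be adjacent and in this order.
cat > phrase_query.json <<'EOF'
{
  "query": { "match_phrase": { "name": "National Museum" } }
}
EOF

# Run both queries against the same index to compare the hits.
for q in match_query.json phrase_query.json; do
  echo "== $q"
  curl -s -H "Content-Type: application/json" -X GET \
    "http://localhost:9200/museum/_search?pretty" \
    --data-binary @"$q" \
    || echo "Elasticsearch is not reachable on localhost:9200" >&2
done
```

The match query is usually the broader of the two, so it is a reasonable first check when match_phrase returns no hits.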


