Make it possible to search Japanese sentences with Elasticsearch

What we will do

Set up a container with Docker and load data containing Japanese text into Elasticsearch so that it can be searched in Japanese.

Set up a Docker container

Follow the official Elasticsearch documentation.

Pull the Docker image from Elastic's registry (docker.elastic.co).

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.10.1

Write docker-compose.yml. The official example defines a three-node cluster, but this time we will run a single node.

docker-compose.yml


version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - cluster.initial_master_nodes=es01
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  
volumes:
  data01:
    driver: local

networks:
  elastic:
    driver: bridge

Start the container by running the following in the directory that contains docker-compose.yml.

docker-compose up
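
Elasticsearch usually needs some time after the container starts before it accepts HTTP requests, so later curl commands can fail if they are run too early. The following is a minimal sketch of a wait loop, not part of the official setup. The function name wait_for_es and its default values are assumptions chosen for illustration; it polls the _cluster/health endpoint on the port published in docker-compose.yml.

```shell
# Poll the cluster health endpoint until Elasticsearch answers.
# Arguments (all optional): base URL, number of attempts, seconds between attempts.
# The function name and the defaults are illustrative assumptions.
wait_for_es() {
  url="${1:-http://localhost:9200}"
  tries="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: exit non-zero on HTTP errors, -s: no progress output,
    # --max-time: give up on a single attempt after 5 seconds.
    if curl -fs --max-time 5 "$url/_cluster/health" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Because docker-compose up stays in the foreground, one way to use this is to start the container with docker-compose up -d (or use a second terminal) and then run wait_for_es && echo "Elasticsearch is up".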

Install the Japanese analysis plugin

Install the official Japanese analysis plugin (kuromoji). First, open a shell in the container.

docker exec -it <container ID> /bin/bash

(Inside the container)

sudo bin/elasticsearch-plugin install analysis-kuromoji

After installation, exit the container and restart it.

docker restart <container ID>

Back on the host, run the following:

curl http://localhost:9200/_nodes/plugins?pretty

If analysis-kuromoji appears in the output, the plugin is installed.

(Omitted)
 "plugins" : [
        {
          "name" : "analysis-kuromoji",
          "version" : "7.6.2",
          "elasticsearch_version" : "7.6.2",
          "java_version" : "1.8",
          "description" : "The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.",
          "classname" : "org.elasticsearch.plugin.analysis.kuromoji.AnalysisKuromojiPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        }
(Omitted)
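
Besides checking that the plugin is listed, it can be useful to see how kuromoji actually splits a Japanese string. The sketch below prepares a request body for the _analyze API using the kuromoji analyzer that the plugin provides. The file name analyze.json, the function name analyze_sample, and the sample text (the Japanese name of the National Museum of Modern Art, Tokyo) are assumptions for illustration and do not come from the article's data.

```shell
# Request body for the _analyze API: run the "kuromoji" analyzer over a sample string.
# The file name and the sample text are illustrative assumptions.
cat > analyze.json <<'EOF'
{
  "analyzer": "kuromoji",
  "text": "東京国立近代美術館"
}
EOF

# Send the body to a locally running Elasticsearch (port 9200 as in docker-compose.yml).
analyze_sample() {
  curl -s -H "Content-Type: application/json" \
    -X POST "http://localhost:9200/_analyze?pretty" \
    --data-binary @analyze.json
}
```

Calling analyze_sample once the container is running should print the tokens the analyzer produced, which makes it easier to predict which queries will match.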

Create an index

Elasticsearch 7 no longer uses mapping types, so only an index name is needed. Create an index for searching museums.

curl -X PUT http://localhost:9200/museum?pretty

If the index name you just created appears in the output of the following command, it worked.

curl -X GET http://localhost:9200/_cat/indices?v
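
An index created with a bare PUT uses default settings, so text fields are analyzed with the standard analyzer rather than kuromoji. As a hedged alternative, the index could be created with an explicit mapping. In the sketch below, the file name museum_index.json, the function name create_museum_index, and the choice of field types (keyword for the IDs, geo_point for location) are assumptions, not something the article specifies.

```shell
# Index definition that analyzes "name" with kuromoji.
# Field types other than the kuromoji analyzer are illustrative assumptions.
cat > museum_index.json <<'EOF'
{
  "mappings": {
    "properties": {
      "pref_id":  { "type": "keyword" },
      "city_id":  { "type": "keyword" },
      "name":     { "type": "text", "analyzer": "kuromoji" },
      "location": { "type": "geo_point" }
    }
  }
}
EOF

# Create the index with that definition (the index must not exist yet).
create_museum_index() {
  curl -s -H "Content-Type: application/json" \
    -X PUT "http://localhost:9200/museum?pretty" \
    --data-binary @museum_index.json
}
```

If the museum index already exists, it would have to be deleted first (curl -X DELETE http://localhost:9200/museum) before calling create_museum_index, because the analyzer of an existing field cannot be changed in place.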

Load data into the index

Load the following sample data. Create a JSON file in the format the _bulk API expects: each document line is preceded by an action line, and the file ends with a newline.

sample.json


{"index": {}}
{"pref_id": "13", "city_id": "13101", "name": "National Museum of Modern Art, Tokyo", "location": [139.7547383, 35.6905368]}
{"index": {}}
{"pref_id": "13", "city_id": "13106", "name": "Ueno Royal Museum", "location": [139.7747384, 35.7127347]}

Here museum is the name of the index created above, and sample.json is the file just created.

curl -H "Content-Type: application/json" -XPOST http://localhost:9200/museum/_bulk --data-binary @sample.json
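
If the documents start out as plain JSON, one per line, the action lines that _bulk requires can be added mechanically. The following is a minimal sketch; the file names docs.ndjson and bulk.json are hypothetical and are used only to keep the example separate from sample.json.

```shell
# docs.ndjson: one JSON document per line (hypothetical input file).
cat > docs.ndjson <<'EOF'
{"pref_id": "13", "city_id": "13101", "name": "National Museum of Modern Art, Tokyo", "location": [139.7547383, 35.6905368]}
{"pref_id": "13", "city_id": "13106", "name": "Ueno Royal Museum", "location": [139.7747384, 35.7127347]}
EOF

# Start with an empty output file.
: > bulk.json

# Write an "index" action line before every document.
# printf ends each line with a newline, so the file also ends with one.
while IFS= read -r doc; do
  printf '%s\n%s\n' '{"index": {}}' "$doc" >> bulk.json
done < docs.ndjson
```

The resulting bulk.json can then be passed to the same curl command with --data-binary @bulk.json. With an empty action object, Elasticsearch assigns document IDs automatically.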

If the registered data is returned by the following request, it worked.

curl -XGET http://localhost:9200/museum/_search?pretty

Creating a query

Elasticsearch offers several kinds of document search (partial match, exact match, and so on), but this time we will use match_phrase. match_phrase returns documents that contain the specified phrase.

sample_query.json


{
  "query": {"match_phrase": { "name": "National" } } 
}
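
For comparison, here is a sketch of the same idea with Japanese search terms. match_phrase matches only when the analyzed terms appear next to each other in the given order, while a match query with "operator": "and" requires all terms to be present in any order. The file names and the search strings are assumptions for illustration, and whether they match depends on the indexed data and on the analyzer configured for the name field.

```shell
# Phrase query: the analyzed terms must appear adjacent and in order.
# File names and search strings are illustrative assumptions.
cat > phrase_query.json <<'EOF'
{
  "query": { "match_phrase": { "name": "美術館" } }
}
EOF

# Match query: every term must be present, but order and position are free.
cat > match_query.json <<'EOF'
{
  "query": {
    "match": { "name": { "query": "東京 美術館", "operator": "and" } }
  }
}
EOF
```

Either file can be sent to the _search endpoint of the museum index with --data-binary, in the same way as sample_query.json.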

Search

Run the search with the JSON file you created.

curl -H "Content-Type: application/json" -X GET http://localhost:9200/museum/_search --data-binary @sample_query.json

The response contains the documents that match the query. (Partially omitted.)


{
  "took":324,
  "timed_out":false,
  "_shards":
    {
      "total":1,
      "successful":1,
      "skipped":0,
      "failed":0
    },
  "hits":
    {
      "_index":"museum",
      "_type":"_doc",
      "_id":"1",
      "_source":
        {
          "pref_id": "13", 
          "city_id": "13103", 
          "name": "The National Art Center, Tokyo",  
          "location": [139.7263974, 35.6652779]
        }
     }
}

References

https://www.elastic.co/guide/jp/elasticsearch/reference/current/gs-executing-searches.html
https://blog.chocolapod.net/momokan/entry/114
