Easily estimating a living area with Elasticsearch and Python

Overview

By registering the location information from an activity history in Elasticsearch, you can query it conveniently and visualize it nicely. This article shows how.

Prerequisite knowledge

I will use Elasticsearch this time, so here is a brief introduction. Elasticsearch is a full-text search engine that is often compared to Apache Solr. It is schema-free, all input and output goes through a REST API using JSON, and it is implemented in Java.

- For details, see "Introduction and features of Elasticsearch".

Installation is easy with yum or brew; check the procedure for the environment you use. The GUI plug-in elasticsearch-head is also convenient, so it is worth installing alongside.

Elasticsearch settings

After starting Elasticsearch, set up the index (similar to a table in a database) to use. To do that, first define the index mapping in JSON. This time, assume the log consists of records like the following.

sample_log


{
  "id":1,
  "uuid":"7ef82126c32c55f7272d5ca5dd5e40c6",
  "time":"2015-12-03T04:21:01.641Z",
  "lat":35.658546,
  "lng":139.729281,
  "accuracy":47.126048
}

Set a type for each field so that such records are mapped properly. This time I used the following mapping, which defines the fields for a document type called geo.

geo_mapping.json


{
  "geo" : {
    "properties" : {
      "id" : {
        "type" : "integer"
      },
      "uuid" : {
        "type" : "string",
        "index" : "not_analyzed"
      },
      "time" : {
        "type" : "date",
        "format" : "date_time"
      },
      "location" : {
        "type" : "geo_point"
      },
      "accuracy" : {
        "type" : "double"
      }
    }
  }
}

To explain a little: the uuid field is set to `not_analyzed` because it is a unique value and should not be morphologically analyzed. The type of the location field is also important this time. `geo_point` is a type provided by Elasticsearch that stores latitude and longitude together as a pair. Setting this type on the field enables convenient location-based searches. More on that later.
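As a rough illustration of what this mapping expects from each record, the following sketch checks a log entry before it is sent to Elasticsearch. The function name `check_record` is hypothetical, and the timestamp layout is an assumption taken from the sample log entry (milliseconds and a trailing `Z`). Latitude is limited to the range -90 to 90 and longitude to -180 to 180.

```python
from datetime import datetime, timezone


def check_record(record):
    """Return ((lat, lon), timestamp) for one log record.

    Raises ValueError if a coordinate is out of range or the timestamp
    does not follow the layout of the sample log entry.
    """
    lat = float(record["lat"])
    lon = float(record["lng"])
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: %r" % lat)
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: %r" % lon)
    # Assumed layout, copied from the sample log: 2015-12-03T04:21:01.641Z
    parsed = datetime.strptime(record["time"], "%Y-%m-%dT%H:%M:%S.%fZ")
    return (lat, lon), parsed.replace(tzinfo=timezone.utc)


sample = {
    "id": 1,
    "uuid": "7ef82126c32c55f7272d5ca5dd5e40c6",
    "time": "2015-12-03T04:21:01.641Z",
    "lat": 35.658546,
    "lng": 139.729281,
    "accuracy": 47.126048,
}
point, timestamp = check_record(sample)
```

Rejecting bad coordinates on the client side is only a convenience; how Elasticsearch itself reacts to malformed values depends on its version and settings.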

Once the mapping file is written, use it to create the index. The index name is test_geo this time. Sending the following curl request while Elasticsearch is running creates it.

Creating an index


curl -XPOST 'localhost:9200/test_geo' -d @geo_mapping.json

Data registration

Assuming the data is available as a log file, register its contents in the index you created. There is an official Python client, so let's use it.

- Official: [elasticsearch-py](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/index.html)

It's easy to install with pip.

Installation


pip install elasticsearch

A program that registers the data with this client looks like this.

regist_es.py


import json
from elasticsearch import Elasticsearch

es = Elasticsearch()
index = "test_geo"
doc_type = "geo"

# Collect every record in the log file. Each line may hold either a single
# JSON object or a JSON array of objects.
records = []
with open('var/logs.json', 'r') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        parsed = json.loads(line)
        if isinstance(parsed, list):
            records.extend(parsed)
        else:
            records.append(parsed)

for value in records:
    # Fold lat/lng into one "location" field to match the geo_point mapping.
    value["location"] = {
        "lat": value.pop("lat"),
        "lon": value.pop("lng"),
    }
    es.index(index=index, doc_type=doc_type, body=value, id=value['id'])

The point to note is that `lat` and `lng` are combined into `location` so that the document matches the `geo_point` type. Running this program registers the activity history in logs.json in Elasticsearch. It's very easy.
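The same conversion can be written as a small pure function, which also makes it possible to prepare all documents up front. The sketch below is not the article's script: `to_geo_doc` and `bulk_actions` are hypothetical helper names, and the action layout (`_index`, `_id`, `_source`) is the dictionary shape accepted by `elasticsearch.helpers.bulk`. The optional `_type` key is an assumption that only applies to older Elasticsearch versions that still have mapping types. The block itself does not connect to Elasticsearch.

```python
def to_geo_doc(record):
    """Return a copy of record with lat/lng folded into a geo_point field."""
    doc = {k: v for k, v in record.items() if k not in ("lat", "lng")}
    doc["location"] = {"lat": record["lat"], "lon": record["lng"]}
    return doc


def bulk_actions(records, index, doc_type=None):
    """Yield one bulk action dictionary per log record."""
    for record in records:
        action = {
            "_index": index,
            "_id": record["id"],
            "_source": to_geo_doc(record),
        }
        if doc_type is not None:
            # Only meaningful on versions that still use mapping types.
            action["_type"] = doc_type
        yield action


records = [
    {"id": 1, "uuid": "7ef82126c32c55f7272d5ca5dd5e40c6",
     "time": "2015-12-03T04:21:01.641Z",
     "lat": 35.658546, "lng": 139.729281, "accuracy": 47.126048},
]
actions = list(bulk_actions(records, "test_geo", doc_type="geo"))
```

With a running cluster, the generator could be handed to `elasticsearch.helpers.bulk(es, bulk_actions(...))` instead of calling `es.index` once per record, which should reduce the number of round trips for large logs.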

Visualize with Kibana

Now that the data is registered, you can do whatever you like with it. Kibana is the official tool for visualizing data registered in Elasticsearch. Kibana 4 is out now, so it is a good idea to get the latest version. Once you have it, just run ./bin/kibana and an HTTP server starts on port 5601. Detailed setup steps are not covered here. After starting it, open it in a browser to set up a dashboard. With a little experimentation you can easily create a heat map like the one below.

heatmap.png

histgram.png

Living area estimation

Since I went to the trouble of registering the data, let's put it to use. This time I will try to estimate the living area, following the paper "Information recommendation system using location information for mobile terminals". Because the location is registered as a `geo_point`, you can issue queries such as the following, which retrieves data within a few kilometers of a specific location.

python


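# uuid, lat, and lon are expected to be set before this point:
# the user to look up and the center of the search area.
query = {
    "from": 0,
    "query": {
        "filtered": {
            # Restrict the results to a single user.
            "query": {
                "simple_query_string": {
                    "query": uuid,
                    "fields": ["uuid"],
                }
            },
            # Keep only records within 10 km of (lat, lon).
            "filter": {
                "geo_distance": {
                    "distance": "10km",
                    "geo.location": {
                        "lat": lat,
                        "lon": lon
                    }
                }
            }
        }
    }
}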
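Wrapping the query in a function makes its inputs explicit. This is a sketch under assumptions: `build_geo_query` is a hypothetical name, the radius is passed as a number of kilometers, and the query keeps the `filtered` form used above, which belongs to the older Elasticsearch query DSL.

```python
def build_geo_query(uuid, lat, lon, radius_km=10):
    """Build a query for one user's records within radius_km of (lat, lon)."""
    return {
        "from": 0,
        "query": {
            "filtered": {
                "query": {
                    "simple_query_string": {"query": uuid, "fields": ["uuid"]}
                },
                "filter": {
                    "geo_distance": {
                        # The distance is a string with a unit, e.g. "10km".
                        "distance": "%gkm" % radius_km,
                        "geo.location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
    }


query = build_geo_query("7ef82126c32c55f7272d5ca5dd5e40c6", 35.658546, 139.729281)
```

The returned dictionary can then be sent as the search request body, for example through the `body` argument of `es.search` in client versions that accept it.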

The results of living area estimation using this are as follows.

Sample result


Stage1
35.653945 , 139.716692
radius(km): 5.90

Stage2
35.647367 , 139.709346
radius(km): 1.61

When using the center of gravity
35.691165 , 139.709840
radius(km): 8.22

Noon: (104)
35.696822 , 139.708228
radius(km): 9.61
Night: (97)
35.685100 , 139.711568
radius(km): 6.77

Nearest station(Noon):Higashi Shinjuku
Nearest station(Night):Shinjuku Gyoenmae
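
The output above reports a center point and a radius for each estimate. The article does not spell out how they are computed, so the following is only an illustrative sketch and not the method from the cited paper. It takes the arithmetic mean of the coordinates as the center of gravity and, as an assumption, uses the largest haversine distance from that center as the radius. The function names and the Earth radius of 6371 km are also choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def center_and_radius(points):
    """Return ((lat, lon), radius_km) for a non-empty list of (lat, lon)."""
    lat_c = sum(p[0] for p in points) / len(points)
    lon_c = sum(p[1] for p in points) / len(points)
    # Assumed radius definition: distance to the farthest point.
    radius = max(haversine_km(lat_c, lon_c, lat, lon) for lat, lon in points)
    return (lat_c, lon_c), radius


# Coordinates reused from the sample output purely as example input.
points = [
    (35.653945, 139.716692),
    (35.647367, 139.709346),
    (35.691165, 139.709840),
]
center, radius_km = center_and_radius(points)
print("%.6f , %.6f" % center)
print("radius(km): %.2f" % radius_km)
```

Averaging latitude and longitude directly is reasonable for points that lie within a single city, but it would need rethinking for data that spans the antimeridian or very large areas.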

Impressions

Elasticsearch is very easy to set up and use. Elasticsearch is amazing. Kibana is amazing. There also seems to be a new product, Beats. It depends on the amount of data, but registering data in Elasticsearch for the time being is convenient. Logs can be registered automatically with fluentd, so combining these tools should open up many possibilities.

_Writing articles is hard..._
