[PYTHON] Problems when using Elasticsearch as a data source in Redash

It seems that Elasticsearch can be used as a data source for Redash, so when I tried it, I found that there were various problems.

environment

Redash(v7.0.0) Elasticsearch(v7.3.2)

problem

** Cannot format query results from Elasticsearch correctly (expected) in tabular format. ** **

When Query is executed from Redash, the query is issued and the result is displayed via QueryRunner as shown below.

Execute Query from the screen → QueryRunner → Elasticsearch → QueryRunner → Display the result in tabular format on the screen

** Since I was able to confirm that the query results from Elasticsearch were as expected, I found that the problem this time was that the query results from Elasticsearch could not be parsed tabularly with QueryRunner. ** **

Results of executing multiple Aggregation queries

Execute a query that nests Aggregation as shown below.

{
    "index": "Index name",
    "size": 1000,
    "aggs": {
        "1st": {
            "terms": {
                "field": "LogDate"
            },
            "aggs": {
                "2nd": {
                    "terms": {
                        "field": "SectionName"
                    }
                }
            }
        }
    }
}

As an image, we will aggregate by date and department, so expect the following results.

LogDate SectionName number
October 29, 2019 System department 10
October 29, 2019 Human Resources Department 20
October 29, 2019 General Affairs Department 30
October 30, 2019 System department 100
October 30, 2019 Human Resources Department 200
October 30, 2019 General Affairs Department 300

However, what is displayed as a result of Table in Redash is as follows. → I don't know what the total is and how many.

2nd doc_count
(Some value of SectionName) 110
(Some value of SectionName) 220
(Some value of SectionName) 330

Case where the structure of JSON becomes complicated

For example, when Sum Bucket Aggregation is used, the query result from Elasticsearch shows that another bucket exists in the same hierarchy as the root of the bucket, as shown below.

{
・
・
・
   "aggregations": {
      "Bucket name": {
         "buckets": [
            {
Bucket result
            }
         ]
      },
      "SUM BUCKET name": {
          "value": 1000.0
      }
   }
}

Results are displayed in tabular format only for "bucket name"

How to respond

To solve these problems -Modify the existing source of Redash or add an extension program to support the program. For the support of the extension program, please refer to "Extending Redash QueryRunner". ・ Create ISSUE and wait for update (or move by yourself)

Summary

Perhaps when Elasticsearch is used as the data source, Kibana is often selected, or it is rarely used because of the behavior of the problem that occurred (it will definitely be the query pattern).

"If your data source is Elasticsearch, don't use Redash." "If you use Redash, don't make your data source Elasticsearch, if you do, just a simple query."

I felt that it would be more efficient to realize it in this way.

Digression

I wondered if the result would change if Kibana was used as the data source, but the result was the same because it was only processed by the description of the query for Kibana.

The data source was not Kibana, and the query was just for Kibana.

Execute Query from the screen → QueryRunner → Kibana (Elasticsearch) → QueryRunner → Display the result in tabular format on the screen

Recommended Posts

Problems when using Elasticsearch as a data source in Redash
Tips for using ElasticSearch in a good way
Create a data collection bot in Python using Selenium
A story that stumbled when using pip in a proxy environment
A memo when creating a directed graph using Graphviz in Python
Obtain OTU (microorganism) count data as a text file using QIIME2
Scribble what I used when using ipython in formatting pos data
I get a can't set attribute when using @property in python
How to update a Tableau packaged workbook data source using Python
Try using Elasticsearch as the foundation of a question answering system
A memorandum when using beautiful soup
Precautions when using pit in Python
Try throwing a query in Redash
When using regular expressions in Python
When writing a program in Python
Check points when MIDI does not work in a program using SDL_mixer
A useful note when using Python for the first time in a while
Knowledge when making a bot using discord.py
Precautions when pickling a function in python
Precautions when using for statements in pandas
Get Youtube data in Python using Youtube Data API
Scraping a website using JavaScript in Python
Draw a tree in Python 3 using graphviz
SELECT data using client library in BigQuery
A memorandum of trouble when formatting data
Execute raw SQL using python data source with redash and display the result
Save the setting conditions as a CSV file using UDF Manager in OCTA