[PYTHON] I tried playing with PartiQL and MongoDB connected

Prerequisites

This article describes what I researched about PartiQL. I think that there are deficiencies such as omissions in the content, so in that case I would appreciate it if you could point out with Edit Request or comment. This article was written as of November 7, 2019. PartiQL has just been announced, and I think that the implementation will continue to progress and the description may not be appropriate, so please forgive that point.

What is PartiQL

PartiQL is a SQL compatible language published by Amazon as Open Source. As a feature, it can have a wide range of DB forms to be controlled, and it seems that it can handle nested data such as JSON in addition to RDB.

For the actual operation, please refer to Tutorial. If you know SQL quite well, you can operate it intuitively. At this point, the Update / delete / Insert statement doesn't seem to be implemented.

I tried connecting with MongoDB.

Here is the main issue. This PartiQL uses PartQL instead of adding its own language as a query language for the API when adding search and statistical processing functions in the service ([SPL] in Splunk (https://docs.splunk.com) /Documentation/Splunk/8.0.0/SearchReference/UnderstandingSPLsyntax)) There seems to be a use case. So, although it is still an alpha version, I connected it to MongoDB and actually tried it.

First, install PartiQL. If you look at Getting Start on the official website, you can install it without any problems. In addition, JRE is required as a prerequisite. This has been confirmed to work with OpenJRE9.

You can try PartiQL in REPL format by executing the command and interactively executing the user's input.

$ ./bin/partiql
Welcome to the PartiQL REPL!
PartiQL> 

However, it seems that no further implementation has been done at this stage, and there is no particular connector with MongoDB.

Therefore, in this article, we will use Python to create a PoC that outputs the result of executing partiql from the Subprocess module, using the PartiQL input query from the user and the data acquired from the DB in the Pymongo module as arguments.

For those who want to know only the results below

The following is the actual PoC prototype. It takes information from MongoDB and displays the result of PariQL query search. (No error handling is performed)

partiQLCtl.py


import subprocess
import json
import os
import pymongo

class PartiQLCtl():
    def __init__(self, address=None, port=None):
        # address,If there is no description of port, search by direct input of search source data JSON
        if (address is not None) and (port is not None):
            self.client = pymongo.MongoClient(address, port)
        else:
            self.client = None

    def load_json_data(self, json_data):
        #Search source data JSON Function for direct input
        self.env_data = self.convert_json_to_partiql(json_data)
        print(self.env_data)

    def load_db_data(self, db, collection):
        #Extract search source data from MongoDB
        cur = self.client[db][collection].find({})

        #DB output result is data['record']Save as list below
        data = {"record": []}
        for record in cur:
            # _id String processing
            record['_id'] = str(record['_id'])
            data["record"].append(record)

        self.env_data = self.convert_json_to_partiql(data)
        
    def execute(self, query):
        #Function for executing a query

        # -Dummy data for the i option. It is necessary to create a file in advance. The contents are"{}"Only the two-letter description of
        ion_file_path = os.path.join(os.path.dirname(__file__), '../tmp/tmp.ion')
        # -Source data for the e option. Create a file in this script. Change location as needed
        env_file_path = os.path.join(os.path.dirname(__file__), '../tmp/env.ion')
        #Executable file. Change location as needed
        partiql_execute_path = os.path.join(os.path.dirname(__file__),'../dist/partiql/bin/partiql')

        with open(env_file_path, 'w') as f:
            f.write(self.env_data)

        res = subprocess.check_output([partiql_execute_path, '-q', query, '-i', ion_file_path, '-e', env_file_path])
        return res
    
    def convert_json_to_partiql(self, json):
        #A function that changes JSON to the data format for PartiQL. list/dict/boolean/int/Supports str type
        if type(json) is list:
            env = "<<"
            for idx, elem in enumerate(json):
                if (type(elem) is dict) or (type(elem) is list):
                    env += (self.convert_json_to_partiql(elem))
                elif type(elem) == str:
                    env += "'{}'".format(elem)
                elif elem is None:
                    env += "null"
                elif elem is True:
                    env += "true"
                elif elem is False:
                    env += "false"
                else:
                    env += str(elem)

                if idx != len(json) - 1:
                    env += ', '
                
            env += '>>'
        elif type(json) is dict:
            env = '{'
            for idx, elem in enumerate(json.keys()):
                if (type(json[elem]) is dict) or (type(json[elem]) is list):
                    env += "'{}': {}".format(elem, self.convert_json_to_partiql(json[elem]))
                elif type(json[elem]) == str:
                    env += "'{}': '{}'".format(elem, json[elem])
                elif json[elem] is None:
                    env += "'{}': null".format(elem)
                elif json[elem] is True:
                    env += "'{}': true".format(elem)
                elif json[elem] is False:
                    env += "'{}': false".format(elem)
                else:
                    env += "'{}': {}".format(elem, str(json[elem]))
                
                if idx != len(json.keys()) - 1:
                    env += ', '
            env += '}'

        return env


if __name__ == '__main__':
    pql = PartiQLCtl("192.168.1.10", 27017)
    pql.load_db_data("test", "test")
    print(pql.execute("select r.id from record r"))

In addition, the following data is stored in MongoDB in advance.

> use test
switched to db test
> db.test.insert({"id": "aa", "setting": [{"config1": "hoge", "config2": "fuga"}]})
WriteResult({ "nInserted" : 1 })
> db.test.insert({"id": "bb", "setting": [{"config1": "hoge2", "config2": "fuga2"}]})

The execution result is as follows.

$ python partiQlCtl.py 
b"<<\n  {\n    'id': 'aa'\n  },\n  {\n    'id': 'bb'\n  }\n>>"

Try to see that the display data is changed by changing the last pql.execute argument (PartiQl query).

Trial and error until PoC creation

When you execute the help of partiql, it seems that you can also execute it as a command other than REPL as shown below.

$ ./bin/partiql -h
PartiQL CLI
Command line interface for executing PartiQL queries. Can be run in an interactive (REPL) mode or non-interactive.

Examples:
To run in REPL mode simply execute the executable without any arguments:
     partiql

In non-interactive mode we use Ion as the format for input data which is bound to a global variable 
named "input_data", in the example below /logs/log.ion is bound to "input_data":
     partiql --query="SELECT * FROM input_data" --input=/logs/log.ion

The cli can output using PartiQL syntax or Ion using the --output-format option, e.g. to output binary ion:
     partiql --query="SELECT * FROM input_data" --output-format=ION_BINARY --input=/logs/log.ion

To pipe input data in via stdin:
     cat /logs/log.ion | sqlcli --query="SELECT * FROM input_data" --format=ION_BINARY > output.10n

Option                                Description                                                
------                                -----------                                                
-e, --environment <File>              initial global environment (optional)                      
-h, --help                            prints this help                                           
-i, --input <File>                    input file, requires the query option (default: stdin)     
-o, --output <File>                   output file, requires the query option (default: stdout)   
--of, --output-format <OutputFormat:  output format, requires the query option (default: PARTIQL)
  (ION_TEXT|ION_BINARY|PARTIQL)>                                                                 
-q, --query <String>                  PartiQL query, triggers non interactive mode 

It seems that you can execute a query search on the shell by executing the following command.

partiql --query="SELECT * FROM input_data" --input=/logs/log.ion

There is a statement here that --input can only use the ION format. For those who don't know here, ION format is a data serialization format created by Amazon, and is a JSON-like Text format description. Besides, it is written in binary format. Module in Python is also provided, so you can convert JSON ⇔ ION by using this.

>>> import amazon.ion.simpleion as ion
>>> 
>>> obj = ion.loads('{"id" : "aa", "setting" : [ { "config1" : "hoge", "config2" : "fuga" } ] }')
'$ion_1_0 {id:"aa",setting:[{config1:"hoge",config2:"fuga"}]}'

When you actually save the output result as test.ion and execute it, it will be as follows.

$ cat test.ion 
$ion_1_0 {id:"aa",setting:[{config1:"hoge",config2:"fuga"}]}
$ ./bin/partiql -q "select * from input_data" -i test.ion 
<<
  {
    'id': 'aa',
    'setting': [
      {
        'config1': 'hoge',
        'config2': 'fuga'
      }
    ]
  }
>>

However, when I tried a few things here, when I entered nested data in the from clause, an unexpected event occurred in which the output result was blank. Here, I have given up on pursuing this method.

$ ./bin/partiql -q "select * from input_data.setng" -i test.ion  
<<
  {}
>>

However, exchanging the output result of REPL by connecting PIPE with Python's Subprocess Popen is not very clean as a code, so we will consider another method.

There is a method to input the information of the search source of the input here with the -e option other than the -i option. (In this case, the data to be input will be PartiQL original format data, so it will be necessary to convert to JSON⇔PariQL original format.) However, with the -q option and -e option, input information is waited for from standard input, so when executed, it will be in the user input waiting state. Finally, I achieved the original purpose by having the -i option read dummy data as a dummy.

Recommended Posts

I tried playing with PartiQL and MongoDB connected
I tried playing with the image with Pillow
I tried Jacobian and partial differential with python
I tried function synthesis and curry with python
I tried playing mahjong with Python (single mahjong edition)
I tried playing with the calculator on tkinter
[Introduction to AWS] I tried porting the conversation app and playing with text2speech @ AWS ♪
I tried to read and save automatically with VOICEROID2 2
I tried to implement and learn DCGAN with PyTorch
I tried to automatically read and save with VOICEROID2
I tried to implement Grad-CAM with keras and tensorflow
[Introduction to AWS] I tried playing with voice-text conversion ♪
I tried scraping with Python
I tried clustering with PyCaret
I tried gRPC with Python
I tried scraping with python
I tried to predict and submit Titanic survivors with Kaggle
I tried connecting Raspberry Pi and conect + with Web API
[Introduction to infectious disease model] I tried fitting and playing ♬
I tried to make GUI tic-tac-toe with Python and Tkinter
I displayed the chat of YouTube Live and tried playing
I tried trimming efficiently with OpenCV
I tried machine learning with liblinear
I tried web scraping with python.
I tried moving food with SinGAN
I tried implementing DeepPose with PyTorch
I tried face detection with MTCNN
I played with PyQt5 and Python3
I tried running prolog with python 3.8.2.
I tried SMTP communication with Python
I tried sentence generation with GPT-2
I tried learning LightGBM with Yellowbrick
I tried face recognition with OpenCV
I tried to visualize bookmarks flying to Slack with Doc2Vec and PCA
[System trade] I tried playing with Stochastic Oscillator by decomposing with python ♬
I tried natural number expression and arithmetic processing only with list processing
I tried to make a periodical process with Selenium and Python
I tried using PyEZ and JSNAPy. Part 4: Automate ISP setup with PyEZ and JSNAPy
I tried to create Bulls and Cows with a shell program
I tried to easily detect facial landmarks with python and dlib
I tried multiple regression analysis with polynomial regression
I tried sending an SMS with Twilio
I tried using Amazon SQS with django-celery
I tried to implement Autoencoder with TensorFlow
I tried linebot with flask (anaconda) + heroku
I tried to visualize AutoEncoder with TensorFlow
I installed DSX Desktop and tried it
I tried to get started with Hy
I tried to express sadness and joy with the stable marriage problem.
I tried scraping Yahoo News with Python
I tried playing a ○ ✕ game using TensorFlow
I tried using Selenium with Headless chrome
I tried factor analysis with Titanic data!
I tried learning with Kaggle's Titanic (kaggle②)
I tried sending an email with python.
I tried non-photorealistic rendering with Python + opencv
I tried to convert datetime <-> string with tzinfo using strftime () and strptime ()
I tried Linux (CentOS 7) life and death monitoring (Ping) with monitoring server Zabbix
I tried a functional language with Python
I tried to learn the angle from sin and cos with chainer
I tried batch normalization with PyTorch (+ note)