[PYTHON] How to use JDBC driver with Redash

Redash supports many data sources by default. Furthermore, even if some data sources are not supported, you can add new data sources by writing query_runner in Python. You can write query_runner straightforwardly if you have a Python library to connect to the target data source, but some DBs may only provide JDBC drivers. Since the JDBC driver can only be called from the JVM language, it cannot be directly incorporated into redash's query_runner. So I devised a way to incorporate the JDBC driver into redash, and this article will show you how to do it.

Library to use

This time it is a library for calling JDBC driver from Python. Use JayDeBeApi. https://github.com/baztian/jaydebeapi

This library uses JPype under the CPython environment to bridge between Python and Java. https://github.com/jpype-project/jpype

There is Py4J in the library for calling Java from Python, and there is an introductory article on Qiita, but in comparison, JPype seems to be lighter because it does not need to start a Java process.

https://www.py4j.org/ https://qiita.com/riverwell/items/e90cbbfdac439e6e9d30

Incorporate JDBC driver into redash

This time we will embed the JDBC driver in redash v5.0.2. The JDBC driver to be included is from MySQL, and you can download it from the following URL. https://dev.mysql.com/downloads/connector/j/5.1.html

Since redash supports MySQL as standard, there is no need to forcibly incorporate the JDBC driver, but since it is a widely popular DB, it is used for explanation.

jdbc.py Next, write query_runner. How to write query_runner is well organized in the following articles. https://discuss.redash.io/t/creating-a-new-query-runner-data-source-in-redash/347

I created the following query_runner by referring to the above article.

import json
import logging
import os
import jaydebeapi

from redash.query_runner import *
from redash.settings import parse_boolean
from redash.utils import JSONEncoder

logger = logging.getLogger(__name__)
types_map = {
    jaydebeapi.STRING: TYPE_STRING,
    jaydebeapi.TEXT: TYPE_STRING,
    jaydebeapi.NUMBER: TYPE_INTEGER,
    jaydebeapi.FLOAT: TYPE_FLOAT,
    jaydebeapi.DECIMAL: TYPE_FLOAT,
    jaydebeapi.DATE: TYPE_DATE,
    jaydebeapi.DATETIME: TYPE_DATETIME,
    jaydebeapi.ROWID: TYPE_STRING,
}

class Jdbc(BaseSQLQueryRunner):
    noop_query = "SELECT 1"

    @classmethod
    def configuration_schema(cls):
        schema = {
            'type': 'object',
            'properties': {
                'host': {
                    'type': 'string',
                },
                'port': {
                    'type': 'number',
                    'default': 3306,
                },
                'user': {
                    'type': 'string'
                },
                'password': {
                    'type': 'string',
                },
                'database': {
                    'type': 'string',
                },
            },
            "order": ['host', 'port', 'user', 'password', 'database'],
            'required': ['host', 'port', 'user', 'database'],
            'secret': ['password']
        }

        return schema

    @classmethod
    def name(cls):
        return "jdbc"

    @classmethod
    def enabled(cls):
        try:
            import jaydebeapi
        except ImportError:
            return False

        return True

    def _get_tables(self, schema):
        query = """
        SELECT col.table_schema,
               col.table_name,
               col.column_name
        FROM `information_schema`.`columns` col
        WHERE col.table_schema NOT IN ('information_schema', 'performance_schema', 'mysql');
        """

        results, error = self.run_query(query, None)
        if error is not None:
            raise Exception("Failed getting schema.")

        results = json.loads(results)

        for row in results['rows']:
            if row['TABLE_SCHEMA'] != self.configuration['database']:
                table_name = u'{}.{}'.format(row['TABLE_SCHEMA'], row['TABLE_NAME'])
            else:
                table_name = row['TABLE_NAME']

            if table_name not in schema:
                schema[table_name] = {'name': table_name, 'columns': []}

            schema[table_name]['columns'].append(row['COLUMN_NAME'])

        return schema.values()

    def run_query(self, query, user):
        import jaydebeapi

        connection = None
        try:
            host = self.configuration.get('host', '')
            port = self.configuration.get('port', '')
            user = self.configuration.get('user', '')
            password = self.configuration.get('password', '')
            database = self.configuration.get('database', '')

            jclassname = 'com.mysql.jdbc.Driver'
            url = 'jdbc:mysql://{}:{}/{}'.format(host, port, database)
            driver_args = {'user': user, 'password': password}
            jar_path = '/app/redash/query_runner/mysql-connector-java-5.1.47.jar'
            connection = jaydebeapi.connect(jclassname, url, driver_args, jar_path)

            cursor = connection.cursor()
            logger.info("JDBC running query: %s", query)
            cursor.execute(query)

            data = cursor.fetchall()

            if cursor.description is not None:
                columns = self.fetch_columns([(i[0], types_map.get(i[1], None)) for i in cursor.description])
                rows = [dict(zip((c['name'] for c in columns), row)) for row in data]

                data = {'columns': columns, 'rows': rows}
                json_data = json.dumps(data, cls=JSONEncoder)
                error = None
            else:
                json_data = None
                error = "No data was returned."

            cursor.close()
        except jaydebeapi.Error as e:
            json_data = None
            error = str(e.message)
        except KeyboardInterrupt:
            cursor.close()
            error = "Query cancelled by user."
            json_data = None
        finally:
            if connection:
                connection.close()

        return json_data, error

register(Jdbc)

I think that most of the processing except configuration_schema can be used in common regardless of which JDBC driver is used.

Arrangement of various files

Place the jdbc.py created in the above procedure and the JDBC driver jar file in redash / query_runner.

Dockerfile Finally, create a Docker container to run Redash. Since JRE is required for JPype to work, patch the redash standard Dockerfile. Also, create requirements.txt for installing jaydebeapi.


FROM redash/base:latest

#Added JRE installation
RUN apt-get update && apt-get install -y openjdk-8-jre

COPY redash/requirements.txt redash/requirements_dev.txt redash/requirements_all_ds.txt requirements_jaydebeapi.txt ./
RUN pip install -r requirements.txt -r requirements_dev.txt -r requirements_all_ds.txt -r requirements_jaydebeapi.txt

COPY . ./
RUN npm install && npm run build && rm -rf node_modules
RUN chown -R redash /app
USER redash

ENTRYPOINT ["/app/bin/docker-entrypoint"]
CMD ["server"]

requirements.txt


JayDeBeApi==1.1.1
JPype1==0.6.3

Start redash

At this point, you can start redash. In addition, in order to load the added query_runner into redash, it is necessary to add redash.query_runner.jdbc to the REDASH_ADDITIONAL_QUERY_RUNNERS environment variable and start it.

If you start redash and look at the DataSource screen, you can see that a data source called JDBC has been added.

スクリーンショット 2018-11-29 14.28.28.png

At the moment, it looks a bit disappointing because there is no icon image showing the data source. To put an image in this part, place the image jdbc.png in client / app / assets / images / db-logos and build the container again.

Execute a query

Now that we have registered the JDBC driver as query_runner, let's run the query.

スクリーンショット 2018-11-29 14.30.45.png

Summary

By using JayDeBeApi I was able to use the JDBC driver with redash. With this, the variety of redash data sources has expanded, and you can have a wonderful BI life.

Recommended Posts

How to use JDBC driver with Redash
Python: How to use async with
How to use virtualenv with PowerShell
How to use FTP with Python
How to use ManyToManyField with Django's Admin
How to use OpenVPN with Ubuntu 18.04.3 LTS
How to use Cmder with PyCharm (Windows)
How to use Ass / Alembic with HtoA
How to use Japanese with NLTK plot
How to use jupyter notebook with ABCI
How to use SQLAlchemy / Connect with aiomysql
How to use xml.etree.ElementTree
How to use virtualenv
How to use Seaboan
How to use image-match
How to use shogun
How to use Pandas 2
How to use Virtualenv
How to use numpy.vectorize
How to use pytest_report_header
How to use partial
How to use SymPy
How to use x-means
How to use WikiExtractor.py
How to use IPython
How to use virtualenv
How to use Matplotlib
How to use iptables
How to use numpy
How to use TokyoTechFes2015
How to use venv
How to use dictionary {}
How to use Pyenv
How to use list []
How to use python-kabusapi
How to use OptParse
How to use dotenv
How to use pyenv-virtualenv
How to use imutils
How to use import
How to use GCP trace with open Telemetry
How to use tkinter with python in pyenv
How to use search sorted
[gensim] How to use Doc2Vec
Understand how to use django-filter
How to use the generator
[Python] How to use list 1
How to use FastAPI ③ OpenAPI
How to use python interactive mode with git bash
How to use Python argparse
How to use IPython Notebook
How to update with SQLAlchemy?
How to use Pandas Rolling
[Note] How to use virtualenv
How to use redis-py Dictionaries
Python: How to use pydub
[Python] How to use checkio
[Go] How to use "... (3 periods)"
How to Alter with SQLAlchemy?
Autoencoder with Chainer (Notes on how to use + trainer)
How to use Django's GeoIp2