I wrote Python code to create a table (view) dependency diagram (PlantUML) from SQL

What is this?

A table (view) dependency diagram like this ...

UML.png

↓ It is a story that I wrote Python code to output from a SQL group like this. (PlantUML was used to create the figure)

CREATE TABLE
  `project.dataset.table5` AS
SELECT
  *
FROM
  `project.dataset.table1`;
CREATE TABLE IF NOT EXISTS
  `project.dataset.table6` AS
SELECT
  *
FROM
  `project.dataset.table2`
UNION ALL
SELECT
  *
FROM
  `project.dataset.table3`;
CREATE VIEW
  `project.dataset.table7` AS
SELECT
  *
FROM
  `project.dataset.table3`
INNER JOIN
  `project.dataset.table4`
USING
  (user_id);

What is PlantUML

According to wikipedia

PlantUML is a text-based language for creating open source UML diagrams

That's right. Various model diagrams can be created. Here are the sites that I used to learn from.

--Learn basic syntax and render diagrams - https://plantuml.com/ja/ --A personally developed app that was very easy to use - http://sujoyu.github.io/plantuml-previewer/

How to create a dependency diagram

Do the following in Python3:

sql1 = '''CREATE TABLE
  `project.dataset.table5` AS
SELECT
  *
FROM
  `project.dataset.table1`;
'''

sql2 = '''CREATE TABLE IF NOT EXISTS
  `project.dataset.table6` AS
SELECT
  *
FROM
  `project.dataset.table2`
UNION ALL
SELECT
  *
FROM
  `project.dataset.table3`;
'''

sql3 = '''CREATE VIEW
  `project.dataset.table7` AS
SELECT
  *
FROM
  `project.dataset.table3`
INNER JOIN
  `project.dataset.table4`
USING
  (user_id);
'''

import re
prog_destination = re.compile(r'(?:CREATE TABLE|CREATE TABLE IF NOT EXISTS|CREATE VIEW|CREATE VIEW IF NOT EXISTS|INSERT INTO|INSERT)[\s \n]+`(.+?)`')
prog_origin = re.compile(r'(?:FROM|JOIN)[\s \n]+`(.+?)`')

platuml_tempate = '''@startuml
skinparam padding 10 /'Padding adjustment'/
left to right direction /'If you want a layout that extends the diagram from left to right'/
hide members /'Erase class attributes'/
hide circle /'Remove the class mark'/
{}
@enduml
'''


def make_table_dependencies_for_platuml(sql:str):
    #Table to create(View)Get
    if len(prog_destination.findall(sql)) != 1:
        raise Exception('CREATE TABLE|The INSERT INTO clause does not exist.')
    else:
        destination_table = prog_destination.findall(sql)[0]

    #Get referrer table excluding self-reference
    origin_tables = [table for table in prog_origin.findall(sql) if table != destination_table]
    if len(origin_tables) == 0:
        raise Exception('Dependency does not exist')

    return [F'"{table}" <|-- "{destination_table}"' for table in origin_tables]

#When you want to output PlantUML for each sql
# print(platuml_tempate.format('\n'.join(make_table_dependencies_for_platuml(sql1))))
# print(platuml_tempate.format('\n'.join(make_table_dependencies_for_platuml(sql2))))
# print(platuml_tempate.format('\n'.join(make_table_dependencies_for_platuml(sql3))))

all_sql_dependencies = [make_table_dependencies_for_platuml(sql) for sql in [sql1, sql2, sql3]]
print(platuml_tempate.format('\n'.join(sum(all_sql_dependencies, []))))

Then, the following result will be output. Let's render the figure at the link destination introduced earlier. (The relationship diagram that appears at the beginning is drawn.)

@startuml
skinparam padding 10 /'Padding adjustment'/
left to right direction /'If you want a layout that extends the diagram from left to right'/
hide members /'Erase class attributes'/
hide circle /'Remove the class mark'/
"project.dataset.table1" <|-- "project.dataset.table5"
"project.dataset.table2" <|-- "project.dataset.table6"
"project.dataset.table3" <|-- "project.dataset.table6"
"project.dataset.table3" <|-- "project.dataset.table7"
"project.dataset.table4" <|-- "project.dataset.table7"
@enduml

Why i wrote this code

I'm doing data analysis work, but I often process a table with BigQuery to create another table, and then process that table to create a table. Especially with ad hoc analysis, it becomes difficult to understand the relationships between tables. I think that it will be useful for understanding table relations and whether you are making useless intermediate tables.

↓ Inflated table dependencies uml_table.png

reference

--The idea of grasping the dependency between tables from SQL and regular expressions were obtained from the following article. As expected it is ZOZO: pray: -Introduction of data mart construction platform managed by GitHub --ZOZO Technologies TECH BLOG

Other

--You can also use GraphViz & PyDot.

Finally

We are challenging the Advent calendar this year as well, so please have a look. https://qiita.com/advent-calendar/2019/sensy

Recommended Posts

I wrote Python code to create a table (view) dependency diagram (PlantUML) from SQL
How to create a kubernetes pod from python code
I made a tool to automatically generate a simple ER diagram from the CREATE TABLE statement
I wrote a code to convert quaternions to z-y-x Euler angles in Python
Edit Excel from Python to create a PivotTable
I want to create a window in Python
[python] Create table from pandas DataFrame to postgres
I made a package to create an executable file from Hy source code
I tried to create a table only with Django
Python script to create a JSON file from a CSV file
I tried to create API list.csv in Python from swagger.yaml
I wrote a program quickly to study DI with Python ①
I want to start a lot of processes from python
I made a tool to create a word cloud from wikipedia
I want to send a message from Python to LINE Bot
[Python] I wrote a test of "Streamlit" that makes it easy to create visualization applications.
[Python] How to create a table from list (basic operation of table creation / change of matrix name)
I wrote a script to create a Twitter Bot development environment quickly with AWS Lambda + Python 2.7
Create folders from '01' to '12' with python
Try to create a python environment with Visual Studio Code & WSL
I tried to create a list of prime numbers with python
[Python] How to get & change rows / columns / values from a table.
I wrote the code to write the code of Brainf * ck in python
Simple code to call a python program from Javascript on EC2
I wrote a function to load a Git extension script in Python
I wrote a script to extract a web page link in Python
I want to make a parameter list from CloudFormation code (yaml)
5 Ways to Create a Python Chatbot
I made a plugin to generate Markdown table from csv in Vim
I was addicted to creating a Python venv environment with VS Code
[Python] I wrote a simple code that automatically generates AA (ASCII art)
[Python memo] I want to get a 2-digit hexadecimal number from a decimal number
Create a tool to automatically furigana with html using Mecab from Python3
Steps to create a Python virtual environment with VS Code on Windows
I want to convert a table converted to PDF in Python back to CSV
[Outlook] I tried to automatically create a daily report email with Python
I tried to create a sample to access Salesforce using Python and Bottle
I tried to create a linebot (implementation)
How to create a clone from Github
I tried to create a linebot (preparation)
Send a message from Python to Slack
I want to use jar from python
I want to build a Python environment
How to create a repository from media
I want to create a nice Python development environment for my new Mac
I tried to create a class that can easily serialize Json in Python
I made a library to operate AWS CloudFormation stack from CUI (Python Fabric)
I want to create a priority queue that can be updated in Python (2.7)
When I tried to create a virtual environment with Python, it didn't work
I tried to automatically generate a port management table from Config of L2SW
I tried to easily create a fully automatic attendance system with Selenium + Python
[IBM Cloud] I tried to access the Db2 on Cloud table from Cloud Funtions (python)
Create a setting in terraform to send a message from AWS Lambda Python3.8 to Slack
A note I was addicted to when running Python with Visual Studio Code
Send a message from Slack to a Python server
[Python] List Comprehension Various ways to create a list
I want to easily create a Noise Model
How to create a Python virtual environment (venv)
I wanted to create a dll to use a function written in C from Python with ctypes, but I had a hard time
How to open a web browser from python
I tried to make a function to retrieve data from database column by column using sql with sqlite3 of python [sqlite3, sql, pandas]