[PYTHON] It's Christmas, so let's draw the genealogy of Jesus Christ with CaboCha

Since it's Christmas, I'll draw the genealogy of Jesus Christ as given in the Gospel of Matthew.

Abraham's child is Isaac,
Isaac's child is Jacob,
Jacob's children are Judah and his brothers,
Judah's children are Perez and Zerah by Tamar,
Perez's child is Hezron,
Hezron's child is Ram,
Ram's child is Amminadab,
Amminadab's child is Nahshon,
Nahshon's child is Salmon,
Salmon's child is Boaz by Rahab,
Boaz's child is Obed by Ruth,
Obed's child is Jesse,
Jesse's child is David.

David's child is Solomon by Uriah's wife,
Solomon's child is Rehoboam,
Rehoboam's child is Abijah,
Abijah's child is Asa,
Asa's child is Jehoshaphat,
Jehoshaphat's child is Jehoram,
Jehoram's child is Uzziah,
Uzziah's child is Jotham,
Jotham's child is Ahaz,
Ahaz's child is Hezekiah,
Hezekiah's child is Manasseh,
Manasseh's child is Amon,
Amon's child is Josiah,
Josiah's children are Jeconiah and his brothers, at the time of the deportation to Babylon.

After the deportation to Babylon:
Jeconiah's child is Shealtiel,
Shealtiel's child is Zerubbabel,
Zerubbabel's child is Abiud,
Abiud's child is Eliakim,
Eliakim's child is Azor,
Azor's child is Zadok,
Zadok's child is Akim,
Akim's child is Eliud,
Eliud's child is Eleazar,
Eleazar's child is Matthan,
Matthan's child is Jacob,
Jacob's child is Joseph, the husband of Mary.

Preparation

Install CaboCha 0.68: http://qiita.com/mima_ita/items/161cd869648edb30627b

Install GraphViz and pydot: http://needtec.exblog.jp/20608517/

Create a user dictionary

The text is full of proper names that the default dictionary does not contain, so create a user dictionary that lets morphological analysis treat each name as a single noun. Create the following CSV.

bible.csv


Abraham,0,0,500,noun,General,*,*,*,*,Abraham,Abraham,Abraham
Isaac,0,0,500,noun,General,*,*,*,*,Isaac,Isaac,Isaac
Jacob,0,0,500,noun,General,*,*,*,*,Jacob,Jacob,Jacob
Judah,0,0,500,noun,General,*,*,*,*,Judah,Judah,Judah
Zerah,0,0,500,noun,General,*,*,*,*,Zerah,Zerah,Zerah
Perez,0,0,500,noun,General,*,*,*,*,Perez,Perez,Perez
Hezron,0,0,500,noun,General,*,*,*,*,Hezron,Hezron,Hezron
Ram,0,0,500,noun,General,*,*,*,*,Ram,Ram,Ram
Amminadab,0,0,500,noun,General,*,*,*,*,Amminadab,Amminadab,Amminadab
Nahshon,0,0,500,noun,General,*,*,*,*,Nahshon,Nahshon,Nahshon
Salmon,0,0,500,noun,General,*,*,*,*,Salmon,Salmon,Salmon
Boaz,0,0,500,noun,General,*,*,*,*,Boaz,Boaz,Boaz
Obed,0,0,500,noun,General,*,*,*,*,Obed,Obed,Obed
Jesse,0,0,500,noun,General,*,*,*,*,Jesse,Jesse,Jesse
David,0,0,500,noun,General,*,*,*,*,David,David,David
Solomon,0,0,500,noun,General,*,*,*,*,Solomon,Solomon,Solomon
Rehoboam,0,0,500,noun,General,*,*,*,*,Rehoboam,Rehoboam,Rehoboam
Abijah,0,0,500,noun,General,*,*,*,*,Abijah,Abijah,Abijah
Asa,0,0,500,noun,General,*,*,*,*,Asa,Asa,Asa
Jehoshaphat,0,0,500,noun,General,*,*,*,*,Jehoshaphat,Jehoshaphat,Jehoshaphat
Jehoram,0,0,500,noun,General,*,*,*,*,Jehoram,Jehoram,Jehoram
Uzziah,0,0,500,noun,General,*,*,*,*,Uzziah,Uzziah,Uzziah
Jotham,0,0,500,noun,General,*,*,*,*,Jotham,Jotham,Jotham
Ahaz,0,0,500,noun,General,*,*,*,*,Ahaz,Ahaz,Ahaz
Hezekiah,0,0,500,noun,General,*,*,*,*,Hezekiah,Hezekiah,Hezekiah
Manasseh,0,0,500,noun,General,*,*,*,*,Manasseh,Manasseh,Manasseh
Amon,0,0,500,noun,General,*,*,*,*,Amon,Amon,Amon
Josiah,0,0,500,noun,General,*,*,*,*,Josiah,Josiah,Josiah
Babylon,0,0,500,noun,General,*,*,*,*,Babylon,Babylon,Babylon
Jeconiah,0,0,500,noun,General,*,*,*,*,Jeconiah,Jeconiah,Jeconiah
Shealtiel,0,0,500,noun,General,*,*,*,*,Shealtiel,Shealtiel,Shealtiel
Zerubbabel,0,0,500,noun,General,*,*,*,*,Zerubbabel,Zerubbabel,Zerubbabel
Abiud,0,0,500,noun,General,*,*,*,*,Abiud,Abiud,Abiud
Eliakim,0,0,500,noun,General,*,*,*,*,Eliakim,Eliakim,Eliakim
Azor,0,0,500,noun,General,*,*,*,*,Azor,Azor,Azor
Zadok,0,0,500,noun,General,*,*,*,*,Zadok,Zadok,Zadok
Akim,0,0,500,noun,General,*,*,*,*,Akim,Akim,Akim
Eliud,0,0,500,noun,General,*,*,*,*,Eliud,Eliud,Eliud
Eleazar,0,0,500,noun,General,*,*,*,*,Eleazar,Eleazar,Eleazar
Matthan,0,0,500,noun,General,*,*,*,*,Matthan,Matthan,Matthan
Jacob,0,0,500,noun,General,*,*,*,*,Jacob,Jacob,Jacob
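
Every row follows the same pattern, so the file can also be generated instead of typed. This is a minimal sketch, assuming the column values shown above (context IDs 0, cost 500, and the name repeated in the last three columns). The helper names are hypothetical and not part of the original script, and skipping repeated names is a choice made for this sketch.

```python
import csv
import io


def dictionary_rows(names):
    """Yield one user-dictionary row per distinct name (hypothetical helper)."""
    seen = set()
    for name in names:
        if name in seen:
            continue  # skip repeats such as the second "Jacob"
        seen.add(name)
        # Column values copy the rows listed in the article.
        yield [name, 0, 0, 500, "noun", "General", "*", "*", "*", "*",
               name, name, name]


def write_dictionary_csv(names, stream):
    """Write the rows to any text stream, one line per name."""
    writer = csv.writer(stream, lineterminator="\n")
    for row in dictionary_rows(names):
        writer.writerow(row)


if __name__ == "__main__":
    buf = io.StringIO()
    write_dictionary_csv(["Abraham", "Isaac", "Jacob"], buf)
    print(buf.getvalue(), end="")
```

To produce the real file, pass a file object opened with the encoding that the indexing command expects as its input, for example open("bible.csv", "w", encoding="shift_jis", newline="").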

On Windows, running the following command at the command prompt compiles the CSV into a user dictionary named bible.dic.

"C:\Program Files (x86)\MeCab\bin\mecab-dict-index" -d"C:\Program Files (x86)\MeCab\dic\ipadic" -u bible.dic -f shift-jis -t utf-8 bible.csv

CaboCha uses the user dictionary during dependency analysis when you pass its path with the -u option.
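
Dependency analysis splits a sentence into chunks, and each chunk carries a link field holding the index of the chunk it depends on (-1 for the root). The script in the next section walks those links. The sketch below shows only that walk, using hand-written stand-in data rather than real CaboCha output; the Chunk tuple, the walk_chains function, and the example chunks are all hypothetical.

```python
from collections import namedtuple

# Stand-in for a CaboCha chunk: its text and the index of the chunk
# it depends on (-1 means "no head", i.e. the root of the sentence).
Chunk = namedtuple("Chunk", ["text", "link"])


def walk_chains(chunks):
    """Follow link from each unvisited chunk until a visited chunk or the root."""
    visited = {-1}
    chains = []
    for start in range(len(chunks)):
        if start in visited:
            continue
        chain = []
        cur = start
        while cur not in visited:
            visited.add(cur)
            chain.append(chunks[cur].text)
            cur = chunks[cur].link
        chains.append(chain)
    return chains


# Hand-written approximation of "Salmon's child is Boaz by Rahab".
example = [
    Chunk("Salmon's", 1),
    Chunk("child is", 3),
    Chunk("by Rahab", 3),
    Chunk("Boaz", -1),
]

if __name__ == "__main__":
    for chain in walk_chains(example):
        print(" -> ".join(chain))
```

The first chain runs from the parent through "child is" to the child's name, and the spouse chunk forms a short second chain that stops as soon as it reaches the already visited name.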

Code for creating a family tree

bible.py


#!/usr/bin/python
# -*- coding: utf-8 -*-
import CaboCha
import pydot
import uuid
import sys


class node:
    """
    Records one person in the family tree.
    """
    def __init__(self):
        self.name = None
        self.children_node = {}  # child name -> node, or None if not yet expanded
        self.pairs_node = {}     # spouse name -> None
        self.id = str(uuid.uuid4())

font_name = "ms ui gothic"

sentence = """
Abraham's child is Isaac,
Isaac's child is Jacob,
Jacob's children are Judah and his brothers,
Judah's children are Perez and Zerah by Tamar,
Perez's child is Hezron,
Hezron's child is Ram,
Ram's child is Amminadab,
Amminadab's child is Nahshon,
Nahshon's child is Salmon,
Salmon's child is Boaz by Rahab,
Boaz's child is Obed by Ruth,
Obed's child is Jesse,
Jesse's child is David.

David's child is Solomon by Uriah's wife,
Solomon's child is Rehoboam,
Rehoboam's child is Abijah,
Abijah's child is Asa,
Asa's child is Jehoshaphat,
Jehoshaphat's child is Jehoram,
Jehoram's child is Uzziah,
Uzziah's child is Jotham,
Jotham's child is Ahaz,
Ahaz's child is Hezekiah,
Hezekiah's child is Manasseh,
Manasseh's child is Amon,
Amon's child is Josiah,
Josiah's children are Jeconiah and his brothers, at the time of the deportation to Babylon.

After the deportation to Babylon:
Jeconiah's child is Shealtiel,
Shealtiel's child is Zerubbabel,
Zerubbabel's child is Abiud,
Abiud's child is Eliakim,
Eliakim's child is Azor,
Azor's child is Zadok,
Zadok's child is Akim,
Akim's child is Eliud,
Eliud's child is Eleazar,
Eleazar's child is Matthan,
Matthan's child is Jacob,
Jacob's child is Joseph, the husband of Mary.
"""


def get_word(tree, ix):
    surface = tree.token(ix).surface
    f = tree.token(ix).feature.split(",")
    return surface, f


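def create_node(tree, chunk):
    # Head word of the chunk (the content word)
    s1, f1 = get_word(tree, chunk.token_pos + chunk.head_pos)
    node_type = 0  # renamed from "type" to avoid shadowing the builtin
    if chunk.head_pos != chunk.func_pos:
        # Function word of the chunk (the trailing particle)
        s2, f2 = get_word(tree, chunk.token_pos + chunk.func_pos)
        # MeCab's IPADIC reports part-of-speech labels in Japanese;
        # the literals compared here are their English renderings.
        if f2[0] == 'Particle':
            if f2[1] == 'Attributive':  # "no" (of)
                node_type = 1
            elif f2[1] == 'Particle':  # "ha" (topic marker)
                node_type = 2
            elif f2[1] == 'Connection particle':  # "to" (and)
                node_type = 3
            elif f2[1] == 'Case particles':  # "ni yoru" (by)
                node_type = 4
    if f1[0] == 'noun' and f1[1] == 'suffix':
        # The head is a suffix, so prepend the token before it
        s1 = tree.token(chunk.token_pos + chunk.head_pos - 1).surface + s1
    return {
        'text': s1,
        'type': node_type
    }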


def find_child(t, child_name):
    """
    Return the node that lists child_name as a not-yet-expanded child,
    searching the tree depth-first, or None if there is no such node.
    """
    for key, item in t.children_node.items():
        if key == child_name and item is None:
            return t
        elif item is not None:
            ret = find_child(item, child_name)
            if ret:
                return ret
    return None


def dump_family(family_tree):
    """
    Print the family tree recursively.
    """
    print ('parent:', family_tree.name)
    for key, item in family_tree.pairs_node.items():
        print ('wife:', key)

    for key, item in family_tree.children_node.items():
        print ('Child:', key)
        if item:
            # dump_family prints as it goes and returns None,
            # so call it directly instead of printing its result
            dump_family(item)


def create_graph_children(graph, p_node, child):
    n = pydot.Node(child.id,
                   label=child.name,
                   style="filled",
                   fillcolor="green",
                   shape = "box",
                   fontname=font_name)
    subg = pydot.Subgraph('', rank='same')
    subg.add_node(n)
    graph.add_subgraph(subg)
    if p_node:
        graph.add_edge(pydot.Edge(p_node, n))

    for key, item in child.pairs_node.items():
        pair = pydot.Node(str(uuid.uuid4()),
                          label=key,
                          style="filled",
                          fillcolor="pink",
                          shape = "box",
                          fontname=font_name)
        subg.add_node(pair)
        graph.add_edge(pydot.Edge(n, pair))

    for key, item in child.children_node.items():
        if item:
            create_graph_children(graph, n, item)
        else:
            nchild = pydot.Node(str(uuid.uuid4()),
                                label=key,
                                style="filled",
                                fillcolor="gray",
                                shape = "box",
                                fontname=font_name)
            graph.add_node(nchild)
            graph.add_edge(pydot.Edge(n, nchild))


def create_graph(family_tree):
    graph = pydot.Dot(graph_type='digraph',
                      fontname=font_name)
    create_graph_children(graph, None, family_tree)
    graph.write_png('example2_graph.png')


def main(argvs, argc):
    c = CaboCha.Parser("-u bible.dic")
    lines = sentence.split("\n")
    family_tree = None
    for line in lines:
        tree = c.parse(line)
        data = node()
        if tree.chunk_size() == 0:
            continue
        ids = {-1: None}
        children = []
        pairs = []
        for i in range(tree.chunk_size()):
            curid = i
            if i in ids:
                continue
            chunk = tree.chunk(i)
            attr = None
            cnn_type = 0
            while True:
                d = create_node(tree, chunk)
                ids[curid] = d
                if data.name is None:
                    data.name = d['text']
                if cnn_type == 1:  #The previous chunk ends with "no"
                    if d['text'] == 'child':
                        attr = 'children'
                elif cnn_type == 2:  #The previous chunk ends with "ha"
                    if attr == 'children':
                        children.append(curid)
                elif cnn_type == 3:  #Same operation if previous chunk ends with "and"
                    if attr == 'children':
                        children.append(curid)
                else:
                    attr = None
                if d['type'] == 4:  #If the chunk ends with "by", it is considered a spouse
                    pairs.append(i)
                else:
                    cnn_type = d['type']
                if chunk.link not in ids:
                    curid = chunk.link
                    chunk = tree.chunk(curid)
                else:
                    break
        for child in children:
            data.children_node[ids[child]['text']] = None

        for pair in pairs:
            data.pairs_node[ids[pair]['text']] = None

        if len(children) > 0:
            if family_tree is None:
                family_tree = data
            else:
                node_obj = find_child(family_tree, data.name)
                if node_obj:
                    node_obj.children_node[data.name] = data
    if family_tree:
        dump_family(family_tree)
        create_graph(family_tree)


if __name__ == '__main__':
    argvs = sys.argv
    argc = len(argvs)
    sys.exit(main(argvs, argc))

Running the script prints the tree to the console and writes the family tree diagram to example2_graph.png.
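
To see what the drawing step produces without installing pydot, the same layout can be written out as plain GraphViz DOT text. This is only a sketch under assumptions: the Node class is a simplified stand-in for the script's node class, to_dot is a hypothetical function, and sequential IDs replace the UUIDs so the output is reproducible. The colors follow the script: green for expanded people, pink for spouses, gray for children with no descendants recorded.

```python
import itertools


class Node:
    """Simplified stand-in for the script's node class."""
    def __init__(self, name):
        self.name = name
        self.children_node = {}  # child name -> Node, or None for a leaf
        self.pairs_node = {}     # spouse name -> None


def to_dot(root):
    """Return DOT source for the tree rooted at root (hypothetical helper)."""
    counter = itertools.count()
    lines = ["digraph {", "  node [shape=box, style=filled];"]

    def emit(label, color):
        node_id = "n%d" % next(counter)
        lines.append('  %s [label="%s", fillcolor=%s];' % (node_id, label, color))
        return node_id

    def visit(node, parent_id):
        node_id = emit(node.name, "green")
        if parent_id is not None:
            lines.append("  %s -> %s;" % (parent_id, node_id))
        for spouse in node.pairs_node:
            spouse_id = emit(spouse, "pink")
            # Keep each spouse on the same rank as the person
            lines.append("  { rank=same; %s; %s; }" % (node_id, spouse_id))
            lines.append("  %s -> %s;" % (node_id, spouse_id))
        for child_name, child in node.children_node.items():
            if child is not None:
                visit(child, node_id)
            else:
                leaf_id = emit(child_name, "gray")
                lines.append("  %s -> %s;" % (node_id, leaf_id))

    visit(root, None)
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    salmon = Node("Salmon")
    salmon.pairs_node["Rahab"] = None
    boaz = Node("Boaz")
    boaz.pairs_node["Ruth"] = None
    boaz.children_node["Obed"] = None
    salmon.children_node["Boaz"] = boaz
    print(to_dot(salmon))
```

Saving the output as tree.dot and running dot -Tpng tree.dot -o tree.png renders it with GraphViz.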

If you write in a similar style, you can probably create various family trees.

For now, the data structure lets one person have several spouses, to keep up with ancestors who were rather vigorous in their mating activities. It does not, however, record which spouse each child belongs to.
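
One possible way to close that gap, shown here only as a sketch and not as the script's actual behavior, is to store a list of child names under each spouse instead of None. FamilyNode and add_child are hypothetical names; children_node and pairs_node mirror the fields of the script's node class.

```python
class FamilyNode:
    """Hypothetical variant of the node class that links children to spouses."""
    def __init__(self, name):
        self.name = name
        self.children_node = {}  # child name -> FamilyNode or None, as in the script
        self.pairs_node = {}     # spouse name -> list of child names (was None)

    def add_child(self, child_name, spouse=None):
        """Register a child and, if known, the spouse it belongs to."""
        self.children_node[child_name] = None
        if spouse is not None:
            self.pairs_node.setdefault(spouse, []).append(child_name)


if __name__ == "__main__":
    judah = FamilyNode("Judah")
    judah.add_child("Perez", spouse="Tamar")
    judah.add_child("Zerah", spouse="Tamar")
    for spouse, children in judah.pairs_node.items():
        print(spouse, "->", ", ".join(children))
```

The parsing loop would then need to remember which "by" chunk accompanies which child name, which the current script does not do.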
