[PYTHON] 100 Language Processing Knock-44: Visualization of Dependent Tree

This is a record of the 44th task, "Visualization of Dependency Trees", from ["Chapter 5: Dependency analysis"](http://www.cl.ecei.tohoku.ac.jp/nlp100/#ch5) of the 2015 edition of the 100 Language Processing Knocks. Visualization makes it very easy to see how the clauses of a sentence depend on each other. By visualizing dependencies, you can also do fun things like the article "I tried to linguistically analyze Karen Takizawa's incomprehensible sentences.".

Reference links

| Link | Remarks |
|------|---------|
| 044. Visualization of dependent trees.ipynb | Answer program (GitHub link) |
| 100 amateur language processing knocks: 44 | Source from which I copied and pasted many parts |
| CaboCha official | CaboCha page to check first |

Environment

I installed CRF++ and CaboCha so long ago that I have forgotten how to install them. Since these packages have not been updated in a long time, I did not rebuild the environment. I only have frustrating memories of trying to use CaboCha on Windows; I believe I could not get it to work on 64-bit Windows (my memory is vague, and it may have been a technical limitation).

| Type | Version | Contents |
|------|---------|----------|
| OS | Ubuntu 18.04.01 LTS | Running virtually |
| pyenv | 1.2.16 | I use pyenv because I sometimes use multiple Python environments |
| Python | 3.8.1 | Python 3.8.1 on pyenv; packages are managed with venv |
| MeCab | 0.996-5 | Installed with apt-get |
| CRF++ | 0.58 | Too old; I forgot how to install it (probably `make install`) |
| CaboCha | 0.69 | Too old; I forgot how to install it (probably `make install`) |

In the environment above, I also use the following Python package. Just install it with regular pip.

| Type | Version |
|------|---------|
| pydot | 1.4.1 |

Chapter 5: Dependency analysis

Content of study

Apply the dependency parser CaboCha to "I am a cat" and experience working with dependency trees and syntactic parsing.

Class, Dependency Parsing, CaboCha, Clause, Dependency, Case, Functional Verb Parsing, Dependency Path, [Graphviz](http://www.graphviz.org/)

Knock content

Apply CaboCha to the text (neko.txt) of Natsume Soseki's novel "I am a cat" to parse its dependencies, and save the result to a file called neko.txt.cabocha. Use this file to implement programs that address the following questions.

44. Visualization of the dependency tree

Visualize the dependency tree of a given sentence as a directed graph. For visualization, convert the dependency tree into the DOT language and use [Graphviz](http://www.graphviz.org/). Also, to visualize directed graphs directly from Python, use pydot.

Problem supplement (about "visualization" and "directed graphs")

Visualization

The knock mentions two ways to do the visualization. I ignored the first method; I never even checked how easy it is, since it was not used in "Amateur language processing 100 knocks: 44", which I always refer to.

For visualization, convert the dependency tree into the DOT language and use [Graphviz](http://www.graphviz.org/).

This time I used the second method, quoted below. With it, all you have to do is install pydot with pip and call a function from Python.

Also, to visualize directed graphs directly from Python, use pydot.
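
As a minimal sketch of that approach (a toy graph with made-up node labels, not the knock's data), pydot's graph_from_edges turns a plain list of edges into a directed graph and renders it to PNG, assuming Graphviz is installed and on the PATH:

```python
import pydot

# Toy edge list: each pair is (tail, head) of a directed edge
edges = [('I', 'am'), ('a cat', 'am')]

# directed=True draws arrows instead of plain lines
graph = pydot.graph_from_edges(edges, directed=True)
graph.write_png('toy.png')  # rendering requires Graphviz to be installed
```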

Directed graph

First, there is something called [**graph theory**](https://ja.wikipedia.org/wiki/%E3%82%B0%E3%83%A9%E3%83%95%E7%90%86%E8%AB%96).

Graph theory is a mathematical theory of graphs, which consist of a set of nodes (vertices) and a set of edges (branches/sides).

The [definitions of directed and undirected graphs](https://ja.wikipedia.org/wiki/%E3%82%B0%E3%83%A9%E3%83%95%E7%90%86%E8%AB%96#%E6%A6%82%E8%A6%81) are roughly as follows (a "directed graph" is one whose edges have a direction). Please follow the link for details.

If you want to express not only which nodes are connected but also "from which to which", add an arrow to each edge. Such a graph is called a directed graph or digraph. A graph without arrows is called an undirected graph.
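
Concretely (a toy illustration of my own, not part of the knock), a directed graph can be represented in Python simply as a list of (tail, head) pairs, which is exactly the shape the program later feeds to pydot:

```python
# A directed graph as a list of (tail, head) edges
edges = [('A', 'B'), ('B', 'C'), ('A', 'C')]

# The same graph as an adjacency mapping: node -> nodes it points to
adjacency = {}
for tail, head in edges:
    adjacency.setdefault(tail, []).append(head)

print(adjacency)  # {'A': ['B', 'C'], 'B': ['C']}
```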

Answer

Answer program: [044. Visualization of dependent trees.ipynb](https://github.com/YoheiFukuhara/nlp100/blob/master/05.%E4%BF%82%E3%82%8A%E5%8F%97%E3%81%91%E8%A7%A3%E6%9E%90/044.%E4%BF%82%E3%82%8A%E5%8F%97%E3%81%91%E6%9C%A8%E3%81%AE%E5%8F%AF%E8%A6%96%E5%8C%96.ipynb)

```python
import re
from subprocess import run, PIPE

import pydot

# Delimiter: morpheme lines are split on tabs and commas
separator = re.compile('\t|,')

# Dependency header, e.g. "* 0 1D 0/4 0.285960"
dependancy = re.compile(r'''(?:\*\s\d+\s) # not captured: "* <clause index> "
                            (-?\d+)       # captured: head (destination) clause index
                          ''', re.VERBOSE)

text = input('Please enter text')

# Default value when nothing is entered
if len(text) == 0:
    text = ("I don't remember exactly whether I said it or not, but I think "
            "I probably said it when I had a hand-wound party the other day, "
            "without feeling like I said it a little. I tried it, but I came "
            "to think that it doesn't matter whether I say it or not.")

cmd = 'echo {} | cabocha -f1'.format(text)
proc = run(cmd, shell=True, stdout=PIPE, stderr=PIPE)
print(proc.stdout.decode('UTF-8'))

class Chunk:
    def __init__(self, phrase, dst):
        self.phrase = phrase
        self.dst = dst  # index of the head (destination) clause

phrase = ''
chunks = []
for line in proc.stdout.decode('UTF-8').splitlines():
    dependancies = dependancy.match(line)

    # Neither EOS nor a dependency header: a morpheme line
    if not (line == 'EOS' or dependancies):
        # Split on tabs and commas
        cols = separator.split(line)
        phrase += cols[0]  # surface form

    # EOS or a dependency header while a clause has been accumulated: flush it
    elif phrase != '':
        chunks.append(Chunk(phrase, dst))
        phrase = ''

    # Dependency header: remember the head index for the clause that follows
    if dependancies:
        dst = int(dependancies.group(1))

# Convert to the (tail, head) edge format that pydot expects
edges = []
for i, chunk in enumerate(chunks):
    if chunk.dst != -1 and \
       chunk.phrase != '' and \
       chunks[chunk.dst].phrase != '':
        edges.append(((i, chunk.phrase), (chunk.dst, chunks[chunk.dst].phrase)))

# Save the image as a directed graph with pydot
if len(edges) > 0:
    graph = pydot.graph_from_edges(edges, directed=True)
    graph.write_png('044.dot.png')
```

Answer commentary

Text input

The "given sentence" part of the knock is given by the ʻinput` function (does it conform to the question intention?). If nothing is entered, the initial value will be used.

```python
text = input('Please enter text')

# Default value when nothing is entered
if len(text) == 0:
    text = ("I don't remember exactly whether I said it or not, but I think "
            "I probably said it when I had a hand-wound party the other day, "
            "without feeling like I said it a little. I tried it, but I came "
            "to think that it doesn't matter whether I say it or not.")
```

CaboCha execution part

The CaboCha execution part uses the `run` function from the `subprocess` package to run a shell command. I did not use CaboCha's Python wrapper, simply because it felt like too much trouble.

```python
cmd = 'echo {} | cabocha -f1'.format(text)
proc = run(cmd, shell=True, stdout=PIPE, stderr=PIPE)
print(proc.stdout.decode('UTF-8'))
```
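
As an aside, building the command with echo means shell quoting can bite when the text contains special characters; a variant that passes the sentence to cabocha via standard input avoids that. This is just a sketch, not what the answer program uses:

```python
from subprocess import run, PIPE

text = 'Any sentence to parse'  # in the answer program this comes from input()

# Pass the sentence on stdin instead of interpolating it into a shell command
proc = run(['cabocha', '-f1'], input=text.encode('UTF-8'),
           stdout=PIPE, stderr=PIPE)
print(proc.stdout.decode('UTF-8'))
```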

The first part of the content output by the print function is as follows.

Part of the print result:

```
* 0 1D 0/4 0.285960
Say verb,Independence,*,*,Godan / Wa line reminder,Continuous connection,To tell,It,It
Particles,Connection particle,*,*,*,*,hand,Te,Te
A verb,Non-independent,*,*,Five steps, La line,Continuous connection,is there,Ah,Ah
Auxiliary verb,*,*,*,Special,Uninflected word,Ta,Ta,Ta
Ka particle,Sub-particles / parallel particles / final particles,*,*,*,*,Or,Mosquito,Mosquito
* 1 4D 0/4 2.230543
Say verb,Independence,*,*,Godan / Wa line reminder,Continuous connection,To tell,It,It
Verb,Non-independent,*,*,One step,Imperfective form,Teru,Te,Te
No auxiliary verb,*,*,*,Special Nai,Continuous connection,Absent,Naka,Naka
Auxiliary verb,*,*,*,Special,Uninflected word,Ta,Ta,Ta
Ka particle,Sub-particles / parallel particles / final particles,*,*,*,*,Or,Mosquito,Mosquito
* 2 4D 0/3 2.418727
Which noun,Pronoun,General,*,*,*,Which,Dotch,Dotch
Auxiliary verb,*,*,*,Special,Continuous connection,Is,Dad,Dad
Auxiliary verb,*,*,*,Special,Uninflected word,Ta,Ta,Ta
Ka particle,Sub-particles / parallel particles / final particles,*,*,*,*,Or,Mosquito,Mosquito
```
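
The lines beginning with `*` are the clause headers of the -f1 lattice format: the second field is the clause index and the third field (e.g. `1D`) gives the index of the head clause, followed by the head/function word positions and a score. A quick check (my own illustration, not part of the answer program) of how the `dependancy` pattern pulls out that head index:

```python
import re

# Same pattern as in the answer program
dependancy = re.compile(r'''(?:\*\s\d+\s) # skipped: "* <clause index> "
                            (-?\d+)       # head clause index; -1 means the clause is the root
                          ''', re.VERBOSE)

m = dependancy.match('* 0 1D 0/4 0.285960')
print(m.group(1))  # -> '1' : clause 0 depends on clause 1
```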

Passing edges to pydot

The parsed clauses are then converted into the edge format that pydot expects. Each edge is a pair of (index, phrase) tuples; keeping the index alongside the phrase ensures that identical phrases in different clauses remain separate nodes. Clauses with no head (`dst` of -1) and empty phrases are skipped.

```python
# Convert to the (tail, head) edge format that pydot expects
edges = []
for i, chunk in enumerate(chunks):
    if chunk.dst != -1 and \
       chunk.phrase != '' and \
       chunks[chunk.dst].phrase != '':
        edges.append(((i, chunk.phrase), (chunk.dst, chunks[chunk.dst].phrase)))
```

By the way, `edges` ends up with contents like this:

```
((0, 'Did you say'), (1, 'Didn't you say'))
((1, 'Didn't you say'), (4, 'I don't remember'))
((2, 'Which was'), (4, 'I don't remember'))
((3, 'Properly'), (4, 'I don't remember'))
((4, 'I don't remember'), (19, 'I thought about it,'))
((5, 'Certainly'), (7, 'Hooray'))
((6, 'A hand-wound party during this time'), (7, 'Hooray'))
((7, 'Hooray'), (8, 'Sometimes'))
((8, 'Sometimes'), (10, 'Said'))
((9, 'A little bit'), (10, 'Said'))
((10, 'Said'), (11, 'Feeling'))
((11, 'Feeling'), (12, 'Without'))
((12, 'Without'), (14, 'Nishimo'))
((13, 'Without'), (14, 'Nishimo'))
((14, 'Nishimo'), (15, 'Without'))
((15, 'Without'), (17, 'I think I said'))
((16, 'Perhaps'), (17, 'I think I said'))
((17, 'I think I said'), (19, 'I thought about it,'))
((18, 'To here'), (19, 'I thought about it,'))
((19, 'I thought about it,'), (28, 'It depends.'))
((20, 'Oh dear'), (21, 'I'll tell you'))
((21, 'I'll tell you'), (28, 'It depends.'))
((22, 'Say'), (23, 'I don't care'))
((23, 'I don't care'), (25, 'There is no problem,'))
((24, 'Up to that point'), (25, 'There is no problem,'))
((25, 'There is no problem,'), (26, 'I think'))
((26, 'I think'), (27, 'Reached'))
((27, 'Reached'), (28, 'It depends.'))
```

Directed graphing

Finally, the graph_from_edges function creates a directed graph and the write_png function saves the image. Passing directed=True when creating the graph makes the lines between clauses arrows.

```python
# Save the image as a directed graph with pydot
if len(edges) > 0:
    graph = pydot.graph_from_edges(edges, directed=True)
    graph.write_png('044.dot.png')
```
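
If you also want the DOT source mentioned in the first method (to feed Graphviz directly or tweak the styling), the same pydot graph object can be dumped as text. This is an optional sketch of my own; the toy edges and the file name 044.dot are made up for illustration:

```python
import pydot

# The real edges from above would work the same way; toy edges here for illustration
graph = pydot.graph_from_edges([('clause A', 'clause B')], directed=True)

# to_string() returns the DOT-language source, which Graphviz's dot command can render
with open('044.dot', 'w', encoding='UTF-8') as f:
    f.write(graph.to_string())
```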

Output result (execution result)

When the program is executed, the following results will be output.

(Image: the dependency tree rendered to 044.dot.png)

By the way, the original inspiration for this post was the article "[Play] Syntactic analysis of an outrageous Shinkalion email". (Image: the dependency tree from that article)
