This is my record of problem 41, ["Reading of dependency analysis result (phrase / dependency)"](http://www.cl.ecei.tohoku.ac.jp/nlp100/#sec41), from ["Chapter 5: Dependency analysis"](http://www.cl.ecei.tohoku.ac.jp/nlp100/#ch5) of Language Processing 100 Knocks 2015. The previous knock was a warm-up; this one deals with the dependencies themselves. Unlike Chapter 4, "Morphological Analysis", where packages kept the code short, Chapter 5 as a whole requires writing the algorithms yourself. This knock is not that complicated, but it still takes some thought.

Reference link

Link	Remarks
041.Reading the dependency analysis result(Phrase / dependency).ipynb	GitHub link to the answer program
100 amateur language processing knocks:41	Source from which many parts of the code were copied
CaboCha official	The CaboCha page to look at first

Environment

I installed CRF++ and CaboCha so long ago that I have forgotten how I installed them. Since neither package has been updated in a long time, I have not rebuilt the environment. All I remember is being frustrated when I tried to use CaboCha on Windows. I think I could not get it to work on 64-bit Windows (my memory is vague, and the problem may have been my own lack of skill).

Type	Version	Notes
OS	Ubuntu 18.04.01 LTS	Running in a virtual machine
pyenv	1.2.16	I use pyenv because I sometimes use multiple Python environments
Python	3.8.1	Python 3.8.1 on pyenv. Packages are managed with venv
MeCab	0.996-5	Installed with apt-get
CRF++	0.58	Too old; I forgot how I installed it (probably `make install`)
CaboCha	0.69	Too old; I forgot how I installed it (probably `make install`)

Chapter 5: Dependency analysis

Content of study

Apply the dependency parser CaboCha to "I Am a Cat" and get hands-on experience with dependency trees and syntactic analysis.

Class, dependency parsing, CaboCha, phrase (bunsetsu), dependency, case, functional verb construction, dependency path, [Graphviz](http://www.graphviz.org/)

Knock content

Use CaboCha to run dependency analysis on the text of Natsume Soseki's novel "I Am a Cat" (neko.txt) and save the result in a file called neko.txt.cabocha. Use this file to implement programs that address the following questions.

41. Reading the dependency analysis result (phrase / dependency)

In addition to problem 40, implement the phrase class Chunk. This class has a list of morphemes (Morph objects) (morphs), the index number of the phrase it depends on (dst), and a list of index numbers of the phrases that depend on it (srcs) as member variables. Then read the CaboCha analysis result of the input text, represent each sentence as a list of Chunk objects, and display the string and the dependency head of each phrase in the eighth sentence. For the rest of the problems in Chapter 5, use the program created here.

Answer

Answer program: [041. Reading the dependency analysis result (phrase / dependency).ipynb](https://github.com/YoheiFukuhara/nlp100/blob/master/05.%E4%BF%82%E3%82%8A%E5%8F%97%E3%81%91%E8%A7%A3%E6%9E%90/041.%E4%BF%82%E3%82%8A%E5%8F%97%E3%81%91%E8%A7%A3%E6%9E%90%E7%B5%90%E6%9E%9C%E3%81%AE%E8%AA%AD%E3%81%BF%E8%BE%BC%E3%81%BF(%E6%96%87%E7%AF%80%E3%83%BB%E4%BF%82%E3%82%8A%E5%8F%97%E3%81%91).ipynb)

import re

# Delimiters: tab or comma
separator = re.compile('\t|,')

# Chunk header line of the CaboCha output
dependancy = re.compile(r'''(?:\*\s\d+\s) # Asterisk and chunk index (not captured)
                            (-?\d+)       # Index of the head chunk (dst); -1 means no head
                          ''', re.VERBOSE)

class Morph:
    def __init__(self, line):
        
        # Split on tabs and commas
        cols = separator.split(line)
        
        self.surface = cols[0] # Surface form (surface)
        self.base = cols[7]    # Base form (base)
        self.pos = cols[1]     # Part of speech (pos)
        self.pos1 = cols[2]    # Part-of-speech subcategory 1 (pos1)

class Chunk:
    def __init__(self, morphs, dst):
        self.morphs = morphs
        self.srcs = []   # Index numbers of the phrases that depend on this one
        self.dst  = dst  # Index number of the phrase this one depends on
        self.phrase = ''.join([morph.surface for morph in morphs]) # Phrase text

# Fill in srcs for each chunk and add the Chunk list to the sentence list
def append_sentence(chunks, sentences):
    
    # Register each chunk as a source of the chunk it depends on
    for i, chunk in enumerate(chunks):
        if chunk.dst != -1:
            chunks[chunk.dst].srcs.append(i)
    sentences.append(chunks)
    return sentences, []

morphs = []
chunks = []
sentences = []

with open('./neko.txt.cabocha') as f:
    
    for line in f:
        dependancies = dependancy.match(line)
        
        # Neither EOS nor a chunk header: this is a morpheme line
        if not (line == 'EOS\n' or dependancies):
            morphs.append(Morph(line))
            
        # EOS or a chunk header while morphemes are pending: close the current chunk
        elif len(morphs) > 0:
            chunks.append(Chunk(morphs, dst))
            morphs = []
       
        # Chunk header: remember its head index for the chunk that follows
        if dependancies:
            dst = int(dependancies.group(1))
        
        # EOS with at least one chunk collected: close the sentence
        if line == 'EOS\n' and len(chunks) > 0:
            sentences, chunks = append_sentence(chunks, sentences)

for i, chunk in enumerate(sentences[7]):
    print('{}: {},Contact:{},Person in charge:{}'.format(i, chunk.phrase, chunk.dst, chunk.srcs))
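
To try the same logic without a CaboCha installation, here is a minimal sketch that wraps the steps in a function and runs it on a small hand-written sample. The function name `parse_lattice`, the constant `HEADER`, and the sample text `SAMPLE` (including its score values) are assumptions made for illustration and are not part of the answer program. The sample only mimics the lattice-style format that the answer program reads. Unlike the answer program, this sketch splits the surface off at the tab first and only then splits the features on commas, so a surface that is itself a comma would not shift the columns.

```python
import re

# Chunk header line, e.g. "* 0 1D 0/1 0.000000"; group 1 is the head index
HEADER = re.compile(r'\*\s\d+\s(-?\d+)D')


class Morph:
    def __init__(self, line):
        # Surface and feature string are separated by a single tab
        surface, feature = line.rstrip('\n').split('\t', 1)
        cols = feature.split(',')
        self.surface = surface
        self.base = cols[6]   # base form
        self.pos = cols[0]    # part of speech
        self.pos1 = cols[1]   # part-of-speech subcategory 1


class Chunk:
    def __init__(self, morphs, dst):
        self.morphs = morphs
        self.srcs = []
        self.dst = dst
        self.phrase = ''.join(m.surface for m in morphs)


def parse_lattice(lines):
    """Return a list of sentences, each a list of Chunk objects."""
    sentences, chunks, morphs, dst = [], [], [], None
    for line in lines:
        line = line.rstrip('\n')
        header = HEADER.match(line)
        if header or line == 'EOS':
            # A header or EOS closes the chunk that was being collected
            if morphs:
                chunks.append(Chunk(morphs, dst))
                morphs = []
            if header:
                dst = int(header.group(1))
            elif chunks:
                # EOS: fill in srcs, then close the sentence
                for i, chunk in enumerate(chunks):
                    if chunk.dst != -1:
                        chunks[chunk.dst].srcs.append(i)
                sentences.append(chunks)
                chunks = []
        else:
            morphs.append(Morph(line))
    return sentences


# Hand-written sample in the lattice-style format (illustrative values)
SAMPLE = (
    '* 0 1D 0/1 0.000000\n'
    '吾輩\t名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ\n'
    'は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n'
    '* 1 -1D 0/2 0.000000\n'
    '猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ\n'
    'で\t助動詞,*,*,*,特殊・ダ,連用形,だ,デ,デ\n'
    'ある\t助動詞,*,*,*,五段・ラ行アル,基本形,ある,アル,アル\n'
    '。\t記号,句点,*,*,*,*,。,。,。\n'
    'EOS\n'
)

sentences = parse_lattice(SAMPLE.splitlines())
for i, chunk in enumerate(sentences[0]):
    print(i, chunk.phrase, chunk.dst, chunk.srcs)
```

Because `parse_lattice` accepts any iterable of lines, an open file object for neko.txt.cabocha could be passed in the same way as the split sample string.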

Answer commentary

Regular expression for getting the dependency head

I use a regular expression to get the dependency head. `(-?\d+)` is the part that captures the index number of the head. For more information on regular expressions, see the article "Basics and Tips for Python Regular Expressions Learned from Zero". I think you could get it without regular expressions, but I use one for practice.

# Chunk header line of the CaboCha output
dependancy = re.compile(r'''(?:\*\s\d+\s) # Asterisk and chunk index (not captured)
                            (-?\d+)       # Index of the head chunk (dst); -1 means no head
                          ''', re.VERBOSE)
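
As a quick check of what the pattern captures, the sketch below applies it to two hand-written chunk header lines and compares it with a regex-free variant. The helper names `dst_by_regex` and `dst_by_split` and the sample lines are assumptions made for illustration. The split-based version is only one possible way to do it without regular expressions, and it assumes the header has exactly five space-separated fields.

```python
import re

# Same pattern as in the answer program (identifier spelling kept as is)
dependancy = re.compile(r'''(?:\*\s\d+\s) # Asterisk and chunk index (not captured)
                            (-?\d+)       # Index of the head chunk (dst)
                          ''', re.VERBOSE)


def dst_by_regex(line):
    """Return the head index of a chunk header line, or None for other lines."""
    m = dependancy.match(line)
    return int(m.group(1)) if m else None


def dst_by_split(line):
    """Hypothetical regex-free variant: split on spaces, strip the trailing 'D'."""
    fields = line.split(' ')
    if len(fields) == 5 and fields[0] == '*' and fields[2].endswith('D'):
        return int(fields[2][:-1])
    return None


# Hand-written sample lines (illustrative values)
header = '* 0 2D 0/1 -0.764522\n'
root_header = '* 2 -1D 0/2 0.000000\n'
morpheme = '吾輩\t名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ\n'

print(dst_by_regex(header), dst_by_split(header))
print(dst_by_regex(root_header), dst_by_split(root_header))
print(dst_by_regex(morpheme), dst_by_split(morpheme))
```

Both helpers return `None` for morpheme lines and for `EOS`, which is the property the main loop relies on when it decides whether a line is a chunk header.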

Chunk class

It also defines a variable called phrase, which the knock does not ask for. It is convenient for output later. srcs is only initialized as an empty list in __init__; its values are filled in afterwards by append_sentence.

class Chunk:
    def __init__(self, morphs, dst):
        self.morphs = morphs
        self.srcs = []   # Index numbers of the phrases that depend on this one
        self.dst  = dst  # Index number of the phrase this one depends on
        self.phrase = ''.join([morph.surface for morph in morphs]) # Phrase text
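
Since srcs starts out empty, it has to be derived from dst once all chunks of a sentence are known. The following is a minimal sketch of that inversion. It uses a stripped-down stand-in for Chunk (no Morph objects) and a hypothetical helper `fill_srcs`, both introduced only for illustration. The dst values are the ones from the eighth-sentence output in this article.

```python
class Chunk:
    """Stripped-down stand-in for the article's Chunk (no Morph objects)."""

    def __init__(self, phrase, dst):
        self.phrase = phrase
        self.dst = dst    # index of the phrase this one depends on (-1: none)
        self.srcs = []    # indices of the phrases that depend on this one


def fill_srcs(chunks):
    """Invert dst: register each chunk index in the srcs of its head."""
    for i, chunk in enumerate(chunks):
        if chunk.dst != -1:
            chunks[chunk.dst].srcs.append(i)
    return chunks


# dst values of the eighth sentence; the phrase names are placeholders
dsts = [1, 7, 4, 4, 5, 6, 7, -1]
chunks = fill_srcs([Chunk('chunk{}'.format(i), d) for i, d in enumerate(dsts)])

for i, chunk in enumerate(chunks):
    print(i, chunk.dst, chunk.srcs)
```

Because the loop visits chunks in index order, every srcs list ends up sorted in ascending order without an explicit sort.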

Output result (execution result)

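When the program is executed, the following results are output. In each line, "Contact" is the index of the phrase that the phrase depends on (dst), and "Person in charge" is the list of indices of the phrases that depend on it (srcs).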

0:this,Contact:1,Person in charge:[]
1:A student is,Contact:7,Person in charge:[0]
2:Sometimes,Contact:4,Person in charge:[]
3:Us,Contact:4,Person in charge:[]
4:Catch,Contact:5,Person in charge:[2, 3]
5:Boil,Contact:6,Person in charge:[4]
6:To eat,Contact:7,Person in charge:[5]
7:It's a story.,Contact:-1,Person in charge:[1, 6]

[PYTHON] 100 Language Processing Knock-41: Reading Parsing Results (Phrase / Dependency)
