[PYTHON] 100 Language Processing Knock-43: Extract clauses containing nouns related to clauses containing verbs

This is a record of the [43rd task, "Extract clauses containing nouns related to clauses containing verbs"](http://www.cl.ecei.tohoku.ac.jp/nlp100/#sec43), from ["Chapter 5: Dependency analysis"](http://www.cl.ecei.tohoku.ac.jp/nlp100/#ch5) of the 2015 edition of the 100 Language Processing Knock. Compared with the previous knock, there is no big difference; it merely adds conditions on the source and destination clauses of the output.

Reference links

| Link | Remarks |
|------|---------|
| 043. Extract clauses containing nouns related to clauses containing verbs.ipynb | GitHub link to the answer program |
| 100 amateur language processing knocks: 43 | Copy-and-paste source for many parts of the code |
| CaboCha official | The CaboCha page to look at first |

Environment

I installed CRF++ and CaboCha so long ago that I have forgotten how to install them. Since they are packages that have not been updated at all, I have not rebuilt the environment. I only have frustrating memories of trying to use CaboCha on Windows: I believe I could not get it to work on 64-bit Windows (my memory is vague, and the problem may have been on my side).

| Type | Version | Contents |
|------|---------|----------|
| OS | Ubuntu 18.04.01 LTS | Running virtually |
| pyenv | 1.2.16 | I use pyenv because I sometimes use multiple Python environments |
| Python | 3.8.1 | Python 3.8.1 on pyenv; packages are managed with venv |
| MeCab | 0.996-5 | Installed with apt-get |
| CRF++ | 0.58 | Too old; I forgot how I installed it (probably `make install`) |
| CaboCha | 0.69 | Too old; I forgot how I installed it (probably `make install`) |

Chapter 5: Dependency analysis

Study content

Apply the dependency parser CaboCha to "I Am a Cat" and experience working with dependency trees and syntactic analysis.

Class, Dependency Parsing, CaboCha, Clause, Dependency, Case, Functional Verb Construction, Dependency Path, [Graphviz](http://www.graphviz.org/)

Knock content

Use CaboCha to analyze the dependencies of the text (neko.txt) of Natsume Soseki's novel "I Am a Cat" and save the result in a file called neko.txt.cabocha. Use this file to implement programs that address the following questions.
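The knock statement does not say how this file should be produced. As a minimal sketch, assuming the `cabocha` command-line tool is installed and that its lattice format (`-f1`) is what the rest of the program expects, the file could be generated like this:

import subprocess

# Minimal sketch (assumption, not part of the answer program): run CaboCha in
# lattice format (-f1) over neko.txt and save the result as neko.txt.cabocha.
with open('./neko.txt') as src, open('./neko.txt.cabocha', 'w') as dst:
    subprocess.run(['cabocha', '-f1'], stdin=src, stdout=dst, check=True)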

43. Extract clauses containing nouns related to clauses containing verbs

When a clause containing a noun relates to a clause containing a verb, extract them in tab-delimited format. However, do not output symbols such as punctuation marks.

Answer

Answer program [043. Extract clauses containing nouns related to clauses containing verbs.ipynb](https://github.com/YoheiFukuhara/nlp100/blob/master/05.%E4%BF%82%E3%82%8A%E5%8F%97%E3%81%91%E8%A7%A3%E6%9E%90/043.%E5%90%8D%E8%A9%9E%E3%82%92%E5%90%AB%E3%82%80%E6%96%87%E7%AF%80%E3%81%8C%E5%8B%95%E8%A9%9E%E3%82%92%E5%90%AB%E3%82%80%E6%96%87%E7%AF%80%E3%81%AB%E4%BF%82%E3%82%8B%E3%82%82%E3%81%AE%E3%82%92%E6%8A%BD%E5%87%BA.ipynb)

import re

# Delimiter (tab or comma)
separator = re.compile('\t|,')

# Dependency line
dependancy = re.compile(r'''(?:\*\s\d+\s) # not captured: "*" and this clause's index
                            (-?\d+)       # captured: index of the destination clause
                          ''', re.VERBOSE)
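# (Illustration; assumed input format) In CaboCha's lattice (-f1) output, a
# dependency line looks like "* 0 2D 0/1 -0.764522"; the group above captures
# the destination clause index, "2" in this example.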

class Morph:
    def __init__(self, line):
        
        # Split on tabs and commas
        cols = separator.split(line)
        
        self.surface = cols[0] # surface form
        self.base = cols[7]    # base form (lemma)
        self.pos = cols[1]     # part of speech
        self.pos1 = cols[2]    # part-of-speech subcategory 1

class Chunk:
    def __init__(self, morphs, dst):
        self.morphs = morphs
        self.srcs = []   # list of source (modifier) clause indexes
        self.dst  = dst  # index of the destination (head) clause
        
        self.verb = False
        self.noun = False
        self.phrase = ''
        
        for morph in morphs:
            # Build the clause string from non-symbol morphemes ('記号' = symbol)
            if morph.pos != '記号':
                self.phrase += morph.surface
            if morph.pos == '動詞':   # verb
                self.verb = True
            if morph.pos == '名詞':   # noun
                self.noun = True

# Fill in the source indexes and add the Chunk list to the sentence list
def append_sentence(chunks, sentences):
    
    # Register each chunk as a source of its destination chunk
    for i, chunk in enumerate(chunks):
        if chunk.dst != -1:
            chunks[chunk.dst].srcs.append(i)
    sentences.append(chunks)
    return sentences, []

morphs = []
chunks = []
sentences = []

with open('./neko.txt.cabocha') as f:
    
    for line in f:
        dependancies = dependancy.match(line)
        
        # The line is neither EOS nor a dependency line: it is a morpheme line
        if not (line == 'EOS\n' or dependancies):
            morphs.append(Morph(line))

        # On EOS or a dependency line, close the chunk collected so far
        elif len(morphs) > 0:
            chunks.append(Chunk(morphs, dst))
            morphs = []

        # On a dependency line, remember the destination index for the next chunk
        if dependancies:
            dst = int(dependancies.group(1))

        # On EOS, close the sentence collected so far
        if line == 'EOS\n' and len(chunks) > 0:
            sentences, chunks = append_sentence(chunks, sentences)

for i, sentence in enumerate(sentences):
    for chunk in sentence:
        if chunk.dst != -1 and \
           chunk.noun and \
           sentence[chunk.dst].verb:
            print('{}\t{}'.format(chunk.phrase, sentence[chunk.dst].phrase))
    
    # Limit the output because there are many sentences
    if i > 50:
        break

Answer commentary

Does the clause contain a noun or a verb?

I changed the Chunk class from the previous knock so that it records, as flags, whether the clause contains a noun and whether it contains a verb. Since the morphemes are now processed in a for loop, the clause string is no longer built with a list comprehension. A small usage sketch follows the code below.

python


class Chunk:
    def __init__(self, morphs, dst):
        self.morphs = morphs
        self.srcs = []   # list of source (modifier) clause indexes
        self.dst  = dst  # index of the destination (head) clause
        
        self.verb = False
        self.noun = False
        self.phrase = ''
        
        for morph in morphs:
            # Build the clause string from non-symbol morphemes ('記号' = symbol)
            if morph.pos != '記号':
                self.phrase += morph.surface
            if morph.pos == '動詞':   # verb
                self.verb = True
            if morph.pos == '名詞':   # noun
                self.noun = True
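As a small usage sketch (not in the original notebook; the two morpheme lines are illustrative MeCab-style output written by hand, not lines copied from neko.txt.cabocha), a clause such as 「吾輩は」 sets the flags like this:

# Illustrative only: build a Chunk from two assumed MeCab-format lines.
lines = [
    '吾輩\t名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ',
    'は\t助詞,係助詞,*,*,*,*,は,ハ,ワ',
]
chunk = Chunk([Morph(line) for line in lines], dst=1)  # dst value is arbitrary here
print(chunk.phrase, chunk.noun, chunk.verb)  # -> 吾輩は True False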

Output section

All that remains is to narrow down what gets output with an `if` conditional branch; a tab-delimited file-output variant is sketched after the code.

python


for i, sentence in enumerate(sentences):
    for chunk in sentence:
        if chunk.dst != -1 and \
           chunk.noun and \
           sentence[chunk.dst].verb:
            print('{}\t{}'.format(chunk.phrase, sentence[chunk.dst].phrase))
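Since the knock asks for tab-delimited output, the same loop could also write every pair to a file instead of printing only the first sentences. A variant sketch (not in the original notebook; the output file name is hypothetical):

# Variant sketch (assumption): write all noun-to-verb clause pairs to a
# tab-delimited file; '043_result.txt' is a hypothetical name.
with open('./043_result.txt', 'w') as out:
    for sentence in sentences:
        for chunk in sentence:
            if chunk.dst != -1 and chunk.noun and sentence[chunk.dst].verb:
                out.write('{}\t{}\n'.format(chunk.phrase, sentence[chunk.dst].phrase))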

Output result (execution result)

When the program is executed, the following results are output. Since there are many, only part of the output is shown.

Output result


Where was born
I don't get it
I have no idea
Crying where you did
I remember only what I was
I saw
For the first time here
I saw something
I will ask you later
Catch us
Placed on the palm
Sue was lifted
When fluffy
I just felt
Calm down on
I saw my face
Would be the beginning of things
I thought it was
Feeling remains
It still remains
Should be decorated with the first hair
The face is slippery
I met after that
I also met a cat
I met once
The center is protruding
Blow from inside
Blow smoke
I was sore and weak
Human drinking
I knew that
I knew about it
Sit behind
Sit in your heart
I started driving at high speed
Does the student move?
Will it move or will it move
Will only I move?
I don't know if it works
Turn your eyes
I feel sick
There is a sound
Out of the eyes
The fire broke out
I remember until then
I remember but I don't know
I don't know the rest
I don't know
Notice
There is no student
Lots of
I can't see my brother
I can't even see a single
I even hid my mother
I hid myself
Unlike the place
I can't even open my eyes
I was abandoned
It was abandoned from above
It was suddenly abandoned
It was abandoned inside
When you crawl out with your thoughts
When you crawl out Sasahara
On the other side
There is a pond
I saw
Sit in front
It doesn't make sense
I wonder if the student will come again
Will you come to meet me
Do it with meow
No one comes
Cross over the pond
The wind crosses
Takes a day
It's dark
I'm hungry
It has decreased very much
With food
There is up to
Make a decision
Started to go around the pond
Started to turn to the left
Put up with that
If you put up with it and crawl
If you forcibly crawl
It came out by the thing
I went to the place
If you crawl here
The bamboo fence collapsed
I sneaked through the hole
I sneaked into the house
I may have starved to death with something.
If the bamboo fence was not torn
I may have starved to death
I may have starved to death on the roadside.
What was the shadow
There is a hole
Until today
I will visit
Visit the calico
It is a passage
Although I sneaked into the mansion
It gets dark in my house
I'm hungry
It's raining
I couldn't do it after cleaning up
I can no longer grace
Go towards
Go towards
Thinking from now on
The time has passed
Crawl inside
I encountered it here
I encountered
Should see humans
I encountered an opportunity
The first thing I met
This is squeezed out
Seen from the student
If you look at it
When I see me
Suddenly grab it
Grab the cervical muscle
I squeezed out to the table
