[PYTHON] Visualize claims with AI

0. Key points

- The scope of a patent right is expressed in words, as claims.
- Because claims are written rigorously, they often have a complicated sentence structure and are easily misread.
- I tried visualizing the claims so that the inventor, the patent attorney, and the examiner share the same understanding. (The picture below is part of the visualization.)

pic.png

1. Background

Intellectual property rights, and patent rights in particular, are powerful weapons that can open up a new era. The scope of a patent right is defined by the text (the claims) written in the "Claims" section. Naturally, each claim is written rigorously so that the elements of the patent right are both "necessary" and "sufficient", and as a result the sentence structure is often complicated. For example, the claims of a Toyota patent application on autonomous driving ("Traffic situation recognition for autonomous vehicles", JP2018198422A) read as follows (excerpt).

JP2018198422A.txt


An acquisition step of acquiring, from an external sensor provided in a vehicle, sensor data obtained by sensing the external environment of the vehicle;
an identification step of analyzing the sensor data to identify a traffic situation outside the vehicle;
a generation step of generating graphic data for displaying visual feedback that visually describes information about the traffic situation; and
a transmission step of transmitting the graphic data to an interface device so that the interface device displays the visual feedback;
a method including the steps above.

Can you follow it? To be honest, when I first read it I didn't really understand it (laughs). After re-reading, I got two points: (1) it is a patent on a method, and (2) the method includes four steps. Honestly, even reading the details of each step, I couldn't grasp it right away. Of course, it's not the invention that's at fault, it's my head (laughs). Against this background, I tried to see whether the claims could be visualized.

2. Preparation

Now, for the visualization I use Python, which is more or less the standard language for AI. This time I treat the structure of the text as dependencies and use CaboCha. First, some preparation: MeCab, CRF++, and CaboCha are required. There is a very good article on how to install them, so please refer to it; I used it as a reference myself and would like to take this opportunity to thank the author: [The strongest way to use MeCab and CaboCha with Google Colab — ▲ 100 language-processing knocks without losing heart == Chapter 5 preparation ==](https://ds-blog.tbtech.co.jp/entry/2020/06/08/%E2%96%B2%E5%BF%83%E3%81%8F%E3%81%98%E3%81%91%E3%81%9A%E8%A8%80%E8%AA%9E%E5%87%A6%E7%90%86100%E6%9C%AC%E3%83%8E%E3%83%83%E3%82%AF%EF%BC%9D%EF%BC%9D5%E7%AB%A0%E4%B8%8B%E6%BA%96%E5%82%99%EF%BC%9D%EF%BC%9D)

If MeCab, used for the morphological analysis, is installed correctly, the following works.

import MeCab
tagger = MeCab.Tagger()
print(tagger.parse("隣の客はよく柿食う客だ"))

output


隣	トナリ	トナリ	隣	名詞-普通名詞-一般	0
の	ノ	ノ	の	助詞-格助詞
客	キャク	キャク	客	名詞-普通名詞-一般	0
は	ワ	ハ	は	助詞-係助詞
よく	ヨク	ヨク	良く	副詞	1
柿	カキ	カキ	柿	名詞-普通名詞-一般	0
食う	クウ	クウ	食う	動詞-一般	五段-ワア行	連体形-一般	1
客	キャク	キャク	客	名詞-普通名詞-一般	0
だ	ダ	ダ	だ	助動詞	助動詞-ダ	終止形-一般
EOS
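Each output line from MeCab is the surface form followed by tab-separated feature fields, with `EOS` marking the end of the sentence. Here is a minimal stdlib sketch of splitting one such line; the example line and its field layout are illustrative and depend on the installed dictionary:

```python
# Illustrative MeCab output line: surface form first, part of speech later
# (field layout depends on the dictionary; values here are hypothetical)
line = "柿\tカキ\tカキ\t柿\t名詞-普通名詞-一般\t0"
fields = line.split("\t")
surface, pos = fields[0], fields[4]
print(surface, pos)  # -> 柿 名詞-普通名詞-一般
```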

Similarly, if CRF++ and CaboCha, required for the dependency parsing, are installed correctly, the following works.

import CaboCha
cp = CaboCha.Parser()
print(cp.parseToString("隣の客はよく柿食う客だ"))

output


    隣の-D
      客は-------D
  よく---D       |
      柿-D       |
      食う-------D
              客だ
EOS

3. Data processing

Now that the installation is complete, let's process the data. First, read the prepared text data and perform morphological analysis on each line; then run the dependency analysis with CaboCha.

import CaboCha

file_path = 'JP2018198422A.txt'
# Prepare an empty list
c_list = []
c = CaboCha.Parser()
# Read the text data
with open(file_path) as f:
  text_list = f.read()
  # Split on line breaks and parse each line
  for i in text_list.split('\n'):
    cabo = c.parse(i)
    # Store the lattice-format result in c_list
    c_list.append(cabo.toString(CaboCha.FORMAT_LATTICE))

The result is saved in a file, and the data processing is completed.

# Write the results to a file
path_w = 'JP2018198422A.txt.cabocha'
# Use writelines() to write a list of strings
with open(path_w, mode='w') as f:
  f.writelines(c_list)
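For reference, the saved `.cabocha` file alternates chunk-header lines starting with `*`, morpheme lines, and an `EOS` marker per sentence; this structure is what the parsing loops below walk through. A minimal stdlib sketch over an illustrative fragment (the feature values are hypothetical, not actual parser output):

```python
# Illustrative fragment of CaboCha lattice output (feature values are hypothetical)
sample = (
    "* 0 1D 0/1 1.000000\n"                                # chunk header: index 0, head chunk 1
    "車両\t名詞,一般,*,*,*,*,車両,シャリョウ,シャリョー\n"     # morpheme line
    "に\t助詞,格助詞,一般,*,*,*,に,ニ,ニ\n"
    "* 1 -1D 0/0 0.000000\n"                               # chunk header: index 1, no head (-1)
    "備える\t動詞,自立,*,*,一段,基本形,備える,ソナエル,ソナエル\n"
    "EOS\n"                                                # end of sentence
)
n_chunks = sum(1 for line in sample.splitlines() if line.startswith("*"))
n_sentences = sum(1 for line in sample.splitlines() if line == "EOS")
print(n_chunks, n_sentences)  # -> 2 1
```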

4. Taking on visualization

Now, finally, we attempt the visualization. First, read the result of the dependency analysis.

#Reading the result data of the dependency analysis
path = 'JP2018198422A.txt.cabocha'
import re
with open(path, encoding='utf-8') as f:
  _data = f.read().split('\n')

Next, implement a Morph class that represents a morpheme. This class has the surface form (surface), base form (base), part of speech (pos), and part-of-speech subcategory 1 (pos1) as member variables.

class Morph:
  def __init__(self, word):
    self.surface = word[0]
    self.base = word[7]
    self.pos = word[1]
    self.pos1 = word[2]

# List of sentences
sent = []
# Temporary storage for the morphemes of the current sentence
temp = []

for line in _data[:-1]:
  # Split each line on tab, comma, and space using a character class
  text = re.split("[\t, ]", line)
  # Use 'EOS' as the marker to group morphemes into sentences
  if text[0] == 'EOS':
    sent.append(temp)
    # Start an empty list for the next sentence
    temp = []
  # Dependency (chunk-header) lines are not needed here, so skip them
  elif text[0] == '*':
    continue
  # Store the morphological-analysis fields in temp as Morph objects
  else:
    morph = Morph(text)
    temp.append(morph)
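As a quick check, the Morph class can be exercised on a single lattice-format morpheme line, whose shape is `surface\tpos,pos1,pos2,pos3,conj_type,conj_form,base,reading,pronunciation`; the concrete values below are hypothetical:

```python
import re

class Morph:
    def __init__(self, word):
        self.surface = word[0]
        self.base = word[7]
        self.pos = word[1]
        self.pos1 = word[2]

# Illustrative lattice-format morpheme line (hypothetical feature values)
line = "食う\t動詞,自立,*,*,五段・ワ行促音便,基本形,食う,クウ,クウ"
m = Morph(re.split("[\t,]", line))
print(m.surface, m.pos, m.pos1, m.base)  # -> 食う 動詞 自立 食う
```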

Then implement a Chunk class that represents a phrase (bunsetsu). This class has a list of morphemes (Morph objects) (morphs), the index number of the chunk it depends on (dst), and a list of the index numbers of the chunks that depend on it (srcs) as member variables.

# Chunk class
class Chunk:
  def __init__(self, idx, dst):
    self.idx = idx     # Chunk (phrase) number
    self.morphs = []   # List of morphemes (Morph objects)
    self.dst = dst     # Index of the chunk this one depends on
    self.srcs = []     # Indices of the chunks that depend on this one

import re
# List of sentences
s_list = []
# Chunk objects for the current sentence
sent = []
# Morph objects from the morphological analysis
temp = []
chunk = None
for line in _data[:-1]:
  # Split each line on tab, comma, and space using a character class
  text = re.split("[\t, ]", line)

  # Process chunk-header (dependency) lines
  if text[0] == '*':
    idx = int(text[1])
    dst = int(re.search(r'(.*?)D', text[2]).group(1))
    # Create a Chunk object
    chunk = Chunk(idx, dst)
    sent.append(chunk)

  # Use EOS as the marker to group chunks into sentences
  elif text[0] == 'EOS':
    if sent:
      for i, c in enumerate(sent, 0):
        if c.dst == -1:
          continue
        else:
          sent[c.dst].srcs.append(i)
      s_list.append(sent)
    sent = []
  else:
    morph = Morph(text)
    chunk.morphs.append(morph)
    temp.append(morph)

# Display the first sentence
for m in s_list[0]:
  print(m.idx, [mo.surface for mo in m.morphs], 'srcs: ' + str(m.srcs), 'dst: ' + str(m.dst))

output


0 ['vehicle', 'To'] srcs: [] dst: 1
1 ['Prepare', 'Be', 'Ta'] srcs: [0] dst: 2
2 ['External', 'Sensor', 'From', '、'] srcs: [1] dst: 8
3 ['Said', 'vehicle', 'of'] srcs: [] dst: 4
4 ['External', 'environment', 'To'] srcs: [3] dst: 5
5 ['Sensing', 'Shi', 'hand'] srcs: [4] dst: 6
6 ['Gain', 'Be', 'Ta'] srcs: [5] dst: 7
7 ['Sensor data', 'To'] srcs: [6] dst: 8
8 ['Get', 'To do'] srcs: [2, 7] dst: 9
9 ['Get', 'Step', 'When', '、'] srcs: [8] dst: -1
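For reference, the chunk-header lines handled by the `*` branch above look like `* 3 5D 0/1 1.234567` in CaboCha's lattice format: the chunk index, then the index of the head chunk terminated by `D` (`-1D` means the chunk has no head), followed by head/function-word positions and a score. A minimal sketch of that extraction on an illustrative line:

```python
import re

# Illustrative chunk-header line: chunk 3 depends on chunk 5
line = "* 3 5D 0/1 1.234567"
fields = line.split(" ")
idx = int(fields[1])
# Everything before the trailing 'D' is the head chunk's index
dst = int(re.search(r"(.*?)D", fields[2]).group(1))
print(idx, dst)  # -> 3 5
```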

Next, extract the text of each chunk together with the chunk it depends on.

for s in s_list:
  for m in s:
    # Only chunks that have a head
    if int(m.dst) != -1:
      # Join the surface forms, skipping morphemes whose pos is '記号' (symbol),
      # and print the chunk and its head separated by a tab
      print(''.join([b.surface if b.pos != '記号' else '' for b in m.morphs]),
            ''.join([b.surface if b.pos != '記号' else '' for b in s[int(m.dst)].morphs]), sep='\t')

output


Equipped in the vehicle
Provided from an external sensor
Obtained from an external sensor
The external environment of the vehicle
Sensing the external environment
Obtained by sensing
Obtained sensor data
Acquire sensor data
Acquisition step and acquisition step
Analyze the sensor data
Analyze and identify
Outside the vehicle
External traffic conditions
Identify traffic conditions
Identification with identification step
Information on the traffic situation
Depict information
Visually depict
Depict visual feedback
Display visual feedback
To display
Graphic data for
Generate graphic data
Generate with the generation step
Display on the interface device
Display the visual feedback
To display
For the transmission step and
Send the graphic data
Send to the interface device
Send with the send step
including
How to include

Finally, the dependency tree is visualized as a directed graph. For readability, only the first step is visualized.

# Try just the first sentence (the first step)
v = s_list[0]

# Create a list of (modifier chunk, head chunk) pairs
s_pairs = []
for m in v:
  if int(m.dst) != -1:
    a = ''.join([b.surface if b.pos != '記号' else '' for b in m.morphs])
    b = ''.join([b.surface if b.pos != '記号' else '' for b in v[int(m.dst)].morphs])
    s_pairs.append((a, b))
# Draw the dependency tree
import pydot_ng as pydot
img = pydot.Dot(graph_type='digraph')
# Specify a font that supports Japanese
img.set_node_defaults(fontname='Meiryo UI', fontsize='12')
for s, t in s_pairs:
  img.add_edge(pydot.Edge(s, t))
img.write_png('pic.png')

The result is shown below. It is the same image as at the top of the article, reposted. (Note that pydot requires Graphviz to be installed in order to render the PNG.)

pic.png

How does it look? I think it is now much easier to grasp what kind of step is being described.

5. In closing

To be honest, I was surprised when I first learned that patent rights are essentially determined by words. Of course drawings are also important documents, but whether an application becomes a patent, and the scope of the resulting rights, are expressed directly in words. Given that, I think visualizing the sentence structure is very worthwhile.

Incidentally, the status of the Toyota autonomous-driving application used in this example is still pending. Getting a patent is hard.

For the dependency parsing, I referred to the following site. It was very good and I learned a lot; I would like to take this opportunity to say thank you. [▲ 100 language-processing knocks without losing heart == 40-44 ==](https://ds-blog.tbtech.co.jp/entry/2020/06/12/%E2%96%B2%E5%BF%83%E3%81%8F%E3%81%98%E3%81%91%E3%81%9A%E8%A8%80%E8%AA%9E%E5%87%A6%E7%90%86100%E6%9C%AC%E3%83%8E%E3%83%83%E3%82%AF%EF%BC%9D%EF%BC%9D%EF%BC%94%EF%BC%90%EF%BD%9E%EF%BC%94%EF%BC%94%EF%BC%9D)
