[Python] Replace named entities in a text file with their labels (using GiNZA)

I struggled with this, so I am posting what I came up with for now. There may be a better way. If you are a beginner like me, I hope it serves as a useful reference.

The environment is Python 3.6.9 on Ubuntu 18.04.4.

change_NER.py


# coding:utf-8
import spacy

nlp = spacy.load('ja_ginza')                        # Load the GiNZA model

with open('input.txt', 'r', encoding='utf-8') as f:
    data = f.read()

doc = nlp(data)                                     # Run the analysis, including named entity recognition

with open('output.txt', 'w', encoding='utf-8') as f:

    text = list(data)                               # Store each character in a list
    entity = [ent.label_ for ent in doc.ents]       # Named entity labels
    start = [ent.start_char for ent in doc.ents]    # Character offset where each entity starts
    end = [ent.end_char for ent in doc.ents]        # Character offset just past the end of each entity
    num = 0                                         # Index of the next entity to replace
    skip_until = 0                                  # Characters before this offset belong to a replaced entity

    for i in range(len(text)):
        if num < len(start) and i == start[num]:    # Also safe when no entities remain (or none were found)
            f.write(entity[num])                    # Write the label instead of the entity text
            skip_until = end[num]
            num += 1

        elif i < skip_until:
            continue                                # Skip the remaining characters of the entity

        else:
            f.write(text[i])                        # Ordinary character: copy it unchanged
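
As for the "better way" mentioned at the top, one possible alternative is to slice the text between entities instead of walking it one character at a time. The sketch below is only an illustration under some assumptions: `replace_spans` is a hypothetical helper name, not part of spaCy or GiNZA, and it takes plain `(start, end, label)` tuples so that it runs without a model installed. With GiNZA, the tuples could be built as `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`. The sample sentence, offsets, and labels are made up by hand and are not real GiNZA output.

```python
def replace_spans(text, spans):
    """Return text with each (start, end, label) span replaced by its label.

    Assumes spans are sorted by start offset and do not overlap,
    which is how spaCy's doc.ents are ordered.
    """
    parts = []
    pos = 0  # Offset of the first character not yet copied
    for start, end, label in spans:
        parts.append(text[pos:start])  # Text before the entity, unchanged
        parts.append(label)            # Label in place of the entity text
        pos = end                      # Continue after the entity
    parts.append(text[pos:])           # Remaining text after the last entity
    return "".join(parts)


if __name__ == "__main__":
    # Hand-written offsets and labels, for illustration only
    sample = "Alice met Bob in Paris."
    spans = [(0, 5, "PERSON"), (10, 13, "PERSON"), (17, 22, "GPE")]
    print(replace_spans(sample, spans))  # PERSON met PERSON in GPE.
```

Because the function returns a string, the result can be written to output.txt with a single `f.write(...)` call, and an empty entity list simply returns the input unchanged.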
