Searching a word list (CSV) for synonyms with Python and Japanese WordNet

What I did

I implemented code that detects synonyms using Japanese WordNet. The script takes a CSV file of words as input, looks up the synonyms of each word among the words in that file, and writes the resulting synonym groups out as text. The implementation is mainly based on the material here.

About WordNet

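The page "Knowing Japanese WordNet" visualizes the network, which makes the structure easy to grasp intuitively. If you are interested in how WordNet is defined, please read it. In short, WordNet groups words that express the same concept into a synset, and words that share a synset are treated as synonyms; this is the relation the script below relies on.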

Japanese WordNet is a Japanese semantic dictionary developed by the National Institute of Information and Communications Technology (NICT); here is the official website. This implementation requires the "Japanese Wordnet and English WordNet in an sqlite3 database" file, which can be downloaded from the official website (file name: wnjpn.db.gz). Unzipping it gives you the dictionary data as a db file, and loading that db from Python makes it possible to look up synonyms.
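The unzip step can also be done from Python with the standard library. The sketch below assumes the archive was saved next to the script under its download name wnjpn.db.gz; unzip_wordnet_db and list_tables are hypothetical helper names, not part of the original script. Listing the tables is a quick way to confirm that the file opens correctly before running the synonym search.

```python
import gzip
import shutil
import sqlite3


def unzip_wordnet_db(gz_path="wnjpn.db.gz", db_path="wnjpn.db"):
    """Decompress the downloaded archive into a plain SQLite file (hypothetical helper)."""
    with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
        # Stream the data instead of reading the whole database into memory
        shutil.copyfileobj(src, dst)
    return db_path


def list_tables(db_path="wnjpn.db"):
    """Return the names of the tables stored in the SQLite file, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling unzip_wordnet_db() once and then list_tables() should show, among others, the word and sense tables that the synonym lookup relies on.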

Implementation

create_similar_words.py


import csv
import sqlite3

#db connection (the unzipped wnjpn.db must be in the working directory)
conn = sqlite3.connect("wnjpn.db")
# ui
csvfile = 'words.csv'
outfile = 'similar_words.txt'

'''functions
csv_input: read a CSV file and return its rows as a list
SearchSimilarWords: create and return a list of synonyms for one word
create_similar_wordlst: group each word with its synonyms found in the CSV
save_synonyms: save the synonym groups
'''


def csv_input(path_name):
    rows = []
    with open(path_name, encoding='utf-8', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            # Skip blank lines, which csv.reader returns as empty lists
            if row:
                rows.append(row)
    return rows


def SearchSimilarWords(word):
    word = ','.join(word)
    # Placeholders (?) let sqlite3 quote the value, so a word containing ' cannot break the query
    cur = conn.execute("select wordid from word where lemma=?", (word,))
    # A lemma can have several entries (e.g. one per part of speech), so keep them all
    word_ids = [row[0] for row in cur]

    #Determining if a word exists in WordNet
    if not word_ids:
        return None
    simdict = []
    for word_id in word_ids:
        cur = conn.execute("select synset from sense where wordid=?", (word_id,))
        synsets = [row[0] for row in cur]
        for synset in synsets:
            # Other words that belong to the same synset are synonyms
            cur3 = conn.execute(
                "select wordid from sense where synset=? and wordid!=?",
                (synset, word_id))
            for (target_word_id,) in cur3.fetchall():
                cur3_1 = conn.execute(
                    "select lemma from word where wordid=?", (target_word_id,))
                for row3_1 in cur3_1:
                    #Store similar words in a list
                    simdict.append(row3_1[0])
    return simdict


def create_similar_wordlst(full_word):
    # Synonyms are only kept if they also appear in the CSV
    vocabulary = {','.join(row) for row in full_word}
    parent = []
    for row in full_word:
        synonym = SearchSimilarWords(row)
        if synonym is None:
            continue
        child = {','.join(row)}
        child.update(syn for syn in synonym if syn in vocabulary)
        # Only keep words that have at least one synonym in the CSV
        if len(child) > 1:
            parent.append(child)
    return parent


def save_synonyms(lst):
    # Sort the words in each group so that identical groups become identical lines
    norlst = {','.join(sorted(row)) for row in lst}
    # Specify UTF-8 explicitly so the output does not depend on the OS default encoding
    with open(outfile, mode='w', encoding='utf-8') as f:
        for row in sorted(norlst):
            f.write(row + '\n')


def main():
    full_word = csv_input(csvfile)
    save_synonyms(create_similar_wordlst(full_word))
    conn.close()


if __name__ == "__main__":
    main()
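The lookup in SearchSimilarWords walks word → sense → sense → word with one query per step. The same walk can be written as a single JOIN that uses placeholders, so sqlite3 handles all quoting. This is a sketch, assuming the word(wordid, lang, lemma) and sense(synset, wordid) columns of wnjpn.db, with lang holding codes such as 'jpn' and 'eng'; search_similar_words and its lang filter are hypothetical, not the original function. Like SearchSimilarWords, it returns None when the word is not in WordNet.

```python
import sqlite3

# word (the query) -> sense (its synsets) -> sense (other members) -> word (their lemmas)
SYNONYM_SQL = """
SELECT DISTINCT w2.lemma
FROM word AS w1
JOIN sense AS s1 ON s1.wordid = w1.wordid
JOIN sense AS s2 ON s2.synset = s1.synset AND s2.wordid != s1.wordid
JOIN word AS w2 ON w2.wordid = s2.wordid
WHERE w1.lemma = :lemma
  AND w2.lemma != w1.lemma
  AND (:lang IS NULL OR w2.lang = :lang)
ORDER BY w2.lemma
"""


def search_similar_words(conn, lemma, lang=None):
    """Return the sorted synonyms of lemma, or None if lemma is not in WordNet.

    lang (hypothetical filter): e.g. 'jpn' keeps only Japanese synonyms;
    None keeps every language.
    """
    exists = conn.execute(
        "SELECT 1 FROM word WHERE lemma = ? LIMIT 1", (lemma,)
    ).fetchone()
    if exists is None:
        return None
    cur = conn.execute(SYNONYM_SQL, {"lemma": lemma, "lang": lang})
    return [row[0] for row in cur]
```

Passing lang='jpn' drops the English lemmas that share a synset with the Japanese word. Without it, English words are returned too, and they are then removed by the filtering against words.csv unless the CSV also contains English words.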

Folder structure

create_similar_words.py
words.csv
wnjpn.db

Input file example

This time, to keep the implementation simple, each row is assumed to contain a single word in a single column. The script also searches for synonyms only among the words in words.csv: a WordNet synonym that does not appear in the file is not output.

words.csv


development of
development
・
・
・
get together
Flock
Takaru
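Restricting the output to words in words.csv comes down to a set intersection between each word's WordNet synonyms and the words in the file. A minimal sketch with made-up synonym data standing in for the database lookup; load_vocabulary and synonym_group are hypothetical names, and load_vocabulary reads only the first column, following the one-word-per-row assumption above.

```python
import csv
import io


def load_vocabulary(f):
    """Read a one-column CSV file object and return its words, skipping blank rows."""
    return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]


def synonym_group(word, synonyms, vocabulary):
    """Return {word} plus the synonyms that occur in vocabulary, or None if none do."""
    group = {word} | (set(synonyms) & vocabulary)
    return group if len(group) > 1 else None


# Toy input; in the real script the synonym list comes from WordNet
vocab = load_vocabulary(io.StringIO("get together\nFlock\n\nTakaru\ndevelopment\n"))
group = synonym_group("get together", ["Flock", "gather", "Takaru"], set(vocab))
```

Here "gather" is dropped because it is not in the word list, and a word with no synonyms in the list yields None, so it produces no output line.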

Output file example

similar_words.txt


development of,development
・
・
・
get together,Flock,Takaru
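The output is comma-joined text, so a word that itself contains a comma would be split into two fields when the file is read back as CSV. If you want a valid CSV file, the csv module can write the groups instead. A sketch; write_groups is a hypothetical helper that takes the list of synonym sets built by create_similar_wordlst.

```python
import csv
import io


def write_groups(groups, f):
    """Write each synonym group as one CSV row, with duplicates and ordering normalized."""
    # Sorting each group makes {'a', 'b'} and {'b', 'a'} the same tuple, so the set drops repeats
    rows = sorted({tuple(sorted(g)) for g in groups})
    # csv.writer quotes any word that itself contains a comma or a quote character
    csv.writer(f, lineterminator="\n").writerows(rows)


buf = io.StringIO()
write_groups([{"get together", "Flock", "Takaru"},
              {"Takaru", "Flock", "get together"},
              {"development of", "development"}], buf)
print(buf.getvalue(), end="")
```

To write the real file, open it with open(path, 'w', encoding='utf-8', newline='') as the csv module documentation recommends, and pass that file object as f.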

Summary

I created a script that searches a CSV word list for synonyms and writes the synonym groups to a text file, one comma-separated group per line. If you have any questions or spot imperfections in the implementation, please point them out. LGTM is also welcome! Thank you for reading.
