What I did

I implemented code to detect synonyms using Japanese WordNet. The script takes a csv file of words as input, searches that word group for synonyms of each word, and writes the resulting synonym groups to a text file. The implementation is mainly based on the material here.

About WordNet

Knowing Japanese WordNet: since the network is visualized there, it is easy to grasp intuitively. If you are interested in the definition of WordNet, please read it.

Here is the official website. Japanese WordNet is a Japanese semantic dictionary developed by the National Institute of Information and Communications Technology (NICT). This implementation requires downloading "Japanese Wordnet and English WordNet in an sqlite3 database" from the official website. Download file name: wnjpn.db.gz. Decompressing it yields the db file of the dictionary data. By loading this db with Python, it is possible to detect synonyms.
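A minimal sketch of the decompress-and-load step, assuming the downloaded archive is a plain gzip file. The helper names `decompress_gz` and `list_tables` are hypothetical and only use the standard library.

```python
import gzip
import shutil
import sqlite3


def decompress_gz(src_path, dst_path):
    """Decompress the gzip file at src_path and write the result to dst_path."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def list_tables(db_path):
    """Return the table names of an SQLite database, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
        return [row[0] for row in cur]
    finally:
        conn.close()
```

Calling `decompress_gz("wnjpn.db.gz", "wnjpn.db")` once and then `list_tables("wnjpn.db")` is a quick way to check that the database opened correctly before running the script.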

Implementation

`create_similar_words.py`


import sqlite3
import csv

# Database connection
conn = sqlite3.connect("wnjpn.db")
# Input and output file names
csvfile = 'words.csv'
outfile = 'similar_words.txt'

'''functions
csv_input: Read the csv file and return its rows as a list
SearchSimilarWords: Create and return a synonym list
create_similar_wordlst: Shape the synonym lists into groups
save_synonyms: Save the synonym list
'''

def csv_input(path_name):
    rows = []
    with open(path_name,encoding='utf-8') as f:
        reader = csv.reader(f)
        for row in reader:
            rows.append(row)
    return rows


def SearchSimilarWords(word):
    # A csv row is a list of cells; join them to get the lemma to look up
    word = ','.join(word)
    # Use "?" placeholders so that quotes in a word cannot break the query
    cur = conn.execute("select wordid from word where lemma=?", (word,))
    word_id = None
    for row in cur:
        word_id = row[0]

    # Return None if the word does not exist in WordNet
    if word_id is None:
        return None
    cur = conn.execute("select synset from sense where wordid=?", (word_id,))
    synsets = []
    for row in cur:
        synsets.append(row[0])
    simdict = []
    for synset in synsets:
        # Other words that belong to the same synset
        cur3 = conn.execute("select wordid from sense where (synset=? and wordid!=?)", (synset, word_id))
        for row3 in cur3:
            target_word_id = row3[0]
            cur3_1 = conn.execute("select lemma from word where wordid=?", (target_word_id,))
            for row3_1 in cur3_1:
                # Store similar words in a list
                simdict.append(row3_1[0])
    return simdict


def create_similar_wordlst(full_word):
    parent = []
    # Flatten every csv row into one string up front for fast membership tests
    known_words = set(','.join(f_row) for f_row in full_word)
    # Reuse the rows that were already read instead of opening the csv again
    for row in full_word:
        synonym = SearchSimilarWords(row)
        if synonym is None:
            continue
        child = [','.join(row)]
        for syn in synonym:
            # Keep only the synonyms that also appear in the input word list
            if syn in known_words:
                child.append(syn)
        # A group needs at least two distinct words
        if len(set(child)) > 1:
            parent.append(set(child))
    return parent


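def save_synonyms(lst):
    # Deduplicate groups as frozensets so that the same words in a
    # different order count as one group
    groups = set(frozenset(row) for row in lst)
    # Write as UTF-8 explicitly so Japanese text is saved on any platform
    with open(outfile, mode='w', encoding='utf-8') as f:
        for group in groups:
            f.write(','.join(group)+'\n')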


def main():
    full_word = csv_input(csvfile)
    save_synonyms(create_similar_wordlst(full_word))


if __name__ == "__main__":
    main()
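
The per-synset loops in the script above could also be written as a single JOIN query. The following is only a sketch under stated assumptions: it runs against an in-memory toy database with made-up rows, and it declares only the `word` and `sense` columns that the script queries (the real tables in wnjpn.db may have more columns). The names `build_toy_db` and `find_synonyms` are hypothetical.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with made-up rows, using only the
    word/sense columns that the script queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE word (wordid INTEGER, lemma TEXT)")
    conn.execute("CREATE TABLE sense (synset TEXT, wordid INTEGER)")
    conn.executemany(
        "INSERT INTO word VALUES (?, ?)",
        [(1, "car"), (2, "automobile"), (3, "auto"), (4, "dog")],
    )
    conn.executemany(
        "INSERT INTO sense VALUES (?, ?)",
        [("s1", 1), ("s1", 2), ("s1", 3), ("s2", 4)],
    )
    return conn


def find_synonyms(conn, lemma):
    """Return the sorted lemmas that share at least one synset with lemma."""
    cur = conn.execute(
        """
        SELECT DISTINCT w2.lemma
        FROM word AS w1
        JOIN sense AS s1 ON s1.wordid = w1.wordid
        JOIN sense AS s2 ON s2.synset = s1.synset
        JOIN word AS w2 ON w2.wordid = s2.wordid
        WHERE w1.lemma = ? AND w2.wordid != w1.wordid
        ORDER BY w2.lemma
        """,
        (lemma,),
    )
    return [row[0] for row in cur]
```

Unlike `SearchSimilarWords`, this sketch returns an empty list both when the word is missing from the database and when it has no synonyms, so a caller that needs to tell the two cases apart would have to check for the word separately.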

Folder structure

create_similar_words.py
words.csv
wnjpn.db

Input file example

This time, to keep the implementation simple, the input is assumed to contain the strings in a single column. In addition, synonyms are searched for only among the words in words.csv.
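
The "search only within the word list" idea can be sketched independently of WordNet. This is an illustration under assumptions, not the script itself: `read_single_column` and `restrict_to_word_list` are hypothetical helpers, and the candidate synonyms are passed in by hand where the script would get them from the database.

```python
import csv
import io


def read_single_column(csv_text):
    """Parse one-column CSV text into a flat list of words, skipping blank rows."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0] for row in reader if row]


def restrict_to_word_list(word, candidates, word_list):
    """Keep only the candidates that appear in word_list.

    Returns the group as a list (word first, then the matches in word_list
    order), or None when no candidate is in the list.
    """
    candidate_set = set(candidates)
    matches = [w for w in word_list if w in candidate_set and w != word]
    return [word] + matches if matches else None
```

With a word list of `car`, `automobile`, `dog` and the candidates `automobile`, `auto` for `car`, only `automobile` survives, because `auto` is not in the input file.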

`words.csv`


development of
development
・
・
・
get together
Flock
Takaru

Output file example

`similar_words.txt`


development of,development
・
・
・
get together,Flock,Takaru

Summary

I created a script that searches for synonyms among the words in a csv file and writes them, comma-separated, to a text file. If you have any questions or find flaws in the implementation, please point them out. LGTM is also welcome! Thank you for reading.

Search for synonyms from the word list (csv) using Python Japanese WordNet