I have created code to search for and list synonyms using Japanese WordNet.

What is WordNet in the first place?

Please refer to the following articles: "Knowing Japanese WordNet" and "I made a tool with python that can search for synonyms using Japanese WordNet".

The second linked article was also used as a reference when creating the code.

Straight to the code

Environment: Google Colaboratory

The flow is: download "wnjpn.db" from the Japanese WordNet website, read it with sqlite3, load the result into a pandas DataFrame, and then search that DataFrame for synonyms.

import gzip
import shutil
import sqlite3
import pandas as pd

# Download and unzip Japanese WordNet
! wget "http://compling.hss.ntu.edu.sg/wnja/data/1.1/wnjpn.db.gz"  # 1~2 minutes

with gzip.open('wnjpn.db.gz', 'rb') as f_in:
    with open('wnjpn.db', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

# Create a DataFrame of every synset (concept ID) and lemma (word) pair
conn = sqlite3.connect("wnjpn.db")
q = "SELECT synset, lemma FROM sense JOIN word USING (wordid) WHERE sense.lang = 'jpn'"
sense_word = pd.read_sql(q, conn)

# Define a function that returns a list of synonyms
def get_synonyms(word):
    """Returns a list of synonyms for the input word.

    Args:
        word (str): Word to look up synonyms for.

    Returns:
        list[str]: List of synonyms.
    """
    # Find the synsets that contain the input word
    synsets = sense_word.loc[sense_word.lemma == word, "synset"]

    # Get every word belonging to those synsets (as a set, since duplicates are possible)
    synset_words = set(sense_word.loc[sense_word.synset.isin(synsets), "lemma"])

    # Remove the input word itself, since it is included in the result
    if word in synset_words:
        synset_words.remove(word)

    return list(synset_words)

# Usage example
get_synonyms("word")
# >> ['Resignation', 'word', 'word', 'word']

# Returns an empty list for a word that is not in WordNet
get_synonyms("Super word")
# >> []
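
Because get_synonyms merges every synset of a word into one flat list, words from unrelated senses end up mixed together. The following is a minimal sketch, not part of the original code, that keeps the synonyms grouped per synset by using two dictionary indexes. The names `build_indexes` and `get_synonyms_by_synset` and the toy pairs are hypothetical and only stand in for the rows of sense_word.

```python
from collections import defaultdict

# Hypothetical toy data standing in for the (synset, lemma) rows of sense_word.
pairs = [
    ("s1", "bank"), ("s1", "shore"),
    ("s2", "bank"), ("s2", "lender"), ("s2", "financial institution"),
    ("s3", "river"),
]


def build_indexes(pairs):
    """Build lemma -> synsets and synset -> lemmas lookup tables."""
    lemma_to_synsets = defaultdict(set)
    synset_to_lemmas = defaultdict(set)
    for synset, lemma in pairs:
        lemma_to_synsets[lemma].add(synset)
        synset_to_lemmas[synset].add(lemma)
    return lemma_to_synsets, synset_to_lemmas


def get_synonyms_by_synset(word, lemma_to_synsets, synset_to_lemmas):
    """Return {synset: sorted synonyms}, leaving out the input word itself."""
    return {
        synset: sorted(synset_to_lemmas[synset] - {word})
        # .get avoids adding an empty entry for unknown words
        for synset in sorted(lemma_to_synsets.get(word, ()))
    }


lemma_to_synsets, synset_to_lemmas = build_indexes(pairs)
result = get_synonyms_by_synset("bank", lemma_to_synsets, synset_to_lemmas)
unknown = get_synonyms_by_synset("no_such_word", lemma_to_synsets, synset_to_lemmas)
print(result)
print(unknown)
```

With the real data, passing `sense_word.itertuples(index=False, name=None)` to `build_indexes` should supply the same kind of pairs, although that is an assumption that has not been run against wnjpn.db here.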

Supplement

The following part may be hard to follow, so here is a short supplement.

# Create a DataFrame of every synset (concept ID) and lemma (word) pair
conn = sqlite3.connect("wnjpn.db")
q = "SELECT synset, lemma FROM sense JOIN word USING (wordid) WHERE sense.lang = 'jpn'"
sense_word = pd.read_sql(q, conn)

Here, a SQLite query joins the "sense" table and the "word" table contained in "wnjpn.db" to build a table of **all combinations of synset (concept ID) and lemma (word)**. A synset is an ID assigned to a concept, and **words that share the same synset are synonyms**. The resulting table has the following form; for example, the five lemmas that share the synset "00001740-v" (rows 3 to 7) are synonyms of one another.

	synset	lemma
1	00001740-n	entity
2	00001740-r	With a cappella
3	00001740-v	Breathe
4	00001740-v	Breathing
5	00001740-v	Vomiting
6	00001740-v	Breathing
7	00001740-v	Breath
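
To see the join in isolation, here is a small sketch that runs against an in-memory SQLite database instead of wnjpn.db. The three rows are the sample rows shown in this article's "sense" and "word" tables (with the translated lemmas), the table definitions are reduced to the columns the query needs, and the query is written with an explicit JOIN, which SQLite treats the same as the comma form for an inner join.

```python
import sqlite3

# In-memory database with simplified versions of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sense (synset TEXT, wordid INTEGER, lang TEXT);
    CREATE TABLE word  (wordid INTEGER PRIMARY KEY, lang TEXT, lemma TEXT);

    INSERT INTO sense VALUES
        ('02130160-v', 155287, 'eng'),
        ('00001740-v', 186954, 'jpn'),
        ('00001740-v', 216393, 'jpn');

    INSERT INTO word VALUES
        (155287, 'eng', 'lay_eyes_on'),
        (186954, 'jpn', 'Breathing'),
        (216393, 'jpn', 'Breath');
""")

# Join on wordid and keep only the Japanese senses.
rows = conn.execute(
    "SELECT synset, lemma FROM sense JOIN word USING (wordid) "
    "WHERE sense.lang = 'jpn' ORDER BY lemma"
).fetchall()
conn.close()

for synset, lemma in rows:
    print(synset, lemma)
```

The English row is filtered out, and the two Japanese lemmas come back with the same synset, which is what marks them as synonyms.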

The contents of the table used for the join

The join is hard to picture without knowing what the "sense" and "word" tables contain, so I'll briefly introduce each of them. If you want to know more, read the linked articles listed at the very beginning.

sense: a table that shows which wordid (word ID) belongs to which synset (concept ID). Each row is uniquely identified by the combination of synset and wordid. As in the query above, the lang column tells whether the word is Japanese (jpn) or English (eng).

	synset	wordid	lang	rank	lexid	freq	src
0	02130160-v	155287	eng	0	1	1	eng-30
1	00001740-v	186954	jpn	nan	nan	nan	hand
2	00001740-v	216393	jpn	nan	nan	nan	hand
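
The two properties described for "sense" (rows split by lang, and uniqueness of the synset and wordid pair) can be checked with simple aggregate queries. This is only a sketch on an in-memory copy of the three sample rows above, with the table reduced to three columns; the same two SELECT statements could presumably be run against the real wnjpn.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sense (synset TEXT, wordid INTEGER, lang TEXT);
    INSERT INTO sense VALUES
        ('02130160-v', 155287, 'eng'),
        ('00001740-v', 186954, 'jpn'),
        ('00001740-v', 216393, 'jpn');
""")

# Number of sense rows per language.
lang_counts = conn.execute(
    "SELECT lang, COUNT(*) FROM sense GROUP BY lang ORDER BY lang"
).fetchall()

# Any (synset, wordid) pair that appears more than once would show up here.
duplicates = conn.execute(
    "SELECT synset, wordid, COUNT(*) FROM sense "
    "GROUP BY synset, wordid HAVING COUNT(*) > 1"
).fetchall()
conn.close()

print(lang_counts)
print(duplicates)
```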

word: a correspondence table between wordid (word ID) and lemma (word). Each row is uniquely identified by wordid. Here it is used to convert the wordid values in "sense" into words. The lang column has the same meaning as in "sense".

	wordid	lang	lemma	pos
0	155287	eng	lay_eyes_on	v
1	186954	jpn	Breathing	v
2	216393	jpn	Breath	v
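
With both tables in view, the synonym lookup can also be written as a single SQL query, without loading everything into pandas first. The sketch below is an alternative to the article's approach, not the author's method: it goes from the input lemma to its wordid, to its synsets, to the other wordids in those synsets, and back to lemmas. The function name `find_synonyms` is hypothetical, and the data is again the sample rows from this article in an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sense (synset TEXT, wordid INTEGER, lang TEXT);
    CREATE TABLE word  (wordid INTEGER PRIMARY KEY, lang TEXT, lemma TEXT);
    INSERT INTO sense VALUES
        ('02130160-v', 155287, 'eng'),
        ('00001740-v', 186954, 'jpn'),
        ('00001740-v', 216393, 'jpn');
    INSERT INTO word VALUES
        (155287, 'eng', 'lay_eyes_on'),
        (186954, 'jpn', 'Breathing'),
        (216393, 'jpn', 'Breath');
""")

SYNONYM_SQL = """
    SELECT DISTINCT w2.lemma
    FROM word AS w1
    JOIN sense AS s1 ON s1.wordid = w1.wordid   -- senses of the input word
    JOIN sense AS s2 ON s2.synset = s1.synset   -- other senses in the same synsets
    JOIN word  AS w2 ON w2.wordid = s2.wordid   -- back to lemmas
    WHERE w1.lemma = :word
      AND s1.lang = 'jpn'
      AND s2.lang = 'jpn'
      AND w2.lemma != :word
    ORDER BY w2.lemma
"""


def find_synonyms(conn, word):
    """Return the Japanese lemmas that share a synset with `word`."""
    return [row[0] for row in conn.execute(SYNONYM_SQL, {"word": word})]


found = find_synonyms(conn, "Breathing")
missing = find_synonyms(conn, "no_such_word")
conn.close()

print(found)
print(missing)
```

Passing the word as a named parameter keeps it out of the SQL string, so no manual quoting is needed.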

That is all. I would appreciate it if you could point out any mistakes.

[PYTHON] Search / list synonyms using Japanese WordNet
