I needed to get synonyms in a program that handles Japanese. Searching suggested that WordNet was the tool for this, so I looked into it.
Japanese WordNet is "a Japanese concept dictionary in which individual concepts are grouped into units called 'synsets', which are semantically linked to other synsets" (quoted from the provider's site). I think its main use is synonym search; other uses need further investigation.
Japanese WordNet is distributed by its provider in several formats. Here I use the one described as "Japanese Wordnet and English WordNet in an sqlite3 database".
The following 11 tables are included in WordNet.
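The table names can also be read straight from the database file. The following is a minimal sketch that queries SQLite's built-in `sqlite_master` catalog. The helper name `list_tables` is hypothetical, and the demo uses an in-memory database with two empty stand-in tables rather than the real WordNet file.

```python
import sqlite3

def list_tables(conn):
    """Return the names of all tables in the connected SQLite database, sorted."""
    cur = conn.execute(
        "select name from sqlite_master where type='table' order by name"
    )
    return [row[0] for row in cur]

# Demo on an in-memory database with two empty stand-in tables (not real WordNet data).
# With the real file, pass a connection opened on the downloaded database instead.
demo = sqlite3.connect(":memory:")
demo.execute("create table word (wordid integer primary key, lemma text)")
demo.execute("create table sense (synset text, wordid integer)")
tables = list_tables(demo)
print(tables)
```

Run against the downloaded database, the same function should list the tables mentioned above.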
Of these, the minimum tables needed to get synonyms for a given word are word, synset, and sense. However, those alone carry no semantic information, which feels lacking, so the data model excerpted from these tables plus synset_def is as follows.
A word word<sub>i</sub> belongs to a concept synset<sub>j</sub>, and the sense table is what links the two. As an example, the result of outputting synset and synset_def for the word "warm" is as follows.
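As a rough sketch of this kind of lookup, the code below joins word, sense, and synset_def on a tiny in-memory database. The synset ids (`S1`, `S2`) and the definitions are toy placeholders, not real WordNet data, and the function name `definitions_for` is hypothetical. The column layout is assumed to match the fields used by the namedtuples later in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column layout; only the columns needed for the join are filled in below.
conn.executescript("""
create table word (wordid integer primary key, lang text, lemma text, pron text, pos text);
create table sense (synset text, wordid integer, lang text);
create table synset_def (synset text, lang text, def text, sid text);
""")

# Toy rows: one word that belongs to two placeholder synsets.
conn.execute("insert into word values (1, 'eng', 'warm', null, 'a')")
conn.executemany("insert into sense values (?, ?, ?)",
                 [("S1", 1, "eng"), ("S2", 1, "eng")])
conn.executemany("insert into synset_def values (?, ?, ?, ?)",
                 [("S1", "eng", "(toy) pleasantly hot", "0"),
                  ("S2", "eng", "(toy) friendly", "0")])

def definitions_for(conn, lemma, lang):
    """Return (synset, definition) pairs for every synset the lemma belongs to."""
    sql = """
        select s.synset, d.def
        from word w
        join sense s on s.wordid = w.wordid
        join synset_def d on d.synset = s.synset and d.lang = ?
        where w.lemma = ?
        order by s.synset
    """
    return conn.execute(sql, (lang, lemma)).fetchall()

for synset, definition in definitions_for(conn, "warm", "eng"):
    print(synset, definition)
```

The point is only the shape of the join: word to sense by wordid, then sense to synset_def by synset.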
Now, I will explain the procedure and program to get a list of synonyms using WordNet. The processing flow is as follows.
The big picture of the code is below.
import sqlite3

# Assumption: the downloaded database file is saved as wnjpn.db in the working directory.
conn = sqlite3.connect("wnjpn.db")

def search_synonyms(lemma, lang="jpn"):
    synonym_list = []
    # 1. Get the word id of the word
    wobj = get_word(lemma)
    if wobj:
        word = wobj[0]
        # 2. Get the senses (synset memberships) for that word id
        senses = get_senses(word, lang)
        for s in senses:
            # 3. Get the other words that belong to each synset as synonyms
            synonyms = get_words_from_synset(s.synset, word, lang)
            for syn in synonyms:
                if syn.lemma not in synonym_list:
                    synonym_list.append(syn.lemma)
    else:
        print(f"No synonyms were found for '{lemma}'.")
    return synonym_list
Below, I describe each of steps 1 to 3.
The function `get_word(lemma)`, which gets the wordid of the target word, is as follows. Note that it retrieves the entire Word object rather than the wordid alone, for readability and extensibility.
from collections import namedtuple

Word = namedtuple('Word', 'wordid lang lemma pron pos')

def get_word(lemma):
    # Fetch every row of the word table whose lemma matches exactly
    cur = conn.execute("select * from word where lemma=?", (lemma,))
    return [Word(*row) for row in cur]
The function `get_senses(word, lang)`, which gets the senses from a word (id), is as follows.
Sense = namedtuple('Sense', 'synset wordid lang rank lexid freq src')

def get_senses(word, lang):
    cur = conn.execute("select * from sense where wordid=? and lang=?", (word.wordid, lang))
    return [Sense(*row) for row in cur]
The language restriction (`lang="jpn"`) may only be needed in the next step, but I have included it here for the time being.
The function `get_words_from_synset(synset, word, lang)`, which gets the words belonging to a synset, is as follows.
def get_words_from_synset(synset, word, lang):
    # lang is a parameter because the query binds it and search_synonyms passes it
    cur = conn.execute("select * from word where wordid in (select wordid from sense where synset=? and lang=?) and wordid<>?;", (synset, lang, word.wordid))
    return [Word(*row) for row in cur]
The final condition `wordid<>?`, bound to `word.wordid`, is included to exclude the target word itself. There are several other ways this SQL could be written.
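One such alternative is to replace the subquery with a JOIN. The following is a minimal sketch, not the version used in this article: the name `get_words_from_synset_join` is hypothetical, the connection is passed in as an argument instead of being a global, and the demo runs on toy in-memory rows with a placeholder synset id `S1`. `select distinct` is added as a precaution in case the sense table holds the same (synset, wordid) pair more than once.

```python
import sqlite3
from collections import namedtuple

Word = namedtuple("Word", "wordid lang lemma pron pos")

def get_words_from_synset_join(conn, synset, word, lang):
    """JOIN-based variant (hypothetical name): other words in `synset` for `lang`."""
    sql = """
        select distinct w.wordid, w.lang, w.lemma, w.pron, w.pos
        from word w
        join sense s on s.wordid = w.wordid
        where s.synset = ? and s.lang = ? and w.wordid <> ?
        order by w.wordid
    """
    return [Word(*row) for row in conn.execute(sql, (synset, lang, word.wordid))]

# Toy in-memory data; "S1" is a placeholder, not a real WordNet synset id.
conn = sqlite3.connect(":memory:")
conn.execute("create table word (wordid integer primary key, lang text, lemma text, pron text, pos text)")
conn.execute("create table sense (synset text, wordid integer, lang text)")
conn.executemany("insert into word values (?, ?, ?, ?, ?)", [
    (1, "jpn", "暖かい", None, "a"),
    (2, "jpn", "温かい", None, "a"),
    (3, "eng", "warm", None, "a"),
])
conn.executemany("insert into sense values (?, ?, ?)", [
    ("S1", 1, "jpn"), ("S1", 2, "jpn"), ("S1", 3, "eng"),
])

target = Word(1, "jpn", "暖かい", None, "a")
others = get_words_from_synset_join(conn, "S1", target, "jpn")
print([w.lemma for w in others])
```

Both forms should return the same set of words; which one reads better is largely a matter of taste.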
Synonyms can be obtained with steps 1 to 3 alone, but if you want to see which concept each synonym shares with the word, you can also get `synset_def`.
SynsetDef = namedtuple('SynsetDef', 'synset lang defi sid')
# def is a Python reserved word and cannot be used as a field name, so the field is named defi.

def get_synset_def_from_synset(synset, lang):
    cur = conn.execute("select * from synset_def where synset=? and lang=?", (synset, lang))
    return [SynsetDef(*row) for row in cur]
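To see which concept each synonym comes from, the synonyms can be grouped by definition. The sketch below does this in a single query on a toy in-memory database. The function name `synonyms_by_concept` is hypothetical, the synset ids and definitions are placeholders rather than real WordNet entries, and the column layout is assumed to match the namedtuples above.

```python
import sqlite3

def synonyms_by_concept(conn, lemma, lang):
    """Map each synset definition to the other lemmas that share that synset."""
    sql = """
        select d.def, w2.lemma
        from word w1
        join sense s1 on s1.wordid = w1.wordid
        join sense s2 on s2.synset = s1.synset and s2.lang = ?
        join word w2 on w2.wordid = s2.wordid and w2.wordid <> w1.wordid
        join synset_def d on d.synset = s1.synset and d.lang = ?
        where w1.lemma = ?
        order by d.def, w2.lemma
    """
    grouped = {}
    for definition, synonym in conn.execute(sql, (lang, lang, lemma)):
        grouped.setdefault(definition, []).append(synonym)
    return grouped

# Toy data: "warm" belongs to two placeholder synsets, S1 and S2.
conn = sqlite3.connect(":memory:")
conn.execute("create table word (wordid integer primary key, lang text, lemma text, pron text, pos text)")
conn.execute("create table sense (synset text, wordid integer, lang text)")
conn.execute("create table synset_def (synset text, lang text, def text, sid text)")
conn.executemany("insert into word values (?, 'eng', ?, null, 'a')",
                 [(1, "warm"), (2, "balmy"), (3, "cordial"), (4, "friendly")])
conn.executemany("insert into sense values (?, ?, 'eng')",
                 [("S1", 1), ("S1", 2), ("S2", 1), ("S2", 3), ("S2", 4)])
conn.executemany("insert into synset_def values (?, 'eng', ?, '0')",
                 [("S1", "(toy) pleasantly hot"), ("S2", "(toy) friendly")])

result = synonyms_by_concept(conn, "warm", "eng")
for definition, synonyms in result.items():
    print(definition, synonyms)
```

Synsets that have no definition in the requested language are dropped by the inner join; a left join would keep them if that matters.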
There is nothing particularly new here, but I hope it helps. That's all.