[Python] I tried knowledge graph completion using OpenKE

I performed knowledge graph completion using an open-source framework called OpenKE. As a memo for myself, I will write down the results here.

Article flow

  1. Subject of this article
  2. What is a knowledge graph?
  3. What is OpenKE
  4. Program to use
  5. Execution result
  6. Comparison / evaluation with GitHub
  7. Summary

Subject of this article

This article is intended for people who fall into any of the following categories.

  - People who are interested in knowledge graphs
  - People who want to know what Python can do
  - People who want to use OpenKE

What is a knowledge graph?

A knowledge graph represents connections between pieces of knowledge as a structure.

** Example) ** (obama, born-in, Hawaii)

Data in the form of triples like the one above, each consisting of a subject, a relation, and an object, is called a knowledge graph.

Writing the subject, relation, and object as $s$, $r$, and $o$ respectively, the goal of this task is to infer the missing $o$ from a known $(s, r)$ pair, or the missing $s$ from a known $(r, o)$ pair.
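As a concrete illustration (not from the original article), an embedding model such as DistMult answers these queries by scoring candidate triples: each entity and relation gets a vector, and a triple is scored by the trilinear product of the three vectors. A minimal NumPy sketch with hypothetical toy embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical toy embeddings for entities and relations.
entities = {
    "obama": rng.normal(size=dim),
    "hawaii": rng.normal(size=dim),
    "texas": rng.normal(size=dim),
}
relations = {"born-in": rng.normal(size=dim)}

def distmult_score(s, r, o):
    """DistMult scores a triple (s, r, o) as sum(e_s * w_r * e_o)."""
    return float(np.sum(entities[s] * relations[r] * entities[o]))

# Rank candidate objects for (obama, born-in, ?): higher score = more plausible.
candidates = ["hawaii", "texas"]
scores = {o: distmult_score("obama", "born-in", o) for o in candidates}
best = max(scores, key=scores.get)
print(best)
```

With trained embeddings (rather than these random ones), the highest-scoring candidate is the model's prediction for the missing entity.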

What is OpenKE

OpenKE is an open-source framework created by the Natural Language Processing and Computational Social Science Lab at Tsinghua University (THUNLP).

It is a framework dedicated to knowledge graph embedding, written in C++ and Python, and it currently supports PyTorch and TensorFlow.

For details, please refer to the OpenKE homepage or the GitHub repository. OpenKE Home Page OpenKE's github

Program to use

Next, the program to actually execute is shown below. This time, we will use train_distmult_WN18.py from the examples directory.

```python
import openke
from openke.config import Trainer, Tester
from openke.module.model import DistMult
from openke.module.loss import SoftplusLoss
from openke.module.strategy import NegativeSampling
from openke.data import TrainDataLoader, TestDataLoader

# dataloader for training
train_dataloader = TrainDataLoader(
	in_path = "./benchmarks/WN18RR/",
	nbatches = 100,
	threads = 8,
	sampling_mode = "normal",
	bern_flag = 1,
	filter_flag = 1,
	neg_ent = 25,
	neg_rel = 0
)

# dataloader for test
test_dataloader = TestDataLoader("./benchmarks/WN18RR/", "link")

# define the model
distmult = DistMult(
	ent_tot = train_dataloader.get_ent_tot(),
	rel_tot = train_dataloader.get_rel_tot(),
	dim = 200
)

# define the loss function
model = NegativeSampling(
	model = distmult,
	loss = SoftplusLoss(),
	batch_size = train_dataloader.get_batch_size(),
	regul_rate = 1.0
)


# train the model
trainer = Trainer(model = model, data_loader = train_dataloader, train_times = 2000, alpha = 0.5, use_gpu = True, opt_method = "adagrad")
trainer.run()
distmult.save_checkpoint('./checkpoint/distmult.ckpt')

# test the model
distmult.load_checkpoint('./checkpoint/distmult.ckpt')
tester = Tester(model = distmult, data_loader = test_dataloader, use_gpu = True)
tester.run_link_prediction(type_constrain = False)
```

The settings are as follows: both loaders use the "./benchmarks/WN18RR/" dataset, the model is DistMult, and the loss function is SoftplusLoss(). dim is set to 200. Everything is left the same as in the downloaded example.

There are several other executable example programs in the examples directory.

Regarding settings

There are three parts that can be changed: dataset, model, and loss.

Make sure that "./benchmarks/WN18RR/" in train_dataloader and test_dataloader point to the same dataset. The datasets for these benchmarks are available at the link below. benchmarks
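For reference, OpenKE's benchmark folders store triples as ID files (e.g. train2id.txt): the first line gives the number of triples, and each following line is `head_id tail_id relation_id`. A minimal parser sketch under that assumption, using an in-memory sample instead of a real benchmark file:

```python
import io

def load_triples(f):
    """Parse an OpenKE-style id file: the first line is the triple count,
    each later line is 'head tail relation' as whitespace-separated ids."""
    n = int(f.readline())
    triples = [tuple(map(int, f.readline().split())) for _ in range(n)]
    return triples

# Tiny in-memory stand-in for ./benchmarks/WN18RR/train2id.txt
sample = io.StringIO("2\n0 1 0\n2 3 1\n")
print(load_triples(sample))  # [(0, 1, 0), (2, 3, 1)]
```

A custom dataset can be plugged into TrainDataLoader by writing entity2id.txt, relation2id.txt, and the *2id.txt triple files in this same layout.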

The variables in TrainDataLoader can be changed freely. Besides "normal", "cross" can be selected for sampling_mode. (Using "cross" may require some changes to deeper settings.)

For the available models, please refer to the link below. Available models

In addition to SoftplusLoss, MarginLoss and SigmoidLoss can be used for the loss.
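To see how the loss choices differ, here is an illustrative sketch (plain Python, not OpenKE's actual implementation) of softplus- and margin-style losses for a scoring model, where positive triples should score higher than negatives:

```python
import math

def softplus_loss(pos_score, neg_score):
    """Softplus loss log(1 + exp(-x)): smoothly pushes positive scores up
    and negative scores down; the gradient never vanishes entirely."""
    return math.log1p(math.exp(-pos_score)) + math.log1p(math.exp(neg_score))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin loss: zero once the positive outscores the negative by `margin`,
    so well-separated pairs stop contributing to training."""
    return max(0.0, margin - (pos_score - neg_score))

# A well-separated pair yields a near-zero loss either way.
print(softplus_loss(5.0, -5.0))
print(margin_loss(5.0, -5.0))  # 0.0
```

Swapping the loss in the training script is just a matter of passing a different loss object (e.g. MarginLoss) to NegativeSampling.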

Execution result

The execution result is as follows. I don't have a GPU machine, so I ran it on Google Colaboratory. (Screenshot of the execution output, 2020-05-27)

Comparison / evaluation with GitHub

Let's compare it with the Experiments table on GitHub. The table appears to show values for Hits@10 (filtered). (Screenshot of the Experiments table, 2020-05-28)

The average of my experimental results was 0.463306, so the accuracy was about 0.015 lower than the DistMult value reported on GitHub.
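For context, Hits@10 counts the fraction of test queries whose correct entity is ranked within the top 10 (in the filtered setting, after removing other known true triples from the candidate list). Given a list of ranks, it can be computed as in this sketch (illustrative, not OpenKE's code):

```python
def hits_at_k(ranks, k=10):
    """Fraction of test queries whose correct answer ranks within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks for five test triples.
ranks = [1, 3, 12, 7, 50]
print(hits_at_k(ranks))  # 0.6 (3 of the 5 ranks are <= 10)
```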

One possible improvement is to adopt a different loss function. Changing the values of neg_ent, neg_rel, and alpha may also help.

Summary

This time, I tried knowledge graph completion using OpenKE. I did not get the results I expected, but since there is room for improvement, I would like to try the improvements described above.

Thank you for reading until the end.
