[PYTHON] Creating an interactive application using a topic model

I want to heal my eyes with the blue ocean rather than burning my eyes with the blue light.

2015 is nearing the end, but how are you all doing? Since I want to return to nature around the end of the year, this time I will introduce how to create an application that makes travel proposals interactively using a topic model.

This article is a sister article of the previously published Creating an application using the topic model. At that time, application creation was not catching up, and although we said "application creation", we did not reach application creation, so it will be a complement to that. I will not touch on the topic model itself this time, so if you are interested, please refer to the above article.

What is a topic model?

I will leave the detailed explanation to here, but the topic model is to classify documents by topic as the name suggests. It is a method of. Specifically, the "topic" here has the following image.

image

This is a word cloud created from a travel blog. A "topic" is thus composed of words, and some of the words are frequent and some are not. Estimating the probability distribution that defines the "appearance word" and "appearance probability" is the main focus of the topic model. If this probability distribution is clarified, it will be possible to classify documents with similar distributions, and it will also be possible to estimate the degree of relevance between documents from the distance between distributions.

Application to interactive applications

Topics can be represented by probability distributions as described above, so that the distance between distributions can be calculated (this time I used KL-divergence). Using this distance, try to propose a spot for topic A, and if the answer is No, propose a distant topic (topic B, which is the farthest in the figure), and implement it with a simple policy. I will try.

image

Implementation of interactive application

The application implemented this time is here.

I suggest about 3 candidates from the same topic, which can be switched with the arrow below. If there is something you like / image is different, you can evaluate it with the Good / Bad button below. Receive ratings and make suggestions for similar / distant topics.

image

Since it has a Heroku Button, you can deploy it to your Heroku environment. Try it with the topic model I made! That's quite possible. As data, the API of AB-ROAD is used, and this usage registration is required.

enigma_abroad

Application implementation

The application configuration is as follows.

image

In the composition, I paid attention to the following points.

After that, like the application, write the test code exactly for the machine learning model, and attach the document with iPython notebook for the machine learning model.

The construction assumptions and verification of the topic model constructed this time can be referred from the following iPython notebook.

enigma_abroad/pola/machine/topic_model_evaluation.ipynb

Building a topic model

Of course, when making a proposal, it is essential to build a topic model, which is the brain of the application. This time, like the sister article Creating an application using a pick model, [gensim]( I built it using https://radimrehurek.com/gensim/) (I tried using pymc as well, but it was sealed because the memory was lost due to learning). And sadly, the accuracy wasn't as good as it was ... but I'm going to continue here.

In addition, when it comes to actually using machine learning in an application, it is unlikely that "accuracy 99% or!" Like the tutorial, and even if there is, it is either a hallucination due to overfitting or a bug created by oneself. It is often the case.

To overcome this, steady data collection and steady data preprocessing are required. Ah ... when I talked about what happened, I was trying to do something cool with machine learning, but before I knew it, I was meticulously setting words to exclude from the corpus ... ・. Content-based recommendation such as the topic model has the advantage that it can be recommended even when the user's evaluation data is irresistible compared to collaborative filtering that is often used for recommendation, but after all the amount of content and its shaping must be done properly. It doesn't work well (there are a number of documents, but the volume of the document itself is also quite good).

Consideration

Although it was made into an application, the essential topic model has not been built well. Last time I dealt with data different from hair salon and this time travel plan, but all of them ended up with sad results that topics could not be classified well.

I think the cause of this is the data problem.

In short, I think it is desirable to apply it in a situation where there are various variations of documents and each is reasonably long. If you want to make more detailed classifications within the same category, I think you need to build some prior knowledge.

I think there are many other ideas, so please try to create your own model and build an application that will take you to Blue Ocean.

image Garrett Gill

Recommended Posts

Creating an interactive application using a topic model
Creating a web application using Flask ②
Creating a web application using Flask ①
Creating a learning model using MNIST
Creating a web application using Flask ③
Creating a web application using Flask ④
Creating a data analysis application using Streamlit
[Day 9] Creating a model
Manipulate topic models ~ Interactive Topic Model ~
Creating a voice transcription web application
Creating a simple table using prettytable
Create an API that returns data from a model using turicreate
WEB application development using Django [Model definition]
Create an application using the Spotify API
Get a reference model using Django Serializer
Creating a virtual environment in an Anaconda environment
I made an image discrimination (cifar10) model using a convolutional neural network.
Creating a position estimation model for the Werewolf Intelligence Tournament using machine learning
[Python] Implementation of clustering using a mixed Gaussian model
Try creating a compressed file using Python and zlib
(Python) Try to develop a web application using Django
Building a seq2seq model using keras's Functional API Overview
Creating a graph using the plotly button and slider
I tried hosting a Pytorch sample model using TorchServe
Try running a Django application on an nginx unit
How to deploy a Go application to an ECS instance
Building a seq2seq model using keras' Functional API Inference
PyTorch Learning Note 2 (I tried using a pre-trained model)
Make an application using tkinter an executable file with cx_freeze