Maximum likelihood estimation of a topic model in Python

Introduction

I implemented maximum likelihood estimation of a topic model in Python, using [Topic Model (Machine Learning Professional Series, Iwata)](https://www.amazon.co.jp/dp/4061529048) as the textbook.

Structure of this article

- Topic model
- Probabilistic latent semantic analysis
- Implementation
- Result
- Conclusion

Topic model

The topic model assumes that a document has multiple topics.

**Example**

- An article on "deliberation of a medical bill" => two topics, "medicine" and "politics"
- An article on "the economic effects of the Olympics" => two topics, "sports" and "economy"

To better express the assumption that a document has multiple topics, a topic distribution $\boldsymbol\theta_d = (\theta_{d1}, \cdots, \theta_{dK})$ is set for each document. $\theta_{dk} = p(k \ | \ \boldsymbol\theta_d)$ is the probability that topic $k$ is assigned to a word in document $d$. A topic $z_{dn}$ is assigned to each word of document $d$ according to the topic distribution $\boldsymbol\theta_d$, and the word is then generated according to the word distribution $\boldsymbol\phi_{z_{dn}}$ of the assigned topic. Here $\boldsymbol\phi_k = (\phi_{k1}, \cdots, \phi_{kV})$, and $\phi_{kv}$ is the probability that topic $k$ generates vocabulary item $v$.

The figure below shows the process of generating a specific document set.

topic_model.png

A topic is assigned to each word, and each word is generated from the word distribution of its assigned topic. For the document in the middle of the figure, the topics "sports" and "economy" have high probability, so many words related to these two topics are observed. Looking at the word distribution of the economy topic, the words line up, in descending order of generation probability, as "economy," "stock price," "economy," and "strong yen."
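The following is a minimal NumPy sketch of this generative process for a single document. The variable names (`theta_d`, `phi`, `z_d`, `w_d`) and the Dirichlet-distributed toy parameters are only for illustration, not part of the model description above.

import numpy as np

rng = np.random.default_rng(0)

K, V, N_d = 2, 6, 10                     # number of topics, vocabulary size, words in the document
theta_d = rng.dirichlet(np.ones(K))      # topic distribution of document d
phi = rng.dirichlet(np.ones(V), size=K)  # word distribution of each topic (K x V)

z_d = rng.choice(K, size=N_d, p=theta_d)                # assign a topic to each word position
w_d = np.array([rng.choice(V, p=phi[z]) for z in z_d])  # generate each word from its topic's word distribution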

Probabilistic latent semantic analysis

Maximum likelihood estimation of the topic model is a method called **probabilistic latent semantic analysis** (**PLSA**). Incidentally, Bayesian estimation of the topic model is a method called **latent Dirichlet allocation** (**LDA**).

PLSA performs maximum likelihood estimation of the topic model with the EM algorithm. We find the parameters $\boldsymbol\Theta = (\boldsymbol\theta_1, \cdots, \boldsymbol\theta_D)$ and $\boldsymbol\Phi = (\boldsymbol\phi_1, \cdots, \boldsymbol\phi_K)$ that maximize the following log-likelihood.

\begin{align}
L &= \sum^D_{d = 1} \sum^{N_d}_{n = 1} \log \sum^K_{k = 1} \theta_{dk}\phi_{kw_{dn}} \\
&= \sum^D_{d = 1} \sum^{N_d}_{n = 1} \log \sum^K_{k = 1} q_{dnk} \frac{\theta_{dk}\phi_{kw_{dn}}}{q_{dnk}} \\
&\geq \sum^D_{d = 1} \sum^{N_d}_{n = 1} \sum^K_{k = 1} q_{dnk} \log \frac{\theta_{dk}\phi_{kw_{dn}}}{q_{dnk}} \\
&= \sum^D_{d = 1} \sum^{N_d}_{n = 1} \sum^K_{k = 1} q_{dnk} \log \theta_{dk}\phi_{kw_{dn}} - \sum^D_{d = 1} \sum^{N_d}_{n = 1} \sum^K_{k = 1} q_{dnk} \log q_{dnk} \equiv F \tag{1}
\end{align}

We use the EM algorithm to maximize the lower bound $F$; the inequality in equation (1) follows from Jensen's inequality, since $\log$ is concave and $\sum^K_{k = 1} q_{dnk} = 1$. Below, the derivation is split into the **E step** and the **M step**.

E step

To apply the method of Lagrange multipliers, we define $F_q$ as follows.

F_q \equiv \sum^D_{d = 1} \sum^{N_d}_{n = 1} \sum^K_{k = 1} q_{dnk} \log \theta_{dk}\phi_{kw_{dn}} - \sum^D_{d = 1} \sum^{N_d}_{n = 1} \sum^K_{k = 1} q_{dnk} \log q_{dnk} + \lambda \bigl(\sum^K_{k = 1} q_{dnk} - 1 \bigr) \tag{2}

Partially differentiate $F_q$ with respect to $q_{dnk}$.

\frac{\partial F_q}{\partial q_{dnk}} = \log \theta_{dk}\phi_{kw_{dn}} - \log q_{dnk} - 1 + \lambda = 0 \tag{3}

From $\frac{\partial F_q}{\partial q_{dnk}} = 0$,

q_{dnk} = \exp(\lambda - 1) \theta_{dk}\phi_{kw_{dn}} \tag{4}

From $\sum^K_{k = 1} q_{dnk} = 1$,

\begin{align}
& \sum^K_{k = 1} \exp(\lambda - 1) \theta_{dk}\phi_{kw_{dn}} = 1 \\
& \exp(\lambda - 1) = \frac{1}{\displaystyle \sum^K_{k = 1} \theta_{dk}\phi_{kw_{dn}}} \tag{5}
\end{align}

Substitute equation (5) into equation (4) to get $q_{dnk}$.

q_{dnk} = \frac{\theta_{dk}\phi_{kw_{dn}}}{\displaystyle \sum^K_{k' = 1} \theta_{dk'}\phi_{k'w_{dn}}} \tag{6}

$q_{dnk}$ is called the responsibility and represents the probability that topic $k$ is assigned to the $n$th word of document $d$.
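As a minimal sketch, equation (6) can be computed for one document with NumPy. The names `theta_d`, `phi`, and `w_d` are assumptions for illustration: `theta_d` is the current topic distribution of document $d$ (length $K$), `phi` the current word distributions ($K \times V$), and `w_d` the vocabulary indices of the words in document $d$.

import numpy as np

def e_step_document(theta_d, phi, w_d):
	# unnormalized responsibilities theta_{dk} * phi_{k w_{dn}}, shape (N_d, K)
	q_d = theta_d * phi[:, w_d].T
	# normalize over topics k, as in equation (6)
	return q_d / q_d.sum(axis=1, keepdims=True)

The same per-document computation appears in the E step of the full program below.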

M step

In the M step, the parameters $\boldsymbol\Theta = (\boldsymbol\theta_1, \cdots, \boldsymbol\theta_D)$ and $\boldsymbol\Phi = (\boldsymbol\phi_1, \cdots, \boldsymbol\phi_K)$ are treated separately. Both use the method of Lagrange multipliers, as in the E step.

Topic distribution

To apply the method of Lagrange multipliers, we define $F_\theta$ as follows.

F_\theta \equiv \sum^D_{d = 1} \sum^{N_d}_{n = 1} \sum^K_{k = 1} q_{dnk} \log \theta_{dk}\phi_{kw_{dn}} - \sum^D_{d = 1} \sum^{N_d}_{n = 1} \sum^K_{k = 1} q_{dnk} \log q_{dnk} + \lambda \bigl(\sum^K_{k = 1} \theta_{dk} - 1 \bigr) \tag{7}

Partially differentiate $F_\theta$ with respect to $\theta_{dk}$.

\frac{\partial F_\theta}{\partial \theta_{dk}} = \sum^{N_d}_{n = 1} \frac{q_{dnk}}{\theta_{dk}} + \lambda \tag{8}

From $\frac{\partial F_\theta}{\partial \theta_{dk}} = 0$,

\theta_{dk} = - \sum^{N_d}_{n = 1} \frac{q_{dnk}}{\lambda} \tag{9}

From $\sum^K_{k = 1} \theta_{dk} = 1$,

\begin{align}
& - \sum^K_{k = 1} \sum^{N_d}_{n = 1} \frac{q_{dnk}}{\lambda} = 1 \\
& \lambda = - \sum^K_{k = 1} \sum^{N_d}_{n = 1} q_{dnk} \tag{10}
\end{align}

Substitute equation (10) into equation (9) to get $\theta_{dk}$.

\theta_{dk} = \frac{\displaystyle \sum^{N_d}_{n = 1} q_{dnk}}{\displaystyle \sum^K_{k' = 1} \sum^{N_d}_{n = 1} q_{dnk'}} \tag{11}
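Because $\sum^K_{k' = 1} q_{dnk'} = 1$, the denominator in equation (11) equals $N_d$, so the update just sums the responsibilities over the words of each document and renormalizes. A minimal sketch, assuming `q` is a $D \times N_{\max} \times K$ responsibility array whose unused entries are zero (as in the full program below):

import numpy as np

def m_step_theta(q):
	# sum responsibilities over word positions n (numerator of equation (11))
	theta = q.sum(axis=1)
	# normalize over topics k (denominator of equation (11))
	return theta / theta.sum(axis=1, keepdims=True)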

Word distribution

To apply the method of Lagrange multipliers, we define $F_\phi$ as follows.

F_\phi \equiv \sum^D_{d = 1} \sum^{N_d}_{n = 1} \sum^K_{k = 1} q_{dnk} \log \theta_{dk}\phi_{kw_{dn}} - \sum^D_{d = 1} \sum^{N_d}_{n = 1} \sum^K_{k = 1} q_{dnk} \log q_{dnk} + \lambda \bigl(\sum^V_{v = 1} \phi_{kv} - 1 \bigr) \tag{12}

Partially differentiate $F_\phi$ with respect to $\phi_{kv}$.

\frac{\partial F_\phi}{\partial \phi_{kv}} = \sum^D_{d = 1} \sum^{}_{n:w_{dn} = v} \frac{q_{dnk}}{\phi_{kv}} + \lambda \tag{13}

From $\frac{\partial F_\phi}{\partial \phi_{kv}} = 0$,

\phi_{kv} = - \sum^D_{d = 1} \sum^{}_{n:w_{dn} = v} \frac{q_{dnk}}{\lambda} \tag{14}

From $\sum^V_{v = 1} \phi_{kv} = 1$,

\begin{align}
& - \sum^V_{v = 1} \sum^D_{d = 1} \sum^{}_{n:w_{dn} = v} \frac{q_{dnk}}{\lambda} = 1 \\
& \lambda = - \sum^V_{v = 1} \sum^D_{d = 1} \sum^{}_{n:w_{dn} = v} q_{dnk} \tag{15}
\end{align}

Substitute equation (15) into equation (14) to get $\phi_{kv}$.

\phi_{kv} = \frac{\displaystyle \sum^D_{d = 1} \sum^{}_{n:w_{dn} = v} q_{dnk}}{\displaystyle \sum^V_{v' = 1} \sum^D_{d = 1} \sum^{}_{n:w_{dn} = v'} q_{dnk}} \tag{16}
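Equation (16) collects, for each vocabulary item $v$, the responsibilities of every word position where $v$ was observed, and then normalizes over the vocabulary. A minimal sketch, assuming `q[d, n, k]` holds the responsibilities and `W[d][n]` the vocabulary index of the $n$th word of document $d$ (the layout used in the full program below):

import numpy as np

def m_step_phi(q, W, K, V):
	phi = np.zeros((K, V))
	for d, w_d in enumerate(W):
		for n, v in enumerate(w_d):
			phi[:, v] += q[d, n, :]  # accumulate q_{dnk} wherever w_{dn} = v
	# normalize over the vocabulary (denominator of equation (16))
	return phi / phi.sum(axis=1, keepdims=True)

The full program computes the same quantity more compactly with `np.unique`.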

Implementation

I implemented a toy program for maximum likelihood estimation of the topic model in Python.

import numpy as np

def normalize(ndarray, axis):
	return ndarray / ndarray.sum(axis = axis, keepdims = True)
	
def normalized_random_array(d0, d1):
	ndarray = np.random.rand(d0, d1)
	return normalize(ndarray, axis = 1)

if __name__ == "__main__":
	
	# initialize parameters
	D, K, V = 1000, 3, 6
	theta = normalized_random_array(D, K)
	phi = normalized_random_array(K, V)
	theta_est = normalized_random_array(D, K)
	phi_est = normalized_random_array(K, V)

	# cumulative distributions used to generate documents
	_theta = np.array([theta[:, :k+1].sum(axis = 1) for k in range(K)]).T
	_phi = np.array([phi[:, :v+1].sum(axis = 1) for v in range(V)]).T

	# generate documents
	W, Z = [], []
	N = np.random.randint(100, 300, D)
	for (d, N_d) in enumerate(N):
		Z.append((np.random.rand(N_d, 1) < _theta[d, :]).argmax(axis = 1))
		W.append((np.random.rand(N_d, 1) < _phi[Z[-1], :]).argmax(axis = 1))
	
	# estimate parameters
	q = np.zeros((D, np.max(N), K))
	T = 300
	likelihood = np.zeros(T)
	for t in range(T):
		print(t)

		# E step
		for (d, N_d) in enumerate(N):
			q[d, :N_d, :] = normalize(theta_est[d, :] * phi_est[:, W[d]].T, axis = 1)

		# M step
		theta_est[:, :] = normalize(q.sum(axis = 1), axis = 1)
		q_sum = np.zeros((K, V))
		for (d, W_d) in enumerate(W):
			v, index, count = np.unique(W_d, return_index= True, return_counts = True)
			q_sum[:, v[:]] += count[:] * q[d, index[:], :].T
		phi_est[:, :] = normalize(q_sum, axis = 1)

		# likelihood
		for (d, W_d) in enumerate(W):
			likelihood[t] += np.log(theta_est[d, :].dot(phi_est[:, W_d])).sum()

	# perplexity
	perplexity = np.exp(-likelihood[:] / N.sum())

Result

The results of running the code are shown below. The parameters are those set in the program above: $D = 1000$ documents, $K = 3$ topics, a vocabulary of size $V = 6$, document lengths $N_d$ drawn between 100 and 300, and $T = 300$ EM iterations.

The true word distribution and the estimated word distribution are shown below.

phi
[[ 0.06837655  0.2294022   0.29048815  0.01610669  0.31437584  0.08125057]
 [ 0.19902324  0.19283392  0.0251427   0.23391767  0.10622823  0.24285424]
 [ 0.04165762  0.30678289  0.2579833   0.00882557  0.25580282  0.12894781]]

phi_est
[[ 0.04572686  0.20003413  0.33589344  0.00282668  0.36368718  0.0518317 ]
 [ 0.20342652  0.16884355  0.04403442  0.23473554  0.12226066  0.22669932]
 [ 0.05439782  0.37614278  0.18788228  0.01364525  0.18248741  0.18544446]]

A graph of the log-likelihood is shown below. The log-likelihood is calculated by the following formula.

\begin{align}
likelihood &= \sum^D_{d = 1} \log p(\boldsymbol w_d \ | \ M) \\
&= \sum^D_{d = 1} \log \prod^{N_d}_{n = 1} \sum^K_{k = 1} \theta_{dk} \phi_{kw_{dn}}
\end{align}
likelihood.png

A graph of the perplexity is shown below. The perplexity is calculated by the following formula.

perplexity = \exp\Biggl(- \frac{\displaystyle \sum^D_{d = 1} \log p(\boldsymbol w_d \ | \ M)}{\displaystyle \sum^D_{d = 1} N_d}\Biggr)
perplexity.png
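The two graphs can be reproduced from the `likelihood` and `perplexity` arrays computed in the program above, for example with matplotlib (assuming it is installed); the axis labels are my own choice.

import matplotlib.pyplot as plt

plt.figure()
plt.plot(likelihood)
plt.xlabel("iteration")
plt.ylabel("log-likelihood")
plt.savefig("likelihood.png")

plt.figure()
plt.plot(perplexity)
plt.xlabel("iteration")
plt.ylabel("perplexity")
plt.savefig("perplexity.png")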

Conclusion

I have implemented maximum likelihood estimation of the topic model. Next, I will try **LDA**.
