[PYTHON] Summary of methods for automatically determining thresholds

Recently I have often been asked how to determine a threshold automatically, so I have summarized the topic here. Threshold design corresponds to one-dimensional, unsupervised, two-group clustering. This article summarizes five classical and useful methods, using image binarization as the running example.

**Note:** If the underlying distribution is unknown, there is no single correct answer for threshold determination.

Preparation

This time, I will use this casually taken photo. main.png

The histogram of the brightness of this image is as follows.

hist.png

The problem is to create a binary image by choosing the optimal threshold from this brightness distribution.

K-means [1]

The most famous method for unsupervised learning is K-means ([wiki](https://en.wikipedia.org/wiki/K-means_clustering)). K-means aims to find the class centers $\{m_1, \ldots, m_M\}$ that solve the following optimization problem:

{\rm minimize}_{m_1,\ldots, m_M} \quad \sum_{i=1}^N \min_j |x_i - m_j|

Here $\{x_1, \ldots, x_N\}$ is the given data.

The class centers $\{m_1, \ldots, m_M\}$ can be found by repeating the following two steps.

  1. Compute the mean of each class.
  2. Reassign each data point to the class with the nearest center.

K-means also applies to multidimensional data, but in the threshold design problem the data are one-dimensional and M = 2.

The result of this classification is shown in the following figure.

hist_kmeans.png

The features of K-means are as follows:

- Each update requires on the order of N × M distance computations.
- The result depends on the initial values.

import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the image as grayscale and flatten it to a 1-D array
img = cv2.imread('img/main.png', 0)
data = img.reshape(-1)

# Initialize the labels randomly
labels = np.random.randint(0, 2, data.shape[0])

# Convergence threshold
OPTIMIZE_EPSILON = 1

m_0_old = -np.inf
m_1_old = np.inf

for i in range(1000):
    # Compute the mean of each class
    m_0 = data[labels == 0].mean()
    m_1 = data[labels == 1].mean()
    # Reassign each point to the class with the nearest mean
    labels[np.abs(data - m_0) < np.abs(data - m_1)] = 0
    labels[np.abs(data - m_0) >= np.abs(data - m_1)] = 1
    # Exit condition
    if np.abs(m_0 - m_0_old) + np.abs(m_1 - m_1_old) < OPTIMIZE_EPSILON:
        break
    m_0_old = m_0
    m_1_old = m_1

# Which class is which depends on the initialization,
# so take the smaller of the two class maxima as the threshold
thresh_kmeans = np.minimum(data[labels == 0].max(), data[labels == 1].max())
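
As a cross-check, the same two-class clustering can be done with scikit-learn's KMeans. This is a minimal sketch, assuming scikit-learn is installed and reusing `data` from above; the midpoint rule at the end is my own choice, not part of the original code.

from sklearn.cluster import KMeans

# Fit K-means with M = 2 on the one-dimensional brightness data
km = KMeans(n_clusters=2, n_init=10).fit(data.reshape(-1, 1))
centers = np.sort(km.cluster_centers_.ravel())
# Any value between the two centers separates the classes;
# here the midpoint is taken as the threshold
thresh_kmeans_skl = centers.mean()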

Otsu method [2]

The Otsu method maximizes the between-class variance, i.e. the weighted squared deviations of the class means from the overall mean (wiki). Specifically, it finds the threshold T that solves the following optimization problem:

{\rm maximize}_T \quad \sigma_b^2 (T) := w_1 (m_1 - m)^2 + w_2 (m_2 - m)^2

Here, $w_1 = \frac{|\{x_i \leq T\}|}{N},\quad w_2 = \frac{|\{x_i > T\}|}{N}$ and $m_1 = \frac{\sum_{x_i\leq T} x_i}{N w_1},\quad m_2 = \frac{\sum_{x_i>T} x_i}{N w_2}$, where $m$ is the mean of all the data.

Maximizing $\sigma_b^2$ above coincides with minimizing the weighted sum of the within-class variances:

\sigma_a^2 (T) := \frac{\sum_{x_i\leq T} (x_i - m_1)^2 + \sum_{x_i>T} (x_i - m_2)^2}{N}

Therefore, we may equivalently solve the problem of minimizing $\sigma_a^2$, or the problem of minimizing $\sigma_a^2 / \sigma_b^2$.
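The equivalence follows from the decomposition of the total variance $\sigma^2$, which does not depend on $T$:

\sigma^2 = \sigma_a^2(T) + \sigma_b^2(T)

so the $T$ that maximizes $\sigma_b^2$ is exactly the $T$ that minimizes $\sigma_a^2$.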

The result of this classification is shown in the following figure.

hist_otsu.png

The Python code is written below.

# Define Otsu scoring function
def OtsuScore(data,thresh):
    w_0 = np.sum(data<=thresh)/data.shape[0]
    w_1 = np.sum(data>thresh)/data.shape[0]
    # degenerate case: one class is empty
    if (w_0 ==0) | (w_1 == 0):
        return 0
    mean_all = data.mean()
    mean_0 = data[data<=thresh].mean()
    mean_1 = data[data>thresh].mean()
    sigma2_b =  w_0 *((mean_0 - mean_all)**2) + w_1 *((mean_1 - mean_all)**2)

    return sigma2_b

# Calculate the Otsu score for every candidate threshold and take the argmax
scores_otsu =  np.zeros(256)
for i in range(scores_otsu.shape[0]):
    scores_otsu[i] = OtsuScore(data,i)
thresh_otsu = np.argmax(scores_otsu)
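
OpenCV ships its own implementation of the Otsu method, which can serve as a cross-check. A minimal sketch, reusing `img` from the preparation step:

# cv2.threshold with THRESH_OTSU ignores the given threshold (0)
# and returns the one it computed itself
ret, img_otsu = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(ret)  # should be close to thresh_otsu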

Method by Sezan et al. [3]

This threshold design method exploits the fact that the image brightness distribution has two peaks. The threshold is placed between the inner feet of the two peaks.

First, the local extrema of a smoothed histogram are computed to detect the leftmost peak (small values) and the rightmost peak (large values), denoted m0 and m1 respectively. Likewise, the left and right feet of each peak are obtained from the local minima; denote the left feet by s0, s1 and the right feet by e0, e1.

The threshold should then lie between e0, the right foot of the left peak, and s1, the left foot of the right peak. It is therefore designed using a design parameter γ between 0 and 1:

{\rm Sezan~threshold} := (1 - \gamma )e_0 + \gamma s_1

The result of this classification is shown in the following figure.

hist_Sezan.png

The features of the binarization method by Sezan et al. are as follows:

- There are several hyperparameters to tune, which lets you adapt the method to the situation.
- It is difficult to extend to multiple dimensions.

The python code looks like this:

#Smoothing parameters
sigma = 5.0

# Interpolation weight
# gamma = 1 : widens the black region
# gamma = 0 : widens the white region
gamma = 0.5
#Gaussian kernel for smoothing
def getGaus(G_size,G_sigma):
    G_kernel = np.zeros(G_size)
    G_i0 = G_size//2
    for i in range(G_size):
        G_kernel[i] = np.exp(-(i-G_i0)**2 / (2*G_sigma**2))
#Adjust so that the sum is 1
    G_kernel = G_kernel/G_kernel.sum()

    return G_kernel
#Creating a Gaussian kernel
kernel = getGaus(55,sigma)
#Histogram
num_hist, range_hist = np.histogram(data, bins= 256)
mean_hist = (range_hist[1:] + range_hist[:-1]) / 2

#Smoothing
hist_bar = np.convolve(num_hist,kernel,'same')
#Calculation of difference
d_hist = hist_bar[:-1] - hist_bar[1:]
#Edge processing
d_hist = np.r_[[0],d_hist,[0]]

# Peak detection (local maxima)
m = np.where((d_hist[1:] >= 0) & (d_hist[:-1] <= 0))[0]
# Local minimum detection (candidate feet)
es = np.where((d_hist[1:] <= 0) & (d_hist[:-1] >= 0))[0]

# Leftmost and rightmost peaks
m0 = m.min()
m1 = m.max()

# Feet on either side of each peak
s0 = es[es < m0].max()
e0 = es[es > m0].min()
s1 = es[es < m1].max()
e1 = es[es > m1].min()

#Threshold determination
thresh_Sezan = (1 - gamma) * mean_hist[e0] + gamma * mean_hist[s1]
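
As an aside, the local extrema can also be found with SciPy instead of the manual difference computation. A sketch, assuming SciPy is available and reusing `hist_bar` from above; note that `argrelmax`/`argrelmin` ignore extrema at the array edges, which the zero-padding above handles explicitly:

from scipy.signal import argrelmax, argrelmin

# Indices of local maxima (peaks) and local minima (candidate feet)
m_alt = argrelmax(hist_bar)[0]
es_alt = argrelmin(hist_bar)[0]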

Minimizing the Kullback-Leibler divergence [4]

This method uses the Kullback-Leibler (KL) divergence, well known to information-theory researchers. There are also many papers based on information criteria other than the KL divergence.

First, let p be the probability distribution obtained by normalizing the luminance histogram. Next, define the probability distribution q(T) corresponding to binarization with threshold T as follows: $ q(i\leq T) = 0,\quad q(i>T) = 1/M $

Here M is the number of histogram bins above the threshold. The threshold is chosen as the value that minimizes the KL divergence D(q(T)||p) between q(T) and p. In other words, it solves the following optimization problem: $ {\rm minimize}_T\quad D(q(T)||p) $ The result is shown in the figure below.

hist_info.png

The features are as follows:

- Convenient for estimating background noise.
- It cannot be applied unless the distribution is skewed.

The Python code is shown below. Note that, to avoid "0 log 0", the KL divergence is rewritten in the code as follows:

D(q(T)||p) = \sum_{i=1}^N q_i(T) \ln \frac{q_i(T)}{p_i} = -\ln M - \sum_{i=1}^N q_i(T) \ln p_i
def InfoScore(data,thresh,bins=100):
    num_hist, range_hist = np.histogram(data, bins= bins)
    mean_hist = (range_hist[1:] + range_hist[:-1]) / 2
    p = num_hist/num_hist.sum()
    N = p.shape[0]
    M = np.sum(mean_hist>thresh)
    if (M== 0):
        return np.inf
    q = np.zeros(N)
    q[mean_hist>thresh] = 1 / M
    # if p is zero anywhere q is positive, the divergence is infinite
    if np.any(p[q > 0] == 0):
        return np.inf
    # restrict the sum to q > 0 to avoid 0*log(0) = nan for empty bins
    Dqp = - np.log(M) - np.sum(q[q > 0] * np.log(p[q > 0]))
    return Dqp

# Calculate the KL score for every candidate threshold and take the argmin
scores_info =  np.zeros(256)
for i in range(scores_info.shape[0]):
    scores_info[i] = InfoScore(data,i,bins = 190)
thresh_info = np.argmin(scores_info)

Gaussian mixture model fitting

This method fits a Gaussian mixture distribution, assuming that the two classes each follow a Gaussian distribution. It is recommended only when the data are known to be Gaussian-like and there is enough of it.

The intersection of the two fitted Gaussian distributions gives the threshold.

Its drawbacks are:

- If the shape is not Gaussian (e.g. it has long tails), the estimate is poor.
- The result depends on the initial values.

The result and Python code are below. For study purposes, the code here does not use any packages; unless you have a particular reason not to, please use an existing package instead.
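
For example, here is a minimal sketch using scikit-learn's GaussianMixture (my suggestion of a suitable package; the original article's link is not preserved here):

from sklearn.mixture import GaussianMixture

# Fit a two-component GMM to the brightness values (as a column vector)
gmm = GaussianMixture(n_components=2).fit(data.reshape(-1, 1))
print(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel())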

hist_gaus.png

# Gaussian density function
def gaus(x,mu,sigma2):
    return 1/np.sqrt(2 * np.pi * sigma2) *np.exp(-(x-mu)**2/(2*sigma2))

# Class Number
M=2
# Optimization condition
OPTIMIZE_EPS = 0.01

# Init
y =  data.copy()
mu = 256*np.random.rand(M)
sigma2 = 100*np.random.rand(M)
w = np.random.rand(M)
w = w / w.sum()


for cycle in range(1000): 
    # E step
    gamma_tmp = np.zeros((M,y.shape[0]))
    for i in range(M):
        gamma_tmp[i] = w[i] * gaus(y,mu[i],sigma2[i])
    gamma = gamma_tmp/ gamma_tmp.sum(axis = 0).reshape(1,-1)
    Nk = gamma.sum(axis=1)
    # M step (variances are computed with the updated means)
    mus = (gamma * y).sum(axis = 1)/Nk
    sigma2s = (gamma *((y.reshape(1,-1)- mus.reshape(-1,1))**2)).sum(axis = 1)/Nk
    ws = Nk / Nk.sum()
    # check break condition
    if (np.linalg.norm(mu-mus)<OPTIMIZE_EPS)& (np.linalg.norm(sigma2-sigma2s)<OPTIMIZE_EPS) & (np.linalg.norm(w-ws)<OPTIMIZE_EPS):
        break
    # update
    mu = mus
    sigma2 = sigma2s
    w = ws
    
print(cycle)

# make distribution
x = np.arange(0,256,1)
p = np.zeros((M,x.shape[0]))
for i in range(M):
    p[i] = w[i] * gaus(x,mu[i],sigma2[i])

# find threshold: the first luminance where the component with the
# smaller mean falls below the other component
if mu[0] < mu[1]:
    thresh_gaus = np.where(p[0]<p[1])[0].min()
else:
    thresh_gaus = np.where(p[0]>p[1])[0].min()
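
Instead of scanning for the index where the two component densities cross, the intersection can also be computed in closed form: equating the two weighted Gaussians in log space yields a quadratic in x. A sketch of this (my own derivation; `gauss_intersection` is a hypothetical helper taking `w`, `mu`, `sigma2` as fitted above):

def gauss_intersection(w, mu, sigma2):
    # Equate w[0]*N(x|mu[0],sigma2[0]) = w[1]*N(x|mu[1],sigma2[1])
    # in log space and collect terms into a*x^2 + b*x + c = 0
    a = 1 / (2 * sigma2[1]) - 1 / (2 * sigma2[0])
    b = mu[0] / sigma2[0] - mu[1] / sigma2[1]
    c = (mu[1]**2 / (2 * sigma2[1]) - mu[0]**2 / (2 * sigma2[0])
         - np.log(w[1] / w[0]) - 0.5 * np.log(sigma2[1] / sigma2[0]))
    roots = np.roots([a, b, c])
    # Keep the real root lying between the two means
    roots = roots[np.isreal(roots)].real
    lo, hi = np.sort(mu)
    return roots[(roots >= lo) & (roots <= hi)][0]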
        

Summary

The thresholds obtained by the five methods above are shown in the table below.

| | K-means | Otsu | Sezan | KL divergence | Gauss fitting |
|---|---|---|---|---|---|
| Threshold | 109.0 | 109.0 | 90.558594 | 145.0 | 147.0 |
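
For reference, a minimal sketch of how each binary image is produced from its threshold (using `thresh_otsu` as the example; the output filename is my own choice):

# Pixels above the threshold become white (255), the rest black (0)
img_bin = ((img > thresh_otsu) * 255).astype(np.uint8)
cv2.imwrite('img/img_bin_otsu.png', img_bin)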

The binarized image is as shown in the figure below.

img_all.png

I can't say which one is better, so use whichever suits the situation! As a first choice, I recommend trying them in the order they appear in this article.

I referred to [5] for some of the papers; it is a well-organized survey.

Code details

https://github.com/yuji0001/Threshold_Technic

Reference

[1] H. Steinhaus, "Sur la division des corps matériels en parties" (French), Bull. Acad. Polon. Sci. 4 (12): 801–804 (1957).

[2] Nobuyuki Otsu, "A threshold selection method from gray-level histograms", IEEE Trans. Sys. Man. Cyber. 9 (1): 62–66 (1979).

[3] M. I. Sezan, "A peak detection algorithm and its application to histogram-based image data reduction", Graph. Models Image Process. 29, 47–59 (1985).

[4] C. H. Li and C. K. Lee, "Minimum cross-entropy thresholding", Pattern Recogn. 26, 617–625 (1993).

[5] M. Sezgin, "Survey over image thresholding techniques and quantitative performance evaluation", Journal of Electronic Imaging, 13(1), 220 (2004).

Author: Yuji Okamoto
