[PYTHON] Normalization method (encoding) and reversion (decoding)

0. Overview

It is no exaggeration to say that normalization is the most widely used preprocessing technique for data sets in machine learning. By normalizing, values with different dimensions and scales can be handled on a common footing, so it appears in many contexts. Normalization is nothing more than making a quantity dimensionless, for example by dividing it by a representative value so that different quantities can be compared with one another. This post covers the method and how to revert it.

1. MIN-MAX normalization (0-1)

This section describes the most common normalization method, which maps each value in a data set into the range 0 to 1.

1.1. encode

The formula for normalizing a value $x$ in a data set $\mathbb{D}$ is as follows.

x_{norm} = \dfrac{x - x_{min}}{x_{max} - x_{min} }

The Python code is below.

def min_max_encode(x, source_min, source_max):
  # Map x from [source_min, source_max] onto [0, 1]
  return (x - source_min) / (source_max - source_min)
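For example, applying the formula to the data set [10, 20, 30] (minimum 10, maximum 30) maps the values onto [0, 1]. This is a minimal standalone sketch; the sample values are chosen only for illustration:

```python
def min_max_encode(x, source_min, source_max):
    # (x - min) / (max - min) maps [min, max] onto [0, 1]
    return (x - source_min) / (source_max - source_min)

data = [10.0, 20.0, 30.0]
lo, hi = min(data), max(data)
normalized = [min_max_encode(v, lo, hi) for v in data]
print(normalized)  # [0.0, 0.5, 1.0]
```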

1.2. decode

Decoding is performed as follows, where $x_{norm}$ is the value produced by encoding.

x = x_{norm}x_{max} - x_{norm}x_{min} + x_{min} 

Expression expansion:

x_{norm} = \dfrac{x - x_{min}}{x_{max} - x_{min} }\\
x_{norm}x_{max} - x_{norm}x_{min} = x - x_{min} \\
x_{norm}x_{max} - x_{norm}x_{min} + x_{min} = x 

The Python code is below.

def min_max_decode(x_norm, source_min, source_max):
  # Invert the 0-1 normalization: x_norm * (max - min) + min
  return (x_norm * source_max) - (x_norm * source_min) + source_min
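A quick round-trip check (a standalone sketch with illustrative values): encoding a value and decoding it with the same minimum and maximum should return the original value.

```python
def min_max_encode(x, source_min, source_max):
    return (x - source_min) / (source_max - source_min)

def min_max_decode(x_norm, source_min, source_max):
    # expanded form of x_norm * (max - min) + min, as in the derivation above
    return (x_norm * source_max) - (x_norm * source_min) + source_min

x = 25.0
x_norm = min_max_encode(x, 10.0, 30.0)  # 0.75
print(min_max_decode(x_norm, 10.0, 30.0))  # 25.0
```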

2. Normalization for activation functions (-1 to +1)

When using an activation function such as hard tanh in deep learning, the input values may need to be scaled to the range -1 to +1 instead. This section describes normalization for that case.

2.1. encode

The formula for normalizing a value $x$ in a data set $\mathbb{D}$ is as follows.

x_{norm} = \dfrac{x - x_{min}}{x_{max} - x_{min} } \cdot 2 - 1

The Python code is below.

def min_max_encode(x, source_min, source_max):
  # Scale to [0, 1], then stretch and shift onto [-1, 1]
  return (((x - source_min) / (source_max - source_min)) * 2.0) - 1.0
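With the same illustrative data set [10, 20, 30], this variant maps the values onto [-1, 1] instead of [0, 1] (standalone sketch):

```python
def min_max_encode(x, source_min, source_max):
    # scale to [0, 1], then stretch and shift onto [-1, 1]
    return (((x - source_min) / (source_max - source_min)) * 2.0) - 1.0

data = [10.0, 20.0, 30.0]
lo, hi = min(data), max(data)
print([min_max_encode(v, lo, hi) for v in data])  # [-1.0, 0.0, 1.0]
```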

2.2. decode

Decoding is performed as follows, where $x_{norm}$ is the value produced by encoding.

x = \dfrac{x_{norm}x_{max} - x_{norm}x_{min} + x_{max} + x_{min}}{2}

Expression expansion:

x_{norm} = \dfrac{x - x_{min}}{x_{max} - x_{min} } \cdot 2 - 1 \\
\dfrac{x_{norm} + 1}{2} = \dfrac{x - x_{min}}{x_{max} - x_{min} }\\
\dfrac{(x_{norm} + 1)(x_{max} - x_{min})}{2}=(x - x_{min})\\
\dfrac{x_{norm}x_{max} - x_{norm}x_{min} + x_{max} - x_{min}}{2}+x_{min}=x\\
\dfrac{x_{norm}x_{max} - x_{norm}x_{min} + x_{max} + x_{min}}{2} = x

The Python code is below.

def min_max_decode(x_norm, source_min, source_max):
  # Invert the -1 to +1 normalization
  return ((x_norm * source_max) - (x_norm * source_min) + source_max + source_min) * 0.5
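As before, a round-trip check confirms the derivation (standalone sketch with illustrative values):

```python
def min_max_encode(x, source_min, source_max):
    return (((x - source_min) / (source_max - source_min)) * 2.0) - 1.0

def min_max_decode(x_norm, source_min, source_max):
    # inverse of the [-1, 1] encoding, expanded as in the derivation above
    return ((x_norm * source_max) - (x_norm * source_min)
            + source_max + source_min) * 0.5

x = 25.0
x_norm = min_max_encode(x, 10.0, 30.0)  # 0.5
print(min_max_decode(x_norm, 10.0, 30.0))  # 25.0
```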
