[PYTHON] Sparse modeling: Chapter 5 From exact solution to approximate solution

Overview

I implemented the numerical examples of Chapter 5 of "Sparse Modeling" in Python. This is a Jupyter notebook summarizing the code and the experimental results.

Introduction

Chapter 5 deals with the case where the observed data contain noise. Because the observations are noisy, the constraint on the error term is relaxed so that the residual only has to be at most $\epsilon$.

(P_{0}^{\epsilon}): \min_{\mathbf{x}} \|\mathbf{x}\|_{0} \text{ subject to } \|\mathbf{b}-\mathbf{Ax}\|_{2} \leq \epsilon

Relaxing the $\ell_{0}$ norm to the $\ell_{1}$ norm yields a variant of $(P_{1})$, basis pursuit denoising (BPDN).

(P_{1}^{\epsilon}): \min_{\mathbf{x}} \|\mathbf{x}\|_{1} \text{ subject to } \|\mathbf{b}-\mathbf{Ax}\|_{2} \leq \epsilon

For a suitable Lagrange multiplier $\lambda$, the solution of $(P_{1}^{\epsilon})$ coincides with the solution of the following unconstrained optimization problem.

(Q_{1}^{\lambda}): \min_{\mathbf{x}} \lambda \|\mathbf{x}\|_{1} + \frac{1}{2} \|\mathbf{b}-\mathbf{Ax}\|_{2}^{2}

A simple way to solve $(Q_{1}^{\lambda})$ is the iterative-reweighted-least-squares (IRLS) method.

Let $\mathbf{X}=\mathrm{diag}(|\mathbf{x}|)$; then $\|\mathbf{x}\|_{1} = \mathbf{x}^{\mathrm{T}}\mathbf{X}^{-1}\mathbf{x}$. In other words, the $\ell_{1}$ norm can be regarded as a (weighted) squared $\ell_{2}$ norm. Given the current approximate solution $\mathbf{x}_{k-1}$, set $\mathbf{X}_{k-1}=\mathrm{diag}(|\mathbf{x}_{k-1}|)$ and solve the following problem.

(M_{k}): \min_{\mathbf{x}} \lambda \mathbf{x}^{\mathrm{T}} \mathbf{X}_{k-1}^{-1} \mathbf{x} + \frac{1}{2} \|\mathbf{b} - \mathbf{Ax}\|_{2}^{2}
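For reference, the reweighting identity can be checked entrywise (assuming no component of $\mathbf{x}$ is exactly zero), and setting the gradient of $(M_{k})$ to zero shows that each iteration reduces to a weighted least-squares system:

\mathbf{x}^{\mathrm{T}}\mathbf{X}^{-1}\mathbf{x}=\sum_{j}\frac{x_{j}^{2}}{|x_{j}|}=\sum_{j}|x_{j}|=\|\mathbf{x}\|_{1}, \qquad \left(2\lambda\mathbf{X}_{k-1}^{-1}+\mathbf{A}^{\mathrm{T}}\mathbf{A}\right)\mathbf{x}_{k}=\mathbf{A}^{\mathrm{T}}\mathbf{b}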

The IRLS algorithm therefore alternates between using the current weights $\mathbf{X}_{k-1}$ to obtain $\mathbf{x}_{k}$ and using $\mathbf{x}_{k}$ to obtain $\mathbf{X}_{k}$, thereby approximately solving $(Q_{1}^{\lambda})$. Chapter 5 mainly consists of an explanation of IRLS, which approximates $(Q_{1}^{\lambda})$, and an outline of LARS (least angle regression stagewise).

IRLS algorithm

Initialization

Set $k = 0$ and initialize the approximate solution, e.g. $\mathbf{x}_{0} = \mathbf{1}$ so that $\mathbf{X}_{0} = \mathbf{I}$.

Main loop

Set $k \leftarrow k + 1$ and execute the following steps.

* Regularized least squares: using the current weights $\mathbf{X}_{k-1}$, solve $(M_{k})$ to obtain $\mathbf{x}_{k}$.
* Weight update: using $\mathbf{x}_{k}$, update the weight matrix $\mathbf{X}$ to the diagonal matrix with entries $X_{k}(j,j)=|x_{k}(j)|+\epsilon$.
* Stopping condition: if $\|\mathbf{x}_{k}-\mathbf{x}_{k-1}\|_{2}$ is smaller than a pre-specified threshold, stop; otherwise repeat.
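A minimal NumPy sketch of this loop (my own implementation, not the book's reference code; `lam`, `eps`, `tol`, and `max_iter` are arbitrary defaults):

```python
import numpy as np

def irls(A, b, lam, eps=1e-6, tol=1e-6, max_iter=100):
    """Approximately solve (Q_1^lambda) by iterative-reweighted-least-squares.

    Each iteration solves the weighted problem (M_k),
        min_x  lam * x^T X_{k-1}^{-1} x + 0.5 * ||b - A x||_2^2,
    whose normal equations are (2*lam*X_{k-1}^{-1} + A^T A) x = A^T b.
    """
    n = A.shape[1]
    x = np.ones(n)                                  # initial approximation x_0
    AtA, Atb = A.T @ A, A.T @ b
    objective = []                                  # objective value per iteration
    for _ in range(max_iter):
        X_inv = np.diag(1.0 / (np.abs(x) + eps))    # X_{k-1}^{-1}, entries 1 / (|x_j| + eps)
        x_new = np.linalg.solve(2.0 * lam * X_inv + AtA, Atb)
        objective.append(lam * np.abs(x_new).sum()
                         + 0.5 * np.sum((b - A @ x_new) ** 2))
        if np.linalg.norm(x_new - x) < tol:         # stopping condition
            x = x_new
            break
        x = x_new
    return x, objective
```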

Experiment

Simulation

* $\mathbf{x}$: 200-dimensional vector
* $\mathbf{b}$: 100-dimensional vector
* $\mathbf{A}$: 100 × 200 matrix with uniform random entries in $[-1, 1)$, columns normalized

One of the issues to be resolved is how to determine the value of $\lambda$. Here, following a rule of thumb, $\lambda$ is set close to the ratio of $\sigma$ to the standard deviation of the nonzero elements. Since the standard deviation of the nonzero elements is about 2, a value near $\sigma / 2 = 0.05$ is used for $\lambda$.
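A sketch of how the data and this choice of $\lambda$ might be generated (the sparsity `k0` is my assumption, and `sigma = 0.1` is inferred from $\sigma / 2 = 0.05$ above):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 100, 200        # b is 100-dimensional, x is 200-dimensional
k0 = 10                # number of nonzero entries in x0 (my assumption)
sigma = 0.1            # noise standard deviation, so that sigma / 2 = 0.05

# A: uniform random entries in [-1, 1), columns normalized to unit l2 norm
A = rng.uniform(-1.0, 1.0, size=(m, n))
A /= np.linalg.norm(A, axis=0)

# x0: sparse vector whose nonzero entries have a standard deviation of about 2
x0 = np.zeros(n)
support = rng.choice(n, size=k0, replace=False)
x0[support] = 2.0 * rng.standard_normal(k0)

# noisy observation and the rule-of-thumb regularization parameter
b = A @ x0 + sigma * rng.standard_normal(m)
lam = sigma / 2        # approximately 0.05

x_hat, objective = irls(A, b, lam)   # using the IRLS sketch above
```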

Results

lambda.png

The dashed line shows the value of $\left|\log\left(\|\mathbf{A}\hat{\mathbf{x}}_{\lambda}-\mathbf{b}\|^{2}/\left(n\sigma^{2}\right)\right)\right|$, i.e. at which value of $\lambda$ the residual $\|\mathbf{A}\hat{\mathbf{x}}_{\lambda}-\mathbf{b}\|^{2}$ becomes roughly equal to the noise power $n\sigma^{2}$. This is another empirical rule for choosing the value of $\lambda$.
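This rule can be evaluated by sweeping $\lambda$ with the IRLS sketch above and picking the value whose residual is closest to the noise power (the grid of $\lambda$ values below is arbitrary):

```python
import numpy as np

lambdas = np.logspace(-3, 0, 30)                 # candidate lambda values (arbitrary grid)
criterion = []
for lam_ in lambdas:
    x_lam, _ = irls(A, b, lam_)                  # irls(), A, b, sigma from the sketches above
    residual = np.sum((A @ x_lam - b) ** 2)
    criterion.append(abs(np.log(residual / (len(b) * sigma ** 2))))

best_lam = lambdas[int(np.argmin(criterion))]    # lambda where residual ~ n * sigma^2
```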

IRLS.png

Compared with the true solution $\mathbf{x}_{0}$ (red circles), the positions of the nonzero elements of $\mathbf{x}_{0}$ are recovered best when the optimal value of $\lambda$ is used. If $\lambda$ is small the solution is dense, and if it is large the solution is sparse.

path.png

Each curve shows the value of one element of $\mathbf{x}$ as a function of $\lambda$. The dashed lines are the values of $\mathbf{x}_{0}$.

* Objective function value $\lambda\|\mathbf{x}\|_{1}+\frac{1}{2}\|\mathbf{b}-\mathbf{Ax}\|_{2}^{2}$ versus the number of IRLS iterations for the optimal $\lambda$

iteration.png

LARS algorithm

Experiment

Simulation

* $\mathbf{x}$: 50-dimensional vector
* $\mathbf{b}$: 30-dimensional vector
* $\mathbf{A}$: 30 × 50 matrix with uniform random entries in $[-1, 1)$, columns normalized

Metrics
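The book's own LARS and OMP routines are not reproduced here; as a substitute, the comparison could be sketched with scikit-learn's `lars_path` and `OrthogonalMatchingPursuit`. The two metrics, relative $\ell_{2}$ error and a support-distance score, are guesses based on the figure file names below:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit, lars_path

rng = np.random.default_rng(0)
m, n, k0, sigma = 30, 50, 5, 0.1                 # k0 and sigma are my assumptions

# dictionary with column-normalized uniform entries in [-1, 1)
A = rng.uniform(-1.0, 1.0, size=(m, n))
A /= np.linalg.norm(A, axis=0)
x0 = np.zeros(n)
support = rng.choice(n, size=k0, replace=False)
x0[support] = rng.standard_normal(k0)
b = A @ x0 + sigma * rng.standard_normal(m)

# OMP with the true sparsity level
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k0, fit_intercept=False).fit(A, b)
x_omp = omp.coef_

# LARS path; take the solution after k0 steps
_, _, coefs = lars_path(A, b, method="lar")
x_lars = coefs[:, min(k0, coefs.shape[1] - 1)]

def l2_err(x_hat, x_true):
    """Relative squared l2 error of the recovered vector."""
    return np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2)

def dist_s(x_hat, x_true, thr=1e-8):
    """One possible support-distance score (0 = identical supports)."""
    s_hat, s_true = np.abs(x_hat) > thr, np.abs(x_true) > thr
    return 1.0 - np.sum(s_hat & s_true) / max(np.sum(s_hat | s_true), 1)

print("OMP :", l2_err(x_omp, x0), dist_s(x_omp, x0))
print("LARS:", l2_err(x_lars, x0), dist_s(x_lars, x0))
```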

Results

l2_err_OMP_LARS.png

dist_S_OMP_LARS.png

Discussion
