[PYTHON] Sparse modeling: Chapter 3 Pursuit algorithms
I implemented the pursuit algorithms from Chapter 3 of the sparse modeling textbook in Python.
The code and experimental results are summarized in a Jupyter notebook.
I implemented the following greedy algorithms and convex relaxation method. IRLS is a little suspicious ...
Greedy algorithms
- Orthogonal matching pursuit (OMP)
- Matching pursuit (MP)
- Weak matching pursuit (WMP)
- Threshold algorithm
Convex relaxation method
- Iteratively reweighted least squares (IRLS)
Overview of the greedy algorithms
Initialization
With $k = 0$:
- Initial solution $\mathbf{x}^{0} = \mathbf{0}$
- Initial residual $\mathbf{r}^{0} = \mathbf{b}$
- Initial support of the solution $S^{0} = \emptyset$
Main loop
Repeat the following steps, setting $k \leftarrow k + 1$:
1. Sweep and support update: find the column $\mathbf{a}_{j}$ of $\mathbf{A}$ that most reduces the residual $\mathbf{r}^{k-1}$ and add its index to $S^{k}$.
2. Provisional solution update: let $\mathbf{x}^{k}$ be the minimizer of $\|\mathbf{Ax}-\mathbf{b}\|^{2}_{2}$ subject to $\mathrm{Support}\{\mathbf{x}\}=S^{k}$.
3. Residual update: compute $\mathbf{r}^{k} = \mathbf{b}-\mathbf{Ax}^{k}$.
4. Stopping condition: if $\|\mathbf{r}^{k}\|_{2}<\epsilon_{0}$, stop; otherwise repeat.
In the sweep and support update, the inner product between every column $\mathbf{a}_{j}$ of $\mathbf{A}$ and the residual $\mathbf{r}$ is computed, and the column with the largest absolute inner product is added to the support. In other words, the columns most nearly parallel to $\mathbf{r}$ are added in order.
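As a concrete reference, here is a minimal NumPy sketch of this loop in the OMP style. The function name `omp` and the parameters `eps` and `max_iter` are illustrative choices, not necessarily those used in the notebook, and the columns of $\mathbf{A}$ are assumed to be normalized.

```python
import numpy as np

def omp(A, b, eps=1e-6, max_iter=None):
    """Orthogonal matching pursuit: a minimal sketch of the greedy loop above."""
    m, n = A.shape
    max_iter = max_iter or m
    x = np.zeros(n)    # initial solution x^0 = 0
    r = b.copy()       # initial residual r^0 = b
    support = []       # initial support S^0 = empty set
    for _ in range(max_iter):
        # Sweep: pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ r)))
        if j not in support:
            support.append(j)
        # Provisional solution: least squares restricted to the support.
        x_s, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        x = np.zeros(n)
        x[support] = x_s
        # Residual update and stopping condition.
        r = b - A @ x
        if np.linalg.norm(r) < eps:
            break
    return x, support
```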
Differences between the greedy algorithms
- OMP
- Exactly the procedure described above
- MP
- The provisional solution is updated using only the newly added column $\mathbf{a}_j$, instead of a full least-squares solve over the current support
- WMP
- A scalar hyperparameter $t \, (0 < t < 1)$ is added
- In the sweep and support update, the scan over columns stops as soon as a column $\mathbf{a}_j$ is found whose absolute inner product with $\mathbf{r}^{k-1}$ is at least $t\|\mathbf{r}^{k-1}\|_{2}$. The rest is the same as MP
- Threshold algorithm
- A hyperparameter $k$, the number of basis functions included in the support, is added
- The $k$ columns $\mathbf{a}_j$ with the largest absolute inner products with $\mathbf{b}$ form the support. A least-squares problem restricted to that support is then solved to obtain $\mathbf{x}$ (see the sketch after this list).
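For comparison, a minimal sketch of the threshold algorithm under the same assumptions (the name `threshold_algorithm` is illustrative; the notebook's version may differ):

```python
import numpy as np

def threshold_algorithm(A, b, k):
    """Keep the k columns most correlated with b, then solve a least-squares
    problem restricted to that support (illustrative sketch)."""
    scores = np.abs(A.T @ b)                 # |<a_j, b>| for every column
    support = np.argsort(scores)[::-1][:k]   # indices of the k largest scores
    x = np.zeros(A.shape[1])
    x_s, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
    x[support] = x_s
    return x, support
```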
IRLS overview
- Solve the $\left(P_{1}\right)$ problem instead of the $\left(P_{0}\right)$ problem.
- Approximate the $L_{1}$ norm with a weighted $L_{2}$ norm to solve the $\left(P_{1}\right)$ problem.
- Many iterations are needed to obtain a sparse solution.
Initialization
With $k = 0$:
- Initial approximate solution $\mathbf{x}^{0} = \mathbf{1}$ (all-ones vector)
- Initial weight matrix $\mathbf{X}^{0} = \mathbf{I}$
Main loop
Repeat the following steps, setting $k \leftarrow k + 1$:
1. Least squares: compute $\mathbf{x}_{k} = \mathbf{X}_{k-1}^{2}\mathbf{A}^{T}\left(\mathbf{A}\mathbf{X}_{k-1}^{2}\mathbf{A}^{T}\right)^{+}\mathbf{b}$, either directly or by solving the underlying linear system with an iterative method, to obtain the approximate solution $\mathbf{x}_{k}$.
2. Weight update: update the diagonal weight matrix $\mathbf{X}_{k}$ with $X_{k}(j,j)=|x_{k}(j)|^{0.5}$.
3. Stopping condition: if $\|\mathbf{x}_{k}-\mathbf{x}_{k-1}\|_{2}$ is less than a pre-specified threshold, stop; otherwise repeat.
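A minimal NumPy sketch of this IRLS loop, keeping the diagonal weight matrix as a vector. The names `irls`, `tol`, and `max_iter` are illustrative, and the pseudoinverse is used for simplicity instead of an iterative solver:

```python
import numpy as np

def irls(A, b, tol=1e-6, max_iter=1000):
    """IRLS sketch: approximate the L1 norm by a weighted L2 norm (illustrative)."""
    m, n = A.shape
    x = np.ones(n)         # initial approximate solution x_0 = 1 (all ones)
    X_diag = np.ones(n)    # initial weight matrix X_0 = I, stored as its diagonal
    for _ in range(max_iter):
        x_prev = x
        # Least squares step: x_k = X^2 A^T (A X^2 A^T)^+ b
        X2 = X_diag ** 2
        AX2 = A * X2       # same as A @ diag(X2)
        x = X2 * (A.T @ (np.linalg.pinv(AX2 @ A.T) @ b))
        # Weight update: X_k(j, j) = |x_k(j)|^0.5
        X_diag = np.sqrt(np.abs(x))
        # Stopping condition on the change of the iterate.
        if np.linalg.norm(x - x_prev) < tol:
            break
    return x
```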
Experiment
Simulation
- $\mathbf{x}$: 50-dimensional vector
- $\mathbf{b}$: 30-dimensional vector
- $\mathbf{A}$: $30 \times 50$ matrix with entries drawn uniformly from $[-1, 1)$ and each column normalized
- $\mathbf{x}$ has $k$ non-zero elements, whose values are drawn i.i.d. from a uniform distribution over $[-2, -1] \cup [1, 2]$.
- Run 200 simulations for each of $k = 1, \ldots, 10$ (a generation sketch follows this list).
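A sketch of how one such problem instance could be generated (the helper name `make_instance` is an assumption; the notebook may generate its data differently):

```python
import numpy as np

def make_instance(m=30, n=50, k=3, rng=None):
    """Generate A (column-normalized, entries uniform on [-1, 1)), a k-sparse x
    with values in [-2, -1] U [1, 2], and b = A x (illustrative sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    A = rng.uniform(-1.0, 1.0, size=(m, n))
    A /= np.linalg.norm(A, axis=0)                       # normalize each column
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)       # random support of size k
    signs = rng.choice([-1.0, 1.0], size=k)
    x[support] = signs * rng.uniform(1.0, 2.0, size=k)   # magnitudes in [1, 2]
    b = A @ x
    return A, x, b
```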
Metrics
- The $L_{2}$ norm of the difference between the estimated $\hat{\mathbf{x}}$ and the true $\mathbf{x}$
- The distance between the supports of the estimated $\hat{\mathbf{x}}$ and the true $\mathbf{x}$: $dist(\hat{S},S)=\frac{\max\{|\hat{S}|, |S|\} - |\hat{S}\cap S|}{\max\{|\hat{S}|,|S|\}}$ (computed as in the sketch after this list)
- Each metric is averaged over the 200 simulations.
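The support distance can be computed as in the following sketch (the function name `support_distance` and the zero tolerance `atol` are illustrative):

```python
import numpy as np

def support_distance(x_hat, x, atol=1e-8):
    """dist(S_hat, S) = (max(|S_hat|, |S|) - |S_hat ∩ S|) / max(|S_hat|, |S|)."""
    S_hat = set(np.flatnonzero(np.abs(x_hat) > atol))
    S = set(np.flatnonzero(np.abs(x) > atol))
    denom = max(len(S_hat), len(S))
    if denom == 0:
        return 0.0
    return (denom - len(S_hat & S)) / denom
```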
Results
Average $L_{2}$ error
- As $k$ increases, the error increases.
- IRLS has a small error, but that is to be expected, since the metric is the $L_{2}$ error.
- OMP also performs well, and it is fast.
Average distance between supports
- For small $k$, the support distance of IRLS is large. The textbook does not show this behavior, so the implementation may be incomplete or buggy.
- OMP also performs well.
Discussion
- OMP performs well.
- IRLS does not reproduce the textbook results, so the cause needs to be investigated.