[Python] LASSO regression with equality constraints using the method of multipliers

While implementing the algorithm from a paper [^1], I needed to solve a LASSO regression problem with linear equality constraints:

\begin{align}
\mathop{\text{minimize}}_\beta\quad &\frac{1}{2}\|y-X\beta\|_2^2 + \rho \|\beta\|_1 \quad\\
\text{subject to}\quad &A\beta = b
\end{align}

The LASSO with both equality and inequality constraints that I implemented previously is computationally heavy, so I implemented a new version that handles equality constraints only. For the implementation, I set $Q = X^TX$ and $p = -X^Ty$ and solve the equivalent problem:

\begin{align}
\mathop{\text{minimize}}_\beta\quad &\frac{1}{2}\beta^TQ\beta + p^T\beta + \rho \|\beta\|_1 \quad\\
\text{subject to}\quad &A\beta = b
\end{align}

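As a minimal sketch (assuming `X` and `y` are NumPy arrays of shape `(n_samples, n_features)` and `(n_samples,)`; the helper name `quadratic_coefficients` is mine), the coefficients of this quadratic form can be computed as follows:

import numpy as np

def quadratic_coefficients(X: np.ndarray, y: np.ndarray):
    """Form Q = X^T X and p = -X^T y, so that
    0.5 * b^T Q b + p^T b equals 0.5 * ||y - X b||^2 up to a constant."""
    Q = X.T.dot(X)
    p = -X.T.dot(y)
    return Q, p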

Method of multipliers

The method of multipliers is well known as a way to handle linearly constrained optimization problems of the form

\begin{align}
\mathop{\text{minimize}}_{\beta,z}\quad &f(\beta) + g(z) \quad\\
\text{subject to}\quad &M\beta + Fz = c
\end{align}

In the method of multipliers, the augmented Lagrangian

\mathcal{L}_\tau (\beta,z,\nu) = f(\beta) + g(z) + \nu^T(M\beta + Fz - c) + \frac{\tau}{2}\|M\beta + Fz - c\|_2^2

is minimized alternately over $\beta$ and $z$, after which the dual variable $\nu$ is updated:

\newcommand{\argmin}{\mathop{\rm arg\,min}\limits}
\begin{align}
\beta^{(t+1)} &\leftarrow \argmin_\beta \mathcal{L}_\tau (\beta, z^{(t)}, \nu^{(t)})\\
z^{(t+1)} &\leftarrow \argmin_z \mathcal{L}_\tau (\beta^{(t+1)}, z, \nu^{(t)})\\
\nu^{(t+1)} &\leftarrow \nu^{(t)} + \tau(M\beta^{(t+1)}+Fz^{(t+1)}-c)
\end{align}
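As a minimal sketch of this generic loop (the callables `argmin_beta(z, nu)` and `argmin_z(beta, nu)` are hypothetical stand-ins for the problem-specific subproblem solvers, not part of the implementation below):

import numpy as np

def method_of_multipliers(argmin_beta, argmin_z, M, F, c, tau, n_iter=100):
    # initialize the z-block and the dual variable at zero
    z = np.zeros(F.shape[1])
    nu = np.zeros(c.shape[0])
    for _ in range(n_iter):
        beta = argmin_beta(z, nu)   # minimize L_tau over beta
        z = argmin_z(beta, nu)      # minimize L_tau over z
        nu = nu + tau * (M.dot(beta) + F.dot(z) - c)  # dual ascent step
    return beta, z, nu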

Application of the method of multipliers to equality-constrained LASSO regression

\begin{align}
\mathop{\text{minimize}}_{\beta,z}\quad &\frac{1}{2}\beta^TQ\beta + p^T\beta + \rho \|z\|_1 \quad\\
\text{subject to}\quad &\left(\begin{array}{c}
A\\
I
\end{array}\right)\beta + \left(\begin{array}{c}
O\\
-I
\end{array}\right) z = \left(\begin{array}{c}
b\\
0
\end{array}\right)
\end{align}

By rewriting the problem in this form, with

\begin{align}
f(\beta)=\frac12 \beta^TQ\beta + p^T\beta,\ 
g(z) = \rho\|z\|_1,\\
M=\left(\begin{array}{c}
A\\
I
\end{array}\right),\ 
F=\left(\begin{array}{c}
O\\
-I
\end{array}\right),\ 
c=\left(\begin{array}{c}
b\\
0
\end{array}\right)
\end{align}

the method of multipliers becomes applicable. Writing the dual variable as $\nu = (u^T, v^T)^T$, the augmented Lagrangian can be written as

\mathcal{L}_\tau (\beta,z,\nu) = \frac12 \beta^TQ\beta + p^T\beta + \rho\|z\|_1 + u^T(A\beta - b) + v^T(\beta - z) + \frac{\tau}{2}\|A\beta - b\|_2^2 + \frac{\tau}{2}\|\beta - z\|_2^2


$\beta$ update formula

Solving the optimality condition

\frac{\partial \mathcal{L}_\tau}{\partial \beta}(\beta, z^{(t)}, \nu^{(t)}) = Q\beta + p + A^Tu^{(t)} + v^{(t)} + \tau A^T(A\beta - b) + \tau(\beta - z^{(t)}) = 0

for $\beta$ gives the update

\beta^{(t+1)} = \left(Q+\tau(A^TA+I)\right)^{-1}\left(-p-A^Tu^{(t)}-v^{(t)}+\tau A^Tb+\tau z^{(t)}\right)

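A minimal NumPy sketch of this step (the helper name `update_beta` is mine; since the coefficient matrix $Q + \tau(A^TA + I)$ is constant across iterations, the implementation below instead precomputes its inverse once):

import numpy as np

def update_beta(Q, p, A, b, z, u, v, tau):
    # solve (Q + tau (A^T A + I)) beta = -p - A^T u - v + tau A^T b + tau z
    lhs = Q + tau * (A.T.dot(A) + np.eye(Q.shape[0]))
    rhs = -p - A.T.dot(u) - v + tau * A.T.dot(b) + tau * z
    return np.linalg.solve(lhs, rhs)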

$z$ update formula

From the optimality condition

\partial_z \mathcal{L}_\tau(\beta^{(t+1)}, z, \nu^{(t)}) = \rho\partial_z \|z\|_1 - v^{(t)} - \tau\left(\beta^{(t+1)} - z\right)\ni0

the update formula

z^{(t+1)} = s\left(\beta^{(t+1)} + \frac1\tau v^{(t)}; \frac\rho\tau\right)

is obtained, where $s(\cdot\,;\cdot)$ is the soft-thresholding function

s(x; \lambda) = \mathrm{sign}\,(x)\max\left\{|x| - \lambda, 0\right\}

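In NumPy, this elementwise operator is a one-liner; it is what the `_prox` method in the implementation below computes:

import numpy as np

def soft_threshold(x: np.ndarray, lam: float) -> np.ndarray:
    # s(x; lambda) = sign(x) * max(|x| - lambda, 0), applied elementwise
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)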

Variable initialization

An initial point $\beta^{(0)}$ that satisfies the constraint is obtained by solving

\begin{align}
\mathop{\text{minimize}}_\beta\quad &\|\beta\|_1 \quad\\
\text{subject to}\quad &A\beta= b
\end{align}

This problem can be reformulated as a linear program by introducing an auxiliary variable $\gamma$:

\begin{align}
\mathop{\text{minimize}}_{\beta,\gamma}\quad &\sum_{i=1}^p \gamma_i \quad\\
\text{subject to}\quad &A\beta= b,\\
&\ \gamma\geq0,\ \gamma\geq\beta,\ \gamma\geq-\beta
\end{align}
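A sketch of this LP with scipy.optimize.linprog, stacking the decision vector as $(\beta, \gamma)$; the `_linear_programming` method below follows the same construction (the helper name `initial_beta` is mine):

import numpy as np
from scipy.optimize import linprog

def initial_beta(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    n = A.shape[1]
    # objective: minimize sum(gamma); beta has zero cost
    c = np.hstack((np.zeros(n), np.ones(n)))
    # equality constraint A beta = b (gamma does not appear)
    A_eq = np.hstack((A, np.zeros_like(A)))
    # inequality constraints beta - gamma <= 0 and -beta - gamma <= 0
    eye = np.eye(n)
    A_ub = np.vstack((np.hstack((eye, -eye)), np.hstack((-eye, -eye))))
    b_ub = np.zeros(2 * n)
    # beta is free, gamma is non-negative
    bounds = [(None, None)] * n + [(0, None)] * n
    res = linprog(c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b, bounds=bounds)
    return res.x[:n]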

Algorithm summary

  1. Initialize $\beta^{(0)}$ and set $z^{(0)} \leftarrow \beta^{(0)}$, $u^{(0)} \leftarrow 0$, $v^{(0)} \leftarrow 0$, $t \leftarrow 0$.
  2. If the stopping condition is met, output $z^{(t)}$ and stop.
  3. \beta^{(t+1)} \leftarrow \left(Q+\tau(A^TA+I)\right)^{-1}\left(-p-A^Tu^{(t)}-v^{(t)}+\tau A^Tb+\tau z^{(t)}\right)
  4. z^{(t+1)} \leftarrow s\left(\beta^{(t+1)} + \frac1\tau v^{(t)}; \frac\rho\tau\right)
  5. u^{(t+1)} \leftarrow u^{(t)} + \tau(A\beta^{(t+1)}-b)
  6. v^{(t+1)} \leftarrow v^{(t)} + \tau(\beta^{(t+1)}-z^{(t+1)})
  7. Set $t \leftarrow t + 1$ and go to step 2.

Source code

The method is implemented in Python below; scipy.optimize.linprog is used to initialize $\beta$.

import numpy as np
from scipy.optimize import linprog


class EqualityConstrainedL1QuadraticProgramming:
    """
    Problem
    ----------
    minimize 0.5 * β^T Q β + p^T β + ρ||β||_1
    subject to Aβ = b

    Algorithm
    ----------
    Alternating Direction Method of Multipliers (ADMM)

    Parameters
    ----------
    A : np.ndarray
        The equality constraint matrix.

    b : np.ndarray
        The equality constraint vector.

    rho : float, optional (default=1.0)
        Constant that multiplies the L1 term.

    tau : float, optional (default=1.0)
        Penalty parameter used in the augmented Lagrangian function.

    tol : float, optional (default=1e-4)
        The tolerance for the optimization.

    max_iter : int, optional (default=300)
        The maximum number of iterations.

    extended_output : bool, optional (default=False)
        If set to True, objective function value will be saved in `self.f`.

    Attributes
    ----------
    f : a list of float
        objective function values.

    coef_ : array, shape (n_features,)
        parameter vector.

    n_iter_ : int
        number of iterations run by ADMM to reach the specified tolerance.
    """
    def __init__(
            self,
            A: np.ndarray,
            b: np.ndarray,
            rho: float = 1.0,
            tau: float = 1.0,
            tol: float = 1e-4,
            max_iter: int = 300,
            extended_output: bool = False
    ):
        self.A = A
        self.b = b
        self.rho = rho
        self.tau = tau
        self.tol = tol
        self.max_iter = max_iter
        self.extended_output = extended_output

        self.f = list()
        self.coef_ = None
        self.n_iter_ = None

    def _linear_programming(self, n_features: int) -> np.ndarray:
        """
        Solve following problem.

        Problem:
            minimize ||β||_1 subject to Aβ=b

        Solver:
            scipy.optimize.linprog

        Parameters
        ----------
        n_features : int
            The dimension of decision variables

        Returns
        -------
        : np.ndarray, shape = (n_features, )
            The values of the decision variables that minimizes the objective function while satisfying the constraints.
        """
        # equality constraint matrix and vector
        c = np.hstack((np.zeros(n_features), np.ones(n_features)))
        A_eq = np.hstack((self.A, np.zeros_like(self.A)))
        b_eq = self.b

        # inequality constraint matrix and vector
        eye = np.eye(n_features)
        A_ub = np.vstack((
            np.hstack((eye, -eye)),
            np.hstack((-eye, -eye)),
            np.hstack((np.zeros((n_features, n_features)), -eye))
        ))
        b_ub = np.zeros(n_features * 3)

        # beta is free and gamma >= 0 is enforced via A_ub; without explicit
        # bounds, linprog defaults to non-negative variables, which would
        # wrongly restrict beta >= 0
        return linprog(c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                       bounds=(None, None))['x'][:n_features]

    def _prox(self, x: np.ndarray, rho: float) -> np.ndarray:
        """
        Proximal operator for L1 constraint

        Parameters
        ----------
        x : np.ndarray, shape = (n_features, )
            1D array

        rho : float
            a threshold
        """
        return np.sign(x) * np.maximum((np.abs(x) - rho), 0)

    def _objective_function(self, x: np.ndarray, Q: np.ndarray, p: np.ndarray, rho: float):
        """
        Return 1 / 2 * x^T Qx + p^T x + rho * ||x||_1
        """
        return 0.5 * x.dot(Q.dot(x)) + p.dot(x) + np.sum(np.abs(x)) * rho

    def fit(self, Q: np.ndarray, p: np.ndarray) -> None:
        """
        Parameters
        ----------
        Q : np.ndarray, shape = (n_features, n_features)
            Quadratic coefficient.

        p : np.ndarray, shape = (n_features, )
            Linear coefficient.
        """
        n_features = Q.shape[0]
        tau_inv = 1 / self.tau

        # initialize constants
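        # note: the beta step is applied in a form scaled by 1/tau,
        # beta = (Q/tau + A^T A + I)^{-1} (-p/tau - A^T u - v + A^T b + z),
        # with u and v holding the scaled dual variables u/tau and v/tau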
        Q_inv = np.linalg.inv(Q * tau_inv + self.A.T.dot(self.A) + np.eye(n_features))
        p_tau = p * tau_inv
        Ab = self.A.T.dot(self.b)
        rho_tau = self.rho * tau_inv

        # initialize variables
        beta = self._linear_programming(n_features)
        z = np.copy(beta)
        u = np.zeros_like(self.b)
        v = np.zeros_like(beta)

        cost = self._objective_function(beta, Q, p, self.rho)
        # save objective function value if necessary
        if self.extended_output:
            self.f.append(cost)

        # main loop
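        # since u and v are the scaled duals (nu / tau), the dual updates
        # below take a unit step rather than multiplying the residual by tau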
        for k in range(self.max_iter):
            r = - p_tau - self.A.T.dot(u) - v + Ab + z
            beta = np.dot(Q_inv, r)
            z = self._prox(beta + v, rho_tau)
            u = u + self.A.dot(beta) - self.b
            v = v + beta - z

            pre_cost = cost
            cost = self._objective_function(beta, Q, p, self.rho)
            # save objective function value if necessary
            if self.extended_output:
                self.f.append(cost)
            if np.abs(cost - pre_cost) < self.tol:
                break

        # save result
        self.coef_ = z
        self.n_iter_ = k

Test

We deal with the following equality-constrained optimization problem:

\begin{align}
\mathop{\text{minimize}}_\beta\quad &\frac{1}{2}\|y-X\beta\|_2^2 + \rho \|\beta\|_1 \quad\\
\text{subject to}\quad &\sum_i\beta_i=1
\end{align}

Data were generated as $y = X\beta + \varepsilon$ and the result is compared with scikit-learn's unconstrained Lasso.

import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

n_samples, n_features = 1000, 100
param = 0.5

# Data generation
np.random.seed(0)
X = np.random.randn(n_samples, n_features)
beta = np.zeros(n_features)
beta[:n_features//10] = np.random.randn(n_features//10)
beta = beta / np.sum(beta)
y = X.dot(beta) + np.random.randn(n_samples)

# Proposed implementation (Q and p are scaled by 1/n_samples so the objective
# matches scikit-learn's Lasso: 1/(2n)||y - Xβ||² + α||β||₁)
clf = EqualityConstrainedL1QuadraticProgramming(A=np.ones((1, n_features)), b=np.array([1.]), rho=param, tau=n_features)
clf.fit(X.T.dot(X) / n_samples, -X.T.dot(y) / n_samples)

# Scikit-Learn
lasso = Lasso(alpha=param)
lasso.fit(X, y)

Experimental result: we plot the values of $\beta$ and report $\sum_i\beta_i$ and $\frac12\|y-X\beta\|_2^2$ for each solution. Compared with scikit-learn's Lasso, the proposed implementation obtains a $\beta$ that satisfies the constraint.

label_true = f'True: sum = {round(np.sum(beta), 5)}, func = {round(0.5 * np.linalg.norm(y - X.dot(beta)) ** 2, 5)}'
label_equality_constrained = f'Equality Constrained: sum = {round(np.sum(clf.coef_), 5)}, func = {round(0.5 * np.linalg.norm(y - X.dot(clf.coef_)) ** 2, 5)}'
label_sklearn = f'Scikit-learn: sum = {round(np.sum(lasso.coef_), 5)}, func = {round(0.5 * np.linalg.norm(y - X.dot(lasso.coef_)) ** 2, 5)}'

plt.figure(figsize=(15, 10))
plt.plot(beta, 'go-', label=label_true)
plt.plot(clf.coef_, 'ro-', label=label_equality_constrained)
plt.plot(lasso.coef_, 'bo-', label=label_sklearn)
plt.grid(True)
plt.legend()
plt.show()

[Figure: coefficient values of the true $\beta$, the equality-constrained solution, and scikit-learn's Lasso]
