[Python] LASSO regression with equality constraints using the method of multipliers

While implementing the algorithm from a paper [^1], I needed to solve a LASSO regression problem with linear equality constraints:

\begin{align}
\mathop{\text{minimize}}_\beta\quad &\frac{1}{2}\|y-X\beta\|_2^2 + \rho \|\beta\|_1 \quad\\
\text{subject to}\quad &A\beta = b
\end{align}

The LASSO with both equality and inequality constraints that I implemented previously is computationally heavy, so I implemented a new version that handles equality constraints only. For the implementation, I set $Q = X^TX$ and $p = -X^Ty$ and solve the equivalent problem:

\begin{align}
\mathop{\text{minimize}}_\beta\quad &\frac{1}{2}\beta^TQ\beta + p^T\beta + \rho \|\beta\|_1 \quad\\
\text{subject to}\quad &A\beta = b
\end{align}

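As a minimal sketch (assuming `X` and `y` are NumPy arrays of shape `(n_samples, n_features)` and `(n_samples,)`; the helper name `quadratic_coefficients` is mine), the coefficients of this quadratic form can be computed as follows:

import numpy as np

def quadratic_coefficients(X: np.ndarray, y: np.ndarray):
    """Form Q = X^T X and p = -X^T y, so that
    0.5 * b^T Q b + p^T b equals 0.5 * ||y - X b||^2 up to a constant."""
    Q = X.T.dot(X)
    p = -X.T.dot(y)
    return Q, p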

Method of multipliers

The method of multipliers is well known as a way to handle linearly constrained optimization problems of the form

\begin{align}
\mathop{\text{minimize}}_{\beta,z}\quad &f(\beta) + g(z) \quad\\
\text{subject to}\quad &M\beta + Fz = c
\end{align}

In the method of multipliers, the augmented Lagrangian

\mathcal{L}_\tau (\beta,z,\nu) = f(\beta) + g(z) + \nu^T(M\beta + Fz - c) + \frac{\tau}{2}\|M\beta + Fz - c\|_2^2

is minimized alternately over $\beta$ and $z$, after which the dual variable $\nu$ is updated:

\newcommand{\argmin}{\mathop{\rm arg\,min}\limits}
\begin{align}
\beta^{(t+1)} &\leftarrow \argmin_\beta \mathcal{L}_\tau (\beta, z^{(t)}, \nu^{(t)})\\
z^{(t+1)} &\leftarrow \argmin_z \mathcal{L}_\tau (\beta^{(t+1)}, z, \nu^{(t)})\\
\nu^{(t+1)} &\leftarrow \nu^{(t)} + \tau(M\beta^{(t+1)}+Fz^{(t+1)}-c)
\end{align}
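As a minimal sketch of this generic loop (the callables `argmin_beta(z, nu)` and `argmin_z(beta, nu)` are hypothetical stand-ins for the problem-specific subproblem solvers, not part of the implementation below):

import numpy as np

def method_of_multipliers(argmin_beta, argmin_z, M, F, c, tau, n_iter=100):
    # initialize the z-block and the dual variable at zero
    z = np.zeros(F.shape[1])
    nu = np.zeros(c.shape[0])
    for _ in range(n_iter):
        beta = argmin_beta(z, nu)   # minimize L_tau over beta
        z = argmin_z(beta, nu)      # minimize L_tau over z
        nu = nu + tau * (M.dot(beta) + F.dot(z) - c)  # dual ascent step
    return beta, z, nu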

Application of the method of multipliers to equality-constrained LASSO regression

\begin{align}
\mathop{\text{minimize}}_{\beta,z}\quad &\frac{1}{2}\beta^TQ\beta + p^T\beta + \rho \|z\|_1 \quad\\
\text{subject to}\quad &\left(\begin{array}{c}
A\\
I
\end{array}\right)\beta + \left(\begin{array}{c}
O\\
-I
\end{array}\right) z = \left(\begin{array}{c}
b\\
0
\end{array}\right)
\end{align}

By rewriting the problem in this form, with

\begin{align}
f(\beta)=\frac12 \beta^TQ\beta + p^T\beta,\ 
g(z) = \rho\|z\|_1,\\
M=\left(\begin{array}{c}
A\\
I
\end{array}\right),\ 
F=\left(\begin{array}{c}
O\\
-I
\end{array}\right),\ 
c=\left(\begin{array}{c}
b\\
0
\end{array}\right)
\end{align}

the method of multipliers becomes applicable. Writing the dual variable as $\nu = (u^T, v^T)^T$, the augmented Lagrangian can be written as

\mathcal{L}_\tau (\beta,z,\nu) = \frac12 \beta^TQ\beta + p^T\beta + \rho\|z\|_1 + u^T(A\beta - b) + v^T(\beta - z) + \frac{\tau}{2}\|A\beta - b\|_2^2 + \frac{\tau}{2}\|\beta - z\|_2^2


$\beta$ update formula

Solving the optimality condition

\frac{\partial \mathcal{L}_\tau}{\partial \beta}(\beta, z^{(t)}, \nu^{(t)}) = Q\beta + p + A^Tu^{(t)} + v^{(t)} + \tau A^T(A\beta - b) + \tau(\beta - z^{(t)}) = 0

for $\beta$ gives the update

\beta^{(t+1)} = \left(Q+\tau(A^TA+I)\right)^{-1}\left(-p-A^Tu^{(t)}-v^{(t)}+\tau A^Tb+\tau z^{(t)}\right)

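A minimal NumPy sketch of this step (the helper name `update_beta` is mine; since the coefficient matrix $Q + \tau(A^TA + I)$ is constant across iterations, the implementation below instead precomputes its inverse once):

import numpy as np

def update_beta(Q, p, A, b, z, u, v, tau):
    # solve (Q + tau (A^T A + I)) beta = -p - A^T u - v + tau A^T b + tau z
    lhs = Q + tau * (A.T.dot(A) + np.eye(Q.shape[0]))
    rhs = -p - A.T.dot(u) - v + tau * A.T.dot(b) + tau * z
    return np.linalg.solve(lhs, rhs)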

$z$ update formula

From the optimality condition

\partial_z \mathcal{L}_\tau(\beta^{(t+1)}, z, \nu^{(t)}) = \rho\partial_z \|z\|_1 - v^{(t)} - \tau\left(\beta^{(t+1)} - z\right)\ni0

the update formula

z^{(t+1)} = s\left(\beta^{(t+1)} + \frac1\tau v^{(t)}; \frac\rho\tau\right)

is obtained, where $s(\cdot\,;\cdot)$ is the soft-thresholding function

s(x; \lambda) = \mathrm{sign}\,(x)\max\left\{|x| - \lambda, 0\right\}

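In NumPy, this elementwise operator is a one-liner; it is what the `_prox` method in the implementation below computes:

import numpy as np

def soft_threshold(x: np.ndarray, lam: float) -> np.ndarray:
    # s(x; lambda) = sign(x) * max(|x| - lambda, 0), applied elementwise
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)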

Variable initialization

An initial point $\beta^{(0)}$ that satisfies the constraint is obtained by solving

\begin{align}
\mathop{\text{minimize}}_\beta\quad &\|\beta\|_1 \quad\\
\text{subject to}\quad &A\beta= b
\end{align}

This problem can be reformulated as a linear program by introducing an auxiliary variable $\gamma$:

\begin{align}
\mathop{\text{minimize}}_{\beta,\gamma}\quad &\sum_{i=1}^p \gamma_i \quad\\
\text{subject to}\quad &A\beta= b,\\
&\ \gamma\geq0,\ \gamma\geq\beta,\ \gamma\geq-\beta
\end{align}
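A sketch of this LP with scipy.optimize.linprog, stacking the decision vector as $(\beta, \gamma)$; the `_linear_programming` method below follows the same construction (the helper name `initial_beta` is mine):

import numpy as np
from scipy.optimize import linprog

def initial_beta(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    n = A.shape[1]
    # objective: minimize sum(gamma); beta has zero cost
    c = np.hstack((np.zeros(n), np.ones(n)))
    # equality constraint A beta = b (gamma does not appear)
    A_eq = np.hstack((A, np.zeros_like(A)))
    # inequality constraints beta - gamma <= 0 and -beta - gamma <= 0
    eye = np.eye(n)
    A_ub = np.vstack((np.hstack((eye, -eye)), np.hstack((-eye, -eye))))
    b_ub = np.zeros(2 * n)
    # beta is free, gamma is non-negative
    bounds = [(None, None)] * n + [(0, None)] * n
    res = linprog(c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b, bounds=bounds)
    return res.x[:n]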

Algorithm summary

  1. Initialize $\beta^{(0)}$ and set $z^{(0)} \leftarrow \beta^{(0)}$, $u^{(0)} \leftarrow 0$, $v^{(0)} \leftarrow 0$, $t \leftarrow 0$.
  2. If the stopping condition is met, output $z^{(t)}$ and stop.
  3. \beta^{(t+1)} \leftarrow \left(Q+\tau(A^TA+I)\right)^{-1}\left(-p-A^Tu^{(t)}-v^{(t)}+\tau A^Tb+\tau z^{(t)}\right)
  4. z^{(t+1)} \leftarrow s\left(\beta^{(t+1)} + \frac1\tau v^{(t)}; \frac\rho\tau\right)
  5. u^{(t+1)} \leftarrow u^{(t)} + \tau(A\beta^{(t+1)}-b)
  6. v^{(t+1)} \leftarrow v^{(t)} + \tau(\beta^{(t+1)}-z^{(t+1)})
  7. Set $t \leftarrow t + 1$ and go to step 2.

Source code

The method is implemented in Python below; scipy.optimize.linprog is used to initialize $\beta$.

import numpy as np
from scipy.optimize import linprog


class EqualityConstrainedL1QuadraticProgramming:
    """
    Problem
    ----------
    minimize 0.5 * β^T Q β + p^T β + ρ||β||_1
    subject to Aβ = b

    Algorithm
    ----------
    Alternating Direction Method of Multipliers (ADMM)

    Parameters
    ----------
    A : np.ndarray
        The equality constraint matrix.

    b : np.ndarray
        The equality constraint vector.

    rho : float, optional (default=1.0)
        Constant that multiplies the L1 term.

    tau : float, optional (default=1.0)
        Penalty parameter used in the augmented Lagrangian function.

    tol : float, optional (default=1e-4)
        The tolerance for the optimization.

    max_iter : int, optional (default=300)
        The maximum number of iterations.

    extended_output : bool, optional (default=False)
        If set to True, objective function value will be saved in `self.f`.

    Attributes
    ----------
    f : a list of float
        objective function values.

    coef_ : array, shape (n_features,)
        parameter vector.

    n_iter_ : int
        number of iterations run by ADMM to reach the specified tolerance.
    """
    def __init__(
            self,
            A: np.ndarray,
            b: np.ndarray,
            rho: float = 1.0,
            tau: float = 1.0,
            tol: float = 1e-4,
            max_iter: int = 300,
            extended_output: bool = False
    ):
        self.A = A
        self.b = b
        self.rho = rho
        self.tau = tau
        self.tol = tol
        self.max_iter = max_iter
        self.extended_output = extended_output

        self.f = list()
        self.coef_ = None
        self.n_iter_ = None

    def _linear_programming(self, n_features: int) -> np.ndarray:
        """
        Solve following problem.

        Problem:
            minimize ||β||_1 subject to Aβ=b

        Solver:
            scipy.optimize.linprog

        Parameters
        ----------
        n_features : int
            The dimension of decision variables

        Returns
        -------
        : np.ndarray, shape = (n_features, )
            The values of the decision variables that minimizes the objective function while satisfying the constraints.
        """
        # equality constraint matrix and vector
        c = np.hstack((np.zeros(n_features), np.ones(n_features)))
        A_eq = np.hstack((self.A, np.zeros_like(self.A)))
        b_eq = self.b

        # inequality constraint matrix and vector
        eye = np.eye(n_features)
        A_ub = np.vstack((
            np.hstack((eye, -eye)),
            np.hstack((-eye, -eye)),
            np.hstack((np.zeros((n_features, n_features)), -eye))
        ))
        b_ub = np.zeros(n_features * 3)

        # beta is free and gamma >= 0 is enforced via A_ub; without explicit
        # bounds, linprog defaults to non-negative variables, which would
        # wrongly restrict beta >= 0
        return linprog(c=c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                       bounds=(None, None))['x'][:n_features]

    def _prox(self, x: np.ndarray, rho: float) -> np.ndarray:
        """
        Proximal operator for L1 constraint

        Parameters
        ----------
        x : np.ndarray, shape = (n_features, )
            1D array

        rho : float
            a threshold
        """
        return np.sign(x) * np.maximum((np.abs(x) - rho), 0)

    def _objective_function(self, x: np.ndarray, Q: np.ndarray, p: np.ndarray, rho: float):
        """
        Return 1 / 2 * x^T Qx + p^T x + rho * ||x||_1
        """
        return 0.5 * x.dot(Q.dot(x)) + p.dot(x) + np.sum(np.abs(x)) * rho

    def fit(self, Q: np.ndarray, p: np.ndarray) -> None:
        """
        Parameters
        ----------
        Q : np.ndarray, shape = (n_features, n_features)
            Quadratic coefficient.

        p : np.ndarray, shape = (n_features, )
            Linear coefficient.
        """
        n_features = Q.shape[0]
        tau_inv = 1 / self.tau

        # initialize constants
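        # note: the beta step is applied in a form scaled by 1/tau,
        # beta = (Q/tau + A^T A + I)^{-1} (-p/tau - A^T u - v + A^T b + z),
        # with u and v holding the scaled dual variables u/tau and v/tau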
        Q_inv = np.linalg.inv(Q * tau_inv + self.A.T.dot(self.A) + np.eye(n_features))
        p_tau = p * tau_inv
        Ab = self.A.T.dot(self.b)
        rho_tau = self.rho * tau_inv

        # initialize variables
        beta = self._linear_programming(n_features)
        z = np.copy(beta)
        u = np.zeros_like(self.b)
        v = np.zeros_like(beta)

        cost = self._objective_function(beta, Q, p, self.rho)
        # save objective function value if necessary
        if self.extended_output:
            self.f.append(cost)

        # main loop
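        # since u and v are the scaled duals (nu / tau), the dual updates
        # below take a unit step rather than multiplying the residual by tau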
        for k in range(self.max_iter):
            r = - p_tau - self.A.T.dot(u) - v + Ab + z
            beta = np.dot(Q_inv, r)
            z = self._prox(beta + v, rho_tau)
            u = u + self.A.dot(beta) - self.b
            v = v + beta - z

            pre_cost = cost
            cost = self._objective_function(beta, Q, p, self.rho)
            # save objective function value if necessary
            if self.extended_output:
                self.f.append(cost)
            if np.abs(cost - pre_cost) < self.tol:
                break

        # save result
        self.coef_ = z
        self.n_iter_ = k

Test

We deal with the following equality-constrained optimization problem:

\begin{align}
\mathop{\text{minimize}}_\beta\quad &\frac{1}{2}\|y-X\beta\|_2^2 + \rho \|\beta\|_1 \quad\\
\text{subject to}\quad &\sum_i\beta_i=1
\end{align}

Data were generated as $y = X\beta + \varepsilon$ and the result is compared with scikit-learn's unconstrained Lasso.

import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

n_samples, n_features = 1000, 100
param = 0.5

# Data generation
np.random.seed(0)
X = np.random.randn(n_samples, n_features)
beta = np.zeros(n_features)
beta[:n_features//10] = np.random.randn(n_features//10)
beta = beta / np.sum(beta)
y = X.dot(beta) + np.random.randn(n_samples)

# Proposed implementation (Q and p are scaled by 1/n_samples so the objective
# matches scikit-learn's Lasso: 1/(2n)||y - Xβ||² + α||β||₁)
clf = EqualityConstrainedL1QuadraticProgramming(A=np.ones((1, n_features)), b=np.array([1.]), rho=param, tau=n_features)
clf.fit(X.T.dot(X) / n_samples, -X.T.dot(y) / n_samples)

# Scikit-Learn
lasso = Lasso(alpha=param)
lasso.fit(X, y)

Experimental result: we plot the values of $\beta$ and report $\sum_i\beta_i$ and $\frac12\|y-X\beta\|_2^2$ for each solution. Compared with scikit-learn's Lasso, the proposed implementation obtains a $\beta$ that satisfies the constraint.

label_true = f'True: sum = {round(np.sum(beta), 5)}, func = {round(0.5 * np.linalg.norm(y - X.dot(beta)) ** 2, 5)}'
label_equality_constrained = f'Equality Constrained: sum = {round(np.sum(clf.coef_), 5)}, func = {round(0.5 * np.linalg.norm(y - X.dot(clf.coef_)) ** 2, 5)}'
label_sklearn = f'Scikit-learn: sum = {round(np.sum(lasso.coef_), 5)}, func = {round(0.5 * np.linalg.norm(y - X.dot(lasso.coef_)) ** 2, 5)}'

plt.figure(figsize=(15, 10))
plt.plot(beta, 'go-', label=label_true)
plt.plot(clf.coef_, 'ro-', label=label_equality_constrained)
plt.plot(lasso.coef_, 'bo-', label=label_sklearn)
plt.grid(True)
plt.legend()
plt.show()

[Figure: coefficient values of the true $\beta$, the equality-constrained solution, and scikit-learn's Lasso]
