[1]:

%matplotlib inline

2.5. Multinomial logistic regression#

We would like to use an example to show how the sparse-constrained optimization for multinomial logistic regression works in our program.

2.5.1. Introduction#

Multinomial logistic regression, also known as multinomial logit, is a classification method that generalizes logistic regression to multiclass problems. While binary logistic regression deals with two classes, multinomial logistic regression can handle three or more classes. This makes it particularly useful in scenarios where the outcome variable can take on more than two discrete values.

The multinomial logistic regression model estimates the probability of each possible outcome of a categorical dependent variable, given a set of independent variables. The model is based on the logistic function, which ensures that the estimated probabilities are between 0 and 1 and that they sum to 1 across all categories. Next are some applications of it.

Marketing and customer segmentation: By identifying distinct customer segments, companies can tailor their marketing strategies and improve targeting, ultimately increasing sales and customer satisfaction.
Medical research and diagnostics: In medical research, multinomial logistic regression can help diagnose diseases based on patient symptoms and test results. This aids in early detection and personalized treatment plans, improving patient outcomes.
Social sciences: In social sciences, multinomial logistic regression is employed to study voting behavior, social mobility, and public opinion. This provides valuable insights into societal trends and helps inform policy decisions.

In multinomial logistic regression, we have multiple categories, denoted by \(k=1,2,...,K\). We want to predict the probability of each category given a set of predictor variables \(X\). We assume that the probability of each category is a function of the predictor variables, and that the probabilities for each category sum to 1.

We can model the probability of each category using the softmax function:

\[P(Y=k|X=x) = \frac{e^{\beta _{0k} + \beta _k^TX}}{\sum_{j=1}^K e^{\beta _{0j} + \beta _j^TX}}\]

where \(Y\) is the categorical outcome, \(X\) is the vector of predictor variables, \(\beta _{0k}\) and \(\beta _k\) are the intercept and coefficient vectors for category \(k\), and \(e\) is the base of the natural logarithm.

The softmax function ensures that the probabilities for each category sum to 1. The numerator of the function represents the probability of category \(k\), and the denominator represents the sum of the probabilities for all categories.

We can estimate the coefficients using maximum likelihood estimation. The likelihood function for multinomial logistic regression is:

\[L(\beta) = \prod _{i=1}^n \prod _{k=1}^K P(Y_i=k|X_i=x_i)^{I(Y_i=k)}\]

where \(n\) is the number of observations, \(I(Y_i=k)\) is an indicator function that equals 1 if \(Y_i=k\) and 0 otherwise, and \(P(Y_i=k|X_i=x_i)\) is the predicted probability of category \(k\) for observation \(i\).

The negative log-likelihood function is:

\[-l(\beta) = -\sum _{i=1}^n \sum _{k=1}^K I(Y_i=k) \log P(Y_i=k|X_i=x_i) \tag{1}\]

This is the function that we want to minimize in order to estimate the coefficients. We can use scope algorithm to find the values of \(\beta\) with sparsity constraints that minimize the negative log-likelihood function.

Here is Python code for solving sparse gamma regression problem:

2.5.2. Data for multinomial logistic regression#

We import necessary packages and set a seed.

[2]:

import jax.numpy as jnp
import numpy as np
from skscope import ScopeSolver
import numpy as np
from abess.datasets import make_multivariate_glm_data

[3]:

np.random.seed(3)

Firstly, we shall conduct Multinomial logistic regression on an artificial dataset for demonstration. The make_multivariate_glm_data from abess.datasets function allows us to generate simulated data by specifying the family="multinomial".

The assumption behind this model is that the response vector follows a multinomial distribution. The artificial dataset contains 500 observations and 20 predictors but only five predictors have influence on the three possible classes.

[4]:

n = 500  # sample size
p = 20  # all predictors
k = 5   # real predictors
m = 3   # number of classes

data = make_multivariate_glm_data(n=n, p=p, k=k, family="multinomial", M=m)

X = data.x
y = data.y

print('real variables\' index:\n', set(np.nonzero(data.coef_)[0]))
print('real variables:\n', data.coef_)

real variables' index:
 {0, 3, 7, 10, 19}
real variables:
 [[  5.44916029  -0.94953634   0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [ -1.39241163 -12.96678673   0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  3.24543565   4.02033588   0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [ -1.38210809   4.07755579   0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  5.79719104   3.61096451   0.        ]]

2.5.3. Solve the problem#

Secondly, to carry out sparse-constrained optimization for multinomial logistic regression, we define the loss function multinomial_regression_loss accorting to 1 that matches the data generating function make_multivariate_glm_data.

[5]:

def multinomial_regression_loss(params):
    beta = params.reshape((p, m))
    # Compute the logits
    logits = jnp.dot(X, beta)

    # Compute the softmax probabilities
    softmax_probs = jnp.exp(logits) / jnp.sum(jnp.exp(logits), axis=1, keepdims=True)

    # Compute the NLL loss
    loss = -jnp.mean(jnp.sum(y * jnp.log(softmax_probs), axis=1))

    return loss

We use skscope to solve the sparse multinomial logistic regression problem. After defining the data and the loss function, we can call ScopeSolver to solve the sparse-constrained optimization problem.

[6]:

solver = ScopeSolver(p*(m), k, group=[i for i in range(p) for j in range(m)])
params = solver.solve(multinomial_regression_loss, jit=True)

Now the solver.params contains the coefficients of multinomial logistic model with no more than 5 variables. That is, those variables with a coefficient 0 is unused in the model:

[7]:

print(solver.params.reshape((p, m)))

[[ 4.39822362 -2.50996204 -1.88825765]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 3.99429591 -9.68531333  5.69101694]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.93656944  2.00800227 -2.94457135]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [-2.31647311  3.43094489 -1.11447062]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 2.54118658  0.82594677 -3.36713346]]

We can further compare the coefficients estimated by skscope and the real coefficients in two-fold:

The true support set and the estimated support set
The true nonzero parameters and the estimated nonzero parameters

[8]:

print('real variables\' index:\n', set(np.nonzero(data.coef_)[0]))
print('Estimated variables\' index:\n', set(np.nonzero(solver.params.reshape((p, m)))[0]))

real variables' index:
 {0, 3, 7, 10, 19}
Estimated variables' index:
 {0, 3, 7, 10, 19}

[9]:

print("True parameter:\n", data.coef_)
print("Estimated parameter:\n", solver.params.reshape((p, m)))

True parameter:
 [[  5.44916029  -0.94953634   0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [ -1.39241163 -12.96678673   0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  3.24543565   4.02033588   0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [ -1.38210809   4.07755579   0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  0.           0.           0.        ]
 [  5.79719104   3.61096451   0.        ]]
Estimated parameter:
 [[ 4.39822362 -2.50996204 -1.88825765]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 3.99429591 -9.68531333  5.69101694]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.93656944  2.00800227 -2.94457135]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [-2.31647311  3.43094489 -1.11447062]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 0.          0.          0.        ]
 [ 2.54118658  0.82594677 -3.36713346]]