[28]:
%matplotlib inline

2.2. Poisson regression#

We would like to use an example to show how the sparse-constrained optimization for poisson regression works in our program.

2.2.1. Introduction#

Poisson regression involves regression models in which the response variable is in the form of counts. For example, the count of number of car accidents or number of customers in line at a reception desk. The response variables is assumed to follow a Poisson distribution.

Poisson regression is a type of regression analysis used to model count data and contingency tables. It assumes that the response variable, which represents counts, follows a Poisson distribution. This type of model is particularly useful when the data represents the number of times an event occurs within a fixed period of time, area, volume, or any other well-defined interval.

In Poisson regression, the expected value of the response variable is linked to the linear predictors through a log-link function. However, in cases where covariates act additively, an identity link can be more appropriate. For instance, in certain applications where the counts are superimposed signals from multiple sources, the identity link maintains the additive nature of the covariates. Next are applications of it.

  • Healthcare and epidemiology: Poisson regression is extensively used in healthcare and epidemiology to model the number of occurrences of diseases or health-related events. This helps in identifying high-risk areas and populations, guiding public health interventions.

  • Traffic and transportation: In traffic studies, Poisson regression is applied to model the number of accidents occurring at various locations or during different time periods. This method helps in optimizing traffic management and reducing accident rates.

  • Insurance: In the insurance industry, Poisson regression models are used to predict the number of claims made by policyholders. This enables insurance companies to manage risk more effectively and ensure financial stability.

The general mathematical equation for Poisson regression is

\[\begin{align}\log(E(y)) = \beta_0 + \beta_1 X_1+\beta_2 X_2+\dots+\beta_p X_p.\end{align} \tag{1}\]

With \(n\) independent data of the explanatory variables \(x\) and the response variable \(y\), we can estimate \(\beta\) by minimizing the negative log-likelihood function under sparsity constraint:

\[\arg \min _{\beta \in R^p} L(\beta):=-\frac{1}{n} \sum_{i=1}^n\left\{y_i x_i^T \beta-\exp \left(x_i^T \beta\right)-\log \left(y!\right)\right\}, \text { s.t. }\|\beta\|_0 \leq s .\tag{2}\]

Here is Python code for solving sparse poisson regression problem:

2.2.2. Data for poisson regression#

We import necessary packages and set a seed.

[29]:
import numpy as np
from abess.datasets import make_glm_data
import jax.numpy as jnp
from skscope import ScopeSolver
[30]:
np.random.seed(1234)

Firstly, we generate some artificial data using logic 1. Consider a dataset containing n=500 observations with p=500 variables. The make_glm_data function allows us to generate simulated data by specifying the family="poisson". By specifying k = 5, we set only 5 of the 500 variables to have effect on the expectation of the response.

[39]:
n = 500
p = 500
s = 5

data = make_glm_data(n=n, p=p, k=s, family="poisson")
X = data.x
y = data.y

2.2.3. Solve the problem#

Secondly, we define the loss function poisson_loss accorting to 2 that matches the data generating function make_glm_data.

[32]:
def poisson_loss(params):
    xbeta = jnp.clip(X @ params, -30, 30)
    return jnp.sum(jnp.exp(xbeta) - y * xbeta) #omit \log y! term

We use skscope to solve the sparse possion regression problem. After defining the data generation and loss function, we can call ScopeSolver to solve the sparse-constrained optimization problem. We will use GIC to decide the optimal support size.

[33]:
from skscope.utilities import GIC

solver = ScopeSolver(p, sparsity = range(1,10), sample_size = n, ic_method = GIC)
params = solver.solve(poisson_loss, jit=True)

Now the solver.params contains the coefficients of poisson model with no more than 5 variables. That is, those variables with a coefficient 0 is unused in the model.

We can further compare the coefficients estimated by skscope and the real coefficients in three-fold:

  • The true support set and the estimated support set

  • The true nonzero parameters and the estimated nonzero parameters

  • The true loss value and the estimated values

[35]:
true_support_set = np.nonzero(data.coef_)[0]
estimated_support_set = solver.support_set
print("True support set: ", true_support_set)
print("Estimated support set: ", estimated_support_set)
True support set:  [ 53 331 336 393 431]
Estimated support set:  [ 53 331 336 393 431]
[36]:
print("True parameters: ", data.coef_[true_support_set])
print("Estimated parameters: ", solver.params[estimated_support_set])
True parameters:  [-4.59545963  7.8660496   7.09934975 -6.71406003 -5.76856858]
Estimated parameters:  [-4.59741833  7.96903051  7.83567449 -6.3614927  -5.50868553]
[37]:
print("True loss value: ", poisson_loss(data.coef_))
print("Estimated loss value: ", poisson_loss(solver.params))
True loss value:  121.09512
Estimated loss value:  119.89359

We can plot the sparse signal recovering from the noisy observations to visualize the results.

[38]:
import matplotlib.pyplot as plt
(inx_true,) =  data.coef_.nonzero()
(inx_est,) =  solver.params.nonzero()

# plot the sparse signal
plt.figure(figsize=(7, 7))
plt.subplot(2, 1, 1)
plt.stem(inx_true, data.coef_[inx_true], markerfmt='o', basefmt='k-')
plt.plot([0, 500], [0, 0], 'r-', lw=2)
plt.xlim(0, 500)
plt.title("Sparse signal")
#plt.plot(inx_true, true_params[inx_true], drawstyle='steps-post')

# plot the noisy reconstruction
plt.subplot(2, 1, 2)
plt.stem(inx_est, solver.params[inx_est], markerfmt='o', basefmt='k-')
plt.plot([0, 500], [0, 0], 'r-', lw=2)
plt.xlim(0, 500)
plt.title("Recovered signal from noisy observations")
#plt.plot(inx_est, solver.params[inx_est], drawstyle='steps-post')

plt.show()
../../_images/gallery_GeneralizedLinearModels_poisson-regression_19_0.png