Comparison with abess#

In this notebook, we compare the performance (recovery accuracy and computation time) of skscope and abesson three tasks as follows:

  • Linear Regression

  • Non-negative Linear Regression

  • Logistic Regression.

Besides, we compare the corresponding performance with different data (sample size \(n\), feature number \(p\) and support size \(s\)) dimension.

[1]:
import time
import numpy as np
import pandas as pd
from IPython.display import display

import jax.numpy as jnp
from skscope import ScopeSolver


from abess.datasets import make_glm_data
from abess import LinearRegression, LogisticRegression

The following is the main test function which records the recovery accuracy and computation time of both methods.

[2]:
def test_time(n=500, p=1000, s=5, random_state=None):
    print('='*20 + f'  n={n}, p={p}, s={s}  ' + '='*20 )
    rng = np.random.default_rng(random_state)
    true_support_set = rng.choice(np.arange(p), size=s, replace=False)
    real_coef = np.zeros(p)

    iterables = [['Linear', 'Non-Neg', 'Logistic'], ['abess', 'skscope']]
    index = pd.MultiIndex.from_product(iterables, names=['Task', 'Model'])
    res = pd.DataFrame(columns=['Accuracy', 'Time'], index = index)

    for type in ['Linear', 'Non-Neg', 'Logistic']:
        if type == 'Linear':
            real_coef[true_support_set] = rng.choice(np.arange(1, 4), size=s) * rng.choice([1, -1], size=s)
            data = make_glm_data(n=n, p=p, k=s, family='gaussian', coef_=real_coef)
            model = LinearRegression(support_size=s)
            def objective(params):
                loss = jnp.mean((y - X @ params) ** 2)
                return loss
            solver = ScopeSolver(p, sparsity=s)
        elif type == 'Non-Neg':
            real_coef[true_support_set] = rng.choice(np.arange(1, 4), size=s) * rng.choice([1], size=s)
            data = make_glm_data(n=n, p=p, k=s, family='gaussian', coef_=real_coef)
            model = LinearRegression(support_size=s)
            def objective(params):
                loss = jnp.mean((y - X @ params) ** 2)
                return loss
            solver = ScopeSolver(p, sparsity=s)
        elif type == 'Logistic':
            real_coef[true_support_set] = rng.choice(np.arange(1, 4), size=s) * rng.choice([1, -1], size=s)
            data = make_glm_data(n=n, p=p, k=s, family='binomial', coef_=real_coef)
            model = LogisticRegression(support_size=s)
            def objective(params):
                xbeta = jnp.clip(X @ params, -30, 30)
                return jnp.mean(jnp.log(1 + jnp.exp(xbeta)) - y * xbeta)
            solver = ScopeSolver(p, sparsity=s)

        X, y = data.x, data.y

        # abess
        t_begin = time.time()
        model.fit(X, y)
        t_abess = time.time() - t_begin
        acc_abess = len(set(np.nonzero(model.coef_)[0]) & set(true_support_set)) / s
        res.loc[(type, 'abess')] = [acc_abess, np.round(t_abess, 4)]

        # skscope
        t_begin = time.time()
        params = solver.solve(objective, jit=True)
        t_skscope = time.time() - t_begin
        acc_skscope = len(set(np.nonzero(params)[0]) & set(np.nonzero(data.coef_)[0])) / s
        res.loc[(type, 'skscope')] = [acc_skscope, np.round(t_skscope, 4)]

    print(res)

We compare these three tasks with

\[(n, p, s)\in\{(500, 1000, 5), (2000, 5000, 10), (5000, 10000, 10)\}.\]

Certainly, abess is faster than skscope since all three tasks are generalized linear model and are specicialized class for abess.

This phenomenon is admmissible since the auto-differention procedure of skscope is general and takes more computation time.

All results are shwon in the following tables.

[3]:
settings = [
    (500, 1000, 5),
    (2000, 5000, 10),
    (5000, 10000, 10),
]
for setting in settings:
    n, p, s = setting
    test_time(n=n, p=p, s=s)
====================  n=500, p=1000, s=5  ====================
                 Accuracy    Time
Task     Model
Linear   abess        1.0   0.006
         skscope      1.0  0.2256
Non-Neg  abess        1.0   0.011
         skscope      1.0  0.1591
Logistic abess        1.0  0.0168
         skscope      1.0  0.2517
====================  n=2000, p=5000, s=10  ====================
                 Accuracy    Time
Task     Model
Linear   abess        1.0  0.2875
         skscope      1.0  1.0385
Non-Neg  abess        1.0  0.2477
         skscope      1.0  1.2133
Logistic abess        1.0  0.2196
         skscope      1.0  1.3835
====================  n=5000, p=10000, s=10  ====================
                 Accuracy    Time
Task     Model
Linear   abess        1.0  1.3192
         skscope      1.0  5.8157
Non-Neg  abess        1.0  1.4239
         skscope      1.0  5.3028
Logistic abess        1.0  1.1628
         skscope      1.0  5.1351