Comparison with abess#
In this notebook, we compare the performance (recovery accuracy and computation time) of skscope and abesson three tasks as follows:
Linear Regression
Non-negative Linear Regression
Logistic Regression.
Besides, we compare the corresponding performance with different data (sample size \(n\), feature number \(p\) and support size \(s\)) dimension.
[1]:
import time
import numpy as np
import pandas as pd
from IPython.display import display
import jax.numpy as jnp
from skscope import ScopeSolver
from abess.datasets import make_glm_data
from abess import LinearRegression, LogisticRegression
The following is the main test function which records the recovery accuracy and computation time of both methods.
[2]:
def test_time(n=500, p=1000, s=5, random_state=None):
print('='*20 + f' n={n}, p={p}, s={s} ' + '='*20 )
rng = np.random.default_rng(random_state)
true_support_set = rng.choice(np.arange(p), size=s, replace=False)
real_coef = np.zeros(p)
iterables = [['Linear', 'Non-Neg', 'Logistic'], ['abess', 'skscope']]
index = pd.MultiIndex.from_product(iterables, names=['Task', 'Model'])
res = pd.DataFrame(columns=['Accuracy', 'Time'], index = index)
for type in ['Linear', 'Non-Neg', 'Logistic']:
if type == 'Linear':
real_coef[true_support_set] = rng.choice(np.arange(1, 4), size=s) * rng.choice([1, -1], size=s)
data = make_glm_data(n=n, p=p, k=s, family='gaussian', coef_=real_coef)
model = LinearRegression(support_size=s)
def objective(params):
loss = jnp.mean((y - X @ params) ** 2)
return loss
solver = ScopeSolver(p, sparsity=s)
elif type == 'Non-Neg':
real_coef[true_support_set] = rng.choice(np.arange(1, 4), size=s) * rng.choice([1], size=s)
data = make_glm_data(n=n, p=p, k=s, family='gaussian', coef_=real_coef)
model = LinearRegression(support_size=s)
def objective(params):
loss = jnp.mean((y - X @ params) ** 2)
return loss
solver = ScopeSolver(p, sparsity=s)
elif type == 'Logistic':
real_coef[true_support_set] = rng.choice(np.arange(1, 4), size=s) * rng.choice([1, -1], size=s)
data = make_glm_data(n=n, p=p, k=s, family='binomial', coef_=real_coef)
model = LogisticRegression(support_size=s)
def objective(params):
xbeta = jnp.clip(X @ params, -30, 30)
return jnp.mean(jnp.log(1 + jnp.exp(xbeta)) - y * xbeta)
solver = ScopeSolver(p, sparsity=s)
X, y = data.x, data.y
# abess
t_begin = time.time()
model.fit(X, y)
t_abess = time.time() - t_begin
acc_abess = len(set(np.nonzero(model.coef_)[0]) & set(true_support_set)) / s
res.loc[(type, 'abess')] = [acc_abess, np.round(t_abess, 4)]
# skscope
t_begin = time.time()
params = solver.solve(objective, jit=True)
t_skscope = time.time() - t_begin
acc_skscope = len(set(np.nonzero(params)[0]) & set(np.nonzero(data.coef_)[0])) / s
res.loc[(type, 'skscope')] = [acc_skscope, np.round(t_skscope, 4)]
print(res)
We compare these three tasks with
Certainly, abess is faster than skscope since all three tasks are generalized linear model and are specicialized class for abess.
This phenomenon is admmissible since the auto-differention procedure of skscope is general and takes more computation time.
All results are shwon in the following tables.
[3]:
settings = [
(500, 1000, 5),
(2000, 5000, 10),
(5000, 10000, 10),
]
for setting in settings:
n, p, s = setting
test_time(n=n, p=p, s=s)
==================== n=500, p=1000, s=5 ====================
Accuracy Time
Task Model
Linear abess 1.0 0.006
skscope 1.0 0.2256
Non-Neg abess 1.0 0.011
skscope 1.0 0.1591
Logistic abess 1.0 0.0168
skscope 1.0 0.2517
==================== n=2000, p=5000, s=10 ====================
Accuracy Time
Task Model
Linear abess 1.0 0.2875
skscope 1.0 1.0385
Non-Neg abess 1.0 0.2477
skscope 1.0 1.2133
Logistic abess 1.0 0.2196
skscope 1.0 1.3835
==================== n=5000, p=10000, s=10 ====================
Accuracy Time
Task Model
Linear abess 1.0 1.3192
skscope 1.0 5.8157
Non-Neg abess 1.0 1.4239
skscope 1.0 5.3028
Logistic abess 1.0 1.1628
skscope 1.0 5.1351