Comparison with abess#
Abess is a well-known open-source library that supports variable selection. A comparative table summarizing the supported features of abess and skscope is listed below:
Supported features |
|
|
|---|---|---|
Sparsity matrices |
Covarites only |
YES |
Warm starts |
YES |
YES |
GPU/TPU |
NO |
YES |
Multi CPUs |
YES |
YES |
YES |
YES |
|
General objective function |
YES |
YES |
No. supported algorithms |
1 |
7 |
Group-structure sparisty |
YES |
YES |
Nuisance parameters |
YES |
YES |
Table: The support features of the abess and skscope.
In this notebook, we compare the performance (recovery accuracy and computation time) of skscope and abesson three tasks as follows:
Linear Regression
Non-negative Linear Regression
Logistic Regression.
Besides, we compare the corresponding performance with different data (sample size \(n\), feature number \(p\) and support size \(s\)) dimension.
[1]:
import time
import numpy as np
import pandas as pd
from IPython.display import display
import jax.numpy as jnp
from skscope import ScopeSolver
from abess.datasets import make_glm_data
from abess import LinearRegression, LogisticRegression
The following is the main test function which records the recovery accuracy and computation time of both methods.
[2]:
def test_time(n=500, p=1000, s=5, random_state=None, rep=1):
print('='*20 + f' n={n}, p={p}, s={s} ' + '='*20 )
# rng = np.random.default_rng(random_state)
# true_support_set = rng.choice(np.arange(p), size=s, replace=False)
# real_coef = np.zeros(p)
# iterables = [['Linear', 'Non-Neg', 'Logistic'], ['abess', 'skscope']]
# index = pd.MultiIndex.from_product(iterables, names=['Task', 'Model'])
# res = pd.DataFrame(columns=['Accuracy', 'Time'], index = index)
res = pd.DataFrame(columns=['Task', 'Model', 'Accuracy', 'Time'])
for i in range(rep):
rng = np.random.default_rng(i)
true_support_set = rng.choice(np.arange(p), size=s, replace=False)
real_coef = np.zeros(p)
for task in ['Linear', 'Non-Neg', 'Logistic']:
if task == 'Linear':
real_coef[true_support_set] = rng.choice(np.arange(1, 4), size=s) * rng.choice([1, -1], size=s)
data = make_glm_data(n=n, p=p, k=s, family='gaussian', coef_=real_coef)
model = LinearRegression(support_size=s)
def objective(params):
loss = jnp.mean((y - X @ params) ** 2)
return loss
solver = ScopeSolver(p, sparsity=s)
elif task == 'Non-Neg':
real_coef[true_support_set] = rng.choice(np.arange(1, 4), size=s) * rng.choice([1], size=s)
data = make_glm_data(n=n, p=p, k=s, family='gaussian', coef_=real_coef)
model = LinearRegression(support_size=s)
def objective(params):
loss = jnp.mean((y - X @ params) ** 2)
return loss
solver = ScopeSolver(p, sparsity=s)
elif task == 'Logistic':
real_coef[true_support_set] = rng.choice(np.arange(1, 4), size=s) * rng.choice([1, -1], size=s)
data = make_glm_data(n=n, p=p, k=s, family='binomial', coef_=real_coef)
model = LogisticRegression(support_size=s)
def objective(params):
xbeta = jnp.clip(X @ params, -30, 30)
return jnp.mean(jnp.log(1 + jnp.exp(xbeta)) - y * xbeta)
solver = ScopeSolver(p, sparsity=s)
X, y = data.x, data.y
# abess
t_begin = time.time()
model.fit(X, y)
t_abess = time.time() - t_begin
acc_abess = len(set(np.nonzero(model.coef_)[0]) & set(true_support_set)) / s
# res.loc[(type, 'abess')] = [acc_abess, np.round(t_abess, 4)]
res.loc[len(res)] = [task, 'abess', acc_abess, t_abess]
# skscope
t_begin = time.time()
params = solver.solve(objective, jit=True)
t_skscope = time.time() - t_begin
acc_skscope = len(set(np.nonzero(params)[0]) & set(np.nonzero(data.coef_)[0])) / s
# res.loc[(type, 'skscope')] = [acc_skscope, np.round(t_skscope, 4)]
res.loc[len(res)] = [task, 'skscope', acc_skscope, t_skscope]
res_mean = res.groupby(['Task', 'Model']).mean()
res_std = res.groupby(['Task', 'Model']).std()
res_mean['Accuracy'] = res_mean['Accuracy'].map(lambda x: f'{x:.2f}')
res_mean['Time'] = res_mean['Time'].map(lambda x: f'{x:.2f}')
res_std['Accuracy'] = res_std['Accuracy'].map(lambda x: f' ({x:.2f})')
res_std['Time'] = res_std['Time'].map(lambda x: f' ({x:.2f})')
res_all = res_mean + res_std
print(res_all)
return res_all
We compare these three tasks with
Certainly, abess is faster than skscope since all three tasks are generalized linear model and are specicialized class for abess.
This phenomenon is admmissible since the auto-differention procedure of skscope is general and takes more computation time.
All results are shwon in the following tables.
[3]:
settings = [
(500, 1000, 10),
(2500, 5000, 50),
(5000, 10000, 100),
]
for setting in settings:
n, p, s = setting
res = test_time(n=n, p=p, s=s, rep=10)
==================== n=500, p=1000, s=10 ====================
Accuracy Time
Task Model
Linear abess 1.00 (0.00) 0.01 (0.00)
skscope 1.00 (0.00) 0.23 (0.03)
Logistic abess 0.99 (0.03) 0.02 (0.00)
skscope 0.99 (0.03) 0.26 (0.03)
Non-Neg abess 1.00 (0.00) 0.01 (0.00)
skscope 1.00 (0.00) 0.22 (0.02)
==================== n=2500, p=5000, s=50 ====================
Accuracy Time
Task Model
Linear abess 1.00 (0.00) 0.55 (0.12)
skscope 1.00 (0.00) 3.88 (1.10)
Logistic abess 1.00 (0.00) 0.83 (0.15)
skscope 1.00 (0.00) 5.00 (0.84)
Non-Neg abess 1.00 (0.00) 0.63 (0.27)
skscope 1.00 (0.00) 3.86 (1.01)
==================== n=5000, p=10000, s=100 ====================
Accuracy Time
Task Model
Linear abess 1.00 (0.00) 2.19 (0.22)
skscope 1.00 (0.00) 14.09 (1.28)
Logistic abess 1.00 (0.00) 6.81 (3.32)
skscope 1.00 (0.00) 24.12 (5.83)
Non-Neg abess 1.00 (0.00) 2.26 (0.10)
skscope 1.00 (0.00) 14.80 (1.00)