Sparsity-Constraint Optimization Solvers

Currently supported solvers for sparsity-constrained optimization. All of these solvers inherit from BaseSolver.

Classes

ScopeSolver

Get sparse optimal solution of convex objective function by Sparse-Constrained Optimization via Splicing Iteration (SCOPE) algorithm, which can also be used for variable selection.

HTPSolver

Get sparse optimal solution of convex objective function by Gradient Hard Thresholding Pursuit (GraHTP) algorithm.

IHTSolver

Get sparse optimal solution of convex objective function by Iterative Hard Thresholding (IHT) algorithm.

GraspSolver

Get sparse optimal solution of convex objective function by Gradient Support Pursuit (GraSP) algorithm.

FobaSolver

Get sparse optimal solution of convex objective function by Forward-Backward greedy (FoBa) algorithm.

ForwardSolver

Get sparse optimal solution of convex objective function by Forward Selection algorithm.

OMPSolver

Get sparse optimal solution of convex objective function by Orthogonal Matching Pursuit (OMP) algorithm.

class skscope.solver.FobaSolver(dimensionality, sparsity=None, sample_size=1, *, use_gradient=True, threshold=0.0, foba_threshold_ratio=0.5, strict_sparsity=True, preselect=[], numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)

Get sparse optimal solution of convex objective function by Forward-Backward greedy (FoBa) algorithm. Specifically, FobaSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.

Parameters:
  • dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).

  • sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).

  • sample_size (int, default=1) – Sample size, denoted as \(n\).

  • use_gradient (bool, default=True) – Whether to use gradient information to measure the importance of variables. Using gradient information accelerates the algorithm, but the solution may be less accurate.

  • threshold (float, default=0.0) – The threshold used to determine whether a variable is selected.

  • foba_threshold_ratio (float, default=0.5) – The threshold used to determine whether a variable is deleted is set to threshold * foba_threshold_ratio.

  • strict_sparsity (bool, default=True) – Whether to strictly constrain the sparsity level to equal sparsity.

  • preselect (array of int, default=[]) – An array containing the indices of variables that must be selected.

  • numeric_solver (callable, optional) – A solver for the convex optimization problem. FobaSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.

  • max_iter (int, default=100) – Maximum number of iterations allowed for convergence.

  • group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be a non-decreasing integer array that starts from 0 and has no gaps. Variables in the same group must be adjacent, and they are selected or excluded together. Invalid examples: [0,2,1,2] (not in non-decreasing order), [1,2,3,3] (does not start from 0), [0,2,2,3] (contains a gap). Note that when groups are used, “a variable” effectively means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.

  • ic_method (callable, optional) – A function that calculates the information criterion used to choose the sparsity level, with the interface ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.

  • cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.

  • split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.

  • cv_fold_id (array of shape (sample_size,), optional) – An array indicating the cross-validation fold of each sample; samples in the same fold should be given the same number. The number of distinct values should equal cv. Used only when cv > 1.

  • random_state (int, optional) – The random seed used for cross-validation.
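As a small illustration of the default sparsity grid, the sketch below evaluates the documented expression range(int(p/log(log(p))/log(p))) literally. It assumes natural logarithms, and the helper name default_sparsity_grid is hypothetical rather than part of skscope. How the library itself treats very small p is not covered here.

```python
import math


def default_sparsity_grid(p):
    """Evaluate range(int(p / log(log(p)) / log(p))) with natural logs.

    Hypothetical helper for illustration only. For p < 6 the expression
    is undefined, negative, or larger than p, so this sketch rejects it.
    """
    if p < 6:
        raise ValueError("this sketch assumes p >= 6")
    upper = int(p / math.log(math.log(p)) / math.log(p))
    return range(upper)


grid = default_sparsity_grid(100)
print(len(grid))  # number of candidate sparsity levels for p = 100
```

Under these assumptions the grid grows much more slowly than p, which keeps the number of candidate sparsity levels small for high-dimensional problems.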

params

The sparse optimal solution.

Type:

array of shape (dimensionality,)

objective_value

The value of the objective function at the solution.

Type:

float

support_set

The indices of selected variables, sorted in ascending order.

Type:

array of int

References

Liu J, Ye J, Fujimaki R. Forward-backward greedy algorithms for general convex smooth functions over a cardinality constraint[C]//International Conference on Machine Learning. PMLR, 2014: 503-511.

class skscope.solver.ForwardSolver(dimensionality, sparsity=None, sample_size=1, *, threshold=0.0, strict_sparsity=True, preselect=[], numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)

Get sparse optimal solution of convex objective function by Forward Selection algorithm. Specifically, ForwardSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.

Parameters:
  • dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).

  • sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).

  • sample_size (int, default=1) – Sample size, denoted as \(n\).

  • threshold (float, default=0.0) – The threshold used to determine whether a variable is selected.

  • strict_sparsity (bool, default=True) – Whether to strictly constrain the sparsity level to equal sparsity.

  • preselect (array of int, default=[]) – An array containing the indices of variables that must be selected.

  • numeric_solver (callable, optional) – A solver for the convex optimization problem. ForwardSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.

  • max_iter (int, default=100) – Maximum number of iterations allowed for convergence.

  • group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be a non-decreasing integer array that starts from 0 and has no gaps. Variables in the same group must be adjacent, and they are selected or excluded together. Invalid examples: [0,2,1,2] (not in non-decreasing order), [1,2,3,3] (does not start from 0), [0,2,2,3] (contains a gap). Note that when groups are used, “a variable” effectively means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.

  • ic_method (callable, optional) – A function that calculates the information criterion used to choose the sparsity level, with the interface ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.

  • cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.

  • split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.

  • cv_fold_id (array of shape (sample_size,), optional) – An array indicating the cross-validation fold of each sample; samples in the same fold should be given the same number. The number of distinct values should equal cv. Used only when cv > 1.

  • random_state (int, optional) – The random seed used for cross-validation.
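The rules for the group argument can be checked before constructing a solver. The sketch below is a hypothetical validator (check_group is not a skscope function) that encodes the three documented conditions and reproduces the documented invalid examples.

```python
def check_group(group):
    """Return None if `group` follows the documented rules, else a reason.

    Hypothetical helper for illustration; it is not part of skscope.
    """
    group = list(group)
    if not group:
        return "empty"
    # Rule 1: indices never decrease, so each group is contiguous.
    if any(cur < prev for prev, cur in zip(group, group[1:])):
        return "not incremental"
    # Rule 2: the first group index is 0.
    if group[0] != 0:
        return "does not start from 0"
    # Rule 3: consecutive indices differ by at most 1.
    if any(cur - prev > 1 for prev, cur in zip(group, group[1:])):
        return "contains a gap"
    return None


for candidate in ([0, 0, 1, 2], [0, 2, 1, 2], [1, 2, 3, 3], [0, 2, 2, 3]):
    print(candidate, check_group(candidate))
```

A valid array such as [0, 0, 1, 2] describes three groups, so with this grouping a sparsity of 2 selects two groups rather than two individual variables.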

params

The sparse optimal solution.

Type:

array of shape (dimensionality,)

objective_value

The value of the objective function at the solution.

Type:

float

support_set

The indices of selected variables, sorted in ascending order.

Type:

array of int

class skscope.solver.GraspSolver(dimensionality, sparsity=None, sample_size=1, *, preselect=[], numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)

Get sparse optimal solution of convex objective function by Gradient Support Pursuit (GraSP) algorithm. Specifically, GraspSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.

Parameters:
  • dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).

  • sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).

  • sample_size (int, default=1) – Sample size, denoted as \(n\).

  • preselect (array of int, default=[]) – An array containing the indices of variables that must be selected.

  • numeric_solver (callable, optional) – A solver for the convex optimization problem. GraspSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.

  • max_iter (int, default=100) – Maximum number of iterations allowed for convergence.

  • group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be a non-decreasing integer array that starts from 0 and has no gaps. Variables in the same group must be adjacent, and they are selected or excluded together. Invalid examples: [0,2,1,2] (not in non-decreasing order), [1,2,3,3] (does not start from 0), [0,2,2,3] (contains a gap). Note that when groups are used, “a variable” effectively means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.

  • ic_method (callable, optional) – A function that calculates the information criterion used to choose the sparsity level, with the interface ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.

  • cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.

  • split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.

  • cv_fold_id (array of shape (sample_size,), optional) – An array indicating the cross-validation fold of each sample; samples in the same fold should be given the same number. The number of distinct values should equal cv. Used only when cv > 1.

  • random_state (int, optional) – The random seed used for cross-validation.
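To make the ic_method interface concrete, the sketch below defines a callable with the documented signature ic(loss, p, s, n) -> ic_value and uses it to pick a sparsity level. The penalty 2 * loss + s * log(n) is an illustrative BIC-style assumption, not necessarily a criterion shipped with skscope, and the loss values are made up for the demonstration.

```python
import math


def bic_like(loss, p, s, n):
    """Illustrative criterion with the ic(loss, p, s, n) signature.

    Assumed form: 2 * loss + s * log(n). The dimensionality p is accepted
    to match the interface but is not used by this particular penalty.
    """
    return 2.0 * loss + s * math.log(n)


# Hypothetical objective values obtained at each candidate sparsity level.
losses = {1: 50.0, 2: 20.0, 3: 19.5, 4: 19.4}
p, n = 10, 100

scores = {s: bic_like(loss, p, s, n) for s, loss in losses.items()}
best_s = min(scores, key=scores.get)
print(best_s)
```

The loss barely improves beyond s = 2, so the penalty term dominates and the smaller model wins. Because the criterion depends on n, sample_size has to be supplied for the result to be meaningful.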

params

The sparse optimal solution.

Type:

array of shape (dimensionality,)

objective_value

The value of the objective function at the solution.

Type:

float

support_set

The indices of selected variables, sorted in ascending order.

Type:

array of int

References

Bahmani S, Raj B, Boufounos P T. Greedy sparsity-constrained optimization[J]. The Journal of Machine Learning Research, 2013, 14(1): 807-841.

class skscope.solver.HTPSolver(dimensionality, sparsity=None, sample_size=1, *, preselect=[], step_size=0.005, numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)

Get sparse optimal solution of convex objective function by Gradient Hard Thresholding Pursuit (GraHTP) algorithm. Specifically, HTPSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.

Parameters:
  • dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).

  • sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).

  • sample_size (int, default=1) – Sample size, denoted as \(n\).

  • preselect (array of int, default=[]) – An array containing the indices of variables that must be selected.

  • step_size (float, default=0.005) – Step size of gradient descent.

  • numeric_solver (callable, optional) – A solver for the convex optimization problem. HTPSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.

  • max_iter (int, default=100) – Maximum number of iterations allowed for convergence.

  • group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be a non-decreasing integer array that starts from 0 and has no gaps. Variables in the same group must be adjacent, and they are selected or excluded together. Invalid examples: [0,2,1,2] (not in non-decreasing order), [1,2,3,3] (does not start from 0), [0,2,2,3] (contains a gap). Note that when groups are used, “a variable” effectively means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.

  • ic_method (callable, optional) – A function that calculates the information criterion used to choose the sparsity level, with the interface ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.

  • cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.

  • split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.

  • cv_fold_id (array of shape (sample_size,), optional) – An array indicating the cross-validation fold of each sample; samples in the same fold should be given the same number. The number of distinct values should equal cv. Used only when cv > 1.

  • random_state (int, optional) – The random seed used for cross-validation.
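The cross-validation arguments fit together as sketched below. The example assumes that data is a tuple (X, y) of plain Python lists, which is only one possible layout, and the helper names make_fold_ids and split_rows are hypothetical. It shows a callable with the documented (data, index) -> part_data interface and a fold assignment with exactly cv distinct labels.

```python
def make_fold_ids(sample_size, cv):
    """Assign each sample a fold label in 0..cv-1, round-robin."""
    return [i % cv for i in range(sample_size)]


def split_rows(data, index):
    """split_method-style callable: keep only the samples listed in `index`.

    Assumes `data` is a tuple (X, y) whose first axis indexes samples.
    """
    X, y = data
    return [X[i] for i in index], [y[i] for i in index]


X = [[float(i), float(i) ** 2] for i in range(6)]
y = [2.0 * i for i in range(6)]
cv = 3
cv_fold_id = make_fold_ids(len(y), cv)

# Samples belonging to fold 0, extracted through the split callable.
fold_zero = [i for i, fold in enumerate(cv_fold_id) if fold == 0]
X_part, y_part = split_rows((X, y), fold_zero)
print(cv_fold_id, y_part)
```

With array-based data the same callable is usually a one-line indexing expression; the list version here only avoids extra dependencies.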

params

The sparse optimal solution.

Type:

array of shape (dimensionality,)

objective_value

The value of the objective function at the solution.

Type:

float

support_set

The indices of selected variables, sorted in ascending order.

Type:

array of int

References

Yuan X T, Li P, Zhang T. Gradient Hard Thresholding Pursuit[J]. J. Mach. Learn. Res., 2017, 18(1): 6027-6069.

class skscope.solver.IHTSolver(dimensionality, sparsity=None, sample_size=1, *, preselect=[], step_size=0.005, numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)

Get sparse optimal solution of convex objective function by Iterative Hard Thresholding (IHT) algorithm. Specifically, IHTSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.

Parameters:
  • dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).

  • sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).

  • sample_size (int, default=1) – Sample size, denoted as \(n\).

  • preselect (array of int, default=[]) – An array containing the indices of variables that must be selected.

  • step_size (float, default=0.005) – Step size of gradient descent.

  • numeric_solver (callable, optional) – A solver for the convex optimization problem. IHTSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.

  • max_iter (int, default=100) – Maximum number of iterations allowed for convergence.

  • group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be a non-decreasing integer array that starts from 0 and has no gaps. Variables in the same group must be adjacent, and they are selected or excluded together. Invalid examples: [0,2,1,2] (not in non-decreasing order), [1,2,3,3] (does not start from 0), [0,2,2,3] (contains a gap). Note that when groups are used, “a variable” effectively means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.

  • ic_method (callable, optional) – A function that calculates the information criterion used to choose the sparsity level, with the interface ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.

  • cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.

  • split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.

  • cv_fold_id (array of shape (sample_size,), optional) – An array indicating the cross-validation fold of each sample; samples in the same fold should be given the same number. The number of distinct values should equal cv. Used only when cv > 1.

  • random_state (int, optional) – The random seed used for cross-validation.
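For intuition about how step_size, sparsity and max_iter interact, the following is a minimal textbook-style sketch of iterative hard thresholding: take a gradient step, then keep only the s largest-magnitude entries. It is not the IHTSolver implementation; the function names, the toy objective and the step size of 0.5 are assumptions chosen so the example converges quickly.

```python
def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and set the rest to zero."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:s])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def iht_sketch(gradient, p, s, step_size, max_iter):
    """Generic IHT loop: gradient descent step followed by thresholding."""
    x = [0.0] * p
    for _ in range(max_iter):
        g = gradient(x)
        stepped = [xi - step_size * gi for xi, gi in zip(x, g)]
        x = hard_threshold(stepped, s)
    return x


# Toy objective f(x) = 0.5 * sum((x_i - t_i)^2), whose gradient is x - t.
target = [3.0, 0.1, -2.0, 0.05]


def toy_gradient(x):
    return [xi - ti for xi, ti in zip(x, target)]


solution = iht_sketch(toy_gradient, p=4, s=2, step_size=0.5, max_iter=50)
support = [i for i, v in enumerate(solution) if v != 0.0]
print(support, solution)
```

The two entries with the largest targets survive the thresholding step, and the remaining coordinates are held at exactly zero, which is what the constraint \(||x||_0 \leq s\) requires.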

params

The sparse optimal solution.

Type:

array of shape (dimensionality,)

objective_value

The value of the objective function at the solution.

Type:

float

support_set

The indices of selected variables, sorted in ascending order.

Type:

array of int

References

Yuan X T, Li P, Zhang T. Gradient Hard Thresholding Pursuit[J]. J. Mach. Learn. Res., 2017, 18(1): 6027-6069.

class skscope.solver.OMPSolver(dimensionality, sparsity=None, sample_size=1, *, threshold=0.0, strict_sparsity=True, preselect=[], numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)

Get sparse optimal solution of convex objective function by Orthogonal Matching Pursuit (OMP) algorithm. Specifically, OMPSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.

Parameters:
  • dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).

  • sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).

  • sample_size (int, default=1) – Sample size, denoted as \(n\).

  • threshold (float, default=0.0) – The threshold used to determine whether a variable is selected.

  • strict_sparsity (bool, default=True) – Whether to strictly constrain the sparsity level to equal sparsity.

  • preselect (array of int, default=[]) – An array containing the indices of variables that must be selected.

  • numeric_solver (callable, optional) – A solver for the convex optimization problem. OMPSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.

  • max_iter (int, default=100) – Maximum number of iterations allowed for convergence.

  • group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be a non-decreasing integer array that starts from 0 and has no gaps. Variables in the same group must be adjacent, and they are selected or excluded together. Invalid examples: [0,2,1,2] (not in non-decreasing order), [1,2,3,3] (does not start from 0), [0,2,2,3] (contains a gap). Note that when groups are used, “a variable” effectively means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.

  • ic_method (callable, optional) – A function that calculates the information criterion used to choose the sparsity level, with the interface ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.

  • cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.

  • split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.

  • cv_fold_id (array of shape (sample_size,), optional) – An array indicating the cross-validation fold of each sample; samples in the same fold should be given the same number. The number of distinct values should equal cv. Used only when cv > 1.

  • random_state (int, optional) – The random seed used for cross-validation.

params

The sparse optimal solution.

Type:

array of shape (dimensionality,)

objective_value

The value of the objective function at the solution.

Type:

float

support_set

The indices of selected variables, sorted in ascending order.

Type:

array of int

References

Shalev-Shwartz S, Srebro N, Zhang T. Trading accuracy for sparsity in optimization problems with sparsity constraints[J]. SIAM Journal on Optimization, 2010, 20(6): 2807-2832.

class skscope.solver.ScopeSolver(dimensionality, sparsity=None, sample_size=1, *, preselect=[], numeric_solver=convex_solver_nlopt, max_iter=20, ic_method=None, cv=1, split_method=None, cv_fold_id=None, group=None, important_search=128, screening_size=-1, max_exchange_num=5, is_dynamic_max_exchange_num=True, greedy=True, splicing_type='halve', path_type='seq', gs_lower_bound=None, gs_upper_bound=None, thread=1, random_state=None, console_log_level='off', file_log_level='off', log_file_name='logs/skscope.log')

Get sparse optimal solution of convex objective function by Sparse-Constrained Optimization via Splicing Iteration (SCOPE) algorithm, which can also be used for variable selection. Specifically, ScopeSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.

Parameters:
  • dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).

  • sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))). Used only when path_type is “seq”.

  • sample_size (int, default=1) – Sample size, denoted as \(n\).

  • preselect (array of int, default=[]) – An array containing the indices of variables that must be selected.

  • numeric_solver (callable, optional) – A solver for the convex optimization problem. ScopeSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.

  • max_iter (int, default=20) – Maximum number of iterations allowed for convergence.

  • ic_method (callable, optional) – A function that calculates the information criterion used to choose the sparsity level, with the interface ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.

  • cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.

  • split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.

  • cv_fold_id (array of shape (sample_size,), optional) – An array indicating the cross-validation fold of each sample; samples in the same fold should be given the same number. The number of distinct values should equal cv. Used only when cv > 1.

  • group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be a non-decreasing integer array that starts from 0 and has no gaps. Variables in the same group must be adjacent, and they are selected or excluded together. Invalid examples: [0,2,1,2] (not in non-decreasing order), [1,2,3,3] (does not start from 0), [0,2,2,3] (contains a gap). Note that when groups are used, “a variable” effectively means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.

  • important_search (int, default=128) – The number of important variables considered for splicing. This is used to reduce the computational cost; if it is too large, the runtime increases greatly.

  • screening_size (int, default=-1) – The number of variables remaining after screening, which is performed before variable selection to reduce the computational cost. screening_size should be a non-negative number smaller than p but larger than any value in sparsity. If screening_size is -1, screening is not used. If screening_size is 0, it is set to int(p/log(log(p))/log(p)).

  • max_exchange_num (int, optional, default=5) – Maximum exchange number when splicing.

  • is_dynamic_max_exchange_num (bool, default=True) – If is_dynamic_max_exchange_num is True, max_exchange_num is decreased dynamically according to the number of variables exchanged in the last iteration.

  • greedy (bool, default=True) – If greedy is True, the first exchange number that reduces the objective function value is selected. Otherwise, the exchange number that reduces the objective function value the most is selected.

  • splicing_type ({"halve", "taper"}, default="halve") – How the exchange number is reduced in each iteration, starting from max_exchange_num: “halve” decreases it by half, and “taper” decreases it by one.

  • path_type ({"seq", "gs"}, default="seq") – The method used to select the optimal sparsity level. For path_type = “seq”, the problem is solved for all sizes in sparsity successively. For path_type = “gs”, the problem is solved with the sparsity level ranging between gs_lower_bound and gs_upper_bound, where the specific sparsity levels considered are determined by golden-section search.

  • gs_lower_bound (int, default=0) – The lower bound of the golden-section search over the sparsity level. Used only when path_type = “gs”.

  • gs_upper_bound (int, optional) – The upper bound of the golden-section search over the sparsity level. Default is int(p/log(log(p))/log(p)). Used only when path_type = “gs”.

  • thread (int, default=1) – Maximum number of threads used for cross-validation. If thread = 0, the maximum number of threads supported by the device is used.

  • random_state (int, optional) – The random seed used for cross-validation.

  • console_log_level (str, default="off") – The level of log output to the console, which can be “off”, “error”, “warning” or “debug”. For example, if console_log_level is “warning”, only error and warning logs are output to the console.

  • file_log_level (str, default="off") – The level of log output to the file, which can be “off”, “error”, “warning” or “debug”. For example, if file_log_level is “off”, no log is output to the file.

  • log_file_name (str, default="logs/skscope.log") – The name (relative path) of log file, which is used to store the log information.
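The path_type = “gs” option searches for a sparsity level between gs_lower_bound and gs_upper_bound instead of trying every candidate. The sketch below shows one generic way to run a golden-section search over integers. It is an illustration under the assumption that the criterion is unimodal in the sparsity level, not the search routine used inside ScopeSolver, and the function name and toy criterion are hypothetical.

```python
import math


def golden_section_sparsity(criterion, lower, upper):
    """Minimize criterion(s) over integers lower..upper, assuming unimodality."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    cache = {}

    def evaluate(s):
        # Each sparsity level is evaluated at most once.
        if s not in cache:
            cache[s] = criterion(s)
        return cache[s]

    lo, hi = lower, upper
    while hi - lo > 2:
        step = round(inv_phi * (hi - lo))
        left, right = hi - step, lo + step
        if left == right:
            left -= 1  # keep the two probe points distinct
        if evaluate(left) <= evaluate(right):
            hi = right  # the minimizer cannot lie to the right of `right`
        else:
            lo = left  # the minimizer cannot lie to the left of `left`
    # At most three candidates remain; compare them directly.
    return min(range(lo, hi + 1), key=evaluate)


# Toy criterion with its minimum at s = 7 (stands in for an IC or CV loss).
best = golden_section_sparsity(lambda s: (s - 7) ** 2, 0, 20)
print(best)
```

Each iteration shrinks the interval by a roughly constant factor, so the number of sparsity levels that have to be solved grows logarithmically with the width of the interval rather than linearly as with path_type = “seq”.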

params

The sparse optimal solution.

Type:

array of shape (dimensionality,)

objective_value

The value of the objective function at the solution.

Type:

float

support_set

The indices of selected variables, sorted in ascending order.

Type:

array of int

cv_test_loss

If cv=1, the score under the chosen information criterion. If cv>1, the test objective under cross-validation.

Type:

float

cv_train_loss

The objective on the training data.

Type:

float

References

  • Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, and Xueqin Wang. A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117(52):33117-33123, 2020.

get_config(deep=True)

Get parameters for this solver.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this solver and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

get_estimated_params()

Get the optimal parameters of the objective function.

Returns:

parameters – The optimal solution of optimization.

Return type:

array of shape (dimensionality,)

get_result()

Get the result of optimization.

Returns:

results – The result of optimization, including the following keys:

  • params : array of shape (dimensionality,)

    The optimal parameters.

  • support_set : array of int

    The support set of the optimal parameters.

  • objective_value : float

    The value of the objective function at the optimal parameters.

  • information_criterion : float

    The value of the information criterion.

  • cross_validation_loss : float

    The mean loss of cross-validation.

Return type:

dict

get_support()

Get the support set of the optimal parameters.

Returns:

support_set – The indices of selected variables, sorted in ascending order.

Return type:

array of int

set_config(**params)

Set the parameters of this solver.

Parameters:

**params (dict) – Solver parameters.

Returns:

Solver instance.

Return type:

self

solve(objective, data=None, layers=[], init_support_set=None, init_params=None, gradient=None, jit=False)

Minimize the objective function under the sparsity constraint.

Parameters:
  • objective (callable) – The objective function to be minimized: objective(params, data) -> loss, where params is a 1-D array with shape (dimensionality,) and data is the fixed parameters needed to completely specify the function. objective must be written with the JAX library if gradient is not provided.

  • data (optional) – Extra arguments passed to the objective function and its derivatives (if any).

  • layers (list of Layer objects, default=[]) – A Layer is a “decorator” of the objective function: the parameters are processed by the Layer before entering the objective function. Different layers achieve different effects, and they can be chained sequentially to form a larger layer that implements more complex functionality. The Layer objects can be found in skscope.layers. If layers is not empty, objective must be written with the JAX library.

  • init_support_set (array of int, default=[]) – The indices of the variables in the initial active set.

  • init_params (array of shape (dimensionality,), optional) – The initial value of the parameters. Default is an all-zero vector.

  • gradient (callable, optional) – A function that returns the gradient vector of the parameters: gradient(params, data) -> array of shape (dimensionality,), where params is a 1-D array with shape (dimensionality,) and data is the fixed parameters needed to completely specify the function. If gradient is not provided, objective must be written with the JAX library.

  • jit (bool, default=False) – If objective or gradient is written in JAX, jit can be set to True to speed up the optimization.

Returns:

parameters – The optimal solution of optimization.

Return type:

array of shape (dimensionality,)
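When gradient is passed explicitly, the pair of callables should agree with each other. The sketch below writes a least-squares objective(params, data) -> loss and its gradient(params, data) in plain Python and checks them against a finite-difference approximation. The toy data, the tuple layout of data and the use of lists instead of arrays are assumptions made to keep the example dependency-free; the sketch does not call solve itself.

```python
def objective(params, data):
    """Least-squares loss 0.5 * sum_i (x_i . params - y_i)^2."""
    X, y = data
    loss = 0.0
    for row, target in zip(X, y):
        residual = sum(a * b for a, b in zip(row, params)) - target
        loss += 0.5 * residual ** 2
    return loss


def gradient(params, data):
    """Analytic gradient of `objective` with respect to params."""
    X, y = data
    grad = [0.0] * len(params)
    for row, target in zip(X, y):
        residual = sum(a * b for a, b in zip(row, params)) - target
        for j, a in enumerate(row):
            grad[j] += residual * a
    return grad


data = ([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]], [1.0, 2.0, 2.0])
params = [0.5, -0.5]

# Forward finite differences as an independent check of the gradient.
eps = 1e-6
numeric = []
for j in range(len(params)):
    shifted = list(params)
    shifted[j] += eps
    numeric.append((objective(shifted, data) - objective(params, data)) / eps)

analytic = gradient(params, data)
print(analytic, numeric)
```

A check like this is worth running once before handing a hand-written gradient to a solver, since a mismatch between objective and gradient typically shows up only as poor or unstable solutions.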