Sparsity-Constraint Optimization Solvers#
Currently supported solvers for sparsity-constrained optimization. All of these solvers inherit from BaseSolver.
Classes#
ScopeSolver | Get the sparse optimal solution of a convex objective function by the Sparse-Constrained Optimization via Splicing Iteration (SCOPE) algorithm, which can also be used for variable selection.
HTPSolver | Get the sparse optimal solution of a convex objective function by the Gradient Hard Thresholding Pursuit (GraHTP) algorithm.
IHTSolver | Get the sparse optimal solution of a convex objective function by the Iterative Hard Thresholding (IHT) algorithm.
GraspSolver | Get the sparse optimal solution of a convex objective function by the Gradient Support Pursuit (GraSP) algorithm.
FobaSolver | Get the sparse optimal solution of a convex objective function by the Forward-Backward greedy (FoBa) algorithm.
ForwardSolver | Get the sparse optimal solution of a convex objective function by the Forward Selection algorithm.
OMPSolver | Get the sparse optimal solution of a convex objective function by the Orthogonal Matching Pursuit (OMP) algorithm.
- class skscope.solver.FobaSolver(dimensionality, sparsity=None, sample_size=1, *, use_gradient=True, threshold=0.0, foba_threshold_ratio=0.5, strict_sparsity=True, preselect=[], numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)[source]#
Get the sparse optimal solution of a convex objective function by the Forward-Backward greedy (FoBa) algorithm. Specifically, FobaSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.
- Parameters:
dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).
sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).
sample_size (int, default=1) – Sample size, denoted as \(n\).
use_gradient (bool, default=True) – Whether to use gradient information to measure the importance of variables. Using gradient information accelerates the algorithm, but the solution may be less accurate.
threshold (float, default=0.0) – The threshold to determine whether a variable is selected or not.
foba_threshold_ratio (float, default=0.5) – The threshold for determining whether a variable is deleted is set to threshold*foba_threshold_ratio.
strict_sparsity (bool, default=True) – Whether to strictly control the sparsity level to be sparsity or not.
preselect (array of int, default=[]) – An array containing the indices of the variables that must be selected.
numeric_solver (callable, optional) – A solver for the convex optimization problem. FobaSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.
max_iter (int, default=100) – Maximum number of iterations allowed for convergence.
group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be an incremental (non-decreasing) integer array starting from 0 without gaps. Variables in the same group must be adjacent, and they are selected or dropped together. Invalid examples: [0,2,1,2] (not incremental), [1,2,3,3] (does not start from 0), [0,2,2,3] (there is a gap). Note that when groups are used, “a variable” actually means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.
ic_method (callable, optional) – A function to calculate the information criterion for choosing the sparsity level: ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.
cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.
split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.
cv_fold_id (array of shape (sample_size,), optional) – An array indicating the fold of each sample in cross-validation; samples in the same fold should be given the same number. The number of distinct elements should equal cv. Used only when cv > 1.
random_state (int, optional) – The random seed used for cross-validation.
- params#
The sparse optimal solution.
- Type:
array of shape(dimensionality,)
- objective_value#
The value of objective function on the solution.
- Type:
float
- support_set#
The indices of selected variables, sorted in ascending order.
- Type:
array of int
References
Liu J, Ye J, Fujimaki R. Forward-backward greedy algorithms for general convex smooth functions over a cardinality constraint[C]//International Conference on Machine Learning. PMLR, 2014: 503-511.
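The rules for the group parameter described above can be checked before constructing a solver. The helper below is a minimal sketch in plain Python; is_valid_group is a hypothetical name used for illustration and is not part of skscope.

```python
def is_valid_group(group):
    """Return True if `group` starts at 0, never decreases, and has no gaps."""
    if len(group) == 0 or group[0] != 0:
        return False
    # Each index must repeat the previous one (same group) or exceed it by one.
    return all(b - a in (0, 1) for a, b in zip(group, group[1:]))


print(is_valid_group([0, 0, 1, 2]))  # three adjacent groups
print(is_valid_group([0, 2, 1, 2]))  # not incremental
print(is_valid_group([1, 2, 3, 3]))  # does not start from 0
print(is_valid_group([0, 2, 2, 3]))  # there is a gap
```

Only the first array satisfies all three rules; the other three are the invalid examples listed in the parameter description.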
- class skscope.solver.ForwardSolver(dimensionality, sparsity=None, sample_size=1, *, threshold=0.0, strict_sparsity=True, preselect=[], numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)[source]#
Get the sparse optimal solution of a convex objective function by the Forward Selection algorithm. Specifically, ForwardSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.
- Parameters:
dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).
sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).
sample_size (int, default=1) – Sample size, denoted as \(n\).
threshold (float, default=0.0) – The threshold to determine whether a variable is selected or not.
strict_sparsity (bool, default=True) – Whether to strictly control the sparsity level to be sparsity or not.
preselect (array of int, default=[]) – An array containing the indices of the variables that must be selected.
numeric_solver (callable, optional) – A solver for the convex optimization problem. ForwardSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.
max_iter (int, default=100) – Maximum number of iterations allowed for convergence.
group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be an incremental (non-decreasing) integer array starting from 0 without gaps. Variables in the same group must be adjacent, and they are selected or dropped together. Invalid examples: [0,2,1,2] (not incremental), [1,2,3,3] (does not start from 0), [0,2,2,3] (there is a gap). Note that when groups are used, “a variable” actually means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.
ic_method (callable, optional) – A function to calculate the information criterion for choosing the sparsity level: ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.
cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.
split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.
cv_fold_id (array of shape (sample_size,), optional) – An array indicating the fold of each sample in cross-validation; samples in the same fold should be given the same number. The number of distinct elements should equal cv. Used only when cv > 1.
random_state (int, optional) – The random seed used for cross-validation.
- params#
The sparse optimal solution.
- Type:
array of shape(dimensionality,)
- objective_value#
The value of objective function on the solution.
- Type:
float
- support_set#
The indices of selected variables, sorted in ascending order.
- Type:
array of int
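The ic_method interface ic(loss, p, s, n) -> ic_value described above can be illustrated with a small stand-alone function. The sketch below assumes a BIC-style penalty and assumes that a lower criterion value is preferred; bic_like and the candidate losses are made up for illustration and are not taken from skscope.

```python
import math


def bic_like(loss, p, s, n):
    """Hypothetical criterion: twice the loss plus a log(n) penalty per selected variable.

    `p` is accepted to match the documented signature but is unused here.
    """
    return 2.0 * loss + s * math.log(n)


# Made-up objective values for three candidate sparsity levels.
losses = {1: 40.0, 2: 12.0, 3: 11.5}
n, p = 100, 20

scores = {s: bic_like(loss, p, s, n) for s, loss in losses.items()}
best_sparsity = min(scores, key=scores.get)
print(best_sparsity)
```

The third variable lowers the loss only slightly, so the penalty term outweighs the gain and the criterion favors the smaller model.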
- class skscope.solver.GraspSolver(dimensionality, sparsity=None, sample_size=1, *, preselect=[], numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)[source]#
Get the sparse optimal solution of a convex objective function by the Gradient Support Pursuit (GraSP) algorithm. Specifically, GraspSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.
- Parameters:
dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).
sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).
sample_size (int, default=1) – Sample size, denoted as \(n\).
preselect (array of int, default=[]) – An array containing the indices of the variables that must be selected.
numeric_solver (callable, optional) – A solver for the convex optimization problem. GraspSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.
max_iter (int, default=100) – Maximum number of iterations allowed for convergence.
group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be an incremental (non-decreasing) integer array starting from 0 without gaps. Variables in the same group must be adjacent, and they are selected or dropped together. Invalid examples: [0,2,1,2] (not incremental), [1,2,3,3] (does not start from 0), [0,2,2,3] (there is a gap). Note that when groups are used, “a variable” actually means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.
ic_method (callable, optional) – A function to calculate the information criterion for choosing the sparsity level: ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.
cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.
split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.
cv_fold_id (array of shape (sample_size,), optional) – An array indicating the fold of each sample in cross-validation; samples in the same fold should be given the same number. The number of distinct elements should equal cv. Used only when cv > 1.
random_state (int, optional) – The random seed used for cross-validation.
- params#
The sparse optimal solution.
- Type:
array of shape(dimensionality,)
- objective_value#
The value of objective function on the solution.
- Type:
float
- support_set#
The indices of selected variables, sorted in ascending order.
- Type:
array of int
References
Bahmani S, Raj B, Boufounos P T. Greedy sparsity-constrained optimization[J]. The Journal of Machine Learning Research, 2013, 14(1): 807-841.
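The split_method interface (data, index) -> part_data described above can be sketched as follows. The example assumes that data is a pair (X, y) of row-aligned Python lists; split_rows is a hypothetical name, and real data would more likely be stored in arrays.

```python
def split_rows(data, index):
    """Hypothetical split_method: keep the rows of X and y listed in `index`."""
    X, y = data
    return [X[i] for i in index], [y[i] for i in index]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
y = [1.0, 2.0, 3.0, 4.0]

# Select the samples with indices 0 and 2, as a cross-validation fold would.
part_X, part_y = split_rows((X, y), [0, 2])
print(part_X, part_y)
```

Because the function only needs to subset whatever object is passed as data, the same pattern applies to any container whose samples can be indexed.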
- class skscope.solver.HTPSolver(dimensionality, sparsity=None, sample_size=1, *, preselect=[], step_size=0.005, numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)[source]#
Get the sparse optimal solution of a convex objective function by the Gradient Hard Thresholding Pursuit (GraHTP) algorithm. Specifically, HTPSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.
- Parameters:
dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).
sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).
sample_size (int, default=1) – Sample size, denoted as \(n\).
preselect (array of int, default=[]) – An array containing the indices of the variables that must be selected.
step_size (float, default=0.005) – Step size of gradient descent.
numeric_solver (callable, optional) – A solver for the convex optimization problem. HTPSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.
max_iter (int, default=100) – Maximum number of iterations allowed for convergence.
group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be an incremental (non-decreasing) integer array starting from 0 without gaps. Variables in the same group must be adjacent, and they are selected or dropped together. Invalid examples: [0,2,1,2] (not incremental), [1,2,3,3] (does not start from 0), [0,2,2,3] (there is a gap). Note that when groups are used, “a variable” actually means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.
ic_method (callable, optional) – A function to calculate the information criterion for choosing the sparsity level: ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.
cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.
split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.
cv_fold_id (array of shape (sample_size,), optional) – An array indicating the fold of each sample in cross-validation; samples in the same fold should be given the same number. The number of distinct elements should equal cv. Used only when cv > 1.
random_state (int, optional) – The random seed used for cross-validation.
- params#
The sparse optimal solution.
- Type:
array of shape(dimensionality,)
- objective_value#
The value of objective function on the solution.
- Type:
float
- support_set#
The indices of selected variables, sorted in ascending order.
- Type:
array of int
References
Yuan X T, Li P, Zhang T. Gradient Hard Thresholding Pursuit[J]. J. Mach. Learn. Res., 2017, 18(1): 6027-6069.
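The cv_fold_id parameter described above expects one fold label per sample, with exactly cv distinct labels. The sketch below shows one way such an array could be built; make_cv_fold_id is a hypothetical helper, and the choice of labels 0 to cv-1 is an assumption rather than a documented requirement.

```python
import random


def make_cv_fold_id(sample_size, cv, seed=0):
    """Assign each sample a fold label in 0..cv-1 with near-equal fold sizes."""
    fold_id = [i % cv for i in range(sample_size)]
    # Shuffle with a fixed seed so the assignment is reproducible.
    random.Random(seed).shuffle(fold_id)
    return fold_id


fold_id = make_cv_fold_id(sample_size=10, cv=3)
print(fold_id)
```

Shuffling changes which samples land in which fold but not the fold sizes, so the array always contains exactly cv distinct labels when sample_size is at least cv.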
- class skscope.solver.IHTSolver(dimensionality, sparsity=None, sample_size=1, *, preselect=[], step_size=0.005, numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)[source]#
Get the sparse optimal solution of a convex objective function by the Iterative Hard Thresholding (IHT) algorithm. Specifically, IHTSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.
- Parameters:
dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).
sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).
sample_size (int, default=1) – Sample size, denoted as \(n\).
preselect (array of int, default=[]) – An array containing the indices of the variables that must be selected.
step_size (float, default=0.005) – Step size of gradient descent.
numeric_solver (callable, optional) – A solver for the convex optimization problem. IHTSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.
max_iter (int, default=100) – Maximum number of iterations allowed for convergence.
group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be an incremental (non-decreasing) integer array starting from 0 without gaps. Variables in the same group must be adjacent, and they are selected or dropped together. Invalid examples: [0,2,1,2] (not incremental), [1,2,3,3] (does not start from 0), [0,2,2,3] (there is a gap). Note that when groups are used, “a variable” actually means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.
ic_method (callable, optional) – A function to calculate the information criterion for choosing the sparsity level: ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.
cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.
split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.
cv_fold_id (array of shape (sample_size,), optional) – An array indicating the fold of each sample in cross-validation; samples in the same fold should be given the same number. The number of distinct elements should equal cv. Used only when cv > 1.
random_state (int, optional) – The random seed used for cross-validation.
- params#
The sparse optimal solution.
- Type:
array of shape(dimensionality,)
- objective_value#
The value of objective function on the solution.
- Type:
float
- support_set#
The indices of selected variables, sorted in ascending order.
- Type:
array of int
References
Yuan X T, Li P, Zhang T. Gradient Hard Thresholding Pursuit[J]. J. Mach. Learn. Res., 2017, 18(1): 6027-6069.
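To make the roles of step_size, sparsity and max_iter concrete, the following is a minimal textbook sketch of iterative hard thresholding in plain Python: a gradient step followed by keeping the s largest-magnitude entries. It is not the implementation used by IHTSolver; the function names and the toy objective \(f(x) = \frac{1}{2}||x - b||_2^2\) are assumptions chosen for illustration.

```python
def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and set the rest to zero."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:s])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def iht(grad, p, s, step_size=0.005, max_iter=100):
    """Repeat x <- H_s(x - step_size * grad(x)) starting from the zero vector."""
    x = [0.0] * p
    for _ in range(max_iter):
        g = grad(x)
        x = hard_threshold([xi - step_size * gi for xi, gi in zip(x, g)], s)
    return x


# Toy problem: f(x) = 0.5 * ||x - b||^2, whose gradient is x - b.
b = [3.0, -0.1, 2.0, 0.05]
grad = lambda x: [xi - bi for xi, bi in zip(x, b)]

x_hat = iht(grad, p=4, s=2, step_size=0.5, max_iter=100)
support_set = [i for i, v in enumerate(x_hat) if v != 0.0]
print(support_set, x_hat)
```

With s=2 the iteration keeps the two coordinates where b is largest in magnitude and drives them toward the corresponding entries of b. A larger step_size than the documented default is used here only because the toy problem is well conditioned.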
- class skscope.solver.OMPSolver(dimensionality, sparsity=None, sample_size=1, *, threshold=0.0, strict_sparsity=True, preselect=[], numeric_solver=convex_solver_nlopt, max_iter=100, group=None, ic_method=None, cv=1, cv_fold_id=None, split_method=None, random_state=None)[source]#
Get the sparse optimal solution of a convex objective function by the Orthogonal Matching Pursuit (OMP) algorithm. Specifically, OMPSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.
- Parameters:
dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).
sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))).
sample_size (int, default=1) – Sample size, denoted as \(n\).
threshold (float, default=0.0) – The threshold to determine whether a variable is selected or not.
strict_sparsity (bool, default=True) – Whether to strictly control the sparsity level to be sparsity or not.
preselect (array of int, default=[]) – An array containing the indices of the variables that must be selected.
numeric_solver (callable, optional) – A solver for the convex optimization problem. OMPSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.
max_iter (int, default=100) – Maximum number of iterations allowed for convergence.
group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be an incremental (non-decreasing) integer array starting from 0 without gaps. Variables in the same group must be adjacent, and they are selected or dropped together. Invalid examples: [0,2,1,2] (not incremental), [1,2,3,3] (does not start from 0), [0,2,2,3] (there is a gap). Note that when groups are used, “a variable” actually means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.
ic_method (callable, optional) – A function to calculate the information criterion for choosing the sparsity level: ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.
cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.
split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.
cv_fold_id (array of shape (sample_size,), optional) – An array indicating the fold of each sample in cross-validation; samples in the same fold should be given the same number. The number of distinct elements should equal cv. Used only when cv > 1.
random_state (int, optional) – The random seed used for cross-validation.
- params#
The sparse optimal solution.
- Type:
array of shape(dimensionality,)
- objective_value#
The value of objective function on the solution.
- Type:
float
- support_set#
The indices of selected variables, sorted in ascending order.
- Type:
array of int
References
Shalev-Shwartz S, Srebro N, Zhang T. Trading accuracy for sparsity in optimization problems with sparsity constraints[J]. SIAM Journal on Optimization, 2010, 20(6): 2807-2832.
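The default value of sparsity quoted above, range(int(p/log(log(p))/log(p))), can be evaluated directly to see how many candidate sparsity levels it produces. The sketch below evaluates the documented expression literally with natural logarithms; default_sparsity is a hypothetical helper name, and the expression assumes that p is large enough for log(log(p)) to be positive.

```python
import math


def default_sparsity(p):
    """Evaluate the documented default expression for the sparsity candidates."""
    return range(int(p / math.log(math.log(p)) / math.log(p)))


candidates = default_sparsity(100)
print(len(candidates), list(candidates)[:3])
```

For p = 100 the upper bound evaluates to 14, so the candidate set is far smaller than the number of variables.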
- class skscope.solver.ScopeSolver(dimensionality, sparsity=None, sample_size=1, *, preselect=[], numeric_solver=convex_solver_nlopt, max_iter=20, ic_method=None, cv=1, split_method=None, cv_fold_id=None, group=None, important_search=128, screening_size=-1, max_exchange_num=5, is_dynamic_max_exchange_num=True, greedy=True, splicing_type='halve', path_type='seq', gs_lower_bound=None, gs_upper_bound=None, thread=1, random_state=None, console_log_level='off', file_log_level='off', log_file_name='logs/skscope.log')[source]#
Get the sparse optimal solution of a convex objective function by the Sparse-Constrained Optimization via Splicing Iteration (SCOPE) algorithm, which can also be used for variable selection. Specifically, ScopeSolver aims to tackle this problem: \(\min_{x \in R^p} f(x) \text{ s.t. } ||x||_0 \leq s\), where \(f(x)\) is a convex objective function and \(s\) is the sparsity level. Each element of \(x\) can be seen as a variable, and the nonzero elements of \(x\) are the selected variables.
- Parameters:
dimensionality (int) – Dimension of the optimization problem, which is also the total number of candidate variables, denoted as \(p\).
sparsity (int or array of int, optional) – The sparsity level, which is the number of nonzero elements of the optimal solution, denoted as \(s\). Default is range(int(p/log(log(p))/log(p))). Used only when path_type is “seq”.
sample_size (int, default=1) – Sample size, denoted as \(n\).
preselect (array of int, default=[]) – An array containing the indices of the variables that must be selected.
numeric_solver (callable, optional) – A solver for the convex optimization problem. ScopeSolver calls this function to solve the convex optimization problem in each iteration. It should have the same interface as skscope.convex_solver_nlopt.
max_iter (int, default=20) – Maximum number of iterations allowed for convergence.
ic_method (callable, optional) – A function to calculate the information criterion for choosing the sparsity level: ic(loss, p, s, n) -> ic_value, where loss is the value of the objective function, p is the dimensionality, s is the sparsity level and n is the sample size. Used only when sparsity is an array and cv is 1. Note that sample_size must be given when using ic_method.
cv (int, default=1) – The number of folds used for cross-validation. If cv = 1, the sparsity level is chosen by the information criterion. If cv > 1, the sparsity level is chosen by cross-validation.
split_method (callable, optional) – A function that returns the part of the data used in each fold of cross-validation. Its interface should be (data, index) -> part_data, where index is an array of int.
cv_fold_id (array of shape (sample_size,), optional) – An array indicating the fold of each sample in cross-validation; samples in the same fold should be given the same number. The number of distinct elements should equal cv. Used only when cv > 1.
group (array of shape (dimensionality,), default=range(dimensionality)) – The group index of each variable. It must be an incremental (non-decreasing) integer array starting from 0 without gaps. Variables in the same group must be adjacent, and they are selected or dropped together. Invalid examples: [0,2,1,2] (not incremental), [1,2,3,3] (does not start from 0), [0,2,2,3] (there is a gap). Note that when groups are used, “a variable” actually means “a group of variables”. For example, sparsity=[3] means that 3 groups of variables are selected rather than 3 variables, and preselect=[0,3] means that the 0th and 3rd groups must be selected.
important_search (int, default=128) – The number of important variables that take part in splicing. This is used to reduce the computational cost; a value that is too large greatly increases the runtime.
screening_size (int, default=-1) – The number of variables remaining after screening, which is performed before variable selection to reduce the computational cost. screening_size should be a non-negative number smaller than p but larger than any value in sparsity. If screening_size is -1, screening is not used. If screening_size is 0, it is set to int(p/log(log(p))/log(p)).
max_exchange_num (int, optional, default=5) – Maximum exchange number when splicing.
is_dynamic_max_exchange_num (bool, default=True) – If is_dynamic_max_exchange_num is True, max_exchange_num is decreased dynamically according to the number of variables exchanged in the last iteration.
greedy (bool, default=True) – If greedy is True, the first exchange number that reduces the objective function value is selected. Otherwise, the exchange number that reduces the objective function value the most is selected.
splicing_type ({"halve", "taper"}, default="halve") – How the exchange number is reduced from max_exchange_num in each iteration: “halve” decreases it by half, “taper” decreases it by one.
path_type ({"seq", "gs"}, default="seq") – The method used to select the optimal sparsity level. For path_type = “seq”, the problem is solved for all sizes in sparsity successively. For path_type = “gs”, the problem is solved with the sparsity level ranging between gs_lower_bound and gs_upper_bound, where the specific sparsity levels considered are determined by golden-section search.
gs_lower_bound (int, default=0) – The lower bound of the golden-section search over sparsity levels. Used only when path_type = “gs”.
gs_upper_bound (int, optional) – The upper bound of the golden-section search over sparsity levels. Default is int(p/log(log(p))/log(p)). Used only when path_type = “gs”.
thread (int, default=1) – Maximum number of threads used for cross-validation. If thread = 0, the maximum number of threads supported by the device is used.
random_state (int, optional) – The random seed used for cross-validation.
console_log_level (str, default="off") – The level of log output to the console, which can be “off”, “error”, “warning”, or “debug”. For example, if console_log_level is “warning”, only error and warning logs are output to the console.
file_log_level (str, default="off") – The level of log output to file, which can be “off”, “error”, “warning”, or “debug”. For example, if file_log_level is “off”, no log is output to file.
log_file_name (str, default="logs/skscope.log") – The name (relative path) of the log file, which is used to store the log information.
- params#
The sparse optimal solution.
- Type:
array of shape(dimensionality,)
- objective_value#
The value of objective function on the solution.
- Type:
float
- support_set#
The indices of selected variables, sorted in ascending order.
- Type:
array of int
- cv_test_loss#
If cv=1, it stores the score under chosen information criterion. If cv>1, it stores the test objective under cross-validation.
- Type:
float
- cv_train_loss#
The objective on training data.
- Type:
float
References
Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, and Xueqin Wang. A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117(52):33117-33123, 2020.
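The three cases documented for screening_size (-1, 0, and an explicit positive value) can be summarized in a short function. This is only a sketch of the documented rules, not code from ScopeSolver; resolve_screening_size is a hypothetical name, and returning None to mean "no screening" is an assumption made for the example.

```python
import math


def resolve_screening_size(p, sparsity, screening_size=-1):
    """Return the effective screening size, or None when screening is disabled."""
    if screening_size == -1:
        return None  # screening is not used
    if screening_size == 0:
        # Documented fallback value.
        return int(p / math.log(math.log(p)) / math.log(p))
    # Otherwise it must be smaller than p but larger than any value in sparsity.
    if not (max(sparsity) < screening_size < p):
        raise ValueError("screening_size must lie strictly between max(sparsity) and p")
    return screening_size


print(resolve_screening_size(100, [1, 2, 3], -1))
print(resolve_screening_size(100, [1, 2, 3], 0))
print(resolve_screening_size(100, [1, 2, 3], 50))
```

A value such as 2 would be rejected in this sketch because it is not larger than every candidate in sparsity.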
- get_config(deep=True)[source]#
Get parameters for this solver.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this solver and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- get_estimated_params()[source]#
Get the optimal parameters of the objective function.
- Returns:
parameters – The optimal solution of optimization.
- Return type:
array of shape (dimensionality,)
- get_result()[source]#
Get the result of optimization.
- Returns:
results – The result of optimization, including the following keys:
params (array of shape (dimensionality,)) – The optimal parameters.
support_set (array of int) – The support set of the optimal parameters.
objective_value (float) – The value of the objective function at the optimal parameters.
information_criterion (float) – The value of the information criterion.
cross_validation_loss (float) – The mean loss of cross-validation.
- Return type:
dict
- get_support()[source]#
Get the support set of the optimal parameters.
- Returns:
support_set – The indices of selected variables, sorted in ascending order.
- Return type:
array of int
- set_config(**params)[source]#
Set the parameters of this solver.
- Parameters:
**params (dict) – Solver parameters.
- Returns:
Solver instance.
- Return type:
self
- solve(objective, data=None, layers=[], init_support_set=None, init_params=None, gradient=None, jit=False)[source]#
Minimize the objective function under the sparsity constraint.
- Parameters:
objective (callable) – The objective function to be minimized: objective(params, data) -> loss, where params is a 1-D array with shape (dimensionality,) and data is the fixed parameters needed to completely specify the function. objective must be written with the JAX library if gradient is not provided.
data (optional) – Extra arguments passed to the objective function and its derivatives (if any).
layers (list of Layer objects, default=[]) – A Layer is a “decorator” of the objective function. The parameters are processed by the Layer before entering the objective function. Different layers achieve different effects, and they can be concatenated sequentially to form a larger layer that implements more complex functionality. The Layer objects can be found in skscope.layers. If layers is not empty, objective must be written with the JAX library.
init_support_set (array of int, default=[]) – The indices of the variables in the initial active set.
init_params (array of shape (dimensionality,), optional) – The initial value of the parameters. Default is an all-zero vector.
gradient (callable, optional) – A function that returns the gradient vector of the parameters: gradient(params, data) -> array of shape (dimensionality,), where params is a 1-D array with shape (dimensionality,) and data is the fixed parameters needed to completely specify the function. If gradient is not provided, objective must be written with the JAX library.
jit (bool, default=False) – If objective or gradient is written in JAX, jit can be set to True to speed up the optimization.
- Returns:
parameters – The optimal solution of optimization.
- Return type:
array of shape (dimensionality,)
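The objective(params, data) -> loss and gradient(params, data) -> array signatures described for solve can be illustrated with a least-squares pair. The sketch below uses plain Python lists in place of arrays and does not call skscope; the data values are made up, and a finite-difference check is included only to show that the two functions agree.

```python
def residuals(params, data):
    """Compute X @ params - y row by row."""
    X, y = data
    return [sum(xij * pj for xij, pj in zip(row, params)) - yi for row, yi in zip(X, y)]


def objective(params, data):
    """Least-squares loss: 0.5 * ||X @ params - y||^2."""
    return 0.5 * sum(r * r for r in residuals(params, data))


def gradient(params, data):
    """Gradient of the loss: X^T (X @ params - y)."""
    X, _ = data
    r = residuals(params, data)
    return [sum(ri * row[j] for ri, row in zip(r, X)) for j in range(len(params))]


data = ([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [1.0, 2.0, 3.0])
params = [0.5, -0.25]
grad = gradient(params, data)

# Central finite differences as an independent check of the analytic gradient.
eps = 1e-6
numeric = []
for j in range(len(params)):
    up = [v + eps if i == j else v for i, v in enumerate(params)]
    down = [v - eps if i == j else v for i, v in enumerate(params)]
    numeric.append((objective(up, data) - objective(down, data)) / (2 * eps))
print(grad, numeric)
```

Supplying an explicit gradient in this form is the case in which, according to the parameter description, objective does not have to be written with JAX.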