scikit-learn-compatible Machine Learning Methods

Classes

skmodel

PortfolioSelection

Construct a sparse portfolio using skscope with the MinVar or MeanVar objective.

NonlinearSelection

Select relevant features which may have nonlinear dependence on the target.

RobustRegression

A robust regression procedure via sparsity-constrained exponential loss minimization.

MultivariateFailure

Multivariate failure time model.

class skscope.skmodel.MultivariateFailure(sparsity=5)

Multivariate failure time model.

Parameters:

sparsity (int, default=5) – The number of features to be selected, i.e., the sparsity level.

fit(X, y, delta, sample_weight=None)

Minimize the negative partial log-likelihood under a sparsity constraint on the provided data.

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Data matrix.

  • y (array-like, shape = (n_samples, n_events)) – Observed times of multiple events.

  • delta (array-like, shape = (n_samples, n_events)) – Censoring indicator matrix.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted Estimator.

Return type:

object
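The objective minimized by fit can be illustrated with a small NumPy sketch. It assumes a marginal Cox model in which every event type k has its own baseline hazard but shares the coefficient vector β (consistent with the predict formula below), that delta[i, k] == 1 marks an observed (uncensored) event, and that the risk set at time t_i is every sample with t >= t_i (no tie correction). The function name `neg_partial_loglik` is hypothetical; this is not skscope's implementation.

```python
import numpy as np

def neg_partial_loglik(beta, X, y, delta):
    """Negative Cox partial log-likelihood, summed over event types.

    Assumes delta[i, k] == 1 means event k was observed for sample i
    and delta[i, k] == 0 means it was censored.
    """
    eta = X @ beta                          # linear predictor beta^T X_i
    loglik = 0.0
    for k in range(y.shape[1]):             # one marginal model per event type
        t = y[:, k]
        for i in np.flatnonzero(delta[:, k] == 1):
            at_risk = t >= t[i]             # samples still under observation at t_i
            loglik += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return -loglik

# Three samples, one event type, all events observed.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([[1.0], [2.0], [3.0]])
delta = np.ones((3, 1), dtype=int)
value = neg_partial_loglik(np.zeros(1), X, y, delta)
```

With β = 0 every observed event contributes minus the log of its risk-set size, so the value above is log 3 + log 2 + log 1 = log 6; censored samples contribute no term of their own but still appear in earlier risk sets.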

predict(X)

Given the features, predict the hazard function up to a multiplicative factor that does not depend on the sample.

Parameters:

X (array-like, shape = (n_samples, n_features)) – Feature matrix.

Returns:

hazard – The quantity \(e^{\beta^{\top}X_i}\), which is proportional to the hazard function up to a factor independent of the sample index \(i\), because \(\lambda_k(t;X_{i})=\lambda_{0k}(t)e^{\beta^{\top}X_i}\).

Return type:

array, shape = (n_samples,)
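A short NumPy sketch of why a prediction defined only up to a sample-independent factor is still useful: under the proportional-hazards form above, the ratio of two samples' hazards cancels the baseline \(\lambda_{0k}(t)\). The helper `relative_hazard` and the coefficient values are hypothetical.

```python
import numpy as np

def relative_hazard(X, beta):
    """Return exp(beta^T X_i) for every row X_i of X."""
    return np.exp(X @ beta)

beta = np.array([0.5, -1.0])               # hypothetical fitted coefficients
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 1.0]])
h = relative_hazard(X, beta)

# Multiply by two arbitrary baseline hazard values lambda_0k(t):
# the hazard ratio between samples 0 and 1 is the same in both cases.
ratios = []
for baseline in (0.1, 3.0):
    full = baseline * h
    ratios.append(full[0] / full[1])
```

Both entries of `ratios` equal \(e^{\beta^{\top}(X_0 - X_1)} = e^{1.5}\), so ranking samples by the returned quantity gives the same order at every time t.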

score(X, y, delta, sample_weight=None)

Given test data, return the test score of this fitted model.

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Data matrix.

  • y (array-like, shape = (n_samples, n_events)) – Observed times of multiple events.

  • delta (array-like, shape = (n_samples, n_events)) – Censoring indicator matrix.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

score – The log-likelihood of the given data.

Return type:

float

class skscope.skmodel.NonlinearSelection(sparsity=1, gamma_x=0.7, gamma_y=0.7)

Select relevant features which may have nonlinear dependence on the target.

Parameters:
  • sparsity (int, default=1) – The number of features to be selected, i.e., the sparsity level.

  • gamma_x (float, default=0.7) – The width parameter of the Gaussian kernel for X.

  • gamma_y (float, default=0.7) – The width parameter of the Gaussian kernel for y.
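To show what a kernel "width" controls, the NumPy sketch below builds Gaussian Gram matrices for one sample at several widths. It assumes the parameterization \(k(a,b)=\exp\{-(a-b)^2/(2\gamma^2)\}\); skscope may scale the exponent differently, and `gaussian_gram` is a hypothetical helper.

```python
import numpy as np

def gaussian_gram(v, width):
    """Gram matrix K[i, j] = exp(-(v_i - v_j)^2 / (2 * width^2)) of a 1-D sample v."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(50)

mean_similarity = {}
for width in (0.2, 0.7, 2.0):
    K = gaussian_gram(x, width)
    off_diag = K[~np.eye(len(x), dtype=bool)]   # drop the trivial K[i, i] = 1 entries
    mean_similarity[width] = off_diag.mean()
```

A wider kernel treats distant points as similar (off-diagonal entries closer to 1), so it captures smoother, coarser dependence; a narrow kernel reacts to fine local structure but is noisier.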

fit(X, y, sample_weight=None)

Compute the coefficient vector coef_. Features with larger coefficients are considered to have a stronger dependence on the target.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted Estimator.

Return type:

object
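One formulation that matches this description is HSIC Lasso: regress the centered Gaussian kernel of y on the centered Gaussian kernels of each feature with nonnegative coefficients, so a large coefficient means strong (possibly nonlinear) dependence. The sketch below assumes that formulation, the kernel \(\exp\{-d^2/(2\gamma^2)\}\), and standardized inputs, and it replaces skscope's solver with plain projected-gradient iterative hard thresholding. The names `centered_gram` and `hsic_lasso_iht` and the iteration count are hypothetical.

```python
import numpy as np

def centered_gram(v, width):
    """Centered Gaussian Gram matrix of a 1-D sample, scaled to unit Frobenius norm."""
    n = len(v)
    K = np.exp(-((v[:, None] - v[None, :]) ** 2) / (2.0 * width ** 2))
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    Kc = H @ K @ H
    return Kc / np.linalg.norm(Kc)             # Frobenius norm of a 2-D array

def hsic_lasso_iht(X, y, sparsity, gamma_x=0.7, gamma_y=0.7, n_iter=300):
    """Sparse, nonnegative HSIC-Lasso-style coefficients via hard thresholding."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize features
    y = (y - y.mean()) / y.std()
    p = X.shape[1]
    # Column j: vectorized kernel of feature j; b: vectorized kernel of y.
    A = np.column_stack([centered_gram(X[:, j], gamma_x).ravel() for j in range(p)])
    b = centered_gram(y, gamma_y).ravel()
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    coef = np.zeros(p)
    for _ in range(n_iter):
        coef = coef - step * (A.T @ (A @ coef - b))   # gradient step on 0.5 * ||A coef - b||^2
        coef = np.maximum(coef, 0.0)                  # enforce nonnegativity
        coef[np.argsort(coef)[:-sparsity]] = 0.0      # keep the `sparsity` largest entries
    loss = 0.5 * np.sum((A @ coef - b) ** 2)
    return coef, -loss                                # negative loss, like the documented score

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X[:, 0] ** 2 + 0.1 * rng.standard_normal(100)     # nonlinear, roughly uncorrelated with X[:, 0]
coef, score = hsic_lasso_iht(X, y, sparsity=1)
```

In this example a linear correlation screen would miss feature 0, while the kernel-based coefficient singles it out.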

score(X, y, sample_weight=None)

Given test data, return the test score of this fitted model.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Feature matrix.

  • y (array-like, shape (n_samples,)) – Target values.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

score – The negative loss on the given data.

Return type:

float

class skscope.skmodel.PortfolioSelection(sparsity=1, obj='MinVar', alpha=0, cov_matrix='lw', random_state=None)

Construct a sparse portfolio using skscope with the MinVar or MeanVar objective.

Parameters:
  • sparsity (int, default=1) – The number of stocks to be chosen, i.e., the sparsity level.

  • obj ({"MinVar", "MeanVar"}, default="MinVar") – The objective of the portfolio optimization.

  • alpha (float, default=0) – The penalty coefficient of the return.

  • cov_matrix ({"empirical", "lw"}, default="lw") – The estimator of the covariance matrix: "empirical" uses the empirical (sample) covariance estimator, and "lw" uses the Ledoit-Wolf shrinkage estimator.

  • random_state ({None, int, array_like[ints], SeedSequence, BitGenerator, Generator}, default=None) – The seed used to initialize the parameter init_params of ScopeSolver.
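The choice of cov_matrix matters most when there are fewer periods than assets, where the empirical covariance is singular. The NumPy sketch below implements Ledoit-Wolf shrinkage toward a scaled identity, following the same formula as scikit-learn's `sklearn.covariance.ledoit_wolf`; whether skscope calls that function or computes the estimate another way is an assumption, and `ledoit_wolf_cov` is a hypothetical helper.

```python
import numpy as np

def ledoit_wolf_cov(returns):
    """Ledoit-Wolf shrinkage of the empirical covariance toward mu * I."""
    n, p = returns.shape
    Xc = returns - returns.mean(axis=0)          # center each asset's returns
    S = Xc.T @ Xc / n                            # empirical covariance (1/n scaling)
    mu = np.trace(S) / p                         # scale of the identity target
    d2 = np.sum((S - mu * np.eye(p)) ** 2) / p   # distance of S from the target
    row_norms_sq = np.sum(Xc ** 2, axis=1)       # ||x_k||^2 for each period k
    b2_bar = (np.sum(row_norms_sq ** 2) - n * np.sum(S ** 2)) / (n ** 2 * p)
    b2 = min(b2_bar, d2)
    shrinkage = 0.0 if d2 == 0 else b2 / d2      # shrinkage intensity in [0, 1]
    return (1.0 - shrinkage) * S + shrinkage * mu * np.eye(p), shrinkage

rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_normal((20, 50))   # 20 periods, 50 assets
S_emp = np.cov(returns, rowvar=False, bias=True)
S_lw, shrinkage = ledoit_wolf_cov(returns)
```

Here the empirical estimate has rank at most 19, while the shrunk estimate is positive definite and can be inverted by a variance-minimizing solver.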

fit(X, y=None, sample_weight=None)

Compute the weights of the sparse portfolio that optimizes the chosen objective.

Parameters:
  • X (array-like of shape (n_periods, n_assets)) – Return data of n_assets assets spanning n_periods periods.

  • y (ignored) – Not used, present here for API consistency by convention.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted Estimator.

Return type:

object
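For a handful of assets the sparse problem can be solved by brute force, which makes the objective concrete. The sketch below assumes the objective \(w^{\top}\Sigma w - \alpha\,\mu^{\top}w\) subject to \(\sum_j w_j = 1\) and at most `sparsity` nonzero weights (alpha = 0 gives MinVar), and it solves each candidate support in closed form via a Lagrange multiplier. Exhaustive search is used only for illustration; it is not skscope's algorithm, and `sparse_portfolio` is hypothetical.

```python
import itertools
import numpy as np

def sparse_portfolio(returns, sparsity, alpha=0.0):
    """Best fully invested portfolio over all supports of size `sparsity` (exhaustive search)."""
    n_assets = returns.shape[1]
    sigma = np.cov(returns, rowvar=False)   # empirical covariance of asset returns
    mu = returns.mean(axis=0)               # mean return of each asset
    best_obj, best_w = np.inf, None
    for support in itertools.combinations(range(n_assets), sparsity):
        idx = list(support)
        sub = sigma[np.ix_(idx, idx)]
        inv = np.linalg.inv(sub)
        ones = np.ones(len(idx))
        a, b = ones @ inv @ ones, ones @ inv @ mu[idx]
        lam = (2.0 - alpha * b) / a          # multiplier that enforces sum(w) = 1
        w_s = (alpha * inv @ mu[idx] + lam * inv @ ones) / 2.0
        obj = w_s @ sub @ w_s - alpha * mu[idx] @ w_s
        if obj < best_obj:
            best_obj = obj
            best_w = np.zeros(n_assets)
            best_w[idx] = w_s
    return best_w

rng = np.random.default_rng(1)
scales = np.array([0.01, 0.02, 0.015, 0.03, 0.012, 0.025])
R = rng.standard_normal((250, 6)) * scales   # simulated returns, 250 periods x 6 assets
w = sparse_portfolio(R, sparsity=2)
```

The number of supports grows combinatorially with n_assets, which is why a dedicated sparsity-constrained solver is needed at realistic sizes.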

score(X, y=None, sample_weight=None, measure='Sharpe')

Given return data, compute the Sharpe ratio of the portfolio constructed with the weights self.coef_.

Parameters:
  • X (array-like of shape (n_periods, n_assets)) – Return data of n_assets assets spanning n_periods periods.

  • y (ignored) – Not used, present here for API consistency by convention.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

  • measure ({"Sharpe"}, default="Sharpe") – The measure of the performance of a portfolio.

Returns:

score – The Sharpe ratio of the constructed portfolio.

Return type:

float
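A minimal sketch of the Sharpe ratio of a weighted portfolio. It assumes a zero risk-free rate, no annualization, and the population (ddof=0) standard deviation; skscope's exact convention may differ, and `sharpe_ratio` is a hypothetical helper.

```python
import numpy as np

def sharpe_ratio(returns, weights):
    """Mean over standard deviation of the portfolio's per-period returns."""
    portfolio = returns @ weights           # portfolio return in each period
    return portfolio.mean() / portfolio.std()

returns = np.array([[0.01, 0.02],
                    [0.03, 0.00],
                    [0.02, 0.01]])
value = sharpe_ratio(returns, np.array([1.0, 0.0]))
```

With all weight on the first asset the portfolio returns are 0.01, 0.03, 0.02, so the mean is 0.02, the variance is 0.0002/3, and the ratio is \(\sqrt{6}\approx 2.449\).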

class skscope.skmodel.RobustRegression(sparsity=1, gamma=1)

A robust regression procedure via sparsity-constrained exponential loss minimization. Specifically, RobustRegression solves the following problem: \(\min_{\beta}-\sum_{i=1}^n\exp\{-(y_i-x_i^{\top}\beta)^2/\gamma\} \text{ s.t. } \|\beta\|_0 \leq s\), where \(\gamma\) is a hyperparameter controlling the degree of robustness and \(s\) is a hyperparameter controlling the sparsity level of \(\beta\).
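The objective above can be evaluated directly. In this NumPy sketch (the helper name `exponential_loss` is hypothetical) each sample contributes a value between -1 and 0, so a single outlier can raise the objective by at most 1, whereas its squared error is unbounded.

```python
import numpy as np

def exponential_loss(beta, X, y, gamma):
    """Objective -sum_i exp(-(y_i - x_i^T beta)^2 / gamma)."""
    r = y - X @ beta
    return -np.sum(np.exp(-r ** 2 / gamma))

X = np.eye(3)
beta = np.array([1.0, 2.0, 3.0])
y_clean = np.array([1.0, 2.0, 3.0])          # perfect fit
y_outlier = np.array([1.0, 2.0, 1003.0])     # one gross outlier

loss_clean = exponential_loss(beta, X, y_clean, gamma=1.0)
loss_outlier = exponential_loss(beta, X, y_outlier, gamma=1.0)
squared_error = np.sum((y_outlier - X @ beta) ** 2)   # 1e6 for comparison
```

The perfect fit gives -3 and the outlier only moves the objective to about -2, while the squared error jumps to 10^6.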

Note: When \(\gamma\) is large, the exponential loss is approximately equal, up to an additive constant, to \(|y_i-x_i^{\top}\beta|^2/\gamma\), so the estimator behaves like the least squares estimator. When \(\gamma\) is small, a sample \(i\) with a large error \(|y_i-x_i^{\top}\beta|\) has little impact on the estimate of \(\beta\), which limits the influence of outliers (i.e., it improves robustness at the cost of efficiency). Therefore, \(\gamma\) needs to be selected carefully, using prior knowledge of the data or a data-driven method (e.g., cross-validation), to achieve an appropriate trade-off between the robustness and efficiency of the resulting estimator.
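Both regimes in the note follow from two short calculations with the residual \(r_i = y_i - x_i^{\top}\beta\): a Taylor expansion of one sample's loss for large \(\gamma\), and the gradient, in which sample \(i\) enters with the weight \(\exp\{-r_i^2/\gamma\}\). This is a standard derivation included for illustration.

```latex
% Large gamma: a constant plus the scaled squared residual, with higher-order corrections
-\exp\{-r_i^2/\gamma\} = -1 + \frac{r_i^2}{\gamma} - \frac{r_i^4}{2\gamma^2} + O\!\left(\frac{r_i^6}{\gamma^3}\right)

% Gradient of the objective: each sample is weighted by \exp\{-r_i^2/\gamma\}
\nabla_{\beta}\Big(-\sum_{i=1}^n \exp\{-r_i^2/\gamma\}\Big)
  = -\frac{2}{\gamma}\sum_{i=1}^n \exp\{-r_i^2/\gamma\}\, r_i\, x_i
```

Since \(|r|\exp\{-r^2/\gamma\}\) peaks at \(|r| = \sqrt{\gamma/2}\) and decays to zero beyond it, samples whose residuals are much larger than \(\sqrt{\gamma}\) barely move the estimate.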

Parameters:
  • sparsity (int, default=1) – The number of features to be selected, i.e., the sparsity level.

  • gamma (float, default=1) – The parameter controlling the degree of robustness for the estimator.

fit(X, y=None, sample_weight=None)

Compute the coefficient vector coef_.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted Estimator.

Return type:

object
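One way to approach this problem is iteratively reweighted least squares (IRLS): because \(-\exp\{-u/\gamma\}\) is concave in \(u = r^2\), its tangent at the current residuals is an upper bound, and minimizing that bound is a weighted least-squares problem with weights \(\exp\{-r_i^2/\gamma\}\). The sketch below pairs IRLS with a simple hard-thresholding support selection. It is a heuristic written for illustration, not the algorithm skscope uses; `robust_sparse_fit`, the ordinary least squares starting point, and `n_iter` are assumptions.

```python
import numpy as np

def robust_sparse_fit(X, y, sparsity, gamma=1.0, n_iter=50):
    """Heuristic sparse fit of the exponential loss via IRLS plus hard thresholding."""
    n_features = X.shape[1]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start from ordinary least squares
    for _ in range(n_iter):
        r = y - X @ beta
        w = np.exp(-r ** 2 / gamma)                  # outliers receive weights near zero
        sw = np.sqrt(w)
        # Weighted least squares on all features, then keep the largest coefficients.
        full = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        support = np.argsort(np.abs(full))[-sparsity:]
        beta = np.zeros(n_features)
        beta[support] = np.linalg.lstsq(X[:, support] * sw[:, None], y * sw, rcond=None)[0]
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
beta_true = np.array([5.0, -4.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.standard_normal(200)
y[:10] += 30.0                                       # contaminate 5% of the responses
beta_hat = robust_sparse_fit(X, y, sparsity=2)
```

Once the outliers' weights collapse to zero, the remaining weighted least-squares fits see essentially clean data, so the estimate settles near beta_true on the correct support.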

score(X, y, sample_weight=None)

Given test data, return the test score of this fitted model.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – Feature matrix.

  • y (array-like, shape (n_samples,)) – Target values.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

score – The weighted exponential loss of the given data.

Return type:

float