scikit-learn-compatible Machine Learning Methods

Classes

skmodel

  • PortfolioSelection – Construct a sparse portfolio using skscope with MinVar or MeanVar measure.

  • NonlinearSelection – Select relevant features which may have nonlinear dependence on the target.

  • RobustRegression – A robust regression procedure via sparsity constrained exponential loss minimization.

  • MultivariateFailure – Multivariate failure time model.

  • IsotonicRegression – Isotonic regression.

class skscope.skmodel.IsotonicRegression(sparsity=5)

Isotonic regression.

Parameters:

sparsity (int, default=5) – The number of features to be selected, i.e., the sparsity level.

fit(X, y, sample_weight=None)

Fit the model using X, y as training data.

Parameters:
  • X (array-like of shape (n_samples,) or (n_samples, 1)) – Training data.

  • y (array-like of shape (n_samples,)) – Training target.

  • sample_weight (array-like of shape (n_samples,), default=None) – Weights. If set to None, all weights will be set to 1 (equal weights).

Returns:

self – Returns an instance of self.

Return type:

object

predict(X)

Predict new data by linear interpolation.

Parameters:

X (array-like of shape (n_samples,) or (n_samples, 1)) – Data to transform.

Returns:

y_pred – Transformed data.

Return type:

ndarray of shape (n_samples,)

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative because a model can be arbitrarily worse than a constant predictor. A constant model that always predicts the expected value of y, disregarding the input features, gets an \(R^2\) score of 0.0.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – \(R^2\) of self.predict(X) w.r.t. y.

Return type:

float
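As a quick check of the \(R^2\) definition used by score, the following sketch computes it by hand in plain Python. The helper name r2_manual is illustrative and not part of skscope, and sample weights are left out for simplicity.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - u / v (unweighted)."""
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares around the mean of y_true
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r2_manual(y_true, y_true)                 # exact predictions
constant = r2_manual(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
worse = r2_manual(y_true, [4.0, 3.0, 2.0, 1.0])     # reversed predictions
```

The three cases illustrate the statements above: exact predictions score 1.0, the constant mean predictor scores 0.0, and a model that is worse than the constant predictor scores below zero.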

transform(X)

Transform new data by linear interpolation.

Parameters:

X (array-like of shape (n_samples,) or (n_samples, 1)) – Data to transform.

Returns:

y_pred – The transformed data.

Return type:

ndarray of shape (n_samples,)
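Both predict and transform are described as linear interpolation. The sketch below shows what piecewise-linear interpolation between fitted points looks like in plain Python. The knot values, the helper name interpolate, and the clamping of inputs outside the fitted range are assumptions made for illustration; they are not taken from the skscope implementation.

```python
from bisect import bisect_right


def interpolate(knots_x, knots_y, x):
    """Piecewise-linear interpolation through (knots_x, knots_y).

    Inputs outside the knot range are clamped to the boundary values
    (an assumption of this sketch).
    """
    if x <= knots_x[0]:
        return knots_y[0]
    if x >= knots_x[-1]:
        return knots_y[-1]
    j = bisect_right(knots_x, x)  # knots_x[j - 1] <= x < knots_x[j]
    x0, x1 = knots_x[j - 1], knots_x[j]
    y0, y1 = knots_y[j - 1], knots_y[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical fitted knots: x sorted ascending, y non-decreasing (isotonic).
knots_x = [0.0, 1.0, 3.0]
knots_y = [0.0, 2.0, 2.0]

preds = [interpolate(knots_x, knots_y, x) for x in (-1.0, 0.5, 2.0, 5.0)]
```

Because the knot values are non-decreasing, the interpolated predictions are non-decreasing in x as well.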

class skscope.skmodel.MultivariateFailure(sparsity=5)

Multivariate failure time model.

Parameters:

sparsity (int, default=5) – The number of features to be selected, i.e., the sparsity level.

fit(X, y, delta, sample_weight=None)

Minimize negative partial log-likelihood with sparsity constraint for provided data.

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Data matrix.

  • y (array-like, shape = (n_samples, n_events)) – Observed time of multiple events.

  • delta (array-like, shape = (n_samples, n_events)) – Indicator matrix of censoring.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted Estimator.

Return type:

object

predict(X)

Given the features, predict the hazard function up to some constant independent of the sample.

Parameters:

X (array-like, shape(n_samples, n_features)) – Feature matrix.

Returns:

hazard – The quantity \(e^{\beta^{\top}X_i}\), which is proportional to the hazard function up to a constant independent of the sample index \(i\), such that \(\lambda_k(t;X_{i})=\lambda_{0k}(t)e^{\beta^{\top}X_i}\).

Return type:

array, shape = (n_samples,)
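The quantity returned by predict can be reproduced by hand from a coefficient vector. The sketch below evaluates \(e^{\beta^{\top}X_i}\) for each row in plain Python; the coefficient values, the data, and the helper name relative_hazard are hypothetical and only illustrate the formula.

```python
import math


def relative_hazard(X, beta):
    """Return exp(beta^T x_i) for every row x_i of X."""
    return [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]


# Hypothetical sparse coefficient vector (only two nonzero entries).
beta = [0.5, 0.0, -1.0]
X = [
    [0.0, 1.0, 0.0],  # beta^T x = 0.0
    [2.0, 5.0, 0.0],  # beta^T x = 1.0
    [0.0, 0.0, 1.0],  # beta^T x = -1.0
]

hazard = relative_hazard(X, beta)

# The baseline hazard cancels in ratios, so hazard[1] / hazard[0] is the
# hazard ratio between sample 1 and sample 0 for any event type k.
ratio = hazard[1] / hazard[0]
```

Since the baseline hazard \(\lambda_{0k}(t)\) is shared across samples, only ratios of these values are interpretable, which is why the output is defined up to a constant.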

score(X, y, delta, sample_weight=None)

Given test data, return the test score of the fitted model.

Parameters:
  • X (array-like, shape = (n_samples, n_features)) – Data matrix.

  • y (array-like, shape = (n_samples, n_events)) – Observed time of multiple events.

  • delta (array-like, shape = (n_samples, n_events)) – Indicator matrix of censoring.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

score – The log likelihood of the given data.

Return type:

float

class skscope.skmodel.NonlinearSelection(sparsity=1, gamma_x=0.7, gamma_y=0.7)

Select relevant features which may have nonlinear dependence on the target.

Parameters:
  • sparsity (int, default=1) – The number of features to be selected, i.e., the sparsity level.

  • gamma_x (float, default=0.7) – The width parameter of Gaussian kernel for X.

  • gamma_y (float, default=0.7) – The width parameter of Gaussian kernel for y.

fit(X, y, sample_weight=None)

Compute the coefficient vector coef_. Features with larger coefficients are considered to have stronger dependence on the target.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted Estimator.

Return type:

object
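Once coef_ is available, the selected features can be read off and ordered by coefficient size. The sketch below does this in plain Python on a hypothetical coefficient vector; ranking by absolute value is an assumption of this sketch, chosen so that it also handles negative entries.

```python
# Hypothetical coef_ for 6 features with sparsity=3 (three nonzero entries).
coef = [0.0, 0.8, 0.0, -1.3, 0.0, 0.2]

# Indices of the selected (nonzero) features.
selected = [j for j, c in enumerate(coef) if c != 0.0]

# Selected features ordered from strongest to weakest dependence,
# measured here by the magnitude of the coefficient.
ranked = sorted(selected, key=lambda j: abs(coef[j]), reverse=True)
```

Here the number of nonzero entries equals the sparsity level, and feature 3 ranks first because its coefficient has the largest magnitude.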

score(X, y, sample_weight=None)

Given test data, return the test score of the fitted model.

Parameters:
  • X (array-like, shape(n_samples, n_features)) – Feature matrix.

  • y (array-like, shape(n_samples,)) – Target values.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

score – The negative loss on the given data.

Return type:

float

class skscope.skmodel.PortfolioSelection(sparsity=1, obj='MinVar', alpha=0, cov_matrix='lw', random_state=None)

Construct a sparse portfolio using skscope with MinVar or MeanVar measure.

Parameters:
  • sparsity (int, default=1) – The number of stocks to be chosen, i.e., the sparsity level.

  • obj ({"MinVar", "MeanVar"}, default="MinVar") – The objective of the portfolio optimization.

  • alpha (float, default=0) – The penalty coefficient of the return.

  • cov_matrix ({"empirical", "lw"}, default="lw") – The estimator of the covariance matrix. If "empirical", the empirical estimator is used. If "lw", the LedoitWolf estimator is used.

  • random_state ({None, int, array_like[ints], SeedSequence, BitGenerator, Generator}, default=None) – The seed used to initialize the parameter init_params in ScopeSolver.
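To make the roles of obj and alpha concrete, the sketch below evaluates a minimum-variance objective \(w^{\top}\Sigma w\) and a mean-variance objective \(w^{\top}\Sigma w - \alpha\,\mu^{\top}w\) for a fixed weight vector. This is one conventional reading of "MinVar", "MeanVar", and "penalty coefficient of the return"; the exact objective used inside skscope is not stated here and may differ. All names and numbers are hypothetical.

```python
def portfolio_variance(weights, cov):
    """Quadratic form w^T Sigma w."""
    n = len(weights)
    return sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )


def mean_variance_objective(weights, cov, mean_returns, alpha):
    """Variance minus alpha times expected return (assumed form)."""
    expected = sum(w * m for w, m in zip(weights, mean_returns))
    return portfolio_variance(weights, cov) - alpha * expected


# Hypothetical two-asset example.
cov = [[0.04, 0.01], [0.01, 0.09]]
mean_returns = [0.05, 0.10]
weights = [0.5, 0.5]

min_var = portfolio_variance(weights, cov)
mean_var = mean_variance_objective(weights, cov, mean_returns, alpha=0.5)
no_penalty = mean_variance_objective(weights, cov, mean_returns, alpha=0.0)
```

Under this reading, alpha=0 makes the mean-variance objective coincide with the minimum-variance one, and larger values of alpha favor weights with higher expected return.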

fit(X, y=None, sample_weight=None)

Compute the weights of the desired sparse portfolio for the chosen objective.

Parameters:
  • X (array-like of shape (n_periods, n_assets)) – Return data of n_assets assets spanning n_periods periods.

  • y (ignored) – Not used, present here for API consistency by convention.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted Estimator.

Return type:

object

score(X, y=None, sample_weight=None, measure='Sharpe')

Given return data, return the Sharpe ratio of the portfolio constructed with the weights self.coef_.

Parameters:
  • X (array-like of shape (n_periods, n_assets)) – Return data of n_assets assets spanning n_periods periods.

  • y (ignored) – Not used, present here for API consistency by convention.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

  • measure ({"Sharpe"}, default="Sharpe") – The measure of the performance of a portfolio.

Returns:

score – The Sharpe ratio of the constructed portfolio.

Return type:

float
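The Sharpe ratio reported by score can be approximated by hand from a weight vector and a return matrix. The sketch below assumes a zero risk-free rate, no annualization, and the sample standard deviation; these conventions, the data, and the helper name sharpe_ratio are assumptions of the sketch and may not match the skscope implementation exactly.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, weights):
    """Mean over standard deviation of the per-period portfolio return.

    returns: list of periods, each a list of per-asset returns.
    weights: portfolio weights, one per asset.
    """
    portfolio = [
        sum(w * r for w, r in zip(weights, period)) for period in returns
    ]
    return mean(portfolio) / stdev(portfolio)


# Hypothetical return data: 3 periods, 3 assets.
returns = [
    [0.01, 0.03, 0.00],
    [0.02, -0.01, 0.00],
    [0.03, 0.02, 0.00],
]
# Sparse weights: only the first two assets are held.
weights = [0.5, 0.5, 0.0]

sharpe = sharpe_ratio(returns, weights)
# Rescaling all weights by a positive constant leaves the ratio unchanged.
sharpe_rescaled = sharpe_ratio(returns, [1.0, 1.0, 0.0])
```

The ratio is invariant to positive rescaling of the weights, so it measures the quality of the allocation rather than its size.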

class skscope.skmodel.RobustRegression(sparsity=1, gamma=1)

A robust regression procedure via sparsity constrained exponential loss minimization. Specifically, RobustRegression solves the following problem: \(\min_{\beta}-\sum_{i=1}^n\exp\{-(y_i-x_i^{\top}\beta)^2/\gamma\} \text{ s.t. } \|\beta\|_0 \leq s\) where \(\gamma\) is a hyperparameter controlling the degree of robustness and \(s\) is a hyperparameter controlling the sparsity level of \(\beta\).

Note: When \(\gamma\) is large, the exponential loss is approximately equal, up to an additive constant, to \(|y_i-x_i^{\top}\beta|^2/\gamma\), so the estimator behaves like the least squares estimator. When \(\gamma\) is small, a sample \(i\) with a large error \(|y_i-x_i^{\top}\beta|\) has little impact on the estimate of \(\beta\), which limits the influence of outliers (i.e., improves robustness but reduces efficiency). Therefore, \(\gamma\) needs to be selected carefully, either with prior knowledge of the data or via a data-driven method (e.g., cross-validation), to achieve an appropriate trade-off between the robustness and the efficiency of the resulting estimator.
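The effect of \(\gamma\) described in the note can be checked numerically. The sketch below evaluates the exponential loss on a few hypothetical residuals in plain Python and compares how much an outlier contributes for a small and a large \(\gamma\). The helper names and the residual values are illustrative only.

```python
import math


def exponential_loss(residuals, gamma):
    """Objective value: -sum_i exp(-r_i^2 / gamma)."""
    return -sum(math.exp(-(r ** 2) / gamma) for r in residuals)


def influence(residual, gamma):
    """Per-sample term exp(-r^2 / gamma); near 0 means the sample is nearly ignored."""
    return math.exp(-(residual ** 2) / gamma)


# Two well-fitted samples and one outlier with residual 10.
total = exponential_loss([0.1, -0.2, 10.0], gamma=1.0)

outlier = 10.0
small_gamma = influence(outlier, gamma=1.0)     # outlier contributes almost nothing
large_gamma = influence(outlier, gamma=1000.0)  # outlier still contributes

# For large gamma, 1 - exp(-r^2 / gamma) is close to r^2 / gamma,
# i.e. the loss is a shifted, rescaled squared error.
r, gamma = 1.0, 1000.0
shifted_loss = 1.0 - math.exp(-(r ** 2) / gamma)
squared_error = r ** 2 / gamma
```

With \(\gamma = 1\) the outlier's term is effectively zero, so the loss is determined by the two well-fitted samples; with \(\gamma = 1000\) the same outlier keeps most of its weight, as in least squares.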

Parameters:
  • sparsity (int, default=1) – The number of features to be selected, i.e., the sparsity level.

  • gamma (float, default=1) – The parameter controlling the degree of robustness for the estimator.

fit(X, y=None, sample_weight=None)

Compute the coefficient vector coef_.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Feature matrix.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

self – Fitted Estimator.

Return type:

object

score(X, y, sample_weight=None)

Given test data, return the test score of the fitted model.

Parameters:
  • X (array-like, shape(n_samples, n_features)) – Feature matrix.

  • y (array-like, shape(n_samples,)) – Target values.

  • sample_weight (ignored) – Not used, present here for API consistency by convention.

Returns:

score – The weighted exponential loss of the given data.

Return type:

float