Logistic Regression

class snapml.LogisticRegression(max_iter=1000, regularizer=1.0, device_ids=[], verbose=False, use_gpu=False, class_weight=None, dual=True, n_jobs=1, penalty='l2', tol=0.001, generate_training_history=None, privacy=False, eta=0.3, batch_size=100, privacy_epsilon=10, grad_clip=1, fit_intercept=False, intercept_scaling=1.0, normalize=False, kernel='linear', gamma=1.0, n_components=100, random_state=None)

Logistic Regression classifier

This class implements regularized logistic regression using the IBM Snap ML solver. It supports both the local and the distributed (MPI) backends of the Snap ML solver. It can be used for both binary and multi-class classification problems; for multi-class classification it predicts classes only (no probabilities). It handles both dense and sparse matrix inputs. Use csr, csc, ndarray, DeviceNDArray or SnapML data partition format for training, and csr, ndarray or SnapML data partition format for prediction. The DeviceNDArray input format is currently not supported for training with the MPI implementation. We recommend normalizing the input values first.
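
The following is a minimal usage sketch (not taken from the official documentation): it trains a binary classifier on randomly generated dense data and then computes class predictions and probabilities. The synthetic arrays are purely illustrative.

    import numpy as np
    from snapml import LogisticRegression

    # Synthetic dense training data (illustrative only).
    X_train = np.random.rand(1000, 20).astype(np.float32)
    y_train = (np.random.rand(1000) > 0.5).astype(np.float32)

    clf = LogisticRegression(max_iter=200, fit_intercept=True, normalize=True)
    clf.fit(X_train, y_train)

    pred = clf.predict(X_train)         # class labels, shape (n_samples,)
    proba = clf.predict_proba(X_train)  # probabilities (binary problems only)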

Parameters:
max_iter : int, default=1000

Maximum number of iterations used by the solver to converge.

regularizer : float, default=1.0

Regularization strength. It must be a positive float. Larger regularization values imply stronger regularization.

use_gpu : bool, default=False

Flag for indicating the hardware platform used for training. If True, the training is performed using the GPU. If False, the training is performed using the CPU.

device_ids : array-like of int, default=[]

If use_gpu is True, it indicates the IDs of the GPUs used for training. For single-GPU training, set device_ids to the GPU ID to be used for training, e.g., [0]. For multi-GPU training, set device_ids to a list of GPU IDs to be used for training, e.g., [0, 1].
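
As a sketch (assuming a machine with at least two GPUs visible to Snap ML), a multi-GPU configuration might look as follows; adjust device_ids to match your system:

    from snapml import LogisticRegression

    # Train on GPUs 0 and 1; n_jobs is a multiple of 32 as recommended below.
    clf = LogisticRegression(use_gpu=True, device_ids=[0, 1], n_jobs=256)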

class_weight : {‘balanced’, None}, default=None

If set to ‘balanced’, sample weights are applied inversely proportional to the class frequencies in the training data. If set to None, all classes have weight 1.

dual : bool, default=True

Dual or primal formulation. Recommendation: if n_samples > n_features use dual=True.

verbose : bool, default=False

If True, the training cost is printed once per iteration. Warning: this will increase the training time. For performance evaluation, use verbose=False.

n_jobs : int, default=1

The number of threads used for running the training. The value of this parameter should be a multiple of 32 if the training is performed on GPU (use_gpu=True).

penalty : {‘l1’, ‘l2’}, default=‘l2’

The regularization / penalty type. Possible values are “l2” for L2 regularization (LogisticRegression) or “l1” for L1 regularization (SparseLogisticRegression). L1 regularization is possible only for the primal optimization problem (dual=False).
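
For example, a sketch of an L1-regularized (sparse) model; since L1 is only available for the primal problem, dual is set to False:

    from snapml import LogisticRegression

    # L1 penalty requires the primal formulation (dual=False).
    clf = LogisticRegression(penalty="l1", dual=False, regularizer=2.0)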

tol : float, default=0.001

The tolerance parameter. Training will finish when the maximum change in the model coefficients is less than tol.

generate_training_history : {‘summary’, ‘full’, None}, default=None

Determines the level of summary statistics that are generated during training. By default no information is generated (None), but this parameter can be set to “summary”, to obtain summary statistics at the end of training, or “full” to obtain a complete set of statistics for the entire training procedure. Note, enabling either option will result in slower training. generate_training_history is not supported for DeviceNDArray input format.
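
A brief sketch of collecting summary statistics and reading them back through the training_history_ attribute (synthetic data, illustrative only):

    import numpy as np
    from snapml import LogisticRegression

    X = np.random.rand(500, 10).astype(np.float32)
    y = (np.random.rand(500) > 0.5).astype(np.float32)

    clf = LogisticRegression(generate_training_history="summary")
    clf.fit(X, y)
    print(clf.training_history_)  # dict with the collected statistics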

privacy : bool, default=False

Train the model using a differentially private algorithm. Currently not supported for MPI implementation.

eta : float, default=0.3

Learning rate for the differentially private training algorithm. Currently not supported for MPI implementation.

batch_size : int, default=100

Mini-batch size for the differentially private training algorithm. Currently not supported for MPI implementation.

privacy_epsilon : float, default=10.0

Target privacy guarantee. The learned model will be (privacy_epsilon, 0.01)-private. Currently not supported for the MPI implementation.

grad_clip : float, default=1.0

Gradient clipping parameter for the differentially private training algorithm. Currently not supported for MPI implementation.
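
A configuration sketch for differentially private training (local implementation only); the hyperparameter values below are illustrative, not tuned:

    from snapml import LogisticRegression

    clf = LogisticRegression(
        privacy=True,
        eta=0.3,               # learning rate of the private training algorithm
        batch_size=100,        # mini-batch size
        privacy_epsilon=10.0,  # learned model will be (10.0, 0.01)-private
        grad_clip=1.0,         # gradient clipping parameter
    )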

fit_intercept : bool, default=False

Add a bias term to the model. Note: this may affect the speed of convergence, especially for sparse datasets.

intercept_scaling : float, default=1.0

Scaling of the bias term. The inclusion of a bias term is implemented by appending an additional feature to the dataset. This feature has a constant value that can be set using this parameter.

normalize : bool, default=False

Normalize the rows of the dataset (recommended for fast convergence).

kernel : {‘rbf’, ‘linear’}, default=‘linear’

Approximate feature map of a specified kernel function.

gamma : float, default=1.0

Parameter of the RBF kernel: k(x, x') = exp(-gamma * ||x - x'||^2).

n_components : int, default=100

Dimensionality of the feature space when approximating a kernel function.
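
A sketch of training on an approximate RBF feature map; the gamma and n_components values are illustrative:

    from snapml import LogisticRegression

    # Approximate RBF kernel with a 200-dimensional feature map.
    clf = LogisticRegression(kernel="rbf", gamma=0.5, n_components=200)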

random_state : int or None, default=None

If int, random_state is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random.

Attributes:
coef_ : array-like, shape (n_features, 1) for binary classification or (n_features, n_classes) for multi-class classification

Coefficients of the features in the trained model.

intercept_ : ndarray of shape (1,) or (n_classes,)

Intercept (bias) added to the decision function. If fit_intercept is False, the intercept is set to zero. intercept_ is of shape (1,) when the given problem is binary.

support_ : array-like

Indices of the features that contribute to the decision (only available with L1 regularization). Currently not supported for the MPI implementation.

model_sparsity_ : float

Fraction of non-zeros in the model parameters (only available with L1 regularization). Currently not supported for the MPI implementation.

training_history_ : dict

Training history statistics.
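
A sketch of inspecting the fitted model for a binary problem (synthetic data, illustrative only):

    import numpy as np
    from snapml import LogisticRegression

    X = np.random.rand(500, 10).astype(np.float32)
    y = (np.random.rand(500) > 0.5).astype(np.float32)

    clf = LogisticRegression(fit_intercept=True)
    clf.fit(X, y)
    print(clf.coef_.shape)  # (n_features, 1) for binary classification
    print(clf.intercept_)   # shape (1,) for binary classification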

fit(X_train, y_train=None)

Fit the model according to the given training dataset.

Parameters:
X_train : Training dataset. Supports the following input data types:
  1. Sparse matrix (csr_matrix, csc_matrix) or dense matrix (ndarray)

  2. DeviceNDArray. Not supported for MPI execution.

  3. SnapML data partition of type DensePartition, SparsePartition or ConstantValueSparsePartition

y_train : The target corresponding to X_train.

If X_train is a sparse or dense matrix, y_train should be array-like of shape (n_samples,). If X_train is a DeviceNDArray, y_train should be array-like of shape (n_samples, 1). If X_train is a SnapML data partition, y_train is not required (i.e. None).

Returns:
self : object
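
A sketch of fitting on a SciPy CSR matrix, one of the supported sparse input formats (synthetic data, illustrative only):

    import numpy as np
    import scipy.sparse as sp
    from snapml import LogisticRegression

    # Random sparse training data in CSR format.
    X_train = sp.random(1000, 50, density=0.05, format="csr", dtype=np.float32)
    y_train = (np.random.rand(1000) > 0.5).astype(np.float32)

    clf = LogisticRegression(normalize=True)
    clf.fit(X_train, y_train)
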
get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X, n_jobs=None)

Class predictions

The returned class estimates.

Parameters:
X : Dataset used for predicting class estimates. Supports the following input data types:
  1. Sparse matrix (csr_matrix, csc_matrix) or dense matrix (ndarray)

  2. SnapML data partition of type DensePartition, SparsePartition or ConstantValueSparsePartition

n_jobs : int, default=None

Number of threads used to run inference. By default, the value of the class attribute is used. This parameter is ignored for predict of a single observation.

Returns:
pred : array-like, shape (n_samples,)

Returns the predicted class of each sample.
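
A sketch of overriding the number of inference threads for a single call (synthetic data, illustrative only):

    import numpy as np
    from snapml import LogisticRegression

    X = np.random.rand(200, 10).astype(np.float32)
    y = (np.random.rand(200) > 0.5).astype(np.float32)

    clf = LogisticRegression()
    clf.fit(X, y)
    pred = clf.predict(X, n_jobs=4)  # shape (n_samples,)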

predict_proba(X, n_jobs=None)

Probability estimates

The returned probability estimates for the two classes. Only for binary classification.

Parameters:
X : Dataset used for predicting probability estimates. Supports the following input data types:
  1. Sparse matrix (csr_matrix, csc_matrix) or dense matrix (ndarray)

  2. SnapML data partition of type DensePartition, SparsePartition or ConstantValueSparsePartition

n_jobs : int, default=None

Number of threads used to run inference. By default, the value of the class attribute is used. This parameter is ignored for predict_proba of a single observation.

Returns:
proba : array-like of shape (n_samples, 2) or (n_samples, 1)

Probability of each of the two classes for the local implementation; probability of the positive class only for the MPI implementation.
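
A sketch of obtaining probability estimates for a binary problem with the local implementation (synthetic data, illustrative only):

    import numpy as np
    from snapml import LogisticRegression

    X = np.random.rand(200, 10).astype(np.float32)
    y = (np.random.rand(200) > 0.5).astype(np.float32)

    clf = LogisticRegression()
    clf.fit(X, y)
    proba = clf.predict_proba(X)  # shape (n_samples, 2) for the local backend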

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.
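
A sketch of computing weighted mean accuracy (synthetic data and weights, illustrative only):

    import numpy as np
    from snapml import LogisticRegression

    X = np.random.rand(300, 10).astype(np.float32)
    y = (np.random.rand(300) > 0.5).astype(np.float32)
    w = np.random.rand(300)  # per-sample weights

    clf = LogisticRegression()
    clf.fit(X, y)
    acc = clf.score(X, y, sample_weight=w)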

set_fit_request(*, X_train: bool | None | str = '$UNCHANGED$', y_train: bool | None | str = '$UNCHANGED$') → LogisticRegression

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
X_train : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for X_train parameter in fit.

y_train : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for y_train parameter in fit.

Returns:
self : object

The updated object.

set_params(**params)

Set the parameters of this model.

Valid parameter keys can be listed with get_params().

Returns:
self
set_predict_proba_request(*, n_jobs: bool | None | str = '$UNCHANGED$') → LogisticRegression

Request metadata passed to the predict_proba method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict_proba.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
n_jobs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for n_jobs parameter in predict_proba.

Returns:
self : object

The updated object.

set_predict_request(*, n_jobs: bool | None | str = '$UNCHANGED$') → LogisticRegression

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
n_jobs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for n_jobs parameter in predict.

Returns:
self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LogisticRegression

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.
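
As a hypothetical sketch of the routing mechanism: the snippet below requests that sample_weight be routed to score during cross-validation. It assumes scikit-learn >= 1.4 (where cross_validate accepts a params argument) and that the Snap ML estimator interoperates with scikit-learn meta-estimators; neither assumption comes from this reference.

    import numpy as np
    from sklearn import set_config
    from sklearn.model_selection import cross_validate
    from snapml import LogisticRegression

    set_config(enable_metadata_routing=True)  # routing is disabled by default

    # Synthetic data and per-sample weights (illustrative only).
    X = np.random.rand(300, 10).astype(np.float32)
    y = (np.random.rand(300) > 0.5).astype(np.float32)
    w = np.random.rand(300)

    # Request that sample_weight be passed to score().
    clf = LogisticRegression().set_score_request(sample_weight=True)
    results = cross_validate(clf, X, y, cv=3, params={"sample_weight": w})
    print(results["test_score"])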