Linear Regression
- class snapml.LinearRegression(max_iter=1000, regularizer=1.0, device_ids=[], verbose=False, use_gpu=False, dual=True, n_jobs=1, penalty='l2', tol=0.001, generate_training_history=None, privacy=False, eta=0.3, batch_size=100, privacy_epsilon=10, grad_clip=1, fit_intercept=False, intercept_scaling=1.0, normalize=False, kernel='linear', gamma=1.0, n_components=100, random_state=None)
This class implements regularized linear regression using the IBM Snap ML solver. It supports both the local and the distributed (MPI) versions of the Snap ML solver and handles both dense and sparse matrix inputs. Use csr, csc, ndarray, DeviceNDArray or SnapML data partition format for training, and csr, ndarray or SnapML data partition format for prediction. The DeviceNDArray input format is currently not supported for training with the MPI implementation.
- Parameters
- max_iterint, default=1000
Maximum number of iterations used by the solver to converge.
- regularizerfloat, default=1.0
Regularization strength. It must be a positive float. Larger regularization values imply stronger regularization.
- use_gpubool, default=False
Flag for indicating the hardware platform used for training. If True, the training is performed using the GPU. If False, the training is performed using the CPU.
- device_idsarray-like of int, default=[]
If use_gpu is True, it indicates the IDs of the GPUs used for training. For single GPU training, set device_ids to the GPU ID to be used for training, e.g., [0]. For multi-GPU training, set device_ids to a list of GPU IDs to be used for training, e.g., [0, 1].
- dualbool, default=True
Dual or primal formulation. Recommendation: if n_samples > n_features use dual=True.
- verbosebool, default=False
If True, it prints the training cost, one per iteration. Warning: this will increase the training time. For performance evaluation, use verbose=False.
- n_jobsint, default=1
The number of threads used for running the training. The value of this parameter should be a multiple of 32 if the training is performed on GPU (use_gpu=True).
- penalty{‘l1’, ‘l2’}, default=’l2’
The regularization / penalty type. Possible values are “l2” for L2 regularization (RidgeRegression) or “l1” for L1 regularization (LassoRegression). L1 regularization is possible only for the primal optimization problem (dual=False).
- tolfloat, default=0.001
The tolerance parameter. Training finishes when the maximum change in the model coefficients is less than tol.
- generate_training_history{‘summary’, ‘full’, None}, default=None
Determines the level of summary statistics that are generated during training. By default no information is generated (None), but this parameter can be set to “summary”, to obtain summary statistics at the end of training, or “full” to obtain a complete set of statistics for the entire training procedure. Note, enabling either option will result in slower training. generate_training_history is not supported for DeviceNDArray input format.
- privacybool, default=False
Train the model using a differentially private algorithm. Currently not supported for MPI implementation.
- etafloat, default=0.3
Learning rate for the differentially private training algorithm. Currently not supported for MPI implementation.
- batch_sizeint, default=100
Mini-batch size for the differentially private training algorithm. Currently not supported for MPI implementation.
- privacy_epsilonfloat, default=10.0
Target privacy guarantee. The learned model will be (privacy_epsilon, 0.01)-private. Currently not supported for MPI implementation.
- grad_clipfloat, default=1.0
Gradient clipping parameter for the differentially private training algorithm. Currently not supported for MPI implementation.
- fit_interceptbool, default=False
Add a bias term. Note that this may affect the speed of convergence, especially for sparse datasets.
- intercept_scalingfloat, default=1.0
Scaling of the bias term. The inclusion of a bias term is implemented by appending an additional feature to the dataset. This feature has a constant value that can be set using this parameter.
- normalizebool, default=False
Normalize rows of dataset (recommended for fast convergence).
- kernel{‘rbf’, ‘linear’}, default=’linear’
Approximate feature map of a specified kernel function.
- gammafloat, default=1.0
Parameter of RBF kernel: exp(-gamma * x^2)
- n_componentsint, default=100
Dimensionality of the feature space when approximating a kernel function.
- random_stateint, or None, default=None
If int, random_state is the seed used by the random number generator; if None, the random number generator is the RandomState instance used by np.random.
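The description of intercept_scaling above says the bias term is added by appending a constant-valued feature to the dataset. A minimal plain-Python sketch of that augmentation step follows. The helper name append_bias_feature is hypothetical and only illustrates the idea; it is not part of the Snap ML API and says nothing about how the solver performs this step internally.

```python
def append_bias_feature(rows, intercept_scaling=1.0):
    """Return a copy of `rows` with one extra constant column.

    Hypothetical helper for illustration only: each sample gets an
    additional feature whose value is `intercept_scaling`.
    """
    return [list(row) + [float(intercept_scaling)] for row in rows]


# Two samples with two features each.
X = [[0.5, 2.0], [1.5, -1.0]]

# After augmentation every sample has a third, constant feature.
X_aug = append_bias_feature(X, intercept_scaling=2.0)
```

The weight learned for the extra column then plays the role of the bias term, which is why the constant's magnitude can influence how strongly regularization acts on it.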
- Attributes
- coef_array-like, shape (n_features,)
Coefficients of the features in the trained model.
- intercept_float
Independent term in the decision function. Set to 0.0 if fit_intercept == False.
- support_array-like
Indices of the features that lie in the support and contribute to the decision (only available for L1). Currently not supported for MPI implementation.
- model_sparsity_float
Fraction of non-zeros in the model parameters (only available for L1). Currently not supported for MPI implementation.
- training_history_dict
Training history statistics.
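To make the definitions of support_ and model_sparsity_ concrete, the following plain-Python sketch derives both quantities from a coefficient vector. The function name support_and_sparsity and the example coefficients are hypothetical; the code illustrates the definitions given above and is not Snap ML's implementation.

```python
def support_and_sparsity(coef):
    """Illustrate the definitions of support_ and model_sparsity_.

    support  : indices of the non-zero coefficients
    sparsity : fraction of non-zero coefficients
    """
    support = [i for i, w in enumerate(coef) if w != 0.0]
    sparsity = len(support) / len(coef) if coef else 0.0
    return support, sparsity


# Made-up coefficients, as an L1-regularized model might produce.
coef = [0.0, 1.2, 0.0, -0.7]
support, sparsity = support_and_sparsity(coef)
```

In this toy vector, features 1 and 3 carry non-zero weights, so half of the model parameters are non-zero.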
- fit(X_train, y_train=None)
Fit the model according to the given train dataset.
- Parameters
- X_trainTrain dataset. Supports the following input data-types :
Sparse matrix (csr_matrix, csc_matrix) or dense matrix (ndarray)
DeviceNDArray. Not supported for MPI execution.
SnapML data partition of type DensePartition, SparsePartition or ConstantValueSparsePartition
- y_trainThe target corresponding to X_train.
If X_train is a sparse or dense matrix, y_train should be array-like of shape (n_samples,). For DeviceNDArray input, y_train should be array-like of shape (n_samples, 1). If X_train is a SnapML data partition, y_train is not required (i.e. None).
- Returns
- selfobject
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- paramsdict
Parameter names mapped to their values.
- predict(X, n_jobs=None)
Compute predictions.
Returns the predicted estimates for the samples in X.
- Parameters
- XDataset used for predicting estimates or class. Supports the following input data-types :
Sparse matrix (csr_matrix, csc_matrix) or dense matrix (ndarray)
SnapML data partition of type DensePartition, SparsePartition or ConstantValueSparsePartition
- n_jobsint, default=None
Number of threads used to run inference. By default the value of the class attribute is used. This parameter is ignored when predicting a single observation.
- Returns
- pred: array-like, shape = (n_samples,)
Returns the predicted estimate/class of the sample.
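The fit and predict methods described above can be combined as in the following sketch, which trains on a dense ndarray on the CPU. The helper names make_regression_data and fit_and_predict, the toy data, and the chosen hyperparameter values are assumptions made for illustration. Running fit_and_predict requires the numpy and snapml packages to be installed.

```python
import random


def make_regression_data(n_samples=200, n_features=4, seed=0):
    """Toy data (hypothetical): y = w . x + 0.5 with w = [1, 2, ..., n_features]."""
    rng = random.Random(seed)
    weights = [float(j + 1) for j in range(n_features)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [sum(w * x for w, x in zip(weights, row)) + 0.5 for row in X]
    return X, y


def fit_and_predict():
    # Imported lazily so the data helper above works without these packages.
    import numpy as np
    from snapml import LinearRegression

    X, y = make_regression_data()
    X = np.asarray(X, dtype=np.float32)   # dense matrix (ndarray)
    y = np.asarray(y, dtype=np.float32)   # shape (n_samples,)

    # Only parameters documented above are used; the values are arbitrary.
    model = LinearRegression(max_iter=1000, regularizer=1.0, fit_intercept=True)
    model.fit(X, y)
    pred = model.predict(X)               # shape (n_samples,)
    return model.coef_, model.intercept_, pred
```

Calling fit_and_predict() returns the learned coefficients, the intercept, and the predictions for the training data. For GPU training, use_gpu and device_ids would be passed to the constructor as documented in the parameter list.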
- score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns
- scorefloat
\(R^2\) of self.predict(X) with respect to y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
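The \(R^2\) definition used by score can be checked by hand with a small plain-Python sketch. The function name r2_score_sketch is hypothetical, and the sketch covers only the unweighted, single-output case. It mirrors the formula \(1 - \frac{u}{v}\) rather than reproducing the library's implementation.

```python
def r2_score_sketch(y_true, y_pred):
    """Unweighted, single-output R^2 = 1 - u / v (illustrative only)."""
    mean = sum(y_true) / len(y_true)
    # u: residual sum of squares
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # v: total sum of squares
    v = sum((t - mean) ** 2 for t in y_true)
    if v == 0:
        # This sketch does not handle constant targets.
        raise ValueError("R^2 is undefined here when y_true is constant")
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0]
perfect = r2_score_sketch(y_true, [1.0, 2.0, 3.0])    # exact predictions
constant = r2_score_sketch(y_true, [2.0, 2.0, 2.0])   # always predicts the mean
reversed_ = r2_score_sketch(y_true, [3.0, 2.0, 1.0])  # worse than the mean
```

Here perfect is 1.0, constant is 0.0, and reversed_ is negative, which matches the three cases described in the text: a perfect fit, a constant mean predictor, and a model that does worse than the mean.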
- set_params(**params)
Set the parameters of this model.
Valid parameter keys can be listed with get_params().
- Returns
- self