Model Import

Snap ML supports importing tree ensemble models that were trained with other frameworks (e.g., scikit-learn, XGBoost, LightGBM) so that one can leverage Snap ML’s accelerated inference engine.

Model Import Flow

One can import a model either by:

The following table lists which Snap ML classes can import which types of pre-trained model, and which model formats are supported:

Pre-trained Model                        Supported Formats  Target Snap ML Class
---------------------------------------  -----------------  --------------------------------
xgboost.XGBClassifier                    PMML, ONNX, JSON   snapml.BoostingMachineClassifier
xgboost.XGBRegressor                     PMML, ONNX, JSON   snapml.BoostingMachineRegressor
lightgbm.LGBMClassifier                  PMML, ONNX, Text   snapml.BoostingMachineClassifier
lightgbm.LGBMRegressor                   PMML, ONNX, Text   snapml.BoostingMachineRegressor
snapml.BoostingMachineClassifier         PMML               snapml.BoostingMachineClassifier
snapml.BoostingMachineRegressor          PMML               snapml.BoostingMachineRegressor
sklearn.ensemble.RandomForestClassifier  PMML, ONNX         snapml.RandomForestClassifier
sklearn.ensemble.RandomForestRegressor   PMML, ONNX         snapml.RandomForestRegressor
sklearn.ensemble.ExtraTreesClassifier    PMML, ONNX         snapml.RandomForestClassifier
sklearn.ensemble.ExtraTreesRegressor     PMML, ONNX         snapml.RandomForestRegressor
snapml.RandomForestClassifier            PMML               snapml.RandomForestClassifier
snapml.RandomForestRegressor             PMML               snapml.RandomForestRegressor
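For scripts that need to check an import path before attempting it, the table can be transcribed into a small lookup. The sketch below is illustrative only: the names IMPORT_SUPPORT and target_class_for are hypothetical, and the lowercase format labels are just the table's names, not necessarily the values accepted by any Snap ML argument.

```python
# Support matrix transcribed from the table above:
# pre-trained model class -> (supported formats, target Snap ML class).
# The format labels are the table's names in lowercase, not API values.
IMPORT_SUPPORT = {
    "xgboost.XGBClassifier": ({"pmml", "onnx", "json"}, "snapml.BoostingMachineClassifier"),
    "xgboost.XGBRegressor": ({"pmml", "onnx", "json"}, "snapml.BoostingMachineRegressor"),
    "lightgbm.LGBMClassifier": ({"pmml", "onnx", "text"}, "snapml.BoostingMachineClassifier"),
    "lightgbm.LGBMRegressor": ({"pmml", "onnx", "text"}, "snapml.BoostingMachineRegressor"),
    "snapml.BoostingMachineClassifier": ({"pmml"}, "snapml.BoostingMachineClassifier"),
    "snapml.BoostingMachineRegressor": ({"pmml"}, "snapml.BoostingMachineRegressor"),
    "sklearn.ensemble.RandomForestClassifier": ({"pmml", "onnx"}, "snapml.RandomForestClassifier"),
    "sklearn.ensemble.RandomForestRegressor": ({"pmml", "onnx"}, "snapml.RandomForestRegressor"),
    "sklearn.ensemble.ExtraTreesClassifier": ({"pmml", "onnx"}, "snapml.RandomForestClassifier"),
    "sklearn.ensemble.ExtraTreesRegressor": ({"pmml", "onnx"}, "snapml.RandomForestRegressor"),
    "snapml.RandomForestClassifier": ({"pmml"}, "snapml.RandomForestClassifier"),
    "snapml.RandomForestRegressor": ({"pmml"}, "snapml.RandomForestRegressor"),
}


def target_class_for(pretrained_class, model_format):
    """Return the name of the Snap ML class listed for this model and format."""
    if pretrained_class not in IMPORT_SUPPORT:
        raise KeyError(f"no import path listed for {pretrained_class!r}")
    formats, target = IMPORT_SUPPORT[pretrained_class]
    if model_format.lower() not in formats:
        raise ValueError(
            f"{pretrained_class} is not importable from {model_format!r}; "
            f"listed formats: {sorted(formats)}"
        )
    return target
```

For example, target_class_for("lightgbm.LGBMRegressor", "Text") returns "snapml.BoostingMachineRegressor", while asking for an ONNX import of a snapml.RandomForestClassifier raises ValueError because only PMML is listed for that row.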

Note that the standard way to save and load models trained with Snap ML is to use pickle/joblib. However, because the resulting binary models depend on the endianness of the platform, it is currently not possible to save a model on an Intel™ (x86_64) platform and then load it on an IBM Z™ (s390x) platform, or vice versa. To overcome this issue, we also support exporting and importing tree ensembles trained with Snap ML via the platform-independent PMML format. For details on how to export Snap ML tree ensembles as PMML, see the documentation for the corresponding member functions (e.g., snapml.RandomForestClassifier.export_model()).
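The cross-platform workflow described above might look like the following minimal sketch. The helper name export_then_import is hypothetical, and the call assumes that export_model takes the output filename followed by an output_type of "pmml"; check the member function's documentation for the exact signature.

```python
def export_then_import(trained_model, pmml_path):
    """Export a trained Snap ML tree ensemble to PMML, then import it again.

    In practice the export would run on one platform (e.g., x86_64) and the
    import on another (e.g., s390x), with the PMML file copied in between.
    """
    # Imported lazily so the sketch can be read without Snap ML installed.
    import snapml

    # Assumption: export_model(output_file, output_type="pmml").
    trained_model.export_model(pmml_path, output_type="pmml")

    # import_model detects the ensemble and task type from the PMML file.
    return snapml.import_model(pmml_path, input_type="pmml")
```

Here trained_model would be a fitted Snap ML tree ensemble such as a snapml.RandomForestClassifier, and the returned object is ready for calls to predict.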

snapml.import_model(input_file, input_type='pmml', tree_format='auto', X=None, remap_feature_indices=False, verbose=False)

Import a pre-trained tree ensemble model and optimize the trees for fast inference.

This function will detect the ensemble type (e.g. boosting or forest) and task type (classification or regression) from the model file and return the correct Snap ML class.

Currently only models stored as PMML are supported.

Depending on how the tree_format argument is set, this function will return a different optimized model format. This format determines which inference engine is used for subsequent calls to ‘predict’ or ‘predict_proba’.

If tree_format is set to ‘zdnn_tensors’, the model will be optimized for execution on the IBM z16 AI accelerator, using a matrix-based inference algorithm leveraging the zDNN library.

By default, tree_format is set to ‘auto’. A check is performed: if the IBM z16 AI accelerator is available, the model is optimized according to ‘zdnn_tensors’; otherwise it is optimized according to ‘compress_trees’. The selected optimized tree format can be read from the attribute self.optimized_tree_format_.
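As a rough illustration of the tree_format behaviour, the sketch below imports a PMML file and reports which optimized format was selected. The helper name import_with_format and the constant VALID_TREE_FORMATS are hypothetical; the arguments passed to snapml.import_model follow the signature shown above.

```python
VALID_TREE_FORMATS = ("auto", "compress_trees", "zdnn_tensors")


def import_with_format(pmml_path, tree_format="auto", verbose=False):
    """Import a PMML model and return (model, selected optimized tree format)."""
    # Reject unknown values early, before touching the model file.
    if tree_format not in VALID_TREE_FORMATS:
        raise ValueError(
            f"tree_format must be one of {VALID_TREE_FORMATS}, got {tree_format!r}"
        )

    # Imported lazily so the sketch can be read without Snap ML installed.
    import snapml

    model = snapml.import_model(
        pmml_path,
        input_type="pmml",
        tree_format=tree_format,
        verbose=verbose,
    )

    # With "auto", this attribute shows which format was actually chosen.
    return model, model.optimized_tree_format_
```

Calling import_with_format("model.pmml") on a machine without the z16 AI accelerator would, per the description above, be expected to report ‘compress_trees’ as the selected format.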

Information regarding the PMML input/output schema is stored in the schema_ attribute of the model that is returned.

Note: If the input file contains features that are not supported by the import function, then an exception is thrown indicating the feature and the line number within the input file containing the feature.
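Since the text does not name the exception class that is raised for unsupported features, a caller that wants to report the problem rather than crash could catch exceptions broadly, as in this hedged sketch. The helper name try_import is hypothetical.

```python
def try_import(pmml_path):
    """Return (model, None) on success or (None, error message) on failure."""
    # Imported lazily so the sketch can be read without Snap ML installed.
    import snapml

    try:
        return snapml.import_model(pmml_path, input_type="pmml"), None
    except Exception as err:
        # Assumption: the exact exception class is not documented here, so
        # this catches broadly. The message is expected to name the
        # unsupported feature and its line number in the input file.
        return None, str(err)
```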

Parameters:
input_file : str
    Input filename

input_type : {‘pmml’}
    Input file type

tree_format : {‘auto’, ‘compress_trees’, ‘zdnn_tensors’}
    Tree format

X : dense matrix (ndarray)
    Optional input dataset used for compressing trees

remap_feature_indices : bool
    If enabled, the predict and predict_proba functions will expect numpy arrays containing only the (ordered) features that are listed in the model file. This is often a subset of the full set of features that were provided during training. These features are stored in the used_features_ attribute of the imported model.

verbose : bool
    Print information useful for debugging (e.g., whether the z16 AI accelerator was detected; how n_jobs gets set).

Returns:
self : Snap ML object ready for scoring
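When remap_feature_indices is enabled, the caller has to reduce the full feature matrix to the columns the model actually uses. The sketch below shows that column selection in plain Python. It assumes that used_features_ holds integer column indices into the original training matrix, which the parameter name suggests but the text does not state; the helper name select_used_columns is hypothetical.

```python
def select_used_columns(rows, used_features):
    """Keep only the columns listed in used_features, in that order.

    rows is a sequence of feature rows (one per sample). With a numpy array X
    the equivalent expression would be X[:, used_features].
    """
    return [[row[j] for j in used_features] for row in rows]


# Two samples with four features each; the model is assumed to use
# only features 0 and 2.
full_rows = [
    [0.1, 5.0, 0.3, 9.0],
    [0.2, 6.0, 0.4, 8.0],
]
reduced_rows = select_used_columns(full_rows, [0, 2])
```

The reduced rows would then be converted to a numpy array before being passed to predict or predict_proba on the imported model.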