Model Import

Snap ML supports importing tree ensemble models that were trained with other frameworks (e.g., scikit-learn, XGBoost, LightGBM), so that one can leverage Snap ML’s accelerated inference engine.

One can import a model either by:

instantiating the corresponding Snap ML class (e.g., snapml.RandomForestClassifier) and then calling its import_model() member function (e.g., snapml.RandomForestClassifier.import_model()), or
calling the generic snapml.import_model() function documented below, which detects the type of model from the model file and returns the corresponding Snap ML class.
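
The two routes above might look as follows. This is a minimal sketch rather than a definitive recipe: the file name model.pmml is hypothetical, and the sketch assumes that the member function accepts the same input_file and input_type arguments as the generic function. snapml is imported inside each function so that the definitions load even where Snap ML is not installed.

```python
def import_via_class(input_file="model.pmml"):
    """Route 1: instantiate the target class, then call its import_model()."""
    import snapml  # imported lazily; requires Snap ML to be installed

    model = snapml.RandomForestClassifier()
    # Assumption: the member function takes input_file/input_type in the same
    # way as the generic snapml.import_model() function.
    model.import_model(input_file, input_type="pmml")
    return model


def import_via_generic(input_file="model.pmml"):
    """Route 2: let snapml.import_model() detect the model type."""
    import snapml  # imported lazily; requires Snap ML to be installed

    # The returned object is an instance of the matching Snap ML class.
    return snapml.import_model(input_file, input_type="pmml")
```

Route 1 is useful when the target class is known in advance; route 2 avoids having to inspect the model file first.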

The following table lists which Snap ML class can import each type of pre-trained model, and which model formats are supported:

Pre-trained Model	Supported Formats	Target Snap ML Class
`xgboost.XGBClassifier`	PMML, ONNX, JSON	`snapml.BoostingMachineClassifier`
`xgboost.XGBRegressor`	PMML, ONNX, JSON	`snapml.BoostingMachineRegressor`
`lightgbm.LGBMClassifier`	PMML, ONNX, Text	`snapml.BoostingMachineClassifier`
`lightgbm.LGBMRegressor`	PMML, ONNX, Text	`snapml.BoostingMachineRegressor`
`snapml.BoostingMachineClassifier`	PMML	`snapml.BoostingMachineClassifier`
`snapml.BoostingMachineRegressor`	PMML	`snapml.BoostingMachineRegressor`
`sklearn.ensemble.RandomForestClassifier`	PMML, ONNX	`snapml.RandomForestClassifier`
`sklearn.ensemble.RandomForestRegressor`	PMML, ONNX	`snapml.RandomForestRegressor`
`sklearn.ensemble.ExtraTreesClassifier`	PMML, ONNX	`snapml.RandomForestClassifier`
`sklearn.ensemble.ExtraTreesRegressor`	PMML, ONNX	`snapml.RandomForestRegressor`
`snapml.RandomForestClassifier`	PMML	`snapml.RandomForestClassifier`
`snapml.RandomForestRegressor`	PMML	`snapml.RandomForestRegressor`
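
The table can also be expressed as a lookup, which may be handy for validating a model file before attempting an import. The sketch below is only a transcription of the rows above; IMPORT_TABLE and target_class are hypothetical names, and the lower-case format labels are illustrative and not necessarily the strings that Snap ML accepts for input_type.

```python
# Transcription of the table: pre-trained model -> (formats, target class).
_BOOSTING_CLF = "snapml.BoostingMachineClassifier"
_BOOSTING_REG = "snapml.BoostingMachineRegressor"
_FOREST_CLF = "snapml.RandomForestClassifier"
_FOREST_REG = "snapml.RandomForestRegressor"

IMPORT_TABLE = {
    "xgboost.XGBClassifier": ({"pmml", "onnx", "json"}, _BOOSTING_CLF),
    "xgboost.XGBRegressor": ({"pmml", "onnx", "json"}, _BOOSTING_REG),
    "lightgbm.LGBMClassifier": ({"pmml", "onnx", "text"}, _BOOSTING_CLF),
    "lightgbm.LGBMRegressor": ({"pmml", "onnx", "text"}, _BOOSTING_REG),
    "snapml.BoostingMachineClassifier": ({"pmml"}, _BOOSTING_CLF),
    "snapml.BoostingMachineRegressor": ({"pmml"}, _BOOSTING_REG),
    "sklearn.ensemble.RandomForestClassifier": ({"pmml", "onnx"}, _FOREST_CLF),
    "sklearn.ensemble.RandomForestRegressor": ({"pmml", "onnx"}, _FOREST_REG),
    "sklearn.ensemble.ExtraTreesClassifier": ({"pmml", "onnx"}, _FOREST_CLF),
    "sklearn.ensemble.ExtraTreesRegressor": ({"pmml", "onnx"}, _FOREST_REG),
    "snapml.RandomForestClassifier": ({"pmml"}, _FOREST_CLF),
    "snapml.RandomForestRegressor": ({"pmml"}, _FOREST_REG),
}


def target_class(pretrained_model, file_format):
    """Return the Snap ML class name that imports this model/format pair."""
    formats, target = IMPORT_TABLE[pretrained_model]
    if file_format.lower() not in formats:
        raise ValueError(
            f"{pretrained_model} cannot be imported from {file_format!r}; "
            f"supported formats: {sorted(formats)}"
        )
    return target
```

For example, target_class("sklearn.ensemble.ExtraTreesClassifier", "ONNX") yields "snapml.RandomForestClassifier", reflecting that extra-trees models are imported into the random forest classes.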

Note that the standard way to save and load models trained with Snap ML is to use pickle/joblib. However, since the resulting binary models depend on the endianness of the platform, it is currently not possible to save a model on an Intel™ (x86_64) platform and then load it on an IBM Z™ (s390x) platform (or vice versa). To overcome this issue, we also provide support for exporting and importing tree ensembles trained with Snap ML via the platform-independent PMML format. For details on how to export Snap ML tree ensembles as PMML, see the documentation for the corresponding member functions (e.g., snapml.RandomForestClassifier.export_model()).
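
To see why a binary model is tied to the endianness of the platform, consider how a single floating-point value is laid out in memory. The snippet below is a generic illustration that uses only Python's standard struct module; it does not touch Snap ML's actual binary format.

```python
import struct
import sys

value = 1.5

# The same 64-bit float, serialized with each byte order.
little = struct.pack("<d", value)  # little-endian, as on x86_64
big = struct.pack(">d", value)     # big-endian, as on s390x

print("native byte order:", sys.byteorder)
print("little-endian bytes:", little.hex())
print("big-endian bytes:   ", big.hex())

# Reading little-endian bytes as if they were big-endian gives a wrong value.
misread = struct.unpack(">d", little)[0]
print("misread value:", misread)
```

A text-based format such as PMML stores numbers as decimal strings, so the byte order of the machine that wrote the file does not matter.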

snapml.import_model(input_file, input_type='pmml', tree_format='auto', X=None, remap_feature_indices=False, verbose=False)

Import a pre-trained tree ensemble model and optimize the trees for fast inference.

This function will detect the ensemble type (e.g. boosting or forest) and task type (classification or regression) from the model file and return the correct Snap ML class.

Currently, this function only supports models stored as PMML.

Depending on how the tree_format argument is set, this function will return a different optimized model format. This format determines which inference engine is used for subsequent calls to ‘predict’ or ‘predict_proba’.

If tree_format is set to ‘zdnn_tensors’, the model will be optimized for execution on the IBM z16 AI accelerator, using a matrix-based inference algorithm leveraging the zDNN library.

By default, tree_format is set to ‘auto’. In this case a check is performed: if the IBM z16 AI accelerator is available, the model is optimized according to ‘zdnn_tensors’; otherwise it is optimized according to ‘compress_trees’. The selected optimized tree format can be read from the attribute self.optimized_tree_format_.
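
The selection rule described above can be sketched as a small pure function. This is an illustration of the documented behaviour, not Snap ML's implementation: resolve_tree_format is a hypothetical name, and accelerator_available is a hypothetical flag that stands in for the library's own hardware check. The text does not say what happens when ‘zdnn_tensors’ is requested explicitly on a machine without the accelerator, so the sketch simply passes explicit choices through.

```python
def resolve_tree_format(tree_format="auto", accelerator_available=False):
    """Mimic the documented choice of optimized tree format."""
    allowed = {"auto", "compress_trees", "zdnn_tensors"}
    if tree_format not in allowed:
        raise ValueError(f"tree_format must be one of {sorted(allowed)}")

    # An explicit choice is returned unchanged (assumption, see lead-in).
    if tree_format != "auto":
        return tree_format

    # 'auto': prefer the accelerator format when the hardware is present.
    return "zdnn_tensors" if accelerator_available else "compress_trees"


print(resolve_tree_format("auto", accelerator_available=False))
print(resolve_tree_format("auto", accelerator_available=True))
```

After a real import, the format that was actually chosen is reported by the optimized_tree_format_ attribute of the returned model.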

Information regarding the PMML input/output schema is stored in the schema_ attribute of the model that is returned.

Note: If the input file contains features that are not supported by the import function, then an exception is thrown indicating the feature and the line number within the input file containing the feature.
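
Since an unsupported feature surfaces as an exception, callers may want to catch it and report the message. The sketch below is one possible pattern; import_or_report is a hypothetical helper, and because the text does not name the exception type, it catches the generic Exception. The importer argument exists only so that the helper can be exercised without Snap ML installed.

```python
def import_or_report(input_file, importer=None):
    """Try to import a PMML model; return (model, error_message)."""
    if importer is None:
        import snapml  # imported lazily; requires Snap ML to be installed

        importer = snapml.import_model

    try:
        return importer(input_file, input_type="pmml"), None
    except Exception as exc:  # exact exception type is not documented here
        # The message is expected to name the unsupported feature and the
        # line number in the input file where it occurs.
        return None, str(exc)
```

A caller can then log the message or fall back to another inference path when the returned model is None.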

Parameters:

input_file (str): Input filename
input_type ({‘pmml’}): Input file type
tree_format ({‘auto’, ‘compress_trees’, ‘zdnn_tensors’}): Tree format
X (dense matrix (ndarray)): Optional input dataset used for compressing trees
remap_feature_indices (bool): If enabled, the predict and predict_proba functions will expect numpy arrays containing only the (ordered) features that are listed in the model file. This can often be a subset of the full set of features that were provided during training. These features are stored in the used_features_ attribute of the imported model.
verbose (bool): Print information useful for debugging (e.g., whether the z16 AI accelerator was detected; how n_jobs gets set).
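
To illustrate what remap_feature_indices implies for the caller, the sketch below reduces a full feature matrix to only the columns a model uses. It is a plain-Python illustration under a stated assumption: used_features_ is taken to hold zero-based column indices in ascending order, which the text above does not spell out. select_used_features and the sample data are hypothetical.

```python
def select_used_features(rows, used_features):
    """Keep only the listed columns of each row, in the listed order."""
    return [[row[i] for i in used_features] for row in rows]


# Full training-time layout: four features per example.
X_full = [
    [0.1, 0.2, 0.3, 0.4],
    [1.1, 1.2, 1.3, 1.4],
]

# Assumption: stands in for model.used_features_ (zero-based indices).
used = [0, 2]

X_reduced = select_used_features(X_full, used)
print(X_reduced)
```

With remap_feature_indices enabled, an array shaped like X_reduced (converted to a numpy array) is what predict and predict_proba would expect, rather than the full X_full.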

Returns:

self: Snap ML object ready for scoring