Preprocessing Pipeline Export

Snap ML can export scikit-learn preprocessing pipelines to a JSON format that IBM Z Accelerated for NVIDIA Triton Inference Server™ can consume.

The export function handles pipelines for numerical and categorical features separately (see example below).

For numerical features, we currently support the following preprocessing steps and hyper-parameters:

sklearn.preprocessing.Normalizer()

sklearn.preprocessing.KBinsDiscretizer(
    encode="ordinal"
)

sklearn.preprocessing.FunctionTransformer(
    func=np.log1p
)
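The following pure-Python sketch spells out what each of these numerical steps computes once fitted. It is an illustration under stated assumptions, not Snap ML's or the inference server's implementation. It assumes `Normalizer`'s default L2 norm, and it mirrors the ordinal binning rule of recent scikit-learn releases: a value is searched against the inner fitted bin edges, and a value lying exactly on an edge goes to the upper bin. The helper names and the bin edges in `EDGES` are hypothetical; a real `KBinsDiscretizer` learns its edges (`bin_edges_`) during `fit`.

```python
import bisect
import math


def l2_normalize(row):
    """Normalizer(): scale one sample (row) to unit Euclidean (L2) norm.

    A row whose norm is zero is returned unchanged, matching scikit-learn.
    """
    norm = math.sqrt(sum(v * v for v in row))
    if norm == 0.0:
        return list(row)
    return [v / norm for v in row]


def ordinal_bin(value, bin_edges):
    """KBinsDiscretizer(encode="ordinal"): map one value to its bin index.

    `bin_edges` holds the n_bins + 1 fitted edges of a single feature. Only
    the inner edges are searched, so values below the first edge land in
    bin 0, values above the last edge land in bin n_bins - 1, and a value
    equal to an inner edge lands in the upper bin.
    """
    return float(bisect.bisect_right(bin_edges[1:-1], value))


def log1p_row(row):
    """FunctionTransformer(func=np.log1p): elementwise log(1 + x)."""
    return [math.log1p(v) for v in row]


# Hypothetical fitted edges for two numerical features, three bins each.
EDGES = [
    [0.0, 0.3, 0.6, 1.0],
    [0.0, 0.5, 0.9, 1.0],
]

# Normalizer followed by KBinsDiscretizer, as in the example pipeline.
normalized = l2_normalize([3.0, 4.0])
binned = [ordinal_bin(v, edges) for v, edges in zip(normalized, EDGES)]
logged = log1p_row([0.0, 1.0])
```

Here `[3.0, 4.0]` normalizes to `[0.6, 0.8]`; the first value sits exactly on an inner edge of its feature and therefore falls into bin 2, while the second falls into bin 1.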

For categorical features, we currently support the following preprocessing step and hyper-parameters:

sklearn.preprocessing.OneHotEncoder(
    categories="auto",
    sparse_output=False,
    handle_unknown="ignore"
)
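A similar sketch for the categorical step shows what `OneHotEncoder` with these settings produces: `categories="auto"` keeps the sorted unique values seen during fit, `sparse_output=False` yields a dense vector, and `handle_unknown="ignore"` encodes an unseen value as all zeros instead of raising an error. The helper names and the toy data are hypothetical; this only illustrates the encoding and is neither the exported JSON nor Snap ML code.

```python
def fit_categories(column):
    """categories="auto": the sorted unique values seen during fit."""
    return sorted(set(column))


def one_hot(value, categories):
    """Dense one-hot vector for one value (sparse_output=False).

    With handle_unknown="ignore", a value not seen during fit matches no
    category and therefore encodes as all zeros.
    """
    return [1.0 if value == c else 0.0 for c in categories]


def encode_row(row, categories_per_feature):
    """Concatenate the one-hot blocks of all categorical features in order."""
    out = []
    for value, cats in zip(row, categories_per_feature):
        out.extend(one_hot(value, cats))
    return out


# Hypothetical training data with two categorical features.
train = [["red", "S"], ["blue", "M"], ["red", "L"]]
categories = [fit_categories(col) for col in zip(*train)]

known = encode_row(["red", "M"], categories)
unknown = encode_row(["green", "S"], categories)  # "green" was never seen
```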

An example of using the export function is given below:

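from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer, KBinsDiscretizer, OneHotEncoder
from snapml import export_preprocessing_pipeline

# numerical features: normalize each sample, then bin into ordinal codes
num_trsf = Pipeline(
    steps=[
        ('normalizer', Normalizer()),
        ('discretizer', KBinsDiscretizer(encode="ordinal"))
    ]
)
# categorical features: dense one-hot encoding that ignores unseen categories
cat_trsf = Pipeline(
    steps=[
        ('onehot', OneHotEncoder(categories='auto', sparse_output=False,
                                 handle_unknown='ignore'))
    ]
)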
preprocessor = ColumnTransformer(
    transformers=[
        ('num', num_trsf, [2,3]),
        ('cat', cat_trsf, [0,1])
    ],
    remainder='passthrough'
)
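# `model`, `X` and `y` are assumed to be defined elsewhere: a classifier and
# its training data, with categorical features in columns 0-1 and numerical
# features in columns 2-3 (any further columns are passed through)
pipeline = Pipeline(
    steps=[
        ('prep', preprocessor),
        ('classifier', model)
    ]
)
pipeline.fit(X, y)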

# export preprocessing pipeline to json format
export_preprocessing_pipeline(
    pipeline['prep'], 'pipeline.json'
)
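A `ColumnTransformer` concatenates its outputs in the order the transformers are declared, with `remainder='passthrough'` columns appended last. In the example above, the numerical block therefore comes first in the preprocessed output even though the categorical features sit in input columns 0 and 1. The hypothetical helper below sketches that layout. The output widths are supplied by hand: two columns for the numerical block, since `Normalizer` and ordinal `KBinsDiscretizer` preserve the column count, and an assumed five one-hot columns for the categorical block, which in practice depends on the categories seen during fit.

```python
def output_layout(transformers, n_input_columns, output_widths):
    """Return (name, start, stop) output-column ranges of a ColumnTransformer.

    `transformers` lists (name, input_columns) pairs in declaration order and
    `output_widths` gives how many output columns each transformer produces.
    Input columns claimed by no transformer are passed through unchanged and
    appended last, as with remainder='passthrough'.
    """
    layout = []
    start = 0
    claimed = set()
    for name, columns in transformers:
        width = output_widths[name]
        layout.append((name, start, start + width))
        start += width
        claimed.update(columns)
    n_remainder = sum(1 for c in range(n_input_columns) if c not in claimed)
    if n_remainder:
        layout.append(("remainder", start, start + n_remainder))
    return layout


# The example's transformers, applied to a hypothetical 5-column input.
layout = output_layout(
    transformers=[("num", [2, 3]), ("cat", [0, 1])],
    n_input_columns=5,
    output_widths={"num": 2, "cat": 5},  # "cat" width is an assumption
)
for name, start, stop in layout:
    print(f"{name}: output columns {start}..{stop - 1}")
```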