Packaging ML models using MLflow

Once you have trained a high-performing model, you have to package it so that others can use it. Packaging is generally the very first step in deploying any ML model to production. So first, we need to understand what it means to package an ML model, and second, why we need to do it.

Let me be honest: making inference is hard. You cannot feed the input directly to the model; you first have to process it. This pre-processing can be a simple normalization operation or a complex operation that combines the input data with other features stored in different databases. Only once the input is in the right format can it be fed to the ML model. Furthermore, the predictions (generally tensors) from the model cannot be used directly by the end user. You again have to apply some post-processing logic to turn these tensors into meaningful information (like a class name, probability, time required, etc.). Hence, in MLOps, you never deploy an ML model; you deploy an end-to-end pipeline.

Yet another thing that makes this even harder is that ML models fail silently, i.e. you won't get any error even if there is a small bug in your pipeline, yet any small change in the pre-processing logic will affect your model's predictions. For example, if you normalize an image with (0.5, 0.5, 0.5) as mean and std instead of ImageNet's mean and std values, you will get very different results. Thus it's really important that you always test the complete pipeline and not just the model.
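To make this concrete, here is a minimal sketch (using torchvision, which we also use below) of the two normalizations mentioned above. Both run without error, yet they produce different model inputs, so the bug would go unnoticed until you inspect the predictions:

from torchvision import transforms

# ImageNet statistics -- what most pretrained torchvision models expect
imagenet_norm = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

# A generic normalization -- runs fine, but silently changes the model inputs
generic_norm = transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])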


To me, packaging an ML model means wrapping the complete pipeline into an easy-to-use Python package/class. The goal is to abstract away all the complexity from the users and allow them to interact with the pipeline through some simple APIs, without worrying about the internals.

Before using MLflow, I used to do exactly that: train a model, wrap the pipeline into a Python package/class, and finally write some unit tests to test the package. But there is one major problem with this brute-force approach: inconsistent APIs. As you start training and packaging more and more pipelines, this problem becomes apparent. Things get even worse once you start using multiple ML frameworks.

In this tutorial, we will discuss how we can use MLflow to package our ML models. MLflow makes it super easy (it's just a function call). Let me show you . . .

Disclaimer: I will be using PyTorch in this blog, but you can use any ML/DL framework. MLflow has built-in support for many frameworks; you can find the complete list of supported frameworks here.

We will start by importing all the required packages . . .

import mlflow
import torch
from torchvision import models

Here, I will use a ResNet18 model (from torchvision) for the sake of this tutorial, but you can use literally any PyTorch model (in case you are following along).

model = models.resnet18(pretrained=False)
model = model.eval()

Let's package our PyTorch model using MLflow. We just have to pass the model object and the path (where we want to save it) to the mlflow.pytorch.save_model() method.

mlflow.pytorch.save_model(model, 'resnet18_model')

To load your model back into memory, we can again use MLflow. Just pass the model path . . .

## loading pytorch flavor
loaded_model = mlflow.pytorch.load_model('resnet18_model')

Easy, right? Let's generate a random tensor and pass it through both models to make sure that they are the same model.

x = torch.rand(4, 3, 224, 224)
x.shape
torch.Size([4, 3, 224, 224])
torch.eq(model(x), loaded_model(x)).all()
tensor(True)

Perfect, all the values are equal. You might say this is the same as torch.save()/torch.load(). What is so special about MLflow? Actually, there are many things that MLflow does under the hood. We will discuss everything MLflow does in detail, but the most obvious benefit is the unified API for saving and loading models (irrespective of the framework).
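For instance, here is a quick sketch of what the same pattern looks like with scikit-learn (assuming scikit-learn is installed; the model here is just a placeholder):

import mlflow.sklearn
from sklearn.linear_model import LogisticRegression

sk_model = LogisticRegression()
# Exactly the same save/load pattern, just a different flavor module
mlflow.sklearn.save_model(sk_model, 'sklearn_model')
loaded_sk_model = mlflow.sklearn.load_model('sklearn_model')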

MLflow Models

An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools. Source

Each MLflow Model is a directory containing arbitrary files, together with an MLmodel file in the root of the directory that can define multiple flavors that the model can be viewed in. Flavors are the key concept that makes MLflow Models so powerful: they are a convention that deployment tools can use to understand and load the model. MLflow supports many different ML/DL frameworks; you can find the complete list here. In simple terms, a flavor is nothing but a way of abstracting model saving and loading.

Along with the flavor the model was saved with, MLflow defines a "standard" flavor that all of its built-in deployment tools support, called the "Python function" flavor, which describes how to run the model as a Python function. For example, if you save a model using mlflow.pytorch, MLflow will define both the pytorch flavor and the python function flavor for loading the model.
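In other words, the same saved directory can be loaded through either flavor (we will do exactly this later in the tutorial):

# Native flavor: returns the original torch.nn.Module
pt_model = mlflow.pytorch.load_model('resnet18_model')

# Generic flavor: returns a wrapper with a uniform predict() API
py_model = mlflow.pyfunc.load_model('resnet18_model')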

Enough about flavors; it's an abstract concept and is very loosely defined. The best way to understand it is to use MLflow. Let's continue our discussion and see which files MLflow generates when packaging an ML model. mlflow.pytorch outputs models as follows:

# Directory written by mlflow.pytorch.save_model(model, 'resnet18_model')
resnet18_model/
├── data
│   ├── model.pth
│   └── pickle_module_info.txt
├── conda.yaml
├── MLmodel
└── requirements.txt

All of the flavors that a particular model supports are defined in its MLmodel file in YAML format. Here, our MLmodel file describes two flavors:

flavors:
  python_function:
    data: data
    env: conda.yaml
    loader_module: mlflow.pytorch
    pickle_module_name: mlflow.pytorch.pickle_module
    python_version: 3.8.12
  pytorch:
    model_data: data
    pytorch_version: 1.10.1
model_uuid: 2cb018e2588e478a97e626ea187b2c97
utc_time_created: '2022-02-17 17:00:59.055824'

This model can then be used with any tool that supports either the pytorch or the python_function model flavor. The ability to deploy your model as a python_function is extremely useful if you plan to use serverless deployment options like AWS Lambda, Azure Functions, Google Cloud Functions, etc.

Apart from a flavors field listing the model flavors, the MLmodel YAML format can contain other information like model_uuid and utc_time_created.

Environment recreation

For environment recreation, MLflow automatically logs conda.yaml and requirements.txt files whenever a model is logged. These files can then be used to reinstall dependencies using either conda or pip.
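For example (a sketch, assuming the resnet18_model directory saved above), recreating the environment is a one-liner with either tool:

# with conda
conda env create -f resnet18_model/conda.yaml

# ...or with pip
pip install -r resnet18_model/requirements.txt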

When saving a model, MLflow provides the option to pass in a conda environment as a parameter, which can contain the dependencies used by the model. If no conda environment is provided, a default environment is created based on the flavor of the model. This conda environment is then saved in conda.yaml. You can read more about this here.

Input and Output schema

MLflow also allows you to specify extra metadata, such as "What inputs does the model expect?" and "What output does it produce?". This additional metadata about model inputs and outputs can be used by downstream tooling like the MLflow model server. And if you use the python_function flavor, MLflow will validate each input against the input schema. Again, we won't be discussing this in detail, but if you are interested, you can read more about it here.
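As a quick illustration (a minimal sketch, reusing the x tensor and model from above; the save path here is just for demonstration), a signature can be inferred from example data and passed along when saving:

from mlflow.models.signature import infer_signature

# Derive the input/output schema from example data
signature = infer_signature(x.numpy(), model(x).detach().numpy())

# The signature gets stored in the MLmodel file
mlflow.pytorch.save_model(model, 'resnet18_with_schema', signature=signature)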

In summary, when you save your ML model using MLflow, you get the following benefits out-of-the-box: a unified API for saving and loading (via flavors), automatically logged conda.yaml and requirements.txt files for environment recreation, and optional input/output schemas for validation.

IMO, this is all the information you need to deploy any ML model in production. Additionally, having a standard convention for storing all this information is a huge advantage, because it allows others to use your model easily and makes it effortless to develop tools for downstream tasks, regardless of the framework that was used to train/build the model.

Time to write some code. I told you that MLflow will automatically define a "python function" flavor when you save your model using any standard flavor. So, let's try to load our resnet18_model using mlflow.pyfunc.

## load pyfunc flavor
model_pyfunc = mlflow.pyfunc.load_model('resnet18_model')
model_pyfunc.predict(x.numpy())
array([[-0.3058473 ,  0.0832468 ,  0.38781884, ..., -1.1486118 ,
        -0.7298596 ,  0.8924906 ],
       [-0.30963922,  0.10418254,  0.39320928, ..., -1.1335343 ,
        -0.74505943,  0.8977703 ],
       [-0.2785825 ,  0.06999737,  0.39136988, ..., -1.1411592 ,
        -0.7524246 ,  0.8632568 ],
       [-0.30011272,  0.08436975,  0.3953184 , ..., -1.1422383 ,
        -0.7507808 ,  0.8822411 ]], dtype=float32)

You will get the exact same output; go ahead and verify it yourself. When you load an MLflow model using mlflow.pyfunc, you have to remember two important things. First, you have to call the predict method to get model predictions. Second, "python_function" models only support pandas DataFrames and NumPy arrays as input, so you have to convert your input tensor to a NumPy array before passing it to the predict method.

The ResNet18 model that we saved above using mlflow.pytorch.save_model() works with tensors, i.e. tensor in and tensor out. In the case of "python function", it's nd-array in and nd-array out. But in reality, you will not get the data as a tensor or nd-array. Plus, you will have to transform the raw input before you can feed it to the model. For example, when you load an image, you have to resize it, then normalize it, and finally convert it into a tensor. Sometimes this transformation logic is generic, and in other situations models have very specific transformations. Hence, repeating myself again, it's generally a good practice in MLOps to package the complete pipeline (not just the model).

Saving as pyfunc

MLflow allows you to package any Python function as a model. This is extremely powerful and flexible at the same time, because it allows us to package the complete pipeline (both the pre-processing and post-processing logic along with the ML model). Let me show you . . .

import torch
import mlflow

Before packaging the actual model, let's first record our environment.

from sys import version_info

PYTHON_VERSION = "{major}.{minor}.{micro}".format(major=version_info.major,
                                                  minor=version_info.minor,
                                                  micro=version_info.micro)


# Create a Conda environment for the new MLflow Model that contains all necessary dependencies.
import cloudpickle

conda_env = {
    'channels': ['defaults'],
    'dependencies': [
      'python={}'.format(PYTHON_VERSION),
      'pip',
      {
        'pip': [
          'mlflow', 
          'torch=={}'.format(torch.__version__),
          'cloudpickle=={}'.format(cloudpickle.__version__),
        ],
      },
    ],
    'name': 'mlflow_env'
}

This might look like a lot of code, but actually it's just a Python dictionary, and you can reuse the same dictionary. Plus, MLflow provides a helper function to get the default dictionary for each flavor so that you don't have to search for it. For example, you can access the default conda environment for PyTorch as follows:

mlflow.pytorch.get_default_conda_env()
{'name': 'mlflow-env',
 'channels': ['conda-forge'],
 'dependencies': ['python=3.8.12',
  'pip',
  {'pip': ['mlflow', 'torch==1.10.1', 'cloudpickle==2.0.0']}]}

Now that we have defined our conda environment, let's package the model . . .

First, you must define a Python class that inherits from mlflow.pyfunc.PythonModel, define predict(), and optionally define load_context(). The load_context() method is called as soon as you load the model, and to make predictions you have to call the predict() method, because it is a python_function flavor MLflow model. For simplicity, you can think of load_context() as the __init__() method and predict() as the forward() method of a PyTorch model.

# Define the model class

class PytorchWrapper(mlflow.pyfunc.PythonModel):    
    def load_context(self, context):
        import json
        from torchvision import models, transforms
        
        
        self.transforms = transforms.Compose([
                transforms.Resize(224),
                transforms.ToTensor(),
                transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
                ])
        
        self.class_names = json.load(open(context.artifacts['imagenet_class_index']))
        
        self.model = models.resnet18(pretrained=True)
        self.model.eval()
    
    def predict(self, context, image):
        import torch
        from torch.nn import functional as F
    
        with torch.no_grad():
            # pre-processing
            image = self.transforms(image).float()
            image = image.unsqueeze(0)
            
            # prediction
            results = self.model(image)

            # post-processing
            probs = F.softmax(results, dim=-1)
            idx = torch.argmax(probs.data).item()
            
            class_name = self.class_names[str(idx)][1] 
            conf = float(probs[0][idx])
        
        return class_name, conf

Some models also require artifact files like model weights, a vocab, a config.yaml, etc. To package all the required artifacts along with the model, we have to create an artifacts dictionary (containing <name, artifact_uri> entries) and pass it while saving the model. Our mlflow.pyfunc.PythonModel can reference these entries via the artifacts property of the context parameter in the predict() and load_context() methods.

Here, we need the imagenet_class_index.json file to map class indices to class names. You can download the imagenet_class_index.json file from here and save it in the present working directory.

artifacts = {
    "imagenet_class_index" : 'imagenet_class_index.json', # file path in your system
}

We now have all the pieces (conda environment, model class, and artifacts dictionary). Finally, time to save our model.

# Save the MLflow Model
mlflow.pyfunc.save_model(
        path="resnet18_mlflow_pyfunc", 
        python_model=PytorchWrapper(), 
        artifacts=artifacts,
        conda_env=conda_env
)

Done! We have successfully packaged a custom ML model along with its pre-processing and post-processing logic, and saved it using MLflow. I know this is not the best explanation possible and might look confusing to some. I would recommend reading the above code again, looking at a few more examples, and finally trying to save your own custom model. If you are using MLflow for the first time, it will take some time to get used to it.

Let's load our model back and try to make some predictions . . .

# Load the model in `python_function` format
loaded_model = mlflow.pyfunc.load_model('resnet18_mlflow_pyfunc')
import requests
from io import BytesIO
from PIL import Image
image_url = 'https://th.bing.com/th/id/OIP.NqzP6hNEcRDD-eHfSRvUFwHaJB?pid=ImgDet&rs=1'
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))
results = loaded_model.predict(image)
results
('Bernese_mountain_dog', 0.7799622416496277)

This was all about packaging ML models using MLflow. Just create a wrapper class, define an artifacts dictionary and another dictionary for the conda environment, and your model can then be used like any other Python function.

Note: Make sure you install all the required dependencies, because mlflow.pyfunc does not check for dependencies.
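For this particular example, that means installing something like the following up front (an assumption based on the imports used by the wrapper above; pin versions as needed):

pip install mlflow torch torchvision pillow requests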

I think the most important takeaway from this blog is this:

Stop thinking in terms of Models, and start thinking in terms of Pipelines. Always, package and test the complete pipeline.
