# Online Inference

This tutorial shows how to use trained PyTorch, TensorFlow, and ONNX (format) models, written in Python, directly in HPC workloads written in Fortran, C, C++ and Python.

The example simulation here is written in Python for brevity; however, the inference API in SmartRedis is the same across all clients (apart from extra parameters in the compiled languages).


## Installing the ML backends

In order to use the `Orchestrator` database as an inference engine, the Machine Learning (ML) backends need to be built and supplied to the database at runtime. 

To check which backends are built, a simple helper function is available in SmartSim as shown below.

In [1]:
# list the ML backends that are currently built
from smartsim._core.utils.helpers import installed_redisai_backends
print(installed_redisai_backends())

{'torch'}


As you can see, only the Torch backend is built. In order to use the TensorFlow and ONNX backends as well, they need to be built.
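A quick way to see what still needs building is to compare the installed set against the backends a workflow requires. The sketch below is only an illustration: `missing_backends` is a hypothetical helper, not part of SmartSim, and the labels `'tensorflow'` and `'onnxruntime'` are assumptions about how the other two backends would be reported.

```python
def missing_backends(installed, required):
    """Return the required backends that are not installed, sorted by name."""
    return sorted(set(required) - set(installed))


# `installed` mirrors the output shown above; in a live session it would be
# the return value of installed_redisai_backends().
installed = {"torch"}

# Assumed labels for the TensorFlow and ONNX backends (not confirmed above).
required = {"torch", "tensorflow", "onnxruntime"}

missing = missing_backends(installed, required)
print(missing)
```

An empty list would mean that every required backend is already available.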

The `smart` command line interface can be used to build the backends using the `smart build` command. The output of `smart build --help` is shown below.

In [2]:
!smart build --help

usage: smart build [-h] [-v] [--device {cpu,gpu}] [--only_python_packages]
                   [--no_pt] [--no_tf] [--onnx] [--torch_dir TORCH_DIR]
                   [--libtensorflow_dir LIBTENSORFLOW_DIR] [--keydb]

Build SmartSim dependencies (Redis, RedisAI, ML runtimes)

options:
  -h, --help            show this help message and exit
  -v                    Enable verbose build process
  --device {cpu,gpu}    Device to build ML runtimes for
  --only_python_packages
                        Only evaluate the python packages (i.e. skip building
                        backends)
  --no_pt               Do not build PyTorch backend
  --no_tf               Do not build TensorFlow backend
  --onnx                Build ONNX backend (off by default)
  --torch_dir TORCH_DIR
                        Path to custom <path>/torch/share/cmake/Torch/
                        directory (ONLY USE IF NEEDED)
  --libtensorflow_dir LIBTENSORFLOW_DIR
                        Path to custom libtensorflow d
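The flags in the help output combine in a simple way: the PyTorch and TensorFlow backends are switched off with `--no_pt` and `--no_tf`, while ONNX is switched on with `--onnx`. The sketch below assembles the argument list for a few combinations. `smart_build_args` is a hypothetical helper written for this tutorial; it only builds the command line and does not run it.

```python
import shlex


def smart_build_args(device="cpu", pytorch=True, tensorflow=True, onnx=False):
    """Assemble a `smart build` argument list from the flags in the help text."""
    if device not in ("cpu", "gpu"):
        raise ValueError("device must be 'cpu' or 'gpu'")
    args = ["smart", "build", "--device", device]
    if not pytorch:
        args.append("--no_pt")   # skip the PyTorch backend
    if not tensorflow:
        args.append("--no_tf")   # skip the TensorFlow backend
    if onnx:
        args.append("--onnx")    # ONNX is off unless requested
    return args


all_three = smart_build_args(onnx=True)
print(shlex.join(all_three))

onnx_only = smart_build_args(pytorch=False, tensorflow=False, onnx=True)
print(shlex.join(onnx_only))
```

The resulting list could be passed to `subprocess.run` if the build needs to be scripted.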

We use `smart clean` first to remove the previous build, and then call `smart build` to build the new backend set. For larger teams, CrayLabs will help set up your system so that the backends do not have to be built by each user.

By default, the PyTorch and TensorFlow backends are built. To build all three backends for use on CPU, we issue the following command.

In [3]:
!smart clean && smart build --device cpu --onnx

[SmartSim] INFO Successfully removed existing RedisAI installation
[SmartSim] INFO Successfully removed ML runtimes
[SmartSim] INFO Running SmartSim build process...
[SmartSim] INFO Checking requested versions...
[SmartSim] INFO Checking for build tools...
[SmartSim] INFO Redis build complete!

ML Backends Requested
╒════════════╤════════╤══════╕
│ PyTorch    │ 2.0.1  │ True │
│ TensorFlow │ 2.13.1 │ True │
│ ONNX       │ 1.14.1 │ True │
╘════════════╧════════╧══════╛

Building for GPU support: False

[SmartSim] INFO Building RedisAI version 1.2.7 from https://github.com/RedisAI/RedisAI.git/
[SmartSim] INFO ML Backends and RedisAI build complete!
[SmartSim] INFO Tensorflow, Onnxruntime, Torch backend(s) built
[SmartSim] INFO SmartSim build complete!


## Starting the Database for Inference

SmartSim performs online inference by using the SmartRedis clients to call into the
Machine Learning (ML) runtimes linked into the Orchestrator database. The Orchestrator
is the name in SmartSim for a Redis or KeyDB database with a RedisAI module built
into it with the ML runtimes.

Therefore, to perform inference, you must first create an Orchestrator database and
launch it. There are two methods to couple the database to your application in
order to add inference capability to your application.
 - standard (not colocated)
 - colocated
 
`standard` mode launches an optionally clustered (across many compute hosts) database instance
that can be treated as a single storage device for many clients (possibly the many ranks
of an MPI program) where there is a single address space for keys across all hosts.

`colocated` mode launches an Orchestrator instance on each compute host used by a
(possibly distributed) application. Each instance has its own address space
for keys. In SmartSim, `Model` instances can be launched with a colocated Orchestrator
through `Model.colocate_db_tcp` or `Model.colocate_db_uds`. Colocated `Model`s are used for
highly scalable inference where global aggregations aren't necessary for inference.
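The practical difference between the two modes is key visibility. The toy sketch below uses plain dictionaries as stand-ins for database key spaces (`ToyStore` and the host names are hypothetical; no SmartSim or Redis calls are involved) to show that a key written on one host is visible everywhere in `standard` mode but only locally in `colocated` mode.

```python
class ToyStore:
    """Stand-in for one database key space (a plain dict, not a real Orchestrator)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def exists(self, key):
        return key in self._data


hosts = ["node0", "node1"]

# standard: every host talks to the same logical key space
standard = ToyStore()
standard_view = {host: standard for host in hosts}

# colocated: each host gets its own private key space
colocated_view = {host: ToyStore() for host in hosts}

# a process on node0 writes a key in each deployment
standard_view["node0"].put("input", [1.0, 2.0])
colocated_view["node0"].put("input", [1.0, 2.0])

# a process on node1 then looks for that key
print(standard_view["node1"].exists("input"))   # True
print(colocated_view["node1"].exists("input"))  # False
```

This is why colocated deployment suits workloads in which each rank only needs the tensors it produced itself.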

The code below launches the `Orchestrator` database using the `standard` deployment
method.

In [4]:
# some helper libraries for the tutorial
import io
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import logging
import numpy as np

# import smartsim and smartredis
from smartredis import Client
from smartsim import Experiment

In [5]:
exp = Experiment("Inference-Tutorial", launcher="local")

In [6]:
db = exp.create_database(port=6780, interface="lo")
exp.start(db)

## Using PyTorch

The Orchestrator supports both [PyTorch](https://pytorch.org/)
models and [TorchScript](https://pytorch.org/docs/stable/jit.html) functions and scripts.

The code below creates, jit-traces (prepares for inference), sets,
and calls a PyTorch Convolutional Neural Network (CNN) with SmartSim and SmartRedis.

In [7]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
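The `9216` in `nn.Linear(9216, 128)` follows from the layer shapes. The short calculation below is a sketch that assumes 28 x 28 single-channel inputs (the size used for the sample input later in this tutorial) and the default stride of `max_pool2d`, which equals its kernel size. `conv_out` is a helper defined here for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of one spatial dimension after a convolution or pooling."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                    # input images are 28 x 28
size = conv_out(size, kernel=3)              # conv1: 3x3 kernel, stride 1 -> 26
size = conv_out(size, kernel=3)              # conv2: 3x3 kernel, stride 1 -> 24
size = conv_out(size, kernel=2, stride=2)    # max_pool2d(x, 2) -> 12

channels = 64                                # output channels of conv2
flattened = channels * size * size
print(flattened)  # 9216
```

A different input resolution would change this number, so `fc1` would need to be resized accordingly.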


To set a PyTorch model, we create a function to "jit-trace" the model
and save it to a buffer in memory.

If you aren't familiar with the concept of tracing, take a look at the
Torch documentation for [trace](https://pytorch.org/docs/stable/generated/torch.jit.trace.html#torch.jit.trace).


In [8]:
# Initialize an instance of our CNN model
n = Net()
n.eval()

# prepare a sample input to trace on (random noise is fine)
example_forward_input = torch.rand(1, 1, 28, 28)

def create_torch_model(torch_module, example_forward_input):

    # perform the trace of the nn.Module.forward() method
    module = torch.jit.trace(torch_module, example_forward_input)

    # save the traced module to a buffer
    model_buffer = io.BytesIO()
    torch.jit.save(module, model_buffer)
    return model_buffer.getvalue()

traced_cnn = create_torch_model(n, example_forward_input)

Lastly, we use the SmartRedis Python client to

1. Connect to the database
2. Put a batch of 20 tensors into the database  (``put_tensor``)
3. Set the Torch model in the database (``set_model``)
4. Run the model on the batch of tensors (``run_model``)
5. Retrieve the result (``get_tensor``)


In [9]:
client = Client(address=db.get_address()[0], cluster=False)

client.put_tensor("input", torch.rand(20, 1, 28, 28).numpy())

# put the PyTorch CNN in the database in CPU memory
client.set_model("cnn", traced_cnn, "TORCH", device="CPU")

# execute the model, supports a variable number of inputs and outputs
client.run_model("cnn", inputs=["input"], outputs=["output"])

# get the output
output = client.get_tensor("output")
print(f"Prediction: {output}")

Prediction: [[-2.1860428 -2.3318565 -2.2773128 -2.2742267 -2.2679536 -2.304159
  -2.423439  -2.3406057 -2.2474668 -2.3950338]
 [-2.1803837 -2.3286302 -2.2805855 -2.2874444 -2.261593  -2.3145547
  -2.4357762 -2.3169715 -2.2618299 -2.3798223]
 [-2.1833746 -2.3249795 -2.28497   -2.2851245 -2.2555952 -2.308204
  -2.4274755 -2.3441646 -2.2553194 -2.3779805]
 [-2.1843016 -2.3395848 -2.2619352 -2.294549  -2.2571433 -2.312943
  -2.4161577 -2.338785  -2.2538524 -2.3881512]
 [-2.1936755 -2.3315516 -2.2739122 -2.2832148 -2.2666094 -2.3038912
  -2.4211216 -2.3300066 -2.2564852 -2.3846986]
 [-2.1709712 -2.3271346 -2.280365  -2.286064  -2.2617233 -2.3227994
  -2.4253702 -2.3313646 -2.2593162 -2.383301 ]
 [-2.1948013 -2.3318067 -2.2713811 -2.2844    -2.2526758 -2.3178148
  -2.4255004 -2.3233378 -2.2388031 -2.4088087]
 [-2.17515   -2.3240736 -2.2818787 -2.2857373 -2.259629  -2.3184
  -2.425821  -2.3519678 -2.2413275 -2.385761 ]
 [-2.187554  -2.3335872 -2.2767708 -2.2818003 -2.2654893 -2.3097534
  -2.4

As we gave the CNN random noise, the predictions reflect that.
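To make that concrete: the network ends in `log_softmax` over 10 classes, so a completely undecided model would output `log(1/10)` for every class. The sketch below compares the first row of the printed prediction against that value, using only the standard library. The row is copied from the output above.

```python
import math

num_classes = 10

# log-probability of each class if the model were perfectly undecided
uniform_log_prob = math.log(1.0 / num_classes)
print(round(uniform_log_prob, 4))  # -2.3026

# first row of the prediction printed above
row = [-2.1860428, -2.3318565, -2.2773128, -2.2742267, -2.2679536,
       -2.304159, -2.423439, -2.3406057, -2.2474668, -2.3950338]

# exponentiating log_softmax outputs recovers probabilities that sum to ~1
total = sum(math.exp(v) for v in row)

# largest deviation of any class from the uniform value
max_dev = max(abs(v - uniform_log_prob) for v in row)
print(total, max_dev)
```

Every entry stays close to the uniform value, which is what an untrained network fed with noise would be expected to produce.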

If running on GPU, be sure to change the argument in the ``set_model`` call
above to ``device="GPU"``.

## Using TorchScript

In addition to PyTorch models, TorchScript scripts and functions can be set in the
Orchestrator database and called from any of the SmartRedis languages. Functions
can be set in the database in Python prior to application launch and then used
directly in Fortran, C, and C++ simulations.

The example below uses the TorchScript Singular Value Decomposition (SVD) function.
The function is set inside the database and then called with a random input
tensor.


In [10]:
def calc_svd(input_tensor):
    # svd function from TorchScript API
    return input_tensor.svd()

In [11]:
# connect a client to the database
client = Client(address=db.get_address()[0], cluster=False)

# test the SVD function
tensor = np.random.randint(0, 100, size=(5, 3, 2)).astype(np.float32)
client.put_tensor("input", tensor)
client.set_function("svd", calc_svd)
client.run_script("svd", "calc_svd", ["input"], ["U", "S", "V"])
U = client.get_tensor("U")
S = client.get_tensor("S")
V = client.get_tensor("V")
print(f"U: {U}\n\n, S: {S}\n\n, V: {V}\n")

U: [[[-0.31189808  0.86989427]
  [-0.48122275 -0.49140105]
  [-0.81923395 -0.0425336 ]]

 [[-0.5889101  -0.29554686]
  [-0.43949458 -0.66398275]
  [-0.6782547   0.68686163]]

 [[-0.61623317  0.05853765]
  [-0.6667615  -0.5695148 ]
  [-0.4191489   0.81989413]]

 [[-0.5424681   0.8400398 ]
  [-0.31990844 -0.2152339 ]
  [-0.77678    -0.49800384]]

 [[-0.43667376  0.8088193 ]
  [-0.70812154 -0.57906115]
  [-0.5548693   0.10246649]]]

, S: [[137.10924   25.710997]
 [131.49983   37.79937 ]
 [178.72423   24.792084]
 [125.13014   49.733784]
 [137.48834   53.57199 ]]

, V: [[[-0.8333395   0.5527615 ]
  [-0.5527615  -0.8333395 ]]

 [[-0.5085228  -0.8610485 ]
  [-0.8610485   0.5085228 ]]

 [[-0.8650402   0.5017025 ]
  [-0.5017025  -0.8650402 ]]

 [[-0.56953645  0.8219661 ]
  [-0.8219661  -0.56953645]]

 [[-0.6115895   0.79117525]
  [-0.79117525 -0.6115895 ]]]
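One way to sanity-check the result is to rebuild the input from its factors, since `svd()` returns `U`, `S`, and `V` such that the input equals `U · diag(S) · Vᵀ`. The sketch below does this in plain Python for the first matrix of the batch, with the values copied from the output above.

```python
# first matrix of the batch, copied from the U, S, and V output above
U = [[-0.31189808, 0.86989427],
     [-0.48122275, -0.49140105],
     [-0.81923395, -0.0425336]]
S = [137.10924, 25.710997]
V = [[-0.8333395, 0.5527615],
     [-0.5527615, -0.8333395]]

# A[i][j] = sum_k U[i][k] * S[k] * V[j][k], i.e. A = U @ diag(S) @ V.T
A = [[sum(U[i][k] * S[k] * V[j][k] for k in range(2)) for j in range(2)]
     for i in range(3)]

rounded = [[round(x) for x in row] for row in A]
print(rounded)
```

The reconstructed entries land within rounding error of whole numbers between 0 and 100, which is consistent with the integer-valued random input created by `np.random.randint(0, 100, ...)`.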



In [12]:
## TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
tf.get_logger().setLevel(logging.ERROR)

# create a simple Fully connected network in Keras
model = keras.Sequential(
    layers=[
        keras.layers.InputLayer(input_shape=(28, 28), name="input"),
        keras.layers.Flatten(input_shape=(28, 28), name="flatten"),
        keras.layers.Dense(128, activation="relu", name="dense"),
        keras.layers.Dense(10, activation="softmax", name="output"),
    ],
    name="FCN",
)

# Compile model with optimizer
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

### Setting TensorFlow and Keras Models

After a model is created (trained or not), the graph of the model is
frozen and saved to file so the client method `client.set_model_from_file`
can load it into the database.

SmartSim includes a utility to freeze the graph of a TensorFlow or Keras model in
`smartsim.ml.tf`. To use TensorFlow or Keras in SmartSim, specify
`TF` as the argument for *backend* in the call to `client.set_model` or
`client.set_model_from_file`.

Note that TensorFlow and Keras, unlike the other ML libraries supported by
SmartSim, require `inputs` and `outputs` arguments in the call to
`set_model`. These arguments correspond to the layer names of the
created model. The `smartsim.ml.tf.freeze_model` utility
returns these values for convenience as shown below.

In [13]:
from smartsim.ml.tf import freeze_model

# SmartSim utility for Freezing the model and saving it to a file.
model_path, inputs, outputs = freeze_model(model, os.getcwd(), "fcn.pb")

# use the same client we used for PyTorch to set the TensorFlow model;
# this time the method for setting a model from a saved file is shown.
# The TensorFlow backend requires named inputs and outputs on the graph;
# this differs from PyTorch and ONNX.
client.set_model_from_file(
    "keras_fcn", model_path, "TF", device="CPU", inputs=inputs, outputs=outputs
)

# put a random input tensor into the database
input_data = np.random.rand(1, 28, 28).astype(np.float32)
client.put_tensor("input", input_data)

# run the Fully Connected Network model on the tensor we just put
# in and store the result of the inference at the "output" key
client.run_model("keras_fcn", "input", "output")

# get the result of the inference
pred = client.get_tensor("output")
print(pred)

[[0.05032112 0.06484107 0.03512685 0.14747524 0.14440396 0.02395445
  0.03395916 0.06222691 0.26738793 0.1703033 ]]
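Because the last Keras layer uses a `softmax` activation, each output row is a probability distribution over the 10 classes (unlike the PyTorch example, which returned log-probabilities). The sketch below reads the row printed above in plain Python to find the most likely class.

```python
# prediction row copied from the output above
pred_row = [0.05032112, 0.06484107, 0.03512685, 0.14747524, 0.14440396,
            0.02395445, 0.03395916, 0.06222691, 0.26738793, 0.1703033]

# softmax outputs should sum to (approximately) one
total = sum(pred_row)

# index of the class with the highest probability
best_class = max(range(len(pred_row)), key=lambda i: pred_row[i])
print(best_class, pred_row[best_class])
```

Since the model is untrained and the input is random, the selected class carries no meaning here; the same steps apply to a trained model.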


## Using ONNX

ONNX is a standard format for representing models. A number of different machine learning
libraries are supported by ONNX and can be readily used with SmartSim.

Some popular ones are:


- [Scikit-learn](https://scikit-learn.org)
- [XGBoost](https://xgboost.readthedocs.io)
- [CatBoost](https://catboost.ai)
- [LightGBM](https://lightgbm.readthedocs.io/en/latest/)
- [libsvm](https://www.csie.ntu.edu.tw/~cjlin/libsvm/)


Other libraries that are not listed here are supported as well. There are also many tools
to help convert models to ONNX:

- [onnxmltools](https://github.com/onnx/onnxmltools)
- [skl2onnx](https://github.com/onnx/sklearn-onnx/)
- [tensorflow-onnx](https://github.com/onnx/tensorflow-onnx/)


PyTorch also has its own converter.

Below are some examples of a few models in [Scikit-learn](https://scikit-learn.org)
that are converted into ONNX format for use with SmartSim. To use ONNX in SmartSim, specify
`ONNX` as the argument for *backend* in the call to `client.set_model` or
`client.set_model_from_file`.

### Scikit-Learn K-means Cluster


K-means clustering is an unsupervised ML algorithm. It is used to categorize data points
into functional groups ("clusters"). Scikit-learn has a built-in implementation of K-means clustering,
and it is easily converted to ONNX for use with SmartSim through
[skl2onnx.to_onnx](http://onnx.ai/sklearn-onnx/auto_examples/plot_convert_syntax.html).

Since the KMeans model returns two outputs, we provide the `client.run_model` call
with two output key names.


In [14]:
from skl2onnx import to_onnx
from sklearn.cluster import KMeans

In [15]:

X = np.arange(20, dtype=np.float32).reshape(10, 2)
tr = KMeans(n_clusters=2)
tr.fit(X)

# save the trained k-means model in memory with skl2onnx
kmeans = to_onnx(tr, X, target_opset=11)
model = kmeans.SerializeToString()

# sample input data (the same values used for training above)
sample = np.arange(20, dtype=np.float32).reshape(10, 2)

# use the same client from the TensorFlow and PyTorch examples.
client.put_tensor("input", sample)
client.set_model("kmeans", model, "ONNX", device="CPU")
client.run_model("kmeans", inputs="input", outputs=["labels", "transform"])

print(client.get_tensor("labels"))

[1 1 1 1 1 0 0 0 0 0]
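The two outputs are related: a label is the index of the nearest centroid. The sketch below reproduces the labels in plain Python under two assumptions that are not confirmed by the output above: that the fitted centroids are the means of the first and last five training points, and that the second output mirrors scikit-learn's `KMeans.transform` (the distance from each point to each centroid).

```python
import math

# assumed centroids: the means of the last five and first five training points,
# ordered so that the labels match the output printed above
centroids = [(14.0, 15.0), (4.0, 5.0)]

# same points as np.arange(20).reshape(10, 2)
points = [(float(2 * i), float(2 * i + 1)) for i in range(10)]

# "transform": distance from every point to every centroid
transform = [[math.dist(p, c) for c in centroids] for p in points]

# "labels": index of the nearest centroid for every point
labels = [row.index(min(row)) for row in transform]
print(labels)  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

Cluster indices are arbitrary, so a different run of `KMeans` could swap the 0 and 1 labels while describing the same grouping.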


### Scikit-Learn Random Forest

The Random Forest example uses the Iris dataset from Scikit-learn to train a
`RandomForestRegressor`. As with the other examples, the skl2onnx function
`skl2onnx.to_onnx` is used to convert the model to ONNX format.


In [16]:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

In [17]:
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=13)
clr = RandomForestRegressor(n_jobs=1, n_estimators=100)
clr.fit(X_train, y_train)

rf_model = to_onnx(clr, X_test.astype(np.float32), target_opset=11)

sample = np.array([[6.4, 2.8, 5.6, 2.2]]).astype(np.float32)
model = rf_model.SerializeToString()

client.put_tensor("input", sample)
client.set_model("rf_regressor", model, "ONNX", device="CPU")
client.run_model("rf_regressor", inputs="input", outputs="output")
print(client.get_tensor("output"))

[[1.9999987]]
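The regressor returns a continuous value rather than a class. One simple way to read it, shown in the sketch below, is to round to the nearest Iris target index. Rounding is an illustrative choice made here, not something the tutorial prescribes, and a classifier would be the more natural model if class labels are the goal.

```python
# Iris class order used by scikit-learn's load_iris (indices 0, 1, 2)
target_names = ["setosa", "versicolor", "virginica"]

prediction = 1.9999987          # value printed above
class_index = round(prediction)
print(class_index, target_names[class_index])  # 2 virginica
```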


In [18]:
exp.stop(db)

In [19]:
exp.summary(style="html")

| | Name | Entity-Type | JobID | RunID | Time | Status | Returncode |
|---|---|---|---|---|---|---|---|
| 0 | orchestrator_0 | DBNode | 31857 | 0 | 32.7161 | Cancelled | 0 |


# Colocated Deployment

A colocated Orchestrator is a special type of Orchestrator that is deployed
on the same compute hosts as a `Model` instance defined by the user. In this
deployment, the database is not connected together in a cluster and each shard
of the database is addressed individually by the processes running on that compute
host. This is particularly important for GPU-intensive workloads which require
frequent communication with the database.

<img src="https://www.craylabs.org/docs/_images/co-located-orc-diagram.png" alt="lattice" width="600"/>


In [20]:
# create colocated model
colo_settings = exp.create_run_settings(
    exe="python",
    exe_args="./colo-db-torch-example.py"
)

colo_model = exp.create_model("colocated_model", colo_settings)
colo_model.colocate_db_tcp(
    port=6780,
    db_cpus=1,
    debug=False,
    ifname="lo"
)

In [21]:
exp.start(colo_model, summary=True)

21:18:06 C02G13RYMD6N SmartSim[30945] INFO 

=== Launch Summary ===
Experiment: Inference-Tutorial
Experiment Path: /Users/smartsim/smartsim/tutorials/ml_inference/Inference-Tutorial
Launcher: local
Models: 1
Database Status: inactive

=== Models ===
colocated_model
Executable: /Users/smartsim/venv/bin/python
Executable Arguments: ./colo-db-torch-example.py
Co-located Database: True



21:18:09 C02G13RYMD6N SmartSim[30945] INFO colocated_model(31865): Completed


In [22]:
exp.summary(style="html")

| | Name | Entity-Type | JobID | RunID | Time | Status | Returncode |
|---|---|---|---|---|---|---|---|
| 0 | orchestrator_0 | DBNode | 31857 | 0 | 32.7161 | Cancelled | 0 |
| 1 | colocated_model | Model | 31865 | 0 | 3.5862 | Completed | 0 |
