gRPC Client


TorchServeClientGRPC

 TorchServeClientGRPC (base_url=None, management_port=7071,
                       inference_port=7070)

Create a client for TorchServe's gRPC management and inference APIs. If base_url is not provided, it defaults to localhost.

To create a gRPC client, simply instantiate a TorchServeClientGRPC object:

# Initialize the gRPC TorchServeClient object
ts_client = TorchServeClientGRPC()
ts_client
TorchServeClientGRPC(base_url=localhost, management_port=7071, inference_port=7070)

To customize the base URL and the default ports, pass them as arguments during initialization:

# Initialize the gRPC TorchServeClient object
ts_client = TorchServeClientGRPC(base_url='http://your-torchserve-server.com',
                                 management_port=7071, inference_port=7070)
ts_client
TorchServeClientGRPC(base_url=your-torchserve-server.com, management_port=7071, inference_port=7070)

Management APIs

Here is the list of all the supported gRPC management endpoints:

  • describe_model: Provide detailed information about the default version of a model

    Arguments:

    • model_name (str, required): Name of the model to describe.

    • model_version (str, optional): Version of the model to describe.

    • customized (bool, optional): Whether to return customized metadata (default: False).

    Usage:

    response = ts_client.management.describe_model(model_name="mnist")
    response.msg
  • list_models: List all registered models in TorchServe

    Arguments:

    • limit (int, optional): Maximum number of items to return (default: 100).

    • next_page_token (int, optional): Token to retrieve the next set of results.

    Usage:

    response = ts_client.management.list_models()
    response.msg
  • register_model: Register a new model with TorchServe

    Arguments:

    • batch_size (int, optional): Inference batch size (default: 1).

    • handler (str, optional): Inference handler entry-point.

    • initial_workers (int, optional): Number of initial workers (default: 0).

    • max_batch_delay (int, optional): Maximum delay for batch aggregation (default: 100).

    • model_name (str, optional): Name of the model.

    • response_timeout (int, optional): Maximum time for model response (default: 120 seconds).

    • runtime (str, optional): Runtime for model custom service code.

    • synchronous (bool, optional): Synchronous worker creation (default: False).

    • url (str, required): Model archive download URL.

    • s3_sse_kms (bool, optional): S3 SSE KMS enabled (default: False).

    Usage:

    # url must point to a model archive (.mar) reachable by your TorchServe instance
    response = ts_client.management.register_model(url="mnist.mar")
    response.msg
  • scale_worker: Configure the number of workers for a model. This is an asynchronous call by default.

    Arguments:

    • model_name (str, required): Name of the model whose workers should be scaled.

    • model_version (str, optional): Model version.

    • max_worker (int, optional): Maximum number of worker processes.

    • min_worker (int, optional): Minimum number of worker processes.

    • number_gpu (int, optional): Number of GPU worker processes to create.

    • synchronous (bool, optional): Synchronous call (default: False).

    • timeout (int, optional): Wait time for worker completion (0: terminate immediately, -1: wait infinitely).

    Usage:

    response = ts_client.management.scale_worker(model_name="mnist", min_worker=2)
    response.msg
  • set_default: Set the default version of a model

    Arguments:

    • model_name (str, required): Name of the model for which the default version should be updated.

    • model_version (str, required): Version of the model to set as the default version.

    Usage:

    response = ts_client.management.set_default(model_name="mnist", model_version="1.0")
    response.msg
  • unregister_model: Unregister a particular version of a model from TorchServe. This call is asynchronous by default.

    Arguments:

    • model_name (str, required): Name of the model to unregister.

    • model_version (str, optional): Version of the model to unregister. If not provided, the default version of the model will be unregistered.

    Usage:

    response = ts_client.management.unregister_model(model_name="mnist")
    response.msg

Check the management.proto file to better understand the arguments of each method.
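
Putting these management calls together, here is a minimal sketch of a typical workflow: register a model, scale its workers, inspect it, and finally unregister it. It reuses the ts_client created above; the archive name "mnist.mar" is only a placeholder and must point to a model archive your TorchServe instance can actually reach.

# Minimal management workflow sketch (placeholder values: "mnist.mar", "mnist")
# Register the model and start one worker, waiting for the worker to come up
response = ts_client.management.register_model(
    url="mnist.mar", model_name="mnist", initial_workers=1, synchronous=True
)
print(response.msg)

# Scale to two workers and wait for scaling to complete
print(ts_client.management.scale_worker(model_name="mnist", min_worker=2, synchronous=True).msg)

# Confirm registration and inspect the model's configuration
print(ts_client.management.list_models().msg)
print(ts_client.management.describe_model(model_name="mnist").msg)

# Clean up once the model is no longer needed
print(ts_client.management.unregister_model(model_name="mnist").msg)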

Inference APIs

Here is the list of supported gRPC inference endpoints:

  • ping: Check health status of the TorchServe server

    Usage:

    response = ts_client.inference.ping()
    response.health
  • predictions: Get predictions

    Arguments:

    • model_name (str, required): Name of the model.

    • model_version (str, optional): Version of the model. If not provided, default version will be used.

    • input (Dict[str, bytes], required): Input data for the prediction, as a mapping from field name to raw bytes.

    Usage:

    # `data` holds the raw input bytes, e.g. an image file read in binary mode
    response = ts_client.inference.predictions(model_name="mnist", input={"data": data})
    response.prediction.decode("utf-8")
  • stream_predictions: Get streaming predictions

    Arguments:

    • model_name (str, required): Name of the model.

    • model_version (str, optional): Version of the model. If not provided, default version will be used.

    • input (Dict[str, bytes], required): Input data for the prediction, as a mapping from field name to raw bytes.

    Usage:

    response = ts_client.inference.stream_predictions(model_name="mnist", input={"data": data})
    response.prediction.decode("utf-8")

Again, for more details about the gRPC request and response objects, refer to inference.proto.
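
As a minimal end-to-end sketch of the inference API, the snippet below checks server health and sends a single input to the mnist model. It reuses the ts_client created earlier; the file name "0.png" is only a placeholder for whatever input your model's handler accepts.

# Minimal inference sketch (placeholder input file: "0.png")
with open("0.png", "rb") as f:
    data = f.read()  # input values must be raw bytes

# Health check before sending traffic
print(ts_client.inference.ping().health)

# Standard prediction: `input` maps field names to raw bytes
response = ts_client.inference.predictions(model_name="mnist", input={"data": data})
print(response.prediction.decode("utf-8"))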