`TorchServeClientGRPC(base_url=None, management_port=7071, inference_port=7070)`

Initialize self. See help(type(self)) for accurate signature.

To create a gRPC client, simply create a `TorchServeClientGRPC` object:

```python
# Initialize the gRPC TorchServeClient object
ts_client = TorchServeClientGRPC()
ts_client
```
```
TorchServeClientGRPC(base_url=localhost, management_port=7071, inference_port=7070)
```

To customize the base URL and the default ports, pass them as arguments during initialization:
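For instance (the hostname below is an illustrative placeholder, not a library default):

```python
# Point the client at a remote TorchServe host; the hostname is
# an illustrative placeholder.
ts_client = TorchServeClientGRPC(
    base_url="your.torchserve.host",
    management_port=7071,
    inference_port=7070,
)
```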
Here is the list of all the supported gRPC management endpoints:
describe_model: Provide detailed information about the default version of a model.

Arguments:

- model_name (str, required): Name of the model to describe.
- model_version (str, optional): Version of the model to describe.
- customized (bool, optional): Whether to return customized metadata (default: False).

Usage:
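A minimal sketch, assuming the management methods are exposed directly on the client object; the model name "mnist" used here and below is illustrative:

```python
# Describe the default version of an illustrative "mnist" model
response = ts_client.describe_model(model_name="mnist")
print(response)
```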
list_models: List all registered models in TorchServe.

Arguments:

- limit (int, optional): Maximum number of items to return (default: 100).
- next_page_token (int, optional): Token to retrieve the next set of results.

Usage:
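A sketch under the same assumptions:

```python
# List up to 5 registered models; pass next_page_token from a
# previous response to fetch the next page of results
response = ts_client.list_models(limit=5)
print(response)
```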
register_model: Register a new model to TorchServe.

Arguments:

- batch_size (int, optional): Inference batch size (default: 1).
- handler (str, optional): Inference handler entry-point.
- initial_workers (int, optional): Number of initial workers (default: 0).
- max_batch_delay (int, optional): Maximum delay for batch aggregation, in milliseconds (default: 100).
- model_name (str, optional): Name of the model.
- response_timeout (int, optional): Maximum time to wait for the model's response, in seconds (default: 120).
- runtime (str, optional): Runtime for the model's custom service code.
- synchronous (bool, optional): Synchronous worker creation (default: False).
- url (str, required): Model archive download URL.
- s3_sse_kms (bool, optional): Whether the model archive on S3 is encrypted with SSE-KMS (default: False).

Usage:
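A sketch under the same assumptions; the archive URL points at the public TorchServe model zoo and is illustrative:

```python
# Register a model archive and start one worker synchronously
response = ts_client.register_model(
    url="https://torchserve.pytorch.org/mar_files/mnist.mar",
    model_name="mnist",
    initial_workers=1,
    synchronous=True,
)
```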
scale_worker: Configure the number of workers for a model. This is an asynchronous call by default.

Arguments:

- model_name (str, required): Name of the model whose workers should be scaled.
- model_version (str, optional): Model version.
- max_worker (int, optional): Maximum number of worker processes.
- min_worker (int, optional): Minimum number of worker processes.
- number_gpu (int, optional): Number of GPU worker processes to create.
- synchronous (bool, optional): Synchronous call (default: False).
- timeout (int, optional): Wait time for worker completion (0: terminate immediately, -1: wait infinitely).

Usage:
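A sketch under the same assumptions:

```python
# Keep between 1 and 2 workers for the model and wait for
# the scaling to complete before returning
response = ts_client.scale_worker(
    model_name="mnist",
    min_worker=1,
    max_worker=2,
    synchronous=True,
)
```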
set_default: Set the default version of a model.

Arguments:

- model_name (str, required): Name of the model whose default version should be updated.
- model_version (str, required): Version of the model to set as the default.

Usage:
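A sketch under the same assumptions; the version string is illustrative:

```python
# Make version 1.0 the default served version of the model
response = ts_client.set_default(model_name="mnist", model_version="1.0")
```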
unregister_model: Unregister a particular version of a model from TorchServe. This call is asynchronous by default.

Arguments:

- model_name (str, required): Name of the model to unregister.
- model_version (str, optional): Version of the model to unregister. If not provided, the default version of the model will be unregistered.

Usage:
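A sketch under the same assumptions:

```python
# Unregister one version; drop model_version to unregister
# the model's default version instead
response = ts_client.unregister_model(model_name="mnist", model_version="1.0")
```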
Check the management.proto file to better understand the arguments of each method.
Here is the list of all the supported gRPC inference endpoints:
ping: Check the health status of the TorchServe server.

Usage:
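A sketch, assuming the inference methods also hang directly off the client object:

```python
# Check that the TorchServe instance is up and healthy
response = ts_client.ping()
print(response)
```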
predictions: Get predictions from a model.

Arguments:

- model_name (str, required): Name of the model.
- model_version (str, optional): Version of the model. If not provided, the default version will be used.
- input (Dict[str, bytes], required): Input data for model prediction.

Usage:
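A sketch under the same assumptions; the input key "data" and the image file are illustrative:

```python
# Send raw bytes keyed by field name, as the input type suggests
image_bytes = open("0.png", "rb").read()  # hypothetical input file
response = ts_client.predictions(model_name="mnist", input={"data": image_bytes})
print(response)
```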
stream_predictions: Get streaming predictions from a model.

Arguments:

- model_name (str, required): Name of the model.
- model_version (str, optional): Version of the model. If not provided, the default version will be used.
- input (Dict[str, bytes], required): Input data for model prediction.

Usage:
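A sketch under the same assumptions, additionally assuming the client surfaces the server-streaming response as an iterator; the model name and prompt are illustrative:

```python
# Consume streamed prediction chunks as they arrive
for chunk in ts_client.stream_predictions(
    model_name="llm", input={"data": b"Hello, TorchServe!"}
):
    print(chunk)
```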
Again, for more details about the gRPC request and response objects, refer to inference.proto.