`TorchServeClientGRPC (base_url=None, management_port=7071, inference_port=7070)`

Initialize self. See help(type(self)) for accurate signature.

To create a gRPC client, simply create a `TorchServeClientGRPC` object:

```python
# Initialize the gRPC TorchServeClient object
ts_client = TorchServeClientGRPC()
ts_client
```

    TorchServeClientGRPC(base_url=localhost, management_port=7071, inference_port=7070)
To customize the base URL and the default ports, pass them as arguments during initialization, as in the sketch below.
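For example (the host name here is a placeholder):

```python
# Connect to a remote TorchServe host on non-default ports
ts_client = TorchServeClientGRPC(base_url="my-torchserve-host",
                                 management_port=7071,
                                 inference_port=7070)
ts_client
```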
Here is the list of all the supported gRPC management endpoints:
`describe_model`: Provide detailed information about the default version of a model.

Arguments:

- `model_name` (str, required): Name of the model to describe
- `model_version` (str, optional): Version of the model to describe
- `customized` (bool, optional): Return customized metadata in the response

Usage:
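A minimal sketch, assuming the `ts_client` object from above and a hypothetical registered model named `mnist`:

```python
# Describe the default version of the model
response = ts_client.describe_model(model_name="mnist")

# Describe a specific version and request customized metadata
response = ts_client.describe_model(model_name="mnist",
                                    model_version="1.0",
                                    customized=True)
```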
`list_models`: List all registered models in TorchServe.

Arguments:

- `limit` (int, optional): Maximum number of items to return (default: 100)
- `next_page_token` (int, optional): Token to retrieve the next set of results

Usage:
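For instance (the argument values are illustrative):

```python
# Fetch the first 50 registered models
response = ts_client.list_models(limit=50)

# Fetch the next page (the token would normally come from the previous response)
response = ts_client.list_models(limit=50, next_page_token=50)
```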
`register_model`: Register a new model with TorchServe.

Arguments:

- `batch_size` (int, optional): Inference batch size (default: 1)
- `handler` (str, optional): Inference handler entry point
- `initial_workers` (int, optional): Number of initial workers (default: 0)
- `max_batch_delay` (int, optional): Maximum delay for batch aggregation, in milliseconds (default: 100)
- `model_name` (str, optional): Name of the model
- `response_timeout` (int, optional): Maximum time to wait for a model response (default: 120 seconds)
- `runtime` (str, optional): Runtime for the model's custom service code
- `synchronous` (bool, optional): Create workers synchronously (default: False)
- `url` (str, required): Model archive download URL
- `s3_sse_kms` (bool, optional): Whether S3 SSE-KMS is enabled (default: False)

Usage:
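A sketch using a model archive URL of the shape served by TorchServe's public model zoo (the URL and values are illustrative):

```python
# Register the archive and spin up one worker, blocking until it is ready
response = ts_client.register_model(
    url="https://torchserve.pytorch.org/mar_files/mnist.mar",
    model_name="mnist",
    initial_workers=1,
    batch_size=4,
    max_batch_delay=200,
    synchronous=True,
)
```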
`scale_worker`: Configure the number of workers for a model. This is an asynchronous call by default.

Arguments:

- `model_name` (str, required): Name of the model whose workers should be scaled
- `model_version` (str, optional): Model version
- `max_worker` (int, optional): Maximum number of worker processes
- `min_worker` (int, optional): Minimum number of worker processes
- `number_gpu` (int, optional): Number of GPU worker processes to create
- `synchronous` (bool, optional): Make the call synchronously (default: False)
- `timeout` (int, optional): Wait time for worker completion (0: terminate immediately, -1: wait indefinitely)

Usage:
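For example, assuming the hypothetical `mnist` model registered above:

```python
# Keep between 2 and 4 workers, blocking until scaling completes
response = ts_client.scale_worker(model_name="mnist",
                                  min_worker=2,
                                  max_worker=4,
                                  synchronous=True)
```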
`set_default`: Set the default version of a model.

Arguments:

- `model_name` (str, required): Name of the model whose default version should be updated
- `model_version` (str, required): Version of the model to set as the default

Usage:
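A sketch with hypothetical names:

```python
# Make version 2.0 the default served version of the model
response = ts_client.set_default(model_name="mnist", model_version="2.0")
```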
`unregister_model`: Unregister a particular version of a model from TorchServe. This call is asynchronous by default.

Arguments:

- `model_name` (str, required): Name of the model to unregister
- `model_version` (str, optional): Version of the model to unregister. If omitted, the default version of the model is unregistered

Usage:
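For example:

```python
# Unregister a specific version; omit model_version to remove the default one
response = ts_client.unregister_model(model_name="mnist", model_version="2.0")
```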
Check the `management.proto` file to better understand the arguments of each method.
Here is the list of supported gRPC inference endpoints:
`ping`: Check the health status of the TorchServe server.

Usage:
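A minimal sketch, assuming the `ts_client` object from above:

```python
# Ping the server to verify it is up and healthy
response = ts_client.ping()
```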
`predictions`: Get predictions from a model.

Arguments:

- `model_name` (str, required): Name of the model
- `model_version` (str, optional): Version of the model. If not provided, the default version is used
- `input` (Dict[str, bytes], required): Input data for model prediction

Usage:
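A sketch, assuming a registered `mnist` model and an input key of `data` (the key the model's handler expects may differ):

```python
# Read raw image bytes and send them as the prediction input
with open("test_image.jpg", "rb") as f:
    payload = {"data": f.read()}

response = ts_client.predictions(model_name="mnist", input=payload)
```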
`stream_predictions`: Get streaming predictions from a model.

Arguments:

- `model_name` (str, required): Name of the model
- `model_version` (str, optional): Version of the model. If not provided, the default version is used
- `input` (Dict[str, bytes], required): Input data for model prediction

Usage:
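A sketch with hypothetical names; it assumes the call yields an iterable of streamed response chunks, as is typical for gRPC server-side streaming:

```python
# Stream intermediate results, e.g. tokens from a text-generation model
payload = {"data": b"A prompt for a hypothetical streaming model"}

for chunk in ts_client.stream_predictions(model_name="my-llm", input=payload):
    print(chunk)
```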
Again, for more details about the gRPC request and response objects, refer to the `inference.proto` file.