TensorFlow Serving
A flexible, high-performance serving system for machine learning models, designed for production environments.


Servables are the central abstraction in TensorFlow Serving. They are the underlying objects that clients use to perform computation (for example, a lookup or inference).
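As a concrete illustration (not part of the original description): clients commonly reach a served model through TensorFlow Serving's documented REST API, posting a JSON body in the `{"instances": [...]}` format to a `:predict` endpoint. A minimal sketch, assuming a hypothetical model named `half_plus_two` served on the default REST port 8501:

```python
import json
import urllib.request

# Build a Predict request body in the documented "row" format.
payload = json.dumps({"instances": [1.0, 2.0, 5.0]}).encode("utf-8")

# Hypothetical model name and host; adjust to your deployment.
url = "http://localhost:8501/v1/models/half_plus_two:predict"
request = urllib.request.Request(
    url, data=payload, headers={"Content-Type": "application/json"}
)

# Actually sending the request requires a running model server, e.g. the
# tensorflow/serving Docker image with port 8501 exposed:
# response = urllib.request.urlopen(request)
# print(json.load(response))  # a JSON object with a "predictions" field
```

The same computation can also be performed over gRPC on port 8500; the REST form is shown here only because it needs no generated client stubs.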

The size and granularity of a Servable is flexible. A single Servable might include anything from a single shard of a lookup table to a single model to a tuple of inference models. Servables can be of any type and interface, enabling flexibility and future improvements such as:

  • streaming results
  • experimental APIs
  • asynchronous modes of operation

Note that Servables do not manage their own lifecycle; the serving system decides when they are loaded, served, and unloaded.
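Because a Servable does not manage its own lifecycle, which versions get loaded is declared externally, for example in a model server config file passed to the server at startup. A minimal sketch in the config's protobuf text format (the model name, path, and version numbers below are hypothetical):

```
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    # Pin the versions to serve; omit this block to serve only the latest.
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}
```

With a config like this, each listed version becomes a distinct Servable that the serving system loads and unloads on the client's behalf.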


Similar projects
  • BentoML: an open-source framework for high-performance ML model serving.
  • Deploying your ML Model with TorchServe: a talk in which Brad Heintz walks through how to use TorchServe to deploy trained models at scale without writing custom code.
  • Efficient Serverless Deployment of PyTorch Models on Azure: a tutorial for serving models cost-effectively at scale using Azure Functions and ONNX Runtime.
  • Build machine learning APIs.