• TorchServe - a new open-source model serving library that provides a clean, well-supported, industrial-grade path to deploying PyTorch models for inference at scale.
• TorchElastic - a library for fault-tolerant and elastic training in PyTorch. With the TorchElastic Kubernetes controller, developers can create fault-tolerant distributed training jobs in PyTorch on their Kubernetes clusters, including on Amazon EC2 Spot instances with Amazon Elastic Kubernetes Service (EKS).
