Kubetorch Examples

Kubetorch is the easiest way to execute ML workloads on Kubernetes at any scale. Write regular, undecorated Python programs, define the compute resources and environment you need, and dispatch them to run on your remote cluster with .to(), or with a decorator and kubetorch deploy.
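As a rough illustration of that dispatch flow (a hypothetical sketch only: names such as kt.Compute, kt.Image, and kt.fn are assumptions for illustration and may differ from the actual client API):

```python
import kubetorch as kt  # hypothetical client import

def train(epochs: int):
    # Ordinary, undecorated Python; nothing Kubernetes-specific here.
    print(f"training for {epochs} epochs")

# Describe the environment and compute the function should run on
# (argument names here are illustrative assumptions).
compute = kt.Compute(
    gpus=1,
    image=kt.Image(image_id="nvcr.io/nvidia/pytorch:24.08-py3"),
)

# Dispatch the function to the remote cluster, then call it like a
# local function; results stream back to the local interpreter.
remote_train = kt.fn(train).to(compute)
remote_train(epochs=3)
```

The same script can be deployed non-interactively (e.g. via kubetorch deploy) for production runs, which is what makes the research and production paths identical.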

Kubetorch is a generational improvement on existing systems such as Kubeflow or custom CD pipelines.

  • Platform engineers who want to stay on Kubernetes can rely on its native observability, auth, quota management, and logging.
  • ML/AI engineers and researchers who prefer to work in Python can work in Python, from defining an "image," to requesting GPUs and multiple nodes, to dispatching and eagerly executing their programs.
  • In development, iteration loops take ~3 seconds because code changes are hot-synced to the remote cluster, eliminating the 20-30 minute delays of push-and-pray development and constant Docker image rebuilds.
  • In production, run the dispatch code identically for perfectly reproducible execution across research and production (and back to research).
  • Complete flexibility to future-proof your platform: adopt any distributed framework (Ray, Spark, PyTorch Distributed, Dask, etc.), any orchestrator, any model registry, and any cloud.

The examples cover a range of ML applications, from training to inference, hyperparameter optimization, and batch data processing. We have many other examples; send us a ping if you'd like to see anything specific!

Installation

Kubetorch is deployed onto your own Kubernetes clusters via Helm chart, and any end users (or systems) with a kubeconfig can use the Kubetorch Python client to interact with powerful remote compute from any Python interpreter. If you do not use Kubernetes today, we have Terraform examples that provide reasonable defaults for EKS/GKE/AKS.
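Installation might look roughly like the following (illustrative only: the chart path, release name, and namespace are placeholders, and the real chart reference is shared during onboarding):

```shell
# Hypothetical sketch: the actual chart location is provided with the
# private-beta deployment resources, not published under this name.
helm install kubetorch ./kubetorch-helm-chart \
  --namespace kubetorch --create-namespace
```

End users then only need a kubeconfig pointing at the cluster; the Python client picks it up the same way kubectl does.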

We are currently in a private beta. If you are interested in trying it out, send a quick note to support@run.house and we will share the required deployment resources with you.