Kubetorch Examples

Kubetorch is the easiest way to execute ML workloads on Kubernetes at any scale. Simply write regular, undecorated Python programs, define the compute resources and environment you need, and dispatch them to run on your remote cluster with .to().
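A minimal sketch of that workflow is below. The exact names used here (`kt.Compute`, `kt.Image`, `kt.fn`, and their parameters) are illustrative assumptions and should be checked against the Kubetorch documentation; actually running it requires a Kubernetes cluster with Kubetorch installed.

```python
import kubetorch as kt  # assumed import name; see the Kubetorch docs


def train(epochs: int = 1):
    # Regular, undecorated Python; nothing Kubetorch-specific inside.
    print(f"Training for {epochs} epoch(s)")


if __name__ == "__main__":
    # Define the environment and compute resources Pythonically
    # (parameter names here are illustrative).
    compute = kt.Compute(
        gpus=1,
        image=kt.Image(image_id="nvcr.io/nvidia/pytorch:24.08-py3"),
    )

    # Dispatch the function to the remote cluster with .to(), then
    # call it as if it were local; execution happens on Kubernetes.
    remote_train = kt.fn(train).to(compute)
    remote_train(epochs=3)
```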

Kubetorch is a generational improvement over existing systems such as Kubeflow and custom CD applications.

  • Platform engineers who prefer working in Kubernetes can continue to rely on its native observability, auth, quota management, and logging features.
  • Engineers and researchers who prefer working in Python get everything Pythonically: defining an "image," requesting GPUs and multiple nodes, and executing the code itself.
  • All code is regular code and execution is perfectly reproducible across research and production (and back to research).
  • Complete flexibility to future-proof your platform: adopt any distributed framework (Ray, Spark, PyTorch Distributed, Dask, etc.), any orchestrator, any model registry, and any cloud.

These examples span a range of ML applications, from training and inference to hyperparameter optimization and batch data processing. We have many others! If you'd like to see anything specific, or want us to adapt your code to Kubetorch, please send us a note at hello@run.house.