Installation Guide
This guide will help you get a working Kubetorch setup using the base default settings. You will:
- Install the Python client with pip or uv
- Download and helm install the Kubetorch chart on your cluster
- Optionally enable features like autoscaling or Ray with additional installs
After that, we recommend running our hello world example to confirm everything is working. For advanced configuration and other cloud-specific options, please contact the Kubetorch team at hello@run.house.
Python Client Installation
Kubetorch provides a Python client for interacting with your cluster; it should be installed both for local development and within your Docker images. You can install it either with pip or with uv, which offers faster resolution and reproducible lockfiles.
Installing with pip
This works anywhere Python is available and is the simplest option if you just need to get started quickly.
pip install "kubetorch[client]"
Installing with uv
uv pip install "kubetorch[client]"
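Either way, you can verify the client is importable (the module name kubetorch is an assumption based on the package name):
python -c "import kubetorch"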
Note
If you are running Kubetorch from a Mac, you should update rsync with: brew install rsync.
Mac devices ship with an older version of rsync that is missing modern features required by Kubetorch for code and data syncing.
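After installing, confirm the Homebrew version takes precedence over the system binary:
which rsync
rsync --version
If which still points to /usr/bin/rsync, make sure Homebrew's bin directory appears earlier in your PATH.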
Kubernetes Installation
You can install Kubetorch on an existing cluster or create a new one with Kubetorch preinstalled.
Kubetorch Helm charts are hosted publicly on GitHub Container Registry (GHCR), so you can pull or install them directly; no authentication or token is required.
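To confirm the chart is reachable from your machine (Helm 3.8+ is needed for OCI support), you can inspect its metadata:
helm show chart oci://ghcr.io/run-house/charts/kubetorch --version <VERSION>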
Install Kubetorch
You can install Kubetorch in several ways:
Option 1: Pull the chart locally
Download and extract the chart:
helm pull oci://ghcr.io/run-house/charts/kubetorch --version <VERSION> --untar
This creates a local directory named kubetorch. Update values.yaml if needed, then install:
helm upgrade --install kubetorch ./kubetorch -n kubetorch --create-namespace
Option 2: Install from OCI
Skip downloading and install directly from OCI:
helm upgrade --install kubetorch oci://ghcr.io/run-house/charts/kubetorch \
  --version <VERSION> -n kubetorch --create-namespace
Option 3: Install with Helmfile
If you prefer Helmfile, define the release in helmfile.yaml:
releases:
  - name: kubetorch
    namespace: kubetorch
    chart: oci://ghcr.io/run-house/charts/kubetorch
    version: <VERSION>
    values:
      - ./values.yaml # Adjust the path as needed
Then sync your releases:
helmfile sync
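Whichever option you choose, you can confirm the release is installed and its pods are running:
helm status kubetorch -n kubetorch
kubectl get pods -n kubetorch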
Install Knative (Recommended)
Autoscaling Kubetorch services requires Knative to be present on your cluster. You may skip this step if you are not planning to use autoscaling.
If Knative isn't already installed, you can add the Operator by running:
helm repo add knative-operator https://knative.github.io/operator
helm repo update
helm install knative-operator --create-namespace --namespace knative-operator knative-operator/knative-operator
Note
If your Kubernetes cluster is on a version below 1.31.0, install a Knative Operator version below 1.18.0 using the --version flag.
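For example (1.17.0 is shown as an assumed compatible release; list the versions actually available first):
helm search repo knative-operator --versions
helm install knative-operator --create-namespace --namespace knative-operator knative-operator/knative-operator --version 1.17.0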
Next, we'll create a KnativeServing custom resource that configures and enables Knative Serving in the
knative-serving namespace by applying the YAML in the Helm chart:
kubectl create namespace knative-serving
kubectl apply -f ./kubetorch/knative/serving.yaml
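You can check that Knative Serving comes up before moving on; the KnativeServing resource should eventually report Ready:
kubectl get knativeserving -n knative-serving
kubectl get pods -n knative-serving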
Install Ray (Optional)
Kubetorch supports Ray out of the box. To enable Ray, install the KubeRay Operator by running the following commands:
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
# Install both CRDs and KubeRay operator v1.4.0.
helm install kuberay-operator kuberay/kuberay-operator --version 1.4.0
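To confirm the operator is up (assuming the deployment name matches the release name, as with the chart's default settings):
kubectl get deployment kuberay-operator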
For more information on installation and usage, see the KubeRay Operator documentation.
New Kubernetes Cluster
If you want to create a new Kubernetes cluster with Kubetorch installed, please use the Terraform script provided to you by the Kubetorch team.
This script will:
- Create a new Kubernetes cluster
- Install the Kubetorch Helm chart
- Set up all necessary dependencies (including log streaming)
Additional Configuration
The following sections are optional and generally not necessary for a minimal working setup.
DNS Resolver
By default, Kubetorch uses the kube-dns resolver, which is the EKS/GKE default. If your cluster uses a different DNS resolver (such as coredns), set the resolver field in the nginx section of the values.yaml file to point to your DNS resolver service:
nginx:
  resolver: "coredns.kube-system.svc.cluster.local"
Or if running with Helm directly:
helm upgrade --install kubetorch oci://ghcr.io/run-house/charts/kubetorch \
  --version <VERSION> -n kubetorch --create-namespace \
  --set nginx.resolver="coredns.kube-system.svc.cluster.local"
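If you are unsure which resolver your cluster runs, inspect the DNS service in kube-system (the k8s-app=kube-dns label is the common convention on EKS/GKE, though labels vary by distribution):
kubectl get svc -n kube-system -l k8s-app=kube-dns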
Code & Data Sync
Kubetorch provides a built-in mechanism for syncing your local code and data into the cluster. This sync service is deployed automatically with the Kubetorch stack and is required for running workloads.
You can configure concurrency limits, timeouts, and resource allocations (CPU, memory, ephemeral storage) to match
your workload needs in the values.yaml of the Helm chart:
rsync:
  image: ghcr.io/run-house/kubetorch-rsync:v5
  maxConnections: 500 # Maximum concurrent rsync connections (increase for many worker pods)
  timeout: 600 # Connection timeout in seconds
  maxVerbosity: 0 # Log verbosity (0-4, use 0 for production, higher for debugging)
  maxConnectionsPerModule: 0 # Per-module limit (0 = unlimited, inherits global limit)
  cpu:
    request: 2
    limit: 4
  memory:
    request: 4Gi
    limit: 8Gi
  ephemeralStorage: # adjust based on expected node disk size
    request: xxGi # update this based on your expected workload size
    limit: xxGi # typically 2-3x the request
  cleanupCron:
    enabled: false # set to true to enable pod cleanup
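After editing values.yaml, re-apply the chart with the same helm upgrade command used during installation so the sync service picks up the new settings:
helm upgrade --install kubetorch ./kubetorch -n kubetorch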
By default, the sync service uses ephemeral storage, meaning files will not persist if the pod restarts. If you need persistence across restarts, you can attach a PersistentVolumeClaim (e.g. JuiceFS, EBS).
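As a rough sketch, a claim for the sync service might look like the following; the name, storage class, and size are placeholders, and how the claim is wired into the chart depends on your values.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kubetorch-rsync-data # placeholder name
  namespace: kubetorch
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp3 # placeholder: use your cluster's storage class
  resources:
    requests:
      storage: 100Gi # placeholder: size to your workload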