Ray Hyperparameter Tuning

In this example, we show you how to run basic hyperparameter tuning using Ray Tune on remote compute. You simply write your Ray Tune program as you would normally, then send it to the remote cluster. Kubetorch handles all the complexity of launching and setting up the remote Ray cluster for you.

import time
from typing import Any, Dict

import kubetorch as kt
from ray import tune

Define a Ray Tune program

We define a Trainable class and a find_minimum function to demonstrate a basic example of using Ray Tune for hyperparameter optimization. Think of this as "any regular Ray Tune program" that you would write entirely agnostic of Kubetorch.

  • train_fn is a dummy training function that takes a step number, width, and height as arguments and returns a score.
  • The Trainable class is a subclass of tune.Trainable that implements the training logic.
  • The find_minimum function sets up the hyperparameter search space and launches the hyperparameter optimization using Ray Tune.
def train_fn(step, width, height):
    # Dummy objective: simulate work, then compute a score from the hyperparameters
    time.sleep(5)
    return (0.1 + width * step / 100) ** (-1) + height * 0.1


class Trainable(tune.Trainable):
    def setup(self, config: Dict[str, Any]):
        self.step_num = 0
        self.reset_config(config)

    def reset_config(self, new_config: Dict[str, Any]):
        self._config = new_config
        return True

    def step(self):
        # Each training iteration evaluates the objective and reports the score
        score = train_fn(self.step_num, **self._config)
        self.step_num += 1
        return {"score": score}

    def cleanup(self):
        super().cleanup()

    def load_checkpoint(self, checkpoint_dir: str):
        return None

    def save_checkpoint(self, checkpoint_dir: str):
        return None


def find_minimum(num_concurrent_trials=None, num_samples=1, metric_name="score"):
    # Define the search space and launch the Tune run, returning the best result
    search_space = {
        "width": tune.uniform(0, 20),
        "height": tune.uniform(-100, 100),
    }
    tuner = tune.Tuner(
        Trainable,
        tune_config=tune.TuneConfig(
            metric=metric_name,
            mode="max",
            max_concurrent_trials=num_concurrent_trials,
            num_samples=num_samples,
            reuse_actors=True,
        ),
        param_space=search_space,
    )
    tuner.fit()
    return tuner.get_results().get_best_result()
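As a quick sanity check before launching a full tuning run, you could call the objective directly. This is just a hypothetical spot check, not part of the example itself:

# Hypothetical spot check of the dummy objective: for a fixed step, a larger
# width shrinks the first term and a larger height grows the second
print(train_fn(step=1, width=10.0, height=5.0))  # 5.5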

Launch Hyperparameter Tuning

We will now dispatch the program, set up Ray, and run the hyperparameter optimization on the remote compute.

if __name__ == "__main__":
    # Request compute for the Ray head node, using a Ray-ready image
    head = kt.Compute(num_cpus=4, image=kt.images.ray())

    # Dispatch find_minimum to the compute and set up a 2-node Ray cluster around it
    remote_find_minimum = kt.fn(find_minimum).to(head).distribute("ray", num_nodes=2)

    # Run the hyperparameter search remotely
    best_result = remote_find_minimum(num_samples=8)
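Assuming the return value comes back to the caller as a regular Ray Tune Result object (an assumption for this sketch, continuing the __main__ block above), you can then inspect the winning hyperparameters and score locally:

    # best_result.config holds the sampled hyperparameters and
    # best_result.metrics holds the reported "score"
    print("Best config:", best_result.config)
    print("Best score:", best_result.metrics["score"])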