In this example, we show how to run basic hyperparameter tuning with Ray Tune on a remote cluster. You write your Ray Tune program as you normally would, then send it to the remote cluster using Runhouse. Runhouse handles all the complexities of launching the remote compute and setting up the Ray cluster for you.
import time
from typing import Any, Dict

import runhouse as rh
from ray import tune
We define a Trainable class and a find_minimum function to demonstrate a basic example of using Ray Tune for hyperparameter optimization. Think of this as any regular Ray Tune program that you would write, entirely agnostic of Runhouse.
def train_fn(step, width, height):
    # Toy objective: simulate a slow training step and return a score to maximize.
    time.sleep(5)
    return (0.1 + width * step / 100) ** (-1) + height * 0.1


class Trainable(tune.Trainable):
    def setup(self, config: Dict[str, Any]):
        self.step_num = 0
        self.reset_config(config)

    def reset_config(self, new_config: Dict[str, Any]):
        # Lets Tune reuse this actor for new trials (see reuse_actors=True below).
        self._config = new_config
        return True

    def step(self):
        score = train_fn(self.step_num, **self._config)
        self.step_num += 1
        return {"score": score}

    def cleanup(self):
        super().cleanup()

    def load_checkpoint(self, checkpoint_dir: str):
        return None

    def save_checkpoint(self, checkpoint_dir: str):
        return None


def find_minimum(num_concurrent_trials=None, num_samples=1, metric_name="score"):
    search_space = {
        "width": tune.uniform(0, 20),
        "height": tune.uniform(-100, 100),
    }
    tuner = tune.Tuner(
        Trainable,
        tune_config=tune.TuneConfig(
            metric=metric_name,
            mode="max",
            max_concurrent_trials=num_concurrent_trials,
            num_samples=num_samples,
            reuse_actors=True,
        ),
        param_space=search_space,
    )
    tuner.fit()
    return tuner.get_results().get_best_result()
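To make the objective concrete, you can call train_fn directly, with no Tune involved. The values below are purely illustrative and not tuning results: with width=10, height=50, and step=0, the score is (0.1)**(-1) + 50 * 0.1, roughly 15.0. Since the TuneConfig above uses mode="max", Tune searches for the width and height that maximize this score.

# Illustrative check of the objective itself; sleeps ~5 seconds because of train_fn.
example_score = train_fn(step=0, width=10, height=50)
print(example_score)  # ~15.0 = 1 / 0.1 + 50 * 0.1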
We will now launch the compute using Runhouse, set up the Ray cluster, and run the hyperparameter optimization on the remote compute. We dispatch the find_minimum function to the cluster with .to() and instruct Runhouse to set up Ray with .distribute("ray").

Note: The code to launch, dispatch, and execute should run within an if __name__ == "__main__": block, as shown below. Otherwise, this launch code would also run when Runhouse runs the code remotely.
if __name__ == "__main__": num_nodes = 2 num_cpus_per_node = 4 img = rh.Image("tune").install_packages(["pyarrow>=9.0.0", "ray[tune]>=2.38.0"]) cluster = rh.cluster( name="rh-cpu", num_nodes=num_nodes, image=img, num_cpus=num_cpus_per_node, # You have other options such as to specify memory and disk size gpus=None, # This example does not need GPUs, but you can specify GPUs like "A100:2" here to get 2 A100 GPUs per node provider="aws", # gcp, kubernetes, etc. ).up_if_not() remote_find_minimum = rh.function(find_minimum).to(cluster).distribute("ray") best_result = remote_find_minimum(num_samples=8)