You are viewing the docs for version v0.0.12.

Module

Module Factory Method

runhouse.module(cls: [Type] = None, name: Optional[str] = None, system: Optional[Union[str, Cluster]] = None, env: Optional[Union[str, Env]] = None, dryrun: bool = False)[source]

Returns a Module object, which can be used to instantiate and interact with the class remotely.

The behavior of Modules (and subclasses thereof) is as follows:
  • Any callable public method of the module is intercepted and executed remotely over rpc, with the exception of certain functions Python doesn’t make interceptable (e.g. __call__, __init__), and methods of the Module class (e.g. to, fetch, etc.). Properties and private methods are not intercepted, and will be executed locally.

  • Any method which executes remotely may be called normally, e.g. model.forward(x), or asynchronously, e.g. key = model.forward.run(x) (which returns a key to retrieve the result with cluster.get(key)), or with run_obj = model.train.remote(x), which runs synchronously but returns a remote object to avoid passing heavy results back over the network.

  • Setting attributes, both public and private, will be executed remotely, with the new values only being set in the remote module and not the local one. This excludes any methods or attributes of the Module class proper (e.g. system or name), which will be set locally.

  • Attributes and private properties can be fetched with the remote property, and the full resource can be fetched using .fetch(), e.g. model.remote.weights, model.remote.__dict__, model.fetch().

  • When a module is sent to a cluster, its public attributes are serialized, sent over, and repopulated in the remote instance. This means that any changes to the module’s attributes will not be reflected in the remote instance.
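
The interception and serialization behavior described above can be illustrated with a small standard-library sketch. This is not Runhouse's implementation: ToyCluster, ToyProxy, and Counter are hypothetical names, an in-memory dict stands in for the cluster, and JSON stands in for whatever serialization Runhouse actually uses. It only shows the general idea that public method calls run against the remote copy, and that the remote copy is a snapshot taken when the object is sent.

```python
import json


class ToyCluster:
    """Hypothetical stand-in for a cluster: holds rebuilt copies of objects by name."""

    def __init__(self):
        self.objects = {}

    def put(self, name, cls, payload):
        # Rebuild an instance from the serialized public attributes.
        remote = cls.__new__(cls)
        remote.__dict__.update(json.loads(payload))
        self.objects[name] = remote

    def call(self, name, method, *args, **kwargs):
        return getattr(self.objects[name], method)(*args, **kwargs)


class Counter:
    """Plain class whose state should live 'remotely'."""

    def __init__(self, start=0):
        self.total = start

    def add(self, n):
        self.total += n
        return self.total


class ToyProxy:
    """Forwards public method calls to the copy held by the toy cluster."""

    def __init__(self, obj, cluster, name):
        self._local = obj
        self._cluster = cluster
        self._name = name
        # Snapshot the public attributes at send time.
        public_state = {k: v for k, v in vars(obj).items() if not k.startswith("_")}
        cluster.put(name, type(obj), json.dumps(public_state))

    def __getattr__(self, attr):
        # Only reached for names that are not set on the proxy itself.
        if attr.startswith("_"):
            return getattr(self._local, attr)  # private names resolve locally
        target = getattr(self._local, attr)
        if callable(target):
            # Public method: run it against the remote copy instead.
            return lambda *a, **kw: self._cluster.call(self._name, attr, *a, **kw)
        return target  # plain attribute read returns the local value


cluster = ToyCluster()
local_counter = Counter(start=10)
proxy = ToyProxy(local_counter, cluster, "counter")

remote_total = proxy.add(5)   # runs on the remote copy
local_counter.total = 100     # local change made after sending
remote_after = proxy.add(1)   # the remote copy never saw the local change
```

In this sketch the local object and the remote copy diverge as soon as either one is modified, which is the point the last bullet makes about changes not being reflected remotely.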

Parameters:
  • cls – The class to instantiate.

  • name (Optional[str]) – Name to give the module object, to be reused later on.

  • system (Optional[str or Cluster]) – File system or cluster name. If providing a file system this must be one of: [file, github, sftp, ssh, s3, gs, azure]. We are working to add additional file system support. If providing a cluster, this must be a cluster object or name, and whether the data is saved to the object store or filesystem depends on whether a path is specified.

  • env (Optional[str or Env]) – Environment in which the module should live on the cluster, if system is a cluster.

  • dryrun (bool) – Whether to create the Module if it doesn’t exist, or load a Module object as a dryrun. (Default: False)

Returns:

The resulting module.

Return type:

Module

Example - creating a module by defining an rh.Module subclass:
>>> import runhouse as rh
>>> import transformers
>>>
>>> # Sample rh.Module class
>>> class Model(rh.Module):
>>>     def __init__(self, model_id, device="cpu", system=None, env=None):
>>>         # Note that the code here will be run in your local environment prior to being sent
>>>         # to a cluster. For loading large models/datasets that are only meant to be used remotely,
>>>         # we recommend using lazy initialization (see tokenizer and model attributes below).
>>>         super().__init__(system=system, env=env)
>>>         self.model_id = model_id
>>>         self.device = device
>>>
>>>     @property
>>>     def tokenizer(self):
>>>         # Lazily initialize the tokenizer remotely only when it is needed
>>>         if not hasattr(self, '_tokenizer'):
>>>             self._tokenizer = transformers.AutoTokenizer.from_pretrained(self.model_id)
>>>         return self._tokenizer
>>>
>>>     @property
>>>     def model(self):
>>>         # Lazily initialize the model remotely only when it is needed
>>>         if not hasattr(self, '_model'):
>>>             self._model = transformers.AutoModel.from_pretrained(self.model_id).to(self.device)
>>>         return self._model
>>>
>>>     def predict(self, x):
>>>         # The tokenizer returns a dict-like batch; move it to the model's device and unpack it
>>>         x = self.tokenizer(x, return_tensors="pt").to(self.device)
>>>         return self.model(**x)
>>>
>>> # Creating rh.Module instance
>>> model = Model(model_id="bert-base-uncased", device="cuda", system="my_gpu", env="my_env")
>>> model.predict("Hello world!")      # Runs on system in env
>>> tok = model.remote.tokenizer       # Returns remote tokenizer
>>> local_id = model.local.model_id    # Returns local model_id, if any
>>> model_id = model.model_id          # Returns local model_id (not remote)
>>> model.fetch()                      # Returns full remote module, including model and tokenizer
Example - creating a Module from an existing class, via the rh.module() factory method:
>>> # One method: send an instance of an rh.Module subclass to a cluster and env with .to()
>>> other_model = Model(model_id="bert-base-uncased", device="cuda").to("my_gpu", "my_env")
>>>
>>> # Another method: Create a module instance from an existing non-Module class using rh.module()
>>> RemoteModel = rh.module(cls=BERTModel, system="my_gpu", env="my_env")
>>> remote_model = RemoteModel(model_id="bert-base-uncased", device="cuda")
>>> remote_model.predict("Hello world!")  # Runs on system in env
>>>
>>> # You can also call remote class methods
>>> model_size = RemoteModel.get_model_size("bert-base-uncased")
>>>
>>> # Loading a module
>>> my_local_module = rh.module(name="~/my_module")
>>> my_s3_module = rh.module(name="@/my_module")

Module Class

class runhouse.Module(cls_pointers: Optional[Tuple] = None, name: Optional[str] = None, system: Optional[Cluster] = None, env: Optional[Env] = None, dryrun: bool = False, provenance: Optional[dict] = None, **kwargs)[source]
__init__(cls_pointers: Optional[Tuple] = None, name: Optional[str] = None, system: Optional[Cluster] = None, env: Optional[Env] = None, dryrun: bool = False, provenance: Optional[dict] = None, **kwargs)[source]

Runhouse Module object

fetch(item: Optional[str] = None, stream_logs: bool = False)[source]

Helper method to allow for access to remote state, both public and private. Fetching functions is not advised. system.get(module.name).resolved_state() is roughly equivalent to module.fetch().

Example

>>> my_module.fetch("my_property")
>>> my_module.fetch("_my_private_property")

>>> MyRemoteClass = rh.module(my_class).to(system)
>>> MyRemoteClass(*args).fetch()  # Returns a my_class instance, populated with the remote state

>>> my_blob.fetch()  # Returns the data of the blob, due to overloaded ``resolved_state`` method

>>> class MyModule(rh.Module):
>>>     # ...
>>>
>>> MyModule(*args).to(system).fetch()  # Returns the full remote module, including private and public state
async fetch_async(key: str, remote: bool = False, stream_logs: bool = False)[source]

Async version of fetch. Can’t be a property like fetch because __getattr__ can’t be awaited.

Example

>>> await my_module.fetch_async("my_property")
>>> await my_module.fetch_async("_my_private_property")
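
For readers less familiar with awaiting attribute access, the following standard-library sketch shows the calling pattern that an async getter and setter make possible. ToyRemoteState is a hypothetical class, not part of Runhouse, and asyncio.sleep(0) is only a placeholder for a network round trip.

```python
import asyncio


class ToyRemoteState:
    """Hypothetical stand-in for a module whose state lives elsewhere."""

    def __init__(self, **state):
        self._state = dict(state)

    async def fetch_async(self, key):
        await asyncio.sleep(0)  # placeholder for a network round trip
        return self._state[key]

    async def set_async(self, key, value):
        await asyncio.sleep(0)  # placeholder for a network round trip
        self._state[key] = value


async def main():
    module = ToyRemoteState(my_property=1, _my_private_property="secret")
    # Independent fetches can be awaited together.
    public, private = await asyncio.gather(
        module.fetch_async("my_property"),
        module.fetch_async("_my_private_property"),
    )
    await module.set_async("my_property", 2)
    updated = await module.fetch_async("my_property")
    return public, private, updated


result = asyncio.run(main())
```

Because the getter is an ordinary coroutine method taking the key as a string, several fetches can be scheduled at once with asyncio.gather, which plain attribute access would not allow.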
get_or_to(system: Union[str, Cluster], env: Optional[Union[str, List[str], Env]] = None, name: Optional[str] = None)[source]

Check if the module already exists on the cluster, and if so return the module object. If not, put the module on the cluster and return the remote module.

Example

>>> remote_model = Model().get_or_to(my_cluster, name="remote_model")
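
The check-then-put behavior described for get_or_to follows a common get-or-create pattern, sketched below with only the standard library. ToyCluster and get_or_put are hypothetical names and the dict-backed store is an assumption for illustration; this does not reflect how Runhouse looks up objects on a real cluster.

```python
class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster's object store."""

    def __init__(self):
        self.objects = {}
        self.puts = 0  # counts how many times something was actually sent

    def get(self, name, default=None):
        return self.objects.get(name, default)

    def put(self, name, obj):
        self.puts += 1
        self.objects[name] = obj
        return obj


def get_or_put(cluster, name, factory):
    """Return the object stored under `name`, creating and storing it only if absent."""
    existing = cluster.get(name)
    if existing is not None:
        return existing
    return cluster.put(name, factory())


cluster = ToyCluster()
first = get_or_put(cluster, "remote_model", dict)
second = get_or_put(cluster, "remote_model", dict)  # reuses the stored object
```

The second call returns the object stored by the first, so the potentially expensive send happens once per name.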
property local

Helper property to allow for access to local properties, both public and private.

Example

>>> my_module.local.my_property
>>> my_module.local._my_private_property

>>> my_module.local.size = 14
refresh()[source]

Update the resource in the object store.

property remote

Helper property to allow for access to remote properties, both public and private. Returning functions is not advised.

Example

>>> my_module.remote.my_property
>>> my_module.remote._my_private_property
>>> my_module.remote.size = 14
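
One way a helper property can support both reads and writes, as local and remote do above, is to return a small proxy object that redirects attribute access to a chosen target. The sketch below is a guess at the general pattern, not Runhouse's code: _Namespace, State, and ToyModule are hypothetical, and the "remote" state is just a second local object.

```python
class _Namespace:
    """Reads and writes attributes on a target object on behalf of its owner."""

    def __init__(self, target):
        # Bypass our own __setattr__ so `_target` is stored on the proxy itself.
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for names held by the target.
        return getattr(object.__getattribute__(self, "_target"), name)

    def __setattr__(self, name, value):
        setattr(object.__getattribute__(self, "_target"), name, value)


class State:
    """Bare attribute container."""


class ToyModule:
    def __init__(self):
        self._local_state = State()
        self._remote_state = State()  # pretend this lives on a cluster

    @property
    def local(self):
        return _Namespace(self._local_state)

    @property
    def remote(self):
        return _Namespace(self._remote_state)


my_module = ToyModule()
my_module.remote.size = 14
my_module.local.size = 3
my_module.remote._my_private_property = "hidden"
```

Each namespace reads and writes only its own target, so the two size values stay independent.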
rename(name: str)[source]

Rename the module.

resolve()[source]

Specify that the module should resolve to a particular state when passed into a remote method. This is useful if you want to revert the module’s state to some “Runhouse-free” state once it is passed into a Runhouse-unaware function. For example, if you call a Runhouse-unaware function with .remote(), you will be returned a Blob which wraps your data. If you want to pass that Blob into another function that operates on the original data (e.g. a function that takes a numpy array), you can call my_second_fn(my_blob.resolve()), and my_blob will be replaced with the contents of its .data on the cluster before being passed into my_second_fn.

Resolved state is defined by the resolved_state method. By default, modules created with the rh.module factory constructor will be resolved to their original non-module-wrapped class (or best attempt). Modules which are defined as a subclass of Module will be returned as-is, as they have no other “original class.”

Example

>>> my_module = rh.module(my_class)
>>> my_remote_fn(my_module.resolve())  # my_module will be replaced with the original class `my_class`

>>> my_result_blob = my_remote_fn.remote(args)
>>> my_other_remote_fn(my_result_blob.resolve())  # my_result_blob will be replaced with its data
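
The relationship between resolve() and resolved_state() can be sketched without Runhouse as a flag plus an unwrapping step at call time. Everything here is hypothetical: ToyWrapper loosely plays the role of a Blob, and call_with_resolution stands in for whatever Runhouse does when arguments reach a remote function. The internal flag name is an assumption.

```python
class ToyWrapper:
    """Hypothetical wrapper around some data, loosely analogous to a Blob."""

    def __init__(self, data):
        self.data = data
        self._resolve = False  # assumed flag name, for illustration only

    def resolve(self):
        # Mark this wrapper to be unwrapped when it is passed to a function.
        self._resolve = True
        return self

    def resolved_state(self):
        # Defines what the wrapper is replaced with once resolved.
        return self.data


def call_with_resolution(fn, *args):
    """Replace any argument flagged via resolve() with its resolved state, then call fn."""
    unwrapped = [
        a.resolved_state() if isinstance(a, ToyWrapper) and a._resolve else a
        for a in args
    ]
    return fn(*unwrapped)


blob = ToyWrapper([1, 2, 3])
total = call_with_resolution(sum, blob.resolve())  # sum receives the raw list

# Without resolve(), the wrapper is passed through untouched.
kept = call_with_resolution(lambda x: x, ToyWrapper([4]))
```

Overriding resolved_state in a subclass would change what the function receives, which mirrors how the text describes Blob resolving to its data.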
resolved_state()[source]

Return the resolved state of the module. By default, this is the original class of the module if it was created with the module factory constructor.

save(name: Optional[str] = None, overwrite: bool = True)[source]

Register the resource and save to local working_dir config and RNS config store.

async set_async(key: str, value)[source]

Async version of property setter.

Example

>>> await my_module.set_async("my_property", my_value)
>>> await my_module.set_async("_my_private_property", my_value)
to(system: Union[str, Cluster], env: Optional[Union[str, List[str], Env]] = None, name: Optional[str] = None)[source]

Put a copy of the module on the destination system and env, and return the new module.

Example

>>> local_module = rh.module(my_class)
>>> cluster_module = local_module.to("my_cluster")