The Runhouse Folder makes it easy to send folders and files between your local environment, a cluster, and your cloud storage (using your own credentials), without needing to learn any provider-specific APIs.
To install base runhouse:
!pip install runhouse
Runhouse supports sending folders to/from cloud storage such as S3, GCS, and Azure. To install the provider-specific libraries that are used under the hood, install "runhouse[aws/gcp/azure]". In this tutorial we demonstrate with S3 and GCS, and install "runhouse[aws, gcp]".
!pip install "runhouse[aws, gcp]"
If you would like to use S3 or GCS, please make sure to also set up your credentials locally. You can see the instructions for this by running sky check.
import runhouse as rh
Here we define a simple folder structure in our current directory: a sample-folder consisting of 5 files, 0.txt through 4.txt.
import os

folder_name = "sample-folder"
os.makedirs(folder_name, exist_ok=True)

for i in range(5):
    with open(f"{folder_name}/{i}.txt", "w") as f:
        f.write(str(i))

local_path = f"{os.getcwd()}/{folder_name}"
local_path
'/Users/caroline/Documents/runhouse/notebooks/docs/sample-folder'
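If you'd like to confirm the files were written as expected, a quick sanity check with the standard library (independent of Runhouse, and re-creating the folder so the cell runs on its own):

```python
import os

# Re-create the sample folder with files 0.txt through 4.txt.
folder_name = "sample-folder"
os.makedirs(folder_name, exist_ok=True)
for i in range(5):
    with open(f"{folder_name}/{i}.txt", "w") as f:
        f.write(str(i))

# List the files in sorted order to confirm the expected layout.
print(sorted(os.listdir(folder_name)))  # ['0.txt', '1.txt', '2.txt', '3.txt', '4.txt']
```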
Launch a basic cluster to send the local folder to. You can learn more about clusters in the Cluster tutorial.
cluster = rh.cluster(
    name="rh-cluster",
    instance_type="CPU:2+",
    provider="aws",
)
cluster.up_if_not()
Construct a Runhouse folder object with rh.folder, passing in the path of the folder you'd like it to represent. Optionally pass in a system=<cluster>/s3/gcs/azure that the folder lives on.

Here, we construct a Runhouse folder object that represents the sample-folder we created earlier.
local_folder = rh.folder(path=local_path)
To print the full paths, call .ls(), or for relative paths, call .ls(full_paths=False).
local_folder.ls(full_paths=False)
['4.txt', '3.txt', '2.txt', '0.txt', '1.txt']
To send it to a cluster, call .to(system=cluster), optionally passing in a path. If no path is provided, one will be automatically generated. The path can be retrieved by calling .path on the resulting object.
cluster_folder = local_folder.to(system=cluster, path=folder_name)
INFO | 2024-03-06 04:35:08.517625 | Copying folder from file:///Users/caroline/Documents/runhouse/notebooks/docs/sample-folder to: rh-cluster, with path: sample-folder
cluster_folder.ls()
['sample-folder/3.txt',
'sample-folder/0.txt',
'sample-folder/4.txt',
'sample-folder/2.txt',
'sample-folder/1.txt']
cluster_folder.path
'sample-folder'
Sending to S3 or GCS is similar: call .to(system="s3") or .to(system="gs").
gs_folder = local_folder.to(system="gs")
INFO | 2024-03-06 04:35:38.607986 | Copying folder from file:///Users/caroline/Documents/runhouse/notebooks/docs/sample-folder to: gs, with path: /runhouse-folder/bd489bb276734f7f8c23e401e6bb2b51
gs_folder.ls(full_paths=False)
['0.txt', '1.txt', '2.txt', '3.txt', '4.txt']
Similarly, for s3:
s3_folder = local_folder.to(system="s3")
INFO | 2024-03-06 04:36:04.390441 | Copying folder from file:///Users/caroline/Documents/runhouse/notebooks/docs/sample-folder to: s3, with path: /runhouse-folder/dae8c16b71a744cb976da0dace7c4db2
s3_folder.ls(full_paths=False)
['0.txt', '1.txt', '2.txt', '3.txt', '4.txt']
The keyword for sending to the local filesystem is .to("here").
new_local_folder = s3_folder.to("here", path="new-sample-folder")
INFO | 2024-03-06 04:38:01.269441 | Copying folder from s3://runhouse-folder/dae8c16b71a744cb976da0dace7c4db2 to: file, with path: new-sample-folder
new_local_folder.ls(full_paths=False)
['4.txt', '3.txt', '2.txt', '0.txt', '1.txt']
Folders can be sent between any pair of local, cluster, or cloud storage locations, including between two different clusters, or within the same cloud storage to duplicate the folder at a second path.