The Runhouse Folder makes it easy to send folders and files between your local environment, a cluster, or your cloud storage (using your own credentials), without needing to learn any provider-specific APIs.
To install base runhouse:
!pip install runhouse
Runhouse supports sending folders to/from cloud storage such as s3, gcs, and azure. To download the provider-specific libraries that are used under the hood, you can install "runhouse[aws/gcp/azure]". In this tutorial we demonstrate with s3 and gcs, and install "runhouse[aws, gcp]".
!pip install "runhouse[aws, gcp]"
If you would like to use s3 or gcs, please make sure to also set up your credentials locally. You can see the instructions for this by running sky check.
import runhouse as rh
Here we define a simple folder structure in our current directory: a sample-folder consisting of 5 files, 0.txt through 4.txt.
import os

folder_name = "sample-folder"
os.makedirs(folder_name, exist_ok=True)

# Create five small files, 0.txt through 4.txt, each holding its own index.
for i in range(5):
    with open(f"{folder_name}/{i}.txt", "w") as f:
        f.write(str(i))

local_path = f"{os.getcwd()}/{folder_name}"
local_path
'/Users/caroline/Documents/runhouse/notebooks/docs/sample-folder'
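As a quick sanity check on the file naming, the following sketch rebuilds the same structure in a temporary directory using only the standard library and lists what was created. The helper name make_sample_folder and the use of a temporary directory are assumptions made for illustration; neither is part of Runhouse.

```python
import tempfile
from pathlib import Path


def make_sample_folder(parent: Path, name: str = "sample-folder", count: int = 5) -> Path:
    """Create `name` under `parent` containing 0.txt .. (count-1).txt."""
    folder = parent / name
    folder.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        # Each file holds its own index as text.
        (folder / f"{i}.txt").write_text(str(i))
    return folder


with tempfile.TemporaryDirectory() as tmp:
    folder = make_sample_folder(Path(tmp))
    # Sort the names, since directory iteration order is not guaranteed.
    created = sorted(p.name for p in folder.iterdir())
    print(created)  # ['0.txt', '1.txt', '2.txt', '3.txt', '4.txt']
```

Because range(5) starts at zero, the files are numbered 0 through 4.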
Launch a basic cluster; later in this tutorial we send the local folder to it. You can learn more about clusters in the Cluster tutorial.
cluster = rh.cluster(
    name="rh-cluster",
    instance_type="CPU:2+",
    provider="aws",
)
cluster.up_if_not()
Construct a Runhouse folder object with rh.folder, passing in the path of the folder you’d like it to represent. Optionally pass in a system=<cluster>/s3/gcs/azure that the folder lives on.
Here, we construct a Runhouse folder object that represents the sample-folder that we created earlier.
local_folder = rh.folder(path=local_path)
To print the full paths, call .ls(), or for relative paths, call .ls(full_paths=False).
local_folder.ls(full_paths=False)
['4.txt', '3.txt', '2.txt', '0.txt', '1.txt']
To send it to a cluster, call .to(system=cluster), and optionally pass in a path. If no path is provided, it will be automatically generated. The path can be retrieved by calling .path on the resulting object.
cluster_folder = local_folder.to(system=cluster, path=folder_name)
INFO | 2024-03-06 04:35:08.517625 | Copying folder from file:///Users/caroline/Documents/runhouse/notebooks/docs/sample-folder to: rh-cluster, with path: sample-folder
cluster_folder.ls()
['sample-folder/3.txt',
'sample-folder/0.txt',
'sample-folder/4.txt',
'sample-folder/2.txt',
'sample-folder/1.txt']
cluster_folder.path
'sample-folder'
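The listings shown above appear in no particular order, and .ls() on the cluster folder returns paths prefixed with the folder name. To check that a copy contains the same files as its source, one option is to reduce both listings to bare file names and compare the sorted results. The helper file_names below is a hypothetical convenience, not a Runhouse function; it operates on plain lists of strings such as the outputs printed in this tutorial.

```python
import posixpath


def file_names(listing):
    """Reduce a listing of paths to a sorted list of bare file names."""
    return sorted(posixpath.basename(p) for p in listing)


# Listings copied from the outputs shown earlier in this tutorial.
local_listing = ['4.txt', '3.txt', '2.txt', '0.txt', '1.txt']
cluster_listing = [
    'sample-folder/3.txt',
    'sample-folder/0.txt',
    'sample-folder/4.txt',
    'sample-folder/2.txt',
    'sample-folder/1.txt',
]

same = file_names(local_listing) == file_names(cluster_listing)
print(same)  # True
```

Taking only the base name is enough for a flat folder like this one; for nested folders it would merge files that share a name in different subdirectories, so a comparison on relative paths would be the safer choice there.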
Sending to S3 or GCS works the same way: call .to(system="s3") or .to(system="gs").
gs_folder = local_folder.to(system="gs")
INFO | 2024-03-06 04:35:38.607986 | Copying folder from file:///Users/caroline/Documents/runhouse/notebooks/docs/sample-folder to: gs, with path: /runhouse-folder/bd489bb276734f7f8c23e401e6bb2b51
gs_folder.ls(full_paths=False)
['0.txt', '1.txt', '2.txt', '3.txt', '4.txt']
Similarly, for s3:
s3_folder = local_folder.to(system="s3")
INFO | 2024-03-06 04:36:04.390441 | Copying folder from file:///Users/caroline/Documents/runhouse/notebooks/docs/sample-folder to: s3, with path: /runhouse-folder/dae8c16b71a744cb976da0dace7c4db2
s3_folder.ls(full_paths=False)
['0.txt', '1.txt', '2.txt', '3.txt', '4.txt']
To copy a folder back to the local filesystem, pass the keyword "here", as in .to("here").
new_local_folder = s3_folder.to("here", path="new-sample-folder")
INFO | 2024-03-06 04:38:01.269441 | Copying folder from s3://runhouse-folder/dae8c16b71a744cb976da0dace7c4db2 to: file, with path: new-sample-folder
new_local_folder.ls(full_paths=False)
['4.txt', '3.txt', '2.txt', '0.txt', '1.txt']
Folders can be sent between any pair of local, cluster, or cloud storage locations. This includes transfers between different clusters, and duplicating a folder to a second location within the same cloud storage.