You are viewing the documentation for version v0.0.12.


A Table is a Runhouse primitive used for abstracting a particular tabular data storage configuration.

Table Factory Method

runhouse.table(data=None, name: Optional[str] = None, path: Optional[str] = None, system: Optional[str] = None, data_config: Optional[dict] = None, partition_cols: Optional[list] = None, mkdir: bool = False, dryrun: bool = False, stream_format: Optional[str] = None, metadata: Optional[dict] = None) -> Table

Constructs a Table object, which can be used to interact with the table at the given path.

  • data – Data to be stored in the table.

  • name (Optional[str]) – Name for the table, to reuse it later on.

  • path (Optional[str]) – Full path to the data file.

  • system (Optional[str]) – File system. Currently this must be one of: [file, github, sftp, ssh, s3, gs, azure].

  • data_config (Optional[dict]) – The data config to pass to the underlying fsspec handler.

  • partition_cols (Optional[list]) – List of columns to partition the table by.

  • mkdir (bool) – Whether to create a remote folder for the table. (Default: False)

  • dryrun (bool) – Whether to create the Table if it doesn’t exist, or load a Table object as a dryrun. (Default: False)

  • stream_format (Optional[str]) – Format to stream the Table as. Currently this must be one of: [pyarrow, torch, tf, pandas].

  • metadata (Optional[dict]) – Metadata to store for the table.


Returns:

The resulting Table object.

Return type:

Table

>>> import runhouse as rh
>>> # Create and save (pandas) table
>>> rh.table(
>>>     data=data,
>>>     name="~/my_test_pandas_table",
>>>     path="table_tests/test_pandas_table.parquet",
>>>     system="file",
>>>     mkdir=True,
>>> ).save()
>>>
>>> # Load table from above
>>> reloaded_table = rh.table(name="~/my_test_pandas_table")
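
The system and stream_format parameters each accept only a fixed set of values. As a rough illustration, those values can be checked before calling the factory. The sketch below is not part of Runhouse: check_table_args is a hypothetical helper, and the allowed values are simply copied from the parameter list above.

```python
from typing import Optional

# Allowed values, copied from the parameter descriptions above.
SUPPORTED_SYSTEMS = ("file", "github", "sftp", "ssh", "s3", "gs", "azure")
SUPPORTED_STREAM_FORMATS = ("pyarrow", "torch", "tf", "pandas")


def check_table_args(
    system: Optional[str] = None,
    stream_format: Optional[str] = None,
) -> None:
    """Hypothetical helper: raise ValueError for an unsupported value.

    None is accepted for both arguments, mirroring their Optional type.
    """
    if system is not None and system not in SUPPORTED_SYSTEMS:
        raise ValueError(
            f"system must be one of {list(SUPPORTED_SYSTEMS)}, got {system!r}"
        )
    if stream_format is not None and stream_format not in SUPPORTED_STREAM_FORMATS:
        raise ValueError(
            f"stream_format must be one of {list(SUPPORTED_STREAM_FORMATS)}, "
            f"got {stream_format!r}"
        )


# Passes silently for supported values.
check_table_args(system="s3", stream_format="pandas")
```

Running a check like this up front gives a clear error message locally, whatever validation the library itself performs.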

Table Class

class runhouse.Table(path: str, name: Optional[str] = None, file_name: Optional[str] = None, system: Optional[str] = None, data_config: Optional[dict] = None, dryrun: bool = False, partition_cols: Optional[List] = None, stream_format: Optional[str] = None, metadata: Optional[Dict] = None, **kwargs)
__init__(path: str, name: Optional[str] = None, file_name: Optional[str] = None, system: Optional[str] = None, data_config: Optional[dict] = None, dryrun: bool = False, partition_cols: Optional[List] = None, stream_format: Optional[str] = None, metadata: Optional[Dict] = None, **kwargs)

The Runhouse Table object.


To build a Table, please use the factory method table().

property data: Dataset

Get the table data. If data is not already cached, return a Ray dataset.

With the dataset object we can stream or convert to other types, for example:

data.iter_batches()
data.to_pandas()
data.to_dask()

exists_in_system()

Whether the table exists in the file system.


>>> table.exists_in_system()
fetch(columns: Optional[list] = None) -> Table

Returns the complete table contents.


>>> table = rh.table(data)
>>> formatted_data = table.fetch()
read_table_from_file(columns: Optional[list] = None)

Read a table from its path.


>>> table = rh.table(path="path/to/table")
>>> table_data = table.read_table_from_file()
rm(recursive: bool = True)

Delete table, including its partitioned files where relevant.


>>> table = rh.table(path="path/to/table")
>>> table.rm()
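
To make "partitioned files" and the recursive flag more concrete, the sketch below writes a small table into one directory per partition value and then removes the whole tree. It uses only the standard library. The hive-style column=value directory naming, the CSV files standing in for Parquet, and the helper names write_partitioned and remove_table are all assumptions for illustration; this page does not describe the layout Runhouse actually produces.

```python
import shutil
import tempfile
from pathlib import Path


def write_partitioned(root: Path, rows: list, partition_col: str) -> list:
    """Write rows under root/<partition_col>=<value>/part.csv (assumed layout)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[partition_col], []).append(row)

    written = []
    for value, group in sorted(groups.items()):
        part_dir = root / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        part_file = part_dir / "part.csv"
        # The partition column is encoded in the directory name, so it is
        # left out of the file contents.
        lines = [
            ",".join(str(v) for k, v in row.items() if k != partition_col)
            for row in group
        ]
        part_file.write_text("\n".join(lines) + "\n")
        written.append(part_file)
    return written


def remove_table(root: Path, recursive: bool = True) -> None:
    """Delete the table directory; without recursive it must already be empty."""
    if recursive:
        shutil.rmtree(root)
    else:
        root.rmdir()


rows = [{"year": 2022, "id": 1}, {"year": 2022, "id": 2}, {"year": 2023, "id": 3}]
root = Path(tempfile.mkdtemp()) / "my_table"
files = write_partitioned(root, rows, "year")
partition_dirs = sorted(p.name for p in root.iterdir())

remove_table(root, recursive=True)
still_exists = root.exists()
```

In this sketch a non-recursive delete would fail on a partitioned table, because the root directory still contains the per-partition subdirectories.
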
stream(batch_size: int, drop_last: bool = False, shuffle_seed: Optional[int] = None, shuffle_buffer_size: Optional[int] = None, prefetch_batches: Optional[int] = None)

Return a local batched iterator over the Ray dataset.


>>> table = rh.table(data)
>>> batches = table.stream(batch_size=...)
>>> for _, batch in batches:
>>>     print(batch)
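
The batch_size, drop_last, shuffle_seed and shuffle_buffer_size arguments follow a common pattern for batched iterators. The pure-Python sketch below shows the conventional meaning of those names. It is not the Ray or Runhouse implementation, and the functions iter_batches and buffered_shuffle are hypothetical names used only for illustration.

```python
import random
from typing import Iterable, Iterator, List, Optional


def iter_batches(rows: Iterable, batch_size: int, drop_last: bool = False) -> Iterator[List]:
    """Yield lists of up to batch_size rows; optionally drop a short final batch."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch and not drop_last:
        yield batch


def buffered_shuffle(rows: Iterable, buffer_size: int, seed: Optional[int] = None) -> Iterator:
    """Approximate shuffle: emit a random row from a bounded in-memory buffer."""
    rng = random.Random(seed)
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end and emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Flush whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


rows = list(range(10))
all_batches = list(iter_batches(rows, batch_size=4))
full_batches = list(iter_batches(rows, batch_size=4, drop_last=True))
shuffled = list(buffered_shuffle(rows, buffer_size=3, seed=0))
```

With ten rows and a batch size of four, all_batches ends with a two-row batch, while full_batches omits it. A larger shuffle buffer gives a more thorough shuffle at the cost of memory, and a fixed seed makes the order reproducible.
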
to(system, path=None, data_config=None)

Copy and return the table on the given filesystem and path.


>>> local_table = rh.table(data, path="local/path")
>>> s3_table = local_table.to("s3")
>>> cluster_table = local_table.to(...)

write()

Write the underlying table data to the fsspec URL.


>>> rh.table(data, path="path/to/write").write()