You are viewing the docs for v0.0.12, which is not the latest stable version.

Table

A Table is a Runhouse primitive used for abstracting a particular tabular data storage configuration.

Table Factory Method

runhouse.table(data=None, name: Optional[str] = None, path: Optional[str] = None, system: Optional[str] = None, data_config: Optional[dict] = None, partition_cols: Optional[list] = None, mkdir: bool = False, dryrun: bool = False, stream_format: Optional[str] = None, metadata: Optional[dict] = None) → Table

Constructs a Table object, which can be used to interact with the table at the given path.

Parameters:

data – Data to be stored in the table.
name (Optional[str]) – Name for the table, to reuse it later on.
path (Optional[str]) – Full path to the data file.
system (Optional[str]) – File system. Currently this must be one of: [file, github, sftp, ssh, s3, gs, azure].
data_config (Optional[dict]) – The data config to pass to the underlying fsspec handler.
partition_cols (Optional[list]) – List of columns to partition the table by.
mkdir (bool) – Whether to create a remote folder for the table. (Default: False)
dryrun (bool) – Whether to only load a Table object as a dry run, rather than creating the Table if it doesn’t exist. (Default: False)
stream_format (Optional[str]) – Format to stream the Table as. Currently this must be one of: [pyarrow, torch, tf, pandas].
metadata (Optional[dict]) – Metadata to store for the table.

Returns:

The resulting Table object.

Return type:

Table

Example

>>> import pandas as pd
>>> import runhouse as rh
>>> # Sample data, for illustration only
>>> data = pd.DataFrame({"id": [1, 2, 3]})
>>> # Create and save (pandas) table
>>> rh.table(
...    data=data,
...    name="~/my_test_pandas_table",
...    path="table_tests/test_pandas_table.parquet",
...    system="file",
...    mkdir=True,
... ).save()
>>>
>>> # Load table from above
>>> reloaded_table = rh.table(name="~/my_test_pandas_table")
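
The parameter list above restricts system and stream_format to fixed sets of strings. As a minimal sketch, a caller could check those values before calling the factory. The helper check_table_args and the two constants are hypothetical names introduced here for illustration and are not part of Runhouse. The allowed values are copied from the parameter descriptions above.

```python
from typing import Optional

# Allowed values copied from the parameter descriptions in this page.
SUPPORTED_SYSTEMS = ("file", "github", "sftp", "ssh", "s3", "gs", "azure")
SUPPORTED_STREAM_FORMATS = ("pyarrow", "torch", "tf", "pandas")


def check_table_args(
    system: Optional[str] = None,
    stream_format: Optional[str] = None,
) -> None:
    """Hypothetical pre-flight check (not a Runhouse function).

    Raises ValueError if a provided string is outside the documented set.
    None is accepted for both arguments, matching the Optional[str] hints.
    """
    if system is not None and system not in SUPPORTED_SYSTEMS:
        raise ValueError(
            f"system must be one of {SUPPORTED_SYSTEMS}, got {system!r}"
        )
    if stream_format is not None and stream_format not in SUPPORTED_STREAM_FORMATS:
        raise ValueError(
            f"stream_format must be one of {SUPPORTED_STREAM_FORMATS}, "
            f"got {stream_format!r}"
        )


# Both values are in the documented sets, so this call returns silently.
check_table_args(system="s3", stream_format="pandas")
```

This sketch only covers string values. The to() example further down passes a cluster object as the system, so a string-only check like this one would not apply there.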

Table Class

class runhouse.Table(path: str, name: Optional[str] = None, file_name: Optional[str] = None, system: Optional[str] = None, data_config: Optional[dict] = None, dryrun: bool = False, partition_cols: Optional[List] = None, stream_format: Optional[str] = None, metadata: Optional[Dict] = None, **kwargs)

__init__(path: str, name: Optional[str] = None, file_name: Optional[str] = None, system: Optional[str] = None, data_config: Optional[dict] = None, dryrun: bool = False, partition_cols: Optional[List] = None, stream_format: Optional[str] = None, metadata: Optional[Dict] = None, **kwargs)

The Runhouse Table object.

Note

To build a Table, please use the factory method table().

property data: Dataset

Get the table data. If data is not already cached, return a Ray dataset.

With the dataset object we can stream or convert to other types, for example:

data.iter_batches()
data.to_pandas()
data.to_dask()

exists_in_system()

Whether the table exists in the file system.

Example

>>> table.exists_in_system()

fetch(columns: Optional[list] = None) → Table

Returns the complete table contents.

Example

>>> table = rh.table(data)
>>> formatted_data = table.fetch()

read_table_from_file(columns: Optional[list] = None)

Read a table from its path.

Example

>>> table = rh.table(path="path/to/table")
>>> table_data = table.read_table_from_file()

rm(recursive: bool = True)

Delete table, including its partitioned files where relevant.

Example

>>> table = rh.table(path="path/to/table")
>>> table.rm()

stream(batch_size: int, drop_last: bool = False, shuffle_seed: Optional[int] = None, shuffle_buffer_size: Optional[int] = None, prefetch_batches: Optional[int] = None)

Return a local batched iterator over the Ray dataset.

Example

>>> table = rh.table(data)
>>> batches = table.stream(batch_size=4)
>>> for _, batch in batches:
...     print(batch)
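
To make the batch_size, drop_last, and shuffle_seed arguments concrete, here is a plain-Python sketch of how a batched iterator with those options commonly behaves. It is an illustration under stated assumptions, not Runhouse's or Ray's implementation: iter_batches is a hypothetical helper, it works on an in-memory sequence, and it shuffles the whole sequence at once rather than using a shuffle buffer (so shuffle_buffer_size and prefetch_batches are not modeled).

```python
import random
from typing import Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(
    rows: Sequence[T],
    batch_size: int,
    drop_last: bool = False,
    shuffle_seed: Optional[int] = None,
) -> Iterator[List[T]]:
    """Yield lists of up to batch_size rows (illustrative helper only)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    items = list(rows)
    if shuffle_seed is not None:
        # Assumption: a seed makes the shuffled order reproducible.
        random.Random(shuffle_seed).shuffle(items)
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            # Skip the final, incomplete batch.
            return
        yield batch


for batch in iter_batches(range(10), batch_size=4):
    print(batch)
# prints [0, 1, 2, 3], then [4, 5, 6, 7], then [8, 9]
```

With drop_last=True the trailing [8, 9] batch would be omitted, which is useful when downstream code expects every batch to have exactly batch_size rows.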

to(system, path=None, data_config=None)

Copy and return the table on the given filesystem and path.

Example

>>> local_table = rh.table(data, path="local/path")
>>> s3_table = local_table.to("s3")
>>> cluster_table = local_table.to(my_cluster)

write()

Write the underlying table data to its fsspec URL.

Example

>>> rh.table(data, path="path/to/write").write()
