v0.0.12
A Table is a Runhouse primitive used for abstracting a particular tabular data storage configuration.
Constructs a Table object, which can be used to interact with the table at the given path.
data – Data to be stored in the table.
name (Optional[str]) – Name for the table, to reuse it later on.
path (Optional[str]) – Full path to the data file.
system (Optional[str]) – File system. Currently this must be one of: [file, github, sftp, ssh, s3, gs, azure].
data_config (Optional[dict]) – The data config to pass to the underlying fsspec handler.
partition_cols (Optional[list]) – List of columns to partition the table by.
mkdir (bool) – Whether to create a remote folder for the table. (Default: False)
dryrun (bool) – Whether to create the Table if it doesn’t exist, or load a Table object as a dryrun. (Default: False)
stream_format (Optional[str]) – Format to stream the Table as. Currently this must be one of: [pyarrow, torch, tf, pandas].
metadata (Optional[dict]) – Metadata to store for the table.
The resulting Table object.
Example
>>> import runhouse as rh
>>> # Create and save (pandas) table
>>> rh.table(
>>>     data=data,
>>>     name="~/my_test_pandas_table",
>>>     path="table_tests/test_pandas_table.parquet",
>>>     system="file",
>>>     mkdir=True,
>>> ).save()
>>>
>>> # Load table from above
>>> reloaded_table = rh.table(name="~/my_test_pandas_table")
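The allowed values for system and stream_format listed above can be checked before calling the factory. The following is a minimal sketch, not part of Runhouse: check_table_args and the two constant names are hypothetical, and only the value lists themselves come from the parameter descriptions. It covers string values only.

```python
from typing import Optional

# Values copied from the parameter descriptions above.
SUPPORTED_SYSTEMS = ("file", "github", "sftp", "ssh", "s3", "gs", "azure")
SUPPORTED_STREAM_FORMATS = ("pyarrow", "torch", "tf", "pandas")


def check_table_args(system: Optional[str] = None,
                     stream_format: Optional[str] = None) -> None:
    """Hypothetical helper (not a Runhouse API): reject unsupported values."""
    # None means "not specified", which the documented signature allows.
    if system is not None and system not in SUPPORTED_SYSTEMS:
        raise ValueError(
            f"system must be one of {list(SUPPORTED_SYSTEMS)}, got {system!r}"
        )
    if stream_format is not None and stream_format not in SUPPORTED_STREAM_FORMATS:
        raise ValueError(
            f"stream_format must be one of {list(SUPPORTED_STREAM_FORMATS)}, "
            f"got {stream_format!r}"
        )
```

Calling check_table_args(system="s3", stream_format="pandas") returns without error, while check_table_args(system="ftp") raises a ValueError.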
- __init__(path: str, name: Optional[str] = None, file_name: Optional[str] = None, system: Optional[str] = None, data_config: Optional[dict] = None, dryrun: bool = False, partition_cols: Optional[List] = None, stream_format: Optional[str] = None, metadata: Optional[Dict] = None, **kwargs)
The Runhouse Table object.
Note
To build a Table, please use the factory method table().
Get the table data. If data is not already cached, return a Ray dataset.
With the dataset object we can stream or convert to other types, for example:
data.iter_batches()
data.to_pandas()
data.to_dask()
Whether the table exists in the file system.
Example
>>> table.exists_in_system()
Returns the complete table contents.
Example
>>> table = rh.table(data)
>>> formatted_data = table.fetch()
Read a table from its path.
Example
>>> table = rh.table(path="path/to/table")
>>> table_data = table.read_table_from_file()
Delete table, including its partitioned files where relevant.
Example
>>> table = rh.table(path="path/to/table")
>>> table.rm()
Return a local batched iterator over the ray dataset.
Example
>>> table = rh.table(data)
>>> batches = table.stream(batch_size=4)
>>> for _, batch in batches:
>>>     print(batch)
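To illustrate what a batched iterator does, here is a minimal standard-library sketch. It is not how Runhouse or Ray implements streaming: stream_batches is a hypothetical function, and yielding (index, batch) pairs is an assumption suggested only by the tuple unpacking in the loop above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def stream_batches(rows: Iterable[T], batch_size: int) -> Iterator[Tuple[int, List[T]]]:
    """Hypothetical sketch: yield (batch_index, batch) pairs lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    it = iter(rows)
    index = 0
    while True:
        # Pull at most batch_size rows; the final batch may be shorter.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield index, batch
        index += 1


if __name__ == "__main__":
    for _, batch in stream_batches(range(10), batch_size=4):
        print(batch)
```

Because the generator pulls rows only as each batch is requested, the full input never has to be held in memory at once.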
Copy and return the table on the given filesystem and path.
Example
>>> local_table = rh.table(data, path="local/path")
>>> s3_table = local_table.to("s3")
>>> cluster_table = local_table.to(my_cluster)
Write underlying table data to fsspec URL.
Example
>>> rh.table(data, path="path/to/write").write()
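fsspec addresses files with URLs of the form protocol://path. The sketch below shows one way a system name and a path could be combined into such a URL. The helper to_fsspec_url is hypothetical, and how Runhouse actually builds the URL internally is not stated here, so treat this as an assumption.

```python
def to_fsspec_url(system: str, path: str) -> str:
    """Hypothetical helper: combine a protocol name and a path into a URL."""
    # A path that already carries a protocol is returned unchanged.
    if "://" in path:
        return path
    # Keep local paths as given, so an absolute path yields "file:///...".
    if system == "file":
        return f"file://{path}"
    # For remote stores, drop leading slashes so the bucket or host comes first.
    return f"{system}://{path.lstrip('/')}"


if __name__ == "__main__":
    print(to_fsspec_url("s3", "my-bucket/table_tests/test_pandas_table.parquet"))
```

The bucket name in the usage line is a placeholder chosen for illustration.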