SOMADataFrame

SOMADataFrame is a multi-column table that must contain a column called soma_joinid of type int64, which contains a unique value for each row and is intended to act as a join key for other objects, such as SOMASparseNDArray. (lifecycle: maturing)

Super classes

tiledbsoma::TileDBObject -> tiledbsoma::TileDBArray -> tiledbsoma::SOMAArrayBase -> SOMADataFrame

Methods

Public methods

SOMADataFrame$create()
SOMADataFrame$write()
SOMADataFrame$read()
SOMADataFrame$update()
SOMADataFrame$levels()
SOMADataFrame$shape()
SOMADataFrame$maxshape()
SOMADataFrame$domain()
SOMADataFrame$maxdomain()
SOMADataFrame$tiledbsoma_has_upgraded_domain()
SOMADataFrame$tiledbsoma_resize_soma_joinid_shape()
SOMADataFrame$tiledbsoma_upgrade_domain()
SOMADataFrame$change_domain()
SOMADataFrame$clone()

Inherited methods

Method `create()`

Create (lifecycle: maturing)

Usage

SOMADataFrame$create(
  schema,
  index_column_names = c("soma_joinid"),
  domain = NULL,
  platform_config = NULL,
  internal_use_only = NULL
)

Arguments

schema: an arrow::schema.
index_column_names: A vector of column names to use as user-defined index columns. All named columns must exist in the schema, and at least one index column name is required.
domain: An optional list specifying the domain of each index column. Each slot in the list must be named after an index column, and its value must be a length-two vector consisting of the minimum and maximum values storable in that index column. For example, if there is a single int64-valued index column soma_joinid, then domain might be list(soma_joinid=c(100, 200)) to indicate that values between 100 and 200, inclusive, can be stored in that column. If provided, this list must have the same length as index_column_names, and the index-column domain will be as specified. If omitted entirely, or if NULL in a given dimension, the corresponding index-column domain will use an empty range, and data writes after that will fail with "A range was set outside of the current domain". Unless you have a particular reason not to, you should always provide the desired domain at create time: this is an optional but strongly recommended parameter. See also change_domain, which allows you to expand the domain after create.
platform_config: A platform configuration object
internal_use_only: Character value to signal this is a 'permitted' call, as create() is considered internal and should not be called directly.
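
As a minimal sketch of the shape rules for the domain argument described above, the base-R snippet below builds a domain list and checks it against the index column names. The helper check_domain is hypothetical and written only for illustration; it is not part of tiledbsoma and does not reproduce the library's own validation.

```r
# Hypothetical helper (not part of tiledbsoma): checks that a `domain` list
# follows the rules described for the `domain` argument.
check_domain <- function(domain, index_column_names) {
  # One slot per index column, named after that column
  if (length(domain) != length(index_column_names)) {
    return("domain must have one slot per index column")
  }
  if (!setequal(names(domain), index_column_names)) {
    return("domain slots must be named after the index columns")
  }
  for (nm in names(domain)) {
    rng <- domain[[nm]]
    # A NULL slot is allowed; per the text above it yields an empty range
    if (is.null(rng)) next
    if (length(rng) != 2L || rng[1] > rng[2]) {
      return(sprintf("domain for '%s' must be c(min, max)", nm))
    }
  }
  ""  # empty string: no problem found
}

# The example from the argument description: values 100..200 inclusive
domain <- list(soma_joinid = c(100, 200))
result <- check_domain(domain, index_column_names = c("soma_joinid"))
```

Returning an empty string on success mirrors the check_only convention used by the domain methods on this page.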

Method `write()`

Write (lifecycle: maturing)

Usage

SOMADataFrame$write(values)

Arguments

values: An arrow::Table or arrow::RecordBatch containing all columns, including any index columns. The schema for values must match the schema for the SOMADataFrame.

Method `read()`

Read (lifecycle: maturing) Read a user-defined subset of data, addressed by the dataframe indexing column, and optionally filtered.

Usage

SOMADataFrame$read(
  coords = NULL,
  column_names = NULL,
  value_filter = NULL,
  result_order = "auto",
  iterated = FALSE,
  log_level = "auto"
)

Arguments

coords: Optional named list of indices specifying the rows to read; each (named) list element corresponds to a dimension of the same name.
column_names: Optional character vector of column names to return.
value_filter: Optional string containing a logical expression that is used to filter the returned values. See tiledb::parse_query_condition for more information.
result_order: Optional order of read results. This can be one of "ROW_MAJOR", "COL_MAJOR", or "auto" (default).
iterated: Optional boolean indicating whether data is read in one call (when FALSE, the default value) or in several iterated steps.
log_level: Optional logging level. The usage signature above shows a default of "auto".

Returns

arrow::Table or TableReadIter

Method `update()`

Update (lifecycle: maturing)

Usage

SOMADataFrame$update(values, row_index_name = NULL)

Arguments

values: A data.frame, arrow::Table, or arrow::RecordBatch.
row_index_name: An optional scalar character. If provided, and if the values argument is a data.frame with row names, then the row names will be extracted and added as a new column to the data.frame prior to performing the update. The name of this new column will be set to the value specified by row_index_name.

Details

Update the existing SOMADataFrame to add or remove columns based on the input:

columns present in the current SOMADataFrame but absent from the new values will be dropped
columns absent in current SOMADataFrame but present in the new values will be added
any columns present in both will be left alone, with the exception that if values has a different type for the column, the entire update will fail because attribute types cannot be changed.

Furthermore, values must contain the same number of rows as the current SOMADataFrame.
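
The column rules above can be sketched in base R. This is an illustration of the described behavior only, not the tiledbsoma implementation; plan_update and the column names are hypothetical, and column types are represented as plain strings for simplicity.

```r
# Illustration only: classifies columns the way the rules above describe.
# `current` and `new` are named character vectors mapping column -> type.
plan_update <- function(current, new) {
  shared <- intersect(names(current), names(new))
  # A shared column with a different type makes the whole update fail
  changed <- shared[current[shared] != new[shared]]
  if (length(changed) > 0) {
    stop("column types cannot be changed: ", paste(changed, collapse = ", "))
  }
  list(
    dropped = setdiff(names(current), names(new)),  # only in current
    added = setdiff(names(new), names(current)),    # only in new values
    kept = shared                                   # in both, left alone
  )
}

current <- c(soma_joinid = "int64", cell_type = "string", n_genes = "int32")
new <- c(soma_joinid = "int64", cell_type = "string", cluster = "string")

plan <- plan_update(current, new)
plan$dropped  # "n_genes"
plan$added    # "cluster"
```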

Method `levels()`

Get the levels for an enumerated (factor) column

Usage

SOMADataFrame$levels(column_names = NULL, simplify = TRUE)

Arguments

column_names: Optional character vector of column names to pull enumeration levels for; defaults to all enumerated columns
simplify: Simplify the result down to a vector or matrix

Returns

If simplify is TRUE, returns one of the following:

a vector if there is only one enumerated column
a matrix if there are multiple enumerated columns with the same number of levels
a named list if there are multiple enumerated columns with differing numbers of levels

Otherwise, returns a named list
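
To make the three simplified shapes concrete, here is a base-R sketch that applies the same rules to a named list of levels. The function simplify_levels and the sample columns are hypothetical; how levels() simplifies internally is not stated here, so treat this as one plausible reading of the rules above.

```r
# Illustration only: applies the simplification rules listed above to a
# named list of level vectors (one entry per enumerated column).
simplify_levels <- function(lv) {
  # One enumerated column: return its levels as a plain vector
  if (length(lv) == 1L) return(lv[[1L]])
  n <- lengths(lv)
  # Several columns with the same number of levels: one matrix column each
  if (all(n == n[1L])) return(do.call(cbind, lv))
  # Differing numbers of levels: leave as a named list
  lv
}

one <- simplify_levels(list(sex = c("F", "M")))
same <- simplify_levels(list(sex = c("F", "M"), batch = c("a", "b")))
mixed <- simplify_levels(
  list(sex = c("F", "M"), tissue = c("lung", "liver", "blood"))
)
```

Here one is a character vector, same is a 2 x 2 character matrix with columns named sex and batch, and mixed remains a named list.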

Method `shape()`

Retrieve the shape; as SOMADataFrames are shapeless, simply raises an error

Usage

SOMADataFrame$shape()

Returns

None, instead a .NotYetImplemented() error is raised

Method `maxshape()`

Retrieve the maxshape; as SOMADataFrames are shapeless, simply raises an error

Usage

SOMADataFrame$maxshape()

Returns

None, instead a .NotYetImplemented() error is raised

Method `domain()`

Returns a named list of minimum/maximum pairs, one per index column, currently storable on each index column of the dataframe. These can be resized up to maxdomain. (lifecycle: maturing)

Usage

SOMADataFrame$domain()

Returns

Named list of minimum/maximum values.

Method `maxdomain()`

Returns a named list of minimum/maximum pairs, one per index column, which are the limits up to which the dataframe can have its domain resized. (lifecycle: maturing)

Usage

SOMADataFrame$maxdomain()

Returns

Named list of minimum/maximum values.

Method `tiledbsoma_has_upgraded_domain()`

Returns TRUE if the array has the upgraded resizeable domain feature from TileDB-SOMA 1.15: the array was created with this support, or it has had upgrade_domain applied to it. (lifecycle: maturing)

Usage

SOMADataFrame$tiledbsoma_has_upgraded_domain()

Returns

Logical

Method `tiledbsoma_resize_soma_joinid_shape()`

Increases the shape of the dataframe on the soma_joinid index column, if it is indeed an index column, leaving all other index columns as-is. If soma_joinid is not an index column, no change is made. This is a special case of upgrade_domain (WIP for 1.15), but simpler to type, and handles the most common case for dataframe domain expansion. Raises an error if the dataframe doesn't already have a domain: in that case please call tiledbsoma_upgrade_domain (WIP for 1.15).

Usage

SOMADataFrame$tiledbsoma_resize_soma_joinid_shape(new_shape)

Arguments

new_shape: An integer, greater than or equal to 1 + the soma_joinid domain slot.

Returns

No return value

Method `tiledbsoma_upgrade_domain()`

Allows you to set the domain of a SOMADataFrame when the SOMADataFrame does not have a domain set yet. The argument must be a named list of low/high pairs for the desired domain, one pair per index column. For string index columns, you must offer the low/high pair as c("", ""), or as NULL. If check_only is TRUE, returns whether the operation would succeed if attempted, and a reason why it would not. The domain being requested must be contained within what maxdomain returns.

Usage

SOMADataFrame$tiledbsoma_upgrade_domain(new_domain, check_only = FALSE)

Arguments

new_domain: A named list, keyed by index-column name, with values being two-element vectors containing the desired lower and upper bounds for the domain.
check_only: If true, does not apply the operation, but only reports whether it would have succeeded.

Returns

No return value if check_only is FALSE. If check_only is TRUE, returns the empty string if no error is detected, else a description of the error.

Method `change_domain()`

Allows you to set the domain of a SOMADataFrame when the SOMADataFrame already has a domain set. The argument must be a named list of low/high pairs for the desired domain, one pair per index column. For string index columns, you must offer the low/high pair as c("", ""), or as NULL. If check_only is TRUE, returns whether the operation would succeed if attempted, and a reason why it would not. The return value from domain must be contained within the requested new_domain, and the requested new_domain must be contained within the return value from maxdomain. (lifecycle: maturing)

Usage

SOMADataFrame$change_domain(new_domain, check_only = FALSE)

Arguments

new_domain: A named list, keyed by index-column name, with values being two-element vectors containing the desired lower and upper bounds for the domain.
check_only: If true, does not apply the operation, but only reports whether it would have succeeded.

Returns

No return value if check_only is FALSE. If check_only is TRUE, returns the empty string if no error is detected, else a description of the error.
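
The containment requirement for change_domain (current domain within new_domain, new_domain within maxdomain) can be sketched in base R as follows. This is an illustration under simplifying assumptions: it handles numeric index columns only, the helper names and sample bounds are hypothetical, and it is not the check tiledbsoma performs internally.

```r
# Illustration only: TRUE if the c(low, high) range `inner` lies within `outer`
range_contains <- function(outer, inner) {
  outer[1] <= inner[1] && inner[2] <= outer[2]
}

# Follows the check_only convention described above: "" means the change
# would be accepted, otherwise a description of the problem is returned.
check_change_domain <- function(current, new_domain, max_domain) {
  for (nm in names(current)) {
    if (!range_contains(new_domain[[nm]], current[[nm]])) {
      return(sprintf("new domain for '%s' does not contain the current domain", nm))
    }
    if (!range_contains(max_domain[[nm]], new_domain[[nm]])) {
      return(sprintf("new domain for '%s' exceeds the maximum domain", nm))
    }
  }
  ""
}

current <- list(soma_joinid = c(0, 99))
max_domain <- list(soma_joinid = c(0, 1e9))  # made-up limit for the example

ok <- check_change_domain(current, list(soma_joinid = c(0, 199)), max_domain)
shrink <- check_change_domain(current, list(soma_joinid = c(0, 49)), max_domain)
```

In this sketch ok is the empty string because the requested range grows the current one and stays within the limit, while shrink carries a message because a domain cannot be reduced below the current one.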

Method `clone()`

The objects of this class are cloneable with this method.

Usage

SOMADataFrame$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
