
Write a Seurat object to a SOMA

Usage

# S3 method for class 'Seurat'
write_soma(
  x,
  uri,
  ...,
  ingest_mode = "write",
  platform_config = NULL,
  tiledbsoma_ctx = NULL
)

Arguments

x

A Seurat object.

uri

URI for resulting SOMA object.

...

Arguments passed to other methods

ingest_mode

Ingestion mode when creating the SOMA; choose from:

  • “write”: create a new SOMA and error if it already exists.

  • “resume”: attempt to create a new SOMA; if it already exists, simply open it for writing.

platform_config

Optional platform configuration.

tiledbsoma_ctx

Optional SOMATileDBContext.

Value

The URI to the resulting SOMAExperiment generated from the data contained in x.
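The two ingest_mode values described under Arguments can be contrasted with a short sketch. It assumes that tiledbsoma and SeuratObject are installed and that the SOMA is written to a local temporary path; the variable second_write is introduced only for illustration.

```r
library(tiledbsoma)

# Example data shipped with SeuratObject, updated as in the Examples section
data("pbmc_small", package = "SeuratObject")
suppressWarnings(pbmc_small <- SeuratObject::UpdateSeuratObject(pbmc_small))

uri <- tempfile(pattern = "pbmc-small")

# First ingestion: the default "write" mode creates a new SOMA
uri <- write_soma(pbmc_small, uri, ingest_mode = "write")

# A second "write" to the same URI is expected to error,
# because the SOMA already exists
second_write <- tryCatch(
  write_soma(pbmc_small, uri, ingest_mode = "write"),
  error = function(e) e
)
inherits(second_write, "error")

# "resume" opens the existing SOMA for writing instead of failing
uri <- write_soma(pbmc_small, uri, ingest_mode = "resume")
```

“resume” is mainly useful for restarting an ingestion that was interrupted partway through.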

Writing Cell-Level Metadata

Cell-level metadata is written out as a data frame called “obs” at the experiment level.

Writing v3 Assays

Seurat Assay objects are written out as individual measurements:

  • the “data” matrix is written out as a sparse array called “data” within the “X” group.

  • the “counts” matrix, if not empty, is written out as a sparse array called “counts” within the “X” group.

  • the “scale.data” matrix, if not empty, is written out as a sparse array called “scale_data” within the “X” group.

  • feature-level metadata is written out as a data frame called “var”.

Expression matrices are transposed (cells as rows) prior to writing. All other slots, including results from extended assays (e.g., SCTAssay, ChromatinAssay), are lost.
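The transposition described above can be pictured with a small base R sketch. The toy matrix and its feature and cell names are made up for illustration; the real layers are sparse, and the object names here are not part of the package.

```r
# Toy features-by-cells matrix, the orientation Seurat uses (hypothetical data)
counts <- matrix(
  data = c(0, 3, 1, 0, 2, 5),
  nrow = 2,
  ncol = 3,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

# SOMA stores cells as rows, so the matrix is transposed before writing
soma_oriented <- t(counts)

dim(counts)        # 2 features x 3 cells
dim(soma_oriented) # 3 cells x 2 features
```

When reading data back into a Seurat object, the reverse transposition restores the features-by-cells layout.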

Performance Considerations

Ingestion of very large dense layers, such as scale.data, can be memory intensive. For better performance, remove these layers prior to ingestion and regenerate them after export. Users who need to persist the exact matrix can instead ingest these layers separately as dense arrays.


# Using SeuratObject v5 syntax on a v3 `Assay`
# Cache the layer for separate ingestion, skip if planning to regenerate
mat <- object[["ASSAY"]]$scale.data

# Remove the `scale.data` layer
object[["ASSAY"]]$scale.data <- NULL

# Ingest the smaller object
uri <- write_soma(object, "/path/to/soma")

# Ingest the `scale.data` layer densely; needed only if persistence
# of the data is paramount
# Pad the `scale.data` layer so that its soma join IDs match the experiment
padded <- matrix(
  data = vector("numeric", length = prod(dim(object[["ASSAY"]]))),
  nrow = nrow(object[["ASSAY"]]),
  ncol = ncol(object[["ASSAY"]])
)
rowidx <- match(rownames(mat), rownames(object[["ASSAY"]]))
colidx <- match(colnames(mat), colnames(object[["ASSAY"]]))
padded[rowidx, colidx] <- mat

# Use `write_soma()` to ingest densely and register it within the `uns`
# collection; this may need to be created manually if the original
# object does not contain command logs
exp <- SOMAExperimentOpen(uri, "WRITE")
if (!match("uns", exp$names(), nomatch = 0L)) {
  # For `tiledb://` URIs, set the URI for the new collection manually rather
  # than relying on `file.path()`
  uns <- SOMACollectionCreate(file.path(exp$uri, "uns"))
  exp$add_new_collection(uns, "uns")
}
arr <- write_soma(
  padded,
  "scale_data",
  soma_parent = exp$get("uns"),
  sparse = FALSE,
  key = "scale_data"
)
arr$close()
exp$close()

Please note that dense arrays cannot be read in using the SOMAExperimentAxisQuery mechanism; use SOMADenseNDArray$read_dense_matrix instead, remembering to transpose the result before adding it back to a Seurat object.

Writing v5 Assays

Seurat v5 Assays are written out as individual measurements:

  • the layer matrices are written out as sparse arrays within the “X” group.

  • feature-level metadata is written out as a data frame called “var”.

Expression matrices are transposed (cells as rows) prior to writing. All other slots, including results from extended assays (e.g., SCTAssay, ChromatinAssay), are lost.

The following metadata fields are written at various levels of the measurement:

  • “soma_ecosystem_seurat_assay_version”: written at the measurement level; indicates the Seurat assay version. Set to “v5”.

  • “soma_ecosystem_seurat_v5_default_layers”: written at the “X” group level; indicates the default layers.

  • “soma_ecosystem_seurat_v5_ragged”: written at the “X/<layer>” array level; with a value of “ragged”, indicates whether or not the layer is ragged.

  • “soma_r_type_hint”: written at the “X/<layer>” array level; indicates the R class and defining package (for S4 classes) of the original layer.

Writing DimReducs

Seurat DimReduc objects are written out to the “obsm” and “varm” groups of a measurement:

  • cell embeddings are written out as a sparse matrix in the “obsm” group.

  • feature loadings, if not empty, are written out as a sparse matrix in the “varm” group; loadings are padded with NAs to include all features.

Dimensional reduction names are translated to AnnData-style names (e.g., “pca” becomes “X_pca” for embeddings and “PCs” for loadings). All other slots, including projected feature loadings and jackstraw information, are lost.
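The NA padding of feature loadings and the “X_” prefix for embeddings can be illustrated in base R. This is a minimal sketch on made-up data: the feature names, the loading values, and the helper embedding_name() are hypothetical and are not part of tiledbsoma, which may implement these steps differently.

```r
# Hypothetical data: an assay with four features,
# with PCA loadings computed for only two of them
all_features <- c("geneA", "geneB", "geneC", "geneD")
loadings <- matrix(
  data = c(0.5, -0.2, 0.1, 0.9),
  nrow = 2,
  dimnames = list(c("geneB", "geneD"), c("PC_1", "PC_2"))
)

# Pad with NA so that every feature in the assay has a row
padded <- matrix(
  data = NA_real_,
  nrow = length(all_features),
  ncol = ncol(loadings),
  dimnames = list(all_features, colnames(loadings))
)
padded[rownames(loadings), ] <- loadings

# AnnData-style name for embeddings: prefix the reduction name with "X_"
# (hypothetical helper for illustration only)
embedding_name <- function(reduction) paste0("X_", reduction)
embedding_name("pca")
```

Rows for geneA and geneC stay NA in padded, so the loadings line up with the full feature set of the measurement.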

Writing Graphs

Seurat Graph objects are written out as sparse matrices to the “obsp” group of a measurement.

Writing SeuratCommands

Seurat command logs are written out as data frames to the “seurat_commands” group of a collection.

Examples

# \donttest{
uri <- withr::local_tempfile(pattern = "pbmc-small")

data("pbmc_small", package = "SeuratObject")
suppressWarnings(pbmc_small <- SeuratObject::UpdateSeuratObject(pbmc_small))
#> Validating object structure
#> Updating object slots
#> Ensuring keys are in the proper structure
#> Updating matrix keys for DimReduc ‘pca’
#> Updating matrix keys for DimReduc ‘tsne’
#> Ensuring keys are in the proper structure
#> Ensuring feature names don't have underscores or pipes
#> Updating slots in RNA
#> Updating slots in RNA_snn
#> Setting default assay of RNA_snn to RNA
#> Updating slots in pca
#> Updating slots in tsne
#> Setting tsne DimReduc to global
#> Setting assay used for NormalizeData.RNA to RNA
#> Setting assay used for ScaleData.RNA to RNA
#> Setting assay used for RunPCA.RNA to RNA
#> Setting assay used for BuildSNN.RNA.pca to RNA
#> No assay information could be found for FindClusters
#> Setting assay used for RunTSNE.pca to RNA
#> Setting assay used for JackStraw.RNA.pca to RNA
#> Setting assay used for ScoreJackStraw.pca to RNA
#> Setting assay used for ProjectDim.RNA.pca to RNA
#> Setting assay used for FindVariableFeatures.RNA to RNA
#> Validating object structure for Assay ‘RNA’
#> Validating object structure for Graph ‘RNA_snn’
#> Validating object structure for DimReduc ‘pca’
#> Validating object structure for DimReduc ‘tsne’
#> Object representation is consistent with the most current Seurat version

uri <- write_soma(pbmc_small, uri)

(exp <- SOMAExperimentOpen(uri))
#> <SOMAExperiment>
#>   uri: /tmp/RtmpzG0UPS/pbmc-small29d238f5ec53
exp$obs
#> <SOMADataFrame>
#>   uri: file:///tmp/RtmpzG0UPS/pbmc-small29d238f5ec53/obs
#>   dimensions: soma_joinid 
#>   attributes: orig.ident, nCount_RNA, nFeature_RNA, RNA_snn_res.0.8, letter.idents, groups,... 
exp$get("uns")$get("seurat_commands")$names()
#>  [1] "BuildSNN.RNA.pca"         "FindClusters"            
#>  [3] "FindVariableFeatures.RNA" "JackStraw.RNA.pca"       
#>  [5] "NormalizeData.RNA"        "ProjectDim.RNA.pca"      
#>  [7] "RunPCA.RNA"               "RunTSNE.pca"             
#>  [9] "ScaleData.RNA"            "ScoreJackStraw.pca"      
(ms <- exp$ms$get("RNA"))
#> <SOMAMeasurement>
#>   uri: file:///tmp/RtmpzG0UPS/pbmc-small29d238f5ec53/ms/RNA
ms$var
#> <SOMADataFrame>
#>   uri: file:///tmp/RtmpzG0UPS/pbmc-small29d238f5ec53/ms/RNA/var
#>   dimensions: soma_joinid 
#>   attributes: vst.mean, vst.variance, vst.variance.expected, vst.variance.standardized, vst... 
ms$X$names()
#> [1] "counts"     "data"       "scale_data"
ms$obsm$names()
#> [1] "X_pca"  "X_tsne"
ms$varm$names()
#> [1] "PCs"
ms$obsp$names()
#> [1] "RNA_snn"

exp$close()
# }