Create a dataset (init, Rcongas)

This function creates a dataset (an object of class rcongasplus) by assembling multiple single-cell input measurements (ATAC and/or RNA data modalities), the input segmentation (from bulk DNA sequencing), and the per-cell normalisation factors for the data.

All input data are passed as tibbles; the input formats are as follows:

- for single-cell ATAC/RNA data: the cell identifier, the genomic coordinates (chr, from, to) of either an ATAC peak or an RNA gene, and a value reporting the number of mapped reads;
- for the input segmentation: the genomic coordinates (chr, from, to) of the segment and its number of copies (i.e., the DNA ploidy of the segment);
- for normalisation factors: the cell identifier, the actual normalisation_factor, and the modality to which the factor refers.
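As a toy illustration of the formats listed above, the following base R snippet builds minimal RNA, segmentation and normalisation-factor tables. The column names cell, value, copies and modality, the "RNA" modality label and the barcodes are assumptions made for illustration; check the package's example_input data for the exact names init expects. Plain data frames keep the snippet dependency-free; tibble::as_tibble() converts them to tibbles.

```r
# Hypothetical toy inputs mirroring the three formats described above.
# Column names other than chr/from/to/normalisation_factor are assumptions.

# Single-cell RNA (the ATAC table would have the same shape, with peaks instead of genes)
rna <- data.frame(
  cell  = c("AAAC-1", "AAAC-1", "TTTG-1"),  # cell identifier (barcode)
  chr   = c("chr1", "chr1", "chr1"),
  from  = c(1000, 5000, 1000),              # gene start
  to    = c(2000, 6000, 2000),              # gene end
  value = c(12, 3, 7)                       # number of mapped reads
)

# Input segmentation from bulk DNA sequencing
segmentation <- data.frame(
  chr    = "chr1",
  from   = 1,
  to     = 1e6,
  copies = 2                                # DNA ploidy of the segment
)

# Per-cell normalisation factors, tagged with the modality they refer to
rna_normalisation <- data.frame(
  cell                 = c("AAAC-1", "TTTG-1"),
  normalisation_factor = c(1.1, 0.9),
  modality             = "RNA"              # recycled to every row
)
```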

This function also receives other parameters (e.g., the model likelihoods) that determine the overall behaviour of the underlying model and how data are prepared for inference. Two likelihoods are available:

- A Negative Binomial likelihood ("NB"), which works directly on raw count data.
- A Gaussian likelihood ("G"), which requires a z-score transformation of the data. This consists of:
  - scaling raw counts by the input normalisation factors;
  - computing z-scores per cell;
  - summing z-scores per segment;
  - computing z-scores per segment;
  - centring the mean of the z-scores on the input ploidy.
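The Gaussian preprocessing steps above can be sketched in base R as follows. This is not the Rcongas implementation: it assumes that every feature has already been assigned to a segment (a hypothetical segment column), that "scaling" means dividing counts by the per-cell normalisation factor, and that the ploidy table has a copies column. The helpers zscore and gaussian_transform are hypothetical names.

```r
# Hypothetical sketch of the "G" preprocessing; not the package's code.

# Standardise a vector; constant or single-value input maps to 0
zscore <- function(v) {
  s <- sd(v)
  if (is.na(s) || s == 0) return(rep(0, length(v)))
  (v - mean(v)) / s
}

gaussian_transform <- function(counts, norm_factors, ploidy) {
  # counts:       data.frame(cell, segment, value), one row per cell/feature
  # norm_factors: data.frame(cell, normalisation_factor)
  # ploidy:       data.frame(segment, copies)

  # 1. scale raw counts by the per-cell normalisation factors (assumed: division)
  d <- merge(counts, norm_factors, by = "cell")
  d$value <- d$value / d$normalisation_factor

  # 2. z-scores per cell, across that cell's features
  d$z <- ave(d$value, d$cell, FUN = zscore)

  # 3. sum the z-scores of each cell within each segment
  s <- aggregate(z ~ cell + segment, data = d, FUN = sum)

  # 4. z-scores per segment, across cells
  s$z <- ave(s$z, s$segment, FUN = zscore)

  # 5. shift each segment so that its mean equals the input ploidy
  s <- merge(s, ploidy, by = "segment")
  s$z <- s$z + s$copies
  s[order(s$segment, s$cell), c("cell", "segment", "z")]
}

# Toy usage: 3 cells, 2 segments with 2 features each
counts <- data.frame(
  cell    = rep(c("c1", "c2", "c3"), each = 4),
  segment = rep(c("s1", "s1", "s2", "s2"), times = 3),
  value   = c(10, 12, 3, 5, 20, 18, 4, 6, 8, 9, 7, 6)
)
norm_factors <- data.frame(cell = c("c1", "c2", "c3"),
                           normalisation_factor = c(1, 2, 1))
ploidy <- data.frame(segment = c("s1", "s2"), copies = c(2, 3))
out <- gaussian_transform(counts, norm_factors, ploidy)
```

Because step 4 standardises each segment across cells, its mean is 0 just before step 5, so after the shift each segment's mean equals its input ploidy (2 and 3 in the toy data).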

init(
  rna,
  atac,
  segmentation,
  rna_normalisation_factors = rna %>% auto_normalisation_factor(),
  atac_normalisation_factors = atac %>% auto_normalisation_factor(),
  rna_likelihood = "NB",
  atac_likelihood = "NB",
  reference_genome = "GRCh38",
  description = "(R)CONGAS+ model",
  smooth = FALSE,
  multiome = FALSE
)

Arguments

rna: A tibble with single-cell RNA data.
atac: A tibble with single-cell ATAC data.
segmentation: A tibble with the input segmentation.
rna_normalisation_factors: A tibble with the per-cell normalisation factors for the RNA data. By default, these are computed by the function auto_normalisation_factor.
atac_normalisation_factors: A tibble with the per-cell normalisation factors for the ATAC data. By default, these are computed by the function auto_normalisation_factor.
rna_likelihood: Type of likelihood used for RNA data ("G" for Gaussian, "NB" for Negative Binomial). The default is "NB".
atac_likelihood: Type of likelihood used for ATAC data, with default "NB".
reference_genome: Either "GRCh38" or "hg19".
description: A free-text description of the model.
smooth: If TRUE, input segments are smoothed by merging, within each chromosome, segments that have the same ploidy.
multiome: Defaults to FALSE. Flag indicating whether the RNA and ATAC observations come from a matched RNA-ATAC sequencing assay, such as the 10x multiome assay (i.e., there is a 1:1 correspondence between the barcodes of the two modalities).
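The smoothing controlled by smooth can be pictured with the following base R sketch, which collapses consecutive segments on the same chromosome that share the same copy number. It is a hypothetical helper (smooth_segments), not the package's code, and it assumes that "joining" applies to adjacent segments and that the segmentation has chr, from, to and copies columns.

```r
# Hypothetical sketch: merge adjacent same-ploidy segments within each chromosome.
smooth_segments <- function(seg) {
  if (nrow(seg) == 0) return(seg)
  seg <- seg[order(seg$chr, seg$from), ]
  n <- nrow(seg)
  # a new run starts when the chromosome or the copy number changes
  new_run <- c(TRUE, seg$chr[-1] != seg$chr[-n] | seg$copies[-1] != seg$copies[-n])
  run_id <- cumsum(new_run)
  # collapse each run into a single segment spanning all its members
  out <- do.call(rbind, lapply(split(seg, run_id), function(r) {
    data.frame(chr = r$chr[1], from = min(r$from), to = max(r$to),
               copies = r$copies[1])
  }))
  rownames(out) <- NULL
  out
}

seg <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr2"),
  from   = c(1, 101, 201, 1),
  to     = c(100, 200, 300, 100),
  copies = c(2, 2, 3, 2)
)
# Returns 3 segments: chr1 1-200 (2 copies), chr1 201-300 (3 copies), chr2 1-100 (2 copies)
res <- smooth_segments(seg)
```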

Value

An object of class rcongasplus

Examples

data("example_input")

# For instance, RNA data
example_input$x_rna %>% print
#> NULL

# .. or ATAC data
example_input$x_atac %>% print
#> NULL

# .. and segmentation
example_input$x_segmentation %>% print
#> NULL

# .. and normalisation factors can be computed (default)
example_input$x_rna %>% auto_normalisation_factor()
#> NULL

x = init(
  rna = example_input$x_rna,
  atac = example_input$x_atac,
  segmentation = example_input$x_segmentation,
  rna_likelihood = "G",
  atac_likelihood = 'NB',
  description = 'My model')
#> Error in init(rna = example_input$x_rna, atac = example_input$x_atac,     segmentation = example_input$x_segmentation, rna_likelihood = "G",     atac_likelihood = "NB", description = "My model"): Cannot have both assays null.

print(x)
#> Error in eval(expr, envir, enclos): object 'x' not found