This function creates a dataset (an object of class rcongasplus) by assembling multiple single-cell input measurements (ATAC and/or RNA data modalities), the input segmentation (from bulk DNA sequencing), and the per-cell normalisation factors for the data.
All input data are passed as tibbles. The input formats are as follows:
for single-cell ATAC/RNA data, the cell identifier, the genomic coordinates (chr, from, to), which refer either to an ATAC peak or to an RNA gene identifier, and a value reporting the number of mapped reads;
for the input segmentation, the genomic coordinates (chr, from, to) of the segment, and the number of copies (i.e., DNA ploidy) of the segment;
for normalisation factors, the cell identifier, the actual normalisation_factor, and the modality to which the factor refers.
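As an illustration of the three input layouts, the sketch below builds toy tables in base R and converts them to tibbles when the tibble package is installed. All values are made up, and the modality labels "RNA" and "ATAC" are an assumption, not something this page states.

```r
# Toy single-cell counts: one row per cell and genomic interval.
rna <- data.frame(
  cell  = c("cell_1", "cell_1", "cell_2"),
  chr   = c("chr1", "chr2", "chr1"),
  from  = c(1000, 5000, 1000),
  to    = c(2000, 9000, 2000),
  value = c(12, 3, 7),            # reads mapped to the interval
  stringsAsFactors = FALSE
)

# Toy segmentation: one row per segment, with its copy number.
segmentation <- data.frame(
  chr    = c("chr1", "chr2"),
  from   = c(1, 1),
  to     = c(100000, 80000),
  copies = c(2, 3),
  stringsAsFactors = FALSE
)

# Toy normalisation factors: one row per cell and modality.
# The modality labels used here are assumed for illustration.
normalisation <- data.frame(
  cell                 = c("cell_1", "cell_2"),
  normalisation_factor = c(1.0, 0.8),
  modality             = c("RNA", "RNA"),
  stringsAsFactors = FALSE
)

# The function expects tibbles; convert if the package is available.
if (requireNamespace("tibble", quietly = TRUE)) {
  rna           <- tibble::as_tibble(rna)
  segmentation  <- tibble::as_tibble(segmentation)
  normalisation <- tibble::as_tibble(normalisation)
}
```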
This function also receives other parameters (e.g., the model likelihoods) that determine the overall behaviour of the underlying model and how data are prepared for inference. Two likelihoods are available:
A Negative Binomial likelihood ("NB"), which works directly on raw count data.
A Gaussian likelihood ("G"), which requires a z-score transformation of the data. This consists of:
scaling raw counts by the input normalisation factors;
computing z-scores per cell;
summing up z-scores per segment;
computing z-scores per segment;
centring the mean of the z-scores on the input ploidy.
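The Gaussian preprocessing steps can be sketched in base R as follows. This is one possible reading of the steps on toy data, not the package's implementation. It assumes that "scaling" means dividing by the factor, that each count is already assigned to a segment, and that the second z-score is taken across cells within each segment. The names counts, norm, ploidy and zscore are hypothetical.

```r
# Toy counts: 3 cells, 4 features, features already mapped to segments A and B.
counts <- data.frame(
  cell    = rep(c("c1", "c2", "c3"), each = 4),
  segment = rep(c("A", "A", "B", "B"), times = 3),
  value   = c(10, 12, 30, 28,  8, 9, 20, 25,  15, 11, 40, 35),
  stringsAsFactors = FALSE
)
norm   <- c(c1 = 1, c2 = 0.8, c3 = 1.5)  # per-cell normalisation factors
ploidy <- c(A = 2, B = 3)                # input copies per segment

zscore <- function(v) (v - mean(v)) / sd(v)

# 1. Scale raw counts by the normalisation factors (assumed: division).
counts$scaled <- counts$value / unname(norm[counts$cell])

# 2. z-scores per cell (across that cell's features).
counts$z_cell <- ave(counts$scaled, counts$cell, FUN = zscore)

# 3. Sum the z-scores per segment within each cell.
per_segment <- aggregate(z_cell ~ cell + segment, data = counts, FUN = sum)

# 4. z-scores per segment (assumed: across cells).
per_segment$z_segment <- ave(per_segment$z_cell, per_segment$segment, FUN = zscore)

# 5. Shift each segment so its mean equals the input ploidy.
per_segment$centred <- per_segment$z_segment +
  unname(ploidy[as.character(per_segment$segment)])
```

After the last step, the per-segment mean of centred equals the segment ploidy by construction, since the z-scores from step 4 have mean zero.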
init(
  rna,
  atac,
  segmentation,
  rna_normalisation_factors = rna %>% auto_normalisation_factor(),
  atac_normalisation_factors = atac %>% auto_normalisation_factor(),
  rna_likelihood = "NB",
  atac_likelihood = "NB",
  reference_genome = "GRCh38",
  description = "(R)CONGAS+ model",
  smooth = FALSE,
  multiome = FALSE
)
rna: A tibble with single-cell RNA data.
atac: A tibble with single-cell ATAC data.
segmentation: A tibble with the input segmentation.
rna_normalisation_factors: The RNA tibble with the input per-cell normalisation factors. By default these are computed by function auto_normalisation_factor.
atac_normalisation_factors: The ATAC tibble with the input per-cell normalisation factors. By default these are computed by function auto_normalisation_factor.
rna_likelihood: Type of likelihood used for RNA data ("G" for Gaussian and "NB" for Negative Binomial). The default in the signature above is "NB".
atac_likelihood: Type of likelihood used for ATAC data, with default "NB".
reference_genome: Either "GRCh38" or "hg19".
description: A description of the model in words.
smooth: If TRUE, input segments are smoothed by joining, per chromosome, segments that have the same ploidy.
multiome: Flag indicating whether the RNA and ATAC observations are the result of a matched RNA-ATAC sequencing assay, such as the 10x multiome assay (i.e., there is a 1:1 correspondence between the barcodes of the two modalities). Defaults to FALSE.
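To make the effect of smooth concrete, here is a minimal base R sketch of joining segments with equal ploidy on the same chromosome. It is an illustration on toy data, not the package's code, and it assumes that only consecutive segments are joined and that the table is already sorted by chr and from.

```r
# Toy segmentation, sorted by chromosome and start position.
segmentation <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  from   = c(1, 1001, 2001, 1, 501),
  to     = c(1000, 2000, 3000, 500, 900),
  copies = c(2, 2, 3, 3, 3),
  stringsAsFactors = FALSE
)

# A new run starts whenever the chromosome or the copy number changes.
key <- paste(segmentation$chr, segmentation$copies)
run <- cumsum(c(TRUE, key[-1] != key[-length(key)]))

# Collapse each run into a single segment spanning its members.
smoothed <- do.call(rbind, lapply(split(segmentation, run), function(s) {
  data.frame(
    chr    = s$chr[1],
    from   = min(s$from),
    to     = max(s$to),
    copies = s$copies[1],
    stringsAsFactors = FALSE
  )
}))
rownames(smoothed) <- NULL
```

In this toy case the five input segments collapse to three: the two chr1 segments with 2 copies are merged, and the two chr2 segments with 3 copies are merged, while the chr1 segment with 3 copies stays separate from chr2.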
An object of class rcongasplus
data("example_input")
# For instance, RNA data
example_input$x_rna %>% print
#> NULL
# .. or ATAC data
example_input$x_atac %>% print
#> NULL
# .. and segmentation
example_input$x_segmentation %>% print
#> NULL
# .. and normalisation factors can be computed (default)
example_input$x_rna %>% auto_normalisation_factor()
#> NULL
x = init(
  rna = example_input$x_rna,
  atac = example_input$x_atac,
  segmentation = example_input$x_segmentation,
  rna_likelihood = "G",
  atac_likelihood = 'NB',
  description = 'My model')
#> Error in init(rna = example_input$x_rna, atac = example_input$x_atac, segmentation = example_input$x_segmentation, rna_likelihood = "G", atac_likelihood = "NB", description = "My model"): Cannot have both assays null.
print(x)
#> Error in eval(expr, envir, enclos): object 'x' not found