This function creates a dataset (an object of class rcongasplus) by assembling multiple single-cell input measurements (ATAC and/or RNA data modalities), the input segmentation (from bulk DNA sequencing), and the per-cell normalisation factors for the data.

All input data are passed as tibbles; the input formats are as follows:

  • for single-cell ATAC/RNA data: the cell identifier, the genomic coordinates (chr, from, to) of either an ATAC peak or an RNA gene, and a value reporting the number of mapped reads.

  • for the input segmentation: the genomic coordinates (chr, from, to) of the segment, and the number of copies (i.e., DNA ploidy) of the segment.

  • for normalisation factors: the cell identifier, the actual normalisation_factor, and the modality to which the factor refers.
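As an illustration of the three formats listed above, the following sketch builds toy inputs by hand. It assumes the tibble package is installed; the column names cell, value and copies, the modality label "RNA", and all values are assumptions made for this example, not names confirmed by this page.

```r
library(tibble)

# Single-cell counts: one row per (cell, genomic feature) pair.
# Column names `cell` and `value` are assumed for illustration.
rna <- tibble(
  cell  = c("cell_1", "cell_1", "cell_2", "cell_2"),
  chr   = c("chr1", "chr2", "chr1", "chr2"),
  from  = c(1000L, 5000L, 1000L, 5000L),
  to    = c(2000L, 7500L, 2000L, 7500L),
  value = c(12L, 3L, 7L, 0L)
)

# Segmentation: one row per segment with its copy number.
# Column name `copies` is assumed for illustration.
segmentation <- tibble(
  chr    = c("chr1", "chr2"),
  from   = c(1L, 1L),
  to     = c(100000L, 80000L),
  copies = c(2L, 3L)
)

# Normalisation factors: one row per cell and modality.
# The factor values are made up.
rna_normalisation_factors <- tibble(
  cell                 = c("cell_1", "cell_2"),
  normalisation_factor = c(15, 7),
  modality             = "RNA"
)
```

An ATAC table would have the same shape as the RNA one, with each row referring to a peak instead of a gene.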

This function also receives other parameters, e.g., the model likelihoods, which determine the overall behaviour of the underlying model and how data are prepared for inference. The available likelihoods are:

  • A Negative Binomial likelihood ("NB"), which works directly from raw counts data

  • A Gaussian likelihood ("G"), which requires a z-score transformation of the data. This consists of:

    • scaling raw counts by the input normalization factors;

    • computing z-scores per cell;

    • summing up z-scores per segment;

    • computing z-scores per segment;

    • centering the mean of the z-scores on the input ploidy.
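The five steps above can be sketched in base R on a toy count matrix. This is one plausible reading of the steps, not the package's implementation: it assumes that "scaling" means dividing by the per-cell factor and that per-cell z-scores are taken across the features of each cell. All object names and values are hypothetical.

```r
# Toy data: 4 cells x 6 features (genes or peaks); all values are made up.
counts <- matrix(
  c(10, 12,  8, 30, 28, 35,
    20, 25, 18, 55, 60, 58,
     5,  7,  6, 14, 16, 15,
    15, 14, 17, 40, 44, 47),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), paste0("f", 1:6))
)
norm_factor     <- c(cell1 = 1.0, cell2 = 2.0, cell3 = 0.5, cell4 = 1.5)
feature_segment <- c("seg1", "seg1", "seg1", "seg2", "seg2", "seg2")
ploidy          <- c(seg1 = 2, seg2 = 3)

# 1. Scale raw counts by the per-cell normalisation factor
#    (assumption: scaling is a division; row i is divided by norm_factor[i]).
scaled <- counts / norm_factor

# 2. z-scores per cell (assumption: across the features of each cell).
z_cell <- t(apply(scaled, 1, function(v) (v - mean(v)) / sd(v)))

# 3. Sum the z-scores of the features that fall in each segment
#    (result: cells x segments).
z_sum <- t(rowsum(t(z_cell), group = feature_segment))

# 4. z-scores per segment, i.e. across cells within each segment column.
z_seg <- scale(z_sum)

# 5. Shift each segment so that its mean equals the input ploidy.
z_final <- sweep(z_seg, 2, ploidy[colnames(z_seg)], "+")
```

After the last step each column of z_final has unit standard deviation and a mean equal to the ploidy of the corresponding segment.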

init(
  rna,
  atac,
  segmentation,
  rna_normalisation_factors = rna %>% auto_normalisation_factor(),
  atac_normalisation_factors = atac %>% auto_normalisation_factor(),
  rna_likelihood = "NB",
  atac_likelihood = "NB",
  reference_genome = "GRCh38",
  description = "(R)CONGAS+ model",
  smooth = FALSE,
  out.rm = T
)

Arguments

rna

A tibble with single-cell RNA data.

atac

A tibble with single-cell ATAC data.

segmentation

A tibble with the input segmentation.

rna_normalisation_factors

A tibble with the per-cell normalisation factors for the RNA data. By default these are computed by the function auto_normalisation_factor.

atac_normalisation_factors

A tibble with the per-cell normalisation factors for the ATAC data. By default these are computed by the function auto_normalisation_factor.

rna_likelihood

Type of likelihood used for RNA data ("G" for Gaussian and "NB" for Negative Binomial), with default "NB".

atac_likelihood

Type of likelihood used for ATAC data, with default "NB".

reference_genome

Either "GRCh38" or "hg19".

description

A textual description of the model.

smooth

If TRUE, input segments are smoothed by joining, within each chromosome, segments that have the same ploidy.
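To make the effect of smoothing concrete, here is a minimal base R sketch. It is not the package's implementation: it assumes that "joining" means merging consecutive segments on the same chromosome that share a copy number, and the helper smooth_segments and the column name copies are hypothetical.

```r
# Toy segmentation; all values are made up.
segments <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  from   = c(1, 1001, 2001, 1, 5001),
  to     = c(1000, 2000, 3000, 5000, 9000),
  copies = c(2, 2, 3, 1, 1)
)

# Hypothetical helper: merge consecutive segments with equal copy number.
smooth_segments <- function(seg) {
  seg <- seg[order(seg$chr, seg$from), ]
  n <- nrow(seg)
  # A new run starts whenever the chromosome or the copy number changes.
  new_run <- c(TRUE, seg$chr[-1] != seg$chr[-n] | seg$copies[-1] != seg$copies[-n])
  run_id <- cumsum(new_run)
  data.frame(
    chr    = as.vector(tapply(seg$chr, run_id, function(v) v[1])),
    from   = as.vector(tapply(seg$from, run_id, min)),
    to     = as.vector(tapply(seg$to, run_id, max)),
    copies = as.vector(tapply(seg$copies, run_id, function(v) v[1]))
  )
}

smoothed <- smooth_segments(segments)
```

In this toy input the first two chr1 segments and the two chr2 segments are merged, leaving three segments.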

Value

An object of class rcongasplus.

Examples

data("example_input")

# For instance, RNA data
example_input$x_rna %>% print
#> NULL

# .. or ATAC data
example_input$x_atac %>% print
#> NULL

# .. and segmentation
example_input$x_segmentation %>% print
#> NULL

# .. and normalisation factors can be computed (default)
example_input$x_rna %>% auto_normalisation_factor()
#> NULL

x = init(
  rna = example_input$x_rna,
  atac = example_input$x_atac,
  segmentation = example_input$x_segmentation,
  rna_likelihood = "G",
  atac_likelihood = 'NB',
  description = 'My model')
#> Error in init(rna = example_input$x_rna, atac = example_input$x_atac,     segmentation = example_input$x_segmentation, rna_likelihood = "G",     atac_likelihood = "NB", description = "My model"): Cannot have both assays null.

print(x)
#> Error in eval(expr, envir, enclos): object 'x' not found