After mapping count data to segments, use this function to compute quantiles of the mapped data and to identify outliers in each segment and modality.

An outlier is an entry for a cell/segment pair whose value, after normalising counts with the input normalisation factors, falls outside the range defined by lower_quantile and upper_quantile. The function counts how often each cell is flagged as an outlier across segments, then removes cells flagged more often than frequency_cutoff. This helps identify cells whose counts frequently deviate from the main signal in the data. For multiome data, cells flagged as outliers in one modality are also removed from the other modality.
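The filtering logic described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the (R)CONGAS+ implementation: it uses a hypothetical cells x segments matrix of already-normalised counts, flags entries outside the per-segment quantile range, and removes cells flagged more than frequency_cutoff times. For the multiome case it assumes the two modalities share cell identifiers through their row names (an assumption; the real object pairs cells differently). The helpers flag_outliers and cells_to_remove, and the simulated data, are hypothetical.

```r
# Minimal base-R sketch of quantile-based outlier filtering.
# Assumption: hypothetical cells x segments matrices of already-normalised
# counts. This is NOT the (R)CONGAS+ implementation.

set.seed(1)

# Simulate two modalities measured on the same (hypothetical) cells
n_cells <- 50
n_segments <- 10
simulate_counts <- function(lambda) {
  matrix(
    rpois(n_cells * n_segments, lambda = lambda),
    nrow = n_cells,
    dimnames = list(paste0("cell_", seq_len(n_cells)),
                    paste0("seg_", seq_len(n_segments)))
  )
}
rna <- simulate_counts(20)
atac <- simulate_counts(40)

# Flag every cell/segment entry outside the per-segment quantile range
flag_outliers <- function(m, lower_quantile = 0.03, upper_quantile = 0.97) {
  flags <- apply(m, 2, function(v) {
    q <- quantile(v, probs = c(lower_quantile, upper_quantile))
    v < q[1] | v > q[2]
  })
  dimnames(flags) <- dimnames(m)
  flags
}

# Return the cells flagged more than `frequency_cutoff` times
# (hypothetical helper; default mirrors 20% of the number of segments)
cells_to_remove <- function(m, frequency_cutoff = 0.2 * ncol(m), ...) {
  n_flags <- rowSums(flag_outliers(m, ...))
  rownames(m)[n_flags > frequency_cutoff]
}

# Multiome (assumed shared row names): a cell flagged in either modality
# is removed from both
removed <- union(cells_to_remove(rna), cells_to_remove(atac))
rna_kept <- rna[!rownames(rna) %in% removed, , drop = FALSE]
atac_kept <- atac[!rownames(atac) %in% removed, , drop = FALSE]

cat(length(removed), "of", n_cells, "cells removed\n")
```

Computing quantiles within each segment, rather than over all entries at once, avoids flagging whole segments whose overall level differs from the rest (for example, because of a different copy number).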

The function requires and returns an (R)CONGAS+ object.

filter_outliers(
  x,
  frequency_cutoff = 0.2 * stat(x)$nsegments,
  lower_quantile = 0.03,
  upper_quantile = 0.97
)

Arguments

x

An rcongasplus object.

frequency_cutoff

Number of outlier entries above which a cell is removed from the data. By default, 20% of the number of input segments (0.2 * stat(x)$nsegments).

lower_quantile

The lower quantile used to flag outliers within each segment; default 0.03 (3%).

upper_quantile

The upper quantile used to flag outliers within each segment; default 0.97 (97%).
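As a quick check of the defaults, the arithmetic below reproduces the default frequency_cutoff for an object with 30 segments, such as the example object in the Examples section, and the thresholds the default quantiles imply on a toy segment. The helper default_frequency_cutoff is hypothetical and only restates the documented default 0.2 * stat(x)$nsegments; quantile() here uses R's default interpolation (type 7), which may differ from what the package uses.

```r
# Hypothetical helper restating the documented default:
# frequency_cutoff = 0.2 * stat(x)$nsegments
default_frequency_cutoff <- function(nsegments) 0.2 * nsegments

# 30 segments, as in the example object: assuming a strict "more than"
# comparison, a cell needs more than 6 outlier entries to be removed
default_frequency_cutoff(30)
#> [1] 6

# Thresholds implied by the default quantiles on a toy segment with values
# 1..100; entries below the first or above the second value would be flagged
quantile(1:100, probs = c(0.03, 0.97))
```

The example object's log entry "Filtered outliers: [6|0.05|0.95]" appears consistent with this default cutoff for 30 segments.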

Value

The input object x, with outlier cells identified and removed.

Examples

data('example_object')

# Default
print(example_object)
#> ── [ (R)CONGAS+ ] SU008 TUMOR 30 segments (73.66% genome) ──────────────────────
#> 
#> ── CNA segments (reference: GRCh38) 
#> → Input 30 CNA segments, mean ploidy 3.2.
#> 
#> 	 | | |  | | | |  | | | |  | | |  | |  |   |  |  | 
#> 
#> 	 Ploidy:    0     1     2     3     4     5     *   
#> 
#> ── Modalities 
#> → RNA: 714 cells with 8613 mapped genes, 1401728 non-zero values. Likelihood: Negative Binomial.
#> → ATAC: 259 cells with 284316 mapped peaks, 3083691 non-zero values. Likelihood: Negative Binomial.
#> ! Clusters: not available.
#> 
#> ──  LOG  ──
#> 
#> - 2021-03-30 17:58:41 Created input object.
#> - 2021-03-30 17:58:43 Filtered outliers: [6|0.05|0.95]
#> [1] 0

example_object %>% 
  filter_outliers() %>% 
  print()
#> ── RNA outliers detection via quantiles: lower 0.03, upper 0.97. 
#> → Normalising RNA counts using input normalisation factors.
#> 18 out of 714 will be removed (3%)
#> 
#> ── ATAC outliers detection via quantiles: lower 0.03, upper 0.97. 
#> → Normalising ATAC counts using input normalisation factors.
#> 10 out of 259 will be removed (4%)
#> Error in if (x$input$multiome) { ... }: argument is of length zero

# Custom quantiles
example_object %>% 
  filter_outliers(lower_quantile = 0.05, upper_quantile = 0.95) %>% 
  print()