After mapping counts data to segments, this function can be used to determine quantiles of mapped data, and identify outliers in each segment and modality.
Outliers can then be removed (set to 0) or capped to the median cell value.
Removal introduces 0-counts in the data, which we suggest checking with the stat function and possibly removing with the filter_missing_data function. This matters because an excess of 0-count cells (missing data) will drive the fit to use 0-mean components.
Capping does not introduce any 0-count cell, and is the suggested choice. The capped value is either a count or a z-score, depending on the likelihood adopted for the modality.
In both cases the normalisation factors computed before filtering are no longer adequate and have to be recomputed. If the modality adopts a Gaussian likelihood this is not a problem, since the factors are set to 1 when the object is created and remain 1 afterwards. For count-based likelihoods such as the Negative Binomial, the factors are recomputed for all input cells using the auto_normalisation_factor function.
Therefore, if custom factors have been computed, this function might alter the general signal in the data, and the factors should be handled explicitly by the user.
The function requires and returns an (R)CONGAS+ object.
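The quantile-based outlier handling described above can be illustrated with a minimal base-R sketch on a toy count matrix. This is not the package's implementation: the helper handle_outliers and its arguments are hypothetical, the median is assumed to be taken across cells within each segment, and the z-score case for Gaussian modalities is not covered.

```r
# Toy counts: 101 cells (rows) x 2 segments (columns); values are illustrative only.
counts <- cbind(
  seg1 = c(rep(8:12, each = 20), 500),  # one unusually high cell
  seg2 = c(rep(3:7, each = 20), 0)      # one unusually low cell
)

# Hypothetical helper, NOT part of (R)CONGAS+: flags values outside the
# [lower, upper] quantile range of a segment and either caps them to the
# segment median ("cap") or sets them to 0 ("remove").
handle_outliers <- function(values, lower = 0.01, upper = 0.99,
                            action = c("cap", "remove")) {
  action <- match.arg(action)
  bounds <- quantile(values, probs = c(lower, upper), names = FALSE)
  is_outlier <- values < bounds[1] | values > bounds[2]
  # Assumption: capping uses the median across cells of the same segment.
  values[is_outlier] <- if (action == "cap") median(values) else 0
  values
}

# Apply the helper independently to each segment (column).
capped  <- apply(counts, 2, handle_outliers, action = "cap")
removed <- apply(counts, 2, handle_outliers, action = "remove")

# Capping leaves no new 0-counts, while removal introduces one in seg1.
colSums(capped == 0)
colSums(removed == 0)
```

In this toy example, removal turns the extreme cell of seg1 into a 0-count, which is the situation the description suggests checking with the stat function, while capping replaces it with a typical value.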
After mapping counts data to segments, this function can be used to determine cells with missing data, and remove them. The function requires and returns an (R)CONGAS+ object.
This filter applies a proportion cutoff, using the proportions reported by the stat function.
If these cells are not removed, missing values are imputed as 0 during inference. This can create an excess of mixture components fitting 0-count data.
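A proportion-based filter of this kind can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper drop_missing_cells is hypothetical, missing data is taken to mean a 0 count, and the proportion is assumed to be the fraction of segments with missing data in each cell. The package may define the proportion differently.

```r
# Toy counts: 5 cells (rows) x 40 segments (columns); values are illustrative only.
counts <- matrix(
  10, nrow = 5, ncol = 40,
  dimnames = list(paste0("cell", 1:5), paste0("seg", 1:40))
)
counts["cell2", 1] <- 0    # 1 of 40 segments missing: 2.5%
counts["cell4", 1:4] <- 0  # 4 of 40 segments missing: 10%

# Hypothetical helper, NOT part of (R)CONGAS+: drops every cell whose
# fraction of 0-count segments exceeds the given proportion.
drop_missing_cells <- function(counts, proportion = 0.05) {
  missing_fraction <- rowMeans(counts == 0)
  counts[missing_fraction <= proportion, , drop = FALSE]
}

filtered <- drop_missing_cells(counts, proportion = 0.05)
rownames(filtered)
```

With a 5% cutoff, cell4 is dropped while cell2 is kept, since only cell4 exceeds the cutoff.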
filter_missing_data(x, proportion_RNA = 0.05, proportion_ATAC = 0.05)
An rcongasplus object.
The RNA proportion cut for a cell to be removed, default 5%.
The ATAC proportion cut for a cell to be removed, default 5%.
The lower quantile, default 1%.
The upper quantile, default 99%.
If "remove", outliers will be set to 0. If "cap", outliers will be capped at the median per-cell counts.
The object x where outliers have been identified and removed or capped according to the parameters.
The object x where 0-count cells have been removed.