After mapping counts data to segments, this function can be used to determine quantiles of mapped data, and identify outliers in each segment and modality.
An outlier can then be removed or capped to the median cell value.
The former option introduces 0-counts in the data, which we suggest checking with the stat function and possibly removing with the filter_missing_data function. Removal can be important, as an excess of 0-count cells (missing data) will drive the fit to use 0-mean components.
Capping does not introduce any 0-count cells, and is the suggested choice. The capped value is either a count value or a z-score, depending on the likelihood type of the modality.
In both cases, the normalisation factors computed before filtering are no longer adequate afterwards, and have to be recomputed. If the modality adopts a Gaussian likelihood this is not a problem, since the factors are set to 1 when the object is created, and remain 1 afterwards. For count-based likelihoods such as the Negative Binomial, the factors are recomputed for all input cells. Therefore, if custom factors have been computed, this function might affect the general signal in the data, and factors should be handled explicitly by the user.
The function requires and returns an (R)CONGAS+ object.
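The quantile-based logic described above can be illustrated with a minimal base-R sketch. It operates on a plain vector of per-cell values for a single segment, not on an (R)CONGAS+ object. The helper name handle_outliers, its arguments, and the "cap" label are hypothetical and chosen only for illustration; the exact rule used by the package may differ.

```r
# Hypothetical helper, not part of (R)CONGAS+: flags values outside the
# [lower, upper] quantile range of one segment, then removes or caps them.
handle_outliers <- function(values, lower = 0.01, upper = 0.99,
                            action = c("cap", "remove")) {
  action <- match.arg(action)
  # Quantile bounds computed on the per-cell values of this segment
  bounds <- stats::quantile(values, probs = c(lower, upper), names = FALSE)
  is_outlier <- values < bounds[1] | values > bounds[2]
  # "remove" sets outliers to 0; "cap" replaces them with the median value
  replacement <- if (action == "remove") 0 else stats::median(values)
  values[is_outlier] <- replacement
  list(values = values, n_outliers = sum(is_outlier))
}

# Toy per-cell counts for one segment, with one extreme cell
counts  <- c(10, 12, 11, 9, 13, 10, 11, 12, 10, 500)
capped  <- handle_outliers(counts, lower = 0, upper = 0.9, action = "cap")
removed <- handle_outliers(counts, lower = 0, upper = 0.9, action = "remove")
```

In this toy example the extreme cell is replaced by the median under capping and by 0 under removal, which shows why removal can introduce 0-count cells while capping does not.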
After mapping counts data to segments, this function can be used to determine cells with missing data, and remove them. The function requires and returns an (R)CONGAS+ object.
This filter works by a proportion threshold. If these cells are not removed, missing values are imputed to be 0 during inference. This can create an excess of mixture components fitting 0-count data.
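A proportion-based filter of this kind can be sketched in base R on a plain cells-by-segments count matrix. This is only an illustration under a stated assumption: the proportion is read here as the largest fraction of 0-count segments a cell may have before it is dropped, which may not match the package's exact definition. The helper name drop_missing_cells and the toy matrix are hypothetical.

```r
# Hypothetical helper, not part of (R)CONGAS+: drops cells whose fraction of
# 0-count segments exceeds a threshold. Rows are cells, columns are segments.
# Assumption: "proportion" is the maximum tolerated fraction of 0-count segments.
drop_missing_cells <- function(counts, proportion = 0.05) {
  # Fraction of segments with a 0 count, computed per cell
  zero_fraction <- rowMeans(counts == 0)
  keep <- zero_fraction <= proportion
  counts[keep, , drop = FALSE]
}

# Toy matrix: cell_2 has 0 counts in half of its segments
toy <- matrix(
  c(5, 3, 4, 6,
    0, 0, 2, 1,
    7, 2, 3, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cell_1", "cell_2", "cell_3"), paste0("seg_", 1:4))
)
filtered <- drop_missing_cells(toy, proportion = 0.05)
```

Here only cell_2 exceeds the 5% threshold, so it is the only cell removed; cells without 0 counts are left untouched.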
filter_missing_data(x, proportion_RNA = 0.05, proportion_ATAC = 0.05)
proportion_RNA: The RNA proportion cut for a cell to be removed, default 5%.
proportion_ATAC: The ATAC proportion cut for a cell to be removed, default 5%.
The lower quantile, default 1%.
The upper quantile, default 99%.
If "remove", outliers will be set to 0. Otherwise, outliers will be capped at the median per-cell counts.
x where outliers have been identified and removed or capped according to the parameters.
x where 0-counts cells have been removed.