This function filters the clusters computed by MOBSTER according to three criteria: the mixing proportion, the number of mutations assigned, and the variance of the Beta clusters.
Each criterion takes a scalar cutoff as input. The returned object contains only the clusters that pass all filters. If any cluster is dropped, the latent variables, the clustering assignments and the mixing proportions are recomputed, so all mutations remain assigned after the clusters are removed.
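The filtering rule described above can be illustrated with a minimal base R sketch. This is not the MOBSTER implementation: the helper filter_clusters, the toy data frame and its values are hypothetical, the choice of strict versus non-strict comparisons is an assumption, and the sketch only renormalises the mixing proportions rather than recomputing latent variables and assignments as the real function does.

```r
# Toy per-cluster summary; all values are hypothetical, not taken from a real fit.
clusters <- data.frame(
  cluster  = c("C1", "C2", "C3"),
  pi       = c(0.60, 0.39, 0.01),    # mixing proportions
  n        = c(3000, 1950, 50),      # number of mutations assigned
  variance = c(4e-03, 2e-03, 5e-05), # variance of each Beta component
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the clusters that pass all three cutoffs.
# Assumption: pi must exceed its cutoff, while n and variance must be at
# least as large as theirs.
filter_clusters <- function(df, pi_cutoff = 0.02, N_cutoff = 10,
                            Beta_variance_cutoff = 1e-04) {
  keep <- df$pi > pi_cutoff &
    df$n >= N_cutoff &
    df$variance >= Beta_variance_cutoff
  if (!any(keep)) stop("No cluster passes the filters.")
  out <- df[keep, , drop = FALSE]
  # Renormalise so that the remaining mixing proportions sum to one.
  out$pi <- out$pi / sum(out$pi)
  out
}

kept <- filter_clusters(clusters)
print(kept)
```

In this toy input, C3 fails the mixing proportion and variance cutoffs, so only C1 and C2 are retained and their proportions are rescaled.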
choose_clusters( x, pi_cutoff = 0.02, N_cutoff = 10, Beta_variance_cutoff = 1e-04, verbose = FALSE )
Argument | Description |
---|---|
x | A MOBSTER fit object. |
pi_cutoff | The cutoff on the mixing proportions, default is 0.02. |
N_cutoff | The cutoff on the number of mutations assigned to a cluster, default is 10. |
Beta_variance_cutoff | Minimum variance for a Beta peak, default is 1e-04. |
verbose | Whether output should be printed to the screen, default is FALSE. |
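To give a feel for the scale of Beta_variance_cutoff, the sketch below computes the variance of a Beta distribution from its two shape parameters using the standard formula, and compares it with the default cutoff. The function beta_variance and the shape values are hypothetical, and how MOBSTER parameterises its Beta components internally is not assumed here.

```r
# Variance of a Beta(a, b) distribution: a * b / ((a + b)^2 * (a + b + 1)).
beta_variance <- function(a, b) {
  (a * b) / ((a + b)^2 * (a + b + 1))
}

# Hypothetical shape parameters: a broad peak and a very narrow one.
broad  <- beta_variance(a = 20, b = 20)
narrow <- beta_variance(a = 5000, b = 5000)

# Compare each variance with the default cutoff from the usage above.
Beta_variance_cutoff <- 1e-04
passes <- c(broad = broad >= Beta_variance_cutoff,
            narrow = narrow >= Beta_variance_cutoff)
print(passes)
```

Under these assumed parameters, the very narrow peak falls below the cutoff, which is the kind of component the variance filter is meant to remove.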
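A MOBSTER fit object in which every retained cluster has a mixing proportion larger than pi_cutoff, contains at least N_cutoff mutations and, for Beta clusters, has a variance of at least Beta_variance_cutoff. If no such cluster exists, an error is generated.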
data('fit_example', package = 'mobster')

# Does not change anything (no filter triggered)
choose_clusters(fit_example$best)
#>
#> C1], 31% [Tail], and 14% [C2], with π > 0.
#>
#> C1 [n = 2784, 55%] with mean = 0.48.
#> C2 [n = 846, 14%] with mean = 0.15.
#> ℹ Score(s): NLL = -5671.5; ICL = -10359.09 (-11266.35), H = 907.26 (0). Fit
#> converged by MM in 75 steps.

# Remove one Beta component because it has less than 100 points (renders the fit very poor)
choose_clusters(fit_example$best, N_cutoff = 100)
#> [ MOBSTER ] My MOBSTER model n = 5000 with k = 2 Beta(s) and a tail ─────────
#> C1], 31% [Tail], and 14% [C2], with π > 0.
#>
#> C1 [n = 2784, 55%] with mean = 0.48.
#> C2 [n = 846, 14%] with mean = 0.15.
#> ℹ Score(s): NLL = -5671.5; ICL = -10359.09 (-11266.35), H = 907.26 (0). Fit
#> converged by MM in 75 steps.