Variational fit for Binomial mixtures: variational_fit

Variational fit for a semi-parametric Dirichlet mixture of Binomial distributions. Convergence of the fit can be monitored through the ELBO, and the fit can be run either sequentially (single core) or in parallel. You need to provide an upper bound on the number of clusters that you want to obtain, through the parameter K. You can specify the Dirichlet prior for the concentration of the mixture (alpha_0), as well as the hyperparameters of the Beta priors for each mixture component (a_0 and b_0).

variational_fit(
  x,
  y,
  data = NULL,
  K = 10,
  alpha_0 = 1e-06,
  a_0 = 1,
  b_0 = 1,
  max_iter = 5000,
  epsilon_conv = 1e-10,
  samples = 10,
  q_init = "prior",
  trace = FALSE,
  description = "My VIBER model"
)

Arguments

x: A matrix where each column is a dimension of the multivariate Binomial, and each row is an input point. Values of this matrix represent the number of successes of independent Bernoulli trials. This matrix and y should have the same dimensions (N x W: N points, W dimensions).
y: A matrix where each column is a dimension of the multivariate Binomial, and each row is an input point. Values of this matrix represent the number of attempts (trials) of independent Bernoulli trials. This matrix and x should have the same dimensions (N x W: N points, W dimensions).
data: Extra data.frame with N rows (one per input point) and one column per attribute, used to store annotations for each of the N input points inside the output object. This parameter can also be NULL, in which case no extra annotation is associated with the input. The annotations are necessary if one seeks to use VIBER to analyse cancer multi-sample sequencing data (the Binomial counts are in that case "cancer sequencing read counts"); in that case the annotations must contain two columns, gene and driver, reporting a gene identifier for the input mutation and its boolean driver status. The extra annotation data will be stored in the data field of the output.
K: The maximum number of clusters returned. It should be lower than the number of rows of x and y. Default is K = 10; lower values speed up convergence.
alpha_0: The concentration parameter of the Dirichlet mixture. The default is a stringent fit with alpha_0 = 1e-06.
a_0: Prior Beta hyperparameter. If this value is a scalar, then all the mixture components have the same prior. The default is scalar a_0 = 1.
b_0: Prior Beta hyperparameter. If this value is a scalar, then all the mixture components have the same prior. The default is scalar b_0 = 1.
max_iter: Maximum number of fit iterations. The fit is interrupted once this number of iterations has been performed. Default is max_iter = 5000.
epsilon_conv: Epsilon to measure convergence (ELBO absolute difference). Default is epsilon_conv = 1e-10.
samples: Number of fits computed by the algorithm. Only the best fit is returned. This value must be greater than or equal to 1. Default is samples = 10.
q_init: Initialization of the q-distribution used to approximate the posterior distributions. This can be set in three different ways: equal to the prior (q_init = 'prior'), via kmeans clustering (q_init = 'kmeans'), or by capturing points which are private to each dimension (q_init = 'private'). The default is equal to the prior.
trace: If TRUE, the trace computed during the fit is returned (this allows checking fits a posteriori, making animations, etc.). Default is FALSE; this feature can slow down the fit quite substantially.
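
To illustrate the input shapes described above, the following base R sketch simulates two small count matrices and a matching annotation data.frame. Everything in it is made up for illustration: the two clusters, their success probabilities, the fixed 100 trials per point, the column names S1 and S2, and the gene identifiers. The final call is left commented out because it assumes the package is installed and accepts inputs built this way.

```r
set.seed(42)

N <- 60  # number of input points (rows)
W <- 2   # number of dimensions (columns)

# Hypothetical ground truth: two clusters with different success probabilities
theta <- rbind(c(0.50, 0.50),
               c(0.05, 0.25))
labels <- sample(1:2, N, replace = TRUE)

# y: number of trials for each point and dimension (fixed at 100 for simplicity)
y <- matrix(100L, nrow = N, ncol = W)

# x: number of successes, drawn from the Binomial of each point's cluster
x <- matrix(
  rbinom(N * W, size = as.vector(y), prob = as.vector(theta[labels, ])),
  nrow = N, ncol = W
)
colnames(x) <- colnames(y) <- c("S1", "S2")

# Optional annotations: one row per input point, with the gene and driver
# columns described for the cancer sequencing use case
annotations <- data.frame(
  gene = paste0("GENE", seq_len(N)),
  driver = FALSE
)

# Sanity checks on the shapes the arguments require
stopifnot(identical(dim(x), dim(y)), all(x <= y), nrow(annotations) == N)

# Assumed usage, with K well below the number of rows:
# fit <- variational_fit(x, y, data = annotations, K = 5)
```

Keeping K small relative to N respects the requirement that K be lower than the number of rows of x and y.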

Value

An object of class vb_bmm, which has S3 methods to extract the fit, plot the results, compute summary statistics, etc.

Examples

data(mvbmm_example)
f = variational_fit(mvbmm_example$successes, mvbmm_example$trials)
#>  [ VIBER - variational fit ] 
#> 
#> ℹ Input n = 231, with k < 10. Dirichlet concentration α = 1e-06.
#> ℹ Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-10 or 5000 steps, r = 10 starts.
#> 
#> ✔ VIBER fit completed in 0.09 mins (status: converged)
#> 
#> ── [ VIBER ] My VIBER model n = 231 (w = 2 dimensions). Fit with k = 10 clusters
#> • Clusters: π = 45% [C9], 28% [C4], 20% [C1], and 7% [C10], with π > 0.
#> • Binomials: θ = <0.5, 0.49> [C9], <0, 0.2> [C4], <0.25, 0.25> [C1], and <0.22,
#> 0> [C10].
#> ℹ Score(s): ELBO = -47073.31. Fit converged in 21 steps, ε = 1e-10.
print(f)
#> ── [ VIBER ] My VIBER model n = 231 (w = 2 dimensions). Fit with k = 10 clusters
#> • Clusters: π = 45% [C9], 28% [C4], 20% [C1], and 7% [C10], with π > 0.
#> • Binomials: θ = <0.5, 0.49> [C9], <0, 0.2> [C4], <0.25, 0.25> [C1], and <0.22,
#> 0> [C10].
#> ℹ Score(s): ELBO = -47073.31. Fit converged in 21 steps, ε = 1e-10.
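
The "Fit converged in 21 steps, ε = 1e-10" line reflects the epsilon_conv and max_iter arguments. As a rough sketch of such a stopping rule, assuming convergence is declared when the absolute difference between consecutive ELBO values drops below epsilon_conv, the helper below finds the first step at which that happens. The function first_converged_step and the toy ELBO trace are hypothetical and are not part of the package; the actual implementation may differ.

```r
# Hypothetical helper: index of the first iteration at which the absolute ELBO
# change falls below epsilon_conv, or NA if that never happens within max_iter
first_converged_step <- function(elbo, epsilon_conv = 1e-10, max_iter = 5000) {
  elbo <- head(elbo, max_iter)      # ignore anything past the iteration cap
  delta <- abs(diff(elbo))          # absolute change between consecutive steps
  hit <- which(delta < epsilon_conv)
  if (length(hit) == 0) NA_integer_ else hit[1] + 1L
}

# Toy ELBO trace that increases and then flattens out
toy_elbo <- c(-500, -120, -60, -50.5, -50.01, -50, -50, -50)
first_converged_step(toy_elbo)
```

A trace that never flattens within max_iter yields NA, which corresponds to a fit stopped by the iteration cap rather than by convergence.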