Variational fit for a semi-parametric Dirichlet mixture of Binomial
distributions. Convergence of the fit can be monitored through the ELBO,
and the fit can be run either sequentially (single core) or in parallel.
You need to provide an upper bound on the number of clusters that you want
to obtain, through the parameter K. You can specify the Dirichlet prior for
the concentration of the mixture (alpha_0), as well as the hyperparameters
of the Beta priors for each mixture component.
variational_fit(
x,
y,
data = NULL,
K = 10,
alpha_0 = 1e-06,
a_0 = 1,
b_0 = 1,
max_iter = 5000,
epsilon_conv = 1e-10,
samples = 10,
q_init = "prior",
trace = FALSE,
description = "My VIBER model"
)
A matrix where each column is a dimension of the multivariate Binomial,
and each row is an input point. Values of this matrix represent the number of
successes of independent Bernoulli trials. This matrix and y should have
the same dimensions (N x K, N points, K dimensions).
A matrix where each column is a dimension of the multivariate Binomial,
and each row is an input point. Values of this matrix represent the number of
attempts of independent Bernoulli trials. This matrix and x should have
the same dimensions (N x K, N points, K dimensions).
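As a rough illustration of the expected input shape, the sketch below simulates a matching pair of count matrices in base R. The names successes, trials, n_points and n_dims, and the simulated rates, are assumptions made for this example and are not part of the package.

```r
# Illustrative only: simulate matching success/trial matrices.
set.seed(1)
n_points <- 6   # rows: input points
n_dims   <- 2   # columns: dimensions of the multivariate Binomial

# Number of attempts per point and dimension (at least 1 each).
trials <- matrix(rpois(n_points * n_dims, lambda = 100) + 1,
                 nrow = n_points, ncol = n_dims)

# Number of successes, drawn so that successes <= trials everywhere.
successes <- matrix(rbinom(n_points * n_dims, size = trials, prob = 0.3),
                    nrow = n_points, ncol = n_dims)

# The two matrices must have the same dimensions.
stopifnot(identical(dim(successes), dim(trials)))
stopifnot(all(successes <= trials))
```

Matrices built this way could then be passed as the first two arguments of the fit.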
An extra data.frame (N x W, N points, W attributes) of annotations to store
inside the output object, with one row for each of the N input points. This
parameter can also be NULL, in which case no extra annotation is associated
with the input. The annotations are necessary if one seeks to use VIBER to
analyse cancer multi-sample sequencing data (the Binomial counts are in that
case cancer sequencing read counts); in that case the annotations must
contain two columns, gene and driver, reporting a gene identifier for the
input mutation and its boolean driver status. The extra annotation data
will be stored in the data field of the output.
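A minimal sketch of an annotation table with the two columns named above. The gene identifiers and driver flags are hypothetical placeholders, and the name annotations is an assumption of this example.

```r
# Illustrative only: one annotation row per input point.
n_points <- 4

annotations <- data.frame(
  gene   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),  # placeholder identifiers
  driver = c(TRUE, FALSE, FALSE, TRUE),                # boolean driver status
  stringsAsFactors = FALSE
)

# Sanity checks matching the description: N rows, both columns present.
stopifnot(nrow(annotations) == n_points)
stopifnot(all(c("gene", "driver") %in% colnames(annotations)))
stopifnot(is.logical(annotations$driver))
```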
The maximum number of clusters returned; it should be lower than the
number of rows of x and y. Default is K = 10; lower values speed up
convergence.
The concentration parameter of the Dirichlet mixture. The default
is a stringent fit with alpha_0 = 1e-06.
Prior Beta hyperparameter. If this value is a scalar, then all the
mixture components have the same prior. The default is scalar a_0 = 1.
Prior Beta hyperparameter. If this value is a scalar, then all the
mixture components have the same prior. The default is scalar b_0 = 1.
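To give some intuition for the default hyperparameters, the sketch below checks in base R that Beta(1, 1) is a flat prior on a success probability, and applies the textbook Beta-Binomial conjugate update for a single component. This is a generic illustration under stated assumptions, not the package's variational update (in a mixture the counts would be weighted by cluster responsibilities). The variable names and the counts are made up for the example.

```r
# Illustrative only: the default Beta(1, 1) prior is uniform on [0, 1].
a_0 <- 1
b_0 <- 1
theta <- seq(0.1, 0.9, by = 0.2)
prior_density <- dbeta(theta, shape1 = a_0, shape2 = b_0)

# Textbook conjugate update for one Binomial observation
# (hypothetical counts: 30 successes out of 100 trials).
n_success <- 30
n_trials  <- 100
post_a <- a_0 + n_success
post_b <- b_0 + (n_trials - n_success)
post_mean <- post_a / (post_a + post_b)

# A scalar hyperparameter shared by all components can be expanded
# to one value per component (K_max is a hypothetical name).
K_max <- 10
a_0_per_component <- rep(a_0, K_max)
```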
Maximum number of fit iterations. The fit is interrupted when
this number of iterations has been performed. Default max_iter = 5000.
Epsilon used to measure convergence (absolute difference of the ELBO).
Default epsilon_conv = 1e-10.
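The interplay between the iteration cap and the convergence threshold can be sketched as a generic loop. This is a hedged illustration with a toy objective, not the package's internal code: run_until_converged, toy_step, the "interrupted" status label, and the assumption that the ELBO difference is taken between successive iterations are all choices made for this example.

```r
# Illustrative only: stop when the objective changes by less than
# epsilon_conv, or when max_iter iterations have been performed.
run_until_converged <- function(step, state, max_iter = 5000,
                                epsilon_conv = 1e-10) {
  previous <- -Inf
  for (i in seq_len(max_iter)) {
    state <- step(state)
    if (abs(state$elbo - previous) < epsilon_conv) {
      return(list(state = state, iterations = i, status = "converged"))
    }
    previous <- state$elbo
  }
  list(state = state, iterations = max_iter, status = "interrupted")
}

# Toy update: the objective approaches 0 geometrically from below.
toy_step <- function(state) list(elbo = state$elbo / 2)

fit_full  <- run_until_converged(toy_step, list(elbo = -1))
fit_short <- run_until_converged(toy_step, list(elbo = -1), max_iter = 5)
```

With the toy update, fit_full stops on the threshold well before the cap, while fit_short runs out of iterations first.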
Number of fits computed by the algorithm. Only the best fit is returned. This value must be greater than or equal to 1.
Initialization of the q-distribution used to compute the approximation
of the posterior distributions. This can be set in three different ways:
equal to the prior (q_init = 'prior'), via kmeans clustering
(q_init = 'kmeans'), or by capturing points which are private to each
dimension (q_init = 'private'). The default is equal to the prior.
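The sketch below shows one way a kmeans-based start and a "private points" rule could look in base R. It is an assumption-laden illustration, not the package's implementation: clustering the success/trial ratios, turning hard assignments into a one-hot matrix, and reading "private" as "non-zero successes in exactly one dimension" are all interpretations made for this example, as are the simulated data and variable names.

```r
# Illustrative only: two well-separated groups of points in 2 dimensions.
set.seed(42)
trials <- matrix(100, nrow = 20, ncol = 2)
prob <- rbind(
  matrix(rep(c(0.5, 0.5), each = 10), ncol = 2),  # shared by both dimensions
  matrix(rep(c(0.2, 0.0), each = 10), ncol = 2)   # present in dimension 1 only
)
successes <- matrix(rbinom(length(trials), size = trials, prob = prob),
                    nrow = 20)

# Possible kmeans start: cluster the observed success ratios.
ratios <- successes / trials
km <- kmeans(ratios, centers = 2, nstart = 5)

# Hard assignments as a one-hot matrix (one row per point).
responsibilities <- diag(2)[km$cluster, ]

# Possible "private" rule: successes observed in exactly one dimension.
private <- rowSums(successes > 0) == 1
```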
If TRUE, the trace computed during the fit is returned (this allows
checking fits a posteriori, making animations, etc.). Default is FALSE;
this feature can slow down the fit quite substantially.
An object of class vb_bmm, which has S3 methods to extract
the fit, plot the results, compute summary statistics, etc.
data(mvbmm_example)
f = variational_fit(mvbmm_example$successes, mvbmm_example$trials)
#> [ VIBER - variational fit ]
#>
#> ℹ Input n = 231, with k < 10. Dirichlet concentration α = 1e-06.
#> ℹ Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-10 or 5000 steps, r = 10 starts.
#>
#> ✔ VIBER fit completed in 0.09 mins (status: converged)
#>
#> ── [ VIBER ] My VIBER model n = 231 (w = 2 dimensions). Fit with k = 10 clusters
#> • Clusters: π = 45% [C9], 28% [C4], 20% [C1], and 7% [C10], with π > 0.
#> • Binomials: θ = <0.5, 0.49> [C9], <0, 0.2> [C4], <0.25, 0.25> [C1], and <0.22,
#> 0> [C10].
#> ℹ Score(s): ELBO = -47073.31. Fit converged in 21 steps, ε = 1e-10.
print(f)
#> ── [ VIBER ] My VIBER model n = 231 (w = 2 dimensions). Fit with k = 10 clusters
#> • Clusters: π = 45% [C9], 28% [C4], 20% [C1], and 7% [C10], with π > 0.
#> • Binomials: θ = <0.5, 0.49> [C9], <0, 0.2> [C4], <0.25, 0.25> [C1], and <0.22,
#> 0> [C10].
#> ℹ Score(s): ELBO = -47073.31. Fit converged in 21 steps, ε = 1e-10.