This function is a general interface for fitting a congas Python model in R. Briefly, the model is a joint mixture model over two modalities, currently scATAC and scRNA-seq. For more information about the theoretical foundations of the approach, refer to the vignette. The function performs model selection over the specified numbers of clusters, using the chosen information criterion (IC). The ICs and results for all runs are nevertheless reported in the returned object.
The function expects a list of model hyperparameters. As the model formulation is quite complex and those hyperparameters are extremely difficult to set by hand, we suggest using the function Rcongas::auto_config_run().
fit_congas(
x,
K,
lambdas,
model_parameters,
learning_rate = 0.01,
latent_variables = "G",
CUDA = FALSE,
steps = 500,
samples = 1,
parallel = FALSE,
model_selection = "ICL",
temperature = 10,
equal_variance = TRUE,
threshold = learning_rate * 0.1,
patience = 5,
same_mixing = FALSE
)
x: An rcongasplus object with the input dataset, constructed with Rcongas::init.
K: A vector of integers with the numbers of clusters to test.
lambdas: Float (Optional). Default 0.5. Value of the hyperparameter that controls the weight given to the RNA and ATAC modalities during inference. Values closer to 0 give more weight to the ATAC likelihood, while values closer to 1 give more weight to the RNA likelihood.
model_parameters: A list with model hyperparameters. As errors caused by wrongly initialized hyperparameters are quite hard to troubleshoot, it is highly recommended to use Rcongas::auto_config_run() to generate a template and, if needed, modify it.
learning_rate: The learning rate for the Adam optimizer.
latent_variables: Specifies the nature of the latent variable modelling the copy number profile. Currently only "G" is available.
CUDA: Use the GPU for training, if available.
steps: Number of optimization steps.
samples: Number of times a model is fit for each value of K.
model_selection: Information criterion used to perform model selection (one of ICL, NLL, BIC, AIC).
patience: Integer. Number of steps to wait before stopping the inference. See threshold for more details.
same_mixing: Boolean that indicates whether to use the same mixing proportions for both RNA and ATAC, or different vectors for the two modalities. Default is FALSE.
threshold: Float, default is learning_rate * 0.1. The threshold that determines early stopping of the training procedure. When the difference between the parameters at step t and step t+1 stays below this threshold for a number of steps equal to patience, the inference is stopped.
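The interplay between threshold and patience can be illustrated with a small, self-contained toy loop. This is only a sketch and not the Rcongas implementation: the function run_with_early_stopping(), the update rule halve_gap(), the use of the largest absolute parameter change as the "difference", and the requirement that the quiet steps be consecutive are all assumptions made for illustration.

```r
# Toy early-stopping loop (illustrative only, NOT the Rcongas implementation).
# Assumptions: the parameter difference is the largest absolute change between
# consecutive steps, and the steps below the threshold must be consecutive.
run_with_early_stopping <- function(update, init, steps = 500,
                                    threshold = 0.001, patience = 5) {
  params <- init
  calm_steps <- 0  # consecutive steps with a change below the threshold
  for (step in seq_len(steps)) {
    new_params <- update(params)
    change <- max(abs(new_params - params))
    params <- new_params
    if (change < threshold) {
      calm_steps <- calm_steps + 1
    } else {
      calm_steps <- 0
    }
    if (calm_steps >= patience) {
      return(list(params = params, steps_run = step, stopped_early = TRUE))
    }
  }
  list(params = params, steps_run = steps, stopped_early = FALSE)
}

# Hypothetical update rule: every step moves the parameter halfway towards 1.
halve_gap <- function(p) p + 0.5 * (1 - p)

res <- run_with_early_stopping(halve_gap, init = 0)
res$steps_run      # far fewer than the 500 steps allowed
res$stopped_early
```

In this toy setting a larger patience delays the stop, while a larger threshold triggers it sooner.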
An object of class rcongasplus with a slot best_fit containing the learned parameters for the selected model in tibble format, a slot runs with all the runs performed, ordered by the selected IC, and a slot model_selection with all the information needed to perform model selection.
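The idea of ranking runs by an information criterion can be sketched with base R. The data frame below is a hypothetical stand-in with made-up values and does not reproduce the actual layout of the runs or model_selection slots; the helper select_by_ic() is likewise hypothetical, and the sketch assumes that lower values of the criterion indicate a better model.

```r
# Hypothetical summary of runs: one row per tested K (values are made up).
runs <- data.frame(
  K   = 1:4,
  ICL = c(1520.3, 1388.9, 1401.2, 1415.7),
  BIC = c(1510.1, 1380.4, 1379.8, 1392.6)
)

# Assumption: lower values of the criterion indicate a better model.
select_by_ic <- function(runs, criterion = "ICL") {
  stopifnot(criterion %in% names(runs))
  ordered <- runs[order(runs[[criterion]]), , drop = FALSE]
  list(best_K = ordered$K[1], ranking = ordered)
}

sel_icl <- select_by_ic(runs, "ICL")
sel_bic <- select_by_ic(runs, "BIC")
sel_icl$best_K
sel_bic$best_K
```

Different criteria may select different values of K, which is one reason to inspect the results of all runs rather than only the selected model.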
library(Rcongas)
if (FALSE) {
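# Numbers of clusters to test
K <- 1:4
# Generate a template list of hyperparameters for these values of K
hyperparams <- auto_config_run(example_object, K)
# lambdas = 0.5 gives the same weight to the RNA and ATAC likelihoods
fit <- fit_congas(example_object, K = K, lambdas = 0.5, learning_rate = 0.05, model_parameters = hyperparams)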
}
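The effect of the lambdas argument described above can be illustrated with a toy calculation. The sketch assumes that the joint objective is a convex combination of the two per-modality log-likelihoods; the objective actually used by the model may differ. The function weighted_loglik() and the numeric values are hypothetical.

```r
# Assumption: the joint objective is a convex combination of the RNA and ATAC
# log-likelihoods. This is illustrative and may not match the real objective.
weighted_loglik <- function(loglik_rna, loglik_atac, lambda = 0.5) {
  stopifnot(lambda >= 0, lambda <= 1)
  lambda * loglik_rna + (1 - lambda) * loglik_atac
}

# Made-up log-likelihood values for a single candidate model.
ll_rna  <- -120
ll_atac <- -80

mostly_atac <- weighted_loglik(ll_rna, ll_atac, lambda = 0.1)  # close to ll_atac
balanced    <- weighted_loglik(ll_rna, ll_atac, lambda = 0.5)  # midpoint
mostly_rna  <- weighted_loglik(ll_rna, ll_atac, lambda = 0.9)  # close to ll_rna
```

Under this assumption, a lambda near 0 lets the ATAC term dominate the objective and a lambda near 1 lets the RNA term dominate, matching the behaviour described for the argument.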