Fit an (R)CONGAS+ model — fit

This function is a general interface for fitting a congas Python model in R. In brief, the model is a joint mixture model over two modalities, currently scATAC-seq and scRNA-seq. For more information about the theoretical foundations of the approach, refer to the vignette. The function performs model selection over the specified numbers of clusters, using the chosen information criterion (IC). The ICs and results for all runs are nevertheless reported in the returned object.

The function expects a list of model hyperparameters. Because the model formulation is quite complex and these hyperparameters are extremely difficult to set by hand, we suggest generating them with Rcongas::auto_config_run().

fit_congas(
  x,
  K,
  lambdas,
  model_parameters,
  learning_rate = 0.01,
  latent_variables = "G",
  CUDA = FALSE,
  steps = 500,
  samples = 1,
  parallel = FALSE,
  model_selection = "ICL",
  temperature = 10,
  equal_variance = TRUE,
  threshold = learning_rate * 0.1,
  patience = 5,
  same_mixing = FALSE
)

Arguments

x: An rcongasplus object with the input dataset, constructed with Rcongas::init.
K: A vector of integers with the numbers of clusters to test.
lambdas: Float (Optional). Default 0.5. Value of the hyperparameter that controls the weight given to RNA and ATAC modalities during the inference. Values closer to 0 give more weight to the ATAC likelihood, while values closer to 1 result in higher weight given to the RNA likelihood.
model_parameters: A list of model hyperparameters. Because errors caused by wrongly initialized hyperparameters are quite hard to troubleshoot, it is highly recommended to use Rcongas::auto_config_run() to generate a template and then modify it if needed.
learning_rate: a learning rate for the Adam optimizer
latent_variables: Specifies the nature of the latent variable modelling the copy number profile. Currently only "G" is available.
CUDA: Use the GPU for training, if available.
steps: number of steps of optimization
samples: Number of times a model is fit for each value of K.
model_selection: Information criterion used to perform model selection (one of ICL, NLL, BIC, AIC).
patience: Integer. Number of steps to wait before stopping the inference. See threshold for more details.
same_mixing: boolean that indicates whether to use the same mixing proportions for both RNA and ATAC or use different vectors for the two modalities. Default is FALSE.
threshold: Float, default is learning_rate * 0.1. Threshold that determines early stopping of the training procedure. When the difference between the parameters at step t and step t+1 stays below this threshold for a number of steps equal to patience, the inference is stopped.
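
The interplay between threshold and patience can be illustrated with a small base-R sketch. It is not the code used by the package: the helper should_stop() is hypothetical, and it assumes that the "difference between parameters" is the largest absolute change across all parameters and that the low-change steps must be consecutive.

```r
# Hypothetical helper, not part of Rcongas: a sketch of an early-stopping
# rule driven by `threshold` and `patience`.
# Assumption: the parameter difference is measured as the maximum absolute
# change between two consecutive optimization steps.
should_stop <- function(param_history, threshold, patience) {
  # param_history: list of numeric vectors, one per optimization step
  n <- length(param_history)
  if (n < patience + 1) {
    return(FALSE)  # not enough steps observed yet
  }
  # Changes for the last `patience` transitions (step i - 1 to step i)
  idx <- (n - patience + 1):n
  deltas <- vapply(
    idx,
    function(i) max(abs(param_history[[i]] - param_history[[i - 1]])),
    numeric(1)
  )
  # Stop only if every recent change is below the threshold
  all(deltas < threshold)
}

# Toy trajectory: one large jump followed by five tiny updates
learning_rate <- 0.01
threshold <- learning_rate * 0.1
history <- list(c(1, 2), c(1.5, 2.5))
for (i in 1:5) {
  history[[length(history) + 1]] <- history[[length(history)]] + 1e-4
}

stop_early <- should_stop(history[1:6], threshold, patience = 5)
stop_late  <- should_stop(history, threshold, patience = 5)
```

In this toy trajectory the rule does not fire while the large initial jump is still inside the patience window, and fires once five consecutive small updates have been observed.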

Value

An object of class rcongasplus with a slot best_fit containing the learned parameters for the selected model in tibble format, a slot runs with all the runs performed, ordered by the selected IC, and a slot model_selection with all the information needed to perform model selection.
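
As a rough illustration of what "ordered by the selected IC" means, the base-R sketch below ranks a toy table of runs and picks the best number of clusters. The table layout and the IC values are invented for illustration and do not reflect the actual structure of the returned object. The sketch also assumes that lower values of the criterion are better, which is the usual convention for BIC, AIC and NLL.

```r
# Toy table of runs: the columns and the IC values are made up for
# illustration only and are not output of fit_congas().
runs <- data.frame(
  K   = c(1, 2, 3, 4),
  ICL = c(1520.3, 1410.8, 1395.2, 1402.7)
)

# Rank the runs by the chosen criterion (assumption: lower is better)
criterion <- "ICL"
ranked <- runs[order(runs[[criterion]]), ]

# The first row of the ranked table corresponds to the selected model
best_K <- ranked$K[1]
```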

Examples

library(Rcongas)
if (FALSE) {
  # `example_object` stands for an rcongasplus object created beforehand
  K <- 1:4
  hyperparams <- auto_config_run(example_object, K)

  fit <- fit_congas(
    example_object,
    K = K,
    learning_rate = 0.05,
    model_parameters = hyperparams
  )
}