This function is a general interface for fitting a congas Python model from R. In brief, the model is a joint mixture model over two modalities, currently scATAC-seq and scRNA-seq. For more information about the theoretical foundations of the approach, refer to the vignette. The function performs model selection over the specified numbers of clusters, using a chosen information criterion (IC). The ICs and results for all the runs are nevertheless reported in the returned object.
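The selection step described above can be pictured with a small base R sketch. It is only an illustration under stated assumptions: the data frame `runs`, its columns, and the scores are invented toy values, not output of fit_congas, and the sketch assumes that a lower IC value indicates a better model.

```r
# Toy scores only: these numbers are invented for illustration,
# they are not produced by fit_congas.
runs <- data.frame(
  K      = rep(1:4, each = 2),   # number of clusters tested
  sample = rep(1:2, times = 4),  # repeated fits for each K
  ICL    = c(1250, 1248, 1190, 1201, 1175, 1169, 1182, 1180)
)

# Order all runs by the chosen criterion (assumption: lower is better).
runs <- runs[order(runs$ICL), ]

# The first row is the selected run; the others remain available.
best_run <- runs[1, ]
best_run$K
```

Every run is kept in the ordered table, which mirrors the idea that the scores of all runs stay available for inspection after one model has been selected.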

The function expects a list of model hyperparameters. Because the model formulation is quite complex and these hyperparameters are extremely difficult to set by hand, we suggest using the function Rcongas::auto_config_run().

fit_congas(
  x,
  K,
  lambdas,
  model_parameters,
  learning_rate = 0.01,
  latent_variables = "G",
  CUDA = FALSE,
  steps = 500,
  samples = 1,
  parallel = FALSE,
  model_selection = "ICL",
  temperature = 10,
  equal_variance = TRUE,
  threshold = learning_rate * 0.1,
  patience = 5,
  same_mixing = FALSE
)

Arguments

x

An rcongasplus object with the input dataset, constructed with Rcongas::init.

K

A vector of integers with the numbers of clusters to test.

lambdas

Float (Optional). Default 0.5. Value of the hyperparameter that controls the weight given to RNA and ATAC modalities during the inference. Values closer to 0 give more weight to the ATAC likelihood, while values closer to 1 result in higher weight given to the RNA likelihood.

model_parameters

A list with model hyperparameters. Because errors caused by incorrectly initialized hyperparameters are quite hard to troubleshoot, it is highly recommended to use Rcongas::auto_config_run() to generate a template and then modify it if needed.

learning_rate

Learning rate for the Adam optimizer.

latent_variables

Specifies the nature of the latent variable modelling the copy number profile. Currently only "G" is available.

CUDA

Use the GPU for training, if available.

steps

Number of optimization steps.

samples

Number of times a model is fit for each value of K.

model_selection

Information criterion used to perform the model selection (one of ICL, NLL, BIC, AIC).

patience

Integer. Number of steps to wait before stopping the inference. See threshold for more details.

same_mixing

Boolean that indicates whether to use the same mixing proportions for both RNA and ATAC, or different vectors for the two modalities. Default is FALSE.

threshold

Float, default is learning_rate * 0.1. Threshold that determines the early stopping of the training procedure. When the difference between the parameters at step t and step t+1 stays below this threshold for a number of steps equal to patience, the inference is stopped.
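The interaction between threshold and patience can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper `early_stop_step` and the toy parameter trajectory are hypothetical, the parameter difference is assumed to be measured as the maximum absolute change, and the steps below the threshold are assumed to be consecutive.

```r
# Hypothetical helper (not part of Rcongas): returns the step at which
# training would stop. `param_history` is a list of numeric parameter
# vectors, one per optimization step.
early_stop_step <- function(param_history, threshold, patience) {
  below <- 0L  # consecutive steps with a parameter change under threshold
  for (t in seq_len(length(param_history) - 1L)) {
    # Assumption: the difference is the maximum absolute change.
    delta <- max(abs(param_history[[t + 1L]] - param_history[[t]]))
    below <- if (delta < threshold) below + 1L else 0L
    if (below >= patience) return(t + 1L)
  }
  length(param_history)  # threshold never met: all steps were used
}

learning_rate <- 0.01
# Toy trajectory: the change between steps halves at every step.
history <- lapply(0:49, function(t) 1 - 0.5^t)
stop_at <- early_stop_step(history,
                           threshold = learning_rate * 0.1,
                           patience = 5)
stop_at
```

With a larger patience the loop waits longer before stopping, which trades extra optimization steps for more protection against stopping on a temporary plateau.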

Value

An object of class rcongasplus with a slot best_fit containing the learned parameters for the selected model in tibble format, a slot runs with all the runs performed, ordered by the selected IC, and a slot model_selection with all the information needed to perform model selection.

Examples

library(Rcongas)
if (FALSE) {
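# example_object is assumed to be an rcongasplus object created with Rcongas::init
K <- 1:4
# Generate a template of hyperparameters for the values of K to test
hyperparams <- auto_config_run(example_object, K)

# Fit the model for each K and select the best run by the default IC
fit <- fit_congas(example_object, K = K, learning_rate = 0.05, model_parameters = hyperparams)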
}