Fits a negative binomial regression model to count data, designed primarily for RNA sequencing analysis. The function estimates regression coefficients (beta) and per-gene overdispersion parameters, and normalizes for differences in sequencing depth using size factors. Both parallel CPU computation and GPU (CUDA) acceleration are supported.

fit_devil(
  input_matrix,
  design_matrix,
  overdispersion = TRUE,
  init_overdispersion = NULL,
  do_cox_reid_adjustment = TRUE,
  offset = 1e-06,
  size_factors = TRUE,
  verbose = FALSE,
  max_iter = 100,
  tolerance = 0.001,
  CUDA = FALSE,
  batch_size = 1024L,
  parallel.cores = NULL
)

Arguments

input_matrix

A numeric matrix of count data (genes × samples). Rows represent genes/features, columns represent samples/cells.

design_matrix

A numeric matrix of predictor variables (samples × predictors). Each row corresponds to a sample, each column to a predictor variable.
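For instance, the two matrices might be built as follows. This is a toy construction with illustrative names (`counts`, `condition`, `design` are not defined by the package); `model.matrix()` is the standard way to derive a design matrix from sample-level covariates:

```r
# Toy count matrix: 50 genes x 6 samples (simulated, illustrative only)
set.seed(1)
counts <- matrix(rnbinom(50 * 6, mu = 20, size = 5), nrow = 50, ncol = 6)
rownames(counts) <- paste0("gene", 1:50)
colnames(counts) <- paste0("sample", 1:6)

# Sample-level covariate: a two-group condition
condition <- factor(rep(c("control", "treated"), each = 3))

# Design matrix: samples x predictors (intercept + condition effect)
design <- model.matrix(~ condition)

dim(counts)  # 50 x 6  (genes x samples)
dim(design)  # 6 x 2   (samples x predictors)
```

Note the orientation: `counts` has samples in columns, while `design` has samples in rows, so `ncol(counts)` must equal `nrow(design)`.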

overdispersion

Logical. Whether to estimate the overdispersion parameter. Set to FALSE for Poisson regression. Default: TRUE

init_overdispersion

Numeric or NULL. Initial value for the overdispersion parameter. If NULL, an initial value is estimated from the data; if a value is supplied, 100 is a recommended starting point. Default: NULL

do_cox_reid_adjustment

Logical. Whether to apply Cox-Reid adjustment in overdispersion estimation. Default: TRUE

offset

Numeric. Value added to counts to avoid numerical issues with zero counts. Default: 1e-6

size_factors

Logical. Whether to compute normalization factors for different sequencing depths. Default: TRUE

verbose

Logical. Whether to print progress messages during execution. Default: FALSE

max_iter

Integer. Maximum number of iterations for parameter optimization. Default: 100

tolerance

Numeric. Convergence criterion for parameter optimization. Default: 1e-3

CUDA

Logical. Whether to use GPU acceleration (requires CUDA support). Default: FALSE

batch_size

Integer. Number of genes to process per batch in GPU mode. Only relevant if CUDA = TRUE. Default: 1024

parallel.cores

Integer or NULL. Number of CPU cores for parallel processing. If NULL, uses all available cores. Default: NULL

Value

A list containing:

beta

Matrix of fitted coefficients (genes × predictors)

overdispersion

Vector of fitted overdispersion parameters (one per gene)

iterations

Vector of iteration counts for convergence (one per gene)

size_factors

Vector of computed size factors (one per sample)

offset_vector

Vector of offset values used in the model

design_matrix

Input design matrix (as provided)

input_matrix

Input count matrix (as provided)

input_parameters

List of the parameter values used for the fit (max_iter, tolerance, parallel.cores)
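The returned components combine in the usual negative binomial GLM fashion: the fitted mean for gene i in sample j is `size_factors[j] * exp(x_j' beta_i)` (ignoring the small offset). A sketch with toy numbers — the values below are illustrative, not produced by `fit_devil`, and the exact offset handling inside the package may differ:

```r
# Toy fitted quantities (illustrative, not produced by fit_devil)
beta <- matrix(c(2.0,  0.5,   # gene1: intercept, condition effect
                 3.0, -1.0),  # gene2
               nrow = 2, byrow = TRUE)
design <- cbind(intercept = 1, treated = c(0, 0, 1, 1))  # 4 samples
size_factors <- c(0.9, 1.1, 1.0, 1.2)

# Fitted means (genes x samples): mu[i, j] = size_factors[j] * exp(x_j' beta_i)
mu <- sweep(exp(beta %*% t(design)), 2, size_factors, "*")
dim(mu)  # 2 x 4
```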

Details

The function implements a negative binomial regression model with the following steps:

  1. Computes size factors for data normalization (if requested)

  2. Initializes model parameters including beta coefficients and overdispersion

  3. Fits the model using either CPU (parallel) or GPU computation

  4. Optionally estimates overdispersion parameters

The model fitting process uses iterative optimization with configurable convergence criteria and maximum iterations. For large datasets, the GPU implementation processes genes in batches for improved memory efficiency.
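The size-factor computation in step 1 is not spelled out here. A common estimator for this kind of depth normalization is the median-of-ratios approach used by DESeq-style methods; the sketch below illustrates that idea and may differ from what `fit_devil` actually implements:

```r
# Median-of-ratios size factors (sketch; fit_devil's method may differ)
median_of_ratios <- function(counts) {
  # Per-gene log geometric mean across samples;
  # genes with any zero count get -Inf and are dropped
  log_geo_means <- rowMeans(log(counts))
  keep <- is.finite(log_geo_means)
  # Each sample's size factor: median ratio to the gene-wise reference
  apply(counts, 2, function(cnts) {
    exp(median(log(cnts[keep]) - log_geo_means[keep]))
  })
}

set.seed(42)
counts <- matrix(rnbinom(100 * 4, mu = 30, size = 10), nrow = 100)
sf <- median_of_ratios(counts)
length(sf)  # one size factor per sample
```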

Examples

## Not run:
# Basic usage with default parameters
fit <- fit_devil(counts, design)

# Using GPU acceleration with custom batch size
fit <- fit_devil(counts, design, CUDA = TRUE, batch_size = 2048)

# Disable overdispersion estimation (Poisson model)
fit <- fit_devil(counts, design, overdispersion = FALSE)
## End(Not run)