Fits a negative binomial regression model to count data, designed primarily for RNA sequencing analysis. The function estimates regression coefficients (beta) and overdispersion parameters for each gene, and normalizes for sequencing depth using size factors. Computation runs either on the CPU, with optional parallel processing across cores, or on a GPU via CUDA.
fit_devil(
input_matrix,
design_matrix,
overdispersion = TRUE,
init_overdispersion = NULL,
do_cox_reid_adjustment = TRUE,
offset = 1e-06,
size_factors = TRUE,
verbose = FALSE,
max_iter = 100,
tolerance = 0.001,
CUDA = FALSE,
batch_size = 1024L,
parallel.cores = NULL
)
input_matrix: A numeric matrix of count data (genes × samples). Rows represent genes/features, columns represent samples/cells.
design_matrix: A numeric matrix of predictor variables (samples × predictors). Each row corresponds to a sample, each column to a predictor variable.
overdispersion: Logical. Whether to estimate the overdispersion parameter. Set to FALSE for Poisson regression. Default: TRUE
init_overdispersion: Numeric or NULL. Initial value for the overdispersion parameter. If NULL, the initial value is estimated from the data. Recommended value if specified: 100. Default: NULL
do_cox_reid_adjustment: Logical. Whether to apply the Cox-Reid adjustment in overdispersion estimation. Default: TRUE
offset: Numeric. Value added to counts to avoid numerical issues with zero counts. Default: 1e-6
size_factors: Logical. Whether to compute normalization factors for different sequencing depths. Default: TRUE
verbose: Logical. Whether to print progress messages during execution. Default: FALSE
max_iter: Integer. Maximum number of iterations for parameter optimization. Default: 100
tolerance: Numeric. Convergence criterion for parameter optimization. Default: 1e-3
CUDA: Logical. Whether to use GPU acceleration (requires CUDA support). Default: FALSE
batch_size: Integer. Number of genes to process per batch in GPU mode. Only relevant if CUDA = TRUE. Default: 1024
parallel.cores: Integer or NULL. Number of CPU cores for parallel processing. If NULL, uses all available cores. Default: NULL
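The two matrix arguments have opposite orientations (genes × samples for the counts, samples × predictors for the design), which is easy to get wrong. The following is a minimal sketch of preparing both inputs in base R. The simulated data, the two-group layout, and the names `counts`, `design`, and `condition` are illustrative assumptions, not part of the package.

```r
set.seed(1)

n_genes   <- 50
n_samples <- 12

# Hypothetical two-group experiment: 6 control and 6 treated samples
condition <- factor(rep(c("control", "treated"), each = n_samples / 2))

# design_matrix: samples x predictors (intercept + treatment indicator)
design <- model.matrix(~ condition)

# input_matrix: genes x samples, filled with simulated negative binomial counts
counts <- matrix(
  rnbinom(n_genes * n_samples, mu = 20, size = 5),
  nrow = n_genes,
  ncol = n_samples,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_len(n_samples)))
)

# Columns of the count matrix must line up with rows of the design matrix
stopifnot(ncol(counts) == nrow(design))

# With the package providing fit_devil() attached, the call would then be:
# fit <- fit_devil(counts, design)
```

Building the design with `model.matrix()` keeps the sample order tied to the `condition` factor, so the only remaining check is that the count matrix columns follow the same sample order.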
A list containing:
Matrix of fitted coefficients (genes × predictors)
Vector of fitted overdispersion parameters (one per gene)
Vector of iteration counts for convergence (one per gene)
Vector of computed size factors (one per sample)
Vector of offset values used in the model
Input design matrix (as provided)
Input count matrix (as provided)
List of used parameter values (max_iter, tolerance, parallel.cores)
The function implements a negative binomial regression model with the following steps:
Computes size factors for data normalization (if requested)
Initializes model parameters including beta coefficients and overdispersion
Fits the model using either CPU (parallel) or GPU computation
Optionally estimates overdispersion parameters
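To make the size-factor step in the list above concrete, here is a minimal sketch of one common approach: scaling each sample's library size by the geometric mean of all library sizes. This is an assumption chosen for illustration; the text does not state which estimator the function uses internally, and `compute_size_factors` is a hypothetical helper.

```r
# Hypothetical helper, not the package's internal routine
compute_size_factors <- function(counts) {
  lib_size <- colSums(counts)
  if (any(lib_size == 0)) {
    stop("every sample needs at least one count")
  }
  # Divide by the geometric mean so the factors are centered at 1 on the log scale
  lib_size / exp(mean(log(lib_size)))
}

# Toy matrix: 2 genes x 3 samples, each sample sequenced twice as deep as the previous one
counts <- matrix(c(10, 20,  40,
                   30, 60, 120), nrow = 2, byrow = TRUE)

sf <- compute_size_factors(counts)   # 0.5, 1, 2

# Dividing each column by its size factor removes the depth difference
normalized <- sweep(counts, 2, sf, "/")
```

In a regression setting the size factors usually enter the model as a log offset rather than by dividing the counts, so the count distribution itself is left untouched.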
The model fitting process uses iterative optimization with configurable convergence criteria and maximum iterations. For large datasets, the GPU implementation processes genes in batches for improved memory efficiency.
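The roles of `max_iter` and `tolerance` can be illustrated with a small single-gene fit. The sketch below uses iteratively reweighted least squares with a log link, a fixed overdispersion `alpha` (assuming the variance form mu + alpha * mu^2), and the log size factors as an offset. It is an illustrative stand-in, not the package's actual optimizer; `fit_gene_irls`, its stopping rule, and the names in its return list are hypothetical.

```r
# Hypothetical single-gene fitter; assumes the first column of X is an intercept
fit_gene_irls <- function(y, X, size_factors, alpha = 0,
                          max_iter = 100, tolerance = 1e-3) {
  off  <- log(size_factors)
  # Start from the log of the mean normalized count, all other coefficients at 0
  beta <- c(log(mean(y / size_factors) + 1e-6), rep(0, ncol(X) - 1))
  converged <- FALSE

  for (iter in seq_len(max_iter)) {
    eta <- drop(X %*% beta) + off
    mu  <- exp(eta)
    w   <- mu / (1 + alpha * mu)        # IRLS weights for variance mu + alpha * mu^2
    z   <- eta - off + (y - mu) / mu    # working response with the offset removed
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))

    # Stop once the largest coefficient change falls below the tolerance
    converged <- max(abs(beta_new - beta)) < tolerance
    beta <- beta_new
    if (converged) break
  }

  list(beta = beta, iterations = iter, converged = converged)
}

# Usage on simulated data: two groups of 10 samples, equal sequencing depth
set.seed(42)
X   <- model.matrix(~ factor(rep(c("a", "b"), each = 10)))
sf  <- rep(1, 20)
y   <- rpois(20, lambda = exp(drop(X %*% c(2, 0.5))))
fit <- fit_gene_irls(y, X, sf, alpha = 0)
```

With `alpha = 0` the weights reduce to those of Poisson regression, which mirrors the `overdispersion = FALSE` setting. A looser `tolerance` ends the loop earlier at the cost of less precise coefficients, while `max_iter` bounds the work spent on genes that converge slowly.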
if (FALSE) { # \dontrun{
# Basic usage with default parameters
fit <- fit_devil(counts, design)
# Using GPU acceleration with custom batch size
fit <- fit_devil(counts, design, CUDA = TRUE, batch_size = 2048L)
# Disable overdispersion estimation (Poisson model)
fit <- fit_devil(counts, design, overdispersion = FALSE)
} # }