Fits a negative binomial regression model to count data, designed primarily for RNA sequencing data analysis. The function estimates regression coefficients (beta) and per-gene overdispersion parameters, and normalizes for differences in sequencing depth using size factors. It supports both CPU-based parallel processing and GPU (CUDA) computation.
Usage
fit_devil(
input_matrix,
design_matrix,
overdispersion = TRUE,
init_overdispersion = NULL,
do_cox_reid_adjustment = TRUE,
offset = 1e-06,
size_factors = TRUE,
verbose = FALSE,
max_iter = 100,
tolerance = 0.001,
CUDA = FALSE,
batch_size = 1024L,
parallel.cores = NULL
)
Arguments
- input_matrix
A numeric matrix of count data (genes × samples). Rows represent genes/features, columns represent samples/cells.
- design_matrix
A numeric matrix of predictor variables (samples × predictors). Each row corresponds to a sample, each column to a predictor variable (see the construction sketch after this argument list).
- overdispersion
Logical. Whether to estimate the overdispersion parameter. Set to FALSE for Poisson regression. Default: TRUE
- init_overdispersion
Numeric or NULL. Initial value for the overdispersion parameter. If NULL, an initial value is estimated from the data; if a value is supplied, 100 is recommended. Default: NULL
- do_cox_reid_adjustment
Logical. Whether to apply Cox-Reid adjustment in overdispersion estimation. Default: TRUE
- offset
Numeric. Value added to counts to avoid numerical issues with zero counts. Default: 1e-6
- size_factors
Logical. Whether to compute normalization factors for different sequencing depths. Default: TRUE
- verbose
Logical. Whether to print progress messages during execution. Default: FALSE
- max_iter
Integer. Maximum number of iterations for parameter optimization. Default: 100
- tolerance
Numeric. Convergence criterion for parameter optimization. Default: 1e-3
- CUDA
Logical. Whether to use GPU acceleration (requires CUDA support). Default: FALSE
- batch_size
Integer. Number of genes to process per batch in GPU mode. Only relevant if CUDA = TRUE. Default: 1024
- parallel.cores
Integer or NULL. Number of CPU cores for parallel processing. If NULL, uses all available cores. Default: NULL
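As an informal illustration of the expected input shapes (this sketch is not part of the package; the objects counts, meta, and design are hypothetical), the count matrix is genes × samples and the design matrix is samples × predictors, the latter typically built with model.matrix() from sample-level metadata:
# Sketch: simulate a small count matrix and build a design matrix with model.matrix()
set.seed(1)
n_genes   <- 100
n_samples <- 20
counts <- matrix(rnbinom(n_genes * n_samples, mu = 10, size = 2),
                 nrow = n_genes, ncol = n_samples)               # genes x samples
meta   <- data.frame(condition = rep(c("control", "treated"), each = n_samples / 2))
design <- model.matrix(~ condition, data = meta)                 # samples x predictors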
Value
A list containing:
- beta
Matrix of fitted coefficients (genes × predictors)
- overdispersion
Vector of fitted overdispersion parameters (one per gene)
- iterations
Vector of iteration counts for convergence (one per gene)
- size_factors
Vector of computed size factors (one per sample)
- offset_vector
Vector of offset values used in the model
- design_matrix
Input design matrix (as provided)
- input_matrix
Input count matrix (as provided)
- input_parameters
List of the parameter values used for fitting (max_iter, tolerance, parallel.cores)
Details
The function implements a negative binomial regression model with the following steps:
1. Computes size factors for data normalization (if requested)
2. Initializes model parameters, including beta coefficients and overdispersion
3. Fits the model using either CPU (parallel) or GPU computation
4. Optionally estimates overdispersion parameters
The model fitting process uses iterative optimization with configurable convergence criteria and maximum iterations. For large datasets, the GPU implementation processes genes in batches for improved memory efficiency.
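To make the per-gene model concrete, the following is a minimal conceptual sketch fitting one gene with MASS::glm.nb, reusing the counts and design objects from the sketch above. fit_devil uses its own iterative optimizer rather than glm.nb, and the size factors below are a simplistic stand-in for the package's normalization:
# Conceptual sketch only: a single-gene negative binomial GLM with a
# log size-factor offset; not the package's actual implementation.
library(MASS)
sf <- colSums(counts) / mean(colSums(counts))   # crude library-size factors (illustrative)
y  <- counts[1, ]                               # counts for one gene
one_gene_fit <- glm.nb(y ~ design - 1 + offset(log(sf)))
coef(one_gene_fit)     # analogous to one row of the returned beta matrix
one_gene_fit$theta     # NB size parameter; the package's overdispersion uses its own parametrization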
Examples
if (FALSE) { # \dontrun{
# Basic usage with default parameters
fit <- fit_devil(counts, design)
# Using GPU acceleration with custom batch size
fit <- fit_devil(counts, design, CUDA = TRUE, batch_size = 2048)
# Disable overdispersion estimation (Poisson model)
fit <- fit_devil(counts, design, overdispersion = FALSE)
} # }
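A short sketch of inspecting the returned list (assuming fit is the object produced by one of the calls above); component names follow the Value section:
# Inspect fitted results
dim(fit$beta)             # genes x predictors coefficient matrix
head(fit$overdispersion)  # per-gene overdispersion estimates
fit$size_factors          # per-sample size factors
summary(fit$iterations)   # iterations to convergence per gene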