Classify mutations using a Beta-Binomial model-based test.

Usage

classify(
  x,
  priors = pcawg_priors,
  entropy_cutoff = NULL,
  rho = 0.01,
  parallel = FALSE,
  num_cores = NULL,
  karyotypes = c("1:0", "1:1", "2:0", "2:1", "2:2")
)

Arguments

x: An object of class 'INCOMMON' created with the function init().
priors: A dplyr::tibble or data frame with columns gene, tumor_type, label and p, where p is the tumor-specific or pan-cancer (PANCA) prior probability of each label for each gene.
entropy_cutoff: Entropy cut-off for Tier-1 vs Tier-2 assignment.
rho: Over-dispersion parameter of the Beta-Binomial model.
parallel: Whether to run the classification in parallel (default: FALSE).
num_cores: The number of cores to use for parallel classification. If NULL (the default), 80% of the available cores are used.
karyotypes: Karyotypes, in major:minor allele-specific copy-number notation, to include among the possible classes.
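To make the roles of rho and karyotypes concrete, the sketch below scores one mutation under every candidate karyotype and multiplicity with a Beta-Binomial likelihood. It assumes the standard expected-VAF formula for a clonal mutation with m mutant copies, VAF = m * purity / (2 * (1 - purity) + (Major + minor) * purity), with diploid normal cells, and the common mean/over-dispersion parameterization alpha = p * (1 - rho) / rho, beta = (1 - p) * (1 - rho) / rho. The helpers expected_vaf(), dbetabinom_log() and class_loglik() are hypothetical and not part of INCOMMON, whose internal implementation may differ.

```r
# Hypothetical helpers (not INCOMMON functions): a sketch of a
# Beta-Binomial test for one mutation under candidate karyotypes.

# Expected VAF of a clonal mutation present in `m` copies, in a tumour
# with purity `purity` and copy numbers `major`:`minor`.
# Assumes normal cells are diploid.
expected_vaf <- function(m, major, minor, purity) {
  m * purity / (2 * (1 - purity) + (major + minor) * purity)
}

# Beta-Binomial log-probability of `k` alt reads out of `n`, with mean `p`
# and over-dispersion `rho` (alpha + beta = (1 - rho) / rho).
dbetabinom_log <- function(k, n, p, rho) {
  p <- pmin(pmax(p, 1e-9), 1 - 1e-9)  # keep p strictly inside (0, 1)
  a <- p * (1 - rho) / rho
  b <- (1 - p) * (1 - rho) / rho
  lchoose(n, k) + lbeta(k + a, n - k + b) - lbeta(a, b)
}

# Log-likelihood of NV alt reads out of DP for each karyotype ("major:minor")
# and each multiplicity from 1 to the major copy number.
class_loglik <- function(NV, DP, purity, rho = 0.01,
                         karyotypes = c("1:0", "1:1", "2:0", "2:1", "2:2")) {
  rows <- lapply(karyotypes, function(k) {
    cn <- as.integer(strsplit(k, ":")[[1]])
    m <- seq_len(cn[1])
    p <- expected_vaf(m, cn[1], cn[2], purity)
    data.frame(karyotype = k, multiplicity = m, vaf = p,
               loglik = dbetabinom_log(NV, DP, p, rho))
  })
  do.call(rbind, rows)
}

# Usage with the read counts of the first mutation in the example below
ll <- class_loglik(NV = 378, DP = 743, purity = 0.6, rho = 0.01)
ll[order(ll$loglik, decreasing = TRUE), ]
```

Working on the log scale with lchoose() and lbeta() avoids overflow at sequencing depths in the hundreds; a smaller rho narrows the Beta-Binomial around each expected VAF.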

Value

An object of class 'INCOMMON' containing the original input plus the classification results and parameters.

Examples

# First load example data
data(MSK_genomic_data)
data(MSK_clinical_data)
# Initialize the INCOMMON object for a single sample (note the output printed to screen)
sample = 'P-0002081'
x = init(
  genomic_data = MSK_genomic_data[MSK_genomic_data$sample == sample, ],
  clinical_data = MSK_clinical_data[MSK_clinical_data$sample == sample, ]
)
#> ── INCOMMON - Inference of copy number and mutation multiplicity in oncology ───
#> 
#> ── Genomic data ──
#> 
#> ✔ Found 1 samples, with 4 mutations in 4 genes
#> 
#> ── Clinical data ──
#> 
#> ℹ Provided clinical features:
#> 
#> ✔ sample (required for classification)
#> ✔ purity (required for classification)
#> ✔ tumor_type
#> ✔ OS_MONTHS
#> ✔ OS_STATUS
#> ✔ SAMPLE_TYPE
#> ✔ MET_COUNT
#> ✔ METASTATIC_SITE
#> ✔ MET_SITE_COUNT
#> ✔ PRIMARY_SITE
#> ✔ SUBTYPE_ABBREVIATION
#> ✔ GENE_PANEL
#> ✔ SEX
#> ✔ TMB_NONSYNONYMOUS
#> ✔ FGA
#> ✔ AGE_AT_SEQUENCING
#> ✔ RACE
#> 
#> ✔ Found 1 matching samples
#> ✔ No mismatched samples
# Run INCOMMON classification
x = classify(x = x, priors = pcawg_priors, entropy_cutoff = NULL, rho = 0.01)
#> 
#> ── INCOMMON inference of copy number and mutation multiplicity for sample  ─────
#> 
#> ℹ Performing classification
#> → No LUAD-specific prior probability specified for KRAS
#> → Using a pan-cancer prior
#> ✔ Loading CNAqc, 'Copy Number Alteration quality check'. Support : <https://caravagn.github.io/CNAqc/>
#> → No LUAD-specific prior probability specified for TP53
#> → Using a pan-cancer prior
#> → No LUAD-specific prior probability specified for STK11
#> → Using a pan-cancer prior
#> → No LUAD-specific prior probability specified for SMARCA4
#> → Using a pan-cancer prior
#> ℹ There are: 
#> • N = 0 mutations (HMD)
#> • N = 3 mutations (LOH)
#> • N = 0 mutations (CNLOH)
#> • N = 1 mutations (AM)
#> • N = 0 mutations (Tier-2)
#> ℹ The mean classification entropy is 0.04 (min: 0.01, max: 0.06)
# The S3 print method reports a summary of the object's contents to screen
print(x)
#> ── [ INCOMMON ]  4 PASS mutations across 1 samples, with 4 mutant genes across 1
#> ℹ Average sample purity: 0.6
#> ℹ Average sequencing depth: 380
#> ── [ INCOMMON ]  Classified mutations with overdispersion parameter 0.01 and ent
#> ℹ There are: 
#> • N = 0 mutations (HMD)
#> • N = 3 mutations (LOH)
#> • N = 0 mutations (CNLOH)
#> • N = 1 mutations (AM)
#> • N = 0 mutations (Tier-2)
#> # A tibble: 4 × 18
#>   sample    tumor_type purity chr      from     to ref   alt      DP    NV   VAF
#>   <chr>     <chr>       <dbl> <chr>   <dbl>  <dbl> <chr> <chr> <int> <int> <dbl>
#> 1 P-0002081 LUAD          0.6 chr12  2.54e7 2.54e7 C     A       743   378 0.509
#> 2 P-0002081 LUAD          0.6 chr17  7.58e6 7.58e6 G     A       246   116 0.472
#> 3 P-0002081 LUAD          0.6 chr19  1.22e6 1.22e6 C     A       260   122 0.469
#> 4 P-0002081 LUAD          0.6 chr19  1.11e7 1.11e7 -     C       271   133 0.491
#> # ℹ 7 more variables: gene <chr>, gene_role <chr>, id <chr>, label <chr>,
#> #   state <chr>, posterior <dbl>, entropy <dbl>
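The posterior and entropy columns above summarize how confidently each mutation is assigned. As a hedged sketch, assuming the posterior is the normalized product of likelihood and prior over the candidate classes, and that entropy is the Shannon entropy of that posterior (INCOMMON may define or scale it differently), the quantities and a Tier-1/Tier-2 split could be computed as follows. The helpers posterior_from_loglik(), shannon_entropy() and assign_tier() are hypothetical, and treating a NULL cutoff as "keep everything in Tier-1" is an assumption, not documented package behaviour.

```r
# Hypothetical helpers (not INCOMMON functions).

# Combine log-likelihoods and log-priors into posterior probabilities,
# normalizing with the log-sum-exp trick for numerical stability.
posterior_from_loglik <- function(loglik, log_prior = rep(0, length(loglik))) {
  lp <- loglik + log_prior
  m <- max(lp)
  exp(lp - (m + log(sum(exp(lp - m)))))
}

# Shannon entropy (natural log) of a discrete distribution; 0 * log(0) = 0.
shannon_entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}

# Tier-1 if entropy is at or below the cutoff, Tier-2 otherwise.
# Assumption: with cutoff = NULL every mutation stays in Tier-1.
assign_tier <- function(entropy, cutoff = NULL) {
  if (is.null(cutoff)) return(rep("Tier-1", length(entropy)))
  ifelse(entropy <= cutoff, "Tier-1", "Tier-2")
}

# Usage on illustrative log-likelihoods for three candidate classes
post <- posterior_from_loglik(c(-10, -12, -20))
h <- shannon_entropy(post)
assign_tier(h, cutoff = 0.2)
```

Subtracting the maximum before exponentiating keeps the normalization stable when log-likelihoods are large and negative, as they are at deep coverage.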