Classify mutations using a Beta-Binomial model-based test.
Usage
classify(
x,
priors = pcawg_priors,
entropy_cutoff = NULL,
rho = 0.01,
parallel = FALSE,
num_cores = NULL,
karyotypes = c("1:0", "1:1", "2:0", "2:1", "2:2")
)
Arguments
- x
An object of class 'INCOMMON' generated with function init.
- priors
A dplyr::tibble or data frame with columns gene, tumor_type, label and p, indicating tumor-specific or pan-cancer (PANCA) prior probabilities.
- entropy_cutoff
Entropy cut-off for Tier-1 vs Tier-2 assignment.
- rho
Over-dispersion parameter.
- parallel
Whether to run the classification in parallel (default: FALSE).
- num_cores
The number of cores to use for parallel classification. By default, it takes 80% of the available cores.
- karyotypes
Karyotypes to be included among the possible classes.
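The arguments above (rho, karyotypes, entropy_cutoff) can be illustrated with a minimal base-R sketch of a Beta-Binomial classifier. This is not the INCOMMON implementation: the function names (dbetabinom_rho, expected_vaf, enumerate_classes, classify_sketch) are hypothetical, the mean/over-dispersion parameterisation of the Beta-Binomial is an assumption, the prior is taken as flat rather than read from a priors table, and the entropy is normalised by the log of the number of classes, which may differ from what the package does.

```r
# Beta-Binomial density with mean `mu` and over-dispersion `rho`.
# Assumed parameterisation: alpha = mu * (1 - rho) / rho,
#                           beta  = (1 - mu) * (1 - rho) / rho.
dbetabinom_rho <- function(nv, dp, mu, rho) {
  a <- mu * (1 - rho) / rho
  b <- (1 - mu) * (1 - rho) / rho
  exp(lchoose(dp, nv) + lbeta(nv + a, dp - nv + b) - lbeta(a, b))
}

# Expected VAF of a mutation carried by `m` of the `n_total` tumour copies,
# for a sample with the given purity and a diploid normal contaminant.
expected_vaf <- function(m, n_total, purity) {
  (m * purity) / (2 * (1 - purity) + purity * n_total)
}

# One row per (karyotype, multiplicity) pair; the multiplicity ranges from 1
# to the copy number of the major allele.
enumerate_classes <- function(karyotypes) {
  do.call(rbind, lapply(karyotypes, function(k) {
    alleles <- as.integer(strsplit(k, ":", fixed = TRUE)[[1]])
    data.frame(
      karyotype = k,
      n_total = sum(alleles),
      multiplicity = seq_len(max(alleles))
    )
  }))
}

# Posterior over classes under a flat prior (an assumption of this sketch),
# plus the normalised Shannon entropy of that posterior.
classify_sketch <- function(nv, dp, purity, rho = 0.01,
                            karyotypes = c("1:0", "1:1", "2:0", "2:1", "2:2")) {
  classes <- enumerate_classes(karyotypes)
  mu <- expected_vaf(classes$multiplicity, classes$n_total, purity)
  lik <- dbetabinom_rho(nv, dp, mu, rho)
  classes$posterior <- lik / sum(lik)
  p <- classes$posterior[classes$posterior > 0]
  entropy <- -sum(p * log(p)) / log(nrow(classes))
  list(
    classes = classes,
    best = classes[which.max(classes$posterior), ],
    entropy = entropy
  )
}

# Read counts and purity of the first mutation in the example on this page.
res <- classify_sketch(nv = 378, dp = 743, purity = 0.6)
res$best
res$entropy
```

In a sketch like this, an entropy cut-off would presumably be applied by comparing res$entropy with the threshold and downgrading the call to Tier-2 when the posterior is too spread out; the exact rule used by classify is not stated on this page.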
Value
An object of class INCOMMON containing the original input plus the classification data and parameters.
Examples
# First load example data
data(MSK_genomic_data)
data(MSK_clinical_data)
# Initialize the INCOMMON object for a single sample (note the outputs to screen)
sample = 'P-0002081'
x = init(
  genomic_data = MSK_genomic_data[MSK_genomic_data$sample == sample, ],
  clinical_data = MSK_clinical_data[MSK_clinical_data$sample == sample, ]
)
#> ── INCOMMON - Inference of copy number and mutation multiplicity in oncology ───
#>
#> ── Genomic data ──
#>
#> ✔ Found 1 samples, with 4 mutations in 4 genes
#>
#> ── Clinical data ──
#>
#> ℹ Provided clinical features:
#>
#> ✔ sample (required for classification)
#> ✔ purity (required for classification)
#> ✔ tumor_type
#> ✔ OS_MONTHS
#> ✔ OS_STATUS
#> ✔ SAMPLE_TYPE
#> ✔ MET_COUNT
#> ✔ METASTATIC_SITE
#> ✔ MET_SITE_COUNT
#> ✔ PRIMARY_SITE
#> ✔ SUBTYPE_ABBREVIATION
#> ✔ GENE_PANEL
#> ✔ SEX
#> ✔ TMB_NONSYNONYMOUS
#> ✔ FGA
#> ✔ AGE_AT_SEQUENCING
#> ✔ RACE
#>
#> ✔ Found 1 matching samples
#> ✔ No mismatched samples
# Run INCOMMON classification
x = classify(x = x, priors = pcawg_priors, entropy_cutoff = NULL, rho = 0.01)
#>
#> ── INCOMMON inference of copy number and mutation multiplicity for sample ─────
#>
#> ℹ Performing classification
#> → No LUAD-specific prior probability specified for KRAS
#> → Using a pan-cancer prior
#> ✔ Loading CNAqc, 'Copy Number Alteration quality check'. Support : <https://caravagn.github.io/CNAqc/>
#> → No LUAD-specific prior probability specified for TP53
#> → Using a pan-cancer prior
#> → No LUAD-specific prior probability specified for STK11
#> → Using a pan-cancer prior
#> → No LUAD-specific prior probability specified for SMARCA4
#> → Using a pan-cancer prior
#> ℹ There are:
#> • N = 0 mutations (HMD)
#> • N = 3 mutations (LOH)
#> • N = 0 mutations (CNLOH)
#> • N = 1 mutations (AM)
#> • N = 0 mutations (Tier-2)
#> ℹ The mean classification entropy is 0.04 (min: 0.01, max: 0.06)
# An S3 method can be used to report to screen what is in the object
print(x)
#> ── [ INCOMMON ] 4 PASS mutations across 1 samples, with 4 mutant genes across 1
#> ℹ Average sample purity: 0.6
#> ℹ Average sequencing depth: 380
#> ── [ INCOMMON ] Classified mutations with overdispersion parameter 0.01 and ent
#> ℹ There are:
#> • N = 0 mutations (HMD)
#> • N = 3 mutations (LOH)
#> • N = 0 mutations (CNLOH)
#> • N = 1 mutations (AM)
#> • N = 0 mutations (Tier-2)
#> # A tibble: 4 × 18
#>   sample    tumor_type purity chr     from     to ref   alt      DP    NV   VAF
#>   <chr>     <chr>       <dbl> <chr>  <dbl>  <dbl> <chr> <chr> <int> <int> <dbl>
#> 1 P-0002081 LUAD          0.6 chr12 2.54e7 2.54e7 C     A       743   378 0.509
#> 2 P-0002081 LUAD          0.6 chr17 7.58e6 7.58e6 G     A       246   116 0.472
#> 3 P-0002081 LUAD          0.6 chr19 1.22e6 1.22e6 C     A       260   122 0.469
#> 4 P-0002081 LUAD          0.6 chr19 1.11e7 1.11e7 -     C       271   133 0.491
#> # ℹ 7 more variables: gene <chr>, gene_role <chr>, id <chr>, label <chr>,
#> # state <chr>, posterior <dbl>, entropy <dbl>
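
The classification log above reports that no LUAD-specific prior was found for each gene and that a pan-cancer prior was used instead. A minimal sketch of that kind of fallback lookup, on a table with the documented columns gene, tumor_type, label and p, could look like the following. The table toy_priors, its label values and probabilities, and the helper select_prior are all hypothetical and made up for illustration; they are not taken from pcawg_priors or from the package internals.

```r
# Hypothetical priors table with the four documented columns.
# Labels and probabilities are invented placeholders.
toy_priors <- data.frame(
  gene       = c("KRAS", "KRAS", "TP53", "TP53", "TP53", "TP53"),
  tumor_type = c("PANCA", "PANCA", "PANCA", "PANCA", "LUAD", "LUAD"),
  label      = c("class_A", "class_B", "class_A", "class_B", "class_A", "class_B"),
  p          = c(0.7, 0.3, 0.4, 0.6, 0.2, 0.8)
)

# Return the tumor-specific rows for a gene when they exist,
# otherwise fall back to the pan-cancer (PANCA) rows.
select_prior <- function(priors, gene, tumor_type) {
  specific <- priors[priors$gene == gene & priors$tumor_type == tumor_type, ]
  if (nrow(specific) > 0) {
    return(specific)
  }
  message("No ", tumor_type, "-specific prior for ", gene,
          "; using a pan-cancer prior")
  priors[priors$gene == gene & priors$tumor_type == "PANCA", ]
}

kras_prior <- select_prior(toy_priors, gene = "KRAS", tumor_type = "LUAD")
tp53_prior <- select_prior(toy_priors, gene = "TP53", tumor_type = "LUAD")
```

Here kras_prior holds the PANCA rows because the toy table has no LUAD entry for KRAS, while tp53_prior holds the LUAD-specific rows.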