
Classify mutations using a Beta-Binomial model-based test.

Usage

classify(
  x,
  priors = pcawg_priors,
  entropy_cutoff = NULL,
  rho = 0.01,
  parallel = FALSE,
  num_cores = NULL,
  karyotypes = c("1:0", "1:1", "2:0", "2:1", "2:2")
)

Arguments

x

An object of class 'INCOMMON', generated with the function init.

priors

A dplyr::tibble or data frame with columns gene, tumor_type, label and p, giving tumor-specific or pan-cancer (PANCA) prior probabilities. If no tumor-specific prior is available for a gene, the pan-cancer prior is used.
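The fallback from tumor-specific to pan-cancer priors, visible in the example log ("Using a pan-cancer prior"), can be sketched as a simple lookup. This is a minimal illustration, not INCOMMON's internal code: lookup_prior() is a hypothetical helper, and the label values and probabilities are placeholders rather than entries of pcawg_priors.

```r
# Hypothetical priors table with the documented columns (gene, tumor_type,
# label, p). Labels and probabilities are placeholders, not pcawg_priors values.
priors <- data.frame(
  gene       = c("KRAS", "KRAS", "TP53", "TP53"),
  tumor_type = c("PANCA", "PANCA", "LUAD", "LUAD"),
  label      = c("class_A", "class_B", "class_A", "class_B"),
  p          = c(0.7, 0.3, 0.4, 0.6)
)

# Hypothetical helper: return the priors for one gene, preferring rows for the
# given tumor type and falling back to pan-cancer (PANCA) rows otherwise.
lookup_prior <- function(priors, gene, tumor_type) {
  specific <- priors[priors$gene == gene & priors$tumor_type == tumor_type, ]
  if (nrow(specific) > 0) return(specific)
  message("No ", tumor_type, "-specific prior for ", gene,
          "; using a pan-cancer prior")
  priors[priors$gene == gene & priors$tumor_type == "PANCA", ]
}

kras <- lookup_prior(priors, "KRAS", "LUAD")  # no LUAD rows: falls back to PANCA
tp53 <- lookup_prior(priors, "TP53", "LUAD")  # LUAD rows exist: used directly
```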

entropy_cutoff

Entropy cut-off for Tier-1 vs Tier-2 assignment.
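As a rough sketch of how an entropy cut-off can split calls into tiers, the following code computes the Shannon entropy of a posterior distribution over classes and marks low-entropy (confident) mutations as Tier-1. The helper names, the natural-log base and the handling of entropy_cutoff = NULL (no cut-off, so every mutation stays Tier-1) are assumptions; the last one is only consistent with the example, which reports no Tier-2 mutations.

```r
# Shannon entropy (natural log) of a posterior over classes.
# Zero-probability terms are dropped, i.e. 0 * log(0) is treated as 0.
posterior_entropy <- function(posterior) {
  posterior <- posterior / sum(posterior)
  nz <- posterior[posterior > 0]
  -sum(nz * log(nz))
}

# Hypothetical tier rule: entropy at or below the cut-off -> Tier-1.
# Assumption: a NULL cut-off disables tiering, so everything is Tier-1.
assign_tier <- function(entropy, entropy_cutoff) {
  if (is.null(entropy_cutoff)) return(rep("Tier-1", length(entropy)))
  ifelse(entropy <= entropy_cutoff, "Tier-1", "Tier-2")
}

h_confident <- posterior_entropy(c(0.99, 0.005, 0.005))
h_uncertain <- posterior_entropy(c(0.5, 0.5))  # equals log(2)
tiers <- assign_tier(c(h_confident, h_uncertain), entropy_cutoff = 0.2)
```

A peaked posterior gives an entropy near zero, while an even split between two classes gives log(2), so a cut-off between the two separates confident from ambiguous calls.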

rho

Over-dispersion parameter of the Beta-Binomial model.
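One common way to parametrize a Beta-Binomial by its mean mu and an over-dispersion rho is alpha = mu * (1 - rho) / rho and beta = (1 - mu) * (1 - rho) / rho, which inflates the binomial variance n * mu * (1 - mu) by a factor 1 + (n - 1) * rho. This sketch assumes that parametrization; INCOMMON's exact formulation may differ, and dbetabinom_rho() is a hypothetical base-R helper. It scores the KRAS read counts from the example (NV = 378 out of DP = 743) against a few illustrative expected VAFs.

```r
# Beta-Binomial density parametrized by mean `mu` and over-dispersion `rho`
# (0 < rho < 1), computed on the log scale for numerical stability.
dbetabinom_rho <- function(k, n, mu, rho, log = FALSE) {
  a <- mu * (1 - rho) / rho
  b <- (1 - mu) * (1 - rho) / rho
  lp <- lchoose(n, k) + lbeta(k + a, n - k + b) - lbeta(a, b)
  if (log) lp else exp(lp)
}

# Likelihood of 378 variant reads out of 743 under illustrative expected VAFs.
expected <- c(0.30, 0.43, 0.46, 0.60)
lik <- dbetabinom_rho(k = 378, n = 743, mu = expected, rho = 0.01)

# With a flat prior, normalizing the likelihoods gives posterior probabilities.
posterior <- lik / sum(lik)
```

Larger values of rho widen the read-count distribution around each expected VAF, so the model penalizes deviations from the expected VAF less strongly than a plain Binomial would.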

parallel

Whether to run the classification in parallel (default: FALSE).

num_cores

Number of cores to use for parallel classification. By default (NULL), 80% of the available cores are used.
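The documented default of 80% of the available cores could be computed along these lines with the base parallel package. The rounding down and the minimum of one core are assumptions, and default_num_cores() is a hypothetical helper, not part of INCOMMON.

```r
# Hypothetical helper: a fraction of the detected cores, rounded down,
# never fewer than one. detectCores() can return NA on some platforms.
default_num_cores <- function(fraction = 0.8) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L
  max(1L, as.integer(floor(fraction * available)))
}

n_cores <- default_num_cores()
```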

karyotypes

Karyotypes to include among the possible classes, given as "major:minor" allele copy numbers (e.g. "2:1").
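Each karyotype implies one or more expected VAFs, depending on how many copies carry the mutation. Assuming a clonal mutation, a diploid normal contaminant and a "major:minor" karyotype string, the expected VAF for multiplicity m, total copy number CN and purity pi is m * pi / (CN * pi + 2 * (1 - pi)). The sketch enumerates candidate classes this way; the helper names and the choice of multiplicities from 1 to the major copy number are assumptions, not INCOMMON's documented internals.

```r
# Expected VAF of a clonal mutation on `m` of `total_cn` tumor copies,
# assuming diploid normal cells make up the remaining (1 - purity) fraction.
expected_vaf <- function(m, total_cn, purity) {
  m * purity / (total_cn * purity + 2 * (1 - purity))
}

# Hypothetical helper: expand "major:minor" karyotypes into candidate
# (karyotype, multiplicity) classes with their expected VAFs.
candidate_classes <- function(karyotypes, purity) {
  rows <- lapply(karyotypes, function(k) {
    cn <- as.integer(strsplit(k, ":", fixed = TRUE)[[1]])
    m <- seq_len(max(cn))  # multiplicity 1..major copy number
    data.frame(karyotype = k, multiplicity = m,
               vaf = expected_vaf(m, sum(cn), purity))
  })
  do.call(rbind, rows)
}

# The default karyotypes of classify(), at the purity of the example sample.
classes <- candidate_classes(c("1:0", "1:1", "2:0", "2:1", "2:2"), purity = 0.6)
```

At purity 0.6, for instance, a mutation on the single copy of a "1:0" segment is expected at a VAF of 0.6 / 1.4, roughly 0.43.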

Value

An object of class INCOMMON containing the original input plus the classification data and parameters.

Examples

# First load example data
data(MSK_genomic_data)
data(MSK_clinical_data)
# Initialize the INCOMMON object for a single sample (note the outputs to screen)
sample = 'P-0002081'
x = init(
  genomic_data = MSK_genomic_data[MSK_genomic_data$sample == sample, ],
  clinical_data = MSK_clinical_data[MSK_clinical_data$sample == sample, ]
)
#> ── INCOMMON - Inference of copy number and mutation multiplicity in oncology ───
#> 
#> ── Genomic data ──
#> 
#>  Found 1 samples, with 4 mutations in 4 genes
#> 
#> ── Clinical data ──
#> 
#>  Provided clinical features:
#> 
#>  sample (required for classification)
#>  purity (required for classification)
#>  tumor_type
#>  OS_MONTHS
#>  OS_STATUS
#>  SAMPLE_TYPE
#>  MET_COUNT
#>  METASTATIC_SITE
#>  MET_SITE_COUNT
#>  PRIMARY_SITE
#>  SUBTYPE_ABBREVIATION
#>  GENE_PANEL
#>  SEX
#>  TMB_NONSYNONYMOUS
#>  FGA
#>  AGE_AT_SEQUENCING
#>  RACE
#> 
#>  Found 1 matching samples
#>  No mismatched samples
# Run INCOMMON classification
x = classify(x = x, priors = pcawg_priors, entropy_cutoff = NULL, rho = 0.01)
#> 
#> ── INCOMMON inference of copy number and mutation multiplicity for sample  ─────
#> 
#>  Performing classification
#> → No LUAD-specific prior probability specified for KRAS
#> → Using a pan-cancer prior
#>  Loading CNAqc, 'Copy Number Alteration quality check'. Support : <https://caravagn.github.io/CNAqc/>
#> → No LUAD-specific prior probability specified for TP53
#> → Using a pan-cancer prior
#> → No LUAD-specific prior probability specified for STK11
#> → Using a pan-cancer prior
#> → No LUAD-specific prior probability specified for SMARCA4
#> → Using a pan-cancer prior
#>  There are: 
#>  N = 0 mutations (HMD)
#>  N = 3 mutations (LOH)
#>  N = 0 mutations (CNLOH)
#>  N = 1 mutations (AM)
#>  N = 0 mutations (Tier-2)
#>  The mean classification entropy is 0.04 (min: 0.01, max: 0.06)
# An S3 method can be used to report to screen what is in the object
print(x)
#> ── [ INCOMMON ]  4 PASS mutations across 1 samples, with 4 mutant genes across 1
#>  Average sample purity: 0.6
#>  Average sequencing depth: 380
#> ── [ INCOMMON ]  Classified mutations with overdispersion parameter 0.01 and ent
#>  There are: 
#>  N = 0 mutations (HMD)
#>  N = 3 mutations (LOH)
#>  N = 0 mutations (CNLOH)
#>  N = 1 mutations (AM)
#>  N = 0 mutations (Tier-2)
#> # A tibble: 4 × 18
#>   sample    tumor_type purity chr      from     to ref   alt      DP    NV   VAF
#>   <chr>     <chr>       <dbl> <chr>   <dbl>  <dbl> <chr> <chr> <int> <int> <dbl>
#> 1 P-0002081 LUAD          0.6 chr12  2.54e7 2.54e7 C     A       743   378 0.509
#> 2 P-0002081 LUAD          0.6 chr17  7.58e6 7.58e6 G     A       246   116 0.472
#> 3 P-0002081 LUAD          0.6 chr19  1.22e6 1.22e6 C     A       260   122 0.469
#> 4 P-0002081 LUAD          0.6 chr19  1.11e7 1.11e7 -     C       271   133 0.491
#> # ℹ 7 more variables: gene <chr>, gene_role <chr>, id <chr>, label <chr>,
#> #   state <chr>, posterior <dbl>, entropy <dbl>