Skip to contents

Classify mutations using a Beta-Binomial model-based test.

Usage

classify(
  x,
  k_max = 8,
  priors_k_m = priors_pcawg_hmf,
  priors_eta = priors_eta,
  purity_error = 0.05,
  num_cores = NULL,
  iter_warmup = 500,
  iter_sampling = 1000,
  num_chains = 4,
  results_dir = "~/INCONNON_inference/results",
  generate_report_plot = F,
  reports_dir = "~/INCONNON_inference/reports",
  stan_fit_dump = F,
  stan_fit_dir = "~/INCONNON_inference/fits"
)

Arguments

x

An object of class 'INCOMMON' generated with function init.

k_max

The maximum value of total copy number to be included in the model.

priors_k_m

A dplyr::tibble or data frame with columns gene, tumor_type, k, m, N and n and p indicating tumor-specific or pan-cancer (PANCA) prior probabilities.

priors_eta

A dplyr::tibble or data frame with columns tumor_type,mean_eta, var_eta, N,alpha_eta and beta_eta providing parameters of the Gamma prior distribution over the per copy sequencing rate.

purity_error

The expected error on the input sample purity estimate.

num_cores

The number of cores to use for parallel stan sampling.

iter_warmup

The number of iterations of the stan warmup phase.

iter_sampling

The number of iterations of the stan sampling phase.

num_chains

The number of MCMC chains to be run in parallel.

results_dir

The directory path where to store the fit in rds format for each analysed sample.

generate_report_plot

Whether to generate png files with the full fit report.

reports_dir

If generate_report_plot = TRUE, the directory path where to store the reports.

stan_fit_dump

Whether to store stan fit objects.

stan_fit_dir

If stan_fit_dump = TRUE, the directory path where to store the stan fit objects.

Value

An object of class INCOMMON containing the original input plus the classification data and parameters.

Examples

# First load example data
data(MSK_genomic_data)
data(MSK_clinical_data)
data(priors_pcawg_hmf)
data(priors_eta)
# Initialize the INCOMMON object for a single sample (note the outputs to screen)
sample = 'P-0002081'
x = init(genomic_data = MSK_genomic_data[MSK_genomic_data$sample == sample,], clinical_data = MSK_clinical_data[MSK_clinical_data$sample == sample,])
#> ── INCOMMON - Inference of copy number and mutation multiplicity in oncology ───
#> 
#> ── Genomic data ──
#> 
#>  Found 1 samples, with 4 mutations in 4 genes
#> 
#> ── Clinical data ──
#> 
#>  Provided clinical features:
#> 
#>  sample (required for classification)
#>  purity (required for classification)
#>  tumor_type
#>  OS_MONTHS
#>  OS_STATUS
#>  SAMPLE_TYPE
#>  MET_COUNT
#>  METASTATIC_SITE
#>  MET_SITE_COUNT
#>  PRIMARY_SITE
#>  SUBTYPE_ABBREVIATION
#>  GENE_PANEL
#>  SEX
#>  TMB_NONSYNONYMOUS
#>  FGA
#>  AGE_AT_SEQUENCING
#>  RACE
#> 
#>  Found 1 matching samples
#>  No mismatched samples
# Run INCOMMON classification
x = classify(x = x, priors_k_m = priors_pcawg_hmf, priors_eta = priors_eta, num_cores = 1, iter_warmup = 10, iter_sampling = 10, num_chains = 1)
#>  Performing classification
#> 
#> ── INCOMMON inference of copy number and mutation multiplicity for sample P-0002
#> 
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 30, column 18: Found
#>     int division:
#>       k_max * (k_max + 1) / 2
#>     Values will be rounded towards zero. If rounding is not desired you can
#>     write
#>     the division as
#>       k_max * (k_max + 1) / 2.0
#>     If rounding is intended please use the integer division operator %/%.
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 34, column 19: Found
#>     int division:
#>       k_max * (k_max + 1) / 2
#>     Values will be rounded towards zero. If rounding is not desired you can
#>     write
#>     the division as
#>       k_max * (k_max + 1) / 2.0
#>     If rounding is intended please use the integer division operator %/%.
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 51, column 11: Found
#>     int division:
#>       k_max * (k_max + 1) / 2
#>     Values will be rounded towards zero. If rounding is not desired you can
#>     write
#>     the division as
#>       k_max * (k_max + 1) / 2.0
#>     If rounding is intended please use the integer division operator %/%.
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 65, column 18: Found
#>     int division:
#>       k_max * (k_max + 1) / 2
#>     Values will be rounded towards zero. If rounding is not desired you can
#>     write
#>     the division as
#>       k_max * (k_max + 1) / 2.0
#>     If rounding is intended please use the integer division operator %/%.
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 67, column 18: Found
#>     int division:
#>       k_max * (k_max + 1) / 2
#>     Values will be rounded towards zero. If rounding is not desired you can
#>     write
#>     the division as
#>       k_max * (k_max + 1) / 2.0
#>     If rounding is intended please use the integer division operator %/%.
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 84, column 11: Found
#>     int division:
#>       k_max * (k_max + 1) / 2
#>     Values will be rounded towards zero. If rounding is not desired you can
#>     write
#>     the division as
#>       k_max * (k_max + 1) / 2.0
#>     If rounding is intended please
#>  use the integer division operator %/%.
#> Running MCMC with 1 chain...
#> 
#> Chain 1 WARNING: No variance estimation is 
#> Chain 1          performed for num_warmup < 20 
#> Chain 1 Iteration:  1 / 20 [  5%]  (Warmup) 
#> Chain 1 Iteration: 11 / 20 [ 55%]  (Sampling) 
#> Chain 1 Iteration: 20 / 20 [100%]  (Sampling) 
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 43, column 2 to column 29)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1 
#> Chain 1 finished in 0.1 seconds.
# An S3 method can be used to report to screen what is in the object
print(x)
#> ── [ INCOMMON ]  4 PASS mutations across 1 samples, with 4 mutant genes across 1
#>  Average sample purity: 0.6
#>  Average sequencing depth: 380
#> # A tibble: 4 × 36
#>   sample    tumor_type purity purity_map eta_map chr     from     to gene  ref  
#>   <chr>     <chr>       <dbl>      <dbl>   <dbl> <chr>  <dbl>  <dbl> <chr> <chr>
#> 1 P-0002081 LUAD          0.6      0.796    127. chr12 2.54e7 2.54e7 KRAS  C    
#> 2 P-0002081 LUAD          0.6      0.796    127. chr17 7.58e6 7.58e6 TP53  G    
#> 3 P-0002081 LUAD          0.6      0.796    127. chr19 1.22e6 1.22e6 STK11 C    
#> 4 P-0002081 LUAD          0.6      0.796    127. chr19 1.11e7 1.11e7 SMAR… -    
#> # ℹ 26 more variables: alt <chr>, NV <int>, DP <int>, VAF <dbl>, map_k <int>,
#> #   map_m <int>, gene_role <chr>, OS_MONTHS <dbl>, OS_STATUS <dbl>,
#> #   SAMPLE_TYPE <chr>, MET_COUNT <dbl>, METASTATIC_SITE <chr>,
#> #   MET_SITE_COUNT <dbl>, PRIMARY_SITE <chr>, SUBTYPE_ABBREVIATION <chr>,
#> #   GENE_PANEL <chr>, SEX <chr>, TMB_NONSYNONYMOUS <dbl>, FGA <dbl>,
#> #   AGE_AT_SEQUENCING <dbl>, RACE <chr>, bayes_p_purity <dbl>,
#> #   bayes_p_eta <dbl>, post_pred_p.value_DP <dbl>, …