Classify mutations using a Beta-Binomial model-based test.
Usage
classify(
x,
k_max = 8,
priors_k_m = priors_pcawg_hmf,
priors_eta = priors_eta,
purity_error = 0.05,
num_cores = NULL,
iter_warmup = 500,
iter_sampling = 1000,
num_chains = 4,
results_dir = "~/INCONNON_inference/results",
generate_report_plot = F,
reports_dir = "~/INCONNON_inference/reports",
stan_fit_dump = F,
stan_fit_dir = "~/INCONNON_inference/fits"
)
Arguments
- x
An object of class
'INCOMMON'
generated with functioninit
.- k_max
The maximum value of total copy number to be included in the model.
- priors_k_m
A dplyr::tibble or data frame with columns
gene
,tumor_type
,k
,m
,N
andn
andp
indicating tumor-specific or pan-cancer (PANCA) prior probabilities.- priors_eta
A dplyr::tibble or data frame with columns
tumor_type
,mean_eta
,var_eta
,N
,alpha_eta
andbeta_eta
providing parameters of the Gamma prior distribution over the per copy sequencing rate.- purity_error
The expected error on the input sample purity estimate.
- num_cores
The number of cores to use for parallel stan sampling.
- iter_warmup
The number of iterations of the stan warmup phase.
- iter_sampling
The number of iterations of the stan sampling phase.
- num_chains
The number of MCMC chains to be run in parallel.
- results_dir
The directory path where to store the fit in rds format for each analysed sample.
- generate_report_plot
Whether to generate png files with the full fit report.
- reports_dir
If
generate_report_plot = TRUE
, the directory path where to store the reports.- stan_fit_dump
Whether to store stan fit objects.
- stan_fit_dir
If
stan_fit_dump = TRUE
, the directory path where to store the stan fit objects.
Value
An object of class INCOMMON
containing the original input plus
the classification data and parameters.
Examples
# First load example data
data(MSK_genomic_data)
data(MSK_clinical_data)
data(priors_pcawg_hmf)
data(priors_eta)
# Initialize the INCOMMON object for a single sample (note the outputs to screen)
sample = 'P-0002081'
x = init(genomic_data = MSK_genomic_data[MSK_genomic_data$sample == sample,], clinical_data = MSK_clinical_data[MSK_clinical_data$sample == sample,])
#> ── INCOMMON - Inference of copy number and mutation multiplicity in oncology ───
#>
#> ── Genomic data ──
#>
#> ✔ Found 1 samples, with 4 mutations in 4 genes
#>
#> ── Clinical data ──
#>
#> ℹ Provided clinical features:
#>
#> ✔ sample (required for classification)
#> ✔ purity (required for classification)
#> ✔ tumor_type
#> ✔ OS_MONTHS
#> ✔ OS_STATUS
#> ✔ SAMPLE_TYPE
#> ✔ MET_COUNT
#> ✔ METASTATIC_SITE
#> ✔ MET_SITE_COUNT
#> ✔ PRIMARY_SITE
#> ✔ SUBTYPE_ABBREVIATION
#> ✔ GENE_PANEL
#> ✔ SEX
#> ✔ TMB_NONSYNONYMOUS
#> ✔ FGA
#> ✔ AGE_AT_SEQUENCING
#> ✔ RACE
#>
#> ✔ Found 1 matching samples
#> ✔ No mismatched samples
# Run INCOMMON classification
x = classify(x = x, priors_k_m = priors_pcawg_hmf, priors_eta = priors_eta, num_cores = 1, iter_warmup = 10, iter_sampling = 10, num_chains = 1)
#> ℹ Performing classification
#>
#> ── INCOMMON inference of copy number and mutation multiplicity for sample P-0002
#>
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 30, column 18: Found
#> int division:
#> k_max * (k_max + 1) / 2
#> Values will be rounded towards zero. If rounding is not desired you can
#> write
#> the division as
#> k_max * (k_max + 1) / 2.0
#> If rounding is intended please use the integer division operator %/%.
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 34, column 19: Found
#> int division:
#> k_max * (k_max + 1) / 2
#> Values will be rounded towards zero. If rounding is not desired you can
#> write
#> the division as
#> k_max * (k_max + 1) / 2.0
#> If rounding is intended please use the integer division operator %/%.
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 51, column 11: Found
#> int division:
#> k_max * (k_max + 1) / 2
#> Values will be rounded towards zero. If rounding is not desired you can
#> write
#> the division as
#> k_max * (k_max + 1) / 2.0
#> If rounding is intended please use the integer division operator %/%.
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 65, column 18: Found
#> int division:
#> k_max * (k_max + 1) / 2
#> Values will be rounded towards zero. If rounding is not desired you can
#> write
#> the division as
#> k_max * (k_max + 1) / 2.0
#> If rounding is intended please use the integer division operator %/%.
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 67, column 18: Found
#> int division:
#> k_max * (k_max + 1) / 2
#> Values will be rounded towards zero. If rounding is not desired you can
#> write
#> the division as
#> k_max * (k_max + 1) / 2.0
#> If rounding is intended please use the integer division operator %/%.
#> Warning in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 84, column 11: Found
#> int division:
#> k_max * (k_max + 1) / 2
#> Values will be rounded towards zero. If rounding is not desired you can
#> write
#> the division as
#> k_max * (k_max + 1) / 2.0
#> If rounding is intended please
#> use the integer division operator %/%.
#> Running MCMC with 1 chain...
#>
#> Chain 1 WARNING: No variance estimation is
#> Chain 1 performed for num_warmup < 20
#> Chain 1 Iteration: 1 / 20 [ 5%] (Warmup)
#> Chain 1 Iteration: 11 / 20 [ 55%] (Sampling)
#> Chain 1 Iteration: 20 / 20 [100%] (Sampling)
#> Chain 1 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
#> Chain 1 Exception: gamma_lpdf: Random variable is inf, but must be positive finite! (in '/tmp/RtmpfR74H2/model-200f63a2106f.stan', line 43, column 2 to column 29)
#> Chain 1 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
#> Chain 1 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
#> Chain 1
#> Chain 1 finished in 0.1 seconds.
# An S3 method can be used to report to screen what is in the object
print(x)
#> ── [ INCOMMON ] 4 PASS mutations across 1 samples, with 4 mutant genes across 1
#> ℹ Average sample purity: 0.6
#> ℹ Average sequencing depth: 380
#> # A tibble: 4 × 36
#> sample tumor_type purity purity_map eta_map chr from to gene ref
#> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr> <chr>
#> 1 P-0002081 LUAD 0.6 0.796 127. chr12 2.54e7 2.54e7 KRAS C
#> 2 P-0002081 LUAD 0.6 0.796 127. chr17 7.58e6 7.58e6 TP53 G
#> 3 P-0002081 LUAD 0.6 0.796 127. chr19 1.22e6 1.22e6 STK11 C
#> 4 P-0002081 LUAD 0.6 0.796 127. chr19 1.11e7 1.11e7 SMAR… -
#> # ℹ 26 more variables: alt <chr>, NV <int>, DP <int>, VAF <dbl>, map_k <int>,
#> # map_m <int>, gene_role <chr>, OS_MONTHS <dbl>, OS_STATUS <dbl>,
#> # SAMPLE_TYPE <chr>, MET_COUNT <dbl>, METASTATIC_SITE <chr>,
#> # MET_SITE_COUNT <dbl>, PRIMARY_SITE <chr>, SUBTYPE_ABBREVIATION <chr>,
#> # GENE_PANEL <chr>, SEX <chr>, TMB_NONSYNONYMOUS <dbl>, FGA <dbl>,
#> # AGE_AT_SEQUENCING <dbl>, RACE <chr>, bayes_p_purity <dbl>,
#> # bayes_p_eta <dbl>, post_pred_p.value_DP <dbl>, …