3. Understanding Timing of Clonal Peaks
a3_Understanding_single_segment_timing.Rmd
This vignette delves into how the tickTack
package
obtains timing results for clonal peaks using the fit
function. It explains the underlying processes, including the
computation of clonal peaks and the Stan model used for inference.
Key Components of the Analysis
The timing results are obtained in the following steps:
- Extracting clonal peaks: Theoretical peaks in the Variant Allele Frequency (VAF) spectrum are calculated for specific karyotypes and tumor purity.
- Filtering mutations: Mutations within confidence intervals around these peaks are selected for analysis.
- Fitting the model: Bayesian inference is performed using a Stan model to estimate the timing of clonal expansions.
Step 1: Calculating Clonal Peaks
The clonal peaks are calculated using the
get_clonal_peaks
function. The function computes the
theoretical VAF peaks based on the karyotype and tumor purity. They are
computed using the following equation:
Where:
- : The multiplicity of the allele of interest (e.g., the major allele or a single copy for heterozygosity).
- : The tumor purity (a value between 0 and 1).
- : The total number of copies in the karyotype (e.g., sum of the major and minor alleles).
- : Accounts for the two copies in normal diploid cells.
Step 2: Filtering Mutations
Once the peaks are obtained, mutations within a genomic segment are filtered based on whether their VAF values fall within a confidence interval around these peaks.
The confidence intervals are computed using either:
- Binomial distribution: for standard VAF estimation
- Beta-binomial distribution: to account for overdispersion
Filtering logic
For each mutation, the number of variant reads (NV) is compared to the confidence interval calculated for the depth (DP) at each peak:
# Example pseudo-code for filtering
probs <- c(alpha / 2, 1 - alpha / 2)
for (p in peaks) {
quantiles <- qbinom(probs, DP, p) # or qbb for beta-binomial
if (NV >= quantiles[1] && NV <= quantiles[2]) {
# Mutation is accepted
}
}
Only mutations passing this test are used for model fitting.
Step 3: Bayesian Inference with Stan
The final step is to fit a Stan model to the filtered mutations. The
model estimates the posterior distribution of tau
, which
represents the timing of clonal expansions.
The stan code used for the model is divided in the following pieces.
Data block
Here, the data block defines inputs that the model uses:
- N: Total number of mutations to analyze.
- NV: Number of variant reads observed for each mutation.
- DP: Total sequencing depth for each mutation.
- peaks: An array of two real values representing the theoretical clonal peaks (e.g., 2:1 and 2:2 ratios or other hypothesized clonal frequencies).
Parameters block
The parameters block contains:
- omega: A vector representing the probability contributions of each theoretical peak. It follows a Dirichlet distribution (a standard approach for modeling probabilities that sum to 1).
Model block
model {
vector[2] contributions;
omega ~ dirichlet(rep_vector(2.0, 2));
for (i in 1:N) {
for (k in 1:2) {
contributions[k] = log(omega[k]) + binomial_lpmf(NV[i] | DP[i], peaks[k]);
}
target += log_sum_exp(contributions);
}
}
The model
block contains three components:
-
Prior: The line
omega ~ dirichlet(rep_vector(2.0, 2));
applies a uniform Dirichlet prior to the karyotype mixture probabilities (omega). This means that we do not a priori favor one clonal peak over the other unless there is data evidence. -
Likelihood: The likelihood is computed using the
binomial_lpmf(NV[i] | DP[i], peaks[k])
(or the Negative binomial). This term computes the probability of observingNV[i]
variant reads given sequencing depthDP[i]
and a given theoretical clonal peak valuepeaks[k]
. -
Marginalization: The line
log_sum_exp(contributions)
computes the log marginal likelihood by summing over possible contributions (karyotypes) using the softmax approach, allowing the model to integrate over possible karyotype contributions (omega
).
Final Step: Conversion from omega
to
tau
After fitting the Bayesian model, the posterior probabilities
(omega
) represent the contribution of theoretical clonal
peaks. We convert these probabilities into clonal timing estimates
(tau
) using the following equations.
For 2:1
karyotypes:
For 2:2
and 2:0
karyotypes: