Runs a TINC analysis (autofit, TINC package).

This function is a wrapper to run the main analysis of TINC.

The steps are as follows:

1) Input data is loaded from a `file`, or from a `dataframe`.
2) Clonal mutations are estimated for the tumour, together with the tumour purity (Tumour in Tumour).
3) From putative clonal mutations of the tumour, the Tumour in Normal contamination level is estimated.

An S3 object is returned that contains the results of the analysis.

autofit(
  input,
  cna,
  VAF_range_tumour = c(0, 0.7),
  cutoff_miscalled_clonal = 0.6,
  cutoff_lv_assignment = 0.75,
  N = 20000,
  FAST = FALSE
)

Arguments

input: A `dataframe` of the input mutations. It must follow a specific format; see the vignette for more information.
cna: Copy Number data in the format of package CNAqc.
VAF_range_tumour: A range `[x, y]`; only mutations with VAF in that range are used to determine the TIN/TIT levels of the input.
cutoff_miscalled_clonal: An upper bound on the VAF of a cluster in the tumour data. Clusters above this value are considered miscalled clonal clusters (e.g., due to LOH).
cutoff_lv_assignment: Consider only latent variables with responsibilities above this cutoff.
N: If there are more than `N` mutations in VAF range `VAF_range_tumour`, a random subset of size `N` is retained.
FAST: If `TRUE`, it runs the analysis with reduced sampling power and accuracy. Use this to obtain a result for preliminary inspection of your data, and then run `autofit` with this parameter set to `FALSE`.
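
The filtering arguments above can be illustrated with a minimal base R sketch. It is not the TINC implementation: the simulated `vaf` vector, the toy `resp` matrix, and the choice of inclusive range bounds are all assumptions made for illustration. The first part mimics the effect described for `VAF_range_tumour` and `N`; the second shows one plausible reading of `cutoff_lv_assignment`, where a mutation is assigned to a cluster only if its largest responsibility exceeds the cutoff.

```r
# Hypothetical illustration only; names and data are not taken from TINC.
set.seed(42)
vaf <- runif(1000)                 # simulated tumour VAFs
VAF_range_tumour <- c(0, 0.7)
N <- 200                           # small N so that subsampling is triggered

# Keep mutations whose VAF falls in the range (inclusive bounds are an assumption).
in_range <- vaf >= VAF_range_tumour[1] & vaf <= VAF_range_tumour[2]
candidates <- which(in_range)

# If more than N mutations remain, retain a random subset of size N.
if (length(candidates) > N) {
  keep <- sample(candidates, size = N)
} else {
  keep <- candidates
}
vaf_used <- vaf[keep]

# Toy responsibilities: one row per mutation, one column per cluster.
resp <- rbind(c(0.9, 0.1), c(0.6, 0.4), c(0.2, 0.8))
cutoff_lv_assignment <- 0.75

# Assign each mutation to its most likely cluster, or NA below the cutoff.
best <- apply(resp, 1, which.max)
top  <- apply(resp, 1, max)
assignment <- ifelse(top > cutoff_lv_assignment, best, NA)
```

In this sketch the second mutation is left unassigned, because its largest responsibility (0.6) is below the cutoff.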

Value

An S3 object that contains the results of this analysis.

Examples


# Random
rt = random_TIN()
#> ✔ Generated TINC dataset (n = 998 mutations), TIN (0.05) and TIT (1), normal and tumour coverage 30x and 120x.
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_bar()`).
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_bar()`).

x = autofit(input = rt$data, cna = rt$cna, FAST = TRUE)
#>  [ TINC ] 
#> 
#> 
#> ── Loading TINC input data ─────────────────────────────────────────────────────
#> ✔ Input data contains n = 998 mutations, selecting operation mode.
#> ! Found CNA data, retaining only mutations that map to segments with predominant karyotype ...
#> 
#> 
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#> 
#> ℹ Using reference genome coordinates for: GRCh38.
#> ✔ Fortified calls for 998 somatic mutations: 998 SNVs (100%) and 0 indels.
#> ! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
#> ! Added segments length (in basepairs) to CNA segments.
#> ✔ Fortified CNAs for 998 segments: 998 clonal and 0 subclonal.
#> Warning: [CNAqc] a karyotype column is present in CNA calls, and will be overwritten
#> ✔ 998 mutations mapped to clonal CNAs.
#> 
#> 
#> ── Genome coverage by karyotype, in basepairs. ──
#> 
#> # A tibble: 1 × 4
#>   minor Major     n karyotype
#>   <dbl> <dbl> <dbl> <chr>    
#> 1     1     1  2994 1:1      
#> ✔ n = 998 mutations mapped to CNA segments with karyotype 1:1 (largest available in basepairs).
#> ✔ Mutation with VAF within 0 and 0.7 ~ n = 996.
#> 
#> ── Analysing tumour sample with MOBSTER ────────────────────────────────────────
#> 
#>  [ MOBSTER fit ] 
#> 
#> ✔ Loaded input data, n = 996.
#> ❯ n = 996. Mixture with k = 1,2 Beta(s). Pareto tail: TRUE and FALSE. Output
#> clusters with π > 0.02 and n > 10.
#> ! mobster automatic setup FAST for the analysis.
#> ❯ Scoring (without parallel) 2 x 2 x 2 = 8 models by reICL.
#> 
#> 
#> 
#> ℹ MOBSTER fits completed in 6s.
#> 
#> ── [ MOBSTER ] My MOBSTER model n = 996 with k = 1 Beta(s) and a tail ──────────
#> ● Clusters: π = 79% [C1] and 21% [Tail], with π > 0.
#> ● Tail [n = 197, 21%] with alpha = 1.1.
#> ● Beta C1 [n = 799, 79%] with mean = 0.5.
#> ℹ Score(s): NLL = -1018.66; ICL = -1937.99 (-1995.89), H = 57.9 (0). Fit
#> converged by MM in 14 steps.
#> 
#> ℹ With CNA, TINC will estimating tumour purity adjusting by copy number and mutation multiplicity.
#> ℹ Mutant allele copies 1 for karyotype 1:1
#> Warning: You did not pass enough input colours, adding a gray colour
#> Available: C1, Tail
#> Missing: NA
#> Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
#> of ggplot2 3.3.4.
#> ℹ The deprecated feature was likely used in the mobster package.
#>   Please report the issue at <https://github.com/caravagnalab/mobster/issues>.
#> Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
#> ℹ Please use `after_stat(count)` instead.
#> ℹ The deprecated feature was likely used in the mobster package.
#>   Please report the issue at <https://github.com/caravagnalab/mobster/issues>.
#> 
#> ✔ MOBSTER found n = 796 clonal mutations from cluster C1
#> 
#> ── Analysing normal sample with BMix ───────────────────────────────────────────
#> 
#> 
#> ── BMix fit ────────────────────────────────────────────────────────────────────
#> 
#> ℹ Binomials k_B = 1 and 2, Beta-Binomials k_BB = 0; 4 fits to run.
#> 
#> ℹ Bmix best fit completed in 0 mins
#> 
#> ── [ BMix ] My BMix model n = 796 with k = 2 component(s) (2 + 0) ──────────────
#> • Clusters: π = 96% [Bin 2] and 4% [Bin 1], with π > 0.
#> • Binomial Bin 1 with mean = 0.0124061042253297.
#> • Binomial Bin 2 with mean = 0.00235670756683061.
#> ℹ Score (model selection): ICL = 652.17.
#> Scale for x is already present.
#> Adding another scale for x, which will replace the existing scale.
#> Scale for fill is already present.
#> Adding another scale for fill, which will replace the existing scale.
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_bar()`).
#> Warning: Removed 551 rows containing missing values or values outside the scale range
#> (`geom_raster()`).
#> ✔ Binomial peaks 0.0124061042253297 and 0.00235670756683061 with proportions 0.0419763425970468 and 0.958023657402953. Clonal score 0.00277854448386139 with TINN 0.00555708896772278
#> 
#> ── Analysing tumour and normal samples with VIBER ──────────────────────────────
#> 
#>  [ VIBER - variational fit ] 
#> 
#> ℹ Input n = 996, with k < 5. Dirichlet concentration α = 1e-06.
#> ℹ Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-06 or 1000 steps, r = 3 starts.
#> [easypar] 2024-04-25 09:22:50.701546 - Overriding parallel execution setup [TRUE] with global option : FALSE
#> 
#> ✔ VIBER fit completed in 0.01 mins (status: converged)
#> 
#> ── [ VIBER ] My VIBER model n = 996 (w = 2 dimensions). Fit with k = 5 clusters.
#> • Clusters: π = 81% [C2], 16% [C1], and 3% [C5], with π > 0.
#> • Binomials: θ = <0, 0.5> [C2], <0, 0.08> [C1], and <0, 0.24> [C5].
#> ℹ Score(s): ELBO = -148909.447. Fit converged in 29 steps, ε = 1e-06.
#> 
#> ✔ Reduced to k = 3 (from 5) selecting VIBER cluster(s) with π > 0.02, and Binomial p > 0 in w > 0 dimension(s).
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if
#> `.name_repair` is omitted as of tibble 2.0.0.
#> ℹ Using compatibility `.name_repair`.
#> ℹ The deprecated feature was likely used in the VIBER package.
#>   Please report the issue at <https://github.com/caravagnalab/VIBER/issues>.

print(x)
#> 
#> ── TINC profiler for bulk WGS ──────────────────────────────────────────────────
#> 
#> ℹ Copy Number data has been used for this analysis (karyotype 1:1)
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> ── [ CNAqc ] MySample 998 mutations in 998 segments (998 clonal, 0 subclonal). G
#> 
#> ── Clonal CNAs 
#> 
#>  1:1  [n = 998, L =   0 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■
#> 
#> ℹ Sample Purity: 80% ~ Ploidy: 2.
#> ────────────────────────────────────────────────────────────────────────────────
#> → Mutations data: n = 996 out of 998 within range (100%).
#> 
#>         TIT :  101% (RF 101)  ~ n = 796 clonal mutations, cluster C1 
#>         TIN :  1% (RF 1)  ~ n = 67 with VAF > 0 
#> 
#>    QC Tumour   High purity (>85%)
#>    QC Normal   No Contamination (<1%)

# Fit in the package
data('fit_example', package = 'TINC')

print(fit_example)
#> 
#> ── TINC profiler for bulk WGS ──────────────────────────────────────────────────
#> 
#> ℹ Copy Number data has been used for this analysis (karyotype 1:1)
#> 
#> ────────────────────────────────────────────────────────────────────────────────
#> ── [ CNAqc ]   mutations in 985 segments (985 clonal, 0 subclonal). Genome refer
#> 
#> ── Clonal CNAs 
#> 
#>  1:1  [n = 985, L =   0 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■
#> 
#> ℹ Sample Purity: 80% ~ Ploidy: 2.
#> ────────────────────────────────────────────────────────────────────────────────
#> → Mutations data: n = 985 out of 985 within range (100%).
#> 
#>         TIT :  78% (RF 78)  ~ n = 714 clonal mutations, cluster C1 
#>         TIN :  7% (RF 7)  ~ n = 694 with VAF > 0 
#> 
#>    QC Tumour   Good purity (65-85%)
#>    QC Normal   Some contamination (3-7%)
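
The BMix summary line in the first example (Binomial peaks, proportions, clonal score and TINN) can be checked by hand. The sketch below assumes that the clonal score is the proportion-weighted mean of the Binomial peaks, and that the reported value is twice that score for karyotype 1:1 with one mutant allele copy. Both relations are inferred from the printed numbers, not from the TINC source code.

```r
# Values copied from the BMix log line of the first example.
peaks       <- c(0.0124061042253297, 0.00235670756683061)
proportions <- c(0.0419763425970468, 0.958023657402953)

# Assumption: clonal score = proportion-weighted mean of the Binomial peaks.
clonal_score <- sum(peaks * proportions)

# Assumption: for karyotype 1:1 with one mutant copy, the score is doubled.
tin <- 2 * clonal_score

print(clonal_score)
print(tin)
```

Under these assumptions, both values agree with the clonal score and TINN printed in the log, up to floating-point rounding.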