library(lineaGT)
#> Warning: replacing previous import 'cli::num_ansi_colors' by
#> 'crayon::num_ansi_colors' when loading 'VIBER'
#> Warning: replacing previous import 'cli::num_ansi_colors' by
#> 'crayon::num_ansi_colors' when loading 'easypar'
#>  Loading ctree, 'Clone trees in cancer'. Support : <https://caravagn.github.io/ctree/>
#> Warning: replacing previous import 'crayon::%+%' by 'ggplot2::%+%' when loading
#> 'VIBER'
#>  Loading VIBER, 'Variational inference for multivariate Binomial mixtures'. Support : <https://caravagn.github.io/VIBER/>
#>  Loading lineaGT, 'Lineage inference from gene therapy'. Support : <https://caravagnalab.github.io/lineaGT/>
#> ! The 'lineagt-env' environment is already loaded!
library(magrittr)

The coverage dataset can be filtered by calling the filter_dataset() function.

data(cov.df.example)
data(vaf.df.example)
cov.example.filt = cov.df.example %>%
  filter_dataset(min_cov=5, min_frac=0.05)
#>  Filtering the input dataset with minimum coverage 5 and minimum clusters frac…
#> 

cov.example.filt
#> # A tibble: 264 × 4
#>    IS    timepoints lineage coverage
#>    <chr> <chr>      <chr>      <int>
#>  1 IS100 t1         l1             0
#>  2 IS100 t2         l1           418
#>  3 IS100 t1         l2             0
#>  4 IS100 t2         l2            74
#>  5 IS101 t1         l1           502
#>  6 IS101 t2         l1           186
#>  7 IS101 t1         l2            62
#>  8 IS101 t2         l2           640
#>  9 IS11  t1         l1           128
#> 10 IS11  t2         l1           196
#> # ℹ 254 more rows
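
The filtered tibble is in long format, with one row per IS, timepoint, and lineage. As a minimal sketch of how to inspect that layout, the base R snippet below rebuilds the first eight rows printed above as a toy data frame (`cov_toy`, `n_is`, `rows_per_is`, and `cov_wide` are illustrative names, not part of lineaGT), counts the ISs, and reshapes the coverage into one column per timepoint and lineage pair. If every IS carries all four combinations, the 264 rows above would correspond to 264 / 4 = 66 ISs.

```r
# Toy data: the first eight rows of the filtered tibble printed above.
cov_toy <- data.frame(
  IS         = rep(c("IS100", "IS101"), each = 4),
  timepoints = rep(c("t1", "t2"), times = 4),
  lineage    = rep(rep(c("l1", "l2"), each = 2), times = 2),
  coverage   = c(0L, 418L, 0L, 74L, 502L, 186L, 62L, 640L),
  stringsAsFactors = FALSE
)

# Number of distinct ISs and number of rows contributed by each IS.
n_is <- length(unique(cov_toy$IS))
rows_per_is <- table(cov_toy$IS)

# Wide view: one row per IS, one column per "timepoint.lineage" pair.
key <- paste(cov_toy$timepoints, cov_toy$lineage, sep = ".")
cov_wide <- tapply(cov_toy$coverage, list(cov_toy$IS, key), sum)
cov_wide
```

The same check can be run on cov.example.filt itself by replacing the toy data frame with the filtered tibble.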

Fitting the model

x = fit(
  cov.df = cov.example.filt,
  vaf.df = vaf.df.example,
  steps = 500,
  # n_runs = 1,
  k_interval = c(5, 15),
  timepoints_to_int = c(t1 = 60, t2 = 150)
  )
#>  Starting lineaGT model selection to retrieve the optimal number of clones
#> 
#>  Fitting model to cluster ISs
#>  Found 8 clones of ISs!
#> 
#>  Fitting model to cluster mutations
#>  Starting clustering of clone C0 mutations
#>  [ VIBER - variational fit ] 
#> 
#>  Input n = 3, with k < 3. Dirichlet concentration α = 1e-06.
#>  Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-10 or 5000 steps, r = 10 starts.
#>  VIBER fit completed in 0.03 mins (status: converged)
#> ── [ VIBER ] My VIBER model n = 3 (w = 4 dimensions). Fit with k = 3 clusters. ─
#> • Clusters: π = 67% [C2] and 33% [C3], with π > 0.
#> • Binomials: θ = <0.09, 0.19, 0.01, 0> [C2] and <0.01, 0.36, 0.03, 0.4> [C3].
#>  Score(s): ELBO = -1461.821. Fit converged in 6 steps, ε = 1e-10.
#>  Reduced to k = 2 (from 3) selecting VIBER cluster(s) with π > 0.166666666666667, and Binomial p > 0 in w > 0 dimension(s).
#>  Starting clustering of clone C0 mutations ... done
#> 
#>  Starting phylogeny inference of clone C0
#>  [ ctree ~ clone trees generator for C0 ] 
#> 
#> # A tibble: 3 × 8
#>   cluster   t1.l1 t2.l1  t1.l2 t2.l2 nMuts is.clonal is.driver
#>   <chr>     <dbl> <dbl>  <dbl> <dbl> <dbl> <lgl>     <lgl>    
#> 1 S1      0.0856  0.190 0      0         2 FALSE     TRUE     
#> 2 S2      0.00681 0.362 0.0287 0.396     1 FALSE     FALSE    
#> 3 C0      1       1     1      1         1 TRUE      FALSE
#>  Trees per region 1, 2, 1, 1
#>  Total 2 tree structures - search is exahustive
#> ── Ranking trees 
#>  2  trees with non-zero score, storing 2
#>  Starting phylogeny inference of clone C0 ... done
#> 
#>  Starting clustering of clone C1 mutations
#>  [ VIBER - variational fit ] 
#> 
#>  Input n = 8, with k < 8. Dirichlet concentration α = 1e-06.
#>  Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-10 or 5000 steps, r = 10 starts.
#>  VIBER fit completed in 0.03 mins (status: converged)
#> ── [ VIBER ] My VIBER model n = 8 (w = 4 dimensions). Fit with k = 8 clusters. ─
#> • Clusters: π = 25% [C2], 25% [C5], 25% [C7], and 25% [C8], with π > 0.
#> • Binomials: θ = <0, 0.08, 0.01, 0.15> [C2], <0.31, 0, 0.01, 0.27> [C5], <0.18,
#> 0, 0.01, 0> [C7], and <0, 0.14, 0.01, 0> [C8].
#>  Score(s): ELBO = -4567.679. Fit converged in 8 steps, ε = 1e-10.
#>  Reduced to k = 4 (from 8) selecting VIBER cluster(s) with π > 0.0625, and Binomial p > 0 in w > 0 dimension(s).
#>  Starting clustering of clone C1 mutations ... done
#> 
#>  Starting phylogeny inference of clone C1
#>  [ ctree ~ clone trees generator for C1 ] 
#> 
#> # A tibble: 5 × 8
#>   cluster   t1.l1   t2.l1   t1.l2 t2.l2 nMuts is.clonal is.driver
#>   <chr>     <dbl>   <dbl>   <dbl> <dbl> <dbl> <lgl>     <lgl>    
#> 1 S1      0.00162 0.0756  0.0122  0.147     2 FALSE     TRUE     
#> 2 S2      0.314   0.00279 0.00864 0.271     2 FALSE     FALSE    
#> 3 S3      0.184   0.00280 0       0         2 FALSE     FALSE    
#> 4 S4      0.00163 0.142   0       0         2 FALSE     FALSE    
#> 5 C1      1       1       1       1         1 TRUE      FALSE
#>  Trees per region 2, 2, 1, 2
#>  Total 6 tree structures - search is exahustive
#> ── Ranking trees 
#>  6  trees with non-zero score, storing 6
#>  Starting phylogeny inference of clone C1 ... done
#> 
#>  Starting clustering of clone C4 mutations
#>  [ VIBER - variational fit ] 
#> 
#>  Input n = 6, with k < 6. Dirichlet concentration α = 1e-06.
#>  Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-10 or 5000 steps, r = 10 starts.
#>  VIBER fit completed in 0.03 mins (status: converged)
#> ── [ VIBER ] My VIBER model n = 6 (w = 4 dimensions). Fit with k = 6 clusters. ─
#> • Clusters: π = 50% [C2], 17% [C1], 17% [C4], and 17% [C6], with π > 0.
#> • Binomials: θ = <0.15, 0, 0.01, 0> [C2], <0.19, 0.22, 0.3, 0.2> [C1], <0, 0,
#> 0.02, 0.29> [C4], and <0.36, 0, 0.02, 0.28> [C6].
#>  Score(s): ELBO = -3594.654. Fit converged in 6 steps, ε = 1e-10.
#>  Reduced to k = 4 (from 6) selecting VIBER cluster(s) with π > 0.0833333333333333, and Binomial p > 0 in w > 0 dimension(s).
#>  Starting clustering of clone C4 mutations ... done
#> 
#>  Starting phylogeny inference of clone C4
#>  [ ctree ~ clone trees generator for C4 ] 
#> 
#> # A tibble: 5 × 8
#>   cluster t1.l1   t2.l1  t1.l2 t2.l2 nMuts is.clonal is.driver
#>   <chr>   <dbl>   <dbl>  <dbl> <dbl> <dbl> <lgl>     <lgl>    
#> 1 S1      0.192 0.222   0.296  0.198     1 FALSE     FALSE    
#> 2 S2      0.146 0.00159 0      0         3 FALSE     FALSE    
#> 3 S3      0     0       0.0186 0.292     1 FALSE     TRUE     
#> 4 S4      0.358 0.00474 0.0171 0.275     1 FALSE     FALSE    
#> 5 C4      1     1       1      1         1 TRUE      FALSE
#>  Trees per region 6, 1, 6, 5
#>  Total 54 tree structures - search is exahustive
#> ── Ranking trees 
#>  33  trees with non-zero score, storing 33
#>  Starting phylogeny inference of clone C4 ... done
#> 
#>  Starting clustering of clone C7 mutations
#>  [ VIBER - variational fit ] 
#> 
#>  Input n = 2, with k < 2. Dirichlet concentration α = 1e-06.
#>  Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-10 or 5000 steps, r = 10 starts.
#>  VIBER fit completed in 0.03 mins (status: converged)
#> ── [ VIBER ] My VIBER model n = 2 (w = 4 dimensions). Fit with k = 2 clusters. ─
#> • Clusters: π = 50% [C1] and 50% [C2], with π > 0.
#> • Binomials: θ = <0.4, 0, 0.31, 0.35> [C1] and <0.01, 0.11, 0, 0> [C2].
#>  Score(s): ELBO = -1584.505. Fit converged in 5 steps, ε = 1e-10.
#>  Starting clustering of clone C7 mutations ... done
#> 
#>  Starting phylogeny inference of clone C7
#>  [ ctree ~ clone trees generator for C7 ] 
#> 
#> # A tibble: 3 × 8
#>   cluster  t1.l1   t2.l1 t1.l2 t2.l2 nMuts is.clonal is.driver
#>   <chr>    <dbl>   <dbl> <dbl> <dbl> <dbl> <lgl>     <lgl>    
#> 1 S1      0.396  0.00234 0.308 0.348     1 FALSE     TRUE     
#> 2 S2      0.0104 0.112   0     0         1 FALSE     FALSE    
#> 3 C7      1      1       1     1         1 TRUE      FALSE
#>  Trees per region 2, 1, 1, 1
#>  Total 2 tree structures - search is exahustive
#> ── Ranking trees 
#>  2  trees with non-zero score, storing 2
#>  Starting phylogeny inference of clone C7 ... done
#> 
#>  Fitting model to cluster mutations ... done
#> 
#>  Fitting model to estimate population growth rates
#>  Starting growth models inference of clone C0
#>  Starting growth models inference of clone C0 ... done
#> 
#>  Starting growth models inference of clone C1
#>  Starting growth models inference of clone C1 ... done
#> 
#>  Starting growth models inference of clone C2
#>  Starting growth models inference of clone C2 ... done
#> 
#>  Starting growth models inference of clone C3
#>  Starting growth models inference of clone C3 ... done
#> 
#>  Starting growth models inference of clone C4
#>  Starting growth models inference of clone C4 ... done
#> 
#>  Starting growth models inference of clone C5
#>  Starting growth models inference of clone C5 ... done
#> 
#>  Starting growth models inference of clone C6
#>  Starting growth models inference of clone C6 ... done
#> 
#>  Starting growth models inference of clone C7
#>  Starting growth models inference of clone C7 ... done
#> 
#>  Fitting model to estimate population growth rates ... done
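
The "Reduced to k = ..." lines in the log above report a cutoff on the mixing proportions π that differs per clone. As a hedged worked example, the snippet below checks that the three logged cutoffs agree with 1 / (2 * k), where k is the initial number of components, and that applying them to the logged proportions gives the reported reduced k. The 1 / (2 * k) rule is an assumption inferred from the printed numbers, not taken from the VIBER source, and the second logged criterion (Binomial p > 0 in w > 0 dimensions) is not modelled here. All variable names are illustrative.

```r
# Mixing proportions as printed in the VIBER summaries above (rounded percentages).
# Components that the log does not list are not included.
pi_logged <- list(
  C0 = c(C2 = 0.67, C3 = 0.33),
  C1 = c(C2 = 0.25, C5 = 0.25, C7 = 0.25, C8 = 0.25),
  C4 = c(C2 = 0.50, C1 = 0.17, C4 = 0.17, C6 = 0.17)
)

# Initial number of components per clone ("Fit with k = ... clusters").
k_initial <- c(C0 = 3, C1 = 8, C4 = 6)

# Assumption: the logged cutoff equals 1 / (2 * k). This matches the three
# values printed in the log but is inferred, not documented behaviour.
cutoff <- 1 / (2 * k_initial)

# Count the components whose proportion exceeds the cutoff.
k_reduced <- sapply(
  names(pi_logged),
  function(cl) sum(pi_logged[[cl]] > cutoff[[cl]])
)

cutoff
k_reduced
```

Clone C7 has no reduction line in the log: both of its components have π = 50%, so none is dropped.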

Printing the fitted object shows the following information about the data:

  • lineages and timepoints present in the data,

  • number of integration sites,

  • number of inferred clones of ISs, estimated via model selection over the input range of numbers of clusters (k_interval),

  • for each clone, the number of assigned ISs and the mean coverage, per timepoint and lineage.

data(x.example)
x.example
#> ── [ lineaGT ]  ──── Python: /usr/share/miniconda/envs/lineagt-env/bin/python ──
#> → Lineages: l1 and l2.
#> → Timepoints: t1 and t2.
#> → Number of Insertion Sites: 66.
#> 
#> ── Optimal IS model with k = 8.
#> 
#>     C4 (19 ISs) : l1 [285, 209]; l2 [ 51, 492] 
#>     C1 (15 ISs) : l1 [245, 177]; l2 [ 23, 289] 
#>      C0 (6 ISs) : l1 [145, 240]; l2 [ 32, 373] 
#>      C2 (6 ISs) : l1 [  1, 547]; l2 [  1, 388] 
#>      C3 (6 ISs) : l1 [ 92, 109]; l2 [245, 751] 
#>      C5 (6 ISs) : l1 [  0, 551]; l2 [  1, 828] 
#>      C6 (4 ISs) : l1 [330,  16]; l2 [ 17,  38] 
#>      C7 (4 ISs) : l1 [  0, 426]; l2 [  1, 198]
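
As a quick sanity check on the summary above, the clone sizes can be copied by hand and compared with the reported total. This is a small illustrative sketch in base R; `clone_sizes`, `total_is`, and `frac_is` are hypothetical names, and the values are transcribed from the printed output rather than extracted from the fitted object.

```r
# Clone sizes (number of ISs) transcribed from the printed summary above.
clone_sizes <- c(C4 = 19, C1 = 15, C0 = 6, C2 = 6, C3 = 6, C5 = 6, C6 = 4, C7 = 4)

# The sizes should add up to the reported number of ISs.
total_is <- sum(clone_sizes)

# Fraction of ISs assigned to each clone, largest first.
frac_is <- sort(clone_sizes / total_is, decreasing = TRUE)
round(frac_is, 2)
```

The total matches the 66 ISs reported in the header of the summary, and the two largest clones (C4 and C1) together hold 34 of the 66 ISs.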