Lineage inference • lineaGT

library(lineaGT)
#> Warning: replacing previous import 'cli::num_ansi_colors' by
#> 'crayon::num_ansi_colors' when loading 'VIBER'
#> Warning: replacing previous import 'cli::num_ansi_colors' by
#> 'crayon::num_ansi_colors' when loading 'easypar'
#> ✔ Loading ctree, 'Clone trees in cancer'. Support : <https://caravagn.github.io/ctree/>
#> Warning: replacing previous import 'crayon::%+%' by 'ggplot2::%+%' when loading
#> 'VIBER'
#> ✔ Loading VIBER, 'Variational inference for multivariate Binomial mixtures'. Support : <https://caravagn.github.io/VIBER/>
#> ✔ Loading lineaGT, 'Lineage inference from gene therapy'. Support : <https://caravagnalab.github.io/lineaGT/>
#> ! The 'lineagt-env' environment is already loaded!
library(magrittr)

The coverage dataset can be filtered by calling the filter_dataset() function.

data(cov.df.example)
data(vaf.df.example)

cov.example.filt = cov.df.example %>%
  filter_dataset(min_cov=5, min_frac=0.05)
#> ℹ Filtering the input dataset with minimum coverage 5 and minimum clusters frac…
#> ✔ Filtering the input dataset with minimum coverage 5 and minimum clusters frac…
#> 

cov.example.filt
#> # A tibble: 264 × 4
#>    IS    timepoints lineage coverage
#>    <chr> <chr>      <chr>      <int>
#>  1 IS100 t1         l1             0
#>  2 IS100 t2         l1           418
#>  3 IS100 t1         l2             0
#>  4 IS100 t2         l2            74
#>  5 IS101 t1         l1           502
#>  6 IS101 t2         l1           186
#>  7 IS101 t1         l2            62
#>  8 IS101 t2         l2           640
#>  9 IS11  t1         l1           128
#> 10 IS11  t2         l1           196
#> # ℹ 254 more rows
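
To make the effect of a coverage threshold concrete, the sketch below applies a simple minimum-coverage rule to a few rows in the same long format as cov.example.filt. The rule used here (keep an IS when its highest coverage across timepoints and lineages reaches min_cov) is an assumption for illustration, not necessarily the criterion implemented by filter_dataset(). The IS100 and IS101 rows are copied from the output above; the IS_low rows are hypothetical.

```r
# Toy coverage table in the long format shown above.
# IS100 and IS101 are copied from the printed tibble; IS_low is hypothetical.
cov_toy <- data.frame(
  IS         = rep(c("IS100", "IS101", "IS_low"), each = 4),
  timepoints = rep(c("t1", "t2"), times = 6),
  lineage    = rep(c("l1", "l1", "l2", "l2"), times = 3),
  coverage   = c(0L, 418L, 0L, 74L,
                 502L, 186L, 62L, 640L,
                 1L, 3L, 0L, 2L),
  stringsAsFactors = FALSE
)

min_cov <- 5  # same value as in the call above; the rule itself is assumed

# Highest coverage observed for each IS across all timepoints and lineages.
max_cov <- tapply(cov_toy$coverage, cov_toy$IS, max)

# Keep only the ISs whose maximum coverage reaches the threshold.
keep_is <- names(max_cov)[max_cov >= min_cov]
cov_toy_filt <- cov_toy[cov_toy$IS %in% keep_is, ]

unique(cov_toy_filt$IS)
```

In this toy table only IS_low falls below the threshold, so the eight rows of IS100 and IS101 are retained.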

Fitting the model

x = fit(
  cov.df = cov.example.filt,
  vaf.df = vaf.df.example,
  steps = 500,
  # n_runs = 1,
  k_interval = c(5, 15),
  timepoints_to_int = c(t1 = 60, t2 = 150)
)
#> ℹ Starting lineaGT model selection to retrieve the optimal number of clones
#> ✔ Starting lineaGT model selection to retrieve the optimal number of clones ...…
#> 
#> ℹ Fitting model to cluster ISs
#> ✔ Found 8 clones of ISs!
#> 
#> ℹ Fitting model to cluster mutations
#> ℹ Starting clustering of clone C0 mutations
#>  [ VIBER - variational fit ] 
#> 
#> ℹ Input n = 3, with k < 3. Dirichlet concentration α = 1e-06.
#> ℹ Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-10 or 5000 steps, r = 10 starts.
#> ✔ VIBER fit completed in 0.03 mins (status: converged)
#> ── [ VIBER ] My VIBER model n = 3 (w = 4 dimensions). Fit with k = 3 clusters. ─
#> • Clusters: π = 67% [C3] and 33% [C1], with π > 0.
#> • Binomials: θ = <0.09, 0.19, 0.01, 0> [C3] and <0.01, 0.36, 0.03, 0.4> [C1].
#> ℹ Score(s): ELBO = -1461.828. Fit converged in 6 steps, ε = 1e-10.
#> ✔ Reduced to k = 2 (from 3) selecting VIBER cluster(s) with π > 0.166666666666667, and Binomial p > 0 in w > 0 dimension(s).
#> ✔ Starting clustering of clone C0 mutations ... done
#> 
#> ℹ Starting phylogeny inference of clone C0
#>  [ ctree ~ clone trees generator for C0 ] 
#> 
#> # A tibble: 3 × 8
#>   cluster   t1.l1 t2.l1  t1.l2 t2.l2 nMuts is.clonal is.driver
#>   <chr>     <dbl> <dbl>  <dbl> <dbl> <dbl> <lgl>     <lgl>    
#> 1 S1      0.00682 0.362 0.0288 0.396     1 FALSE     FALSE    
#> 2 S2      0.0856  0.190 0      0         2 FALSE     TRUE     
#> 3 C0      1       1     1      1         1 TRUE      FALSE
#> ✔ Trees per region 1, 2, 1, 1
#> ℹ Total 2 tree structures - search is exahustive
#> ── Ranking trees 
#> ✔ 2  trees with non-zero score, storing 2
#> ✔ Starting phylogeny inference of clone C0 ... done
#> 
#> ℹ Starting clustering of clone C1 mutations
#>  [ VIBER - variational fit ] 
#> 
#> ℹ Input n = 8, with k < 8. Dirichlet concentration α = 1e-06.
#> ℹ Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-10 or 5000 steps, r = 10 starts.
#> ✔ VIBER fit completed in 0.03 mins (status: converged)
#> ── [ VIBER ] My VIBER model n = 8 (w = 4 dimensions). Fit with k = 8 clusters. ─
#> • Clusters: π = 37% [C4], 37% [C6], 13% [C1], and 13% [C3], with π > 0.
#> • Binomials: θ = <0, 0.13, 0.01, 0.12> [C4], <0.11, 0, 0.01, 0> [C6], <0.43,
#> 0.01, 0.02, 0.32> [C1], and <0.2, 0.01, 0.02, 0.23> [C3].
#> ℹ Score(s): ELBO = -4612.117. Fit converged in 6 steps, ε = 1e-10.
#> ✔ Reduced to k = 4 (from 8) selecting VIBER cluster(s) with π > 0.0625, and Binomial p > 0 in w > 0 dimension(s).
#> ✔ Starting clustering of clone C1 mutations ... done
#> 
#> ℹ Starting phylogeny inference of clone C1
#>  [ ctree ~ clone trees generator for C1 ] 
#> 
#> # A tibble: 5 × 8
#>   cluster   t1.l1   t2.l1   t1.l2 t2.l2 nMuts is.clonal is.driver
#>   <chr>     <dbl>   <dbl>   <dbl> <dbl> <dbl> <lgl>     <lgl>    
#> 1 S1      0.430   0.00558 0.0170  0.318     1 FALSE     FALSE    
#> 2 S2      0.198   0.00556 0.0170  0.225     1 FALSE     FALSE    
#> 3 S3      0.00117 0.127   0.00957 0.124     3 FALSE     TRUE     
#> 4 S4      0.115   0.00187 0       0         3 FALSE     FALSE    
#> 5 C1      1       1       1       1         1 TRUE      FALSE
#> ✔ Trees per region 6, 1, 2, 5
#> ℹ Total 36 tree structures - search is exahustive
#> ── Ranking trees 
#> ✔ 24  trees with non-zero score, storing 24
#> ✔ Starting phylogeny inference of clone C1 ... done
#> 
#> ℹ Starting clustering of clone C4 mutations
#>  [ VIBER - variational fit ] 
#> 
#> ℹ Input n = 6, with k < 6. Dirichlet concentration α = 1e-06.
#> ℹ Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-10 or 5000 steps, r = 10 starts.
#> ✔ VIBER fit completed in 0.03 mins (status: converged)
#> ── [ VIBER ] My VIBER model n = 6 (w = 4 dimensions). Fit with k = 6 clusters. ─
#> • Clusters: π = 33% [C1], 17% [C2], 17% [C3], 17% [C4], and 17% [C5], with π >
#> 0.
#> • Binomials: θ = <0.22, 0, 0.01, 0> [C1], <0.36, 0, 0.02, 0.28> [C2], <0.19,
#> 0.22, 0.3, 0.2> [C3], <0, 0, 0.02, 0.29> [C4], and <0, 0, 0.02, 0> [C5].
#> ℹ Score(s): ELBO = -3580.914. Fit converged in 6 steps, ε = 1e-10.
#> ✔ Reduced to k = 5 (from 6) selecting VIBER cluster(s) with π > 0.0833333333333333, and Binomial p > 0 in w > 0 dimension(s).
#> ✔ Starting clustering of clone C4 mutations ... done
#> 
#> ℹ Starting phylogeny inference of clone C4
#>  [ ctree ~ clone trees generator for C4 ] 
#> 
#> # A tibble: 6 × 8
#>   cluster t1.l1   t2.l1  t1.l2 t2.l2 nMuts is.clonal is.driver
#>   <chr>   <dbl>   <dbl>  <dbl> <dbl> <dbl> <lgl>     <lgl>    
#> 1 S1      0.219 0.00239 0      0         2 FALSE     FALSE    
#> 2 S2      0.358 0.00476 0.0170 0.275     1 FALSE     FALSE    
#> 3 S3      0.192 0.222   0.296  0.198     1 FALSE     FALSE    
#> 4 S4      0     0       0.0186 0.292     1 FALSE     TRUE     
#> 5 S5      0     0       0      0         1 FALSE     FALSE    
#> 6 C4      1     1       1      1         1 TRUE      FALSE
#> ✔ Trees per region 5, 1, 6, 5
#> ℹ Total 48 tree structures - search is exahustive
#> ✖ Starting phylogeny inference of clone C4 ... failed
#> 
#> ℹ Fitting model to cluster mutations
#> <subscriptOutOfBoundsError in model[var, ]: subscript out of bounds>
#> ℹ Starting clustering of clone C7 mutations
#>  [ VIBER - variational fit ] 
#> 
#> ℹ Input n = 2, with k < 2. Dirichlet concentration α = 1e-06.
#> ℹ Beta (a_0, b_0) = (1, 1); q_i = prior. Optimise: ε = 1e-10 or 5000 steps, r = 10 starts.
#> ✔ VIBER fit completed in 0.03 mins (status: converged)
#> ── [ VIBER ] My VIBER model n = 2 (w = 4 dimensions). Fit with k = 2 clusters. ─
#> • Clusters: π = 50% [C1] and 50% [C2], with π > 0.
#> • Binomials: θ = <0.01, 0.11, 0, 0> [C1] and <0.4, 0, 0.31, 0.35> [C2].
#> ℹ Score(s): ELBO = -1584.511. Fit converged in 5 steps, ε = 1e-10.
#> ✔ Starting clustering of clone C7 mutations ... done
#> 
#> ℹ Starting phylogeny inference of clone C7
#>  [ ctree ~ clone trees generator for C7 ] 
#> 
#> # A tibble: 3 × 8
#>   cluster  t1.l1   t2.l1 t1.l2 t2.l2 nMuts is.clonal is.driver
#>   <chr>    <dbl>   <dbl> <dbl> <dbl> <dbl> <lgl>     <lgl>    
#> 1 S1      0.0104 0.112   0     0         1 FALSE     FALSE    
#> 2 S2      0.396  0.00234 0.308 0.348     1 FALSE     TRUE     
#> 3 C7      1      1       1     1         1 TRUE      FALSE
#> ✔ Trees per region 2, 1, 1, 1
#> ℹ Total 2 tree structures - search is exahustive
#> ── Ranking trees 
#> ✔ 2  trees with non-zero score, storing 2
#> ✔ Starting phylogeny inference of clone C7 ... done
#> 
#> ✔ Fitting model to cluster mutations ... done
#> 
#> ℹ Fitting model to estimate population growth rates
#> ℹ Starting growth models inference of clone C0
#> ✔ Starting growth models inference of clone C0 ... done
#> 
#> ℹ Starting growth models inference of clone C1
#> ✔ Starting growth models inference of clone C1 ... done
#> 
#> ℹ Starting growth models inference of clone C2
#> ✔ Starting growth models inference of clone C2 ... done
#> 
#> ℹ Starting growth models inference of clone C3
#> ✔ Starting growth models inference of clone C3 ... done
#> 
#> ℹ Starting growth models inference of clone C4
#> ✔ Starting growth models inference of clone C4 ... done
#> 
#> ℹ Starting growth models inference of clone C5
#> ✔ Starting growth models inference of clone C5 ... done
#> 
#> ℹ Starting growth models inference of clone C6
#> ✔ Starting growth models inference of clone C6 ... done
#> 
#> ℹ Starting growth models inference of clone C7
#> ✔ Starting growth models inference of clone C7 ... done
#> 
#> ✔ Fitting model to estimate population growth rates ... done
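
The timepoints_to_int argument passed to fit() is a named numeric vector that associates each timepoint label with a number. The sketch below shows two equivalent ways to build such a vector in base R and how it can translate a column of labels into numeric values. How fit() uses these values internally (presumably as the time axis of the growth models) is not visible in the log and is an assumption; the labels vector is a hypothetical example.

```r
# Two equivalent ways to build the named numeric vector.
tp_from_list <- unlist(list("t1" = 60, "t2" = 150))
tp_map       <- c(t1 = 60, t2 = 150)
identical(tp_from_list, tp_map)

# Translating timepoint labels into their numeric values by name lookup.
labels <- c("t1", "t2", "t1", "t2")  # hypothetical column of labels
unname(tp_map[labels])
```

Indexing by name keeps the mapping explicit, so adding a further timepoint only requires one more named entry.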

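The VIBER messages in the fitting log report that, after each variational fit, only the clusters whose mixing proportion π exceeds a threshold are kept. The three logged thresholds coincide numerically with 1/(2k), where k is the initial number of clusters of that fit. The sketch below checks this arithmetic and applies the threshold to the proportions printed for clone C1. Reading the threshold as 1/(2k) is an inference from the logged numbers, not something the output states, and the proportions are the rounded percentages from the log.

```r
# Initial number of clusters k and logged thresholds on pi, for clones C0, C1, C4.
k_init    <- c(C0 = 3, C1 = 8, C4 = 6)
logged_th <- c(C0 = 0.166666666666667, C1 = 0.0625, C4 = 0.0833333333333333)

# Assumed rule inferred from the log: threshold = 1 / (2 * k).
all.equal(1 / (2 * k_init), logged_th)

# Rounded mixing proportions printed for clone C1
# (the log lists only the components with pi > 0).
pi_c1 <- c(C4 = 0.37, C6 = 0.37, C1 = 0.13, C3 = 0.13)

# Components that pass the logged threshold for C1.
kept <- names(pi_c1)[pi_c1 > logged_th[["C1"]]]
length(kept)
```

All four listed components of C1 pass the threshold, in line with the message "Reduced to k = 4 (from 8)".
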
Printing the fitted object shows the following information about the data:

the lineages and timepoints present in the data,
the number of integration sites,
the number of inferred clones of ISs, estimated via model selection over the input range of cluster numbers,
for each clone, the number of assigned ISs and the mean coverage per timepoint and lineage.

data(x.example)
x.example
#> ── [ lineaGT ]  ──── Python: /usr/share/miniconda/envs/lineagt-env/bin/python ──
#> → Lineages: l1 and l2.
#> → Timepoints: t1 and t2.
#> → Number of Insertion Sites: 66.
#> 
#> ── Optimal IS model with k = 8.
#> 
#>     C4 (19 ISs) : l1 [285, 209]; l2 [ 51, 492] 
#>     C1 (15 ISs) : l1 [245, 177]; l2 [ 23, 289] 
#>      C0 (6 ISs) : l1 [145, 240]; l2 [ 32, 373] 
#>      C2 (6 ISs) : l1 [  1, 547]; l2 [  1, 388] 
#>      C3 (6 ISs) : l1 [ 92, 109]; l2 [245, 751] 
#>      C5 (6 ISs) : l1 [  0, 551]; l2 [  1, 828] 
#>      C6 (4 ISs) : l1 [330,  16]; l2 [ 17,  38] 
#>      C7 (4 ISs) : l1 [  0, 426]; l2 [  1, 198]
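
As a quick consistency check on the summary above, the clone sizes can be added up and compared with the reported number of insertion sites and with the row count of the filtered coverage table. The numbers below are transcribed by hand from the printed output; the variable names are illustrative.

```r
# Number of ISs assigned to each clone, copied from the printed summary.
clone_sizes <- c(C4 = 19, C1 = 15, C0 = 6, C2 = 6,
                 C3 = 6,  C5 = 6,  C6 = 4, C7 = 4)

# Total number of ISs across the 8 clones.
n_is <- sum(clone_sizes)
n_is

# Each IS has one coverage value per timepoint (t1, t2) and lineage (l1, l2).
n_timepoints <- 2
n_lineages   <- 2
n_rows <- n_is * n_timepoints * n_lineages
n_rows
```

The totals, 66 ISs and 264 rows, match the "Number of Insertion Sites" line and the dimensions of cov.example.filt printed earlier.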