Skip to contents

Disclaimer: ProCESS/CLONES internally implements the probability distributions using the C++11 random number distribution classes. The standard does not specify their algorithms, and the class implementations are left free for the compiler. Thus, the simulation output depends on the compiler used to compile CLONES, and because of that, the results reported in this article may differ from those obtained by the reader.

Once one has familiarised oneself with how a tumour evolution simulation can be programmed using ProCESS (see the article on tissue simulation), the next step is to augment the simulation with sampling of tumour cells. This mimics a realistic experimental design in which we collect tumour sequencing data.

This article introduces sampling using different types of models; starting from simpler to more complex simulation scenarios. We consider:

  • multi-region sampling: where at every time point multiple spatially-separated samples are collected;

  • longitudinal sampling: where the sampling is repeated at multiple time-points.

Custom multi-region sampling

We consider a simple monoclonal model, without an epi-state.

library(ProCESS)

# set the seed of the random number generator
set.seed(0)

# Monoclonal model, no epigenetic state
sim <- TissueSimulation("Monoclonal")

sim$add_mutant(name = "A", growth_rates = 0.1, death_rates = 0.01)

sim$place_cell("A", 500, 500)
sim$run_up_to_size("A", 60000)
#>  [█████████████████████████---------------] 60% [00m:00s] Cells: 36130 [███████████████████████████████████████-] 95% [00m:00s] Cells: 57471 [████████████████████████████████████████] 100% [00m:01s] Saving snapshot
current <- plot_tissue(sim)
current

A monoclonal tumour consisting of 60000 cells.

A sample is defined by a name and a bounding box, which has (x,y) coordinates for the bottom-left and top-right points.

For this simulation, we define two samples with names S_1_1 and S_1_2.

# We collect a squared box of (bbox_width x bbox_width) cells
bbox_width <- 50

# Box A1
bbox1_p <- c(400, 400)
bbox1_q <- bbox1_p + bbox_width

# Box B1
bbox2_p <- c(500, 500)
bbox2_q <- bbox2_p + bbox_width

library(ggplot2)

# View the boxes
current +
  geom_rect(xmin = bbox1_p[1], xmax = bbox1_q[2],
            ymin = bbox1_p[1], ymax = bbox1_q[2],
            fill = NA, color = "black") +
  geom_rect(xmin = bbox2_p[1], xmax = bbox2_q[2],
            ymin = bbox2_p[1], ymax = bbox2_q[2],
            fill = NA, color = "black")

The rectangular regions corresponding to the samples S_1_1 and S_1_2.


# Sampling
sim$sample_cells("S_1_1", bottom_left = bbox1_p, top_right = bbox1_q)
sim$sample_cells("S_1_2", bottom_left = bbox2_p, top_right = bbox2_q)

Note: Sampling removes cells from the tissue, as if the tissue were surgically resected. Therefore, cells that are mapped to the bounding box after application of TissueSimulation$sample_cells() are no longer part of the simulation.

A new call to plot_tissue() will show the box where the cells have been removed to be white.

The sampled tissue.

This is also reflected by TissueSimulation$get_cells(), which now will not find any tumour cells in the sampled region.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# This should be empty
sim$get_cells(c(400, 400), c(400 + bbox_width, 400 + bbox_width)) %>% head
#> [1] cell_id    mutant     epistate   position_x position_y
#> <0 rows> (or 0-length row.names)

The sampling process exclusively collects tumour cells, while excluding wild-type cells.

The sample forest

Every sampled cell is linked, at the evolutionary level, to the other cells that originate from the same initial cell. It helps to visualise the evolutionary information on the cells that we have sampled as a forest of trees (if one seeded multiple initial cells). The forest is an object of the R6 class SampleForest.

forest <- sim$get_sample_forest()

forest
#> SampleForest
#>   # of trees: 1
#>   # of nodes: 20886
#>   # of leaves: 5122
#>   samples: {"S_1_1", "S_1_2"}

The forest has methods to obtain the nodes of the sampled cells.

forest$get_nodes() %>% head
#>   cell_id ancestor depth mutant epistate sample birth_time
#> 1       0       NA     0      A            <NA>   0.000000
#> 2       1        0     1      A            <NA>   5.741436
#> 3       2        0     1      A            <NA>   5.741436
#> 4       3        2     2      A            <NA>   8.545185
#> 5       4        2     2      A            <NA>   8.545185
#> 6       5        1     2      A            <NA>  11.583620

The leaves of the forest are sampled cells, while the internal nodes are their ancestors. The field sample is not available for internal nodes, and reports the sample name otherwise.

# The leaves in the forest represent sampled cells
forest$get_nodes() %>%
  filter(!is.na(.data$sample)) %>%
  head
#>   cell_id ancestor depth mutant epistate sample birth_time
#> 1   40462    24937    31      A           S_1_2   297.8990
#> 2   47930    26730    53      A           S_1_2   315.8640
#> 3   49760    48031    79      A           S_1_2   319.7190
#> 4   51293    17147    34      A           S_1_2   323.0180
#> 5   51821    17460    53      A           S_1_2   324.1881
#> 6   53934    45810    70      A           S_1_2   328.6181

The roots of the forest have no ancestors.

# If it is one cell, then the forest is a tree
forest$get_nodes() %>%
  filter(is.na(.data$ancestor))
#>   cell_id ancestor depth mutant epistate sample birth_time
#> 1       0       NA     0      A            <NA>          0

We can also query the forest about the samples used to build it.

forest$get_samples_info()
#>    name id xmin ymin xmax ymax tumour_cells tumour_cells_in_bbox    time
#> 1 S_1_1  0  400  400  450  450         2563                 2563 625.721
#> 2 S_1_2  1  500  500  550  550         2559                 2559 625.721

We can visualise the forest. This plot reports the cells and, on the y-axis, their time of birth.

plot_forest(forest)

The sample forest of S_1_1 and S_1_2.

The plot also shows sample annotations and species, but for a large number of cells, it can be difficult to view the full forest, unless a very large canvas is used. For this reason, it is possible to subset the forest.

# Extract the subforest linked to sample
S_1_1_forest <- forest$get_subforest_for("S_1_1")

plot_forest(S_1_1_forest)

The sub-forest of sample S_1_1.

In general, these plots can be annotated with extra information, such as the sampling times and the MRCAs of each sample in the forest.

# Full plot
plot_forest(forest) %>%
  annotate_forest(forest)

The annotated forest of S_1_1 and S_1_2.

# S_1_1 plot
plot_forest(S_1_1_forest) %>%
  annotate_forest(S_1_1_forest)

The annotated sub-forest of the sample S_1_1.

Randomised multi-region samples

# set the seed of the random number generator
set.seed(0)

sim <- TissueSimulation("Randomised")

sim$add_mutant(name = "A", growth_rates = 0.1, death_rates = 0.01)

sim$place_cell("A", 500, 500)
sim$run_up_to_size("A", 60000)
#>  [████████████████████████----------------] 58% [00m:00s] Cells: 35110 [█████████████████████████████████████---] 91% [00m:00s] Cells: 55082 [████████████████████████████████████████] 100% [00m:01s] Saving snapshot

We include a new mutant and let it grow. This new mutant has much higher growth rates than its ancestor.

# Add a new mutant
sim$add_mutant(name = "B", growth_rates = 1, death_rates = 0.01)
sim$mutate_progeny(sim$choose_cell_in("A"), "B")

sim$run_up_to_size("B", 10000)
#>  [████████████████████████████████████████] 100% [00m:00s] Saving snapshot

current <- plot_tissue(sim)
current

A tumour having two mutants/species with different rates.

Since the mutant start has been randomised by TissueSimulation$choose_cell_in(), we have no exact idea of where to sample to obtain, for example, 100 of its cells.We can visually inspect the simulation, but it is slow.

ProCESS provides a TissueSimulation$search_sample() function to sample bounding boxes that contain a desired number of cells. The function takes in input:

  • a bounding box size;
  • the number n of cells to sample for a species of interest.

TissueSimulation$search_sample() will attempt a fixed number of times to sample the box, starting from positions occupied by the species of interest. If a box that contains at least n cells is not found within a number of attempts, then the one with the largest number of samples is returned.

This allows for programming sampling without having a clear idea of the tissue conformation.

# A bounding box 50x50 with at least 100 cells of species B
n_w <- n_h <- 50
ncells <- 0.8 * n_w * n_h

# Sampling ncells with random box sampling of boxes of size n_w x n_h
bbox <- sim$search_sample(c("B" = ncells), n_w, n_h)

# plot the bounding box
current +
  geom_rect(xmin = bbox$lower_corner[1], xmax = bbox$upper_corner[1],
            ymin = bbox$lower_corner[2], ymax = bbox$upper_corner[2],
            fill = NA, color = "black")

The rectangle delimiting the sample S_2_1 containing 2000 cells of mutant B at least.

# sample the tissue
sim$sample_cells("S_2_1", bbox$lower_corner, bbox$upper_corner)

Something similar to species A.

bbox <- sim$search_sample(c("A" = ncells), n_w, n_h)

# plot the bounding box
current +
  geom_rect(xmin = bbox$lower_corner[1], xmax = bbox$upper_corner[1],
            ymin = bbox$lower_corner[2], ymax = bbox$upper_corner[2],
            fill = NA, color = "black")

The rectangle delimiting the sample S_2_2 containing 2000 cells of mutant A at least.

# sample the tissue
sim$sample_cells("S_2_2", bbox$lower_corner, bbox$upper_corner)

The two samples have been extracted.

The tissue of the biclonal tumour after the two samples have been collected.

We can build the sample forest.

forest <- sim$get_sample_forest()

plot_forest(forest) %>%
  annotate_forest(forest)

The tissue of the biclonal tumour after the collection of the two samples.

Randomised cell sampling (Liquid biopsy)

ProCESS supports randomised cell sampling over the full tissue or a rectangle thereof.

# collect up to 2500 tumour cells randomly selected over the whole tissue
sim$sample_cells("S_2_3", num_of_cells = 2500)

bbox <- sim$search_sample(c("A" = ncells), n_w, n_h)

# collect up to 200 tumour cells randomly selected in the provided
# bounding box
sim$sample_cells("S_2_4", bbox$lower_corner, bbox$upper_corner, 200)

forest <- sim$get_sample_forest()

plot_forest(forest) %>%
  annotate_forest(forest)

Two further samples S_2_3 and S_2_4 are collected. Their cells are randomly picked-up (like in liquid biopsy) from the whole tissue and from a bounding box, respectively. The figure depicts the sample forest of all the samples.

Two populations with an epi-state

We are now ready to simulate a model with epigenetic switches and sub-clonal expansions.

# set the seed of the random number generator
set.seed(0)

sim <- TissueSimulation("Two Populations")

sim$death_activation_level <- 20

# First mutant
sim$add_mutant(name = "A",
               epigenetic_rates = c("+-" = 0.01, "-+" = 0.01),
               growth_rates = c("+" = 0.1, "-" = 0.08),
               death_rates = c("+" = 0.1, "-" = 0.01))

sim$place_cell("A+", 500, 500)
sim$run_up_to_size("A+", 1000)
#>  [███████████████████████████████████-----] 87% [00m:00s] Cells: 29232 [████████████████████████████████████████] 100% [00m:00s] Saving snapshot
plot_tissue(sim, num_of_bins = 500)

A tumour having one mutants and two species with different rates.

We sample before introducing a new mutant.

bbox_width <- 10

sim$sample_cells("S_1_1",
                 bottom_left = c(480, 480),
                 top_right = c(480 + bbox_width, 480 + bbox_width))

sim$sample_cells("S_1_2",
                 bottom_left = c(500, 500),
                 top_right = c(500 + bbox_width, 500 + bbox_width))

plot_tissue(sim, num_of_bins = 500)

The sampled tumour.

# Let it grow a bit more
sim$run_up_to_time(sim$get_clock() + 15)
#>  [████████████████████████████████████████] 100% [00m:00s] Saving snapshot

plot_tissue(sim, num_of_bins = 500)

The same tumour after 15 simulated time units have been elapsed.

Add a new submutant.

cell <- sim$choose_cell_in("A")

sim$add_mutant(name = "B",
               epigenetic_rates = c("+-" = 0.05, "-+" = 0.1),
               growth_rates = c("+" = 0.8, "-" = 0.3),
               death_rates = c("+" = 0.05, "-" = 0.05))

sim$mutate_progeny(cell, "B")

# let it grow more time units
sim$run_up_to_size("B+", 7000)
#>  [████████████████████████████████████████] 100% [00m:00s] Saving snapshot

plot_tissue(sim, num_of_bins = 500)

A new mutant with epigenetic state arose.

Sample again and plot the tissue

n_w <- n_h <- 25
ncells <- 0.9 * n_w * n_h

bbox <- sim$search_sample(c("A" = ncells), n_w, n_h)
sim$sample_cells("S_2_1", bbox$lower_corner, bbox$upper_corner)

bbox <- sim$search_sample(c("B" = ncells), n_w, n_h)
sim$sample_cells("S_2_2", bbox$lower_corner, bbox$upper_corner)

plot_tissue(sim, num_of_bins = 500)

The tumour tissue after two samples have been collected.

The Muller plot of the biclonal tumour with epigenetic states.

Now we show the sample forest, which starts being rather complicated

forest <- sim$get_sample_forest()

plot_forest(forest) %>%
  annotate_forest(forest)

The sample forest of the biclonal tumour with epigenetic states.

Storing Samples Forests

A sample forest can be saved in a file by using the method SampleForest$save().

# check the file existence. It should not exist.
file.exists("sample_forest.sff")
#> [1] FALSE

# save the sample forest in the file "sample_forest.sff"
forest$save("sample_forest.sff")

# check the file existence. It now exists.
file.exists("sample_forest.sff")
#> [1] TRUE

The saved sample forest can successively be loaded by using the function load_sample_forest().

# load the sample forest from "sample_forest.sff" and store it in `forest2`
forest2 <- load_sample_forest("sample_forest.sff")

# let us now compare the sample forests stored in `forest` and `forest2`;
# they should be the same.
forest
#> SampleForest
#>   # of trees: 1
#>   # of nodes: 6065
#>   # of leaves: 1476
#>   samples: {"S_1_1", "S_1_2", "S_2_1", "S_2_2"}
forest2
#> SampleForest
#>   # of trees: 1
#>   # of nodes: 6065
#>   # of leaves: 1476
#>   samples: {"S_1_1", "S_1_2", "S_2_1", "S_2_2"}