
Disclaimer: RACES/rRACES implements its probability distributions with the C++11 random number distribution classes. The standard does not specify their algorithms, so each compiler and its standard library are free to implement them differently. Consequently, the simulation output depends on the compiler used to build RACES, and the results reported in this article may differ from those obtained by the reader.

By default, the mutation engine places mutations on the genomes of the sampled cells according to the infinite sites model. In particular, every new mutation is placed on a locus whose context is mutation-free.

Let us consider the phylogenetic forest built in vignette("mutations") and verify whether it satisfies the infinite sites model.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# this function verifies whether any mutation arose independently in
# two unrelated cells represented in a phylogenetic forest
test_infinite_sites_model <- function(phylo_forest) {
  # extract non-germinal SNVs that appear multiple times either
  # in the same cell or different cells
  snvs <- phylo_forest$get_sampled_cell_mutations() %>%
    filter(class != "germinal", .data$type == "SNV") %>%
    count(.data$chr, .data$chr_pos, .data$ref, .data$alt) %>%
    filter(n > 1)

  # search for an SNV that independently occurred in two unrelated cells
  first_occurrences <- c()
  row <- 1
  while (length(first_occurrences) < 2 && row <= nrow(snvs)) {
    snv <- SNV(snvs[row, "chr"], snvs[row, "chr_pos"],
               ref = snvs[row, "ref"], alt = snvs[row, "alt"])

    first_occurrences <- phylo_forest$get_first_occurrences(snv)
    row <- row + 1
  }

  # if the last handled SNV independently occurred in at least two
  # unrelated cells
  if (length(first_occurrences) >= 2) {

    # print a message containing the two cells
    paste0("SNV('", snv$get_chromosome(), "'',",
           snv$get_position_in_chromosome(), ",'", snv$get_ref(),
           "','", snv$get_alt(),
           "') independently arises in cells ", first_occurrences[1],
           " and ", first_occurrences[2])
  } else {
    print("Every mutation arises exclusively in one cell")
  }
}

# test whether the infinite sites conditions hold in the built forest
test_infinite_sites_model(phylo_forest)
#> [1] "Every mutation arises exclusively in one cell"
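The counting step of test_infinite_sites_model() can be tried in isolation on a small table. The sketch below assumes a hand-made data frame with the same column names used above; the cell identifiers, positions, alleles, and the class label are invented for illustration and do not come from rRACES. It also shows why counting alone is not enough: an SNV inherited by a descendant cell is counted more than once too, which is why the function above goes on to query the first occurrences.

```r
library(dplyr)

# hypothetical toy table mimicking the columns used above; every value
# here is invented and is not produced by rRACES
toy_mutations <- data.frame(
  cell_id = c(1, 2, 3, 3),
  chr     = c("22", "22", "22", "22"),
  chr_pos = c(100, 100, 250, 300),
  ref     = c("C", "C", "A", "G"),
  alt     = c("T", "T", "G", "A"),
  type    = "SNV",
  class   = "toy_class"
)

# keep the non-germinal SNVs that are reported more than once
repeated <- toy_mutations %>%
  filter(class != "germinal", type == "SNV") %>%
  count(chr, chr_pos, ref, alt) %>%
  filter(n > 1)

# only the SNV at position 100 (shared by cells 1 and 2) is retained;
# whether the two cells are unrelated must still be checked on the tree
repeated
```

A row in this table is only a candidate violation: the two cells may lie on the same lineage, in which case the infinite sites model still holds.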

This behavior can be changed through the mutation engine property MutationEngine$infinite_sites_model. This property is a Boolean flag that enables or disables the infinite sites model.

# establish whether the infinite sites model is used
m_engine$infinite_sites_model
#> [1] TRUE

# disable it
m_engine$infinite_sites_model <- FALSE

When the infinite sites model is disabled, MutationEngine$place_mutations() may place two mutations at the same locus on different alleles of the same genome, or the same mutation at the same locus on one allele in the genomes of two cells, neither of which is an ancestor of the other.
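The condition that neither cell is an ancestor of the other can be made concrete with a toy tree. The following is a minimal sketch under stated assumptions: the tree is stored as a named character vector mapping each cell to its parent, and the helpers is_ancestor() and unrelated() are hypothetical names introduced here. None of this reflects how rRACES represents a phylogenetic forest.

```r
# hypothetical toy tree: names are cell ids, values are parent ids
# (NA marks the root); this is not the rRACES forest representation
parent <- c("1" = NA, "2" = "1", "3" = "1", "4" = "2")

# walk from `cell` towards the root and report whether `anc` is met
is_ancestor <- function(parent, anc, cell) {
  current <- parent[[cell]]
  while (!is.na(current)) {
    if (current == anc) {
      return(TRUE)
    }
    current <- parent[[current]]
  }
  FALSE
}

# two cells are unrelated when neither is an ancestor of the other
unrelated <- function(parent, a, b) {
  !is_ancestor(parent, a, b) && !is_ancestor(parent, b, a)
}

# cells 3 and 4 sit on different branches
cells_3_4 <- unrelated(parent, "3", "4")

# cell 2 is the parent of cell 4, so they share a lineage
cells_2_4 <- unrelated(parent, "2", "4")
```

In this toy tree, the same SNV first appearing in cells 3 and 4 would violate the infinite sites model, whereas an SNV seen in cells 2 and 4 is compatible with plain inheritance.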

# test whether the infinite sites model is enabled
m_engine$infinite_sites_model
#> [1] FALSE

# place the mutations on the same samples forest used above
phylo_forest2 <- m_engine$place_mutations(samples_forest, 1000, 500)
#>  [████████████████████████████████████████] 100% [00m:04s] Mutations placed

# test whether the infinite sites conditions hold in the new forest
test_infinite_sites_model(phylo_forest2)
#> [1] "SNV('22'',16055119,'C','T') independently arises in cells 177596 and 189688"