
A mutation engine can label every node of a descendants forest with mutations and produce a consistent phylogenetic forest.

This function downloads and sets up the data required by a mutation engine. Finally, it builds the mutation engine itself.

Arguments

setup_code

The set-up code (alternative to directory).

directory

The set-up directory (alternative to setup_code).

reference_src

The reference genome path or URL (mandatory when directory is provided).

SBS_signatures_src

The SBS signature file path or URL (mandatory when directory is provided).

indel_signatures_src

The indel signature file path or URL (mandatory when directory is provided).

drivers_src

The driver mutation file path or URL (mandatory when directory is provided).

passenger_CNAs_src

The passenger CNAs file path or URL (mandatory when directory is provided).

germline_src

The germline directory path or URL (mandatory when directory is provided).

germline_subject

The germline subject (optional).

context_sampling

The number of reference contexts per context in the index (optional: default value is 100).

max_index_size

The maximum size of an admitted indel and, as a consequence, the maximum size of a motif stored in the repeated sequence index (optional: default value is 50).

max_repetition_storage

The maximum number of repetitions per type stored in the repeated sequence index (optional: default value is 500000).

tumour_type

The type of tumour. This is currently used to select the admissible passenger CNAs. If any passenger CNA in the dataset is admissible, use the empty string "" (optional: default value is "").

tumour_study

The nationality code of the tumour study. This is used to select the admissible passenger CNAs. If any tumour study in the dataset is admissible, use the empty string "" (optional: default value is "").

avoid_homozygous_losses

An optional Boolean flag to avoid homozygous losses. When set to TRUE, passenger CNAs are applied only to regions covered by at least two alleles (default: TRUE).

quiet

An optional Boolean flag to avoid the progress bar (default: FALSE).

Value

A mutation engine object.

Details

The mutations are randomly generated according to three factors:
- the mutational rates of the species involved in the descendants forest
- the genotypic characterization of the mutants involved in the descendants forest, i.e., the SNVs and CNAs characterizing the mutant genotypes
- the SBS signature coefficients active along the species simulation

These data are provided to a mutation engine through the methods MutationEngine$add_mutant() and MutationEngine$add_exposure().
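As a rough illustration of the third factor, the base R sketch below assigns each of a set of simulated mutations to one SBS signature, with probabilities given by a vector of signature coefficients. The coefficient values, the variable names, and the sampling scheme are assumptions made for illustration; this is not how rRACES places mutations internally.

```r
# Toy illustration only: NOT the rRACES implementation.
# `exposure` holds hypothetical SBS signature coefficients summing to 1.
exposure <- c(SBS1 = 0.8, SBS13 = 0.2)
stopifnot(isTRUE(all.equal(sum(exposure), 1)))

# assign each of 1000 simulated mutations to a signature, choosing the
# signature with probability equal to its coefficient
set.seed(42)
assigned <- sample(names(exposure), size = 1000, replace = TRUE,
                   prob = exposure)

# number of mutations attributed to each signature
signature_counts <- table(assigned)
print(signature_counts)
```

With these coefficients, most of the simulated mutations are attributed to SBS1 and the remainder to SBS13.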

The construction of a MutationEngine object requires a reference sequence and an SBS file, which are downloaded from the Internet. After the download, a context index of the reference sequence is built automatically. These processes may take time, depending on the size of the reference sequence. For this reason, the downloaded files and the context index are saved in a directory on disk, where they remain available for subsequent MutationEngine constructions.

There are two building modalities: the first is more general, but it requires specifying all the data sources; the second adopts pre-set configurations and is more convenient in many cases.

The first building modality requires specifying the directory in which the data must be saved, together with the path or URL of the reference sequence, the SBS signature file, the indel signature file, the driver SNVs file, the passenger CNAs file, and the germline data directory, through the parameters directory, reference_src, SBS_signatures_src, indel_signatures_src, drivers_src, passenger_CNAs_src, and germline_src, respectively.

The second building modality exclusively requires a set-up code (parameter setup_code). The list of supported set-up codes can be obtained by using the function get_mutation_engine_codes().
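The two modalities can be summarised as a small argument check. The sketch below is a hypothetical base R helper, check_engine_args(), which is not part of rRACES. It assumes that setup_code and directory are mutually exclusive and that every source listed in the Arguments section as mandatory must accompany directory; the actual checks performed by MutationEngine() may differ.

```r
# Hypothetical helper, NOT part of rRACES: reports which building
# modality a set of arguments corresponds to, or stops with an error.
check_engine_args <- function(setup_code = NULL, directory = NULL, ...) {
  sources <- list(...)
  required <- c("reference_src", "SBS_signatures_src",
                "indel_signatures_src", "drivers_src",
                "passenger_CNAs_src", "germline_src")

  # assumption: the two parameters are alternatives
  if (!is.null(setup_code) && !is.null(directory)) {
    stop("`setup_code` and `directory` are alternatives: provide only one")
  }
  if (!is.null(setup_code)) {
    return("setup_code")
  }
  if (is.null(directory)) {
    stop("either `setup_code` or `directory` must be provided")
  }

  # the directory modality needs every data source
  missing_srcs <- setdiff(required, names(sources))
  if (length(missing_srcs) > 0) {
    stop("missing data sources: ", paste(missing_srcs, collapse = ", "))
  }
  "directory"
}

# second modality: a set-up code is enough
modality <- check_engine_args(setup_code = "demo")

# first modality without the data sources: reported as an error
incomplete <- try(check_engine_args(directory = "Test"), silent = TRUE)
```

Here modality holds the string "setup_code", while incomplete is a try-error object listing the missing sources.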

The context sampling is an optional parameter that allows sampling the reference contexts while building the context index. This parameter, which is set to 100 by default, specifies how many occurrences of the same context must be identified before adding one of them to the context index. Since only one occurrence in every context_sampling is stored, the larger the context sampling, the smaller the context index and the lower the number of sites in the reference genome that can be affected by simulated mutations. Conversely, the lower the context sampling, the larger the context index and the number of sites that can be mutated.
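The sampling rule stated above (one stored occurrence for every context_sampling occurrences of the same context) can be mimicked on a toy sequence. The base R sketch below uses a hypothetical helper, build_toy_context_index(), and a random sequence; it is only a model of the rule as worded here, not the rRACES index builder.

```r
# Toy illustration only: NOT the rRACES implementation.
# `build_toy_context_index()` is a hypothetical helper that stores one
# site for every `context_sampling` occurrences of each trinucleotide.
build_toy_context_index <- function(sequence, context_sampling = 100) {
  starts <- seq_len(nchar(sequence) - 2)
  contexts <- substring(sequence, starts, starts + 2)

  # rank of each occurrence within its own context (1, 2, 3, ...)
  occurrence <- ave(seq_along(contexts), contexts, FUN = seq_along)

  # keep one occurrence every `context_sampling`
  keep <- occurrence %% context_sampling == 0

  # central positions of the stored contexts, grouped by context
  split(starts[keep] + 1, contexts[keep])
}

set.seed(1)
toy_reference <- paste(sample(c("A", "C", "G", "T"), 5000, replace = TRUE),
                       collapse = "")

index_10 <- build_toy_context_index(toy_reference, context_sampling = 10)
index_50 <- build_toy_context_index(toy_reference, context_sampling = 50)

# number of indexed sites for each sampling value
sites_10 <- sum(lengths(index_10))
sites_50 <- sum(lengths(index_50))
```

In this toy model, sites_10 is larger than sites_50: a smaller sampling value stores more sites, so more positions are available to simulated mutations, at the cost of a larger index.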

If the parameters of a mutation engine construction match those of a previous construction, then the corresponding reference sequence, the SBS file, and the previously built context index are loaded from the set-up directory avoiding further computations.
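The reuse of previously built data can be pictured as a parameter-keyed cache on disk. The following base R sketch is a hypothetical model: the helper build_or_load(), the file names, and the comparison with identical() are all assumptions, and the way rRACES actually stores and matches its set-up data may differ.

```r
# Hypothetical caching sketch, NOT the rRACES implementation.
# An index is rebuilt only when the stored parameters differ.
build_or_load <- function(directory, params, build_fun) {
  dir.create(directory, showWarnings = FALSE, recursive = TRUE)
  params_file <- file.path(directory, "params.rds")
  index_file <- file.path(directory, "index.rds")

  # matching parameters: load the saved index and skip the computation
  if (file.exists(params_file) && file.exists(index_file) &&
      identical(readRDS(params_file), params)) {
    return(list(index = readRDS(index_file), rebuilt = FALSE))
  }

  # otherwise build the index and save it for later constructions
  index <- build_fun(params)
  saveRDS(params, params_file)
  saveRDS(index, index_file)
  list(index = index, rebuilt = TRUE)
}

cache_dir <- file.path(tempdir(), "toy_engine_cache")
unlink(cache_dir, recursive = TRUE)

# stand-in for an expensive index construction
toy_build <- function(params) seq_len(params$context_sampling)

first  <- build_or_load(cache_dir, list(context_sampling = 100), toy_build)
second <- build_or_load(cache_dir, list(context_sampling = 100), toy_build)
third  <- build_or_load(cache_dir, list(context_sampling = 50), toy_build)

unlink(cache_dir, recursive = TRUE)
```

The first and third calls build the toy index, while the second call finds matching parameters and only loads it.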

See also

get_mutation_engine_codes() provides a list of the supported set-up codes.

MutationEngine$get_germline_subjects() to get the available germline subjects; MutationEngine$set_germline_subject() to set the active germline subject; MutationEngine$get_active_germline() to get the active germline subject.

Examples

# set the reference and SBS URLs
reference_url <- paste0("https://ftp.ensembl.org/pub/grch37/release-111/",
                        "fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.",
                        "dna.chromosome.22.fa.gz")
sbs_url <- paste0("https://cancer.sanger.ac.uk/signatures/documents/",
                  "2123/COSMIC_v3.4_SBS_GRCh37.txt")
indel_url <- paste0("https://cancer.sanger.ac.uk/signatures/documents/",
                    "2121/COSMIC_v3.4_ID_GRCh37.txt")
drivers_url <- paste0("https://raw.githubusercontent.com/",
                      "caravagnalab/rRACES/main/inst/extdata/",
                      "driver_mutations_hg19.csv")
passenger_CNAs_url <- paste0("https://raw.githubusercontent.com/",
                             "caravagnalab/rRACES/main/inst/extdata/",
                             "passenger_CNAs_hg19.csv")
germline_url <- paste0("https://zenodo.org/records/13166780/files/",
                       "germline_data_demo.tar.gz")

# build a mutation engine and save the required files into the
# directory "Test". The `drivers_src` parameter is optional, but
# providing it is suggested to avoid passenger mutations on driver loci.
m_engine <- MutationEngine(directory = "Test",
                           reference_src = reference_url,
                           SBS_signatures_src = sbs_url,
                           indel_signatures_src = indel_url,
                           drivers_src = drivers_url,
                           passenger_CNAs_src = passenger_CNAs_url,
                           germline_src = germline_url)
#> Downloading reference genome...
#> Reference genome downloaded
#> Decompressing reference file...done
#> Downloading SBS file...
#> SBS file downloaded
#> Downloading indel file...
#> indel file downloaded
#> Downloading driver mutation file...
#> Driver mutation file downloaded
#> Downloading passenger CNAs file...
#> Passenger CNAs file downloaded
#> Downloading germline mutations...
#> Germline mutations downloaded
#> Building context index...
#> Context index built
#> Context index saved
#> done
#> Building repeated sequence index...
#> RS index built
#> RS index saved
#> done
#> Germline loaded
#> Germline saved


# if the parameters of a mutation engine construction match those of a
# previous construction, then the corresponding reference sequence,
# the SBS file, and the previously built context index are loaded from
# the set-up directory avoiding further computations.
m_engine <- MutationEngine("Test", reference_url, sbs_url,
                           indel_url, drivers_url,
                           passenger_CNAs_url, germline_url)
#> Context index loaded
#> RS index loaded
#> Germline loaded

# if the `context_sampling` parameter changes, a new context index is
# built, while neither the reference sequence nor the SBS file is
# downloaded again.
m_engine <- MutationEngine("Test", reference_url, sbs_url,
                           indel_url, drivers_url,
                           passenger_CNAs_url, germline_url,
                           context_sampling = 50)
#> Building context index...
#> Context index built
#> Context index saved
#> done
#> RS index loaded
#> Germline loaded

# a further construction with the same parameters avoids both downloads
# and context index construction.
m_engine <- MutationEngine("Test", reference_url, sbs_url,
                           indel_url, drivers_url,
                           passenger_CNAs_url, germline_url,
                           context_sampling = 50)
#> Context index loaded
#> RS index loaded
#> Germline loaded

m_engine
#> MutationEngine
#>  Passenger rates
#> 
#>  Driver mutations
#> 
#>  Timed Exposure
#>    SBS Timed Exposures
#> 
#>    indel Timed Exposures
#> 

# the parameters `directory`, `reference_src`, `SBS_signatures_src`,
# `indel_signatures_src`, `drivers_src`, `passenger_CNAs_src`, and
# `germline_src` can be omitted by providing the `setup_code` parameter.
# The set-up code `demo` is among those available and is meant for
# testing purposes.
m_engine <- MutationEngine(setup_code = "demo")
#> Context index loaded
#> RS index loaded
#> Germline loaded

# the `context_sampling` parameter can also be used when a pre-defined
# set-up configuration is adopted.
m_engine <- MutationEngine(setup_code = "demo",
                           context_sampling = 50)
#> Building context index...
#> Context index built
#> Context index saved
#> done
#> RS index loaded
#> Germline loaded

m_engine
#> MutationEngine
#>  Passenger rates
#> 
#>  Driver mutations
#> 
#>  Timed Exposure
#>    SBS Timed Exposures
#> 
#>    indel Timed Exposures
#> 

# remove the "Test" and "demo" directories
unlink("Test", recursive = TRUE)
unlink("demo", recursive = TRUE)
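
A further, minimal sketch combines get_mutation_engine_codes() with the quiet flag described in the Arguments section. It assumes that the package is installed under the name rRACES (as the repository URLs above suggest) and that the "demo" set-up creates a directory named "demo", as in the clean-up step above; the engine_ready variable is introduced only for illustration. The guard makes the snippet a no-op when the package is not available.

```r
# Guarded so that the snippet does nothing when rRACES is not installed.
# `engine_ready` is a hypothetical flag used only in this sketch.
engine_ready <- FALSE
if (requireNamespace("rRACES", quietly = TRUE)) {
  library(rRACES)

  # list the supported set-up codes
  print(get_mutation_engine_codes())

  # build the demo engine without showing the progress bars
  m_engine <- MutationEngine(setup_code = "demo", quiet = TRUE)
  engine_ready <- TRUE

  # remove the "demo" directory
  unlink("demo", recursive = TRUE)
}
```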