
This method simulates the sequencing of a wild-type sample in a phylogenetic forest. All the cells in the wild-type sample contain the germline mutations. By default, the forest pre-neoplastic mutations are also added to the sample; they can be excluded by setting the parameter with_preneoplastic to FALSE.

Arguments

phylo_forest

A phylogenetic forest.

sequencer

The sequencer that performs the sequencing simulation (default: an ErrorlessIlluminaSequencer).

reference_genome

The reference genome (default: NULL to use the mutation engine reference genome).

chromosomes

The chromosomes that must be considered (default: NULL, i.e., all the reference chromosomes).

coverage

The sequencing coverage (default: 10).

read_size

The read size (default: 150).

insert_size_mean

The insert size mean. Use 0 for single-read sequencing and any value greater than 0 for paired-read sequencing (default: 0).

insert_size_stddev

The insert size standard deviation (default: 10).

output_dir

The SAM output directory (default: "rRACES_normal_SAM").

write_SAM

A Boolean flag to enable/disable SAM generation (default: TRUE).

update_SAM

Update the output directory (default: FALSE).

with_preneoplastic

A Boolean flag to add the forest pre-neoplastic mutations to the sample cells (default: TRUE).

filename_prefix

The prefix of the output SAM file name (default: "chr_").

template_name_prefix

The template name prefix (default: "r").

include_non_sequenced_mutations

A Boolean flag to also include in the resulting data frame the mutations that are not covered by any of the simulated reads, but occur in at least one of the samples (default: FALSE).

seed

The random seed for the internal random generator (optional).
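The arguments above can be collected into a named list before the call. The sketch below is illustrative only: normal_seq_args is a hypothetical helper, not part of the package, and the function name simulate_normal_seq in the final comment is an assumption about how this method is exported. The values shown override the documented defaults to request paired-read sequencing; the insert size of 350 is an arbitrary choice.

```r
# Hypothetical helper (not part of the package): collects a subset of the
# arguments documented above into a named list, using the documented defaults.
normal_seq_args <- function(coverage = 10, read_size = 150,
                            insert_size_mean = 0, insert_size_stddev = 10,
                            with_preneoplastic = TRUE, write_SAM = TRUE,
                            seed = NULL) {
  stopifnot(coverage > 0, read_size > 0,
            insert_size_mean >= 0, insert_size_stddev >= 0)
  args <- list(coverage = coverage, read_size = read_size,
               insert_size_mean = insert_size_mean,
               insert_size_stddev = insert_size_stddev,
               with_preneoplastic = with_preneoplastic,
               write_SAM = write_SAM)
  # The seed is optional: pass it only when the caller supplies one.
  if (!is.null(seed)) args$seed <- seed
  args
}

# Paired-read sequencing (insert_size_mean > 0) at coverage 30, without
# pre-neoplastic mutations and without writing SAM files.
paired_args <- normal_seq_args(coverage = 30, insert_size_mean = 350,
                               with_preneoplastic = FALSE,
                               write_SAM = FALSE, seed = 42)

# With a phylogenetic forest `forest` at hand, the call might look like this
# (function name assumed, not confirmed by this page):
# res <- do.call(simulate_normal_seq, c(list(forest), paired_args))
```

Leaving insert_size_mean at its default of 0 in the same helper would describe a single-read run instead.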

Value

A named list of two elements: the sequencing output data frame (name "mutations") and the calling parameters (name "parameters").

The sequencing output data frame reports, for each of the observed SNVs and indels, the chromosome and the position in which it occurs (columns chr and chr_pos), the reference base, the alternate base, the causes, and the classes of the mutation (columns ref_base, alt_base, causes, and classes, respectively). Moreover, for the sequenced sample, named normal_sample, the returned data frame contains three columns: the number of reads in which the corresponding mutation occurs (column normal_sample.occurrences), the coverage of the mutation locus (column normal_sample.coverage), and the corresponding VAF (column normal_sample.VAF).
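The column layout described above can be explored on a mock table. The sketch below builds a small data frame by hand with invented values and only a subset of the documented columns (causes and classes are omitted). It assumes that the VAF is the ratio of occurrences to coverage, which is the usual definition but is not stated explicitly on this page.

```r
# Mock data mimicking the documented "mutations" data frame (values invented).
mutations <- data.frame(
  chr = c("1", "1", "2"),
  chr_pos = c(1050L, 20480L, 731L),
  ref_base = c("A", "G", "T"),
  alt_base = c("C", "T", "G"),
  normal_sample.occurrences = c(5L, 0L, 12L),
  normal_sample.coverage = c(10L, 8L, 12L),
  stringsAsFactors = FALSE
)

# Assumed definition: VAF = occurrences / coverage (NA when coverage is 0).
locus_cov <- mutations$normal_sample.coverage
mutations$normal_sample.VAF <- ifelse(
  locus_cov > 0,
  mutations$normal_sample.occurrences / locus_cov,
  NA_real_
)

# Keep only the mutations supported by at least one read.
observed <- mutations[mutations$normal_sample.occurrences > 0, ]
```

On a real result, the same column names would be accessed through the "mutations" element of the returned list.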

See also

BasicIlluminaSequencer and ErrorlessIlluminaSequencer as sequencer types, and vignette("sequencing") for usage examples.