Simulating the sequencing

This method simulates the sequencing of the samples in a phylogenetic forest.

Arguments

phylo_forest: A phylogenetic forest.
sequencer: The sequencer that performs the sequencing simulation (default: an ErrorlessIlluminaSequencer).
reference_genome: The reference genome (default: NULL to use the mutation engine reference genome).
chromosomes: The chromosomes that must be considered (default: NULL, i.e., all the reference chromosomes).
coverage: The sequencing coverage (default: 10).
read_size: The read size (default: 150).
insert_size_mean: The insert size mean. Use 0 for single-read sequencing and any value greater than 0 for paired-read sequencing (default: 0).
insert_size_stddev: The insert size standard deviation (default: 10).
output_dir: The SAM output directory (default: "rRACES_SAM").
write_SAM: A Boolean flag to enable/disable SAM generation (default: FALSE).
update_SAM: Update the output directory (default: FALSE).
cell_labelling: The labelling function for sampled cells. See vignette("sample_partition") for details (default: NULL).
purity: The ratio between the number of tumour cells in the sample and the number of all its cells, i.e., tumour and normal ones. This value must belong to the interval [0,1] (default: 1).
with_normal_sample: A Boolean flag to enable/disable the analysis of a normal sample (default: TRUE).
filename_prefix: The prefix of the output SAM file name (default: "chr_").
template_name_prefix: The template name prefix (default: "r").
include_non_sequenced_mutations: A Boolean flag to also include in the resulting data frame the mutations that are not covered by any of the simulated reads, but occur in at least one of the samples (default: FALSE).
seed: The random seed for the internal random generator (optional).
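
To give a feeling for what the purity argument implies, the sketch below computes the VAF one would expect for a clonal mutation when tumour and normal cells are mixed. It is not taken from the package: the function expected_vaf and its parameters (mutated_copies, tumour_ploidy, normal_ploidy) are hypothetical names, and the simple allele-mixing model is an assumption that ignores copy-number changes and sampling noise.

```r
# Hypothetical helper (not part of the package): expected VAF of a clonal
# mutation under a simple allele-mixing model, given the sample purity.
expected_vaf <- function(purity, mutated_copies = 1,
                         tumour_ploidy = 2, normal_ploidy = 2) {
  stopifnot(purity >= 0, purity <= 1)
  # Alleles carrying the mutation come only from tumour cells.
  mutated_alleles <- purity * mutated_copies
  # All alleles at the locus, from both tumour and normal cells.
  total_alleles <- purity * tumour_ploidy + (1 - purity) * normal_ploidy
  mutated_alleles / total_alleles
}

# Expected VAF of a heterozygous clonal mutation at three purity levels.
vaf_by_purity <- sapply(c(1, 0.8, 0.5), expected_vaf)
print(vaf_by_purity)
```

Under these assumptions, a heterozygous clonal mutation in a diploid region has an expected VAF of half the purity, so lowering purity shifts the VAFs of tumour mutations towards zero.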

Value

A named list of two elements: the sequencing output data frame (name "mutations") and the calling parameters (name "parameters").

The sequencing output data frame reports, for each of the observed SNVs and indels, the chromosome and the position in which it occurs (columns chr and chr_pos), the reference and alternate sequences (columns ref and alt, respectively), and its cause and class (columns causes and classes, respectively). Moreover, for each of the sequenced samples <sample name>, the returned data frame contains three columns: the number of reads in which the corresponding mutation occurs (column <sample name>.occurrences), the coverage of the mutation (column <sample name>.coverage), and the corresponding VAF (column <sample name>.VAF).
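
The column layout described above can be explored with a small mock-up. The sketch below does not call the package: it builds a list shaped like the documented return value by hand, with invented values and a hypothetical sample name S1. It also assumes that the VAF is the number of occurrences divided by the coverage, which is the usual definition but is not stated explicitly here.

```r
# Mock-up of the documented return value; every value below is invented.
seq_results <- list(
  mutations = data.frame(
    chr = c("22", "22", "22"),
    chr_pos = c(1000L, 2000L, 3000L),
    ref = c("G", "A", "C"),
    alt = c("A", "AT", "T"),
    causes = c("cause_A", "cause_B", "cause_A"),    # placeholder labels
    classes = c("class_X", "class_Y", "class_X"),   # placeholder labels
    S1.occurrences = c(4L, 0L, 6L),
    S1.coverage = c(10L, 8L, 12L),
    stringsAsFactors = FALSE
  ),
  parameters = list(coverage = 10, read_size = 150)
)

mutations <- seq_results$mutations

# Assumption: VAF = occurrences / coverage for each sample.
mutations$S1.VAF <- mutations$S1.occurrences / mutations$S1.coverage

# Hypothetical helper: names of the three per-sample columns.
sample_columns <- function(sample_name) {
  paste0(sample_name, c(".occurrences", ".coverage", ".VAF"))
}

# Keep the mutations supported by at least one read in sample S1.
observed <- mutations[mutations$S1.occurrences > 0,
                      c("chr", "chr_pos", "ref", "alt", sample_columns("S1"))]
print(observed)
```

Rows with zero occurrences, like the second one in this mock-up, illustrate the kind of entry a mutation that occurs in a sample but is not found in any read would produce.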
