This method simulates the sequencing of the samples in a phylogenetic forest.
Arguments
- phylo_forest
A phylogenetic forest.
- sequencer
The sequencer that performs the sequencing simulation (default: an ErrorlessIlluminaSequencer).
- reference_genome
The reference genome (default: NULL to use the mutation engine reference genome).
- chromosomes
The chromosomes that must be considered (default: NULL, i.e., all the reference chromosomes).
- coverage
The sequencing coverage (default: 10).
- read_size
The read size (default: 150).
- insert_size_mean
The insert size mean. Use 0 for single-read sequencing and any value greater than 0 for paired-read sequencing (default: 0).
- insert_size_stddev
The insert size standard deviation (default: 10).
- output_dir
The SAM output directory (default: "rRACES_SAM").
- write_SAM
A Boolean flag to enable/disable SAM generation (default: FALSE).
- update_SAM
Update the output directory (default: FALSE).
- cell_labelling
The labelling function for sampled cells (default: NULL). See vignette("sample_partition") for details.
- purity
The ratio between the number of sampled tumour cells and the total number of sampled cells, i.e., tumour and normal ones. This value must belong to the interval [0,1] (default: 1).
- with_normal_sample
A Boolean flag to enable/disable the analysis of a normal sample (default: TRUE).
- filename_prefix
The prefix of the output SAM file name (default: "chr_").
- template_name_prefix
The template name prefix (default: "r").
- include_non_sequenced_mutations
A Boolean flag to also include in the resulting data frame the mutations that are not covered by any of the simulated reads, but occur in at least one of the samples (default: FALSE).
- seed
The random seed for the internal random generator (optional).
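As a rough illustration of how the coverage, read_size, and purity arguments relate to quantities one may want to estimate beforehand, the base R sketch below computes the approximate number of reads needed for a target coverage and the purity of a sample from cell counts. The helper names and the example numbers are hypothetical, and the formulas are the standard textbook relations, which are not necessarily what the sequencer computes internally.

```r
# Hypothetical helper: approximate number of reads needed to reach a target
# mean coverage over a region (coverage * region length / read size).
expected_reads <- function(coverage, region_length, read_size) {
  ceiling(coverage * region_length / read_size)
}

# Hypothetical helper: purity as described above, i.e., the number of tumour
# cells divided by the number of all cells (tumour and normal).
sample_purity <- function(n_tumour, n_normal) {
  n_tumour / (n_tumour + n_normal)
}

# Made-up example values: a 1.5 Mbp region sequenced with the default
# coverage (10) and read size (150).
n_reads <- expected_reads(coverage = 10, region_length = 1.5e6, read_size = 150)

# Made-up example values: 800 tumour cells and 200 normal cells.
purity <- sample_purity(n_tumour = 800, n_normal = 200)
```

A purity computed this way always falls in [0,1], which matches the constraint on the purity argument.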
Value
A named list of two elements: the sequencing output data frame (name "mutations") and the calling parameters (name "parameters").
The sequencing output data frame reports, for each of the observed SNVs and indels, the chromosome and the position in which it occurs (columns chr and chr_pos), the reference and alternate sequences (columns ref and alt, respectively), and its cause and class (columns causes and classes, respectively).
Moreover, for each of the sequenced samples <sample name>, the returned data frame contains three columns: the number of reads in which the corresponding mutation occurs (column <sample name>.occurrences), the coverage of the mutation (column <sample name>.coverage), and the corresponding VAF (column <sample name>.VAF).
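The per-sample columns can be handled with ordinary data frame operations. The sketch below builds a mock data frame with the column layout described above and recomputes the VAF, assuming it is the ratio between the occurrences and the coverage. The sample name "S1", the values, and the 0.3 threshold are made up for illustration; they are not output of the simulation.

```r
# Mock data shaped like the "mutations" data frame described above; the
# sample name "S1" and all the values are invented for illustration.
mutations <- data.frame(
  chr = c("22", "22"),
  chr_pos = c(1000L, 2500L),
  ref = c("A", "G"),
  alt = c("T", "C"),
  S1.occurrences = c(3L, 5L),
  S1.coverage = c(12L, 10L),
  stringsAsFactors = FALSE
)

# Assumed definition: VAF = reads carrying the mutation / reads covering it.
mutations$S1.VAF <- mutations$S1.occurrences / mutations$S1.coverage

# Keep only the mutations whose VAF in S1 exceeds a hypothetical threshold.
high_vaf <- mutations[mutations$S1.VAF > 0.3, ]
```

With real output, the same column-name pattern applies to every sequenced sample, so the sample name can be pasted together with the ".occurrences", ".coverage", and ".VAF" suffixes to select the columns of interest.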
See also
BasicIlluminaSequencer and ErrorlessIlluminaSequencer as sequencer types, and vignette("sequencing") for usage examples.