Creates a CNAqc object. — init • CNAqc

Creates a CNAqc object from a set of mutations (SNVs or indels), allele-specific copy numbers and a tumour purity value. The resulting object retains the input mutations that map on top of the copy number segments, and allows for the computation of the QC metrics available in the CNAqc package.

Genomic coordinates in relative (per-chromosome) format are transformed into absolute coordinates using a reference genome that provides the length of each chromosome. CNAqc supports `hg19`/`GRCh37` and `hg38`/`GRCh38` references, which are embedded into the package as `CNAqc::chr_coordinates_hg19` and `CNAqc::chr_coordinates_GRCh38`. An arbitrary reference can also be provided if it is stored in an equivalent format.
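The relative-to-absolute transformation can be illustrated with a minimal base R sketch. The chromosome lengths below are toy values, and the `chr_lengths` table with its `chr`, `length` and `offset` columns is hypothetical: it is not the format of the embedded `CNAqc::chr_coordinates_*` objects, and the package's internal implementation may differ.

```r
# Hypothetical chromosome lengths (toy values, not a real assembly)
chr_lengths = data.frame(
  chr = c("chr1", "chr2", "chr3"),
  length = c(1000, 800, 600),
  stringsAsFactors = FALSE
)

# Offset of each chromosome: total length of the chromosomes preceding it
chr_lengths$offset = cumsum(c(0, head(chr_lengths$length, -1)))

# Toy mutations in relative (per-chromosome) coordinates
muts = data.frame(
  chr = c("chr1", "chr2", "chr3"),
  from = c(10, 10, 10),
  stringsAsFactors = FALSE
)

# Absolute coordinate = relative coordinate + offset of its chromosome
idx = match(muts$chr, chr_lengths$chr)
muts$abs_from = muts$from + chr_lengths$offset[idx]

print(muts)
```

With absolute coordinates, positions on different chromosomes lie on a single genome-wide axis, which is what genome-wide plots need.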

init(
  mutations,
  snvs = NULL,
  cna,
  purity,
  sample = "MySample",
  ref = "GRCh38",
  genome_coords = NULL
)

Arguments

mutations

A dataframe of mutations with the following fields:

* `chr` chromosome name, e.g., "chr3", "chr8", "chrX", ...;
* `from` where the mutation starts, an integer number;
* `to` where the mutation ends, an integer number;
* `ref` reference allele, e.g., "A", "ACC", "AGA", ...;
* `alt` alternative allele, e.g., "A", "ACC", "AGA", ...;
* `DP` sequencing depth at the locus, an integer number;
* `NV` number of reads with the variant at the locus, an integer number;
* `VAF` variant allele frequency (VAF), defined as `NV/DP`, at the locus, a real number in [0,1].

Optionally, driver mutations can be annotated. In this case the input dataframe needs to report:

* `is_driver` a boolean flag for the driver status;
* `driver_label` the driver label that will appear in each plot, e.g., `BRAF V600E`.
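As a minimal sketch of the expected input, the fields above can be assembled in base R as follows. The three rows reuse values printed for `example_dataset_CNAqc` in the Examples section; using `NA` as the `driver_label` of non-driver mutations is an assumption, not something this page specifies.

```r
# Three mutations, with values taken from the example dataset output
mutations = data.frame(
  chr = c("chr1", "chr1", "chr17"),
  from = c(1027104L, 2248588L, 7577106L),
  to = c(1027105L, 2248589L, 7577107L),
  ref = c("T", "A", "G"),
  alt = c("G", "C", "C"),
  DP = c(60L, 127L, 84L),   # sequencing depth
  NV = c(6L, 9L, 78L),      # reads carrying the variant
  stringsAsFactors = FALSE
)

# VAF is defined as NV / DP, so it always lies in [0, 1]
mutations$VAF = mutations$NV / mutations$DP

# Optional driver annotation (NA label for non-drivers is an assumption)
mutations$is_driver = c(FALSE, FALSE, TRUE)
mutations$driver_label = c(NA, NA, "TP53")

print(mutations)
```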

snvs

Deprecated parameter.

cna

A dataframe of allele-specific copy number with the following fields:

* `chr` chromosome name, e.g., "chr3", "chr8", "chrX", ...
* `from` where the segment starts, an integer number
* `to` where the segment ends, an integer number
* `Major` the number of copies of the major allele (or A-allele), an integer number
* `minor` the number of copies of the minor allele (or B-allele), an integer number
* `CCF` an optional cancer cell fraction (CCF) column distinguishing clonal and subclonal segments, a real number in [0,1]
* `Major_2` optional, the number of copies of the major allele (or A-allele) in the second clone, if present, an integer number
* `minor_2` optional, the number of copies of the minor allele (or B-allele) in the second clone, if present, an integer number

If the `CCF` value is present and equal to 1, a segment is considered clonal, otherwise subclonal. If a segment is subclonal:

* the columns `Major` and `minor` are interpreted as those for a subclone with proportion equal to the `CCF` value;
* the columns `Major_2` and `minor_2` are interpreted as those for a second subclone with proportion equal to `1 - CCF`.
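The clonal/subclonal rule above can be sketched on a toy table in base R. The two segments are hypothetical, as are the helper columns `type`, `prop_1` and `prop_2`, which are not part of the CNAqc input format. Setting `Major_2` and `minor_2` to `NA` for the clonal segment is also an assumption.

```r
# Two hypothetical segments: one clonal (CCF = 1), one subclonal (CCF = 0.7)
cna = data.frame(
  chr = c("chr1", "chr2"),
  from = c(1L, 1L),
  to = c(1000000L, 2000000L),
  Major = c(2L, 2L),
  minor = c(1L, 1L),
  CCF = c(1, 0.7),
  Major_2 = c(NA, 1L),  # second subclone, only meaningful when CCF < 1
  minor_2 = c(NA, 1L),
  stringsAsFactors = FALSE
)

# A segment is clonal when CCF equals 1, subclonal otherwise
cna$type = ifelse(cna$CCF == 1, "clonal", "subclonal")

# Subclone proportions: CCF for Major/minor, 1 - CCF for Major_2/minor_2
cna$prop_1 = ifelse(cna$type == "subclonal", cna$CCF, NA)
cna$prop_2 = ifelse(cna$type == "subclonal", 1 - cna$CCF, NA)

print(cna)
```

Here the `chr2` segment is read as a 2:1 subclone at proportion 0.7 mixed with a 1:1 subclone at proportion 0.3.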

purity

A value between `0` and `1` representing the proportion of actual tumour content (sometimes called "cellularity").

sample

Sample name (a string).

ref

A keyword for the reference coordinate system in use. CNAqc supports `hg19`/`GRCh37` and `hg38`/`GRCh38` references, which are embedded into the package as `CNAqc::chr_coordinates_hg19` and `CNAqc::chr_coordinates_GRCh38`. An arbitrary reference can also be provided if `genome_coords` is a dataframe in the same format as `CNAqc::chr_coordinates_hg19` or `CNAqc::chr_coordinates_GRCh38`. The default reference is `GRCh38`.

genome_coords

A dataframe with the absolute genomic coordinates of a custom reference genome, in the same format as `CNAqc::chr_coordinates_hg19` or `CNAqc::chr_coordinates_GRCh38`. The default is `NULL`.
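Since `ref` defaults to `GRCh38`, data called against another assembly may need the reference passed explicitly. The sketch below assumes the CNAqc package is installed. It reuses the `reference` entry of the example dataset, which the Examples section prints as "hg19"; the object name `x_hg19` is illustrative only.

```r
# Run only if CNAqc is available (assumption: the package is installed)
if (requireNamespace("CNAqc", quietly = TRUE)) {
  data('example_dataset_CNAqc', package = 'CNAqc')

  # Pass the reference stored alongside the example data
  # instead of relying on the default `ref = "GRCh38"`
  x_hg19 = CNAqc::init(
    mutations = example_dataset_CNAqc$mutations,
    cna = example_dataset_CNAqc$cna,
    purity = example_dataset_CNAqc$purity,
    ref = example_dataset_CNAqc$reference
  )
}
```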

Value

A CNAqc object of class `cnaqc`, with S3 methods for printing, plotting and analyzing data.

Examples

# Example input data released with the package
data('example_dataset_CNAqc', package = 'CNAqc')
print(example_dataset_CNAqc)
#> $mutations
#> # A tibble: 12,963 × 13
#>    chr      from      to ref   alt   FILTER    DP    NV    VAF ANNOVAR_FUNCTION
#>    <chr>   <dbl>   <dbl> <chr> <chr> <chr>  <dbl> <dbl>  <dbl> <chr>           
#>  1 chr1  1027104 1027105 T     G     PASS      60     6 0.1    UTR5            
#>  2 chr1  2248588 2248589 A     C     PASS     127     9 0.0709 intergenic      
#>  3 chr1  2461999 2462000 G     A     PASS     156    65 0.417  upstream        
#>  4 chr1  2727935 2727936 T     C     PASS     180    90 0.5    downstream      
#>  5 chr1  2763397 2763398 C     T     PASS     183    61 0.333  intergenic      
#>  6 chr1  2768208 2768209 C     T     PASS     203   130 0.640  intergenic      
#>  7 chr1  2935590 2935591 C     T     PASS     228   132 0.579  intergenic      
#>  8 chr1  2980032 2980033 C     T     PASS     196    85 0.434  ncRNA_exonic    
#>  9 chr1  3387161 3387162 T     G     PASS     124     6 0.0484 intronic        
#> 10 chr1  3502517 3502518 G     A     PASS      88    10 0.114  intronic        
#> # ℹ 12,953 more rows
#> # ℹ 3 more variables: GENE <chr>, is_driver <lgl>, driver_label <chr>
#> 
#> $cna
#> # A tibble: 267 × 7
#>    chr       from       to  length covRatio Major minor
#>    <chr>    <int>    <int>   <int>    <dbl> <dbl> <dbl>
#>  1 chr1    840009  1689987  849979     1.19     3     2
#>  2 chr1   1689988  1815015  125028     1.26     3     2
#>  3 chr1   1815016  9799969 7984954     1.19     3     2
#>  4 chr1  10479910 12079917 1600008     1.19     3     2
#>  5 chr1  12079917 12154980   75064     1.24     3     2
#>  6 chr1  12154981 12839977  684997     1.19     3     2
#>  7 chr1  13780016 17790026 4010011     1.19     3     2
#>  8 chr1  17849962 21080067 3230106     1.19     3     2
#>  9 chr1  21080068 21559998  479931     1.26     3     2
#> 10 chr1  21559998 24830001 3270004     1.19     3     2
#> # ℹ 257 more rows
#> 
#> $purity
#> [1] 0.89
#> 
#> $reference
#> [1] "hg19"
#> 

# Note the outputs to screen
x = init(
  mutations = example_dataset_CNAqc$mutations,
  cna = example_dataset_CNAqc$cna,
  purity = example_dataset_CNAqc$purity
)
#> 
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#> 
#> ℹ Using reference genome coordinates for: GRCh38.
#> ✔ Found annotated driver mutations: TTN, CTCF, and TP53.
#> ✔ Fortified calls for 12963 somatic mutations: 12963 SNVs (100%) and 0 indels.
#> ! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
#> ✔ Fortified CNAs for 267 segments: 267 clonal and 0 subclonal.
#> ✔ 12963 mutations mapped to clonal CNAs.

# An S3 method can be used to report to screen what is in the object
print(x)
#> ── [ CNAqc ] MySample 12963 mutations in 267 segments (267 clonal, 0 subclonal).
#> 
#> ── Clonal CNAs 
#> 
#>    2:2  [n = 7478, L = 1483 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■  { CTCF }
#>    4:2  [n = 1893, L =  331 Mb] ■■■■■■■
#>    3:2  [n = 1625, L =  357 Mb] ■■■■■■
#>    2:1  [n = 1563, L =  420 Mb] ■■■■■■  { TTN }
#>    3:0  [n =  312, L =  137 Mb] ■
#>    2:0  [n =   81, L =   39 Mb]   { TP53 }
#>   16:2  [n =    4, L =    0 Mb] 
#>   25:2  [n =    2, L =    1 Mb] 
#>    3:1  [n =    2, L =    1 Mb] 
#>  106:1  [n =    1, L =    0 Mb] 
#> 
#> ℹ Sample Purity: 89% ~ Ploidy: 4.
#> 
#> ℹ There are 3 annotated driver(s) mapped to clonal CNAs.
#>          chr      from        to ref alt  DP NV       VAF driver_label is_driver
#>         chr2 179431633 179431634   C   T 117 77 0.6581197          TTN      TRUE
#>        chr16  67646006  67646007   C   T 120 54 0.4500000         CTCF      TRUE
#>        chr17   7577106   7577107   G   C  84 78 0.9285714         TP53      TRUE