Creates a CNAqc object from a set of mutations (SNVs or indels), allele-specific copy numbers and a tumour purity value. The resulting object retains the input mutations that map on top of the copy number segments, and allows for the computation of the QC metrics available in the CNAqc package.
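As a rough illustration of what "mapping mutations on top of copy number segments" can mean, the sketch below keeps only the mutations that fall inside a segment on the same chromosome. It uses toy data and base R only. The containment rule (the whole mutation must lie within the segment) and the helper `map_one` are assumptions made for this example, not necessarily the rule CNAqc applies internally.

```r
# Toy mutations and segments (invented values, relative coordinates)
mutations <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  from = c(100, 700, 50),
  to = c(101, 701, 51),
  stringsAsFactors = FALSE
)

cna <- data.frame(
  chr = c("chr1", "chr2"),
  from = c(1, 1),
  to = c(500, 200),
  Major = c(2, 1),
  minor = c(1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: index of the first segment that fully contains
# mutation i, or NA if no segment does
map_one <- function(i) {
  hit <- which(
    cna$chr == mutations$chr[i] &
      cna$from <= mutations$from[i] &
      cna$to >= mutations$to[i]
  )
  if (length(hit) == 0) NA_integer_ else hit[1]
}

mutations$segment <- vapply(seq_len(nrow(mutations)), map_one, integer(1))

# Retain only the mutations that map to a segment
mapped <- mutations[!is.na(mutations$segment), ]
```

In this toy case the second mutation (position 700 on `chr1`) lies outside every segment and is dropped, so `mapped` has two rows.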
Genomic coordinates in relative (per-chromosome) format are transformed into absolute coordinates using a reference genome that provides the length of each chromosome. CNAqc supports `hg19`/`GRCh37` and `hg38`/`GRCh38` references, which are embedded into the package as `CNAqc::chr_coordinates_hg19` and `CNAqc::chr_coordinates_GRCh38`. An arbitrary reference can also be provided if it is stored in an equivalent format.
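The relative-to-absolute transformation can be sketched as adding, to each position, the total length of the chromosomes that precede it. The example below is a minimal illustration in base R. The `reference` table, its `length` and `offset` columns, and the chromosome lengths are hypothetical toy values, not the layout or content of the coordinate tables shipped with CNAqc.

```r
# Hypothetical reference table: chromosome names and lengths
# (toy values, not real genome sizes)
reference <- data.frame(
  chr = c("chr1", "chr2", "chr3"),
  length = c(1000, 800, 600),
  stringsAsFactors = FALSE
)

# Offset of each chromosome = summed length of all chromosomes before it
reference$offset <- cumsum(c(0, head(reference$length, -1)))

# Toy mutations in relative (per-chromosome) coordinates
mutations <- data.frame(
  chr = c("chr1", "chr2", "chr3"),
  from = c(10, 5, 100),
  to = c(11, 6, 101),
  stringsAsFactors = FALSE
)

# Look up the offset of each mutation's chromosome and shift the coordinates
idx <- match(mutations$chr, reference$chr)
mutations$abs_from <- mutations$from + reference$offset[idx]
mutations$abs_to <- mutations$to + reference$offset[idx]
```

With these toy lengths the offsets are 0, 1000 and 1800, so position 5 on `chr2` becomes absolute position 1005.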
init(mutations, snvs = NULL, cna, purity, sample = "MySample", ref = "GRCh38")
A dataframe of mutations with the following fields:
* `chr` chromosome name, e.g., "chr3", "chr8", "chrX", ...;
* `from` where the mutation starts, an integer number;
* `to` where the mutation ends, an integer number;
* `ref` reference allele, e.g., "A", "ACC", "AGA", ...;
* `alt` alternative allele, e.g., "A", "ACC", "AGA", ...;
* `DP` sequencing depth at the locus, an integer number;
* `NV` number of reads with the variant at the locus, an integer number;
* `VAF` variant allele frequency (VAF), defined as `NV/DP`, at the locus, a real number in [0,1].
Optionally, driver mutations can be annotated. In this case the input dataframe needs to report:
* `is_driver` a boolean flag for the driver status;
* `driver_label` the driver label that will appear in each plot, e.g., `BRAF V600E`.
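A mutations dataframe with the fields listed above could be assembled as in the following sketch. All values are invented for illustration. The check at the end is only a convenience written for this example and is not part of the CNAqc API.

```r
# Toy mutation calls (invented values)
mutations <- data.frame(
  chr = c("chr1", "chr7", "chr17"),
  from = c(1000, 2000, 3000),
  to = c(1001, 2001, 3001),
  ref = c("T", "A", "G"),
  alt = c("G", "T", "C"),
  DP = c(60, 100, 84),   # sequencing depth at the locus
  NV = c(6, 45, 78),     # reads carrying the variant
  stringsAsFactors = FALSE
)

# VAF is defined as NV / DP, a real number in [0, 1]
mutations$VAF <- mutations$NV / mutations$DP

# Optional driver annotation
mutations$is_driver <- c(FALSE, TRUE, TRUE)
mutations$driver_label <- c(NA, "BRAF V600E", "TP53")

# Convenience check: required columns exist and VAF is in range
required <- c("chr", "from", "to", "ref", "alt", "DP", "NV", "VAF")
stopifnot(all(required %in% colnames(mutations)))
stopifnot(all(mutations$VAF >= 0 & mutations$VAF <= 1))
```

A dataframe built this way has the shape expected for the `mutations` argument.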
Deprecated parameter.
A dataframe of allele-specific copy number with the following fields:
* `chr` chromosome name, e.g., "chr3", "chr8", "chrX", ...
* `from` where the segment starts, an integer number
* `to` where the segment ends, an integer number
* `Major` number of copies of the major allele (or A-allele), an integer number
* `minor` number of copies of the minor allele (or B-allele), an integer number
* `CCF` an optional cancer cell fraction (CCF) column distinguishing clonal and subclonal segments, a real number in [0,1]
* `Major_2` optional, number of copies of the major allele (or A-allele) in the second clone, if present, an integer number
* `minor_2` optional, number of copies of the minor allele (or B-allele) in the second clone, if present, an integer number
If the `CCF` value is present and equal to 1, a segment is considered clonal, otherwise subclonal. If a segment is subclonal:
* the columns `Major` and `minor` are interpreted as those for a subclone with proportion equal to the `CCF` value;
* the columns `Major_2` and `minor_2` are interpreted as those for a second subclone with proportion equal to the `1 - CCF` value.
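The clonal/subclonal rule above can be illustrated with a small copy number table. The segments are invented, and the derived columns `clonality`, `prop_1` and `prop_2` are hypothetical names used only in this sketch; they are not claimed to match the columns CNAqc creates.

```r
# Toy allele-specific copy number segments (invented values)
cna <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  from = c(1, 5001, 1),
  to = c(5000, 9000, 8000),
  Major = c(2, 2, 1),
  minor = c(1, 0, 1),
  CCF = c(1, 0.6, 1),
  Major_2 = c(NA, 2, NA),   # only defined for the subclonal segment
  minor_2 = c(NA, 1, NA),
  stringsAsFactors = FALSE
)

# A segment is clonal when CCF equals 1, subclonal otherwise
cna$clonality <- ifelse(cna$CCF == 1, "clonal", "subclonal")

# For subclonal segments: Major/minor describe a subclone with proportion CCF,
# Major_2/minor_2 a second subclone with proportion 1 - CCF
is_subclonal <- cna$clonality == "subclonal"
cna$prop_1 <- ifelse(is_subclonal, cna$CCF, NA)
cna$prop_2 <- ifelse(is_subclonal, 1 - cna$CCF, NA)
```

Here the second segment is subclonal: a 2:0 subclone at proportion 0.6 and a 2:1 subclone at proportion 0.4.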
A value between `0` and `1` representing the proportion of actual tumour content (sometimes called "cellularity").
Sample name (a string).
A keyword for the reference coordinate system to use. CNAqc supports `hg19`/`GRCh37` and `hg38`/`GRCh38` references, which are embedded into the package as `CNAqc::chr_coordinates_hg19` and `CNAqc::chr_coordinates_GRCh38`. An arbitrary reference can also be provided if `ref` is a dataframe in the same format as `CNAqc::chr_coordinates_hg19` or `CNAqc::chr_coordinates_GRCh38`. The default reference is `GRCh38`.
A CNAqc object of class `cnaqc`, with S3 methods for printing, plotting and analyzing data.
# Example input data released with the package
data('example_dataset_CNAqc', package = 'CNAqc')
print(example_dataset_CNAqc)
#> $mutations
#> # A tibble: 12,963 × 13
#> chr from to ref alt FILTER DP NV VAF ANNOVAR_FUNCTION
#> <chr> <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 chr1 1027104 1027105 T G PASS 60 6 0.1 UTR5
#> 2 chr1 2248588 2248589 A C PASS 127 9 0.0709 intergenic
#> 3 chr1 2461999 2462000 G A PASS 156 65 0.417 upstream
#> 4 chr1 2727935 2727936 T C PASS 180 90 0.5 downstream
#> 5 chr1 2763397 2763398 C T PASS 183 61 0.333 intergenic
#> 6 chr1 2768208 2768209 C T PASS 203 130 0.640 intergenic
#> 7 chr1 2935590 2935591 C T PASS 228 132 0.579 intergenic
#> 8 chr1 2980032 2980033 C T PASS 196 85 0.434 ncRNA_exonic
#> 9 chr1 3387161 3387162 T G PASS 124 6 0.0484 intronic
#> 10 chr1 3502517 3502518 G A PASS 88 10 0.114 intronic
#> # ℹ 12,953 more rows
#> # ℹ 3 more variables: GENE <chr>, is_driver <lgl>, driver_label <chr>
#>
#> $cna
#> # A tibble: 267 × 7
#> chr from to length covRatio Major minor
#> <chr> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 chr1 840009 1689987 849979 1.19 3 2
#> 2 chr1 1689988 1815015 125028 1.26 3 2
#> 3 chr1 1815016 9799969 7984954 1.19 3 2
#> 4 chr1 10479910 12079917 1600008 1.19 3 2
#> 5 chr1 12079917 12154980 75064 1.24 3 2
#> 6 chr1 12154981 12839977 684997 1.19 3 2
#> 7 chr1 13780016 17790026 4010011 1.19 3 2
#> 8 chr1 17849962 21080067 3230106 1.19 3 2
#> 9 chr1 21080068 21559998 479931 1.26 3 2
#> 10 chr1 21559998 24830001 3270004 1.19 3 2
#> # ℹ 257 more rows
#>
#> $purity
#> [1] 0.89
#>
#> $reference
#> [1] "hg19"
#>
# Note the outputs to screen
x = init(mutations = example_dataset_CNAqc$mutations, cna = example_dataset_CNAqc$cna, purity = example_dataset_CNAqc$purity)
#>
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#>
#> ℹ Using reference genome coordinates for: GRCh38.
#> ✔ Found annotated driver mutations: TTN, CTCF, and TP53.
#> ✔ Fortified calls for 12963 somatic mutations: 12963 SNVs (100%) and 0 indels.
#> ! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
#> ✔ Fortified CNAs for 267 segments: 267 clonal and 0 subclonal.
#> ✔ 12963 mutations mapped to clonal CNAs.
# An S3 method can be used to report to screen what is in the object
print(x)
#> ── [ CNAqc ] MySample 12963 mutations in 267 segments (267 clonal, 0 subclonal).
#>
#> ── Clonal CNAs
#>
#> 2:2 [n = 7478, L = 1483 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■ { CTCF }
#> 4:2 [n = 1893, L = 331 Mb] ■■■■■■■
#> 3:2 [n = 1625, L = 357 Mb] ■■■■■■
#> 2:1 [n = 1563, L = 420 Mb] ■■■■■■ { TTN }
#> 3:0 [n = 312, L = 137 Mb] ■
#> 2:0 [n = 81, L = 39 Mb] { TP53 }
#> 16:2 [n = 4, L = 0 Mb]
#> 25:2 [n = 2, L = 1 Mb]
#> 3:1 [n = 2, L = 1 Mb]
#> 106:1 [n = 1, L = 0 Mb]
#>
#> ℹ Sample Purity: 89% ~ Ploidy: 4.
#>
#> ℹ There are 3 annotated driver(s) mapped to clonal CNAs.
#> chr from to ref alt DP NV VAF driver_label is_driver
#> chr2 179431633 179431634 C T 117 77 0.6581197 TTN TRUE
#> chr16 67646006 67646007 C T 120 54 0.4500000 CTCF TRUE
#> chr17 7577106 7577107 G C 84 78 0.9285714 TP53 TRUE