By default, this function maps genes to their copy number status using clonal CNA segments. The default gene set is all known human genes, whose coordinates ship with the CNAqc package and which are identified by their common symbols (e.g., TP53). To restrict the mapping to a subset of genes, which is faster, pass a vector of gene symbols via the `genes` parameter.
CNA_gene(x, genes = NULL)
A tibble with columns `gene` (gene name), `chr` (chromosome), `from`/`to` (gene start and end coordinates), and `Major`/`minor`/`karyotype` describing the copy number segment where the gene sits. Genes that map to a subclonal segment are not returned.
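To make the mapping concrete, here is a minimal base R sketch of one way to assign a gene to a segment. It is not CNAqc's implementation: the helper `map_genes_to_segments()`, the toy gene names and all coordinates are hypothetical, and the rule that a gene must lie entirely inside a segment is an assumption. Under that rule, a gene that no segment covers gets `NA` values, which is one plausible reason for `NA` rows in the output.

```r
# Hypothetical helper, not part of CNAqc: assign each gene to the first
# segment on the same chromosome that fully contains it (assumed rule).
map_genes_to_segments <- function(genes, segments) {
  rows <- lapply(seq_len(nrow(genes)), function(i) {
    g <- genes[i, ]
    hit <- which(
      segments$chr == g$chr &
        segments$from <= g$from &
        segments$to >= g$to
    )
    if (length(hit) == 0) {
      # No covering segment: copy number is unknown
      Major <- NA_real_
      minor <- NA_real_
    } else {
      Major <- segments$Major[hit[1]]
      minor <- segments$minor[hit[1]]
    }
    data.frame(
      gene = g$gene, chr = g$chr, from = g$from, to = g$to,
      Major = Major, minor = minor,
      karyotype = if (is.na(Major)) NA_character_ else paste(Major, minor, sep = ":"),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy inputs, made up for illustration only
genes <- data.frame(
  gene = c("GENE_A", "GENE_B"),
  chr = c("chr1", "chr1"),
  from = c(1000L, 50000L),
  to = c(2000L, 60000L),
  stringsAsFactors = FALSE
)
segments <- data.frame(
  chr = "chr1", from = 500L, to = 10000L, Major = 3, minor = 2,
  stringsAsFactors = FALSE
)

mapped <- map_genes_to_segments(genes, segments)
print(mapped)
```

In this toy case `GENE_A` falls inside the segment and inherits its `Major`/`minor` values, while `GENE_B` lies outside every segment and is reported with `NA`.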
# Example input data released with the package
data('example_dataset_CNAqc', package = 'CNAqc')
print(example_dataset_CNAqc)
#> $mutations
#> # A tibble: 12,963 × 13
#> chr from to ref alt FILTER DP NV VAF ANNOVAR_FUNCTION
#> <chr> <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 chr1 1027104 1027105 T G PASS 60 6 0.1 UTR5
#> 2 chr1 2248588 2248589 A C PASS 127 9 0.0709 intergenic
#> 3 chr1 2461999 2462000 G A PASS 156 65 0.417 upstream
#> 4 chr1 2727935 2727936 T C PASS 180 90 0.5 downstream
#> 5 chr1 2763397 2763398 C T PASS 183 61 0.333 intergenic
#> 6 chr1 2768208 2768209 C T PASS 203 130 0.640 intergenic
#> 7 chr1 2935590 2935591 C T PASS 228 132 0.579 intergenic
#> 8 chr1 2980032 2980033 C T PASS 196 85 0.434 ncRNA_exonic
#> 9 chr1 3387161 3387162 T G PASS 124 6 0.0484 intronic
#> 10 chr1 3502517 3502518 G A PASS 88 10 0.114 intronic
#> # ℹ 12,953 more rows
#> # ℹ 3 more variables: GENE <chr>, is_driver <lgl>, driver_label <chr>
#>
#> $cna
#> # A tibble: 267 × 7
#> chr from to length covRatio Major minor
#> <chr> <int> <int> <int> <dbl> <dbl> <dbl>
#> 1 chr1 840009 1689987 849979 1.19 3 2
#> 2 chr1 1689988 1815015 125028 1.26 3 2
#> 3 chr1 1815016 9799969 7984954 1.19 3 2
#> 4 chr1 10479910 12079917 1600008 1.19 3 2
#> 5 chr1 12079917 12154980 75064 1.24 3 2
#> 6 chr1 12154981 12839977 684997 1.19 3 2
#> 7 chr1 13780016 17790026 4010011 1.19 3 2
#> 8 chr1 17849962 21080067 3230106 1.19 3 2
#> 9 chr1 21080068 21559998 479931 1.26 3 2
#> 10 chr1 21559998 24830001 3270004 1.19 3 2
#> # ℹ 257 more rows
#>
#> $purity
#> [1] 0.89
#>
#> $reference
#> [1] "GRCh38"
#>
# Note the outputs to screen
x = init(mutations = example_dataset_CNAqc$mutations, cna = example_dataset_CNAqc$cna, purity = example_dataset_CNAqc$purity)
#>
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#>
#> ℹ Using reference genome coordinates for: GRCh38.
#> ✔ Found annotated driver mutations: TTN, CTCF, and TP53.
#> ✔ Fortified calls for 12963 somatic mutations: 12963 SNVs (100%) and 0 indels.
#> ! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
#> ✔ Fortified CNAs for 267 segments: 267 clonal and 0 subclonal.
#> ✔ 12963 mutations mapped to clonal CNAs.
# Get mapping for all the known human genes - takes a bit longer
CNA_gene(x)
#> Warning: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’
#> [easypar] 1/2 computations returned errors and will be removed.
#> # A tibble: 67,149 × 7
#> gene chr from to Major minor karyotype
#> <chr> <chr> <int> <int> <dbl> <dbl> <chr>
#> 1 DDX11L1 chr1 11869 14409 NA NA NA
#> 2 WASH7P chr1 14404 29570 NA NA NA
#> 3 MIR6859-1 chr1 17369 17436 NA NA NA
#> 4 MIR1302-2HG chr1 29554 31109 NA NA NA
#> 5 MIR1302-2 chr1 30366 30503 NA NA NA
#> 6 FAM138A chr1 34554 36081 NA NA NA
#> 7 OR4G4P chr1 52473 53312 NA NA NA
#> 8 OR4G11P chr1 57598 64116 NA NA NA
#> 9 OR4F5 chr1 65419 71585 NA NA NA
#> 10 AL627309.1 chr1 89295 133723 NA NA NA
#> # ℹ 67,139 more rows
# Restrict the mapping to a known set of genes
CNA_gene(x, genes = c("APC", "KRAS", "NRAS", "TP53"))
#> # A tibble: 4 × 7
#> gene chr from to Major minor karyotype
#> <chr> <chr> <int> <int> <dbl> <dbl> <chr>
#> 1 NRAS chr1 114704469 114716771 3 2 3:2
#> 2 KRAS chr12 25205246 25250936 2 2 2:2
#> 3 TP53 chr17 7661779 7687550 2 0 2:0
#> 4 APC chr5 112707498 112846239 2 2 2:2
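The returned table can be post-processed like any data frame. The sketch below is a hedged illustration in base R only: it rebuilds the four rows printed above by hand (rather than calling `CNA_gene()`), adds a hypothetical `total_cn` column, and selects genes whose segment has no minor allele copy, i.e. candidates for loss of heterozygosity.

```r
# Rows copied by hand from the example output above (gene coordinates omitted)
res <- data.frame(
  gene = c("NRAS", "KRAS", "TP53", "APC"),
  chr = c("chr1", "chr12", "chr17", "chr5"),
  Major = c(3, 2, 2, 2),
  minor = c(2, 2, 0, 2),
  karyotype = c("3:2", "2:2", "2:0", "2:2"),
  stringsAsFactors = FALSE
)

# Total copy number of the segment each gene sits on (illustrative column name)
res$total_cn <- res$Major + res$minor

# Genes on segments with zero copies of the minor allele;
# the NA check guards against genes that did not map to any segment
loh_genes <- res$gene[!is.na(res$minor) & res$minor == 0]
print(loh_genes)
```

With these four rows, only TP53 (karyotype `2:0`) is selected.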