Extract per-gene copy number status.

By default, this function maps every known human gene to its copy number status using clonal CNA segments. Gene coordinates are bundled with the CNAqc package, and genes are identified by their common names (e.g., TP53). To restrict the mapping to a subset of genes, which is faster, pass a vector of gene symbols via the `genes` parameter.

CNA_gene(x, genes = NULL)

Arguments

x: A CNAqc object.
genes: Optional vector of gene symbols of interest. If `NULL` (the default), all human genes are used, according to the genome reference of the input `x`.

Value

A tibble with columns `gene` (gene name), `chr` (chromosome), `from`/`to` (gene start and end coordinates), and `Major`/`minor`/`karyotype` describing the copy number segment where the gene sits. Note that mappings to subclonal segments are not returned.
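
To make the idea of "the segment where the gene sits" concrete, here is a minimal base-R sketch of a gene-to-segment lookup. It is not CNAqc's implementation: the containment rule (a gene must lie entirely inside one segment), the helper `map_gene`, and the genes `GENE_A`/`GENE_B` with their coordinates are all assumptions made for illustration. The two segments are copied from the first rows of `example_dataset_CNAqc$cna`.

```r
# Two segments copied from the first rows of example_dataset_CNAqc$cna
segments <- data.frame(
  chr   = c("chr1", "chr1"),
  from  = c(840009, 1689988),
  to    = c(1689987, 1815015),
  Major = c(3, 3),
  minor = c(2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical genes and coordinates, invented for this sketch
genes <- data.frame(
  gene = c("GENE_A", "GENE_B"),
  chr  = c("chr1", "chr1"),
  from = c(900000, 5000),
  to   = c(950000, 9000),
  stringsAsFactors = FALSE
)

# Assumption: a gene "sits" in a segment when it is fully contained in it
map_gene <- function(g) {
  hit <- which(segments$chr == g$chr &
               segments$from <= g$from &
               segments$to >= g$to)
  if (length(hit) == 0) {
    # No containing segment: report missing copy number
    return(data.frame(gene = g$gene, Major = NA_real_, minor = NA_real_,
                      karyotype = NA_character_, stringsAsFactors = FALSE))
  }
  s <- segments[hit[1], ]
  data.frame(gene = g$gene, Major = s$Major, minor = s$minor,
             karyotype = paste(s$Major, s$minor, sep = ":"),
             stringsAsFactors = FALSE)
}

# Map each gene (one row at a time) and stack the results
mapping <- do.call(rbind, lapply(seq_len(nrow(genes)),
                                 function(i) map_gene(genes[i, ])))
print(mapping)
```

In this toy input, `GENE_A` falls inside the first segment while `GENE_B` lies before any segment, so its copy number columns are `NA`.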

Examples

# Example input data released with the package
data('example_dataset_CNAqc', package = 'CNAqc')
print(example_dataset_CNAqc)
#> $mutations
#> # A tibble: 12,963 × 13
#>    chr      from      to ref   alt   FILTER    DP    NV    VAF ANNOVAR_FUNCTION
#>    <chr>   <dbl>   <dbl> <chr> <chr> <chr>  <dbl> <dbl>  <dbl> <chr>           
#>  1 chr1  1027104 1027105 T     G     PASS      60     6 0.1    UTR5            
#>  2 chr1  2248588 2248589 A     C     PASS     127     9 0.0709 intergenic      
#>  3 chr1  2461999 2462000 G     A     PASS     156    65 0.417  upstream        
#>  4 chr1  2727935 2727936 T     C     PASS     180    90 0.5    downstream      
#>  5 chr1  2763397 2763398 C     T     PASS     183    61 0.333  intergenic      
#>  6 chr1  2768208 2768209 C     T     PASS     203   130 0.640  intergenic      
#>  7 chr1  2935590 2935591 C     T     PASS     228   132 0.579  intergenic      
#>  8 chr1  2980032 2980033 C     T     PASS     196    85 0.434  ncRNA_exonic    
#>  9 chr1  3387161 3387162 T     G     PASS     124     6 0.0484 intronic        
#> 10 chr1  3502517 3502518 G     A     PASS      88    10 0.114  intronic        
#> # ℹ 12,953 more rows
#> # ℹ 3 more variables: GENE <chr>, is_driver <lgl>, driver_label <chr>
#> 
#> $cna
#> # A tibble: 267 × 7
#>    chr       from       to  length covRatio Major minor
#>    <chr>    <int>    <int>   <int>    <dbl> <dbl> <dbl>
#>  1 chr1    840009  1689987  849979     1.19     3     2
#>  2 chr1   1689988  1815015  125028     1.26     3     2
#>  3 chr1   1815016  9799969 7984954     1.19     3     2
#>  4 chr1  10479910 12079917 1600008     1.19     3     2
#>  5 chr1  12079917 12154980   75064     1.24     3     2
#>  6 chr1  12154981 12839977  684997     1.19     3     2
#>  7 chr1  13780016 17790026 4010011     1.19     3     2
#>  8 chr1  17849962 21080067 3230106     1.19     3     2
#>  9 chr1  21080068 21559998  479931     1.26     3     2
#> 10 chr1  21559998 24830001 3270004     1.19     3     2
#> # ℹ 257 more rows
#> 
#> $purity
#> [1] 0.89
#> 
#> $reference
#> [1] "hg19"
#> 

# Create the CNAqc object; note the messages printed to screen
x = init(
  mutations = example_dataset_CNAqc$mutations,
  cna = example_dataset_CNAqc$cna,
  purity = example_dataset_CNAqc$purity
)
#> 
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#> 
#> ℹ Using reference genome coordinates for: GRCh38.
#> ✔ Found annotated driver mutations: TTN, CTCF, and TP53.
#> ✔ Fortified calls for 12963 somatic mutations: 12963 SNVs (100%) and 0 indels.
#> ! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
#> ✔ Fortified CNAs for 267 segments: 267 clonal and 0 subclonal.
#> ✔ 12963 mutations mapped to clonal CNAs.

# Get mapping for all the known human genes - takes a bit longer
CNA_gene(x)
#> Warning: replacing previous import ‘cli::num_ansi_colors’ by ‘crayon::num_ansi_colors’ when loading ‘easypar’
#> [easypar] 1/2 computations returned errors and will be removed.
#> # A tibble: 67,149 × 7
#>    gene        chr    from     to Major minor karyotype
#>    <chr>       <chr> <int>  <int> <dbl> <dbl> <chr>    
#>  1 DDX11L1     chr1  11869  14409    NA    NA NA:NA    
#>  2 WASH7P      chr1  14404  29570    NA    NA NA       
#>  3 MIR6859-1   chr1  17369  17436    NA    NA NA       
#>  4 MIR1302-2HG chr1  29554  31109    NA    NA NA       
#>  5 MIR1302-2   chr1  30366  30503    NA    NA NA       
#>  6 FAM138A     chr1  34554  36081    NA    NA NA       
#>  7 OR4G4P      chr1  52473  53312    NA    NA NA       
#>  8 OR4G11P     chr1  57598  64116    NA    NA NA       
#>  9 OR4F5       chr1  65419  71585    NA    NA NA       
#> 10 AL627309.1  chr1  89295 133723    NA    NA NA       
#> # ℹ 67,139 more rows

# Restrict the mapping to a set of genes of interest
CNA_gene(x, genes = c("APC", "KRAS", "NRAS", "TP53"))
#> # A tibble: 4 × 7
#>   gene  chr        from        to Major minor karyotype
#>   <chr> <chr>     <int>     <int> <dbl> <dbl> <chr>    
#> 1 NRAS  chr1  114704469 114716771     3     2 3:2      
#> 2 KRAS  chr12  25205246  25250936     2     2 2:2      
#> 3 TP53  chr17   7661779   7687550     2     0 2:0      
#> 4 APC   chr5  112707498 112846239     2     2 2:2
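
The returned table can be post-processed with ordinary data-frame operations. As a minimal sketch, the snippet below rebuilds by hand a few rows of the outputs shown above (so that it runs without CNAqc installed), drops genes with no assigned segment (`NA` copy number), and flags genes whose minor allele has zero copies. The names `res`, `mapped`, and `loh` are illustrative and not part of the package.

```r
# Rows copied by hand from the outputs above;
# in practice one would use: res <- CNA_gene(x, genes = ...)
res <- data.frame(
  gene      = c("DDX11L1", "NRAS", "KRAS", "TP53", "APC"),
  chr       = c("chr1", "chr1", "chr12", "chr17", "chr5"),
  Major     = c(NA, 3, 2, 2, 2),
  minor     = c(NA, 2, 2, 0, 2),
  karyotype = c("NA:NA", "3:2", "2:2", "2:0", "2:2"),
  stringsAsFactors = FALSE
)

# Keep only genes that were assigned a copy number segment
mapped <- res[!is.na(res$Major), ]

# Flag loss of heterozygosity: the minor allele has zero copies
mapped$loh <- mapped$minor == 0

print(mapped[, c("gene", "karyotype", "loh")])
```

With these rows, only TP53 (karyotype 2:0) is flagged.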