This function randomly subsamples mutations, retaining all the simple clonal CNAs; subclonal CNAs are dropped. If the data contain driver mutation annotations, these can be forced to remain by setting keep_drivers = TRUE.
subsample(x, N = 15000, keep_drivers = TRUE)
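The idea of driver-preserving subsampling can be illustrated with a minimal base R sketch on a plain data frame. This is not the CNAqc implementation: `subsample_sketch`, the toy data and its `id` column are hypothetical, and the sketch assumes drivers are flagged by a logical `is_driver` column and are added on top of the N sampled mutations. That assumption appears consistent with the example output on this page, where N = 100 yields 103 mutations and N = 1000 yields 1003, with 3 annotated drivers.

```r
# Hypothetical helper, NOT part of CNAqc: sketches driver-preserving
# subsampling on a plain data frame with a logical `is_driver` column.
subsample_sketch <- function(mutations, N = 15000, keep_drivers = TRUE) {
  # Nothing to do if there are already at most N mutations
  if (nrow(mutations) <= N) return(mutations)

  # Draw N row indices uniformly at random, without replacement
  picked <- sample.int(nrow(mutations), size = N)

  # Optionally force annotated drivers back in (assumes an `is_driver` column)
  if (keep_drivers && "is_driver" %in% colnames(mutations)) {
    picked <- union(picked, which(mutations$is_driver))
  }

  # Return the selected rows in their original order
  mutations[sort(picked), , drop = FALSE]
}

# Toy data: 1000 mutations, 3 of them flagged as drivers
set.seed(1)
toy <- data.frame(
  id = seq_len(1000),
  is_driver = seq_len(1000) %in% c(10, 500, 900)
)

sub <- subsample_sketch(toy, N = 100)
nrow(sub)                          # between 100 and 103
all(c(10, 500, 900) %in% sub$id)   # TRUE: drivers are always retained
```

With keep_drivers = FALSE the sketch returns exactly N rows, and a driver survives only if it is drawn by chance.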
data('example_dataset_CNAqc', package = 'CNAqc')
x = init(
  mutations = example_dataset_CNAqc$mutations,
  cna = example_dataset_CNAqc$cna,
  purity = example_dataset_CNAqc$purity
)
#>
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#>
#> ℹ Using reference genome coordinates for: GRCh38.
#> ✔ Found annotated driver mutations: TTN, CTCF, and TP53.
#> ✔ Fortified calls for 12963 somatic mutations: 12963 SNVs (100%) and 0 indels.
#> ! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
#> ✔ Fortified CNAs for 267 segments: 267 clonal and 0 subclonal.
#> ✔ 12963 mutations mapped to clonal CNAs.
# Example runs
subsample(x, N = 100)
#>
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#>
#> ℹ Using reference genome coordinates for: GRCh38.
#> ✔ Found annotated driver mutations: TTN, CTCF, and TP53.
#> ✔ Fortified calls for 103 somatic mutations: 103 SNVs (100%) and 0 indels.
#> ! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
#> ✔ Fortified CNAs for 267 segments: 267 clonal and 0 subclonal.
#> Warning: [CNAqc] a karyotype column is present in CNA calls, and will be overwritten
#> ✔ 103 mutations mapped to clonal CNAs.
#> ── [ CNAqc ] MySample 103 mutations in 267 segments (267 clonal, 0 subclonal). G
#>
#> ── Clonal CNAs
#>
#> 2:2 [n = 58, L = 1483 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■ { CTCF }
#> 3:2 [n = 16, L = 357 Mb] ■■■■■■■
#> 4:2 [n = 14, L = 331 Mb] ■■■■■■
#> 2:1 [n = 9, L = 420 Mb] ■■■■ { TTN }
#> 3:0 [n = 5, L = 137 Mb] ■■
#> 2:0 [n = 1, L = 39 Mb] { TP53 }
#>
#> ℹ Sample Purity: 89% ~ Ploidy: 4.
#>
#> ℹ There are 3 annotated driver(s) mapped to clonal CNAs.
#> chr from to ref alt DP NV VAF driver_label is_driver
#> chr2 179431633 179431634 C T 117 77 0.6581197 TTN TRUE
#> chr16 67646006 67646007 C T 120 54 0.4500000 CTCF TRUE
#> chr17 7577106 7577107 G C 84 78 0.9285714 TP53 TRUE
subsample(x, N = 1000)
#>
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#>
#> ℹ Using reference genome coordinates for: GRCh38.
#> ✔ Found annotated driver mutations: TTN, CTCF, and TP53.
#> ✔ Fortified calls for 1003 somatic mutations: 1003 SNVs (100%) and 0 indels.
#> ! CNAs have no CCF, assuming clonal CNAs (CCF = 1).
#> ✔ Fortified CNAs for 267 segments: 267 clonal and 0 subclonal.
#> Warning: [CNAqc] a karyotype column is present in CNA calls, and will be overwritten
#> ✔ 1003 mutations mapped to clonal CNAs.
#> ── [ CNAqc ] MySample 1003 mutations in 267 segments (267 clonal, 0 subclonal).
#>
#> ── Clonal CNAs
#>
#> 2:2 [n = 580, L = 1483 Mb] ■■■■■■■■■■■■■■■■■■■■■■■■■■■ { CTCF }
#> 4:2 [n = 148, L = 331 Mb] ■■■■■■■
#> 2:1 [n = 123, L = 420 Mb] ■■■■■■ { TTN }
#> 3:2 [n = 123, L = 357 Mb] ■■■■■■
#> 3:0 [n = 23, L = 137 Mb] ■
#> 2:0 [n = 4, L = 39 Mb] { TP53 }
#> 25:2 [n = 1, L = 1 Mb]
#> 26:2 [n = 1, L = 0 Mb]
#>
#> ℹ Sample Purity: 89% ~ Ploidy: 4.
#>
#> ℹ There are 3 annotated driver(s) mapped to clonal CNAs.
#> chr from to ref alt DP NV VAF driver_label is_driver
#> chr2 179431633 179431634 C T 117 77 0.6581197 TTN TRUE
#> chr16 67646006 67646007 C T 120 54 0.4500000 CTCF TRUE
#> chr17 7577106 7577107 G C 84 78 0.9285714 TP53 TRUE