require(tidyverse)
#> Loading required package: tidyverse
require(CNAqc)
require(cli)
#> Loading required package: cli
require(patchwork)
#> Loading required package: patchwork

Create a multisample CNAqc object

If treating data belonging to the same patient, but from different samples (ie: longitudinal or multi-region sampling), it might be handful searching for genomic regions harboring abnormal copy number states in all of the samples. Creating a mCNAqc object allows to modify the existing segments, by searching for regions with altered copy number state in all samples and discarding the others. It defines new segments and remaps existing mutations on them.

Analysis setup

For this example we will use the same data presented in the example data analysis in order to create a multisample CNAqc object for the patient Set06 (samples 1-3 of the same patient).

After having extracted the information on mutations and CNA for each sample, we create a list of CNAqc objects. List names must be sample names.

CNAqc_samples = lapply(names(sample_mutations), function(x) {
  CNAqc::init(mutations = sample_mutations[[x]], 
              cna = cna[[x]], 
              purity = purity[[x]], 
              sample = x,
              ref = "GRCh38")
})

names(CNAqc_samples) = sapply(CNAqc_samples, function(x) {x$sample})

Run the quality control on the created object.

All the samples are characterized by different segments among the genome, as can be see in the plot below:

Create the mCNAqc object

Create a m_CNAqc object using the all the CNAs included in the objects. This will define new segments according to which genomic regions are affected by CNAs in all samples, and remap the mutations on them. It is possible to create a m_CNAqc object either including or not the results of the peak analysis: setting the argument QC_filter to TRUE will use in the new segmentation only the segments passing the QC among the different samples.

All breakpoints (from and to of each CNA table) from all samples are selected and reordered. The list is then used to generate new segments intervals; only segments that are present in all samples will be kept. Mutations are then remapped on the new segments.

The results will be stored as a list of CNAqc objects (one per sample) in the cnaqc_obj_new_segmentation attribute of the mCNAqc object. If running multisample_init with the keep_original = TRUE, the original CNAqc object will be stored in the original_cnaqc_objc attribute; else, results of any additional analysis performed on the original data (ie: peaks analysis or CCF estimation) will be stored in the original_additional_info attribute. Setting the discard_private argument to TRUE will keep in the new CNAqc object only those mutations falling in the same position across all samples.

example_multisample = CNAqc::multisample_init(cnaqc_objs = CNAqc_samples, 
                                              QC_filter = TRUE, # use only segments passing QC
                                              keep_original = TRUE, # keep the original CNAqc objects too
                                              discard_private = FALSE) # keep also not private mutations among the samples
#> 
#> ── mCNAqcqc - Defining common segments ─────────────────────────────────────────
#>  Found 3 CNAqc objects:
#> • Set6_42
#> • Set6_44
#> • Set6_45
#> 
#> ── Building a <mCNAqcqc> object ──
#> 
#> ── Selecting new segments ──────────────────────────────────────────────────────
#>  Iterating on chr1
#> 
#> ── Iterating on chr1 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr1:
#> • Set6_42 = 4
#> • Set6_44 = 5
#> • Set6_45 = 4
#>  Found 21 breakpoints
#>  Iterating on chr2
#> 
#> ── Iterating on chr2 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr2:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 2
#>  Found 4 breakpoints
#>  Iterating on chr3
#> 
#> ── Iterating on chr3 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr3:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 2
#>  Found 4 breakpoints
#>  Iterating on chr4
#> 
#> ── Iterating on chr4 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr4:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 2
#>  Found 4 breakpoints
#>  Iterating on chr5
#> 
#> ── Iterating on chr5 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr5:
#> • Set6_42 = 3
#> • Set6_44 = 3
#> • Set6_45 = 3
#>  Found 8 breakpoints
#>  Iterating on chr6
#> 
#> ── Iterating on chr6 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr6:
#> • Set6_42 = 2
#> • Set6_44 = 3
#> • Set6_45 = 2
#>  Found 6 breakpoints
#>  Iterating on chr7
#> 
#> ── Iterating on chr7 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr7:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 2
#>  Found 4 breakpoints
#>  Iterating on chr8
#> 
#> ── Iterating on chr8 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr8:
#> • Set6_42 = 3
#> • Set6_44 = 3
#> • Set6_45 = 3
#>  Found 8 breakpoints
#>  Iterating on chr9
#> 
#> ── Iterating on chr9 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr9:
#> • Set6_42 = 3
#> • Set6_44 = 3
#> • Set6_45 = 3
#>  Found 11 breakpoints
#>  Iterating on chr10
#> 
#> ── Iterating on chr10 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr10:
#> • Set6_42 = 4
#> • Set6_44 = 4
#> • Set6_45 = 4
#>  Found 13 breakpoints
#>  Iterating on chr11
#> 
#> ── Iterating on chr11 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr11:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 2
#>  Found 4 breakpoints
#>  Iterating on chr12
#> 
#> ── Iterating on chr12 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr12:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 2
#>  Found 4 breakpoints
#>  Iterating on chr13
#> 
#> ── Iterating on chr13 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr13:
#> • Set6_42 = 1
#> • Set6_44 = 1
#> • Set6_45 = 1
#>  Found 2 breakpoints
#>  Iterating on chr14
#> 
#> ── Iterating on chr14 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr14:
#> • Set6_42 = 1
#> • Set6_44 = 1
#> • Set6_45 = 1
#>  Found 2 breakpoints
#>  Iterating on chr15
#> 
#> ── Iterating on chr15 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr15:
#> • Set6_42 = 1
#> • Set6_44 = 1
#> • Set6_45 = 1
#>  Found 3 breakpoints
#>  Iterating on chr16
#> 
#> ── Iterating on chr16 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr16:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 2
#>  Found 7 breakpoints
#>  Iterating on chr17
#> 
#> ── Iterating on chr17 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr17:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 3
#>  Found 7 breakpoints
#>  Iterating on chr18
#> 
#> ── Iterating on chr18 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr18:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 3
#>  Found 8 breakpoints
#>  Iterating on chr19
#> 
#> ── Iterating on chr19 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr19:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 2
#>  Found 4 breakpoints
#>  Iterating on chr20
#> 
#> ── Iterating on chr20 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr20:
#> • Set6_42 = 1
#> • Set6_44 = 1
#> • Set6_45 = 1
#>  Found 4 breakpoints
#>  Iterating on chr21
#> 
#> ── Iterating on chr21 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr21:
#> • Set6_42 = 2
#> • Set6_44 = 2
#> • Set6_45 = 2
#>  Found 8 breakpoints
#>  Iterating on chr22
#> 
#> ── Iterating on chr22 ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chr22:
#> • Set6_42 = 1
#> • Set6_44 = 1
#> • Set6_45 = 1
#>  Found 2 breakpoints
#>  Iterating on chrX
#> 
#> ── Iterating on chrX ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chrX:
#> • Set6_42 = 2
#> • Set6_44 = 0
#> • Set6_45 = 2
#>  Found 4 breakpoints
#>  Iterating on chrY
#> 
#> ── Iterating on chrY ──
#> 
#>  Number of original segments in individual <CNAqc> objects in chrY:
#> • Set6_42 = 1
#> • Set6_44 = 1
#> • Set6_45 = 2
#>  Found 5 breakpoints
#>  Shared segments across samples: 58
#> ! param `keep_original` is set to `TRUE`. Original CNAqc object will be kept.
#>  Obtained updated CNA table with shared segments across samples
#> ── Mapping mutations on new segments ───────────────────────────────────────────
#> ── Set6_42 ─────────────────────────────────────────────────────────────────────
#>  Found 8700 mutations in the original <CNAqc> object
#> ── Mapping mutations on shared segments ────────────────────────────────────────
#> ! Detected indels mutation (substitutions with >1 reference/alternative nucleotides).
#>  Fortified calls for 8700 somatic mutations: 8684 SNVs (100%) and 16 indels.
#> ! Added segments length (in basepairs) to CNA segments.
#>  Fortified CNAs for 58 segments: 58 clonal and 0 subclonal.
#> Warning in map_mutations_to_clonal_segments(mutations, cna_clonal): [CNAqc] a
#> karyotype column is present in CNA calls, and will be overwritten
#>  8407 mutations mapped to clonal CNAs.
#> ── Set6_44 ─────────────────────────────────────────────────────────────────────
#>  Found 8734 mutations in the original <CNAqc> object
#> ── Mapping mutations on shared segments ────────────────────────────────────────
#> ! Detected indels mutation (substitutions with >1 reference/alternative nucleotides).
#>  Fortified calls for 8734 somatic mutations: 8719 SNVs (100%) and 15 indels.
#> ! Added segments length (in basepairs) to CNA segments.
#>  Fortified CNAs for 58 segments: 58 clonal and 0 subclonal.
#> Warning in map_mutations_to_clonal_segments(mutations, cna_clonal): [CNAqc] a
#> karyotype column is present in CNA calls, and will be overwritten
#>  8718 mutations mapped to clonal CNAs.
#> ── Set6_45 ─────────────────────────────────────────────────────────────────────
#>  Found 9143 mutations in the original <CNAqc> object
#> ── Mapping mutations on shared segments ────────────────────────────────────────
#> ! Detected indels mutation (substitutions with >1 reference/alternative nucleotides).
#>  Fortified calls for 9143 somatic mutations: 9125 SNVs (100%) and 18 indels.
#> ! Added segments length (in basepairs) to CNA segments.
#>  Fortified CNAs for 58 segments: 58 clonal and 0 subclonal.
#> Warning in map_mutations_to_clonal_segments(mutations, cna_clonal): [CNAqc] a
#> karyotype column is present in CNA calls, and will be overwritten
#>  8851 mutations mapped to clonal CNAs.
#> 
#> ── Defining the <mCNAqcqc> object ──────────────────────────────────────────────
#> 
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#>  Using reference genome coordinates for: GRCh38.
#> ! Detected indels mutation (substitutions with >1 reference/alternative nucleotides).
#>  Fortified calls for 8407 somatic mutations: 8392 SNVs (100%) and 15 indels.
#> ! Added segments length (in basepairs) to CNA segments.
#>  Fortified CNAs for 58 segments: 58 clonal and 0 subclonal.
#> Warning in map_mutations_to_clonal_segments(mutations, cna_clonal): [CNAqc] a
#> karyotype column is present in CNA calls, and will be overwritten
#>  8407 mutations mapped to clonal CNAs.
#> 
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#>  Using reference genome coordinates for: GRCh38.
#> ! Detected indels mutation (substitutions with >1 reference/alternative nucleotides).
#>  Fortified calls for 8718 somatic mutations: 8703 SNVs (100%) and 15 indels.
#> ! Added segments length (in basepairs) to CNA segments.
#>  Fortified CNAs for 58 segments: 58 clonal and 0 subclonal.
#> Warning in map_mutations_to_clonal_segments(mutations, cna_clonal): [CNAqc] a
#> karyotype column is present in CNA calls, and will be overwritten
#>  8718 mutations mapped to clonal CNAs.
#> 
#> ── CNAqc - CNA Quality Check ───────────────────────────────────────────────────
#>  Using reference genome coordinates for: GRCh38.
#> ! Detected indels mutation (substitutions with >1 reference/alternative nucleotides).
#>  Fortified calls for 8851 somatic mutations: 8836 SNVs (100%) and 15 indels.
#> ! Added segments length (in basepairs) to CNA segments.
#>  Fortified CNAs for 58 segments: 58 clonal and 0 subclonal.
#> Warning in map_mutations_to_clonal_segments(mutations, cna_clonal): [CNAqc] a
#> karyotype column is present in CNA calls, and will be overwritten
#>  8851 mutations mapped to clonal CNAs.
#> 
#> ── Creating mCNAqc stats ──
#> 
#> ── Ended ───────────────────────────────────────────────────────────────────────
#>  <mCNAqc> object created including all samples
print(example_multisample)
#> ── [ multi CNAqc ] ─────────────────────────────────────────────────────────────
#> • Rerefence genome: GRCh38
#> • Samples: Set6_42, Set6_44, Set6_45
#> • 58 total shared segments across samples
#> • 8407, 8718, and 8851 mutations on shared positions across samples
#> 
#> ── Original CNAqc object avaiable for Set6_42, Set6_44, Set6_45

The function plot_segments_multisample allows visualization of the effect of the new segmentation across all the samples included in the mCNAqc object.

CNAqc::plot_segments_multisample(example_multisample, which = "shared", chromosomes = paste0("chr", 1:3)) + ggplot2::ggtitle("New segmentation")
#> 
#> ── Retrieving <CNAqc> objects with the new segmentation ────────────────────────

Accessing elements inside the mCNAqc object

Sample names can be retrieved using the get_sample_name function, which can also be applied on classical CNAqc objects.

CNAqc::get_sample_name(example_multisample)
#> [1] "Set6_42" "Set6_44" "Set6_45"

Elements inside the mCNAqc object (for desired samples) can be accessed through the get_sample function. It returns a list of CNAqc objects. In this way, it is possible to obtain a joint table with all the mutations across samples:

res = CNAqc::get_sample(example_multisample, 
           sample = get_sample_name(example_multisample), 
           which_obj = "shared")
#> 
#> ── Retrieving <CNAqc> objects with the new segmentation ────────────────────────

lapply(res, function(x) {CNAqc::Mutations(x)}) %>% 
  dplyr::bind_rows()
#> # A tibble: 25,976 × 51
#>    chr       from       to ref   alt      NV    DP   VAF Indiv   Key FR     MMLQ
#>    <chr>    <dbl>    <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <int> <chr> <dbl>
#>  1 chr1   3634121  3634122 C     T        42    63 0.667 Set6…    72 0.42…    37
#>  2 chr1   3643785  3643786 G     A        34    59 0.576 Set6…    73 0.42…    37
#>  3 chr1   4161648  4161649 C     A        18    37 0.486 Set6…    81 0.42…    37
#>  4 chr1   5513535  5513536 A     T        10    44 0.227 Set6…    95 0.07…    37
#>  5 chr1   5882139  5882140 C     T        35    55 0.636 Set6…   110 0.42…    37
#>  6 chr1   6595746  6595747 G     A        23    53 0.434 Set6…   122 0.42…    37
#>  7 chr1   7836836  7836837 C     T        18    35 0.514 Set6…   140 0.42…    37
#>  8 chr1   8570678  8570679 C     T        32    55 0.582 Set6…   149 0.42…    37
#>  9 chr1   9149650  9149651 T     G        45    59 0.763 Set6…   154 0.42…    37
#> 10 chr1  10218495 10218496 T     G        20    41 0.488 Set6…   167 0.42…    37
#> # ℹ 25,966 more rows
#> # ℹ 39 more variables: TCR <int>, HP <int>, WE <int>, Source <chr>, FS <chr>,
#> #   WS <int>, PP <chr>, TR <chr>, NF <chr>, TCF <int>, NR <chr>, TC <int>,
#> #   END <chr>, MGOF <chr>, SbPval <chr>, START <chr>, ReadPosRankSum <chr>,
#> #   MQ <chr>, QD <dbl>, SC <chr>, BRF <dbl>, HapScore <chr>, Size <chr>,
#> #   ID <chr>, QUAL <chr>, FILTER <chr>, INFO <chr>, gt_GT <chr>, gt_GQ <chr>,
#> #   gt_GOF <chr>, gt_NR <chr>, gt_GL <chr>, gt_NV <chr>, gt_GT_alleles <chr>, …