library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(INCOMMON)
#> Warning: replacing previous import 'cli::num_ansi_colors' by
#> 'crayon::num_ansi_colors' when loading 'INCOMMON'

In this vignette we analyse metastatic propensity and organotropism based on the INCOMMON classification of samples from breast cancer (BRCA) patients of the MSK-MetTropism cohort.

5.1 Classification of 2462 breast cancer samples

To study patterns of metastasis (propensity and organotropism) associated with INCOMMON-interpreted genomes, we first need to classify all the mutations in these samples.

5.1.1 Input initialisation

First, we prepare the input using the function init:

data(MSK_genomic_data)
data(MSK_clinical_data)
data(cancer_gene_census)

x = init(
  genomic_data = MSK_genomic_data,
  clinical_data = MSK_clinical_data %>% filter(tumor_type == 'BRCA'),
  gene_roles = cancer_gene_census
)
#> ── INCOMMON - Inference of copy number and mutation multiplicity in oncology ───
#> 
#> ── Genomic data ──
#> 
#>  Found 25659 samples, with 224939 mutations in 491 genes
#> ! No read counts found for 1393 mutations in 1393 samples
#> ! Gene name not provided for 1393 mutations
#> ! 201 genes could not be assigned a role (TSG or oncogene)
#> 
#> ── Clinical data ──
#> 
#>  Provided clinical features:
#>  sample (required for classification)
#>  purity (required for classification)
#>  tumor_type
#>  OS_MONTHS
#>  OS_STATUS
#>  SAMPLE_TYPE
#>  MET_COUNT
#>  METASTATIC_SITE
#>  MET_SITE_COUNT
#>  PRIMARY_SITE
#>  SUBTYPE_ABBREVIATION
#>  GENE_PANEL
#>  SEX
#>  TMB_NONSYNONYMOUS
#>  FGA
#>  AGE_AT_SEQUENCING
#>  RACE
#>  Found 2562 matching samples
#>  Found 23104 unmatched samples

print(x)
#> ── [ INCOMMON ]  9916 PASS mutations across 2462 samples, with 286 mutant genes
#>  Average sample purity: 0.42
#>  Average sequencing depth: 681
#> # A tibble: 9,916 × 27
#>    sample    tumor_type purity chr     from     to ref   alt      DP    NV   VAF
#>    <chr>     <chr>       <dbl> <chr>  <dbl>  <dbl> <chr> <chr> <int> <int> <dbl>
#>  1 P-0015535 BRCA          0.3 chr3  1.79e8 1.79e8 G     A       868   167 0.192
#>  2 P-0015535 BRCA          0.3 chr17 3.79e7 3.79e7 T     C      1172   205 0.175
#>  3 P-0015535 BRCA          0.3 chr7  1.41e8 1.41e8 G     A       765   120 0.157
#>  4 P-0015535 BRCA          0.3 chr21 3.62e7 3.62e7 G     A      1006   162 0.161
#>  5 P-0015535 BRCA          0.3 chr16 6.88e7 6.88e7 C     -       774   210 0.271
#>  6 P-0015535 BRCA          0.3 chr17 1.60e7 1.60e7 C     T       764   155 0.203
#>  7 P-0015535 BRCA          0.3 chr19 1.46e7 1.46e7 G     C       544    70 0.129
#>  8 P-0007009 BRCA          0.5 chr19 4.28e7 4.28e7 G     A       852   648 0.761
#>  9 P-0007009 BRCA          0.5 chr14 1.05e8 1.05e8 -     GGCA…  1530   453 0.296
#> 10 P-0013299 BRCA          0.4 chr19 4.59e7 4.59e7 C     T      1542   180 0.117
#> # ℹ 9,906 more rows
#> # ℹ 16 more variables: gene <chr>, gene_role <chr>, OS_MONTHS <dbl>,
#> #   OS_STATUS <dbl>, SAMPLE_TYPE <chr>, MET_COUNT <dbl>, METASTATIC_SITE <chr>,
#> #   MET_SITE_COUNT <dbl>, PRIMARY_SITE <chr>, SUBTYPE_ABBREVIATION <chr>,
#> #   GENE_PANEL <chr>, SEX <chr>, TMB_NONSYNONYMOUS <dbl>, FGA <dbl>,
#> #   AGE_AT_SEQUENCING <dbl>, RACE <chr>

There are 9916 mutations with average sequencing depth 681 across 2462 samples with average purity 0.42.

5.1.2 Classification

We then classify the mutations using PCAWG priors and the default entropy cutoff and overdispersion parameter:

x = classify(
  x = x,
  priors = INCOMMON::pcawg_priors,
  entropy_cutoff = 0.2,
  rho = 0.01
  # parallel = TRUE, # uncomment these to run in parallel
  # num_cores = 8
)
print(x)
#> ── [ INCOMMON ]  9916 PASS mutations across 2462 samples, with 286 mutant genes
#>  Average sample purity: 0.42
#>  Average sequencing depth: 681
#> ── [ INCOMMON ]  Classified mutations with overdispersion parameter 0.01 and ent
#>  There are:
#>  N = 4238 mutations (HMD)
#>  N = 581 mutations (LOH)
#>  N = 2019 mutations (CNLOH)
#>  N = 666 mutations (AM)
#>  N = 2412 mutations (Tier-2)
#> # A tibble: 9,916 × 18
#>    sample    tumor_type purity chr     from     to ref   alt      DP    NV   VAF
#>    <chr>     <chr>       <dbl> <chr>  <dbl>  <dbl> <chr> <chr> <int> <int> <dbl>
#>  1 P-0015535 BRCA          0.3 chr3  1.79e8 1.79e8 G     A       868   167 0.192
#>  2 P-0015535 BRCA          0.3 chr17 3.79e7 3.79e7 T     C      1172   205 0.175
#>  3 P-0015535 BRCA          0.3 chr7  1.41e8 1.41e8 G     A       765   120 0.157
#>  4 P-0015535 BRCA          0.3 chr21 3.62e7 3.62e7 G     A      1006   162 0.161
#>  5 P-0015535 BRCA          0.3 chr16 6.88e7 6.88e7 C     -       774   210 0.271
#>  6 P-0015535 BRCA          0.3 chr17 1.60e7 1.60e7 C     T       764   155 0.203
#>  7 P-0015535 BRCA          0.3 chr19 1.46e7 1.46e7 G     C       544    70 0.129
#>  8 P-0007009 BRCA          0.5 chr19 4.28e7 4.28e7 G     A       852   648 0.761
#>  9 P-0007009 BRCA          0.5 chr14 1.05e8 1.05e8 -     GGCA…  1530   453 0.296
#> 10 P-0013299 BRCA          0.4 chr19 4.59e7 4.59e7 C     T      1542   180 0.117
#> # ℹ 9,906 more rows
#> # ℹ 7 more variables: gene <chr>, gene_role <chr>, id <chr>, label <chr>,
#> #   state <chr>, posterior <dbl>, entropy <dbl>

There are 4238 heterozygous diploid mutations (HMD), 581 mutations with loss of heterozygosity (LOH), 2019 mutations with copy-neutral LOH (CNLOH) and 666 mutations with amplification (AM). In addition, 2412 mutations were classified as Tier-2, either because their entropy is larger than the cutoff or because of a low number of mutant alleles relative to the wild-type.
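To make the roles of the entropy cutoff and the overdispersion parameter more concrete, here is a minimal sketch. It is not INCOMMON's implementation. It assumes a Beta-Binomial read-count model, a flat prior in place of the PCAWG priors, a hypothetical set of four (total, mutant) copy-number states, and a Shannon entropy normalised to [0, 1]. The helpers expected_vaf, dbetabinom_log and classify_one are invented for illustration, and only the entropy criterion for Tier-2 is covered.

```r
# Sketch only: every function below is hypothetical, not part of INCOMMON.

# Expected VAF of a mutation on m of k tumour copies at a given purity,
# assuming the admixed normal cells are diploid.
expected_vaf <- function(m, k, purity) {
  (m * purity) / (k * purity + 2 * (1 - purity))
}

# Beta-Binomial log-density with mean mu and overdispersion rho,
# parameterised so that rho = 1 / (alpha + beta + 1).
dbetabinom_log <- function(nv, dp, mu, rho) {
  mu <- pmin(pmax(mu, 1e-6), 1 - 1e-6)  # keep both shape parameters positive
  alpha <- mu * (1 - rho) / rho
  beta <- (1 - mu) * (1 - rho) / rho
  lchoose(dp, nv) + lbeta(nv + alpha, dp - nv + beta) - lbeta(alpha, beta)
}

# Assumed candidate states: k total copies, m mutant copies.
states <- data.frame(
  state = c("HMD", "LOH", "CNLOH", "AM"),
  k = c(2, 1, 2, 3),
  m = c(1, 1, 2, 2)
)

classify_one <- function(nv, dp, purity, rho = 0.01, entropy_cutoff = 0.2) {
  mu <- expected_vaf(states$m, states$k, purity)
  loglik <- dbetabinom_log(nv, dp, mu, rho)
  posterior <- exp(loglik - max(loglik))  # flat prior over the states
  posterior <- posterior / sum(posterior)
  p <- posterior[posterior > 0]
  entropy <- -sum(p * log(p)) / log(nrow(states))  # normalised to [0, 1]
  best <- states$state[which.max(posterior)]
  list(
    posterior = setNames(posterior, states$state),
    entropy = entropy,
    label = if (entropy > entropy_cutoff) "Tier-2" else best
  )
}

# Read counts and purity of the first mutation printed above
res <- classify_one(nv = 167, dp = 868, purity = 0.3)
print(res)
```

In this sketch a larger rho widens each state's read-count distribution, so neighbouring states overlap more and the posterior entropy tends to rise. A stricter (lower) entropy_cutoff then sends more mutations to Tier-2.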

5.2 Metastatic propensity of BRCA samples

5.2.1 Metastatic propensity of fully inactivated TP53

We can analyse the metastatic propensity of primary breast tumor genomes containing TP53 mutations by using function met_propensity. This function implements a logistic regression to fit the Binomial probability of developing metastasis based on the interpreted mutant genome, with the mutant gene without CNA (here, Mutant TP53 without LOH) as reference.

x = met_propensity(x, tumor_type = 'BRCA', gene = 'TP53')
#>  There are 2115 different genotypes
#>  The most abundant genotypes are:
#>  Mutant TP53 with LOH (54 Samples, Frequency 0.02)
#>  Mutant TP53 without LOH (36 Samples, Frequency 0.01)
#>  Mutant PIK3CA without AMP (33 Samples, Frequency 0.01)
#> Waiting for profiling to be done...
#> Waiting for profiling to be done...
#> # A tibble: 1 × 6
#>   gene  class                   OR   low    up p.value
#>   <chr> <chr>                <dbl> <dbl> <dbl>   <dbl>
#> 1 TP53  Mutant TP53 with LOH  1.63  1.11  2.38  0.0118

This analysis shows that Mutant TP53 with LOH samples have about 63% higher odds of metastasising (OR = 1.63, p.value = 0.01) than mutant samples without LOH.
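The internals of met_propensity are not shown in this vignette. As a rough illustration of where an odds ratio of this kind comes from, the sketch below fits a logistic regression with base R's glm on simulated data. The data frame toy, its columns and the simulated probabilities are all assumptions made for the example. The profile-likelihood interval from confint is likewise an assumption about what the low and up columns represent.

```r
set.seed(1)
n <- 400

# Simulated genotype classes; the first factor level is the reference group
toy <- data.frame(
  class = factor(
    sample(c("without LOH", "with LOH"), n, replace = TRUE),
    levels = c("without LOH", "with LOH")
  )
)

# Simulated metastatic status, more likely in the "with LOH" group
p_met <- ifelse(toy$class == "with LOH", 0.60, 0.45)
toy$metastatic <- rbinom(n, size = 1, prob = p_met)

# Logistic regression of metastatic status on genotype class
fit <- glm(metastatic ~ class, data = toy, family = binomial(link = "logit"))

# Odds ratio, confidence interval and p-value for "with LOH" vs the reference
or <- unname(exp(coef(fit))["classwith LOH"])
ci <- unname(exp(suppressMessages(confint(fit)))["classwith LOH", ])
p_value <- summary(fit)$coefficients["classwith LOH", "Pr(>|z|)"]

# An odds ratio converts to a percent change in odds as (OR - 1) * 100
cat(sprintf("OR = %.2f [%.2f, %.2f], p = %.3g, change in odds = %.0f%%\n",
            or, ci[1], ci[2], p_value, (or - 1) * 100))
```

By the same conversion, an odds ratio of 1.63 corresponds to odds that are 63% higher than in the reference group.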

5.2.2 Metastatic propensity for the top mutant genes in BRCA

We extend this analysis to multiple genes, focusing on the 50 most frequently mutated ones.

top_genes = classification(x) %>% 
  dplyr::filter(state != 'Tier-2') %>% 
  dplyr::group_by(gene) %>% 
  dplyr::reframe(N = length(unique(sample))) %>% 
  dplyr::arrange(dplyr::desc(N)) %>% 
  dplyr::slice_head(n = 50) %>% 
  pull(gene)

print(top_genes)

for(g in top_genes){
  x = met_propensity(x, tumor_type = 'BRCA', gene = g)
}
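To show what the gene-ranking pipeline computes, here is a small sketch on a made-up table. The data frame toy_classification and its values are hypothetical; only the column names sample, gene and state are taken from the classification output above. It uses summarise with n_distinct, which is assumed to be equivalent here to reframe with length(unique(sample)).

```r
library(dplyr)

# Hypothetical classification table: one row per mutation
toy_classification <- data.frame(
  sample = c("S1", "S1", "S2", "S2", "S3", "S3", "S4"),
  gene   = c("TP53", "PIK3CA", "TP53", "TP53", "PIK3CA", "CDH1", "TP53"),
  state  = c("LOH", "HMD", "CNLOH", "Tier-2", "AM", "Tier-2", "HMD")
)

top_toy <- toy_classification %>%
  dplyr::filter(state != "Tier-2") %>%                   # drop uncertain calls
  dplyr::group_by(gene) %>%
  dplyr::summarise(N = dplyr::n_distinct(sample)) %>%    # samples per gene
  dplyr::arrange(dplyr::desc(N)) %>%
  dplyr::slice_head(n = 2) %>%                           # keep the top 2 genes
  dplyr::pull(gene)

print(top_toy)
#> [1] "TP53"   "PIK3CA"
```

CDH1 drops out because its only mutation is Tier-2, and TP53 counts three samples rather than four mutations because samples are counted once per gene.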

5.2.3 Visualising metastatic propensity odds ratio

INCOMMON provides the function plot_met_volcano to visualise metastatic propensity odds ratios in a volcano plot.

plot_met_volcano(x = x, tumor_type = 'BRCA')
#> Warning: Removed 14 rows containing missing values or values outside the scale range
#> (`geom_point()`).

In addition to TP53, among the 50 most frequently mutated genes in BRCA, complete inactivation (Mutation with LOH) of ARID1A significantly increases the odds of metastasis (OR = 9.33, p.value = 0.04). Among the oncogenes, only full activation (Mutation with AMP) of PIK3CA is associated with higher odds of metastasis (OR = 2.00, p.value = 0.0007).
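For readers unfamiliar with volcano plots, the sketch below builds one with ggplot2 from a made-up table of odds ratios. It is not the code behind plot_met_volcano. The gene names, values and the 0.05 significance threshold are assumptions for illustration. The row with missing values shows why geom_point can warn about removed rows when a test could not be fitted.

```r
library(ggplot2)

# Made-up odds ratios and p-values, for illustration only
toy <- data.frame(
  gene = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  OR = c(2.1, 0.6, 1.1, NA),
  p.value = c(0.004, 0.03, 0.60, NA)
)

# Volcano coordinates: effect size on x, statistical evidence on y
toy$log2_or <- log2(toy$OR)
toy$neg_log10_p <- -log10(toy$p.value)
toy$significant <- !is.na(toy$p.value) & toy$p.value < 0.05  # assumed threshold

volcano <- ggplot(toy, aes(x = log2_or, y = neg_log10_p, colour = significant)) +
  geom_point() +                                            # NA rows are dropped with a warning
  geom_vline(xintercept = 0, linetype = "dashed") +         # OR = 1, no effect
  geom_hline(yintercept = -log10(0.05), linetype = "dashed") +
  labs(x = "log2(odds ratio)", y = "-log10(p-value)")

print(volcano)
```

Points to the right of the vertical line have odds ratios above 1, and points above the horizontal line pass the assumed significance threshold.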

5.3 Metastatic tropism of BRCA samples

5.3.1 Tropism of fully inactivated TP53 BRCA samples to the Liver

We can analyse the metastatic organotropism of metastatic breast tumor genomes containing TP53 mutations by using the function met_tropism. As in the metastatic propensity analysis, this function fits a logistic regression for the Binomial probability of metastasising to a specific site (here the Liver, as an example), based on the interpreted mutant genome, with the mutant gene without CNA (here, Mutant TP53 without LOH) as reference.

x = met_tropism(x, tumor_type = 'BRCA', gene = 'TP53', metastatic_site = 'Liver')
#> Waiting for profiling to be done...
#> Waiting for profiling to be done...
#> # A tibble: 1 × 7
#>   gene  metastatic_site class                   OR   low    up p.value
#>   <chr> <chr>           <chr>                <dbl> <dbl> <dbl>   <dbl>
#> 1 TP53  Liver           Mutant TP53 with LOH  1.95  1.10  3.63  0.0273

The odds of metastasising to the Liver are almost two-fold higher for Mutant TP53 with LOH breast cancers (OR = 1.95, p.value = 0.03) than for mutant samples without LOH.

5.3.2 Tropism of top mutant genes in BRCA to the most frequent metastatic sites

We extend this analysis to multiple genes and sites, focusing on the 10 most frequently mutated genes and the 10 most frequent metastatic sites.

top_sites = x$clinical_data %>% 
  dplyr::group_by(METASTATIC_SITE) %>% 
  dplyr::reframe(N = length(unique(sample))) %>% 
  dplyr::arrange(dplyr::desc(N)) %>% 
  dplyr::slice_head(n = 10) %>% 
  pull(METASTATIC_SITE)

for(g in top_genes[1:10]){
  for(m in top_sites){
    x = met_tropism(x, gene = g, tumor_type = 'BRCA', metastatic_site = m)
  }
}
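The same idea can be sketched without INCOMMON: loop over sites, fit one logistic regression per site, and collect the odds ratios in a single table. Everything below is an assumption made for illustration: the simulated data frame toy, the three site names, the probabilities and the helper fit_site. It does not reproduce how met_tropism stores its results.

```r
set.seed(2)
n <- 300

# Simulated genotype classes; the first factor level is the reference group
toy <- data.frame(
  class = factor(
    sample(c("without LOH", "with LOH"), n, replace = TRUE),
    levels = c("without LOH", "with LOH")
  )
)

# Simulated site indicators (1 = metastasis observed at that site);
# only the Liver is given a genotype effect
base_prob <- c(Liver = 0.30, Bone = 0.40, Lung = 0.25)
for (site in names(base_prob)) {
  shift <- if (site == "Liver") 0.15 * (toy$class == "with LOH") else 0
  toy[[site]] <- rbinom(n, size = 1, prob = base_prob[[site]] + shift)
}

# Fit one logistic regression per site and return a one-row summary
fit_site <- function(site) {
  fit <- glm(reformulate("class", response = site),
             data = toy, family = binomial())
  est <- summary(fit)$coefficients["classwith LOH", ]
  data.frame(
    metastatic_site = site,
    OR = exp(est[["Estimate"]]),
    p.value = est[["Pr(>|z|)"]]
  )
}

tropism_toy <- do.call(rbind, lapply(names(base_prob), fit_site))
print(tropism_toy)
```

Running many gene and site combinations means many tests, so individual p-values in such a table are best read as exploratory.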

5.3.3 Visualising metastatic tropism

INCOMMON provides the function plot_tropism to visualise metastatic tropism odds ratios by metastatic site.

plot_tropism(x = x, tumor_type = 'BRCA')

Interestingly, the complete inactivation of TP53 seems to be associated with tropism from primary breast tumours to the Liver and CNS/Brain. CDH1 mutations without LOH seem to be more frequent in association with metastasis to the lymphatic system.