library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(INCOMMON)
#> Warning: replacing previous import 'cli::num_ansi_colors' by
#> 'crayon::num_ansi_colors' when loading 'INCOMMON'

In this vignette we analyse metastatic propensity and organotropism based on the INCOMMON classification of samples from breast cancer (BRCA) patients of the MSK-MetTropism cohort.

First, we prepare the input using the function init:

data(MSK_genomic_data)
data(MSK_clinical_data)
data(cancer_gene_census)

x = init(
  genomic_data = MSK_genomic_data,
  clinical_data = MSK_clinical_data %>% filter(tumor_type == 'BRCA'),
  gene_roles = cancer_gene_census
)
#> ── INCOMMON - Inference of copy number and mutation multiplicity in oncology ───
#> 
#> ── Genomic data ──
#> 
#>  Found 25659 samples, with 224939 mutations in 491 genes
#> ! No read counts found for 1393 mutations in 1393 samples
#> ! Gene name not provided for 1393 mutations
#> ! 201 genes could not be assigned a role (TSG or oncogene)
#> 
#> ── Clinical data ──
#> 
#>  Provided clinical features:
#>  sample (required for classification)
#>  purity (required for classification)
#>  tumor_type
#>  OS_MONTHS
#>  OS_STATUS
#>  SAMPLE_TYPE
#>  MET_COUNT
#>  METASTATIC_SITE
#>  MET_SITE_COUNT
#>  PRIMARY_SITE
#>  SUBTYPE_ABBREVIATION
#>  GENE_PANEL
#>  TMB_NONSYNONYMOUS
#>  FGA
#>  AGE_AT_DEATH
#>  Found 2484 matching samples
#>  Found 23175 unmatched samples
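
The matching step reported in the log above can be illustrated with a small sketch. Note that the two counts add up to the number of samples in the genomic data (2484 + 23175 = 25659), which suggests that the unmatched samples are those present in the genomic table but absent from the filtered clinical table. The tables toy_genomic and toy_clinical below are hypothetical, and the use of semi_join and anti_join is an assumption made for illustration, not necessarily how init works internally.

```r
library(dplyr)

# Hypothetical toy tables; the real inputs are MSK_genomic_data and MSK_clinical_data
toy_genomic <- tibble(
  sample = c("S1", "S1", "S2", "S3", "S4"),
  gene   = c("TP53", "PIK3CA", "TP53", "KRAS", "TP53")
)
toy_clinical <- tibble(
  sample     = c("S1", "S2", "S5"),
  tumor_type = c("BRCA", "BRCA", "LUAD"),
  purity     = c(0.3, 0.5, 0.4)
) %>%
  filter(tumor_type == "BRCA")  # same filter as in the call to init

# One row per sample found in the genomic table
genomic_samples <- distinct(toy_genomic, sample)

# Samples with clinical data (matched) and without it (unmatched)
matched   <- semi_join(genomic_samples, toy_clinical, by = "sample")
unmatched <- anti_join(genomic_samples, toy_clinical, by = "sample")

n_matched   <- nrow(matched)
n_unmatched <- nrow(unmatched)
print(c(matched = n_matched, unmatched = n_unmatched))
```

In this toy case, S1 and S2 are matched, while S3 and S4 are unmatched because they have no BRCA clinical record.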

print(x)
#> ── [ INCOMMON ]  9916 PASS mutations across 2462 samples, with 286 mutant genes
#>  Average sample purity: 0.42
#>  Average sequencing depth: 681
#> # A tibble: 9,916 × 25
#>    sample    tumor_type purity chr     from     to ref   alt      DP    NV   VAF
#>    <chr>     <chr>       <dbl> <chr>  <dbl>  <dbl> <chr> <chr> <int> <int> <dbl>
#>  1 P-0015535 BRCA          0.3 chr3  1.79e8 1.79e8 G     A       868   167 0.192
#>  2 P-0015535 BRCA          0.3 chr17 3.79e7 3.79e7 T     C      1172   205 0.175
#>  3 P-0015535 BRCA          0.3 chr7  1.41e8 1.41e8 G     A       765   120 0.157
#>  4 P-0015535 BRCA          0.3 chr21 3.62e7 3.62e7 G     A      1006   162 0.161
#>  5 P-0015535 BRCA          0.3 chr16 6.88e7 6.88e7 C     -       774   210 0.271
#>  6 P-0015535 BRCA          0.3 chr17 1.60e7 1.60e7 C     T       764   155 0.203
#>  7 P-0015535 BRCA          0.3 chr19 1.46e7 1.46e7 G     C       544    70 0.129
#>  8 P-0007009 BRCA          0.5 chr19 4.28e7 4.28e7 G     A       852   648 0.761
#>  9 P-0007009 BRCA          0.5 chr14 1.05e8 1.05e8 -     GGCA…  1530   453 0.296
#> 10 P-0013299 BRCA          0.4 chr19 4.59e7 4.59e7 C     T      1542   180 0.117
#> # ℹ 9,906 more rows
#> # ℹ 14 more variables: gene <chr>, gene_role <chr>, OS_MONTHS <dbl>,
#> #   OS_STATUS <dbl>, SAMPLE_TYPE <chr>, MET_COUNT <dbl>, METASTATIC_SITE <chr>,
#> #   MET_SITE_COUNT <dbl>, PRIMARY_SITE <chr>, SUBTYPE_ABBREVIATION <chr>,
#> #   GENE_PANEL <chr>, TMB_NONSYNONYMOUS <dbl>, FGA <dbl>, AGE_AT_DEATH <dbl>

There are 9916 mutations with average sequencing depth 681 across 2462 samples with average purity 0.42.

Classification of 2462 MSK-MetTropism BRCA samples

We then classify the mutations using PCAWG priors and the default entropy cutoff and overdispersion parameter:

x = classify(
  x = x,
  priors = pcawg_priors,
  entropy_cutoff = 0.2,
  rho = 0.01
)
print(x)
#> ── [ INCOMMON ]  9916 PASS mutations across 2462 samples, with 286 mutant genes
#>  Average sample purity: 0.42
#>  Average sequencing depth: 681
#> ── [ INCOMMON ]  Classified mutations with overdispersion parameter 0.01 and ent
#> # A tibble: 9,916 × 18
#>    sample    tumor_type purity chr     from     to ref   alt      DP    NV   VAF
#>    <chr>     <chr>       <dbl> <chr>  <dbl>  <dbl> <chr> <chr> <int> <int> <dbl>
#>  1 P-0015535 BRCA          0.3 chr3  1.79e8 1.79e8 G     A       868   167 0.192
#>  2 P-0015535 BRCA          0.3 chr17 3.79e7 3.79e7 T     C      1172   205 0.175
#>  3 P-0015535 BRCA          0.3 chr7  1.41e8 1.41e8 G     A       765   120 0.157
#>  4 P-0015535 BRCA          0.3 chr21 3.62e7 3.62e7 G     A      1006   162 0.161
#>  5 P-0015535 BRCA          0.3 chr16 6.88e7 6.88e7 C     -       774   210 0.271
#>  6 P-0015535 BRCA          0.3 chr17 1.60e7 1.60e7 C     T       764   155 0.203
#>  7 P-0015535 BRCA          0.3 chr19 1.46e7 1.46e7 G     C       544    70 0.129
#>  8 P-0007009 BRCA          0.5 chr19 4.28e7 4.28e7 G     A       852   648 0.761
#>  9 P-0007009 BRCA          0.5 chr14 1.05e8 1.05e8 -     GGCA…  1530   453 0.296
#> 10 P-0013299 BRCA          0.4 chr19 4.59e7 4.59e7 C     T      1542   180 0.117
#> # ℹ 9,906 more rows
#> # ℹ 7 more variables: gene <chr>, gene_role <chr>, id <chr>, label <chr>,
#> #   state <chr>, posterior <dbl>, entropy <dbl>

There are 4147 heterozygous diploid mutations (HMD), 578 mutations with loss of heterozygosity (LOH), 2018 mutations with copy-neutral LOH (CNLOH), and 663 mutations with amplification (AMP). In addition, 2510 mutations were classified as Tier-2, either because their entropy exceeded the cutoff or because of a low number of mutant alleles relative to the wild-type.
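
Counts like these can be obtained by tallying the state column of the classification table. The sketch below uses a hypothetical toy table (toy_classified) with the id, state and entropy columns shown in the printed output; the state values are illustrative and may not match the exact labels used by the package.

```r
library(dplyr)

# Hypothetical toy classification table; values are made up for illustration
toy_classified <- tibble(
  id      = paste0("m", 1:8),
  state   = c("HMD", "HMD", "HMD", "LOH", "CNLOH", "CNLOH", "AMP", "Tier-2"),
  entropy = c(0.02, 0.05, 0.10, 0.08, 0.03, 0.15, 0.12, 0.45)
)

# Number and fraction of mutations per state, most frequent first
state_counts <- toy_classified %>%
  count(state, name = "n_mutations", sort = TRUE) %>%
  mutate(fraction = n_mutations / sum(n_mutations))

print(state_counts)
```

The same check applies to the real numbers: the five counts reported above sum to the 9916 classified mutations.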

Metastatic propensity of Mutant TP53 with LOH patients

We can analyse the metastatic propensity of primary breast tumor genomes containing TP53 mutations by using function met_propensity. This function implements a logistic regression to fit the Binomial probability of developing metastasis based on the interpreted mutant genome, with the mutant gene without CNA (here, Mutant TP53 without LOH) as reference.

x = met_propensity(x, tumor_type = 'BRCA', gene = 'TP53')
#>  There are 2112 different genotypes
#>  The most abundant genotypes are:
#>  Mutant TP53 with LOH (54 Samples, Frequency 0.02)
#>  Mutant PIK3CA without AMP (33 Samples, Frequency 0.01)
#>  Mutant TP53 without LOH (33 Samples, Frequency 0.01)
#> Waiting for profiling to be done...
#> Waiting for profiling to be done...
#> # A tibble: 1 × 6
#>   gene  class                   OR   low    up p.value
#>   <chr> <chr>                <dbl> <dbl> <dbl>   <dbl>
#> 1 TP53  Mutant TP53 with LOH  1.64  1.12  2.41  0.0105

The odds ratio (OR) of metastasising for breast cancers with Mutant TP53 with LOH is 1.6 (p-value = 0.01) relative to mutant samples without LOH.
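
To make the meaning of the OR, low, up and p.value columns concrete, here is a minimal sketch of a logistic regression with a reference class, written with base R glm. It is an assumption about the general approach and not the code of met_propensity; the toy data frame and its counts are hypothetical and unrelated to the cohort.

```r
# Hypothetical toy data: 100 samples per class, counts made up for illustration
toy <- data.frame(
  class = factor(
    rep(c("without LOH", "with LOH"), each = 100),
    levels = c("without LOH", "with LOH")  # first level is the reference
  ),
  metastatic = c(
    rep(c(1, 0), times = c(40, 60)),  # without LOH: 40 of 100 metastatic
    rep(c(1, 0), times = c(60, 40))   # with LOH:    60 of 100 metastatic
  )
)

# Logistic regression of metastatic status on the mutant class
fit <- glm(metastatic ~ class, data = toy, family = binomial(link = "logit"))

# The exponentiated coefficient is the odds ratio versus the reference class.
# With a single binary predictor it equals the sample odds ratio:
# (60/40) / (40/60) = 2.25
odds_ratio <- unname(exp(coef(fit)[2]))

# Profile-likelihood confidence interval (default level) on the odds-ratio scale
ci <- exp(suppressMessages(confint(fit))[2, ])

# Wald test p-value for the class coefficient
p_value <- summary(fit)$coefficients[2, "Pr(>|z|)"]

result <- data.frame(
  OR = odds_ratio, low = ci[[1]], up = ci[[2]], p.value = p_value
)
print(result)
```

An OR above 1 with an interval that excludes 1 indicates higher odds of metastasis in the class with LOH than in the reference class.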

Metastatic tropism of Mutant TP53 with LOH patients

We can analyse the metastatic organotropism of metastatic breast tumor genomes containing TP53 mutations by using function met_tropism. Similarly to the metastatic propensity analysis, this function implements a logistic regression to fit the Binomial probability of developing metastasis towards a specific metastatic site (here the liver, as an example), based on the interpreted mutant genome, with the mutant gene without CNA (here, Mutant TP53 without LOH) as reference.

x = met_tropism(x, tumor_type = 'BRCA', gene = 'TP53', metastic_site = 'Liver')
#> Waiting for profiling to be done...
#> Waiting for profiling to be done...
#> # A tibble: 1 × 6
#>   gene  class                   OR   low    up p.value
#>   <chr> <chr>                <dbl> <dbl> <dbl>   <dbl>
#> 1 TP53  Mutant TP53 with LOH  1.90  1.07  3.54  0.0343

The odds of metastasising to the liver for breast cancers with Mutant TP53 with LOH are almost two-fold higher (OR = 1.9, p-value = 0.03) than for mutant samples without LOH.