5. Analysis of metastatic patterns of MSK-MetTropism
Source:vignettes/a5_metastasis_analysis.Rmd
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(INCOMMON)
#> Warning: replacing previous import 'cli::num_ansi_colors' by
#> 'crayon::num_ansi_colors' when loading 'INCOMMON'
In this vignette we analyse metastatic patterns based on the INCOMMON classification of samples from breast cancer (BRCA) patients in the MSK-MetTropism cohort.
5.1 Classification of 2462 breast cancer samples
To study patterns of metastasis (propensity and organotropism) in relation to the genomes interpreted by INCOMMON, we first need to classify all the mutations in these samples.
5.1.1 Input initialisation
First, we prepare the input using the function init:
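data(MSK_genomic_data)    # mutation calls with read counts
data(MSK_clinical_data)   # per-sample clinical annotations
data(cancer_gene_census)  # gene roles (TSG or oncogene)
x = init(
genomic_data = MSK_genomic_data,
clinical_data = MSK_clinical_data %>% filter(tumor_type == 'BRCA'),  # keep breast cancer samples only
gene_roles = cancer_gene_census
)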
#> ── INCOMMON - Inference of copy number and mutation multiplicity in oncology ───
#>
#> ── Genomic data ──
#>
#> ✔ Found 25659 samples, with 224939 mutations in 491 genes
#> ! No read counts found for 1393 mutations in 1393 samples
#> ! Gene name not provided for 1393 mutations
#> ! 201 genes could not be assigned a role (TSG or oncogene)
#>
#> ── Clinical data ──
#>
#> ℹ Provided clinical features:
#> ✔ sample (required for classification)
#> ✔ purity (required for classification)
#> ✔ tumor_type
#> ✔ OS_MONTHS
#> ✔ OS_STATUS
#> ✔ SAMPLE_TYPE
#> ✔ MET_COUNT
#> ✔ METASTATIC_SITE
#> ✔ MET_SITE_COUNT
#> ✔ PRIMARY_SITE
#> ✔ SUBTYPE_ABBREVIATION
#> ✔ GENE_PANEL
#> ✔ SEX
#> ✔ TMB_NONSYNONYMOUS
#> ✔ FGA
#> ✔ AGE_AT_SEQUENCING
#> ✔ RACE
#> ✔ Found 2562 matching samples
#> ✖ Found 23104 unmatched samples
print(x)
#> ── [ INCOMMON ] 9916 PASS mutations across 2462 samples, with 286 mutant genes
#> ℹ Average sample purity: 0.42
#> ℹ Average sequencing depth: 681
#> # A tibble: 9,916 × 27
#> sample tumor_type purity chr from to ref alt DP NV VAF
#> <chr> <chr> <dbl> <chr> <dbl> <dbl> <chr> <chr> <int> <int> <dbl>
#> 1 P-0015535 BRCA 0.3 chr3 1.79e8 1.79e8 G A 868 167 0.192
#> 2 P-0015535 BRCA 0.3 chr17 3.79e7 3.79e7 T C 1172 205 0.175
#> 3 P-0015535 BRCA 0.3 chr7 1.41e8 1.41e8 G A 765 120 0.157
#> 4 P-0015535 BRCA 0.3 chr21 3.62e7 3.62e7 G A 1006 162 0.161
#> 5 P-0015535 BRCA 0.3 chr16 6.88e7 6.88e7 C - 774 210 0.271
#> 6 P-0015535 BRCA 0.3 chr17 1.60e7 1.60e7 C T 764 155 0.203
#> 7 P-0015535 BRCA 0.3 chr19 1.46e7 1.46e7 G C 544 70 0.129
#> 8 P-0007009 BRCA 0.5 chr19 4.28e7 4.28e7 G A 852 648 0.761
#> 9 P-0007009 BRCA 0.5 chr14 1.05e8 1.05e8 - GGCA… 1530 453 0.296
#> 10 P-0013299 BRCA 0.4 chr19 4.59e7 4.59e7 C T 1542 180 0.117
#> # ℹ 9,906 more rows
#> # ℹ 16 more variables: gene <chr>, gene_role <chr>, OS_MONTHS <dbl>,
#> # OS_STATUS <dbl>, SAMPLE_TYPE <chr>, MET_COUNT <dbl>, METASTATIC_SITE <chr>,
#> # MET_SITE_COUNT <dbl>, PRIMARY_SITE <chr>, SUBTYPE_ABBREVIATION <chr>,
#> # GENE_PANEL <chr>, SEX <chr>, TMB_NONSYNONYMOUS <dbl>, FGA <dbl>,
#> # AGE_AT_SEQUENCING <dbl>, RACE <chr>
The dataset contains 9916 mutations across 2462 samples, with an average sequencing depth of 681 and an average sample purity of 0.42.
5.1.2 Classification
We then classify the mutations using PCAWG priors and the default entropy cutoff and overdispersion parameter:
x = classify(
x = x,
priors = INCOMMON::pcawg_priors,
entropy_cutoff = 0.2,
rho = 0.01
# To run in parallel, uncomment the next line (it starts with a comma):
# , parallel = TRUE, num_cores = 8
)
print(x)
#> ── [ INCOMMON ] 9916 PASS mutations across 2462 samples, with 286 mutant genes
#> ℹ Average sample purity: 0.42
#> ℹ Average sequencing depth: 681
#> ── [ INCOMMON ] Classified mutations with overdispersion parameter 0.01 and ent
#> ℹ There are:
#> • N = 4238 mutations (HMD)
#> • N = 581 mutations (LOH)
#> • N = 2019 mutations (CNLOH)
#> • N = 666 mutations (AM)
#> • N = 2412 mutations (Tier-2)
#> # A tibble: 9,916 × 18
#> sample tumor_type purity chr from to ref alt DP NV VAF
#> <chr> <chr> <dbl> <chr> <dbl> <dbl> <chr> <chr> <int> <int> <dbl>
#> 1 P-0015535 BRCA 0.3 chr3 1.79e8 1.79e8 G A 868 167 0.192
#> 2 P-0015535 BRCA 0.3 chr17 3.79e7 3.79e7 T C 1172 205 0.175
#> 3 P-0015535 BRCA 0.3 chr7 1.41e8 1.41e8 G A 765 120 0.157
#> 4 P-0015535 BRCA 0.3 chr21 3.62e7 3.62e7 G A 1006 162 0.161
#> 5 P-0015535 BRCA 0.3 chr16 6.88e7 6.88e7 C - 774 210 0.271
#> 6 P-0015535 BRCA 0.3 chr17 1.60e7 1.60e7 C T 764 155 0.203
#> 7 P-0015535 BRCA 0.3 chr19 1.46e7 1.46e7 G C 544 70 0.129
#> 8 P-0007009 BRCA 0.5 chr19 4.28e7 4.28e7 G A 852 648 0.761
#> 9 P-0007009 BRCA 0.5 chr14 1.05e8 1.05e8 - GGCA… 1530 453 0.296
#> 10 P-0013299 BRCA 0.4 chr19 4.59e7 4.59e7 C T 1542 180 0.117
#> # ℹ 9,906 more rows
#> # ℹ 7 more variables: gene <chr>, gene_role <chr>, id <chr>, label <chr>,
#> # state <chr>, posterior <dbl>, entropy <dbl>
There are 4238 heterozygous diploid mutations (HMD), 581 mutations with loss of heterozygosity (LOH), 2019 mutations with copy-neutral LOH (CNLOH) and 666 mutations with amplification (AM). In addition, 2412 mutations were classified as Tier-2, either because their classification entropy exceeded the cutoff or because the number of mutant alleles was low relative to the wild-type.
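To illustrate how an entropy cutoff can send a mutation to Tier-2, here is a hypothetical base-R sketch. It assumes that the entropy is the Shannon entropy (natural logarithm) of a mutation's posterior distribution over candidate states; the toy posteriors, the helper shannon_entropy and the reduced set of states are illustrative only and not taken from INCOMMON, whose exact definition may differ. The second Tier-2 criterion (few mutant alleles) is not modelled.

```r
# Hypothetical illustration of an entropy-based Tier-2 filter.
# Assumption: entropy = Shannon entropy (natural log) of the posterior
# over candidate states; INCOMMON's own definition may differ.
shannon_entropy <- function(p) {
  p <- p[p > 0]          # treat 0 * log(0) as 0
  -sum(p * log(p))
}

# Toy posteriors: one row per mutation, one column per candidate state
posterior <- rbind(
  mut1 = c(HMD = 0.97, LOH = 0.02, AM = 0.01),  # confident call
  mut2 = c(HMD = 0.50, LOH = 0.45, AM = 0.05)   # ambiguous call
)

entropy_cutoff <- 0.2
entropy <- apply(posterior, 1, shannon_entropy)
best    <- colnames(posterior)[apply(posterior, 1, which.max)]

# Keep the most probable state unless the posterior is too uncertain
state <- ifelse(entropy > entropy_cutoff, "Tier-2", best)

result <- data.frame(entropy = round(entropy, 3), state = state)
print(result)
```

Here mut1 keeps its most probable state (HMD), while the nearly even split between HMD and LOH for mut2 yields a high entropy and a Tier-2 label.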
5.2 Metastatic propensity of BRCA samples
5.2.1 Metastatic propensity of fully inactivated TP53
We can analyse the metastatic propensity of primary breast tumor genomes containing TP53 mutations using the function met_propensity. This function fits a logistic regression to the Binomial probability of developing metastasis based on the interpreted mutant genome, using the mutant gene without CNA (here, Mutant TP53 without LOH) as the reference.
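The model described above can be sketched in base R with glm. This is not INCOMMON's code: the data are simulated, the column names class and metastatic are hypothetical, and met_propensity may handle additional covariates. The sketch only shows the core idea: set the reference level, exponentiate the coefficient to obtain an odds ratio, and profile the likelihood for a confidence interval (confint() on glm fits prints messages like "Waiting for profiling to be done...").

```r
# Minimal sketch (not INCOMMON's implementation): logistic regression of a
# binary metastasis outcome on a genotype class. All data are simulated.
set.seed(1)

n <- 200
toy <- data.frame(
  class = factor(sample(c("Mutant TP53 without LOH", "Mutant TP53 with LOH"),
                        n, replace = TRUE))
)
# Simulate a higher probability of metastasis for the LOH class
p_met <- ifelse(toy$class == "Mutant TP53 with LOH", 0.6, 0.4)
toy$metastatic <- rbinom(n, size = 1, prob = p_met)

# The mutant gene without CNA is the reference level
toy$class <- relevel(toy$class, ref = "Mutant TP53 without LOH")

fit <- glm(metastatic ~ class, family = binomial(), data = toy)

coef_name <- "classMutant TP53 with LOH"
or   <- exp(coef(fit)[[coef_name]])          # odds ratio vs reference
ci   <- exp(confint(fit)[coef_name, ])       # profile-likelihood interval
pval <- summary(fit)$coefficients[coef_name, "Pr(>|z|)"]

result <- data.frame(class = "Mutant TP53 with LOH", OR = or,
                     low = ci[[1]], up = ci[[2]], p.value = pval)
print(result)
```

The output has the same shape as the table returned below (OR, lower and upper bounds, p-value), which is why a single reference level is needed: every OR is relative to it.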
x = met_propensity(x, tumor_type = 'BRCA', gene = 'TP53')
#> ℹ There are 2115 different genotypes
#> ℹ The most abundant genotypes are:
#> • Mutant TP53 with LOH (54 Samples, Frequency 0.02)
#> • Mutant TP53 without LOH (36 Samples, Frequency 0.01)
#> • Mutant PIK3CA without AMP (33 Samples, Frequency 0.01)
#> Waiting for profiling to be done...
#> Waiting for profiling to be done...
#> # A tibble: 1 × 6
#> gene class OR low up p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 TP53 Mutant TP53 with LOH 1.63 1.11 2.38 0.0118
This analysis shows that patients with Mutant TP53 with LOH have 1.63 times the odds of developing metastasis (about 63% higher odds; OR = 1.63, p = 0.012) compared with patients whose TP53 mutations lack LOH.
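For reference, in a logistic model with a binary genotype term, the odds ratio compares the odds of metastasis between the two groups. Here $p_1$ and $p_0$ denote the probabilities of metastasis for Mutant TP53 with LOH and for the reference class, and $\beta_1$ the coefficient of the LOH class (assuming no other covariates). An OR is a ratio of odds, not of risks, so an OR of 1.63 corresponds to 63% higher odds:

```latex
\mathrm{OR} = \frac{p_1/(1-p_1)}{p_0/(1-p_0)} = e^{\beta_1},
\qquad
(\mathrm{OR}-1)\times 100\% = (1.63-1)\times 100\% = 63\%.
```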
5.2.2 Metastatic propensity for the top mutant genes in BRCA
We extend this analysis to multiple genes, focusing on the 50 genes mutated in the largest number of samples (excluding Tier-2 mutations).
# Top 50 genes by number of samples with a non-Tier-2 mutation
top_genes = classification(x) %>%
dplyr::filter(state != 'Tier-2') %>%
dplyr::group_by(gene) %>%
dplyr::summarise(N = dplyr::n_distinct(sample)) %>%
dplyr::arrange(dplyr::desc(N)) %>%
dplyr::slice_head(n = 50) %>%
dplyr::pull(gene)
print(top_genes)
# Fit the metastatic propensity model for each gene, updating x
for(g in top_genes){
x = met_propensity(x, tumor_type = 'BRCA', gene = g)
}
5.2.3 Visualising metastatic propensity odds ratios
INCOMMON provides the function plot_met_volcano to visualise metastatic propensity odds ratios in a volcano plot.
plot_met_volcano(x = x, tumor_type = 'BRCA')
#> Warning: Removed 14 rows containing missing values or values outside the scale range
#> (`geom_point()`).
In addition to TP53, among the 50 most frequently mutated genes in BRCA, complete inactivation (mutation with LOH) of ARID1A is associated with significantly higher odds of metastasis (OR = 9.33, p = 0.04), whereas among oncogenes only full activation of PIK3CA (mutation with AMP) is associated with higher odds of metastasis (OR = 2.00, p = 0.0007).
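A volcano plot places each test by effect size against significance. As a hedged sketch, the base-R code below computes such coordinates for the three results quoted in this section, with log2(OR) on the x axis, -log10(p) on the y axis and a nominal 0.05 threshold. These axis choices are assumptions and may not match exactly what plot_met_volcano draws.

```r
# Sketch of volcano-plot coordinates, using only the odds ratios and
# p-values quoted in this vignette (axis choices are assumptions).
res <- data.frame(
  gene    = c("TP53", "ARID1A", "PIK3CA"),
  OR      = c(1.63, 9.33, 2.00),
  p.value = c(0.0118, 0.04, 0.0007)
)

res$log2_OR     <- log2(res$OR)           # effect size (x axis)
res$neg_log10_p <- -log10(res$p.value)    # significance (y axis)
res$significant <- res$p.value < 0.05

plot(res$log2_OR, res$neg_log10_p,
     xlab = "log2(OR)", ylab = "-log10(p-value)", pch = 19,
     col = ifelse(res$significant, "firebrick", "grey50"))
abline(h = -log10(0.05), lty = 2)         # nominal 0.05 threshold
abline(v = 0, lty = 3)                    # OR = 1 (no association)
text(res$log2_OR, res$neg_log10_p, labels = res$gene, pos = 3)
```

Points to the right of the dotted line have OR > 1, and points above the dashed line are nominally significant.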
5.3 Metastatic tropism of BRCA samples
5.3.1 Tropism of fully inactivated TP53 BRCA samples to the Liver
We can analyse the metastatic organotropism of metastatic breast tumor genomes containing TP53 mutations using the function met_tropism. As in the metastatic propensity analysis, this function fits a logistic regression to the Binomial probability of metastasising to a specific site (here, the Liver, as an example), based on the interpreted mutant genome, using the mutant gene without CNA (here, Mutant TP53 without LOH) as the reference.
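The main difference from the propensity model is the outcome: each metastatic sample is scored 1 if its metastatic site matches the site of interest and 0 otherwise. The sketch below illustrates this on simulated data. It is not met_tropism's code; the column to_site is hypothetical, while METASTATIC_SITE mirrors the clinical field listed by init.

```r
# Minimal sketch (not INCOMMON's implementation) of a tropism model on
# simulated metastatic samples; the outcome is "metastasis to this site".
set.seed(2)

n <- 300
toy <- data.frame(
  class = sample(c("Mutant TP53 without LOH", "Mutant TP53 with LOH"),
                 n, replace = TRUE),
  METASTATIC_SITE = sample(c("Liver", "Lung", "Bone", "CNS/Brain"),
                           n, replace = TRUE)
)

site <- "Liver"
# Binary outcome: 1 if the sample metastasised to the site of interest
toy$to_site <- as.integer(toy$METASTATIC_SITE == site)
# Same reference level as in the propensity analysis
toy$class <- relevel(factor(toy$class), ref = "Mutant TP53 without LOH")

fit <- glm(to_site ~ class, family = binomial(), data = toy)
or  <- exp(coef(fit)[["classMutant TP53 with LOH"]])  # OR for the site
print(or)
```

Repeating this for each site gives one odds ratio per gene and site pair, which is what the nested loop in 5.3.2 collects.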
x = met_tropism(x, tumor_type = 'BRCA', gene = 'TP53', metastatic_site = 'Liver')
#> Waiting for profiling to be done...
#> Waiting for profiling to be done...
#> # A tibble: 1 × 7
#> gene metastatic_site class OR low up p.value
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 TP53 Liver Mutant TP53 with LOH 1.95 1.10 3.63 0.0273
The odds of metastasising to the Liver are almost two-fold higher for breast cancers with Mutant TP53 with LOH than for those with TP53 mutations without LOH (OR = 1.95, p = 0.027).
5.3.2 Tropism of top mutant genes in BRCA to the most frequent metastatic sites
We extend this analysis to the 10 most frequently mutated genes and the 10 most frequent metastatic sites.
top_sites = x$clinical_data %>%
dplyr::group_by(METASTATIC_SITE) %>%
dplyr::reframe(N = length(unique(sample))) %>%
dplyr::arrange(dplyr::desc(N)) %>%
dplyr::slice_head(n = 10) %>%
pull(METASTATIC_SITE)
for(g in top_genes[1:10]){
for(m in top_sites){
x = met_tropism(x, gene = g, tumor_type = 'BRCA', metastatic_site = m)
}
}
5.3.3 Visualising metastatic tropism
INCOMMON provides the function plot_tropism to visualise metastatic tropism odds ratios by metastatic site.
plot_tropism(x = x, tumor_type = 'BRCA')
Interestingly, complete inactivation of TP53 appears to be associated with tropism of primary breast tumours to the Liver and CNS/Brain. In contrast, CDH1 mutations in tumours that metastasise to the lymphatic system appear to occur more often without LOH.