
SCOUT data are distributed across several Zenodo records organised by data type and SPN. Files are downloaded once and cached locally; repeat calls detect the cached copy and skip the download automatically.
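The download-once behaviour can be pictured with a small base R sketch. It is not the package's implementation: fetch_once() and fake_download() are hypothetical names, and the downloader is a stand-in that writes a placeholder file instead of contacting Zenodo.

```r
# Hypothetical helper: call `download_fn(dest)` only when `dest` is not cached.
fetch_once <- function(dest, download_fn) {
  if (file.exists(dest)) {
    message("Using cached file: ", dest)
    return(invisible(dest))
  }
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  download_fn(dest)
  invisible(dest)
}

# Stand-in downloader that counts how often it is called.
n_downloads <- 0
fake_download <- function(dest) {
  n_downloads <<- n_downloads + 1
  writeLines("placeholder", dest)
}

dest <- file.path(tempfile("scout_cache_"), "SPN01", "sarek.tar.gz")
fetch_once(dest, fake_download)  # first call: file is created
fetch_once(dest, fake_download)  # second call: cache hit, no download
n_downloads
```

After both calls the counter is 1, because the second call finds the file and returns early.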

Cache location

By default, files are stored under ~/.cache/SCOUT/<spn>/. To use a different location, set the SCOUT_CACHE_DIR environment variable:

Sys.setenv(SCOUT_CACHE_DIR = "/scratch/shared/SCOUT")
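To check where files for a given SPN would end up, the lookup can be mimicked in base R. This is a sketch under two assumptions: that SCOUT_CACHE_DIR replaces the ~/.cache/SCOUT root, and that the <spn> subdirectory is still appended to it. scout_cache_dir() is a hypothetical helper, not a package function.

```r
# Hypothetical helper mirroring the documented default and override.
scout_cache_dir <- function(spn) {
  root <- Sys.getenv("SCOUT_CACHE_DIR", unset = "")
  if (!nzchar(root)) {
    # Variable unset or empty: fall back to the documented default root.
    root <- file.path("~", ".cache", "SCOUT")
  }
  # Assumption: the per-SPN subdirectory is appended to the root.
  file.path(path.expand(root), spn)
}

Sys.setenv(SCOUT_CACHE_DIR = "/scratch/shared/SCOUT")
scout_cache_dir("SPN01")
#> [1] "/scratch/shared/SCOUT/SPN01"
```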

Zenodo record structure


Archive contents

SPN0X_sequencing.tar.gz

Contains tumour mutation RDS files for all purity and coverage combinations.

sequencing/
└── tumour/
    ├── purity_0.9/data/mutations/seq_results_muts_merged_coverage_50x.rds
    │                             seq_results_muts_merged_coverage_100x.rds
    │                             seq_results_muts_merged_coverage_150x.rds
    │                             seq_results_muts_merged_coverage_200x.rds
    ├── purity_0.6/data/mutations/  (same files)
    └── purity_0.3/data/mutations/  (same files)
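The tree above implies twelve tumour files (three purities by four coverages). A short base R sketch that enumerates their relative paths inside the archive, using only the names shown in the layout:

```r
purities  <- c(0.9, 0.6, 0.3)
coverages <- c(50, 100, 150, 200)

# One row per purity/coverage combination.
grid <- expand.grid(coverage = coverages, purity = purities)

# Relative path of each mutations file, following the tree above.
grid$path <- file.path(
  "sequencing", "tumour",
  paste0("purity_", grid$purity),
  "data", "mutations",
  paste0("seq_results_muts_merged_coverage_", grid$coverage, "x.rds")
)

nrow(grid)
#> [1] 12
grid$path[1]
#> [1] "sequencing/tumour/purity_0.9/data/mutations/seq_results_muts_merged_coverage_50x.rds"
```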

SPN0X_normal_sequencing.tar.gz

Contains the matched normal mutation RDS file (fixed at 30x, purity 1).

sequencing/
└── normal/
    └── purity_1/data/mutations/seq_results_muts_merged_coverage_30x.rds

sarek.tar.gz

Contains Sarek variant calling results for all callers and samples.

sarek/
└── <coverage>x_<purity>p/
    └── variant_calling/
        ├── mutect2/<spn>/                         (SPN-level)
        ├── strelka/<sample>_vs_normal_sample/
        ├── freebayes/<sample>_vs_normal_sample/
        │           normal_sample/
        ├── haplotypecaller/normal_sample/
        ├── ascat/<sample>_vs_normal_sample/
        ├── battenberg/<sample>_vs_normal_sample/
        ├── sequenza/<sample>_vs_normal_sample/
        └── cnvkit/<sample>_vs_normal_sample/

tumourevo.tar.gz

Contains tumourevo pipeline results for all tools and samples.

tumourevo/
└── <coverage>x_<purity>p_<vcf_caller>_<cna_caller>/
    ├── formatter/
    │   ├── vcf2cnaqc/SCOUT/<spn>/<sample>/        (SNV RDS per sample)
    │   ├── cna2cnaqc/SCOUT/<spn>/<sample>/        (CNA RDS per sample)
    │   └── cnaqc2tsv/SCOUT/<spn>/                 (joint table TSV)
    ├── variant_annotation/vep/SCOUT/<spn>/<sample>/
    ├── driver_annotation/annotate_driver/SCOUT/<spn>/<sample>/
    ├── qc/
    │   ├── cnaqc/SCOUT/<spn>/<sample>/
    │   ├── join_cnaqc/SCOUT/<spn>/
    │   └── tinc/SCOUT/<spn>/<sample>/
    ├── subclonal_deconvolution/
    │   ├── mobster/SCOUT/<spn>/<sample>/
    │   ├── viber/SCOUT/<spn>/
    │   ├── pyclonevi/SCOUT/<spn>/
    │   └── ctree/SCOUT/<spn>/  (and per sample)
    └── signature_deconvolution/
        ├── BASCULE/SCOUT/
        ├── sigprofiler/SCOUT/results/<context>/
        └── sparsesignatures/SCOUT/

Downloading data

get_sequencing_data()

Downloads SPN0X_sequencing.tar.gz from the SPN’s sequencing record. Contains tumour mutation RDS files for all purity and coverage combinations.

get_normal_data()

Downloads SPN0X_normal.tar.gz from the shared normal record. Contains haplotypecaller and freebayes VCF files for the normal sample.

get_sarek_results()

Downloads sarek.tar.gz for a given SPN and purity.

get_sarek_results("SPN01", purity = 0.9)
get_sarek_results("SPN01", purity = 0.6)
get_sarek_results("SPN01", purity = 0.3)

get_tumourevo_results()

Downloads tumourevo.tar.gz for a given SPN and purity.

get_tumourevo_results("SPN01", purity = 0.9)

list_zenodo_files()

Inspect what is available in a record before downloading.

list_zenodo_files("1234567")
#> # A tibble: 2 × 3
#>   filename                size download_url
#>   <chr>                  <int> <chr>
#> 1 sarek.tar.gz             ...  https://zenodo.org/...
#> 2 tumourevo.tar.gz         ...  https://zenodo.org/...

Ground truth getters

After get_sequencing_data() or get_normal_data(), use these functions to access specific files without navigating the directory structure.

get_mutations()

Returns the path to the mutations RDS file for a given type, coverage and purity.

# Tumour (requires coverage and purity)
path <- get_mutations("SPN04", type = "tumour", coverage = 100, purity = 0.9)
readRDS(path)

# Normal (fixed at 30x, purity 1)
path <- get_mutations("SPN04", type = "normal")
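Because get_mutations() returns a path rather than the data, it can help to guard the read so that a missing download produces a clear error. The sketch below is one way to do that; read_mutations() is a hypothetical wrapper, and the demo writes a placeholder data frame to a temporary file, so its columns say nothing about the real RDS contents.

```r
# Hypothetical wrapper: fail with a clear message if the file is not cached.
read_mutations <- function(path) {
  if (!file.exists(path)) {
    stop("Mutations file not found: ", path,
         "\nDownload the archive first.", call. = FALSE)
  }
  readRDS(path)
}

# Self-contained demo with a placeholder object (not real SCOUT data).
demo_path <- tempfile(fileext = ".rds")
saveRDS(data.frame(chr = "1", pos = 12345L), demo_path)
muts <- read_mutations(demo_path)
str(muts)
```

In practice the path argument would be the value returned by get_mutations().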

Sarek getters

After get_sarek_results() or get_normal_data(), these functions return named lists of file paths for a given sample, coverage, purity and caller.

get_sarek_vcf()

Returns a named list of VCF and index file paths. Supported callers: "mutect2", "strelka", "freebayes", "haplotypecaller". Sample naming convention: "SPN04_1.1".

# mutect2 — SPN-level result, no sample needed
vcf <- get_sarek_vcf("SPN04", NULL, 100, 0.3, "mutect2")
vcf$vcf
vcf$tbi

# strelka — one somatic VCF
vcf <- get_sarek_vcf("SPN04", "SPN04_1.1", 100, 0.3, "strelka")
vcf$vcf

# freebayes
vcf <- get_sarek_vcf("SPN04", "SPN04_1.1", 100, 0.3, "freebayes")
vcf$vcf

# haplotypecaller — normal sample (sample = NULL)
vcf <- get_sarek_vcf("SPN04", NULL, 100, 0.3, "haplotypecaller")
vcf$vcf
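Since the getters return named lists of paths, a quick existence check can catch an incomplete download before a VCF reader fails. The sketch below uses a mock list in place of a real get_sarek_vcf() result; missing_files() is a hypothetical helper.

```r
# Hypothetical helper: names of list entries whose file does not exist.
missing_files <- function(paths) {
  paths <- unlist(paths)
  names(paths)[!file.exists(paths)]
}

# Mock result: the VCF exists, the index does not.
vcf_file <- tempfile(fileext = ".vcf.gz")
file.create(vcf_file)
vcf <- list(vcf = vcf_file, tbi = paste0(vcf_file, ".tbi"))

missing_files(vcf)
#> [1] "tbi"
```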

get_sarek_cna()

Returns a named list of CNA file paths. Supported callers: "ascat", "battenberg", "sequenza", "cnvkit".

# ASCAT
cna <- get_sarek_cna("SPN04", "SPN04_1.1", 100, 0.3, "ascat")
cna$segments
cna$purityploidy
cna$cnvs

# Battenberg
cna <- get_sarek_cna("SPN04", "SPN04_1.1", 100, 0.3, "battenberg")
cna$subclones
cna$rho_and_psi

# Sequenza
cna <- get_sarek_cna("SPN04", "SPN04_1.1", 100, 0.3, "sequenza")
cna$segments
cna$confints_CP

# CNVkit
cna <- get_sarek_cna("SPN04", "SPN04_1.1", 100, 0.3, "cnvkit")
cna$cns

tumourevo getters

After get_tumourevo_results(), these functions return named lists of file paths. All require spn, coverage, purity, vcf_caller ("mutect2" or "strelka"), and cna_caller ("ascat" or "sequenza"). Sample naming convention: "SPN04_1.1".

Formatter

# Formatted SNV RDS (vcf2cnaqc)
get_tumourevo_snv("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")

# Formatted CNA RDS (cna2cnaqc)
get_tumourevo_cna("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")

# Joint CNAqc table (cnaqc2tsv) — one file per combination
get_tumourevo_joint_table("SPN04", 50, 0.6, "mutect2", "sequenza")

Variant annotation

# VEP annotated VCF
get_tumourevo_vep("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")

Driver annotation

# One sample
get_tumourevo_driver("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")

# All samples (sample = NULL)
get_tumourevo_driver("SPN04", 50, 0.6, "mutect2", "sequenza")

Subclonal deconvolution

Supported tools: "mobster", "pyclonevi", "ctree", "viber".

# mobster — sample required
get_tumourevo_subclonal("SPN04", 50, 0.6, "mutect2", "sequenza",
                         "mobster", "SPN04_1.1")

# viber — SPN-level result, no sample needed
get_tumourevo_subclonal("SPN04", 50, 0.6, "mutect2", "sequenza", "viber")

# ctree — SPN-level trees (sample = NULL) or per-sample
get_tumourevo_subclonal("SPN04", 50, 0.6, "mutect2", "sequenza", "ctree")
get_tumourevo_subclonal("SPN04", 50, 0.6, "mutect2", "sequenza",
                         "ctree", "SPN04_1.1")

QC

Supported tools: "cnaqc", "join_cnaqc", "tinc".

# cnaqc / tinc — sample required
get_tumourevo_qc("SPN04", 50, 0.6, "mutect2", "sequenza", "cnaqc", "SPN04_1.1")
get_tumourevo_qc("SPN04", 50, 0.6, "mutect2", "sequenza", "tinc",  "SPN04_1.1")

# join_cnaqc — SPN-level, no sample needed
get_tumourevo_qc("SPN04", 50, 0.6, "mutect2", "sequenza", "join_cnaqc")
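The sample rules for the subclonal deconvolution and QC tools can be collected in one lookup, which is handy when looping over tools. The table below is transcribed from the examples and the archive layout above (pyclonevi is taken to be SPN-level from the layout). check_sample() is a hypothetical validator, and how the package itself reacts to a missing or superfluous sample is not covered here.

```r
# Sample requirement per tool, transcribed from the examples and layout above.
sample_rule <- c(
  mobster    = "required",
  viber      = "spn-level",
  pyclonevi  = "spn-level",
  ctree      = "optional",
  cnaqc      = "required",
  tinc       = "required",
  join_cnaqc = "spn-level"
)

# Hypothetical validator: stop early when a per-sample tool gets no sample.
check_sample <- function(tool, sample = NULL) {
  if (!tool %in% names(sample_rule)) {
    stop("Unknown tool: ", tool, call. = FALSE)
  }
  rule <- sample_rule[[tool]]
  if (rule == "required" && is.null(sample)) {
    stop("Tool '", tool, "' needs a sample, e.g. \"SPN04_1.1\"", call. = FALSE)
  }
  invisible(rule)
}

check_sample("mobster", "SPN04_1.1")  # passes
check_sample("viber")                 # passes, SPN-level result
check_sample("ctree")                 # passes, sample is optional
```

Calling check_sample("tinc") without a sample stops with an error.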

Signature deconvolution

Supported tools: "sigprofiler", "sparsesignatures", "BASCULE". sigprofiler requires a context argument (e.g. "SBS96", "ID83", "DBS78").

# BASCULE
sigs <- get_tumourevo_signatures("SPN04", 50, 0.6, "mutect2", "sequenza",
                                  "BASCULE")
sigs$refined_fit
sigs$base_fit

# SigProfiler
sigs <- get_tumourevo_signatures("SPN04", 50, 0.6, "mutect2", "sequenza",
                                  "sigprofiler", context = "SBS96")
sigs$context_matrix
sigs$COSMIC_exposure
sigs$denovo_signatures