
Zenodo
SCOUT data are distributed across several Zenodo records organised by data type and SPN. Files are downloaded once and cached locally; repeat calls detect the cache and skip the download automatically.
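The download-once behaviour can be pictured as a check-then-fetch step. The sketch below is an illustration only, not SCOUT's implementation: `cached_fetch()`, its arguments and the `fetch` callback are hypothetical names, and a temporary directory stands in for the real cache.

```r
# Hypothetical helper (not part of SCOUT): fetch a file only when it is not
# already present under <cache_root>/<spn>/.
# `fetch` is any function(dest) that writes the file to `dest`.
cached_fetch <- function(filename, spn, fetch, cache_root) {
  dest_dir <- file.path(cache_root, spn)
  dest <- file.path(dest_dir, filename)
  if (file.exists(dest)) {
    message("Using cached copy: ", dest)
    return(invisible(dest))
  }
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  fetch(dest)
  invisible(dest)
}

# Stand-in for a real download: writes a small file and counts its calls.
counter <- new.env()
counter$n <- 0
fake_fetch <- function(dest) {
  counter$n <- counter$n + 1
  writeLines("payload", dest)
}

root <- tempfile("scout-cache-")
p1 <- cached_fetch("demo.txt", "SPN04", fake_fetch, cache_root = root)
p2 <- cached_fetch("demo.txt", "SPN04", fake_fetch, cache_root = root)
```

The second call finds the file on disk and never invokes the callback, which mirrors the effect described above for repeat downloads.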
Cache location
By default, files are stored at ~/.cache/SCOUT/<spn>/. Override with:
Sys.setenv(SCOUT_CACHE_DIR = "/scratch/shared/SCOUT")
Archive contents
SPN0X_sequencing.tar.gz
Contains tumour mutation RDS files for all purity and coverage combinations.
sequencing/
└── tumour/
    ├── purity_0.9/data/mutations/seq_results_muts_merged_coverage_50x.rds
    │                             seq_results_muts_merged_coverage_100x.rds
    │                             seq_results_muts_merged_coverage_150x.rds
    │                             seq_results_muts_merged_coverage_200x.rds
    ├── purity_0.6/data/mutations/ (same files)
    └── purity_0.3/data/mutations/ (same files)
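The layout above maps each purity and coverage pair to one file. As a rough sketch, the relative path inside the extracted archive can be assembled as follows. `tumour_mutation_path()` is a hypothetical helper written for illustration, not a SCOUT function; get_mutations() remains the documented way to obtain these paths.

```r
# Hypothetical helper (not a SCOUT function): relative path of a tumour
# mutation file inside the extracted sequencing archive, following the
# directory tree shown above.
tumour_mutation_path <- function(purity, coverage) {
  file.path(
    "sequencing", "tumour", paste0("purity_", purity), "data", "mutations",
    paste0("seq_results_muts_merged_coverage_", coverage, "x.rds")
  )
}

# Enumerate every purity/coverage combination listed in the tree.
combos <- expand.grid(purity = c(0.9, 0.6, 0.3),
                      coverage = c(50, 100, 150, 200))
combos$path <- tumour_mutation_path(combos$purity, combos$coverage)
```

Three purities and four coverages give twelve expected files per SPN, which can serve as a quick completeness check after extraction.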
SPN0X_normal_sequencing.tar.gz
Contains the matched normal mutation RDS file (fixed at 30x, purity 1).
sequencing/
└── normal/
    └── purity_1/data/mutations/seq_results_muts_merged_coverage_30x.rds
sarek.tar.gz
Contains Sarek variant calling results for all callers and samples.
sarek/
└── <coverage>x_<purity>p/
    └── variant_calling/
        ├── mutect2/<spn>/ (SPN-level)
        ├── strelka/<sample>_vs_normal_sample/
        ├── freebayes/<sample>_vs_normal_sample/
        │             normal_sample/
        ├── haplotypecaller/normal_sample/
        ├── ascat/<sample>_vs_normal_sample/
        ├── battenberg/<sample>_vs_normal_sample/
        ├── sequenza/<sample>_vs_normal_sample/
        └── cnvkit/<sample>_vs_normal_sample/
tumourevo.tar.gz
Contains tumourevo pipeline results for all tools and samples.
tumourevo/
└── <coverage>x_<purity>p_<vcf_caller>_<cna_caller>/
    ├── formatter/
    │   ├── vcf2cnaqc/SCOUT/<spn>/<sample>/ (SNV RDS per sample)
    │   ├── cna2cnaqc/SCOUT/<spn>/<sample>/ (CNA RDS per sample)
    │   └── cnaqc2tsv/SCOUT/<spn>/ (joint table TSV)
    ├── variant_annotation/vep/SCOUT/<spn>/<sample>/
    ├── driver_annotation/annotate_driver/SCOUT/<spn>/<sample>/
    ├── qc/
    │   ├── cnaqc/SCOUT/<spn>/<sample>/
    │   ├── join_cnaqc/SCOUT/<spn>/
    │   └── tinc/SCOUT/<spn>/<sample>/
    ├── subclonal_deconvolution/
    │   ├── mobster/SCOUT/<spn>/<sample>/
    │   ├── viber/SCOUT/<spn>/
    │   ├── pyclonevi/SCOUT/<spn>/
    │   └── ctree/SCOUT/<spn>/ (and per sample)
    └── signature_deconvolution/
        ├── BASCULE/SCOUT/
        ├── sigprofiler/SCOUT/results/<context>/
        └── sparsesignatures/SCOUT/
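The top-level directory name encodes the coverage, purity and caller pair. A minimal sketch of splitting such a name back into its parts is given below. `parse_combination()` is a hypothetical helper, not part of SCOUT, and the example name assumes purity is written as a decimal (for instance 0.6) and that caller names contain no underscores.

```r
# Hypothetical helper (not a SCOUT function): split directory names of the
# form <coverage>x_<purity>p_<vcf_caller>_<cna_caller> into columns.
# Names that do not follow the pattern yield a row of NA values.
parse_combination <- function(dir_names) {
  pattern <- "^([0-9]+)x_([0-9.]+)p_([^_]+)_([^_]+)$"
  matches <- regmatches(dir_names, regexec(pattern, dir_names))
  rows <- lapply(matches, function(x) {
    if (length(x) == 5) x[-1] else rep(NA_character_, 4)
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- c("coverage", "purity", "vcf_caller", "cna_caller")
  out$coverage <- as.integer(out$coverage)
  out$purity <- as.numeric(out$purity)
  out
}

# The first name is an assumed example; the second lacks the caller suffix.
parsed <- parse_combination(c("50x_0.6p_mutect2_sequenza", "100x_0.3p"))
```

Applied to the output of `list.files()` on an extracted tumourevo directory, a table like this shows which combinations are present on disk.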
Downloading data
get_sequencing_data()
Downloads SPN0X_sequencing.tar.gz from the SPN’s
sequencing record. Contains tumour mutation RDS files for all purity and
coverage combinations.
get_sequencing_data("SPN04")
get_normal_data()
Downloads SPN0X_normal.tar.gz from the shared normal
record. Contains haplotypecaller and freebayes VCF files for the normal
sample.
get_normal_data("SPN04")
get_sarek_results()
Downloads sarek.tar.gz for a given SPN and purity.
get_sarek_results("SPN01", purity = 0.9)
get_sarek_results("SPN01", purity = 0.6)
get_sarek_results("SPN01", purity = 0.3)
get_tumourevo_results()
Downloads tumourevo.tar.gz for a given SPN and
purity.
get_tumourevo_results("SPN01", purity = 0.9)
list_zenodo_files()
Inspect what is available in a record before downloading.
list_zenodo_files("1234567")
#> # A tibble: 2 × 3
#> filename size download_url
#> <chr> <int> <chr>
#> 1 sarek.tar.gz ... https://zenodo.org/...
#> 2 tumourevo.tar.gz ... https://zenodo.org/...
Ground truth getters
After get_sequencing_data() or
get_normal_data(), use these functions to access specific
files without navigating the directory structure.
get_mutations()
Returns the path to the mutations RDS file for a given type, coverage and purity.
# Tumour (requires coverage and purity)
path <- get_mutations("SPN04", type = "tumour", coverage = 100, purity = 0.9)
readRDS(path)
# Normal (fixed at 30x, purity 1)
path <- get_mutations("SPN04", type = "normal")
Sarek getters
After get_sarek_results() or
get_normal_data(), these functions return named lists of
file paths for a given sample, coverage, purity and caller.
get_sarek_vcf()
Returns a named list of VCF and index file paths. Supported callers:
"mutect2", "strelka",
"freebayes", "haplotypecaller". Sample naming
convention: "SPN04_1.1".
# mutect2 — SPN-level result, no sample needed
vcf <- get_sarek_vcf("SPN04", NULL, 100, 0.3, "mutect2")
vcf$vcf
vcf$tbi
# strelka — one somatic VCF
vcf <- get_sarek_vcf("SPN04", "SPN04_1.1", 100, 0.3, "strelka")
vcf$vcf
# freebayes
vcf <- get_sarek_vcf("SPN04", "SPN04_1.1", 100, 0.3, "freebayes")
vcf$vcf
# haplotypecaller — normal sample (sample = NULL)
vcf <- get_sarek_vcf("SPN04", NULL, 100, 0.3, "haplotypecaller")
vcf$vcf
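Because the getters return named lists of paths, it can be useful to confirm that every listed file is actually on disk before handing it to a reader. The sketch below uses a mock list built from temporary files rather than a real getter result; `missing_entries()` is a hypothetical helper, not a SCOUT function.

```r
# Hypothetical helper (not a SCOUT function): names of the list entries
# whose file does not exist on disk.
missing_entries <- function(paths) {
  paths <- unlist(paths)
  names(paths)[!file.exists(paths)]
}

# Mock of a getter result: the VCF exists, the index does not.
vcf_file <- tempfile(fileext = ".vcf.gz")
file.create(vcf_file)
mock <- list(vcf = vcf_file, tbi = paste0(vcf_file, ".tbi"))

missing_entries(mock)
#> [1] "tbi"
```

An empty result means every path in the list points to an existing file.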
get_sarek_cna()
Returns a named list of CNA file paths. Supported callers:
"ascat", "battenberg",
"sequenza", "cnvkit".
# ASCAT
cna <- get_sarek_cna("SPN04", "SPN04_1.1", 100, 0.3, "ascat")
cna$segments
cna$purityploidy
cna$cnvs
# Battenberg
cna <- get_sarek_cna("SPN04", "SPN04_1.1", 100, 0.3, "battenberg")
cna$subclones
cna$rho_and_psi
# Sequenza
cna <- get_sarek_cna("SPN04", "SPN04_1.1", 100, 0.3, "sequenza")
cna$segments
cna$confints_CP
# CNVkit
cna <- get_sarek_cna("SPN04", "SPN04_1.1", 100, 0.3, "cnvkit")
cna$cns
tumourevo getters
After get_tumourevo_results(), these functions return
named lists of file paths. All require spn,
coverage, purity, vcf_caller
("mutect2" or "strelka"), and
cna_caller ("ascat" or
"sequenza"). Sample naming convention:
"SPN04_1.1".
Formatter
# Formatted SNV RDS (vcf2cnaqc)
get_tumourevo_snv("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")
# Formatted CNA RDS (cna2cnaqc)
get_tumourevo_cna("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")
# Joint CNAqc table (cnaqc2tsv) — one file per combination
get_tumourevo_joint_table("SPN04", 50, 0.6, "mutect2", "sequenza")
Variant annotation
# VEP annotated VCF
get_tumourevo_vep("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")
Driver annotation
# One sample
get_tumourevo_driver("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")
# All samples (sample = NULL)
get_tumourevo_driver("SPN04", 50, 0.6, "mutect2", "sequenza")
Subclonal deconvolution
Supported tools: "mobster", "pyclonevi",
"ctree", "viber".
# mobster — sample required
get_tumourevo_subclonal("SPN04", 50, 0.6, "mutect2", "sequenza",
                        "mobster", "SPN04_1.1")
# viber — SPN-level result, no sample needed
get_tumourevo_subclonal("SPN04", 50, 0.6, "mutect2", "sequenza", "viber")
# ctree — SPN-level trees (sample = NULL) or per-sample
get_tumourevo_subclonal("SPN04", 50, 0.6, "mutect2", "sequenza", "ctree")
get_tumourevo_subclonal("SPN04", 50, 0.6, "mutect2", "sequenza",
                        "ctree", "SPN04_1.1")
QC
Supported tools: "cnaqc", "join_cnaqc",
"tinc".
# cnaqc / tinc — sample required
get_tumourevo_qc("SPN04", 50, 0.6, "mutect2", "sequenza", "cnaqc", "SPN04_1.1")
get_tumourevo_qc("SPN04", 50, 0.6, "mutect2", "sequenza", "tinc", "SPN04_1.1")
# join_cnaqc — SPN-level, no sample needed
get_tumourevo_qc("SPN04", 50, 0.6, "mutect2", "sequenza", "join_cnaqc")
Signature deconvolution
Supported tools: "sigprofiler",
"sparsesignatures", "BASCULE".
sigprofiler requires a context argument
(e.g. "SBS96", "ID83",
"DBS78").
# BASCULE
sigs <- get_tumourevo_signatures("SPN04", 50, 0.6, "mutect2", "sequenza",
                                 "BASCULE")
sigs$refined_fit
sigs$base_fit
# SigProfiler
sigs <- get_tumourevo_signatures("SPN04", 50, 0.6, "mutect2", "sequenza",
                                 "sigprofiler", context = "SBS96")
sigs$context_matrix
sigs$COSMIC_exposure
sigs$denovo_signatures
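All tumourevo getters share the spn, coverage, purity, vcf_caller and cna_caller arguments, so results for many combinations can be collected in one pass. The following is a sketch under assumptions: the coverage and purity values are taken from the sequencing archive listing and may not all exist for tumourevo, `for_each_combination()` is a hypothetical helper, and `label()` is a stand-in used so the example runs without any downloaded data.

```r
# Argument grid. Coverage and purity values are assumed to match the
# sequencing archive; the callers are those listed for the tumourevo getters.
grid <- expand.grid(
  coverage = c(50, 100, 150, 200),
  purity = c(0.3, 0.6, 0.9),
  vcf_caller = c("mutect2", "strelka"),
  cna_caller = c("ascat", "sequenza"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a SCOUT function): call `getter` once per grid row.
# `getter` is any function(spn, coverage, purity, vcf_caller, cna_caller).
for_each_combination <- function(spn, getter, grid) {
  Map(
    function(coverage, purity, vcf_caller, cna_caller) {
      getter(spn, coverage, purity, vcf_caller, cna_caller)
    },
    grid$coverage, grid$purity, grid$vcf_caller, grid$cna_caller
  )
}

# Stand-in getter that only labels the combination it was called with.
label <- function(spn, coverage, purity, vcf_caller, cna_caller) {
  paste(spn, coverage, purity, vcf_caller, cna_caller, sep = "|")
}

res <- for_each_combination("SPN04", label, grid)
```

With the data downloaded, a getter that takes exactly these five arguments, such as get_tumourevo_joint_table(), could be passed in place of `label`.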