
Getting started with SCOUT
The SCOUT package gives you direct access to the
Simulated Cohort of Universal Tumours from R.
| Source | What is stored there | Key functions |
|---|---|---|
| Tables | Cohort metadata, ground truth tables | get_metadata(), get_ground_truth_cna(), get_ground_truth_drivers(), get_ground_truth_exposures(), get_sampling_information() |
| Zenodo | Sequencing RDS files, normal and tumour sarek results, tumourevo results | get_sequencing_data(), get_normal_data(), get_sarek_results(), get_tumourevo_results() |
| ENA (PRJEB97253) | Raw paired-end FASTQ files: 144 tumour + 14 normal (200× per sample, subsamplable to 50×/100×/150×) | None (see the Raw FASTQ data article) |
Tables
Cohort metadata and ground truth tables are stored as public Google Sheets.
| Function | Description |
|---|---|
| get_metadata() | Sample-level annotations (tumour type, clonal class, WGD, sex, …) |
| get_ground_truth_cna() | Ground truth copy number segments |
| get_ground_truth_drivers() | Ground truth driver events (SNVs, CNAs, WGD) |
| get_ground_truth_exposures() | Ground truth mutational signature exposures |
| get_sampling_information() | Per-sample clone proportions and sampling time |
| get_sample_names() | Sample names for a given SPN |
| get_tumour_type() | Tumour type for a given SPN |
| get_gender() | Sex chromosome for a given SPN |
All table functions accept optional spn and sample arguments:
get_metadata()
get_ground_truth_cna("SPN01")
get_ground_truth_cna("SPN01", sample = "1.1")
get_ground_truth_drivers("SPN01")
get_ground_truth_exposures("SPN01", type = "SBS")
get_sampling_information("SPN01")
get_sample_names("SPN01")
get_tumour_type("SPN01")
get_gender("SPN01")
See the Tables article for the full column-level reference.
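Because every table function takes an SPN as its first argument, per-SPN calls lend themselves to a simple loop. The sketch below collects the ground truth driver tables into a named list. It assumes that the SCOUT package is installed and that cohort IDs run from SPN01 to SPN07 (inferred from the seven SPNs mentioned in this article); the call is guarded so the snippet does nothing when the package is missing.

```r
# Cohort IDs SPN01..SPN07 (assumption: inferred from the seven SPNs in this article)
spns <- sprintf("SPN%02d", 1:7)

if (requireNamespace("SCOUT", quietly = TRUE)) {
  # One ground truth driver table per SPN, named by SPN for easy lookup
  drivers <- lapply(setNames(spns, spns), SCOUT::get_ground_truth_drivers)
  names(drivers)
}
```

The same pattern should work for the other per-SPN table functions, for example get_ground_truth_cna() or get_sampling_information().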
Zenodo
Data on Zenodo are organised as follows:
| Data type | Content |
|---|---|
| Sequencing ground truth | One record per SPN (SPN0X_sequencing.tar.gz) |
| Normal sarek outputs | One shared record, with one SPN0X_normal.tar.gz per SPN |
| Sarek + tumourevo (SPN01–06) | One record per SPN per purity (0.9, 0.6, 0.3) |
| SPN07 sarek | One record per purity |
| SPN07 tumourevo | One record for purity 0.9 + 0.6, one for 0.3 |
For the full list of record IDs see the Zenodo article.
Files are downloaded once and cached at ~/.cache/SCOUT/<spn>/. Override the cache root with SCOUT_CACHE_DIR:
Sys.setenv(SCOUT_CACHE_DIR = "/scratch/shared/SCOUT")
Download functions
# Tumour sequencing ground truth (all purities and coverages)
get_sequencing_data("SPN04")
# Normal sarek VCF outputs
get_normal_data("SPN04")
# Sarek and tumourevo pipeline results for a given purity
get_sarek_results("SPN04", purity = 0.9)
get_tumourevo_results("SPN04", purity = 0.9)
Getter functions
Once downloaded, dedicated getters resolve file paths without manual directory navigation:
# Ground truth mutations
get_mutations("SPN04", type = "tumour", coverage = 100, purity = 0.9)
get_mutations("SPN04", type = "normal")
# Sarek VCF and CNA files
get_sarek_vcf("SPN04", "SPN04_1.1", 100, 0.9, "mutect2", "tumour")
get_sarek_cna("SPN04", "SPN04_1.1", 100, 0.9, "ascat")
# tumourevo outputs
get_tumourevo_snv("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")
get_tumourevo_cna("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")
get_tumourevo_driver("SPN04", 50, 0.6, "mutect2", "sequenza", "SPN04_1.1")
get_tumourevo_subclonal("SPN04", 50, 0.6, "mutect2", "sequenza", "mobster", "SPN04_1.1")
get_tumourevo_qc("SPN04", 50, 0.6, "mutect2", "sequenza", "cnaqc", "SPN04_1.1")
get_tumourevo_signatures("SPN04", 50, 0.6, "mutect2", "sequenza", "BASCULE")
See the Zenodo article for the full function reference.
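To check where downloaded files for a given SPN should end up, the cache rule described in this section can be mirrored in a few lines of base R. This is a minimal sketch, not the package's own code: the helper name scout_cache_path() is hypothetical, and it assumes that SCOUT_CACHE_DIR replaces the default ~/.cache/SCOUT root while the per-SPN subdirectory is kept.

```r
# Hypothetical helper (not part of SCOUT): mirrors the documented cache layout.
# Assumption: SCOUT_CACHE_DIR, when set, replaces the default root ~/.cache/SCOUT
# and files still go into a per-SPN subdirectory.
scout_cache_path <- function(spn) {
  root <- Sys.getenv("SCOUT_CACHE_DIR", unset = "")
  if (!nzchar(root)) {
    root <- file.path("~", ".cache", "SCOUT")
  }
  file.path(root, spn)
}

Sys.setenv(SCOUT_CACHE_DIR = "/scratch/shared/SCOUT")
scout_cache_path("SPN04")  # "/scratch/shared/SCOUT/SPN04"
```

Setting the variable before the first download, for example in .Renviron, keeps all sessions pointed at the same shared cache.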
Raw FASTQ data
Raw paired-end FASTQ files are available in the European Nucleotide
Archive under accession PRJEB97253. There are 79
entries in total: 72 tumour sample × purity combinations across all
SPNs, plus 7 normal samples (one per SPN). Each entry contains files
named tXX_{spn}_{sample}.R1.fastq.gz for
t00–t39 (40 bins × 5× = 200× total coverage
per sample).
Standard coverage levels can be reproduced by subsetting consecutive bins with SeqKit:
| Coverage | Bins |
|---|---|
| 50× | t00–t09 |
| 100× | t00–t19 |
| 150× | t00–t29 |
See the Raw FASTQ data article for installation instructions and the full subsampling commands.
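The bin arithmetic is simple enough to script: each bin contributes 5×, so a target coverage of 50× needs the first 10 bins. The sketch below lists the file names to pass on to a subsetting tool. The helper fastq_bins() is hypothetical (not part of SCOUT), the sample identifier "1.1" is only a placeholder for the {sample} field, and the read argument assumes that the second mate follows the same naming pattern with R2, which this article does not state.

```r
# Hypothetical helper (not part of SCOUT): list the bin files for a target coverage.
# Each bin holds 5x, so the first coverage / 5 bins (t00, t01, ...) are selected.
fastq_bins <- function(spn, sample, coverage, read = "R1",
                       bin_depth = 5, n_bins = 40) {
  stopifnot(coverage > 0,
            coverage %% bin_depth == 0,
            coverage <= bin_depth * n_bins)
  n <- coverage / bin_depth
  bins <- sprintf("t%02d", seq_len(n) - 1L)
  # File name pattern from this article: tXX_{spn}_{sample}.R1.fastq.gz
  sprintf("%s_%s_%s.%s.fastq.gz", bins, spn, sample, read)
}

files <- fastq_bins("SPN01", "1.1", coverage = 50)
length(files)     # 10
files[c(1, 10)]   # "t00_SPN01_1.1.R1.fastq.gz" "t09_SPN01_1.1.R1.fastq.gz"
```

Requesting a coverage that is not a multiple of 5× or that exceeds 200× stops with an error instead of silently rounding.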