Filter observed counts by quantile.
filter_counts_by_quantile(x, upper_quantile = 0.98)
An input RNA or ATAC dataset whose entries are indexable by genomic coordinate: "chr", "from" and "to".
The upper quantile used as the cutoff (default 0.98). Entries whose value is above this quantile are removed.
The input data with the entries above the quantile removed.
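To make the filtering rule concrete, here is a minimal base-R sketch of a quantile filter. It is not the package's implementation: the helper name `sketch_filter_by_quantile` and the toy data are hypothetical, and the choice to compute the threshold separately for each genomic segment ("chr", "from", "to") is an assumption, suggested only by the row-specific `q_max` values printed in the examples.

```r
# Hypothetical sketch of a per-segment quantile filter (not the package code).
sketch_filter_by_quantile <- function(x, upper_quantile = 0.98) {
  # Assumption: one group per genomic segment, identified by its coordinates.
  key <- paste(x$chr, x$from, x$to, sep = ":")

  # Per-group threshold, repeated for every row that belongs to the group.
  q_max <- ave(
    x$value, key,
    FUN = function(v) quantile(v, upper_quantile, names = FALSE)
  )

  # Keep only the rows whose value does not exceed the group threshold.
  x[x$value <= q_max, , drop = FALSE]
}

# Toy data: two segments, five cells each; the first has one outlying count.
toy <- data.frame(
  chr   = "chr1",
  from  = rep(c(100L, 500L), each = 5),
  to    = rep(c(200L, 600L), each = 5),
  cell  = rep(paste0("cell", 1:5), times = 2),
  value = c(1, 1, 1, 1, 10, 2, 2, 2, 2, 2)
)

filtered <- sketch_filter_by_quantile(toy, upper_quantile = 0.8)
```

On the toy data, only the count of 10 in the first segment exceeds its segment's 0.8 quantile, so the sketch drops that single row and keeps the other nine. A segment whose counts are all equal loses nothing, because no value lies strictly above its own quantile.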
data('example_input')
filter_counts_by_quantile(example_input$rna, upper_quantile = .98)
#> ── Upper quantile 0.98
#> ℹ n = 5961 entries to remove
#>
#> # A tibble: 5,961 × 8
#> gene chr from to cell value q_max del
#> <chr> <chr> <int> <int> <chr> <int> <dbl> <lgl>
#> 1 ACAP3 chr1 1292390 1309609 bcc.su008.pre.tumor_AAGG… 2 1.88 TRUE
#> 2 ZBTB40 chr1 22428838 22531157 bcc.su008.pre.tumor_AAGG… 2 1.8 TRUE
#> 3 SRRM1 chr1 24631716 24673281 bcc.su008.pre.tumor_AAGG… 12 11.8 TRUE
#> 4 TXNDC12 chr1 52020131 52055191 bcc.su008.pre.tumor_AAGG… 6 5.42 TRUE
#> 5 ALG6 chr1 63367575 63438553 bcc.su008.pre.tumor_AAGG… 3 2.66 TRUE
#> 6 SSX2IP chr1 84643706 84690803 bcc.su008.pre.tumor_AAGG… 3 2.78 TRUE
#> 7 MTF2 chr1 93079235 93139079 bcc.su008.pre.tumor_AAGG… 4 3.34 TRUE
#> 8 GPSM2 chr1 108875350 108934545 bcc.su008.pre.tumor_AAGG… 8 6.5 TRUE
#> 9 CELSR2 chr1 109249539 109275751 bcc.su008.pre.tumor_AAGG… 4 3.32 TRUE
#> 10 RBM15 chr1 110338506 110346681 bcc.su008.pre.tumor_AAGG… 5 4.16 TRUE
#> # ℹ 5,951 more rows
#> # A tibble: 195,498 × 6
#> gene chr from to cell value
#> <chr> <chr> <int> <int> <chr> <int>
#> 1 NOC2L chr1 944203 959309 bcc.su008.pre.tumor_AAGGCAGTCACCGTAA 2
#> 2 AGRN chr1 1020120 1056118 bcc.su008.pre.tumor_AAGGCAGTCACCGTAA 1
#> 3 SDF4 chr1 1216909 1232067 bcc.su008.pre.tumor_AAGGCAGTCACCGTAA 1
#> 4 CPTP chr1 1324756 1328896 bcc.su008.pre.tumor_AAGGCAGTCACCGTAA 1
#> 5 AURKAIP1 chr1 1373730 1375495 bcc.su008.pre.tumor_AAGGCAGTCACCGTAA 1
#> 6 CCNL2 chr1 1385711 1399335 bcc.su008.pre.tumor_AAGGCAGTCACCGTAA 2
#> 7 MRPL20 chr1 1401909 1407293 bcc.su008.pre.tumor_AAGGCAGTCACCGTAA 1
#> 8 CDK11B chr1 1635225 1659012 bcc.su008.pre.tumor_AAGGCAGTCACCGTAA 1
#> 9 CDK11A chr1 1702379 1724357 bcc.su008.pre.tumor_AAGGCAGTCACCGTAA 1
#> 10 WRAP73 chr1 3630767 3652761 bcc.su008.pre.tumor_AAGGCAGTCACCGTAA 2
#> # ℹ 195,488 more rows
filter_counts_by_quantile(example_input$atac, upper_quantile = .98)
#>
#> ── Upper quantile 0.98
#> ℹ n = 60036 entries to remove
#>
#> # A tibble: 60,036 × 7
#> cell value chr from to q_max del
#> <chr> <int> <chr> <int> <int> <dbl> <lgl>
#> 1 SU008_Tumor_Pre_45 2 chr1 871996 872496 1.96 TRUE
#> 2 SU008_Tumor_Pre_45 4 chr1 937126 937626 3.88 TRUE
#> 3 SU008_Tumor_Pre_45 2 chr1 1050579 1051079 1.96 TRUE
#> 4 SU008_Tumor_Pre_45 4 chr1 1138211 1138711 3.92 TRUE
#> 5 SU008_Tumor_Pre_45 4 chr1 1176170 1176670 3.96 TRUE
#> 6 SU008_Tumor_Pre_45 5 chr1 1186185 1186685 4.94 TRUE
#> 7 SU008_Tumor_Pre_45 6 chr1 1238592 1239092 5.86 TRUE
#> 8 SU008_Tumor_Pre_45 6 chr1 1239980 1240480 5.44 TRUE
#> 9 SU008_Tumor_Pre_45 4 chr1 1241917 1242417 3.96 TRUE
#> 10 SU008_Tumor_Pre_45 3 chr1 1280686 1281186 2.98 TRUE
#> # ℹ 60,026 more rows
#> # A tibble: 521,771 × 5
#> cell value chr from to
#> <chr> <int> <chr> <int> <int>
#> 1 SU008_Tumor_Pre_45 1 chr1 127538 128038
#> 2 SU008_Tumor_Pre_45 2 chr1 540701 541201
#> 3 SU008_Tumor_Pre_45 4 chr1 762643 763143
#> 4 SU008_Tumor_Pre_45 3 chr1 859974 860474
#> 5 SU008_Tumor_Pre_45 1 chr1 866643 867143
#> 6 SU008_Tumor_Pre_45 2 chr1 876975 877475
#> 7 SU008_Tumor_Pre_45 3 chr1 894505 895005
#> 8 SU008_Tumor_Pre_45 2 chr1 895705 896205
#> 9 SU008_Tumor_Pre_45 1 chr1 898583 899083
#> 10 SU008_Tumor_Pre_45 3 chr1 901529 902029
#> # ℹ 521,761 more rows