R/barcodeRanks.R (62 additions, 75 deletions)
@@ -5,18 +5,13 @@
 #' @param m A numeric matrix-like object containing UMI counts, where columns represent barcoded droplets and rows represent genes.
 #' Alternatively, a \linkS4class{SummarizedExperiment} containing such a matrix.
 #' @param lower A numeric scalar specifying the lower bound on the total UMI count,
-#' at or below which all barcodes are assumed to correspond to empty droplets.
-#' @param fit.bounds A numeric vector of length 2, specifying the lower and upper bounds on the total UMI count
-#' from which to obtain a section of the curve for spline fitting.
-#' @param exclude.from An integer scalar specifying the number of highest ranking barcodes to exclude from spline fitting.
-#' Ignored if \code{fit.bounds} is specified.
+#' at or below which all barcodes are assumed to correspond to empty droplets and excluded from knee/inflection point identification.
+#' @param exclude.from An integer scalar specifying the number of highest ranking barcodes to exclude from knee/inflection point identification.
+#' @param fit.bounds,df Deprecated and ignored.
 #' @param assay.type Integer or string specifying the assay containing the count matrix.
-#' @param df Deprecated and ignored.
 #' @param ... For the generic, further arguments to pass to individual methods.
 #'
 #' For the SummarizedExperiment method, further arguments to pass to the ANY method.
-#'
-#' For the ANY method, further arguments to pass to \code{\link{smooth.spline}}.
 #' @param BPPARAM A \linkS4class{BiocParallelParam} object specifying how parallelization should be performed.
 #'
 #' @details
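The updated `lower` and `exclude.from` parameters both shrink the set of points used for knee/inflection identification. A minimal R sketch of that filtering, assuming it is applied to barcode totals sorted in decreasing order; `eligible_totals()` is a hypothetical helper for illustration, not a DropletUtils function.

```r
# Hypothetical helper (not part of DropletUtils): returns the sorted totals that
# would remain eligible for knee/inflection point identification under the
# documented meaning of `lower` and `exclude.from`.
eligible_totals <- function(totals, lower, exclude.from) {
    # Order barcodes from highest to lowest total UMI count.
    sorted <- sort(totals, decreasing = TRUE)
    # Ignore the `exclude.from` highest-ranking barcodes. Indexing by position
    # (rather than sorted[-seq_len(n)]) also behaves correctly when n is 0.
    sorted <- sorted[seq_along(sorted) > exclude.from]
    # Barcodes at or below `lower` are treated as empty droplets and dropped.
    sorted[sorted > lower]
}

# Illustrative totals; in practice these would be colSums() of the count matrix.
totals <- c(5000, 4200, 3100, 800, 650, 90, 40, 100, 12)
eligible_totals(totals, lower = 100, exclude.from = 2)
# [1] 3100  800  650
```

Because `lower` removes a suffix of the sorted totals and `exclude.from` removes a prefix, the order in which the two filters are applied does not change the result. Note that a total exactly equal to `lower` (100 here) is excluded, matching "at or below".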
@@ -27,27 +22,17 @@
 #' To help create this plot, the \code{barcodeRanks} function will compute these ranks for all barcodes in \code{m}.
 #' Barcodes with the same total count receive the same average rank to avoid problems with discrete runs of the same total.
 #'
-#' The function will also identify the inflection and knee points on the curve for downstream use,
+#' The function will also identify the inflection and knee points on the curve for downstream use.
 #' Both of these points correspond to a sharp transition between two components of the total count distribution,
 #' presumably reflecting the difference between empty droplets with little RNA and cell-containing droplets with much more RNA.
 #' \itemize{
-#' \item The inflection point is computed as the point on the rank/total curve where the first derivative is minimized.
-#' The derivative is computed directly from all points on the curve with total counts greater than \code{lower}.
-#' This avoids issues with erratic behaviour of the curve at lower totals.
-#' \item The knee point is defined as the point on the curve that is furthest from the straight line drawn between the \code{fit.bounds} locations on the curve.
-#' We used to minimize the signed curvature to identify the knee point but this relies on the second derivative,
-#' which was too unstable even after smoothing.
+#' \item The inflection point is defined as the point on the log-rank/log-total curve where the first derivative is minimized.
+#' If multiple inflection points are present, we choose the point that immediately follows the knee point.
+#' \item To find the knee point, we draw a diagonal line that passes through the inflection point in the log-rank/log-total curve.
+#' The knee point is defined as the location on the curve that is above and most distant from this line.
 #' }
-#'
-#' If \code{fit.bounds} is not specified, the lower bound is automatically set to the inflection point
-#' as this should lie below the knee point on typical curves.
-#' The upper bound is set to the point at which the first derivative is closest to zero,
-#' i.e., the \dQuote{plateau} region before the knee point.
-#' The first \code{exclude.from} barcodes with the highest totals are ignored in this process
-#' to avoid spuriously large numerical derivatives from unstable parts of the curve with low point density.
-#'
-#' Note that only points with total counts above \code{lower} will be considered for curve fitting,
-#' regardless of how \code{fit.bounds} is defined.
+#' Only points with total counts above \code{lower} will be considered for knee/inflection point identification.
+#' Similarly, the first \code{exclude.from} points will be ignored to avoid instability at the start of the curve.
 #'
 #' @return
 #' A \linkS4class{DataFrame} where each row corresponds to a column of \code{m}, and containing the following fields:
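The rewritten @details section replaces spline fitting with a purely geometric rule on the log-rank/log-total curve. A minimal R sketch of that rule, under several assumptions not stated in the diff: `find_knee_inflection()` is a hypothetical helper rather than the DropletUtils implementation; the "diagonal line" is assumed to have slope -1 in log10 space; the inflection point is taken as the point just after the steepest segment; and the tie-break among multiple inflection points is not implemented.

```r
# Hypothetical helper sketching the updated @details description; NOT the
# DropletUtils implementation. Assumptions: the "diagonal line" has slope -1 in
# log10 space, the inflection point is the point just after the steepest
# segment, and the tie-break among multiple inflection points is omitted.
find_knee_inflection <- function(totals, lower, exclude.from) {
    # Sort totals and collapse runs of identical values, giving each run the
    # average rank of its members (e.g. ranks 2, 3, 4 all become 3).
    runs <- rle(sort(totals, decreasing = TRUE))
    run.rank <- cumsum(runs$lengths) - (runs$lengths - 1) / 2
    run.totals <- runs$values

    # Keep points above `lower` and skip the first `exclude.from` points.
    keep <- run.totals > lower & seq_along(run.totals) > exclude.from
    if (sum(keep) < 3) stop("too few points to identify knee/inflection")
    tot <- run.totals[keep]
    x <- log10(run.rank[keep])
    y <- log10(tot)

    # Inflection: the numerical first derivative on the log-log curve is most
    # negative between points i and i + 1; take point i + 1 (assumption).
    d1 <- diff(y) / diff(x)
    infl <- which.min(d1) + 1L

    # Signed vertical offset from the slope -1 line through the inflection
    # point; positive values lie above the line. For a fixed slope this is
    # proportional to the perpendicular distance, so the largest offset is the
    # most distant point above the line.
    offset <- (y - y[infl]) + (x - x[infl])
    knee <- which.max(offset)

    list(
        knee = if (offset[knee] > 0) tot[knee] else NA_real_,
        inflection = tot[infl]
    )
}

# Illustrative totals: five "cells" followed by a sharp drop to "empty" droplets.
totals <- c(2000, 1900, 1800, 1700, 1600, 150, 140, 130, 120, 110, 105)
find_knee_inflection(totals, lower = 100, exclude.from = 0)
# knee = 1600, inflection = 150
```

In this toy example the steepest segment runs from 1600 down to 150, so 150 becomes the inflection point, and 1600 is the point lying furthest above the slope -1 line through it. With a fixed slope, comparing vertical offsets is enough to rank perpendicular distances, which avoids an explicit distance formula.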