@@ -75,16 +75,20 @@
 #' The latter is a \dQuote{no frills} version that is largely intended for use within other functions.
 #'
 #' @section Handling overdispersion:
+#' By default, \code{alpha=Inf}, which means that the count vector is modelled with a multinomial distribution.
+#' This is appropriate when molecules are independently sampled into each droplet.
+#'
 #' If \code{alpha} is set to a positive number, sampling is assumed to follow a Dirichlet-multinomial (DM) distribution.
 #' The parameter vector of the DM distribution is defined as the estimated ambient profile scaled by \code{alpha}.
-#' Smaller values of \code{alpha} model overdispersion in the counts, due to dependencies in sampling between molecules.
+#' Smaller values of \code{alpha} model overdispersion in the counts, due to dependencies in sampling between molecules (e.g., aggregates, PCR duplication).
 #' If \code{alpha=NULL}, a maximum likelihood estimate is obtained from the count profiles for all barcodes with totals less than or equal to \code{lower}.
-#' If \code{alpha=Inf}, the sampling of molecules is modelled with a multinomial distribution.
 #'
 #' Users can check whether the model is suitable by extracting the p-values for all barcodes with \code{test.ambient=TRUE}.
 #' Under the null hypothesis, the p-values for presumed ambient barcodes (i.e., with total counts less than or equal to \code{lower}) should be uniformly distributed.
-#' Skews in the p-value distribution are indicative of an inaccuracy in the model and/or its estimates (of \code{alpha} or the ambient profile).
-#'
+#' Skews in the p-value distribution are indicative of an inaccuracy in the model.
+#' For example, an inaccurate \code{alpha} or ambient profile will manifest as an overenrichment of low p-values.
+#' Conversely, very sparse data will often exhibit an enrichment of p-values at 1 as the Good-Turing probabilities in the ambient profile cannot be zero.
+#'
 #' @section \code{NA} values in the results:
 #' We assume that barcodes with total UMI counts less than or equal to \code{lower} correspond to empty droplets.
 #' These are used to estimate the ambient expression profile against which the remaining barcodes are tested.
|
@@ -181,7 +185,7 @@
 
 #' @export
 #' @rdname emptyDrops
-testEmptyDrops <- function(m, lower=100, niters=10000, test.ambient=FALSE, ignore=NULL, alpha=NULL, round=TRUE, by.rank=NULL, known.empty=NULL, BPPARAM=SerialParam()) {
+testEmptyDrops <- function(m, lower=100, niters=10000, test.ambient=FALSE, ignore=NULL, alpha=Inf, round=TRUE, by.rank=NULL, known.empty=NULL, BPPARAM=SerialParam()) {
     ambfun <- function(mat, totals) {
         assumed.empty <- .get_putative_empty(totals, lower, by.rank, known.empty)
         astats <- .compute_ambient_stats(mat, totals, assumed.empty)
|