Proposal: Stop using `AnnData.raw`

I would like to see data distributed with cellxgene stop using `.raw.X` for counts and to instead put this matrix in `adata.layers["counts"]`. I would ideally like to get this schema change in for `6.0`.

For some background, `raw` was initially added to anndata so that users could keep both a "normalized" and a "raw" copy of the matrix. The workflow at the time also assumed you might only be interested in normalized values for a subset of selected features, largely due to memory constraints and the use of densifying normalization methods. However, people generally need all of their features normalized for downstream methods that operate across all features (e.g. differential expression, plotting, enrichment), and densifying normalization methods are less popular now. In addition, AnnData has since added the `.layers` attribute, which allows storing multiple matrices.
 
Within `scverse`, we would like to eventually get rid of `.raw` entirely. It's confusing to users, has difficult semantics (e.g. it's assumed to be read-only, but we can't actually enforce that), and is easily replaced by existing functionality or by just using a separate object. In addition, we stopped developing features for `.raw` (including improved out-of-core compute support) a while ago and will not be adding more.

Because cellxgene stores a matrix in `.raw.X` with the same shape as `.X`, I don't see any barrier to moving this over. What is gained is better support within the scanpy API, out-of-core support, and better usability.
