Fix: Support for `--aligner cellrangerarc` by matbonfanti · Pull Request #441 · nf-core/scrnaseq

matbonfanti · 2025-03-05T15:04:39Z

This PR introduces two changes to ensure the pipeline functions correctly when using --aligner cellrangerarc:

A dedicated parsing section structures the samplesheet channel to be compatible with the cellrangerarc module.
The raw and filtered gene expression matrices are now extracted from the cellrangerarc module output for further processing in the pipeline.

See issues #389 and #374

Testing

I have tested these changes locally on FASTQ files subsampled from a 10x Genomics dataset, and the pipeline runs successfully. For reference, here is the samplesheet used in testing:

sample,fastq_1,fastq_2,fastq_barcode,sample_type
10k_PBMC,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L001_R1_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L001_R2_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L001_R3_001.fastq.gz,atac
10k_PBMC,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L002_R1_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L002_R2_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L002_R3_001.fastq.gz,atac
10k_PBMC,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_gex_S2_L001_R1_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_gex_S2_L001_R2_001.fastq.gz,,gex
10k_PBMC,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_gex_S2_L002_R1_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_gex_S2_L002_R2_001.fastq.gz,,gex
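
To make the layout above concrete, here is a minimal plain-Groovy sketch of how rows that share a `sample` value can be collected into one entry per sample. It is an illustration only, not the pipeline's channel logic (which is built from Nextflow operators); the shortened file names and the shape of the `grouped` entries are assumptions made for the example.

```groovy
// Shortened, hypothetical file names; the columns match the samplesheet above.
def csv = '''\
sample,fastq_1,fastq_2,fastq_barcode,sample_type
10k_PBMC,atac_S2_L001_R1_001.fastq.gz,atac_S2_L001_R2_001.fastq.gz,atac_S2_L001_R3_001.fastq.gz,atac
10k_PBMC,gex_S2_L001_R1_001.fastq.gz,gex_S2_L001_R2_001.fastq.gz,,gex
'''

// Parse the CSV into one map per row; split(',', -1) keeps empty trailing fields.
def lines = csv.readLines().findAll { it.trim() }
def header = lines.head().split(',', -1) as List
def rows = lines.tail().collect { line ->
    def values = line.split(',', -1) as List
    (0..<header.size()).collectEntries { i -> [(header[i]): values[i]] }
}

// Group the runs by sample and keep only the FASTQ columns that are filled in.
def grouped = rows.groupBy { it.sample }.collect { sample, runs ->
    [
        id         : sample,
        sampleTypes: runs.collect { it.sample_type },
        fastqs     : runs.collect { run -> [run.fastq_1, run.fastq_2, run.fastq_barcode].findAll { it } }
    ]
}

grouped.each { println it }
```

In this sketch the `atac` run carries three FASTQ files (including `fastq_barcode`) while the `gex` run carries two, which mirrors the empty `fastq_barcode` column in the `gex` rows above.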

grst · 2025-03-06T10:36:31Z

Hi @matbonfanti,

thanks for working on this!
Would you by any chance also have time to implement a test case for cellrangerarc? It is currently not covered by CI at all, which is one of the reasons the bug you are fixing exists in the first place. See also #290.

matbonfanti · 2025-03-06T10:51:46Z

Hi @grst, that was indeed the plan!

I saw that the test-datasets repo already has a dataset for ATAC-seq (https://github.com/nf-core/test-datasets/tree/modules/data/genomics/homo_sapiens/10xgenomics/cellranger-atac), which I think would be a good test for starters. In the long term I could make a new test using multiome data (ATAC + GEX), which would probably be more appropriate but will definitely take much more time to implement.

If you agree, I will start by including the ATAC-only test in this PR, so that ATAC alignment is fixed soon in the dev branch. Then maybe I can make a new PR for the other test dataset.

grst · 2025-03-25T07:06:38Z

Hi @matbonfanti, is this ready, i.e. did you add the ATAC testcase?

matbonfanti · 2025-03-25T07:29:15Z

Hi, I was planning to use an ATAC dataset as the test case, but unfortunately it turned out that cellranger-arc needs both modalities, ATAC and GEX. I need to create a new test dataset by subsampling a 10x multiome dataset; I was planning to start today at the hackathon.
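
As an illustration of the constraint described above, a pre-flight check along these lines could flag samples that lack one of the two modalities. This is a hypothetical sketch in plain Groovy, not something this PR adds; `missingModalities` and the row maps are made-up names for the example.

```groovy
// Hypothetical pre-flight check; not part of this PR.
def requiredTypes = ['gex', 'atac'] as Set

// Returns a map of sample -> missing modalities, containing only incomplete samples.
def missingModalities = { List<Map> rows ->
    rows.groupBy { it.sample }
        .collectEntries { sample, runs ->
            [(sample): requiredTypes - (runs.collect { it.sample_type } as Set)]
        }
        .findAll { sample, missing -> missing }
}

// An ATAC-only sample is reported as missing 'gex'; a multiome sample passes.
def atacOnly = [[sample: 'A', sample_type: 'atac'], [sample: 'A', sample_type: 'atac']]
def multiome = [[sample: 'B', sample_type: 'atac'], [sample: 'B', sample_type: 'gex']]

println missingModalities(atacOnly)
println missingModalities(multiome)
```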

matbonfanti · 2025-03-25T07:30:05Z

Bottom line: no, it is not ready; the test is still missing.

grst · 2025-03-25T07:31:43Z

ok, no worries

matbonfanti · 2025-03-26T11:50:46Z

Hi,

I have created the dataset for cellranger-arc and run the pipeline with it. It is ready to be added to the test-datasets repository (nf-core/test-datasets#1562).
Now waiting for the review :-)

matbonfanti · 2025-03-26T16:28:40Z

@grst I have added the test file and added the test to the CI; as you can see, it worked.

I think the code is ready for review. I am still missing the changelog update; I will do it ASAP.

grst

LGTM, thank you!

grst · 2025-03-31T09:19:08Z

subworkflows/local/utils_nfcore_scrnaseq_pipeline/main.nf

+def cellrangerarcStructure(input) {
+    def (metas, fastqs) = input[1..2]
+
+    // Check that multiple runs of the same sample are of the same datatype i.e. single-end / paired-end
+    def endedness_ok = metas.collect{ meta -> meta.single_end }.unique().size == 1
+    if (!endedness_ok) {
+        error("Please check input samplesheet -> Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end: ${metas[0].id}")
+    }
+
+    // Validate that the property "sample_type" is present and has valid values
+    def valid_sample_types = ["gex", "atac"]
+    def sample_type_ok = metas.collect { meta -> meta.sample_type }.unique().every { it in valid_sample_types }
+    if (!sample_type_ok) {
+        error("Please check input samplesheet -> The property 'sample_type' is required and can only be 'gex' or 'atac'.")
+    }
+
+    // Define a new common meta for all the fastqs in this channel instance
+    def sampleMeta = metas[0].clone()
+    sampleMeta.remove("sample_type")
+    sampleMeta.remove("feature_type")
+
+    // Create a list with all the entries of meta.sample_type
+    def sampletypes = metas.collect { meta -> meta.sample_type }
+
+    // Create a list with all the base name of the fastq files
+    def subsamples = fastqs.collect { fastq ->
+        def match = (fastq[0].baseName =~ /^(.*?)_S\d+_L\d+_R\d+_\d+\.fastq(\.gz)?$/)
+        if (!match) {
+            error("Filename does not follow the expected FASTQ filename convention (SampleName_S1_L001_R1_001.fastq.gz): ${fastq[0]}")
+        }
+        return match[0][1]
+    }
+
+    return [ sampleMeta, sampletypes, subsamples, fastqs.flatten() ]


Ideally, this validation could go into the JSON schema, but it currently can't because we only have one aligner-agnostic schema.
Created #461 to follow up as this is beyond the scope of this PR.

matbonfanti · 2025-03-31

I agree, that would be much cleaner... If you need help writing and testing the schema for cellranger-arc, I would be happy to contribute!

grst · 2025-03-31

If you want to give #461 a shot, that would be fantastic. I don't think I'd have time soonish.
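
For readers following the filename check in `cellrangerarcStructure`, the sketch below exercises the same regular expression in plain Groovy. It assumes that Nextflow's `baseName` drops only the final extension (so `x.fastq.gz` becomes `x.fastq`); the `baseName` and `samplePrefix` closures here are hypothetical stand-ins written for this example, not pipeline code.

```groovy
// Same pattern as in the diff above.
def pattern = /^(.*?)_S\d+_L\d+_R\d+_\d+\.fastq(\.gz)?$/

// Hypothetical stand-in for Nextflow's `baseName`: drops only the last extension.
def baseName = { String name ->
    int dot = name.lastIndexOf('.')
    dot > 0 ? name.substring(0, dot) : name
}

// Returns the sample prefix, or null when the name does not follow the convention.
def samplePrefix = { String filename ->
    def match = (baseName(filename) =~ pattern)
    match ? match[0][1] : null
}

// A file name from the samplesheet used in testing.
assert samplePrefix('10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L001_R1_001.fastq.gz') ==
    '10k_PBMC_Multiome_nextgem_Chromium_X_atac'

// A name without the _S.._L.._R.._NNN suffix is rejected.
assert samplePrefix('reads_R1.fastq.gz') == null

// An uncompressed file loses '.fastq' in the stand-in, so the pattern no longer matches.
assert samplePrefix('sample_S1_L001_R1_001.fastq') == null
```

If the assumption about `baseName` holds, the last case suggests that uncompressed `.fastq` inputs would be rejected by the check even though the pattern has an optional `.gz` group; that may be worth confirming against a real run.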

matbonfanti added 3 commits March 4, 2025 17:28

modified samplesheet parsing for cellrangeratac

8be8cd6

process fastq_barcode from input samplesheet

6edee1c

get raw and filtered matrices from cellrangerarc output

4de1e0d

matbonfanti self-assigned this Mar 5, 2025

matbonfanti and others added 4 commits March 5, 2025 16:36

fastq_barcode read through meta

ec5cc94

use file only when meta.fastq_barcode is defined

156991c

Merge branch 'dev' into fix_cellrangerarc_input_ch

801aaf1

Merge branch 'dev' into fix_cellrangerarc_input_ch

c2cc11e

matbonfanti added 2 commits March 6, 2025 18:00

draft of the cellrangerarc test added

980f5c6

Merge branch 'fix_cellrangerarc_input_ch' of github.com:matbonfanti/s…

e65bd57

…crnaseq into fix_cellrangerarc_input_ch

apeltzer added this to the 4.1.0 milestone Mar 10, 2025

Merge branch 'dev' into fix_cellrangerarc_input_ch

17cc26b

matbonfanti added this to Hackathon March 2025 Mar 21, 2025

github-project-automation bot moved this to To do in Hackathon March 2025 Mar 21, 2025

matbonfanti mentioned this pull request Mar 26, 2025

add test datatest for cellranger-arc nf-core/test-datasets#1562

Merged

matbonfanti and others added 2 commits March 26, 2025 16:25

added cellrangerarc test

6983615

Merge branch 'dev' into fix_cellrangerarc_input_ch

b2df8c4

matbonfanti requested a review from grst March 26, 2025 16:29

matbonfanti moved this from To do to Ready for review in Hackathon March 2025 Mar 26, 2025

matbonfanti added 2 commits March 26, 2025 21:47

Update CHANGELOG.md

fb3ae40

Merge branch 'dev' into fix_cellrangerarc_input_ch

0f6bcad

grst approved these changes Mar 31, 2025

github-project-automation bot moved this from Ready for review to In progress in Hackathon March 2025 Mar 31, 2025

grst merged commit 43adb18 into nf-core:dev Mar 31, 2025
16 checks passed

github-project-automation bot moved this from In progress to Done in Hackathon March 2025 Mar 31, 2025

matbonfanti mentioned this pull request Mar 31, 2025

cellrangerarc test failing - null path? #374

Closed

matbonfanti mentioned this pull request Apr 24, 2025

Error running cellrangerarc workflow with 10x Multiome data #463

Open

grst merged 14 commits into nf-core:dev from matbonfanti:fix_cellrangerarc_input_ch

3 participants
