Fix: Support for --aligner cellrangerarc#441
Conversation
|
Hi @matbonfanti, thanks for working on this! |
|
Hi @grst, that was indeed the plan! I have seen that on the test-dataset repo there is already a dataset for atac-seq (https://github.com/nf-core/test-datasets/tree/modules/data/genomics/homo_sapiens/10xgenomics/cellranger-atac) which I think would be a good test, for starters. In the long term I could make a new test using multiome data (atac+gex) that would be probably more appropriate, but it will definitely take much more time to implement. If you agree, I will start including the atac-only test in this PR, so that atac alignment will be fixed soon in the dev branch. Then maybe I can make a new PR for the other test dataset. |
…crnaseq into fix_cellrangerarc_input_ch
|
Hi @matbonfanti, is this ready, i.e. did you add the ATAC testcase? |
|
hi, I was planning to use a ATAC dataset as test case, but unfortunately It turned out that Cellranger-arc needs boh modalities, ATAC and gex... I need to create a new test dataset subsampling a 10x multiome dataset, I was planning to start today at the hackathon. |
|
Bottom line: no, It Is not ready, the test Is still missing |
|
ok, no worries |
|
hi, I have created the dataset for cellranger-arc, I have run the pipeline with it and it is ready to be added to the dataset test repository (nf-core/test-datasets#1562). |
|
@grst I have added the test file and added the test to the CI, as you can see it worked. I think the code is ready for review. I am still missing the changelog update. I will do It ASAP. |
| def cellrangerarcStructure(input) { | ||
| def (metas, fastqs) = input[1..2] | ||
|
|
||
| // Check that multiple runs of the same sample are of the same datatype i.e. single-end / paired-end | ||
| def endedness_ok = metas.collect{ meta -> meta.single_end }.unique().size == 1 | ||
| if (!endedness_ok) { | ||
| error("Please check input samplesheet -> Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end: ${metas[0].id}") | ||
| } | ||
|
|
||
| // Validate that the property "sample_type" is present and has valid values | ||
| def valid_sample_types = ["gex", "atac"] | ||
| def sample_type_ok = metas.collect { meta -> meta.sample_type }.unique().every { it in valid_sample_types } | ||
| if (!sample_type_ok) { | ||
| error("Please check input samplesheet -> The property 'sample_type' is required and can only be 'gex' or 'atac'.") | ||
| } | ||
|
|
||
| // Define a new common meta for all the fastqs in this channel instance | ||
| def sampleMeta = metas[0].clone() | ||
| sampleMeta.remove("sample_type") | ||
| sampleMeta.remove("feature_type") | ||
|
|
||
| // Create a list with all the entries of meta.sample_type | ||
| def sampletypes = metas.collect { meta -> meta.sample_type } | ||
|
|
||
| // Create a list with all the base name of the fastq files | ||
| def subsamples = fastqs.collect { fastq -> | ||
| def match = (fastq[0].baseName =~ /^(.*?)_S\d+_L\d+_R\d+_\d+\.fastq(\.gz)?$/) | ||
| if (!match) { | ||
| error("Filename does not follow the expected FASTQ filename convention (SampleName_S1_L001_R1_001.fastq.gz): ${fastq[0]}") | ||
| } | ||
| return match[0][1] | ||
| } | ||
|
|
||
| return [ sampleMeta, sampletypes, subsamples, fastqs.flatten() ] |
There was a problem hiding this comment.
Ideally, this could go to the json-schema, but it currently can't because we only have one aligner-agnostic schema.
Created #461 to follow up as this is beyond the scope of this PR.
There was a problem hiding this comment.
I agree, that would be much cleaner... If you need help writing and testing the schema for cellranger-arc, I would be happy to contribute!
There was a problem hiding this comment.
If you want to give #461 a shot, that would be fantastic. I don't think I'd have time soonish.
This PR introduces two changes to ensure the pipeline functions correctly when using
--aligner cellrangerarc:cellrangerarcmodule.cellrangerarcmodule output for further processing in the pipeline.See issues #389 and #374
Testing
I have tested these changes locally subsampling FASTQ files from 10x Genomics, and the pipeline runs successfully. For reference, here is the samplesheet used in testing:
PR checklist
nf-core pipelines lint).nextflow run . -profile test,docker --outdir <OUTDIR>).nextflow run . -profile debug,test,docker --outdir <OUTDIR>).docs/usage.mdis updated.docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).