
Commit 83e312f

Add samplesheet output with bams
1 parent 346e42a commit 83e312f

5 files changed (+80 / -3 lines)


docs/output.md

Lines changed: 1 addition & 0 deletions
@@ -868,6 +868,7 @@ A number of genome-specific files are generated by the pipeline because they are
 - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameters are used when running the pipeline.
 - Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
 - Parameters used by the pipeline run: `params.json`.
+- `samplesheet_with_bams.csv`: **Auto-generated complete samplesheet** listing every sample with its BAM file paths. For samples processed from FASTQ, it contains the paths of the newly generated BAMs; for samples supplied as BAM input, it preserves the original input paths. The file can be used directly as input for future pipeline runs, so samples can be reprocessed without re-alignment.
 
 </details>
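A minimal Python sketch of how a downstream script might read `samplesheet_with_bams.csv` and pick out the samples that already carry BAM paths. The column order is taken from the `seed` header passed to `collectFile` in `workflows/rnaseq/main.nf`; the function name `samples_with_bams` and the example rows are hypothetical.

```python
import csv
import io

# Header copied from the `seed` string used by collectFile in this commit.
EXPECTED_HEADER = ["sample", "fastq_1", "fastq_2", "strandedness", "genome_bam", "transcriptome_bam"]


def samples_with_bams(csv_text: str) -> dict:
    """Return {sample: (genome_bam, transcriptome_bam)} for rows listing at least one BAM."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    found = {}
    for row in reader:
        # A short row yields None for missing trailing fields, so normalise to "".
        genome = row["genome_bam"] or ""
        transcriptome = row["transcriptome_bam"] or ""
        if genome or transcriptome:
            found[row["sample"]] = (genome, transcriptome)
    return found


# Hypothetical file contents: one sample with BAM paths, one without.
example = (
    "sample,fastq_1,fastq_2,strandedness,genome_bam,transcriptome_bam\n"
    "WT_REP1,WT_REP1_R1.fastq.gz,WT_REP1_R2.fastq.gz,reverse,"
    "results/star_salmon/WT_REP1.sorted.bam,results/star_salmon/WT_REP1.Aligned.toTranscriptome.out.bam\n"
    "KO_REP1,KO_REP1_R1.fastq.gz,,forward,,\n"
)
```

Checking the header up front means a file written by a different pipeline version fails loudly instead of silently mapping columns to the wrong fields.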

docs/usage.md

Lines changed: 1 addition & 0 deletions
@@ -129,6 +129,7 @@ SAMPLE2,sample2_R1.fastq.gz,sample2_R2.fastq.gz,forward,,
 - When using BAM input, you can leave the FASTQ columns empty or omit them
 - Mixed samplesheets (some samples with FASTQ, others with BAM) are supported
 - For BAM file locations from pipeline outputs, see the [output documentation](https://nf-co.re/rnaseq/output)
+- **Automated samplesheet generation**: The pipeline automatically generates a `samplesheet_with_bams.csv` file in the `pipeline_info/` directory listing every sample with its BAM file paths. For FASTQ-derived samples, it contains the paths of the newly generated BAMs; for BAM input samples, it preserves the original input paths. This samplesheet can be used directly as input for future pipeline runs.
 
 ## FASTQ sampling
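The mixed-input rules above can be illustrated with a small Python sketch that labels each samplesheet row by the input columns it fills. This is not the pipeline's own validation, which this commit does not show; `classify_rows`, the `"both"` label and the example rows are hypothetical.

```python
import csv
import io


def classify_rows(csv_text: str) -> dict:
    """Label each sample 'fastq', 'bam' or 'both' from the input columns it fills."""
    kinds = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # .get() returns None when a column is omitted from the header entirely.
        has_fastq = bool(row.get("fastq_1"))
        has_bam = bool(row.get("genome_bam") or row.get("transcriptome_bam"))
        if has_fastq and has_bam:
            kinds[row["sample"]] = "both"
        elif has_bam:
            kinds[row["sample"]] = "bam"
        elif has_fastq:
            kinds[row["sample"]] = "fastq"
        else:
            raise ValueError(f"sample {row['sample']!r} has neither FASTQ nor BAM paths")
    return kinds


# Hypothetical mixed samplesheet: one FASTQ sample, one BAM sample.
mixed = (
    "sample,fastq_1,fastq_2,strandedness,genome_bam,transcriptome_bam\n"
    "SAMPLE1,sample1_R1.fastq.gz,sample1_R2.fastq.gz,auto,,\n"
    "SAMPLE2,,,reverse,SAMPLE2.sorted.bam,\n"
)
```

Rows of a generated `samplesheet_with_bams.csv` for FASTQ-derived samples would be labelled `"both"` here, because that file keeps the FASTQ columns alongside the new BAM paths.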

subworkflows/local/utils_nfcore_rnaseq_pipeline/main.nf

Lines changed: 42 additions & 3 deletions
@@ -177,15 +177,24 @@ def checkSamplesAfterGrouping(input) {
         def genome_bam = genome_bams?.find { it != null }
         def transcriptome_bam = transcriptome_bams?.find { it != null }
 
-        // Add BAM flags to meta
+        // Add BAM flags and original paths to meta
         def meta_with_bams = metas[0] + [
             has_genome_bam: genome_bam ? true : false,
-            has_transcriptome_bam: transcriptome_bam ? true : false
+            has_transcriptome_bam: transcriptome_bam ? true : false,
+            original_genome_bam: genome_bam ?: null,
+            original_transcriptome_bam: transcriptome_bam ?: null
         ]
 
         return [ meta_with_bams, fastqs, genome_bam, transcriptome_bam ]
     } else {
-        return [ metas[0], fastqs ]
+        // Add null BAM fields to meta for consistency
+        def meta_no_bams = metas[0] + [
+            has_genome_bam: false,
+            has_transcriptome_bam: false,
+            original_genome_bam: null,
+            original_transcriptome_bam: null
+        ]
+        return [ meta_no_bams, fastqs ]
     }
 }
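Both branches of `checkSamplesAfterGrouping` now give every `meta` map the same four BAM keys. The Python sketch below folds the two branches into one hypothetical `normalise_meta` function: `{**meta, ...}` plays the role of Groovy's map `+` (right-hand keys win), and `x or None` mirrors the Elvis expression `x ?: null`. The grouping condition that selects the branch is not visible in the diff and is not modelled.

```python
from typing import Iterable, Optional

BAM_KEYS = ("has_genome_bam", "has_transcriptome_bam", "original_genome_bam", "original_transcriptome_bam")


def first_non_null(values: Optional[Iterable]):
    """Python analogue of Groovy's `values?.find { it != null }`."""
    return next((v for v in (values or []) if v is not None), None)


def normalise_meta(meta: dict, genome_bams=None, transcriptome_bams=None) -> dict:
    """Return a copy of `meta` that always carries the four BAM keys."""
    genome_bam = first_non_null(genome_bams)
    transcriptome_bam = first_non_null(transcriptome_bams)
    return {
        **meta,
        "has_genome_bam": bool(genome_bam),
        "has_transcriptome_bam": bool(transcriptome_bam),
        "original_genome_bam": genome_bam or None,
        "original_transcriptome_bam": transcriptome_bam or None,
    }
```

Giving every sample the same key set means later closures can read `meta.has_genome_bam` without first checking whether the key exists.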

@@ -635,6 +644,36 @@ def getInferexperimentStrandedness(inferexperiment_file, stranded_threshold = 0.
     return calculateStrandedness(forwardFragments, reverseFragments, unstrandedFragments, stranded_threshold, unstranded_threshold)
 }
 
+//
+// Function to map work directory BAM paths to published paths
+//
+def mapBamToPublishedPath(bam_path, sample_id, aligner, outdir) {
+    if (!bam_path) return ''
+
+    def filename = file(bam_path).getName()
+    def base_dir = "${outdir}/${aligner}"
+
+    // Map based on aligner type and filename patterns
+    if (aligner == 'star_salmon') {
+        if (filename.contains('Aligned.out.bam')) {
+            return "${base_dir}/${sample_id}.Aligned.out.bam"
+        } else if (filename.contains('toTranscriptome')) {
+            return "${base_dir}/${sample_id}.Aligned.toTranscriptome.out.bam"
+        }
+    } else if (aligner == 'star_rsem') {
+        if (filename.contains('genome.bam')) {
+            return "${base_dir}/${sample_id}.STAR.genome.bam"
+        } else if (filename.contains('transcript.bam')) {
+            return "${base_dir}/${sample_id}.transcript.bam"
+        }
+    } else if (aligner == 'hisat2') {
+        return "${base_dir}/${sample_id}.bam"
+    }
+
+    // Fallback to original filename
+    return "${base_dir}/${filename}"
+}
+
 //
 // Print pipeline summary on completion
 //
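A direct Python port of `mapBamToPublishedPath`, handy for checking which published path a given work-directory BAM name maps to. `os.path.basename` stands in for Nextflow's `file(bam_path).getName()`; the test file names are hypothetical.

```python
import os


def map_bam_to_published_path(bam_path, sample_id, aligner, outdir):
    """Port of mapBamToPublishedPath: work-dir BAM name -> published path under outdir/aligner."""
    if not bam_path:
        return ""
    filename = os.path.basename(str(bam_path))
    base_dir = f"{outdir}/{aligner}"

    # Same aligner/filename rules as the Groovy function, in the same order.
    if aligner == "star_salmon":
        if "Aligned.out.bam" in filename:
            return f"{base_dir}/{sample_id}.Aligned.out.bam"
        elif "toTranscriptome" in filename:
            return f"{base_dir}/{sample_id}.Aligned.toTranscriptome.out.bam"
    elif aligner == "star_rsem":
        if "genome.bam" in filename:
            return f"{base_dir}/{sample_id}.STAR.genome.bam"
        elif "transcript.bam" in filename:
            return f"{base_dir}/{sample_id}.transcript.bam"
    elif aligner == "hisat2":
        return f"{base_dir}/{sample_id}.bam"

    # Fallback: keep the original file name under the aligner directory.
    return f"{base_dir}/{filename}"
```

`'Aligned.out.bam'` is not a substring of `Aligned.toTranscriptome.out.bam`, so the two STAR/Salmon branches cannot shadow each other. For the STAR-based aligners, any name that matches no pattern (for example a sorted BAM) falls through to `${outdir}/${aligner}/${filename}`.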

tower.yml

Lines changed: 2 additions & 0 deletions
@@ -53,3 +53,5 @@ reports:
     display: "All samples STAR RSEM merged transcript raw counts"
   "**/star_rsem/rsem.merged.transcript_tpm.tsv":
     display: "All samples STAR RSEM merged transcript TPM counts"
+  "**/pipeline_info/samplesheet_with_bams.csv":
+    display: "Samplesheet with BAM paths for reanalysis"

workflows/rnaseq/main.nf

Lines changed: 34 additions & 0 deletions
@@ -26,6 +26,7 @@ include { getStarPercentMapped } from '../../subworkflows/local/utils_
 include { biotypeInGtf } from '../../subworkflows/local/utils_nfcore_rnaseq_pipeline'
 include { getInferexperimentStrandedness } from '../../subworkflows/local/utils_nfcore_rnaseq_pipeline'
 include { methodsDescriptionText } from '../../subworkflows/local/utils_nfcore_rnaseq_pipeline'
+include { mapBamToPublishedPath } from '../../subworkflows/local/utils_nfcore_rnaseq_pipeline'
 
 /*
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -770,6 +771,39 @@ workflow RNASEQ {
         ch_multiqc_report = MULTIQC.out.report
     }
 
+    //
+    // Generate samplesheet with BAM paths for future runs
+    //
+    if (!params.skip_alignment) {
+        // Create channel with original input info and BAM paths
+        ch_fastq
+            .join(ch_genome_bam, by: 0, remainder: true)
+            .join(ch_transcriptome_bam, by: 0, remainder: true)
+            .map { meta, reads, genome_bam, transcriptome_bam ->
+                // Extract FASTQ paths
+                def fastq_1 = reads && reads.size() > 0 ? reads[0] : ''
+                def fastq_2 = reads && reads.size() > 1 ? reads[1] : ''
+
+                // Handle BAM paths - use original input paths for BAM input samples, published paths for FASTQ-derived samples
+                def genome_bam_published = meta.has_genome_bam ?
+                    (meta.original_genome_bam ?: '') :
+                    mapBamToPublishedPath(genome_bam, meta.id, params.aligner, params.outdir)
+
+                def transcriptome_bam_published = meta.has_transcriptome_bam ?
+                    (meta.original_transcriptome_bam ?: '') :
+                    mapBamToPublishedPath(transcriptome_bam, meta.id, params.aligner, params.outdir)
+
+                // Return CSV line
+                return "${meta.id},${fastq_1},${fastq_2},${meta.strandedness},${genome_bam_published},${transcriptome_bam_published}"
+            }
+            .collectFile(
+                name: 'samplesheet_with_bams.csv',
+                storeDir: "${params.outdir}/pipeline_info",
+                newLine: true,
+                seed: 'sample,fastq_1,fastq_2,strandedness,genome_bam,transcriptome_bam'
+            )
+    }
+
     emit:
     trim_status = ch_trim_status // channel: [id, boolean]
     map_status = ch_map_status // channel: [id, boolean]
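The samplesheet step can be sketched in Python as a full outer join, which is what `join(..., remainder: true)` does (unmatched items are kept and the missing side becomes `null`), followed by one CSV line per sample. This is a simplification under stated assumptions: channels are modelled as dicts keyed by sample id, whereas the Nextflow join keys on the whole `meta` map at index 0. `join_remainder`, `build_samplesheet`, the stand-in path mapper and the demo data are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

HEADER = "sample,fastq_1,fastq_2,strandedness,genome_bam,transcriptome_bam"


def join_remainder(*channels: Dict[str, object]) -> Dict[str, tuple]:
    """Full outer join of dicts keyed by sample id; a missing entry becomes None."""
    keys: List[str] = []
    for channel in channels:
        for key in channel:
            if key not in keys:
                keys.append(key)
    return {key: tuple(channel.get(key) for channel in channels) for key in keys}


def build_samplesheet(
    metas: Dict[str, dict],
    fastqs: Dict[str, List[str]],
    genome_bams: Dict[str, str],
    transcriptome_bams: Dict[str, str],
    map_path: Callable[[Optional[str], str], str],
) -> str:
    lines = [HEADER]
    joined = join_remainder(fastqs, genome_bams, transcriptome_bams)
    for sample_id, (reads, genome_bam, transcriptome_bam) in joined.items():
        meta = metas[sample_id]
        reads = reads or []
        fastq_1 = reads[0] if len(reads) > 0 else ""
        fastq_2 = reads[1] if len(reads) > 1 else ""
        # BAM-input samples keep their original path; others get a published path.
        if meta.get("has_genome_bam"):
            genome = meta.get("original_genome_bam") or ""
        else:
            genome = map_path(genome_bam, sample_id)
        if meta.get("has_transcriptome_bam"):
            transcriptome = meta.get("original_transcriptome_bam") or ""
        else:
            transcriptome = map_path(transcriptome_bam, sample_id)
        lines.append(",".join([sample_id, fastq_1, fastq_2, meta.get("strandedness", ""), genome, transcriptome]))
    return "\n".join(lines) + "\n"


def stand_in_map_path(bam: Optional[str], sample_id: str) -> str:
    """Hypothetical mapper: move a work-dir BAM name under results/star_salmon/."""
    return f"results/star_salmon/{bam.rsplit('/', 1)[-1]}" if bam else ""


# Hypothetical demo data: S1 comes from FASTQ, S2 was supplied as a BAM.
metas = {
    "S1": {"id": "S1", "strandedness": "reverse", "has_genome_bam": False},
    "S2": {"id": "S2", "strandedness": "forward", "has_genome_bam": True, "original_genome_bam": "/input/S2.bam"},
}
fastqs = {"S1": ["S1_R1.fastq.gz", "S1_R2.fastq.gz"]}
genome_bams = {"S1": "work/ab/S1.sorted.bam", "S2": "/input/S2.bam"}
transcriptome_bams = {"S1": "work/cd/S1.Aligned.toTranscriptome.out.bam"}
```

Passing the path mapper in as a callable keeps the join and CSV logic testable without a real output directory.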
