Commit 8964a2a

add demo content and walkthrough
1 parent e8cd947 commit 8964a2a

13 files changed: +508129 -0 lines changed

README.md

Lines changed: 55 additions & 0 deletions
@@ -10,6 +10,7 @@
- [Shell completion](#shell-completion)
- [Usage](#usage)
- [Calculation methods](#calculation-methods)
- [Demo](#demo)
- [Citation](#citation)
- [License](#license)

@@ -163,6 +164,60 @@ is in a genome with 2,000,000bp contig with no reads mapped, then the
trimmed_mean will be 0 as all positions in the 2000bp are in the top 5% of
positions sorted by coverage.

## Demo

Download a test dataset of 8 genomes and 1 sample of paired-end reads:

```bash
wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/sample_1.1.fq.gz
wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/sample_1.2.fq.gz
wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_1.fna
wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_2.fna
wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_3.fna
wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_4.fna
wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_5.fna
wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_6.fna
wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_7.fna
wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_8.fna
```

Run CoverM:

```bash
coverm genome \
  --coupled sample_1.1.fq.gz sample_1.2.fq.gz \
  --genome-fasta-files \
    genome_1.fna genome_2.fna genome_3.fna genome_4.fna \
    genome_5.fna genome_6.fna genome_7.fna genome_8.fna \
  -t 8 \
  -m mean relative_abundance covered_fraction \
  -o output_coverm.tsv
```

This should create the file `output_coverm.tsv` and log the following message:
`coverm::genome] In sample 'sample_1.1.fq.gz', found 48254 reads mapped out of 100000 total (48.25%)`.
This indicates that 48.25% of the reads from our sample mapped to the genomes. In other words, the 8 genomes account for roughly half of the sequenced reads in the sample.
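
The percentage in that message can be recomputed from the two read counts. The following is a minimal Python sketch, assuming the log line always has the shape of the single example quoted above; the regular expression and the helper name `mapped_percentage` are illustrative and not part of CoverM.

```python
import re

# Pattern inferred from the one log line quoted in the README; the real
# log format may differ between CoverM versions (assumption).
LOG_PATTERN = re.compile(
    r"In sample '(?P<sample>[^']+)', found (?P<mapped>\d+) reads mapped "
    r"out of (?P<total>\d+) total"
)


def mapped_percentage(log_line: str) -> float:
    """Return the percentage of mapped reads, recomputed from the raw counts."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        raise ValueError("line does not look like a CoverM mapping summary")
    mapped = int(match.group("mapped"))
    total = int(match.group("total"))
    return 100.0 * mapped / total


line = (
    "coverm::genome] In sample 'sample_1.1.fq.gz', "
    "found 48254 reads mapped out of 100000 total (48.25%)"
)
print(f"{mapped_percentage(line):.2f}% mapped")  # prints "48.25% mapped"
```

Working from the counts rather than the printed percentage avoids rounding: 48254 / 100000 is 48.254%, which is the value that reappears later as 100 minus the `unmapped` relative abundance.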

Looking in `output_coverm.tsv`, we find columns with the following headings:

- `Genome`: The name of the genome.
- `sample_1.1.fq.gz Mean`: The mean read coverage from sample_1 across the given genome, i.e. the average height across the genome if the aligned reads were stacked on top of each other.
- `sample_1.1.fq.gz Relative Abundance (%)`: The relative abundance of the genome within sample_1. This metric accounts for differing genome sizes by using the proportion of mean coverage rather than the proportion of reads.
- `sample_1.1.fq.gz Covered Fraction`: The proportion of the genome that is covered by at least one read.
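
To make the `Mean` and `Covered Fraction` definitions concrete, here is a small Python sketch that computes both from a list of per-base read depths. The depth values are a made-up toy example, and the sketch ignores any trimming or filtering options CoverM may apply, so it illustrates the definitions rather than CoverM's implementation.

```python
def mean_coverage(depths):
    """Average read depth over all positions (the 'stack height')."""
    return sum(depths) / len(depths)


def covered_fraction(depths):
    """Proportion of positions covered by at least one read."""
    return sum(1 for depth in depths if depth > 0) / len(depths)


# Hypothetical per-base depths for a 10 bp "genome" (toy data, not from the demo).
toy_depths = [0, 0, 1, 2, 2, 1, 0, 0, 3, 1]

print(mean_coverage(toy_depths))     # 1.0
print(covered_fraction(toy_depths))  # 0.6
```

Note that the two metrics can diverge: a mean coverage of 1.0 here still leaves 40% of positions uncovered, which is the same pattern seen in the sub-sampled demo output.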

Each row represents a genome, and the columns represent the coverage metrics calculated for that genome for each provided sample.
For instance, the row for `genome_1` shows that the mean coverage of this genome is `0.941`, the relative abundance is `25.9`%, and the covered fraction is `0.528`.
In contrast, the row for `genome_5` shows that the mean coverage of this genome is `0.0`, the relative abundance is `0.0`%, and the covered fraction is `0.0`.
This indicates that `genome_1` is well represented in the sample, while `genome_5` was not detected at all.
Three other genomes (`genome_2` to `genome_4`) have non-zero coverage, and three others (`genome_6` to `genome_8`) have zero coverage.
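
The relative abundance values can be related to the mean coverage values with a short calculation. The Python sketch below assumes that each genome's relative abundance is its share of the summed mean coverage, scaled by the fraction of reads that mapped. This formula is inferred from the column description and checked only against the demo numbers; it is not taken from CoverM's source code.

```python
# Mean coverage values copied from demo/output_coverm.tsv in this commit.
mean_cov = {
    "genome_1": 0.9410575,
    "genome_2": 0.40274984,
    "genome_3": 0.20988818,
    "genome_4": 0.20114066,
    "genome_5": 0.0,
    "genome_6": 0.0,
    "genome_7": 0.0,
    "genome_8": 0.0,
}

# Read counts from the log message: 48254 mapped out of 100000 total.
mapped_fraction = 48254 / 100000


def relative_abundance(mean_cov, mapped_fraction):
    """Share of total mean coverage, scaled to the mapped fraction, in percent.

    Assumed formula (see lead-in); not confirmed against CoverM itself.
    """
    total = sum(mean_cov.values())
    return {
        name: 100.0 * mapped_fraction * cov / total
        for name, cov in mean_cov.items()
    }


abundances = relative_abundance(mean_cov, mapped_fraction)
for name, value in abundances.items():
    print(f"{name}\t{value:.3f}")
```

Under this assumption the sketch gives about 25.877 for `genome_1` and 11.075 for `genome_2`, in line with the values in `output_coverm.tsv`.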

You may have noticed that the covered fraction for most genomes is rather low. This is because the reads have been sub-sampled to 100,000 reads.
The full sample has 76,618,686 reads and produces covered fractions of 1 for all present genomes. Notably, the relative abundances from the sub-sampled reads are very similar to those from the full sample.
The output from the full sample can be downloaded as follows: `wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/output_coverm_full.tsv`

There is an additional row named `unmapped` which represents the coverage metrics for the reads that did not map to any of the provided genomes.
This is only applicable to the relative abundance metric (among those we selected; the other columns show `NA`), and we can see that 51.7% of the reads were unmapped.
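
As a sketch of how the table might be inspected programmatically, the Python snippet below parses the demo output, treats `NA` as missing, and separates the `unmapped` row from the genome rows. The file contents are embedded so the example runs on its own, and the helper name `load_coverm_table` is hypothetical.

```python
import csv
import io

# Contents of demo/output_coverm.tsv from this commit, embedded so the
# sketch is self-contained. Columns are assumed to be tab-separated.
TSV = (
    "Genome\tsample_1.1.fq.gz Mean\tsample_1.1.fq.gz Relative Abundance (%)"
    "\tsample_1.1.fq.gz Covered Fraction\n"
    "unmapped\tNA\t51.746\tNA\n"
    "genome_1\t0.9410575\t25.87694\t0.52770287\n"
    "genome_2\t0.40274984\t11.074703\t0.27789244\n"
    "genome_3\t0.20988818\t5.7714467\t0.15818907\n"
    "genome_4\t0.20114066\t5.5309105\t0.1509256\n"
    "genome_5\t0\t0\t0\n"
    "genome_6\t0\t0\t0\n"
    "genome_7\t0\t0\t0\n"
    "genome_8\t0\t0\t0\n"
)


def load_coverm_table(handle):
    """Read a CoverM TSV into a list of dicts, mapping 'NA' to None."""
    rows = []
    for raw in csv.DictReader(handle, delimiter="\t"):
        row = {"Genome": raw["Genome"]}
        for key, value in raw.items():
            if key != "Genome":
                row[key] = None if value == "NA" else float(value)
        rows.append(row)
    return rows


rows = load_coverm_table(io.StringIO(TSV))
abundance_col = "sample_1.1.fq.gz Relative Abundance (%)"

unmapped = next(row for row in rows if row["Genome"] == "unmapped")
detected = [
    row["Genome"]
    for row in rows
    if row["Genome"] != "unmapped" and row[abundance_col] > 0
]
total = sum(row[abundance_col] for row in rows)

print(unmapped[abundance_col])  # 51.746
print(detected)                 # ['genome_1', 'genome_2', 'genome_3', 'genome_4']
print(round(total, 3))          # 100.0
```

To run this against the real file, replace `io.StringIO(TSV)` with an open file handle, for example `open("output_coverm.tsv", newline="")`. The relative abundances, including the `unmapped` row, sum to 100%.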

## Citation

If you use CoverM in your research, please cite the following publication:

demo/genome_1.fna

Lines changed: 72967 additions & 0 deletions
Large diffs are not rendered by default.

demo/genome_2.fna

Lines changed: 60313 additions & 0 deletions

demo/genome_3.fna

Lines changed: 76544 additions & 0 deletions

demo/genome_4.fna

Lines changed: 51851 additions & 0 deletions

demo/genome_5.fna

Lines changed: 53947 additions & 0 deletions

demo/genome_6.fna

Lines changed: 72372 additions & 0 deletions

demo/genome_7.fna

Lines changed: 30474 additions & 0 deletions

demo/genome_8.fna

Lines changed: 89586 additions & 0 deletions

demo/output_coverm.tsv

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
Genome	sample_1.1.fq.gz Mean	sample_1.1.fq.gz Relative Abundance (%)	sample_1.1.fq.gz Covered Fraction
unmapped	NA	51.746	NA
genome_1	0.9410575	25.87694	0.52770287
genome_2	0.40274984	11.074703	0.27789244
genome_3	0.20988818	5.7714467	0.15818907
genome_4	0.20114066	5.5309105	0.1509256
genome_5	0	0	0
genome_6	0	0	0
genome_7	0	0	0
genome_8	0	0	0
