|
10 | 10 | - [Shell completion](#shell-completion) |
11 | 11 | - [Usage](#usage) |
12 | 12 | - [Calculation methods](#calculation-methods) |
| 13 | + - [Demo](#demo) |
13 | 14 | - [Citation](#citation) |
14 | 15 | - [License](#license) |
15 | 16 |
|
@@ -163,6 +164,60 @@ is in a genome with 2,000,000bp contig with no reads mapped, then the |
163 | 164 | trimmed_mean will be 0 as all positions in the 2000bp are in the top 5% of |
164 | 165 | positions sorted by coverage. |
165 | 166 |
|
| 167 | +## Demo |
| 168 | + |
| 169 | +Download a test dataset of 8 genomes and 1 sample of paired-end reads: |
| 170 | + |
| 171 | +```bash |
| 172 | +wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/sample_1.1.fq.gz |
| 173 | +wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/sample_1.2.fq.gz |
| 174 | +wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_1.fna |
| 175 | +wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_2.fna |
| 176 | +wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_3.fna |
| 177 | +wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_4.fna |
| 178 | +wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_5.fna |
| 179 | +wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_6.fna |
| 180 | +wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_7.fna |
| 181 | +wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/genome_8.fna |
| 182 | +``` |
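The ten downloads above all share one base URL, so they can also be generated in a loop. The following is only a sketch: the function names `demo_files` and `download_demo` are hypothetical helpers, not part of CoverM, and the file names are assumed to be exactly those listed above.

```bash
# Base URL shared by every demo file listed above.
base_url="https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo"

# Hypothetical helper: print the ten demo file names, one per line.
demo_files() {
  printf '%s\n' sample_1.1.fq.gz sample_1.2.fq.gz
  for i in {1..8}; do
    printf 'genome_%s.fna\n' "$i"
  done
}

# Hypothetical helper: fetch each demo file into the current directory.
download_demo() {
  demo_files | while read -r f; do
    wget "${base_url}/${f}"
  done
}
```

Calling `download_demo` fetches the same files as the explicit `wget` commands; nothing is downloaded until the function is called.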
| 183 | + |
| 184 | +Run CoverM: |
| 185 | + |
| 186 | +```bash |
| 187 | +coverm genome \ |
| 188 | + --coupled sample_1.1.fq.gz sample_1.2.fq.gz \ |
| 189 | + --genome-fasta-files \ |
| 190 | + genome_1.fna genome_2.fna genome_3.fna genome_4.fna \ |
| 191 | + genome_5.fna genome_6.fna genome_7.fna genome_8.fna \ |
| 192 | + -t 8 \ |
| 193 | + -m mean relative_abundance covered_fraction \ |
| 194 | + -o output_coverm.tsv |
| 195 | +``` |
| 196 | + |
| 197 | +This should have created the file `output_coverm.tsv` and logged the following message: |
| 198 | +`coverm::genome] In sample 'sample_1.1.fq.gz', found 48254 reads mapped out of 100000 total (48.25%)`. |
| 199 | +This indicates that 48.25% of the reads from our sample mapped to the genomes, so the genomes account for roughly half of the reads in the sample. |
| 200 | + |
| 201 | +Looking in `output_coverm.tsv`, we find columns with the following headings: |
| 202 | + |
| 203 | +- `Genome`: The name of the genome |
| 204 | +- `sample_1.1.fq.gz Mean`: The mean read coverage from sample_1 across the given genome, i.e. the average height across the genome if the aligned reads were stacked on top of each other. |
| 205 | +- `sample_1.1.fq.gz Relative Abundance (%)`: The relative abundance of the genome within sample_1. This metric accounts for differing genome sizes by using the proportion of mean coverage rather than the proportion of reads. |
| 206 | +- `sample_1.1.fq.gz Covered Fraction`: The proportion of the genome that is covered by at least one read. |
| 207 | + |
| 208 | +Each row represents a genome, and the columns represent the coverage metrics calculated for that genome for each provided sample. |
| 209 | +For instance, the row for `genome_1` shows that the mean coverage of this genome is `0.941`, the relative abundance is `25.9`%, and the covered fraction is `0.528`. |
| 210 | +In contrast, the row for `genome_5` shows that the mean coverage of this genome is `0.0`, the relative abundance is `0.0`%, and the covered fraction is `0.0`. |
| 211 | +This indicates that `genome_1` is well represented in the sample, while `genome_5` is not present at all. |
| 212 | +Of the remaining 6 genomes, 3 have varying non-zero coverage and 3 have 0 coverage. |
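To pick out the genomes that are present without reading the table by eye, the output file can be filtered with `awk`. This is a minimal sketch that assumes the columns appear in the order given to `-m` above (genome name first, covered fraction fourth) and that the file is tab-separated with a single header line; `present_genomes` is a hypothetical helper name, not a CoverM command.

```bash
# Hypothetical helper: print the name and covered fraction of every genome
# with a covered fraction above 0.
# Assumes column 1 is the genome name and column 4 is the covered fraction.
present_genomes() {
  awk -F'\t' 'NR > 1 && $1 != "unmapped" && $4 + 0 > 0 { print $1 "\t" $4 }' "$1"
}
```

Running `present_genomes output_coverm.tsv` should then list only the genomes detected in the sample. The `unmapped` row is skipped explicitly because it is not a genome.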
| 213 | + |
| 214 | +You may have noticed that the covered fraction for most genomes is rather low. This is because the reads have been sub-sampled to 100,000 reads. |
| 215 | +The full sample has 76,618,686 reads and produces covered fractions of 1 for all present genomes. Notably, the relative abundances are very similar between the sub-sampled and full datasets. |
| 216 | +The output from the full sample can be downloaded as follows: `wget https://raw.githubusercontent.com/wwood/CoverM/refs/heads/main/demo/output_coverm_full.tsv` |
| 217 | + |
| 218 | +There is an additional row named `unmapped`, which reports the reads that did not map to any of the provided genomes. |
| 219 | +Among the metrics we selected, this row is only meaningful for relative abundance. It shows that about 51.75% of the reads were unmapped, consistent with the 48.25% mapped reported in the log message. |
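The arithmetic behind the relative abundance column can be sketched as follows. This assumes, based on the description above rather than on the CoverM source, that each genome's relative abundance is its share of the summed mean coverage scaled by the percentage of mapped reads, and that the `unmapped` value is the remaining percentage. `relative_abundance` is a hypothetical helper and the input values in the usage note are illustrative, not taken from the demo output.

```bash
# Hypothetical helper, not part of CoverM.
# $1: percentage of reads that mapped (e.g. 48.25)
# remaining arguments: mean coverage of each genome
relative_abundance() {
  local mapped_pct="$1"
  shift
  printf '%s\n' "$@" | awk -v mapped="$mapped_pct" '
    { cov[++n] = $1; total += $1 }
    END {
      # Each genome gets its share of total mean coverage, scaled by the
      # mapped percentage (assumed formula).
      for (i = 1; i <= n; i++)
        printf "%.2f\n", (total > 0 ? cov[i] / total * mapped : 0)
      # Whatever did not map is reported as unmapped.
      printf "unmapped %.2f\n", 100 - mapped
    }'
}
```

For example, `relative_abundance 50 1 1 0` prints `25.00`, `25.00` and `0.00` for the three genomes, followed by `unmapped 50.00`. Under the same assumption, the demo's 48.25% mapped reads leave 100 - 48.25 = 51.75% unmapped.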
| 220 | + |
166 | 221 | ## Citation |
167 | 222 |
|
168 | 223 | If you use CoverM in your research, please cite the following publication: |
|