Commands

Below are the commands available in outerspace:

`outerspace findseq`

Extracts sequences from various file formats based on configuration patterns. Features:

Support for FASTQ, FASTA, SAM, and BAM files
Single file or paired-end processing
Global pattern configuration system
Auto-detection of file formats
Multi-threaded processing support
Progress tracking and error handling

usage: outerspace findseq [-h] [-1 READ1_FILENAME] [-2 READ2_FILENAME] 
                          [-o OUTPUT_FILENAME] [--region REGION] 
                          [--fetch {mapped,unmapped,all}] [--long-format] 
                          [--matches-only] [--threads THREADS] [--skip-unmapped] 
                          [--max-reads MAX_READS] [--config CONFIG] 
                          [--progress-bar] [--log-file LOG_FILE] [--log-level LOG_LEVEL]

Extract sequences from files based on configuration patterns

options:
  -h, --help            show this help message and exit
  -1 READ1_FILENAME, --read1_filename READ1_FILENAME
                        Input file for read 1 (FASTQ, FASTA, SAM, BAM) or single read file
  -2 READ2_FILENAME, --read2_filename READ2_FILENAME
                        Input file for read 2 (FASTQ, FASTA, SAM, BAM) for paired reads
  -o OUTPUT_FILENAME, --output_filename OUTPUT_FILENAME
                        Output CSV file name
  --region REGION       SAM/BAM region specification (e.g., "chr1:1-1000")
  --fetch {mapped,unmapped,all}
                        SAM/BAM fetch mode (mapped, unmapped, all)
  --long-format         Output in long format (one row per pattern match instead of one row per read)
  --matches-only        Only output reads that have at least one pattern match
  --threads THREADS     Number of threads for parallel processing (default: auto-detect)
  --skip-unmapped       Skip unmapped reads in SAM/BAM files (based on SAM flag 0x4)
  --max-reads MAX_READS
                        Maximum number of reads to process (default: all)

Common Arguments:
  --config CONFIG, -c CONFIG
                        Configuration file
  --progress-bar, -p    Enable progress bar
  --log-file LOG_FILE   Log file
  --log-level LOG_LEVEL
                        Log level (default: WARNING)

`outerspace collapse`

Corrects barcodes in CSV files using UMI-tools clustering or nearest-neighbor matching. Features:

Supports multiple barcode columns
Configurable mismatch tolerance
Multiple clustering methods (cluster, adjacency, directional) for UMI correction
Nearest-neighbor matching for key correction with allowed lists
Exact matching with allowed lists
Iterative collapse steps via TOML configuration
Row limiting for testing
Detailed metrics output

usage: outerspace collapse [-h] (--input-file INPUT_FILE | --input-dir INPUT_DIR)
                           (--output-file OUTPUT_FILE | --output-dir OUTPUT_DIR)
                           [--columns COLUMNS] [--mismatches MISMATCHES] [--sep SEP]
                           [--row-limit ROW_LIMIT] 
                           [--method {cluster,adjacency,directional,allowed,nearest}]
                           [--allowed-list ALLOWED_LIST]
                           [--min-score MIN_SCORE] [--match-score MATCH_SCORE]
                           [--mismatch-penalty MISMATCH_PENALTY] [--gap-penalty GAP_PENALTY]
                           [--rescue-kmer-size RESCUE_KMER_SIZE]
                           [--rescue-min-overlap RESCUE_MIN_OVERLAP]
                           [--rescue-exhaustive] [--rescue-strategy {random,first,last,all}]
                           [--metrics METRICS] [--config CONFIG] [--progress-bar]
                           [--log-file LOG_FILE] [--log-level LOG_LEVEL]

Correct barcodes in CSV files using UMI-tools clustering or nearest-neighbor matching

options:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        Input CSV file to process
  --input-dir INPUT_DIR
                        Input directory containing CSV files to process
  --output-file OUTPUT_FILE
                        Output CSV file for corrected barcodes
  --output-dir OUTPUT_DIR
                        Output directory for corrected CSV files
  --columns COLUMNS     Column(s) containing barcodes to correct. Can be a single column or comma-separated list
  --mismatches MISMATCHES
                        Number of mismatches allowed for clustering (default: 2)
  --sep SEP             CSV separator (default: ,)
  --row-limit ROW_LIMIT
                        Process only the first N rows (for testing)
  --method {cluster,adjacency,directional,allowed,nearest}
                        Correction method: cluster/adjacency/directional for UMI clustering,
                        allowed for exact matching with allowed list, nearest for nearest-neighbor
                        matching with allowed list (default: directional)
  --allowed-list ALLOWED_LIST
                        Text file containing allowed values (required for 'allowed' and 'nearest' methods)
  --min-score MIN_SCORE
                        Minimum alignment score for nearest-neighbor rescue (default: 0)
  --match-score MATCH_SCORE
                        Score for character matches in alignment (default: 1)
  --mismatch-penalty MISMATCH_PENALTY
                        Penalty for mismatches in alignment (default: -1)
  --gap-penalty GAP_PENALTY
                        Penalty for gaps/indels in alignment (default: -3)
  --rescue-kmer-size RESCUE_KMER_SIZE
                        K-mer size for prescreening candidates (default: 3)
  --rescue-min-overlap RESCUE_MIN_OVERLAP
                        Minimum k-mer overlap to consider a candidate (default: 1)
  --rescue-exhaustive   Disable k-mer prescreening (slower but guaranteed optimal)
  --rescue-strategy {random,first,last,all}
                        Strategy for choosing among tied rescued values (default: random)
  --metrics METRICS     Output YAML file for metrics

Iterative Steps (TOML Config Only):
  Define multi-step corrections using [[collapse.steps]] in your TOML config file.
  See docs/collapse_steps.md for details.

Common Arguments:
  --config CONFIG, -c CONFIG
                        Configuration file
  --progress-bar, -p    Enable progress bar
  --log-file LOG_FILE   Log file
  --log-level LOG_LEVEL
                        Log level (default: WARNING)

`outerspace count`

Counts unique barcodes per key value in CSV files. Features:

Barcode and key column specification
--allowed-list exact-match filtering (DEPRECATED - use collapse for key correction)
Downsampling capability
Detailed output with barcode lists
Gini coefficient and Simpson diversity calculation for both barcodes and keys

usage: outerspace count [-h] (--input-file INPUT_FILE | --input-dir INPUT_DIR)
                        (--output-file OUTPUT_FILE | --output-dir OUTPUT_DIR)
                        [--barcode-column BARCODE_COLUMN] [--key-column KEY_COLUMN]
                        [--sep SEP] [--row-limit ROW_LIMIT] [--allowed-list ALLOWED_LIST]
                        [--detailed] [--downsample DOWNSAMPLE] [--random-seed RANDOM_SEED]
                        [--config CONFIG] [--progress-bar] [--log-file LOG_FILE] 
                        [--log-level LOG_LEVEL]

Count unique barcodes per key value in CSV files

options:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        Input CSV file to process
  --input-dir INPUT_DIR
                        Input directory containing CSV files to process
  --output-file OUTPUT_FILE
                        Output CSV file for barcode counts
  --output-dir OUTPUT_DIR
                        Output directory for barcode counts
  --barcode-column BARCODE_COLUMN
                        Column containing barcodes
  --key-column KEY_COLUMN
                        Column to group by
  --sep SEP             CSV separator (default: ,)
  --row-limit ROW_LIMIT
                        Process only the first N rows (for testing)
  --allowed-list ALLOWED_LIST
                        DEPRECATED: Text file containing allowed keys (exact match only).
                        For key correction, use 'collapse --method nearest' instead.
  --detailed            Include barcode lists in output
  --downsample DOWNSAMPLE
                        Randomly sample reads with probability between 0 and 1
  --random-seed RANDOM_SEED
                        Random seed for downsampling

Common Arguments:
  --config CONFIG, -c CONFIG
                        Configuration file
  --progress-bar, -p    Enable progress bar
  --log-file LOG_FILE   Log file
  --log-level LOG_LEVEL
                        Log level (default: WARNING)

`outerspace align`

Aligns sequences from CSV files using spoa (Partial Order Alignment). Features:

Counts unique barcodes per key (sequence)
Filters keys based on unique barcode count
Multiple alignment algorithms (local, global, semi-global)
Configurable alignment scoring parameters
Option to align all keys together or group by barcode
CSV output with aligned sequences and counts

usage: outerspace align [-h] (--input-file INPUT_FILE | --input-dir INPUT_DIR)
                        [--output-file OUTPUT_FILE] [--output-dir OUTPUT_DIR]
                        [--key-column KEY_COLUMN] [--barcode-column BARCODE_COLUMN]
                        [--sep SEP] [--row-limit ROW_LIMIT] [--min-count MIN_COUNT]
                        [--top-n TOP_N] [--min-frequency MIN_FREQUENCY]
                        [--align-by-barcode] [--match MATCH] [--mismatch MISMATCH]
                        [--gap GAP] [--algorithm {0,1,2}] [--config CONFIG]
                        [--progress-bar] [--log-file LOG_FILE] [--log-level LOG_LEVEL]

Align sequences from CSV files using spoa

options:
  -h, --help            show this help message and exit
  --input-file INPUT_FILE
                        Input CSV file to process
  --input-dir INPUT_DIR
                        Input directory containing CSV files to process
  --output-file OUTPUT_FILE
                        Output file for aligned sequences (default: stdout)
  --output-dir OUTPUT_DIR
                        Output directory for aligned sequences (for batch processing)
  --key-column KEY_COLUMN
                        Column containing sequences to align (required)
  --barcode-column BARCODE_COLUMN
                        Column containing unique markers for counting (required)
  --sep SEP             CSV separator (default: ,)
  --row-limit ROW_LIMIT
                        Process only the first N rows (for testing)
  --min-count MIN_COUNT
                        Minimum unique barcode count threshold for a key to be included (default: 0)
  --top-n TOP_N         Keep only top N keys by unique barcode count (default: all)
  --min-frequency MIN_FREQUENCY
                        Minimum frequency percentage threshold based on unique barcode count (default: 0.0)
  --align-by-barcode    If set, align keys separately grouped by identical barcodes (default: align all together)
  --match MATCH         Score for matching bases (default: 5)
  --mismatch MISMATCH   Penalty for mismatching bases (default: -4)
  --gap GAP             Penalty for gaps/indels (default: -8)
  --algorithm {0,1,2}   Alignment algorithm: 0=local (Smith-Waterman), 1=global (Needleman-Wunsch), 2=semi-global (default: 1)

Common Arguments:
  --config CONFIG, -c CONFIG
                        Configuration file
  --progress-bar, -p    Enable progress bar
  --log-file LOG_FILE   Log file
  --log-level LOG_LEVEL
                        Log level (default: WARNING)

Key Concepts:

Key: Column containing sequences to align
Barcode: Column containing unique markers used for counting
Count: Number of unique barcodes associated with each key (used for filtering)

Filtering: Keys are filtered based on their unique barcode count. Multiple filters can be applied simultaneously:

--min-count: Keep only keys with at least N unique barcodes
--top-n: Keep only the top N keys by unique barcode count
--min-frequency: Keep only keys representing at least X% of total unique barcodes

Alignment Modes:

Default: All filtered keys are aligned together in a single alignment
--align-by-barcode: Keys are grouped by barcode and aligned separately. Keys that share multiple barcodes will appear in multiple alignment groups (once per barcode)

Output Format: CSV with columns:

Default mode: key_column, aligned_sequence, unique_barcode_count
Barcode-grouped mode: key_column, barcode_column, aligned_sequence, unique_barcode_count

Example:

# Align all sequences with at least 5 unique barcodes
outerspace align \
  --input-file sequences.csv \
  --key-column sequence \
  --barcode-column barcode \
  --min-count 5 \
  --output-file aligned.csv

# Align top 10 sequences by barcode count, grouped by barcode
outerspace align \
  --input-file sequences.csv \
  --key-column sequence \
  --barcode-column barcode \
  --top-n 10 \
  --align-by-barcode \
  --output-file aligned_by_barcode.csv

`outerspace merge`

Merges multiple UMI count files into a single file. Features:

Wide and long output formats
Optional UMI clustering with various methods
Comprehensive metrics reporting
Sample name customization

usage: outerspace merge [-h] [--output-file OUTPUT_FILE] [--key-column KEY_COLUMN]
                        [--count-column COUNT_COLUMN] [--sample-names SAMPLE_NAMES [SAMPLE_NAMES ...]]
                        [--sep SEP] [--format {wide,long}] [--mismatches MISMATCHES]
                        [--method {cluster,adjacency,directional}] [--metrics METRICS]
                        [--config CONFIG] [--progress-bar] [--log-file LOG_FILE]
                        [--log-level LOG_LEVEL] files [files ...]

Merge multiple UMI count files into a single file

positional arguments:
  files                 Input CSV files to merge

options:
  -h, --help            show this help message and exit
  --output-file OUTPUT_FILE
                        Output CSV file for merged counts
  --key-column KEY_COLUMN
                        Column containing UMIs
  --count-column COUNT_COLUMN
                        Column containing counts (if not provided, assumes count=1)
  --sample-names SAMPLE_NAMES [SAMPLE_NAMES ...]
                        Optional list of sample names (must match number of input files)
  --sep SEP             CSV separator (default: ,)
  --format {wide,long}  Output format: wide (samples as columns) or long (sample,umi,count columns) (default: wide)
  --mismatches MISMATCHES
                        Number of mismatches allowed for clustering (default: 0)
  --method {cluster,adjacency,directional}
                        Clustering method to use (default: directional)
  --metrics METRICS     Output YAML file for metrics

Common Arguments:
  --config CONFIG, -c CONFIG
                        Configuration file
  --progress-bar, -p    Enable progress bar
  --log-file LOG_FILE   Log file
  --log-level LOG_LEVEL
                        Log level (default: WARNING)

`outerspace stats`

Calculates comprehensive statistics from UMI count data. Features:

Multiple diversity metrics (Gini coefficient, Shannon diversity, Simpson diversity)
Efficiency measures (recovery rate, efficiency rate, error rate)
Redundancy analysis
Support for pre-counted values
Scale factor application for normalized counts
--allowed-list filtering

usage: outerspace stats [-h] [--key-column KEY_COLUMN] [--count-column COUNT_COLUMN]
                        [--scale SCALE] [--sep SEP] [--allowed-list ALLOWED_LIST]
                        [--config CONFIG] [--progress-bar] [--log-file LOG_FILE]
                        [--log-level LOG_LEVEL] input_files [input_files ...]

Calculate all single-library statistics from counts in a CSV column

positional arguments:
  input_files           Input CSV file(s) to process (supports glob patterns)

options:
  -h, --help            show this help message and exit
  --key-column KEY_COLUMN
                        Column containing keys
  --count-column COUNT_COLUMN
                        Column containing pre-counted values
  --scale SCALE         Scale factor for normalized values (e.g., if normalized to mean=1)
  --sep SEP             CSV separator (default: ,)
  --allowed-list ALLOWED_LIST
                        Text file containing allowed values (one per line)

Common Arguments:
  --config CONFIG, -c CONFIG
                        Configuration file
  --progress-bar, -p    Enable progress bar
  --log-file LOG_FILE   Log file
  --log-level LOG_LEVEL
                        Log level (default: WARNING)

`outerspace subsample`

Estimates metric stability through random subsampling at various sample sizes. Features:

Random row-based subsampling with configurable sample sizes
Multiple replicates per sample size for robust estimates
Reproducible results via RNG seeding
Reuses metric configurations from [[stats.metrics]] or [[subsample.metrics]]
Long-format output for easy visualization and downstream analysis
Useful for determining minimum sequencing depth and assessing metric robustness

usage: outerspace subsample [-h] [--sep SEP] [--sample-sizes SAMPLE_SIZES]
                            [--n-replicates N_REPLICATES] [--seed SEED]
                            [-o OUTPUT_FILE] [--threads THREADS]
                            [--config CONFIG] [--progress-bar]
                            [--log-file LOG_FILE] [--log-level LOG_LEVEL]
                            input_file

Estimate metric stability through random subsampling

positional arguments:
  input_file            Input CSV file to subsample (collapse output)

options:
  -h, --help            show this help message and exit
  --sep SEP             CSV separator (default: ,)
  --sample-sizes SAMPLE_SIZES
                        Comma-separated sample size percentages (e.g., '0.1,1,10,50')
  --n-replicates N_REPLICATES
                        Number of replicates per sample size
  --seed SEED           Random seed for reproducibility (default: 42)
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Output CSV file (default: stdout)
  --threads THREADS     Number of threads for parallel processing (default: 1)

Common Arguments:
  --config CONFIG, -c CONFIG
                        Configuration file
  --progress-bar, -p    Enable progress bar
  --log-file LOG_FILE   Log file
  --log-level LOG_LEVEL
                        Log level (default: WARNING)

Output Format: Long-format CSV with columns:

sample_size_pct: Sample size as percentage of total
sample_size_n: Absolute number of rows sampled
replicate: Replicate number (0-indexed)
metric_name: Name of the metric
metric_value: Calculated metric value

Configuration: Metrics are defined using [[subsample.metrics]] or [[stats.metrics]] sections in the config file. Each metric requires method, name, and method-specific parameters (e.g., key_column, barcode_column).

Example:

# Estimate diversity metric stability at different sample sizes
outerspace subsample -c config.toml \
  --sample-sizes "0.1,1,5,10,25,50,100" \
  --n-replicates 10 \
  --seed 42 \
  -o subsample_results.csv \
  collapse_output.csv

This generates a long-format table that can be easily visualized to show how metrics stabilize with increasing sample size, helping determine optimal sequencing depth.

`outerspace pipeline`

Runs complete OUTERSPACE workflows using Snakemake. Features:

Integrated workflow execution
Configuration management
Comprehensive error handling
Custom Snakemake argument support

usage: outerspace pipeline [-h] [--snakemake-args SNAKEMAKE_ARGS] 
                           config_file snakemake_config

Run the complete OUTERSPACE pipeline using Snakemake

positional arguments:
  config_file           TOML configuration file with search patterns
  snakemake_config      YAML configuration file for Snakemake workflow

options:
  -h, --help            show this help message and exit
  --snakemake-args SNAKEMAKE_ARGS
                        Additional arguments to pass to Snakemake (e.g. --snakemake-args="--dry-run --cores 4")

`outerspace visualize`

Creates visualizations of barcode counts from CSV files. Features:

Histogram generation
Customizable plot parameters
Multiple output formats
Log scale support

usage: outerspace visualize [-h] [--sep SEP] [--bins BINS] [--title-prefix TITLE_PREFIX]
                            [--xlabel XLABEL] [--ylabel YLABEL] [--log-scale]
                            [--format FORMAT] [--config CONFIG] [--progress-bar]
                            [--log-file LOG_FILE] [--log-level LOG_LEVEL]
                            input_dir output_dir

Visualize barcode counts from CSV files

positional arguments:
  input_dir             Input directory containing CSV files with barcode counts
  output_dir            Output directory for visualization plots

options:
  -h, --help            show this help message and exit
  --sep SEP             CSV separator (default: ,)
  --bins BINS           Number of histogram bins (default: 50)
  --title-prefix TITLE_PREFIX
                        Prefix for plot titles (default: filename)
  --xlabel XLABEL       X-axis label (default: Number of Unique Barcodes)
  --ylabel YLABEL       Y-axis label (default: Count)
  --log-scale           Use log scale for y-axis
  --format FORMAT       Output image format (default: png)

Common Arguments:
  --config CONFIG, -c CONFIG
                        Configuration file
  --progress-bar, -p    Enable progress bar
  --log-file LOG_FILE   Log file
  --log-level LOG_LEVEL
                        Log level (default: WARNING)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commands

`outerspace findseq`

`outerspace collapse`

`outerspace count`

`outerspace align`

`outerspace merge`

`outerspace stats`

`outerspace subsample`

`outerspace pipeline`

`outerspace visualize`

FilesExpand file tree

commands.md

Latest commit

History

commands.md

File metadata and controls

Commands

outerspace findseq

outerspace collapse

outerspace count

outerspace align

outerspace merge

outerspace stats

outerspace subsample

outerspace pipeline

outerspace visualize

`outerspace findseq`

`outerspace collapse`

`outerspace count`

`outerspace align`

`outerspace merge`

`outerspace stats`

`outerspace subsample`

`outerspace pipeline`

`outerspace visualize`