Command-Line Help for rastair
This document contains the help content for the rastair command-line program.
Version: 2.0.0
Command Overview:
- rastair
- rastair call
- rastair per-read
- rastair convert
- rastair view
- rastair mbias
- rastair license
rastair
Rastair -- detect genetic variants and methylated positions from short-read sequencing data created using TET-Assisted Pyridine-Borane Sequencing (TAPS).
See https://docs.rastair.com/ for more information.
Usage: rastair [OPTIONS] <COMMAND>
Subcommands:
- call: Call methylated positions
- per-read: Call methylation per-read
- convert: Convert between different file formats
- view: View internal format as JSON lines
- mbias: Calculate conversion per base position in read
- license: Show license -- rastair is licensed under a non-commercial use licence
Options:
- -v, --verbose: Enable more logging
  You can also use the RASTAIR_LOG environment variable to configure logging in a more precise way. See the documentation of the tracing-subscriber library to learn more.
rastair call
Call methylated positions
Process TAPS-sequenced BAM files and call methylated positions.
If no output file is specified, the output is written to stdout. You can use --vcf and --bed to write to files instead.
If using -c (--cpgs-only), all CpG positions in the reference as well as de-novo CpGs are written, and output to stdout defaults to BED.
Only variants that pass all filters are written by default. Use --all to get a full VCF file.
Usage: rastair call [OPTIONS] --fasta-file <FASTA_FILE> <BAM_FILE>
Arguments:
<BAM_FILE>: Path to sorted and indexed BAM file
Filter Options:
- --keep-overlapping-reads: Whether to keep overlapping reads
  Default value: false
- --v-min-depth <V_MIN_DEPTH>
  Default value: 3
- --max-coverage <MAX_COVERAGE>
  Default value: 1000
- -q, --min-mapq <MIN_MAPQ>: Minimum mapping quality to consider a read
  Default value: 1
- -Q, --min-baseq <MIN_BASEQ>: Minimum base quality to consider a base
  Default value: 10
- --nOT <N_OT>: For OT reads, exclude [r1_start, r1_end, r2_start, r2_end] bases from counting
  The coordinates are relative to the read: start is the distance from the 5' end of the read and end is the distance from the 3' end, irrespective of which way around the read aligns to the reference.
  Also note that the distance is relative to read length, not alignment length, so soft-clipped bases count, too!
  Default value: 0,0,0,0
- --nOB <N_OB>: For OB reads, exclude [r1_start, r1_end, r2_start, r2_end] bases from counting
  The coordinates are relative to the read: start is the distance from the 5' end of the read and end is the distance from the 3' end, irrespective of which way around the read aligns to the reference.
  Also note that the distance is relative to read length, not alignment length, so soft-clipped bases count, too!
  Default value: 0,0,0,0
- -f, --include-flags <INCLUDE_FLAGS>: Include reads that match all of these bit-flags
  Default value: 3
- -F, --exclude-flags <EXCLUDE_FLAGS>: Exclude reads that match any of these bit-flags
  Default value: 3852
- --cpg-novo-min-depth <CPG_NOVO_MIN_DEPTH>: Minimum number of reads needed to support a de-novo CpG
  Default value: 2
- --cpg-novo-min-baseq <CPG_NOVO_MIN_BASEQ>: Minimum base quality for de-novo CpGs
  Default value: 15
- --cpg-novo-min-mapq <CPG_NOVO_MIN_MAPQ>: Minimum mapping quality for de-novo CpGs
  Default value: 50
- --cpg-novo-min-vaf <CPG_NOVO_MIN_VAF>: Minimum variant allele frequency for de-novo CpGs
  Default value: 0.2
- --m-vaf-min <M_VAF_MIN>: The minimum variant allele frequency
  Default value: 0.2
- --m-min-depth <M_MIN_DEPTH>: The minimum number of reads to call a position as methylated
  Default value: 3
- --m-bq-ratio-min <M_BQ_RATIO_MIN>: The minimum quality ratio (ad_alt*bq_alt + 1) / (ad_ref*bq_ref + 1)
  Default value: 0.27
- --m-read-position-min <M_READ_POSITION_MIN>: The minimum relative position in read for alt allele evidence
  Default value: 0.2
- --m-read-position-max <M_READ_POSITION_MAX>: The maximum relative position in read for alt allele evidence
  Default value: 0.8
- --m-max-coverage <M_MAX_COVERAGE>: The maximum coverage depth for methylation calling
  Default value: 1000
- --no-ml: Only use hard thresholds to call variants and methylation events
  This disables the machine learning models. It makes rastair much faster, but at the cost of accuracy.
- --ml <ML>: Use the machine learning model with this threshold value to call variants and methylation events
  When specified, an ML model classifies positions with a prediction score. Anything above this threshold is considered PASS.
  For consistency with --no-ml, this option can also be specified as --ml without a value, which uses the default threshold.
  Default value: 0.50
- --model <MODEL>: Path to the combined model file containing the CpG, de-novo, and other models
  Defaults to the model bundled in the rastair binary.
- -c, --cpgs-only: Report CpGs only and default to BED output
  Only report positions that are CpGs in the reference or variants that would result in a de-novo CpG.
  If combined with --all, non-passing de-novo CpG positions and CpGs in the reference but without coverage in the sample are also reported.
  Default value: false
- --bed-include-empty: Include CpG positions with zero coverage
  This can be useful to get a complete list of CpG positions in the output BED file. Note that this requires the input data to contain a complete list of CpG positions, e.g. by using the --cpgs-only option when calling methylation.
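The --nOT/--nOB trimming window and the --m-bq-ratio-min formula are easiest to follow with numbers. The following Python sketch illustrates the arithmetic only and is not rastair's code. It rests on several assumptions:

- r1_* applies to the first read of a pair and r2_* to the second.
- Positions are 0-based and counted along the full read as sequenced, with soft-clipped bases included.
- bq_alt and bq_ref are plain quality numbers, for example average base qualities of the alternative and reference observations.

All function names are hypothetical.

```python
from __future__ import annotations


def parse_trim(value: str) -> tuple[int, int, int, int]:
    """Parse a --nOT/--nOB value such as "0,0,0,0" into four integers."""
    parts = [int(p) for p in value.split(",")]
    if len(parts) != 4 or any(p < 0 for p in parts):
        raise ValueError(f"expected four non-negative integers, got {value!r}")
    r1_start, r1_end, r2_start, r2_end = parts
    return r1_start, r1_end, r2_start, r2_end


def counted_positions(read_length: int, trim: tuple[int, int, int, int],
                      is_read2: bool) -> range:
    """Read positions (0-based, from the 5' end) that are still counted.

    Assumption: read_length includes soft-clipped bases, because distances
    are relative to the read, not to the alignment.
    """
    r1_start, r1_end, r2_start, r2_end = trim
    start, end = (r2_start, r2_end) if is_read2 else (r1_start, r1_end)
    return range(start, max(start, read_length - end))


def to_read_orientation(seq_index: int, read_length: int, is_reverse: bool) -> int:
    """Map an index into the BAM SEQ field to a 5'->3' read position.

    BAM stores reverse-strand reads reverse-complemented, so SEQ index 0 of
    such a read is its 3'-most base.
    """
    return read_length - 1 - seq_index if is_reverse else seq_index


def bq_ratio(ad_alt: int, bq_alt: float, ad_ref: int, bq_ref: float) -> float:
    """(ad_alt*bq_alt + 1) / (ad_ref*bq_ref + 1), compared to --m-bq-ratio-min."""
    return (ad_alt * bq_alt + 1) / (ad_ref * bq_ref + 1)


trim = parse_trim("5,3,0,10")
print(counted_positions(150, trim, is_read2=False))  # range(5, 147)
print(counted_positions(150, trim, is_read2=True))   # range(0, 140)
# SEQ index 0 of a reverse-strand read 1 is read position 149, which is trimmed.
print(to_read_orientation(0, 150, is_reverse=True) in counted_positions(150, trim, False))  # False
print(round(bq_ratio(3, 30, 10, 35), 3))  # 0.259, below the default 0.27
```

With the default 0,0,0,0 every position of the read is counted.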
Input Options:
- -r, --fasta-file <FASTA_FILE>: Path to sorted and indexed (via samtools faidx) FASTA file. Can be bgzip compressed, but then requires both a gzi index and a fai index
- -l, --region <REGION>: Restrict to a specific chromosome or region of a chromosome. Format is "chr", "chr:start" or "chr:start-end", where start is 1-based and end is inclusive
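In --region, start is 1-based and end is inclusive. Converting a region to the 0-based, half-open intervals used in BED files therefore only shifts the start by one. The hypothetical Python parser below illustrates that conversion. It is not rastair's parser, and it does not handle contig names that themselves contain a colon followed by digits.

```python
from __future__ import annotations


def parse_region(region: str) -> tuple[str, int, int | None]:
    """Turn "chr", "chr:start" or "chr:start-end" (1-based, end inclusive)
    into (chrom, start, end) as a 0-based, half-open interval.

    end is None when the region extends to the end of the chromosome.
    """
    chrom, sep, coords = region.rpartition(":")
    if not sep or not coords.replace("-", "").isdigit():
        return region, 0, None  # no coordinates: the whole chromosome
    start_text, dash, end_text = coords.partition("-")
    start = int(start_text)  # raises ValueError for inputs such as "chr1:-100"
    if start < 1:
        raise ValueError("start is 1-based and must be at least 1")
    end = int(end_text) if dash else None
    if end is not None and end < start:
        raise ValueError("end must not be smaller than start")
    # 1-based inclusive [start, end] is 0-based half-open [start - 1, end)
    return chrom, start - 1, end


print(parse_region("chr1"))          # ('chr1', 0, None)
print(parse_region("chr1:100"))      # ('chr1', 99, None)
print(parse_region("chr1:100-200"))  # ('chr1', 99, 200), i.e. 101 bases
```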
Output Options:
- --all: Output all positions, even if they do not pass filters
  If combined with --cpgs-only, only CpG positions will be reported, including non-passing de-novo CpGs and those without coverage.
- -o, --vcf <VCF>: VCF/BCF output file path (use - to write to stdout)
  The format is guessed based on the file extension:
  - .vcf for VCF (uncompressed)
  - .vcf.gz for VCF (compressed)
  - .bcf for BCF (compressed)
  - .mpk.lz4 for the internal format (Message Pack, LZ4-compressed)
- --vcf-info-fields <VCF_INFO_FIELDS>: Additional INFO fields to include in VCF output (comma-separated VCF field IDs)
  By default, only a minimal set is included.
  Possible values: AD,BQ,DP,MQ,MQ0,NS,AS_SB,SC5,AF,ABQ,AMQ,AS_SS_BQ,AS_SS_MQ,PIR,ENT100,NAB,NOI,M5mC_Strands,CPG,CPGnovo
- --vcf-format-fields <VCF_FORMAT_FIELDS>: Additional FORMAT fields to include in VCF output (comma-separated VCF field IDs)
  By default, only a minimal set is included.
  Possible values: GT,GL,GC,DP,M5mC,ML
- --bed <BED>: Output BED file with the called methylated positions
- --bed-format <BED_FORMAT>: Format of the output BED file
  If not specified, the format is guessed based on the file extension.
  Possible values:
  - bed-gz: BGZIP compressed file, usually .bed.gz
  - bed: Regular BED file, usually .bed
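The extension rules for --vcf and --bed-format amount to a suffix lookup. The following is a hypothetical Python illustration, not rastair's code. The format names are the ones rastair convert lists, and case-insensitive matching is an assumption.

```python
from __future__ import annotations

# Extension -> format name, using the names listed for `rastair convert`.
# str.endswith() keeps "calls.vcf.gz" from matching ".vcf".
SUFFIXES = [
    (".vcf.gz", "vcf-compressed"),
    (".vcf", "vcf"),
    (".bcf", "bcf"),
    (".mpk.lz4", "mpk.lz4"),
    (".bed.gz", "bed-gz"),
    (".bed", "bed"),
]


def guess_format(path: str) -> str | None:
    """Guess an output format from a file name; None means stdout ("-")."""
    if path == "-":
        return None  # nothing to guess from; the command picks its own default
    lowered = path.lower()  # assumption: extensions are matched case-insensitively
    for suffix, name in SUFFIXES:
        if lowered.endswith(suffix):
            return name
    raise ValueError(f"unrecognised extension in {path!r}")


for name in ["calls.vcf", "calls.vcf.gz", "calls.bcf", "calls.mpk.lz4", "cpg.bed.gz", "-"]:
    print(name, "->", guess_format(name))
```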
Processing Options:
- --segment-max-length <SEGMENT_MAX_LENGTH>: Maximum length of a segment in bases
  Used for splitting work between threads. Tweak this to adjust memory usage.
  Default value: 100000
- --segment-overlap <SEGMENT_OVERLAP>: Number of bases to overlap between segments
  Helpful to avoid missing variants at the edges of segments.
  Default value: 200
- --error-model <ERROR_MODEL>: The error model to use
  Accepts platform names or a custom error rate (e.g., 0.005).
  Default value: novaseq6000
  Possible values:
  - miseq: MiSeq https://support.illumina.com/sequencing/sequencing_instruments/miseq.html
  - miniseq: MiniSeq https://support.illumina.com/sequencing/sequencing_instruments/miniseq.html
  - nextseq500: NextSeq500 https://support.illumina.com/sequencing/sequencing_instruments/nextseq-500.html
  - nextseq550: NextSeq550 https://support.illumina.com/sequencing/sequencing_instruments/nextseq-550.html
  - hiseq2500: HiSeq2500 https://support.illumina.com/sequencing/sequencing_instruments/hiseq_2500.html
  - novaseq6000: NovaSeq6000 https://support.illumina.com/sequencing/sequencing_instruments/novaseq-6000.html
  - hiseqxten: HiSeq X Ten https://support.illumina.com/sequencing/sequencing_instruments/hiseq-x.html
- --vcf-threads <VCF_THREADS>: Number of threads to use for writing (and compressing) VCF files
  This is subtracted from --threads, but never below 1. Adjust this if you think that VCF writing is a bottleneck, e.g. when the output files contain a lot of positions.
  Default value: 1
- -@, --threads <TOTAL_THREADS>: Number of threads to use for processing the BAM file. Uses all available threads when not specified.
  Note that VCF writing might use additional threads internally for compression. This can be overridden with --vcf-threads.
  Default value: 10
  Environment variable: RASTAIR_THREADS
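The segment and thread options interact roughly as sketched below. This Python sketch is an assumption about the general idea, not rastair's scheduler. How segments are padded, and whether "never below 1" applies to the remaining processing threads, are guesses. segments and processing_threads are hypothetical names.

```python
def segments(chrom_length: int, max_length: int = 100_000, overlap: int = 200):
    """Yield (core_start, core_end, padded_start, padded_end), 0-based half-open.

    Hypothetical scheme: consecutive cores of at most max_length bases, each
    widened by `overlap` bases on both sides (clipped to the chromosome), so
    positions near a core boundary still see reads from the neighbouring core.
    """
    for core_start in range(0, chrom_length, max_length):
        core_end = min(core_start + max_length, chrom_length)
        yield (core_start, core_end,
               max(0, core_start - overlap), min(chrom_length, core_end + overlap))


def processing_threads(total_threads: int, vcf_threads: int = 1) -> int:
    """Threads left for BAM processing after reserving VCF writer threads."""
    return max(1, total_threads - vcf_threads)


for seg in segments(250_000):
    print(seg)
# (0, 100000, 0, 100200)
# (100000, 200000, 99800, 200200)
# (200000, 250000, 199800, 250000)
print(processing_threads(10))     # 9
print(processing_threads(1, 1))   # 1
```

Larger segments mean fewer boundaries but more data held per worker, which is why the help text ties --segment-max-length to memory usage.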
rastair per-read
Call methylation per-read
This produces a BED file that lists the methylation status of all CpGs in every read that overlaps a CpG, plus some other metadata.
Usage: rastair per-read [OPTIONS] --fasta-file <FASTA_FILE> <BAM_FILE>
Arguments:
<BAM_FILE>: Path to sorted and indexed BAM file
Filter Options:
- -f, --include-flags <INCLUDE_FLAGS>: Include reads that match all of these bit-flags
  Default value: 3
- -F, --exclude-flags <EXCLUDE_FLAGS>: Exclude reads that match any of these bit-flags
  Default value: 3852
- -w, --max-read-length <MAX_READ_LENGTH>: Expected maximum read length
  If set too short, some read positions might not get counted. It is safest to set this a bit higher than the actual read length, to allow for indels in reads.
  Default value: 200
- -q, --min-mapq <MIN_MAPQ>: Minimum mapping quality per aligned read
  Default value: 1
- --exclude-ambiguous: Exclude reads where the orientation cannot be unambiguously determined
- --count-clipped: Count clipped positions
  By default, rastair ignores the leading (soft and hard) clipped positions in the "positions in read" columns. The indices written can be seen as "position in read relative to the first base actually aligned".
  If --count-clipped is set, clipped positions are counted instead, and the indices written match the sequence of the read.
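The two indexing modes differ only by the length of the leading clips in the CIGAR string. The hypothetical Python sketch below makes two assumptions:

- Reads are on the forward strand. Strand handling in the per-read output is not described here.
- With --count-clipped, hard-clipped bases count toward the read sequence.

```python
from __future__ import annotations

import re

# Optional leading hard clip, then optional leading soft clip (e.g. "5H10S100M").
_LEADING_CLIPS = re.compile(r"^(?:(\d+)H)?(?:(\d+)S)?")


def leading_clips(cigar: str) -> tuple[int, int]:
    """Return (hard, soft) clip lengths at the start of a CIGAR string."""
    m = _LEADING_CLIPS.match(cigar)  # always matches, possibly the empty string
    return int(m.group(1) or 0), int(m.group(2) or 0)


def position_in_read(seq_index: int, cigar: str, count_clipped: bool) -> int:
    """Convert a 0-based index into the BAM SEQ field to a reported position.

    Default: relative to the first aligned base, so leading clips are skipped.
    count_clipped=True: leading hard and soft clips are counted as well.
    """
    hard, soft = leading_clips(cigar)
    if count_clipped:
        return hard + seq_index  # SEQ lacks hard-clipped bases, so add them back
    return seq_index - soft      # SEQ contains soft-clipped bases, so skip them


cigar = "5H10S100M"
print(position_in_read(10, cigar, count_clipped=False))  # 0: first aligned base
print(position_in_read(10, cigar, count_clipped=True))   # 15
```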
Input Options:
- -r, --fasta-file <FASTA_FILE>: Path to sorted and indexed (via samtools faidx) FASTA file. Can be bgzip compressed, but then requires both a gzi index and a fai index
- -l, --region <REGION>: Restrict to a specific chromosome or region of a chromosome. Format is "chr", "chr:start" or "chr:start-end", where start is 1-based and end is inclusive
- --calls <CALLS>: BED file with per-position methylation calls, as written by rastair
Output Options:
- -A, --all-reads: Report reads with no CpGs in them
- --bed <BED>: Output BED file with all reads
  Default value: -
- --bed-format <BED_FORMAT>: Format of the output BED reads file
  If not specified, the format is guessed based on the file extension.
  Possible values:
  - bed-gz: BGZIP compressed file, usually .bed.gz
  - bed: Regular BED file, usually .bed
Processing Options:
- --segment-max-length <SEGMENT_MAX_LENGTH>: Maximum length of a segment in bases
  Used for splitting work between threads. Tweak this to adjust memory usage.
  Default value: 100000
- --segment-overlap <SEGMENT_OVERLAP>: Number of bases to overlap between segments
  Helpful to avoid missing variants at the edges of segments.
  Default value: 500
- -@, --threads <TOTAL_THREADS>: Number of threads to use for processing the BAM file. Uses all available threads when not specified.
  Note that VCF writing might use additional threads internally for compression. This can be overridden with --vcf-threads.
  Default value: 10
rastair convert
Convert between different file formats
Usage: rastair convert [OPTIONS]
Filter Options:
- -c, --cpgs-only: Report CpGs only and default to BED output
  Only report positions that are CpGs in the reference or variants that would result in a de-novo CpG.
  If combined with --all, non-passing de-novo CpG positions and CpGs in the reference but without coverage in the sample are also reported.
  Default value: false
- --bed-include-empty: Include CpG positions with zero coverage
  This can be useful to get a complete list of CpG positions in the output BED file. Note that this requires the input data to contain a complete list of CpG positions, e.g. by using the --cpgs-only option when calling methylation.
- --bed-ml <ML_THRESHOLD>: Minimum ML score to consider a position as variant
  This has no effect if the input data does not contain ML scores.
  Default value: 0.50
Input Options:
- -i, --input <INPUT>: Input file
  Default value: -
- -f, --input-format <INPUT_FORMAT>: Input file format, guessed from the file extension if not specified
  Possible values:
  - vcf: Text-based VCF format (.vcf)
  - bcf: Binary VCF format (.bcf)
  - vcf-compressed: Compressed text-based VCF format (.vcf.gz)
  - mpk.lz4: Internal format (Message Pack, LZ4-compressed)
Output Options:
- -o, --output <OUTPUT>: Output file
  Default value: -
- -F, --output-format <OUTPUT_FORMAT>: Output file format, guessed from the file extension if not specified
  Possible values:
  - vcf: Text-based VCF format (.vcf)
  - bcf: Binary VCF format (.bcf)
  - vcf-compressed: Compressed text-based VCF format (.vcf.gz)
  - mpk.lz4: Internal format (Message Pack, LZ4-compressed)
  - bed: Regular BED file, usually .bed
  - bed-gz: BGZIP compressed file, usually .bed.gz
- --all: Output all positions, even if they do not pass filters
  If combined with --cpgs-only, only CpG positions will be reported, including non-passing de-novo CpGs and those without coverage.
Processing Options:
- --error-model <ERROR_MODEL>
  Default value: novaseq6000
  Possible values:
  - miseq: MiSeq https://support.illumina.com/sequencing/sequencing_instruments/miseq.html
  - miniseq: MiniSeq https://support.illumina.com/sequencing/sequencing_instruments/miniseq.html
  - nextseq500: NextSeq500 https://support.illumina.com/sequencing/sequencing_instruments/nextseq-500.html
  - nextseq550: NextSeq550 https://support.illumina.com/sequencing/sequencing_instruments/nextseq-550.html
  - hiseq2500: HiSeq2500 https://support.illumina.com/sequencing/sequencing_instruments/hiseq_2500.html
  - novaseq6000: NovaSeq6000 https://support.illumina.com/sequencing/sequencing_instruments/novaseq-6000.html
  - hiseqxten: HiSeq X Ten https://support.illumina.com/sequencing/sequencing_instruments/hiseq-x.html
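The rastair call help describes the --error-model value as either a platform name or a custom error rate such as 0.005. The hypothetical Python validator below checks such a value. It assumes case-sensitive names and a rate strictly between 0 and 1. The actual per-platform rates are not given in this help text, so the sketch only returns the name.

```python
from __future__ import annotations

# Platform names listed under --error-model.
PLATFORMS = {
    "miseq", "miniseq", "nextseq500", "nextseq550",
    "hiseq2500", "novaseq6000", "hiseqxten",
}


def parse_error_model(value: str) -> str | float:
    """Return a known platform name, or a custom error rate as a float."""
    if value in PLATFORMS:
        return value
    try:
        rate = float(value)
    except ValueError:
        raise ValueError(f"unknown platform or error rate: {value!r}") from None
    if not 0.0 < rate < 1.0:  # also rejects NaN and infinity
        raise ValueError("a custom error rate must lie strictly between 0 and 1")
    return rate


print(parse_error_model("novaseq6000"))  # novaseq6000
print(parse_error_model("0.005"))        # 0.005
```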
rastair view
View internal format as JSON lines
Usage: rastair view [OPTIONS] <INPUT>
Arguments:
<INPUT>: Message Pack file to view
Output Options:
- -o, --output <OUTPUT>: Output file for the JSON lines
  Default value: -
rastair mbias
Calculate conversion per base position in read
This produces an mbias.html file with information about conversion counts relative to read position.
Please note that this is currently implemented as an R script. Unless you're using the official Docker image, you need to install R and the necessary packages yourself.
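At its core, an M-bias calculation tallies converted versus total observations at each position in the read. Positions whose rate drifts away from the rest, for example near read ends, are candidates for exclusion with --nOT/--nOB in rastair call. The Python sketch below shows only that tally and is not the bundled R script. Its (position, converted) input is hypothetical and does not follow the actual per-read BED column layout.

```python
from collections import defaultdict


def conversion_by_position(observations):
    """Fraction of converted calls at each position in the read.

    observations: iterable of (position_in_read, converted) pairs, one per
    CpG observed in a read (a hypothetical input, not the BED column layout).
    """
    converted = defaultdict(int)
    total = defaultdict(int)
    for position, is_converted in observations:
        total[position] += 1
        converted[position] += int(is_converted)
    return {pos: converted[pos] / total[pos] for pos in sorted(total)}


obs = [(0, True), (0, False), (1, True), (1, True), (2, False)]
print(conversion_by_position(obs))  # {0: 0.5, 1: 1.0, 2: 0.0}
```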
Usage: rastair mbias [OPTIONS] <BED_FILE>
Arguments:
<BED_FILE>: Input per-read BED file (can be gzipped)
Filter Options:
- --region <REGION>: Genomic region
- --include-flag <INCLUDE_FLAG>: Include bitflag as integer
  Default value: 3
- --exclude-flag <EXCLUDE_FLAG>: Exclude bitflag as integer
  Default value: 3852
- --read-length <READ_LENGTH>: Read length as integer
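The default include value 3 and exclude value 3852 are sums of SAM FLAG bits. The Python sketch below decodes them and applies the rule described for -f/-F in rastair call and per-read: a read must match all include bits and none of the exclude bits. It assumes that the mbias --include-flag/--exclude-flag options follow the same rule, and decode and passes are hypothetical names. Under the defaults, only reads flagged as paired and properly paired are kept. Unmapped, mate-unmapped, secondary, QC-failed, duplicate and supplementary reads are dropped.

```python
from __future__ import annotations

# SAM FLAG bits as defined in the SAM specification.
FLAG_NAMES = {
    0x1: "paired", 0x2: "proper pair", 0x4: "unmapped", 0x8: "mate unmapped",
    0x10: "reverse", 0x20: "mate reverse", 0x40: "read1", 0x80: "read2",
    0x100: "secondary", 0x200: "QC fail", 0x400: "duplicate",
    0x800: "supplementary",
}


def decode(flags: int) -> list[str]:
    """Names of the bits set in a FLAG value."""
    return [name for bit, name in FLAG_NAMES.items() if flags & bit]


def passes(flag: int, include: int = 3, exclude: int = 3852) -> bool:
    """Keep a read if it has all include bits and none of the exclude bits."""
    return (flag & include) == include and (flag & exclude) == 0


print(decode(3))     # ['paired', 'proper pair']
print(decode(3852))  # ['unmapped', 'mate unmapped', 'secondary', 'QC fail', 'duplicate', 'supplementary']
print(passes(99), passes(97), passes(1123))  # True False False
```

Here 97 fails because it lacks the proper-pair bit, and 1123 fails because it carries the duplicate bit (1024).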
Options:
- --r-script-dir <R_SCRIPT_DIR>: Override the directory in which to look for R scripts
  When not set, rastair looks in $rastair_path/scripts and ./scripts.
  Environment variable: R_SCRIPT_DIR
Output Options:
- --output-prefix <OUTPUT_PREFIX>: Output path prefix
  Default value: .
Processing Options:
- --tabix-path <TABIX_PATH>: Path to tabix executable
  Default value: tabix
rastair license
Show license -- rastair is licensed under a non-commercial use licence
Usage: rastair license
This document was generated automatically by clap-markdown.