modBAM

The modBAM format is a BAM/SAM file that contains modified base information in the MM and ML tags, as specified by the SAM Optional Fields Specification. This format stores per-read modification information alongside the alignment data. In Rastair, we're mainly interested in storing CpG methylation information, which is represented as 5-methylcytosine (5mC) on the C, or on the G when the modification lies on the opposite strand.
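To make the role of the two tags concrete, here is a minimal sketch of how an MM/ML pair can be decoded into read positions. It is not taken from Rastair. The function name `decode_mm` is hypothetical, and the sketch assumes a forward-strand read and a single modification code per MM entry (for example `C+m`). Each number in the MM entry is the count of unmodified occurrences of the fundamental base to skip before the next modified one, and ML carries one 0-255 likelihood value per call.

```python
def decode_mm(seq, mm, ml):
    """Decode an MM tag into (read_position, mod_code, ml_value) tuples.

    Assumptions (illustrative only): the read is on the forward strand and
    every MM entry carries a single modification code, e.g. "C+m".
    """
    calls = []
    ml_index = 0
    for entry in mm.split(";"):
        if not entry:
            continue  # skip the empty piece after the trailing ";"
        header, *deltas = entry.split(",")
        base = header[0]                 # fundamental base, e.g. "C"
        code = header[2:].rstrip("?.")   # modification code, e.g. "m"
        # Read positions at which the fundamental base occurs.
        base_positions = [i for i, b in enumerate(seq) if b == base]
        idx = -1
        for delta in deltas:
            # Skip `delta` occurrences of the base, then take the next one.
            idx += int(delta) + 1
            calls.append((base_positions[idx], code, ml[ml_index]))
            ml_index += 1
    return calls


calls = decode_mm("ACGTCCGAC", "C+m,0,1;", [230, 12])
```

In this example the Cs sit at read positions 1, 4, 5 and 8, so the deltas `0,1` select positions 1 and 5. The decoding only works if the fundamental base is actually present in the read sequence at those positions.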

Change of read sequence

The read sequence in the modBAM file will differ from the read sequence in the input BAM file.

Since Rastair deals with reads from TAPS, methylated Cs are represented as Ts in the read sequence. However, both the MM tag specification and other tools like modkit expect the fundamental base to be in the read sequence. That means, for a methylated CpG, the read sequence should contain a C on the forward strand and a G on the reverse strand, and the modification should be indicated only by the MM tag, not by the presence of a T or A in the sequence.

To be compatible with this, Rastair will rewrite the read sequence at methylated positions when writing the modBAM output.
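The rewrite can be pictured with the following sketch. It is an illustration under stated assumptions, not Rastair's actual implementation: the function name `restore_methylated_bases` is hypothetical, the set of methylated read positions is assumed to be known already, and the converted base at each such position is assumed to be a T (restored to C) or an A (restored to G).

```python
# Converted base as seen in a TAPS read -> fundamental base expected by MM.
CONVERTED_TO_FUNDAMENTAL = {"T": "C", "A": "G"}


def restore_methylated_bases(seq, methylated_positions):
    """Return seq with the fundamental base restored at methylated positions.

    Hypothetical helper: positions are 0-based read coordinates, and bases
    other than T or A at those positions are left unchanged.
    """
    bases = list(seq)
    for pos in methylated_positions:
        bases[pos] = CONVERTED_TO_FUNDAMENTAL.get(bases[pos], bases[pos])
    return "".join(bases)


restored = restore_methylated_bases("ATGTTGA", {1})
```

Here the T at read position 1 is rewritten to C, giving `ACGTTGA`, while the other Ts are left alone because they are not flagged as methylated.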