UMI binary

The UMI binary processes reads that contain unique molecular identifier (UMI) sequences.

UMI syntax

The general syntax of the UMI binary is:

sentieon umi MODE [OPTIONS]

The supported modes (MODE) for this command are:

  • extract: pre-process FASTQ files containing reads with UMI sequences. The syntax of the extract MODE is:

    sentieon umi extract [OPTIONS] read_structure fastq1 [fastq2] [fastq3]
    

    where:

    • read_structure is the logical structure of the reads. It is a sequence of integer+character pairs, each giving a number of bases followed by their type: M for molecular barcode, T for template, and S for skip. The read structure consists of comma-separated groups, where each group is applied to the corresponding input FASTQ (first group to the first FASTQ, second group to the second FASTQ, and so on).

    • fastq1/2/3 are the input FASTQ files. Up to 3 input FASTQ files are supported, which covers the case where the UMI sequence is already in a separate FASTQ file.
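
To make the read_structure notation concrete, the following Python sketch parses a read structure into per-FASTQ groups and cuts one read into its segments. It is an illustration only, not part of the Sentieon tools: the function names parse_read_structure and split_read and the example value 8M2S10T,20T are hypothetical, and the sketch handles only the integer+character pairs described above, so the real binary may accept forms not covered here.

```python
import re

# Segment types as described above: M, T and S.
SEGMENT_TYPES = {"M": "molecular barcode", "T": "template", "S": "skip"}
_PAIR = re.compile(r"(\d+)([MTS])")


def parse_read_structure(read_structure):
    """Return one list of (length, type) pairs per comma-separated group."""
    groups = []
    for group in read_structure.split(","):
        pairs = _PAIR.findall(group)
        # Reject empty groups and groups with characters outside the pairs.
        if not pairs or "".join(n + t for n, t in pairs) != group:
            raise ValueError(f"invalid read-structure group: {group!r}")
        groups.append([(int(n), t) for n, t in pairs])
    return groups


def split_read(sequence, group):
    """Cut one read sequence into (type, bases) segments for one group."""
    segments = []
    pos = 0
    for length, kind in group:
        segments.append((kind, sequence[pos:pos + length]))
        pos += length
    return segments


# Hypothetical example: the first FASTQ carries an 8-base barcode, 2 skipped
# bases and 10 template bases; the second FASTQ carries 20 template bases.
groups = parse_read_structure("8M2S10T,20T")
first_read = split_read("ACGTACGT" + "NN" + "TTTTTTTTTT", groups[0])
```

In this sketch, groups[0] applies to reads from the first FASTQ and groups[1] to reads from the second, mirroring the group-to-file correspondence described above.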

    The optional arguments (OPTIONS) for the UMI binary using the extract MODE include:

    • -o OUTPUT: the location and filename of the output file. If omitted, the output is written to stdout.

    • -d: if present, the extraction will be done in duplex mode.

    • --umi_tag TAG: the logical UMI tag. The default value is XR.
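
The syntax and options above can be assembled into an argument list as in the following Python sketch. It only builds the command and does not run it. The helper name build_extract_command, the file names, and the read structure value are hypothetical; only the options listed above (-o, -d, --umi_tag) are used.

```python
def build_extract_command(read_structure, fastqs, output=None,
                          duplex=False, umi_tag=None):
    """Build the argv list for: sentieon umi extract [OPTIONS] read_structure fastq..."""
    if not 1 <= len(fastqs) <= 3:
        raise ValueError("extract takes between 1 and 3 FASTQ files")
    cmd = ["sentieon", "umi", "extract"]
    if output is not None:
        cmd += ["-o", output]          # without -o, output goes to stdout
    if duplex:
        cmd.append("-d")               # duplex mode
    if umi_tag is not None:
        cmd += ["--umi_tag", umi_tag]  # the documented default is XR
    cmd.append(read_structure)
    cmd.extend(fastqs)
    return cmd


# Hypothetical inputs, chosen for illustration only.
cmd = build_extract_command(
    "8M2S10T,20T",
    ["sample_R1.fastq", "sample_R2.fastq"],
    output="extracted.fastq",
    duplex=True,
)
```

The resulting list could be passed to subprocess.run on a system where the sentieon binary is installed.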