Arguments Correspondence

Introduction

This document describes how to execute the Broad Institute GATK Best Practices described in https://www.broadinstitute.org/gatk/guide/best-practices using the Sentieon® Genomics software. The document also describes the correspondence between the arguments of the different tools used.

This document should help you determine how to convert your existing pipelines to using Sentieon® and allow you to provide feedback to the Sentieon® team on what arguments are required for your work but are unavailable in the Sentieon® Genomics software.

Correspondence of tools

The table below shows the Sentieon® tools that implement functionality consistent with existing GATK pipeline tools.

Table 1 Broad/GATK matching tools
Sentieon tool | GATK pipeline tool | Version correspondence for 202308.02
Sentieon BWA | BWA | BWA 0.7.17
Sentieon STAR | STAR | STAR 2.7.10b
Sentieon minimap2 | minimap2 | minimap2 2.26 (r1175)
Dedup and LocusCollector | Picard MarkDuplicates | Picard 2.9.0
Realigner | RealignerTargetCreator and IndelRealigner | GATK 3.7/GATK 3.8
QualCal | BaseRecalibrator | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1
ReadWriter | PrintReads | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1
QualCal | AnalyzeCovariates | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1
Genotyper | UnifiedGenotyper | GATK 3.7/GATK 3.8
Haplotyper | HaplotypeCaller | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1
GVCFtyper | GenotypeGVCFs | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1
VarCal | VariantRecalibrator | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1
ApplyVarCal | ApplyRecalibration/ApplyVQSR | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1
TNsnv | MuTect | MuTect 1.1.5
TNhaplotyper | MuTect2 | GATK 3.7/GATK 3.8
TNhaplotyper2 | GATK4 Mutect2 | GATK 4.2.0.0
RNASplitReadsAtJunction | SplitNCigarReads | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1
AlignmentStat | Picard CollectAlignmentSummaryMetrics | Picard 2.9.0
BaseDistributionByCycle | Picard CollectBaseDistributionByCycle | Picard 2.9.0
CollectVCMetrics | Picard CollectVariantCallingMetrics | Picard 2.9.0
ContaminationAssessment | ContEst | GATK 3.7/GATK 3.8
CoverageMetrics | DepthOfCoverage | GATK 3.7/GATK 3.8
GCBias | Picard CollectGcBiasMetrics | Picard 2.9.0
HsMetricAlgo | Picard CollectHsMetrics | Picard 2.9.0
InsertSizeMetricAlgo | Picard CollectInsertSizeMetrics | Picard 2.9.0
MeanQualityByCycle | Picard MeanQualityByCycle | Picard 2.9.0
QualDistribution | Picard QualityScoreDistribution | Picard 2.9.0
QualityYield | Picard CollectQualityYieldMetrics | Picard 2.9.0
SequenceArtifactMetricsAlgo | Picard CollectSequencingArtifactMetrics | Picard 2.9.0
WgsMetricsAlgo | Picard CollectWgsMetrics | Picard 2.9.0
ContaminationModel | GetPileupSummaries and CalculateContamination | GATK 4.2.0.0
OrientationBias | Mutect2 and LearnReadOrientationModel | GATK 4.2.0.0
TNfilter | FilterMutectCalls | GATK 4.2.0.0

Detailed description per stage

Map to Reference - Alignment

GATK Best Practices command line

bwa mem -M -R '@RG\tID:GROUP_NAME \tSM:SAMPLE_NAME \tPL:PLATFORM' -p \
   -t NUMBER_THREADS REFERENCE.FASTA SAMPLE.FQ > ALIGNED.SAM
java -jar picard.jar SortSam INPUT=ALIGNED.SAM \
   OUTPUT=SORTED.SAM SORT_ORDER=coordinate
samtools view -bS SORTED.SAM > SORTED.BAM
samtools index SORTED.BAM

Sentieon® command line

sentieon bwa mem -M -R '@RG\tID:GROUP_NAME \tSM:SAMPLE_NAME \tPL:PLATFORM' -p \
   -t NUMBER_THREADS REFERENCE.FASTA SAMPLE.FQ | sentieon util sort \
   -o SORTED.BAM -t NUMBER_THREADS --sam2bam -i -

The BWA alignment command is identical, except that we recommend that the results from BWA be piped to the sorting stage in Sentieon®, instead of outputting to a SAM file.

Sentieon® sorting supports coordinate order only.

Sentieon® will automatically create an index file for the sorted bam file.

Mark Duplicates - Dedup

GATK Best Practices command line

java -jar picard.jar MarkDuplicates INPUT=SORTED.BAM \
   OUTPUT=DEDUP.BAM METRICS_FILE=DEDUP_METRICS.TXT \
   REMOVE_DUPLICATES=true
java -jar picard.jar BuildBamIndex INPUT=DEDUP.BAM

Sentieon® command line

sentieon driver -t NUMBER_THREADS -i SORTED.BAM --algo LocusCollector \
   --fun score_info SCORE.TXT.GZ
sentieon driver -t NUMBER_THREADS -i SORTED.BAM --algo Dedup --rmdup \
   --score_info SCORE.TXT.GZ --metrics DEDUP_METRICS.TXT DEDUP.BAM

The last argument of the Sentieon® command line is the output bam file. Sentieon® will automatically create an index file for the deduped bam file.

Table 2 Argument correspondence for Dedup
Picard option | Sentieon option | Meaning
INPUT=SORTED.BAM | -i SORTED.BAM | Input the bam file
OUTPUT=DEDUP.BAM | N/A | Output bam file
METRICS_FILE=METRICS.TXT | --metrics METRICS.TXT | Output metrics
REMOVE_DUPLICATES=true | --rmdup | Remove duplicates from bam
OPTICAL_DUPLICATE_PIXEL_DISTANCE=DISTANCE | --optical_dup_pix_dist DISTANCE | Optical duplicate distance
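
As an illustration of how Table 2 could drive a pipeline conversion, the bash sketch below translates one Picard MarkDuplicates KEY=VALUE argument into the matching Sentieon® option. The function name picard_to_dedup is hypothetical and not part of either tool, and only the pairs listed in Table 2 are covered.

```shell
# Translate one Picard MarkDuplicates KEY=VALUE argument into the
# Sentieon option listed in Table 2 (hypothetical helper, for illustration).
picard_to_dedup() {
    local key value
    key=${1%%=*}      # text before the first '='
    value=${1#*=}     # text after the first '='
    case "$key" in
        INPUT)        printf '%s\n' "-i $value" ;;
        METRICS_FILE) printf '%s\n' "--metrics $value" ;;
        REMOVE_DUPLICATES)
            # Only REMOVE_DUPLICATES=true corresponds to a flag.
            if [ "$value" = "true" ]; then printf '%s\n' "--rmdup"; fi ;;
        OPTICAL_DUPLICATE_PIXEL_DISTANCE)
            printf '%s\n' "--optical_dup_pix_dist $value" ;;
        OUTPUT)
            # No option: the output bam is the last positional argument.
            : ;;
        *)
            printf 'unmapped Picard argument: %s\n' "$1" >&2
            return 1 ;;
    esac
}

picard_to_dedup REMOVE_DUPLICATES=true
picard_to_dedup METRICS_FILE=DEDUP_METRICS.TXT
```

Arguments outside Table 2 are rejected rather than guessed, which makes gaps in an existing pipeline visible early.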

Realign Indels - Realignment

GATK3 Best Practices command line

java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator \
   -R REFERENCE.FASTA -I DEDUP.BAM -L INTERVAL \
   -known KNOWN_SITES.VCF -o REALIGNMENT_TARGETS.LIST
java -jar GenomeAnalysisTK.jar -T IndelRealigner \
   -R REFERENCE.FASTA -I DEDUP.BAM \
   -targetIntervals REALIGNMENT_TARGETS.LIST \
   -known KNOWN_SITES.VCF -o REALIGNED.BAM

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i DEDUP.BAM \
   --algo Realigner -k KNOWN_SITES.VCF --interval_list INTERVAL \
   REALIGNED.BAM

The last argument of the Sentieon® command line is the output bam file.

Table 3 Argument correspondence for Realign
GATK option | Sentieon option | Meaning
-I DEDUP.BAM | -i DEDUP.BAM | Input the bam file
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-o REALIGNED.BAM | N/A | Output bam file
-known KNOWN_SITES.VCF | -k KNOWN_SITES.VCF | Known sites
-L INTERVAL | --interval_list INTERVAL | Interval to restrict calculation

Recalibrate Bases - BQSR

BQSR - calculate recalibration

GATK3 Best Practices command line to generate the recalibration table

java -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
   -R REFERENCE.FASTA -I REALIGNED.BAM -L INTERVAL \
   -knownSites KNOWN_SITES.VCF -o RECAL_DATA.TABLE

GATK4 Best Practices command line to generate the recalibration table

gatk BaseRecalibrator \
   -R REFERENCE.FASTA -I REALIGNED.BAM -L INTERVAL \
   --enable-baq --known-sites KNOWN_SITES.VCF \
   -O RECAL_DATA.TABLE

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i REALIGNED.BAM \
   --interval INTERVAL \
   --algo QualCal -k KNOWN_SITES.VCF RECAL_DATA.TABLE

The last argument of the Sentieon® command line is the output recalibration data table.

GATK4 removed the calculation of Per-Base Alignment Qualities (BAQ) to help reduce the runtime of BaseRecalibrator (see https://github.com/broadinstitute/gatk/issues/2060). Adding the argument --enable-baq turns on BAQ calculation in GATK4, which matches the behavior of the Sentieon® QualCal algorithm.

Table 4 Argument correspondence - calculate BQSR - GATK3
GATK3 option | Sentieon option | Meaning
-I REALIGNED.BAM | -i REALIGNED.BAM | Input the bam file
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-o RECAL_DATA.TABLE | N/A | Output file
-knownSites KNOWN_SITES.VCF | -k KNOWN_SITES.VCF | Known sites
-L INTERVAL | --interval INTERVAL | Interval to restrict calculation

Table 5 Argument correspondence - calculate BQSR - GATK4
GATK4 option | Sentieon option | Meaning
-I REALIGNED.BAM | -i REALIGNED.BAM | Input the bam file
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-O RECAL_DATA.TABLE | N/A | Output file
--known-sites KNOWN_SITES.VCF | -k KNOWN_SITES.VCF | Known sites
-L INTERVAL | --interval INTERVAL | Interval to restrict calculation

BQSR - apply recalibration

GATK3 Best Practices command line to apply recalibration

java -jar GenomeAnalysisTK.jar -T PrintReads \
   -R REFERENCE.FASTA -I REALIGNED.BAM -L INTERVAL \
   -BQSR RECAL_DATA.TABLE -o RECALED.BAM

GATK4 Best Practices command line to apply recalibration

gatk ApplyBQSR \
   -R REFERENCE.FASTA -I REALIGNED.BAM -L INTERVAL \
   -bqsr RECAL_DATA.TABLE -O RECALED.BAM

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i REALIGNED.BAM \
   -q RECAL_DATA.TABLE --interval INTERVAL \
   --algo QualCal -k KNOWN_SITES.VCF RECAL_DATA.TABLE.POST \
   --algo ReadWriter RECALED.BAM

The last argument of the Sentieon® command line is the output bam file.

The Sentieon® ReadWriter command can be run together either with the step generating the RECAL_DATA.TABLE.POST above, or with the variant calling step to speed up the pipeline.

GATK4 removed the base quality score recalibration of INDELs when using default settings, so the GATK4 Best Practices command line shown above will not produce the same results as those produced by Sentieon®; in particular, the Sentieon® BAM output will contain BI/BD tags from the INDEL recalibration that will be missing from the GATK4 BAM output. This removal of INDEL recalibration was done to reduce GATK runtime at the expense of accuracy, so it is not recommended when using Sentieon®, as the speed improvement is negligible. To fully match the results of GATK4 ApplyBQSR, use the --read_filter QualCalFilter option instead of -q RECAL_DATA.TABLE in the ReadWriter command, which skips the INDEL recalibration:

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i REALIGNED.BAM \
   --read_filter QualCalFilter,table=RECAL_DATA.TABLE,indel=false \
   --interval INTERVAL \
   --algo ReadWriter RECALED.BAM

Table 6 Argument correspondence - apply BQSR - GATK3
GATK3 option | Sentieon option | Meaning
-I REALIGNED.BAM | -i REALIGNED.BAM | Input the bam file
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-o RECALED.BAM | N/A | Output file
-L INTERVAL | --interval INTERVAL | Interval to restrict calculation
-BQSR RECAL_DATA.TABLE | -q RECAL_DATA.TABLE | Recalibration table

Table 7 Argument correspondence - apply BQSR - GATK4
GATK4 option | Sentieon option | Meaning
-I REALIGNED.BAM | -i REALIGNED.BAM | Input the bam file
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-O RECALED.BAM | N/A | Output file
-L INTERVAL | --interval INTERVAL | Interval to restrict calculation
-bqsr RECAL_DATA.TABLE | -q RECAL_DATA.TABLE | Recalibration table
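
The two ways of applying the recalibration table described in this section (the -q option, or the QualCalFilter read filter with indel=false to match GATK4 ApplyBQSR) could be selected with a small bash helper like the sketch below. The function name bqsr_apply_option and the mode names full and gatk4 are assumptions made for this example; only the option strings it prints come from the command lines above.

```shell
# Print the driver option that applies a recalibration table.
#   full  : recalibrate bases and INDELs (the -q form)
#   gatk4 : skip INDEL recalibration to match GATK4 ApplyBQSR
# Function and mode names are hypothetical.
bqsr_apply_option() {
    local table mode
    table=$1
    mode=${2:-full}
    case "$mode" in
        full)  printf '%s\n' "-q $table" ;;
        gatk4) printf '%s\n' "--read_filter QualCalFilter,table=$table,indel=false" ;;
        *)
            printf 'unknown mode: %s\n' "$mode" >&2
            return 1 ;;
    esac
}

bqsr_apply_option RECAL_DATA.TABLE
bqsr_apply_option RECAL_DATA.TABLE gatk4
```

Keeping the choice in one place makes it easier to compare outputs against a GATK4 pipeline and then switch back to full recalibration.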

BQSR - plot recalibration

GATK3 Best Practices command line to plot BQSR metrics

java -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
   -R REFERENCE.FASTA -I REALIGNED.BAM -L INTERVAL \
   -knownSites KNOWN_SITES.VCF -BQSR RECAL_DATA.TABLE \
   -o RECAL_DATA.TABLE.POST
java -jar GenomeAnalysisTK.jar -T AnalyzeCovariates \
   -R REFERENCE.FASTA -before RECAL_DATA.TABLE \
   -after RECAL_DATA.TABLE.POST -csv RECAL_RESULT.CSV -plots BQSR.PDF

GATK4 Best Practices command line to plot BQSR metrics

gatk BaseRecalibrator \
   -R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
   --enable-baq --known-sites KNOWN_SITES.VCF \
   -O RECAL_DATA.TABLE.POST
gatk AnalyzeCovariates \
   -before RECAL_DATA.TABLE -after RECAL_DATA.TABLE.POST \
   -csv RECAL_RESULT.CSV -plots BQSR.PDF

Sentieon® command line

sentieon driver -t NUMBER_THREADS --algo QualCal --plot --before \
   RECAL_DATA.TABLE --after RECAL_DATA.TABLE.POST RECAL_RESULT.CSV
sentieon plot QualCal -o BQSR.PDF RECAL_RESULT.CSV

Table 8 Argument correspondence - plot BQSR - GATK3/GATK4
GATK option | Sentieon option | Meaning
-R REFERENCE.FASTA | N/A | Reference file
-before RECAL_DATA.TABLE | --before RECAL_DATA.TABLE | Recalibration table
-after RECAL_DATA.TABLE.POST | --after RECAL_DATA.TABLE.POST | After-recalibration table
-plots BQSR.PDF | -o BQSR.PDF | Report file
-csv RECAL_RESULT.CSV | N/A | Output csv file

Unified Genotyper - Genotyper

GATK3 Best Practices command line

java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
   -R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
   -D DBSNP.VCF --glm [SNP/INDEL/BOTH] -mbq QUALITY \
   -stand_emit_conf CONFIDENCE -stand_call_conf CONFIDENCE \
   --output_mode [EMIT_VARIANTS_ONLY/EMIT_ALL_CONFIDENT_SITES/EMIT_ALL_SITES] \
   -o OUTPUT.VCF

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i RECALED.BAM \
   --interval INTERVAL \
   --algo Genotyper \
   -d DBSNP.VCF --var_type [SNP/INDEL/BOTH] --min_base_qual QUALITY \
   --emit_conf CONFIDENCE --call_conf CONFIDENCE \
   --emit_mode [VARIANT/CONFIDENT/ALL] \
   OUTPUT.VCF

The last argument of the Sentieon® command line is the variant vcf file. The tool will output a compressed VCF file when using .gz extension.

Bear in mind that since GATK 3.7, stand_emit_conf is no longer supported and the default value for stand_call_conf has changed from 30 to 10, while the Sentieon® call_conf default has stayed at 30.

Table 9 Argument correspondence - UnifiedGenotyper
GATK option | Sentieon option | Meaning
-I RECALED.BAM | -i RECALED.BAM | Input the bam file
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-D DBSNP.VCF | -d DBSNP.VCF | dbSNP file
--glm [SNP/INDEL/BOTH] | --var_type [SNP/INDEL/BOTH] | Variant output type
-mbq QUALITY | --min_base_qual QUALITY | Minimum base quality
-stand_emit_conf CONFIDENCE | --emit_conf CONFIDENCE | Emit confidence threshold
-stand_call_conf CONFIDENCE | --call_conf CONFIDENCE | Call confidence threshold
--output_mode MODE | --emit_mode MODE | Emit mode
-ploidy PLOIDY | --ploidy PLOIDY | Ploidy of the sample
-o OUTPUT.VCF | N/A | Output variant file
-alleles GIVEN.VCF -gt_mode GENOTYPE_GIVEN_ALLELES | --given GIVEN.VCF | Perform variant calling using only the variants provided in GIVEN.VCF
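
The --output_mode and --emit_mode rows share the placeholder MODE, but the accepted values differ between the tools. The bash sketch below maps one onto the other. The pairing is read off from the order in which the values are listed in the two command lines above, which is an assumption of this example, and the function name is hypothetical.

```shell
# Map a GATK --output_mode value to the Sentieon --emit_mode value,
# pairing the values in the order the command lines above list them.
gatk_output_mode_to_emit_mode() {
    case "$1" in
        EMIT_VARIANTS_ONLY)       echo VARIANT ;;
        EMIT_ALL_CONFIDENT_SITES) echo CONFIDENT ;;
        EMIT_ALL_SITES)           echo ALL ;;
        *)
            echo "unknown output_mode: $1" >&2
            return 1 ;;
    esac
}

gatk_output_mode_to_emit_mode EMIT_ALL_CONFIDENT_SITES
```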

HaplotypeCaller - Haplotyper

GATK3 Best Practices command line - VCF output

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
   -R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
   -D DBSNP.VCF -mbq QUALITY --minPruning FACTOR \
   -stand_emit_conf CONFIDENCE -stand_call_conf CONFIDENCE \
   -pcrModel [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
   --output_mode [EMIT_VARIANTS_ONLY/EMIT_ALL_CONFIDENT_SITES/EMIT_ALL_SITES] \
   -o OUTPUT.VCF

GATK4 Best Practices command line - VCF output

gatk HaplotypeCaller \
   -R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
   -D DBSNP.VCF -mbq QUALITY --min-pruning FACTOR \
   -stand-call-conf CONFIDENCE -new-qual false \
   --pcr-indel-model [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
   --output-mode [EMIT_VARIANTS_ONLY/EMIT_ALL_CONFIDENT_SITES/EMIT_ALL_SITES] \
   -O OUTPUT.VCF

Sentieon® command line - VCF output

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i RECALED.BAM \
   --interval INTERVAL \
   --algo Haplotyper -d DBSNP.VCF \
   --min_base_qual QUALITY --prune_factor FACTOR \
   --emit_conf CONFIDENCE --call_conf CONFIDENCE \
   --emit_mode [VARIANT/CONFIDENT/ALL] \
   --pcr_indel_model [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
   OUTPUT.VCF

GATK3 Best Practices command line - gVCF output

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
   -R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
   -D DBSNP.VCF -mbq QUALITY --minPruning FACTOR \
   -stand_emit_conf CONFIDENCE -stand_call_conf CONFIDENCE \
   -pcrModel [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
   --emitRefConfidence GVCF \
   -o OUTPUT.VCF

GATK4 Best Practices command line - gVCF output

gatk HaplotypeCaller \
   -R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
   -D DBSNP.VCF -mbq QUALITY --min-pruning FACTOR \
   -stand-call-conf CONFIDENCE \
   --pcr-indel-model [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
   -ERC GVCF \
   -O OUTPUT.VCF

Sentieon® command line - gVCF output

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i RECALED.BAM \
   --interval INTERVAL \
   --algo Haplotyper -d DBSNP.VCF \
   --min_base_qual QUALITY --prune_factor FACTOR \
   --emit_conf CONFIDENCE --call_conf CONFIDENCE \
   --emit_mode GVCF \
   --pcr_indel_model [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
   OUTPUT.VCF

The last argument of the Sentieon® command line is the output vcf file. The tool will output a compressed VCF file when using .gz extension.

Bear in mind that since GATK 3.7, stand_emit_conf is no longer supported. Also, the default value for stand_call_conf was changed from 30 to 10 in GATK 3.7 through GATK 4.0 and was reverted to 30 in GATK 4.1, while the Sentieon® call_conf default has remained at 30.

Since GATK 4.1, -new-qual is the default genotyping model.
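
Because the GATK default for stand_call_conf depends on the version while the Sentieon® default does not, a conversion script may want to check whether --call_conf has to be passed explicitly. The bash sketch below encodes only the defaults stated in this section; the function names are hypothetical, and versions not mentioned here are reported as unknown rather than guessed.

```shell
# Defaults as stated in this document: 10 for GATK 3.7 through 4.0,
# 30 for GATK 4.1, and 30 for Sentieon call_conf.
SENTIEON_DEFAULT_CALL_CONF=30

gatk_default_call_conf() {
    case "$1" in
        3.7*|3.8*|4.0*) echo 10 ;;
        4.1*)           echo 30 ;;
        *)
            echo "no default recorded for GATK $1" >&2
            return 1 ;;
    esac
}

# Succeeds when the GATK default differs from the Sentieon default,
# i.e. when --call_conf must be set to reproduce a default GATK run.
needs_explicit_call_conf() {
    local gatk_default
    gatk_default=$(gatk_default_call_conf "$1") || return 2
    [ "$gatk_default" -ne "$SENTIEON_DEFAULT_CALL_CONF" ]
}

if needs_explicit_call_conf 4.0; then
    echo "--call_conf $(gatk_default_call_conf 4.0)"
fi
```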

Table 10 Argument correspondence - HaplotypeCaller - GATK3
GATK3 option | Sentieon option | Meaning
-I RECALED.BAM | -i RECALED.BAM | Input the bam file
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-D DBSNP.VCF | -d DBSNP.VCF | dbSNP file
-mbq QUALITY | --min_base_qual QUALITY | Minimum base quality
-stand_emit_conf CONFIDENCE | --emit_conf CONFIDENCE | Emit confidence threshold
-stand_call_conf CONFIDENCE | --call_conf CONFIDENCE | Call confidence threshold
--output_mode MODE | --emit_mode MODE | Emit mode
--emitRefConfidence GVCF | --emit_mode gvcf | Produce a g.vcf output
-ploidy PLOIDY | --ploidy PLOIDY | Ploidy of the sample
-o OUTPUT.VCF | N/A | Output variant file
-alleles GIVEN.VCF -gt_mode GENOTYPE_GIVEN_ALLELES | --given GIVEN.VCF | Perform variant calling using only the variants provided in GIVEN.VCF
-L INTERVAL | --interval INTERVAL | Interval to restrict calculation
-mmq QUALITY | --min_map_qual QUALITY | Minimum mapping quality
--minPruning FACTOR | --prune_factor FACTOR | Pruning factor
-pcrModel MODEL | --pcr_indel_model MODEL | PCR model
--dontUseSoftClippedBases | --trim_soft_clip | Trim off soft-clipped bases
--annotation ANNOTATION | --annotation ANNOTATION | Annotations to apply to the variant calls
--excludeAnnotation ANNOTATION | --annotation !ANNOTATION | Annotations to exclude in the variant calls by using the ‘!’ prefix
-newQual | --genotype_model multinomial | Use the new simplified allele count model

Table 11 Argument correspondence - HaplotypeCaller - GATK4
GATK4 option | Sentieon option | Meaning
-I RECALED.BAM | -i RECALED.BAM | Input the bam file
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-D DBSNP.VCF | -d DBSNP.VCF | dbSNP file
-mbq QUALITY | --min_base_qual QUALITY | Minimum base quality
N/A | --emit_conf CONFIDENCE | Emit confidence threshold
-stand-call-conf CONFIDENCE | --call_conf CONFIDENCE | Call confidence threshold
--output-mode MODE | --emit_mode MODE | Emit mode
-ERC GVCF | --emit_mode gvcf | Produce a g.vcf output
-ploidy PLOIDY | --ploidy PLOIDY | Ploidy of the sample
-O OUTPUT.VCF | N/A | Output variant file
--alleles GIVEN.VCF --genotyping-mode GENOTYPE_GIVEN_ALLELES | --given GIVEN.VCF | Perform variant calling using only the variants provided in GIVEN.VCF
-L INTERVAL | --interval INTERVAL | Interval to restrict calculation
--minimum-mapping-quality QUALITY | --min_map_qual QUALITY | Minimum mapping quality
--min-pruning FACTOR | --prune_factor FACTOR | Pruning factor
--pcr-indel-model MODEL | --pcr_indel_model MODEL | PCR model
--dont-use-soft-clipped-bases | --trim_soft_clip | Trim off soft-clipped bases
--annotation ANNOTATION | --annotation ANNOTATION | Annotations to apply to the variant calls
--annotations-to-exclude ANNOTATION | --annotation !ANNOTATION | Annotations to exclude in the variant calls by using the ‘!’ prefix
-new-qual | --genotype_model multinomial | Use the new simplified allele count model
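
The ‘!’ prefix used to exclude an annotation needs care on the command line, because an interactive bash session treats an unquoted or double-quoted ! as history expansion. The sketch below builds the argument with single quotes so that the ! stays literal; the function name exclude_annotation is hypothetical.

```shell
# Build the Sentieon argument that excludes one annotation.
# Single quotes keep '!' literal even in an interactive bash session.
exclude_annotation() {
    printf '%s%s\n' '--annotation !' "$1"
}

exclude_annotation ANNOTATION
```

When typing the option directly, the equivalent is --annotation '!ANNOTATION' with the value in single quotes.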

Joint Genotype - GVCFtyper

GATK3 Best Practices command line

java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
   -R REFERENCE.FASTA -L INTERVAL \
   -D DBSNP.VCF \
   -stand_emit_conf CONFIDENCE -stand_call_conf CONFIDENCE \
   -V INPUT_GVCF_1 -V INPUT_GVCF_2 -V INPUT_GVCF_3 \
   -o OUTPUT.VCF

GATK4 Best Practices command line

gatk GenotypeGVCFs \
   -R REFERENCE.FASTA -L INTERVAL \
   -D DBSNP.VCF -new-qual false \
   -stand-call-conf CONFIDENCE \
   -V INPUT_GVCF_1 -V INPUT_GVCF_2 -V INPUT_GVCF_3 \
   -O OUTPUT.VCF

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   --interval INTERVAL \
   --algo GVCFtyper \
   -d DBSNP.VCF \
   --emit_conf CONFIDENCE --call_conf CONFIDENCE \
   --emit_mode [VARIANT/CONFIDENT/ALL] \
   -v INPUT_GVCF_1 -v INPUT_GVCF_2 -v INPUT_GVCF_3 \
   OUTPUT.VCF

The last argument of the Sentieon® command line is the output vcf file. The tool will output a compressed VCF file when using .gz extension.

Bear in mind that since GATK 3.7, stand_emit_conf is no longer supported. Also, the default value for stand_call_conf was changed from 30 to 10 in GATK 3.7 through GATK 4.0 and was reverted to 30 in GATK 4.1, while the Sentieon® call_conf default has remained at 30.

Since GATK 4.1, -new-qual is the default genotyping model.

Table 12 Argument correspondence - GenotypeGVCFs - GATK3
GATK3 option | Sentieon option | Meaning
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-L INTERVAL | --interval INTERVAL | Interval to restrict calculation
-D DBSNP.VCF | -d DBSNP.VCF | dbSNP file
-stand_emit_conf CONFIDENCE | --emit_conf CONFIDENCE | Emit confidence threshold
-stand_call_conf CONFIDENCE | --call_conf CONFIDENCE | Call confidence threshold
N/A | --emit_mode MODE | Emit mode
-V INPUT_GVCF_X | -v INPUT_GVCF_X | g.vcf input files
-o OUTPUT.VCF | N/A | Output variant file
-newQual | --genotype_model multinomial | Use the new simplified allele count model

Table 13 Argument correspondence - GenotypeGVCFs - GATK4
GATK4 option | Sentieon option | Meaning
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-L INTERVAL | --interval INTERVAL | Interval to restrict calculation
-D DBSNP.VCF | -d DBSNP.VCF | dbSNP file
N/A | --emit_conf CONFIDENCE | Emit confidence threshold
-stand-call-conf CONFIDENCE | --call_conf CONFIDENCE | Call confidence threshold
N/A | --emit_mode MODE | Emit mode
-V INPUT_GVCF_X | -v INPUT_GVCF_X | g.vcf input files
-O OUTPUT.VCF | N/A | Output variant file
-new-qual | --genotype_model multinomial | Use the new simplified allele count model

Filter Variants - VQSR

VQSR - calculate recalibration

GATK3 Best Practices command line

java -jar GenomeAnalysisTK.jar -T VariantRecalibrator \
   -R REFERENCE.FASTA -input INPUT.VCF \
   -an ANNOTATION_1 -an ANNOTATION_2 … \
   -mode [SNP/INDEL] \
   --resource:RESOURCE_PARAM RESOURCE.VCF …\
   -tranche TRANCH_THRES -tranche TRANCH_THRES …\
   --maxGaussians MAX_GAUSS --maxNegativeGaussians MAX_GAUSS \
   --maxIterations MAX_ITERATIONS \
   --aggregate AGREGATE_VCF \
   -tranchesFile TRANCHES_FILE \
   -rscriptFile R_PLOT_FILE \
   -recalFile RECAL_FILE

GATK4 Best Practices command line

gatk VariantRecalibrator \
   -R REFERENCE.FASTA -V INPUT.VCF \
   -an ANNOTATION_1 -an ANNOTATION_2 … \
   -mode [SNP/INDEL] \
   --resource:RESOURCE_PARAM RESOURCE.VCF …\
   -tranche TRANCH_THRES -tranche TRANCH_THRES …\
   --max-gaussians MAX_GAUSS --max-negative-gaussians MAX_GAUSS \
   --max-iterations MAX_ITERATIONS \
   --aggregate AGREGATE_VCF \
   --tranches-file TRANCHES_FILE \
   --rscript-file R_PLOT_FILE \
   -O RECAL_FILE

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   --algo VarCal -v INPUT.VCF \
   --annotation ANNOTATION_1 --annotation ANNOTATION_2 …\
   --var_type [SNP/INDEL]  \
   --resource RESOURCE.VCF --resource_param RESOURCE_PARAM …\
   --tranche TRANCH_THRES --tranche TRANCH_THRES …\
   --max_gaussians MAX_GAUSS --max_neg_gaussians MAX_GAUSS \
   --max_iter MAX_ITERATIONS \
   --nthr NUMBER_THREADS_EM --srand RANDOM_SEED \
   --aggregate_data AGREGATE_VCF \
   --tranches_file TRANCHES_FILE \
   --plot_file PLOT_FILE \
   RECAL_FILE

The last argument of the Sentieon® command line is the output recal file.

The resource argument in Sentieon® is split into 2 consecutive arguments, one with the resource file and one with the resource parameters.
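
The split of the resource argument can be sketched in bash as below. The helper takes the GATK flag (with the parameters attached after the colon) and the resource file, and prints the two consecutive Sentieon® arguments. The function name split_resource is hypothetical, and the sketch assumes the parameter string is passed through unchanged, as the shared RESOURCE_PARAM placeholder in the command lines above suggests.

```shell
# Convert "--resource:RESOURCE_PARAM RESOURCE.VCF" (GATK form) into
# "--resource RESOURCE.VCF --resource_param RESOURCE_PARAM" (Sentieon form).
split_resource() {
    local flag vcf param
    flag=$1
    vcf=$2
    case "$flag" in
        --resource:*|-resource:*) ;;
        *)
            echo "not a GATK resource argument: $flag" >&2
            return 1 ;;
    esac
    param=${flag#*:}   # everything after the first ':'
    printf '%s\n' "--resource $vcf --resource_param $param"
}

split_resource --resource:RESOURCE_PARAM RESOURCE.VCF
```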

Table 14 Argument correspondence - calculate VQSR - GATK3
GATK3 option | Sentieon option | Meaning
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-input INPUT.VCF | -v INPUT.VCF | vcf input file
-an ANNOTATION | --annotation ANNOTATION | Annotation to use
-mode [SNP/INDEL] | --var_type [SNP/INDEL] | Mode to use
--resource | --resource/--resource_param | Resources to use
-tranche TRANCH_THRES | --tranche TRANCH_THRES | Thresholds for tranches
--maxGaussians MAX_GAUSS | --max_gaussians MAX_GAUSS | Max number of Gaussians used for positive model
--maxNegativeGaussians MAX_GAUSS | --max_neg_gaussians MAX_GAUSS | Max number of Gaussians used for negative model
--maxIterations MAX_ITERATIONS | --max_iter MAX_ITERATIONS | Max number of iterations
N/A | --srand RANDOM_SEED | Random seed for the EM calculation
--aggregate AGREGATE_VCF | --aggregate_data AGREGATE_VCF | Input aggregate data
-tranchesFile TRANCHES_FILE | --tranches_file TRANCHES_FILE | Output tranches file
-rscriptFile R_PLOT_FILE | --plot_file PLOT_FILE | Output file for plotting
-recalFile RECAL_FILE | N/A | Output recalibration file
-MQCap NUMBER | --max_mq NUMBER | Maximum MQ in the data

Table 15 Argument correspondence - calculate VQSR - GATK4
GATK4 option | Sentieon option | Meaning
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
--variant INPUT.VCF | -v INPUT.VCF | vcf input file
-an ANNOTATION | --annotation ANNOTATION | Annotation to use
-mode [SNP/INDEL] | --var_type [SNP/INDEL] | Mode to use
--resource | --resource/--resource_param | Resources to use
-tranche TRANCH_THRES | --tranche TRANCH_THRES | Thresholds for tranches
--max-gaussians MAX_GAUSS | --max_gaussians MAX_GAUSS | Max number of Gaussians used for positive model
--max-negative-gaussians MAX_GAUSS | --max_neg_gaussians MAX_GAUSS | Max number of Gaussians used for negative model
--max-iterations MAX_ITERATIONS | --max_iter MAX_ITERATIONS | Max number of iterations
N/A | --srand RANDOM_SEED | Random seed for the EM calculation
--aggregate AGREGATE_VCF | --aggregate_data AGREGATE_VCF | Input aggregate data
--tranches-file TRANCHES_FILE | --tranches_file TRANCHES_FILE | Output tranches file
--rscript-file R_PLOT_FILE | --plot_file PLOT_FILE | Output file for plotting
-O RECAL_FILE | N/A | Output recalibration file
-mq-cap NUMBER | --max_mq NUMBER | Maximum MQ in the data

VQSR - apply recalibration

GATK3 Best Practices command line

java -jar GenomeAnalysisTK.jar -T ApplyRecalibration \
   -R REFERENCE.FASTA -input INPUT.VCF \
   -mode [SNP/INDEL] --ts_filter_level SENSITIVITY \
   -tranchesFile TRANCHES_FILE -recalFile RECAL_FILE \
   -o OUTPUT.VCF

GATK4 Best Practices command line

gatk ApplyVQSR \
   -R REFERENCE.FASTA -V INPUT.VCF \
   -mode [SNP/INDEL] -ts-filter-level SENSITIVITY \
   --tranches-file TRANCHES_FILE --recal-file RECAL_FILE \
   -O OUTPUT.VCF

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   --algo ApplyVarCal -v INPUT.VCF \
   --var_type [SNP/INDEL] --sensitivity SENSITIVITY \
   --tranches_file TRANCHES_FILE --recal RECAL_FILE \
   OUTPUT.VCF

The last argument of the Sentieon® command line is the output vcf file. The tool will output a compressed VCF file when using .gz extension.

Table 16 Argument correspondence - apply VQSR - GATK3
GATK option | Sentieon option | Meaning
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-input INPUT.VCF | -v INPUT.VCF | vcf input file
--ts_filter_level SENSITIVITY | --sensitivity SENSITIVITY | Sensitivity
-mode [SNP/INDEL] | --var_type [SNP/INDEL] | Mode to use
-tranchesFile TRANCHES_FILE | --tranches_file TRANCHES_FILE | Input tranches file
-recalFile RECAL_FILE | --recal RECAL_FILE | Input recalibration file
-o OUTPUT.VCF | N/A | Output variant file

Table 17 Argument correspondence - apply VQSR - GATK4
GATK option | Sentieon option | Meaning
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
-V INPUT.VCF | -v INPUT.VCF | vcf input file
-ts-filter-level SENSITIVITY | --sensitivity SENSITIVITY | Sensitivity
-mode [SNP/INDEL] | --var_type [SNP/INDEL] | Mode to use
--tranches-file TRANCHES_FILE | --tranches_file TRANCHES_FILE | Input tranches file
--recal-file RECAL_FILE | --recal RECAL_FILE | Input recalibration file
-O OUTPUT.VCF | N/A | Output variant file

MuTect - TNsnv

MuTect Best Practices command line

java -jar mutect.jar -T MuTect \
   -R REFERENCE.FASTA -L INTERVAL \
   -I:normal NORMAL_RECALED.BAM -I:tumor TUMOR_RECALED.BAM \
   --dbsnp DBSNP.VCF -o CALL_STATS_OUTPUT.TXT -vcf OUTPUT.VCF

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i NORMAL_RECALED.BAM -i TUMOR_RECALED.BAM \
   --interval INTERVAL \
   --algo TNsnv --dbsnp DBSNP.VCF \
   --tumor_sample TUMOR_SM --normal_sample NORMAL_SM \
   --call_stats_out CALL_STATS_OUTPUT.TXT OUTPUT.VCF

The last argument of the Sentieon® command line is the output vcf file. The tool will output a compressed VCF file when using .gz extension.

The normal_sample and tumor_sample arguments are required in Sentieon as the BAM files are not explicitly differentiated, and could be a single co-realigned BAM file.
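
Since the sample names must match the SM field of the read groups in the BAM files, it can help to list the names actually present before running the caller. The bash sketch below extracts the distinct SM values from SAM header text read on standard input. The function name rg_samples is hypothetical, and obtaining the header with samtools view -H is an assumption about the surrounding environment rather than part of the Sentieon® workflow.

```shell
# Print the distinct SM (sample) values from @RG header lines on stdin.
rg_samples() {
    awk -F '\t' '
        /^@RG/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) print substr($i, 4)   # drop the "SM:" tag
        }' | sort -u
}

# Example with an inline header; with real data the header could come from
#   samtools view -H TUMOR_RECALED.BAM | rg_samples
printf '@HD\tVN:1.6\n@RG\tID:GROUP_NAME\tSM:SAMPLE_NAME\tPL:PLATFORM\n' | rg_samples
```

For a co-realigned BAM file the same command would be expected to list both the tumor and the normal sample name.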

Table 18 Argument correspondence - TNsnv
GATK option | Sentieon option | Meaning
N/A | -i COREALIGNED.BAM | Input the corealigned bam file
-I:normal NORMAL_RECALED.BAM | -i NORMAL_RECALED.BAM | Input the bam files
-I:tumor TUMOR_RECALED.BAM | -i TUMOR_RECALED.BAM | Input the bam files
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file
--normal_sample_name NORMAL_SAMPLE | --normal_sample NORMAL_SAMPLE | Input normal sample name
--tumor_sample_name TUMOR_SAMPLE | --tumor_sample TUMOR_SAMPLE | Input tumor sample name
--dbsnp DBSNP.VCF | --dbsnp DBSNP.VCF | dbSNP file
--cosmic COSMIC.VCF | --cosmic COSMIC.VCF | Input cosmic VCF file
--normal_panel PON.VCF | --pon PON.VCF | Input panel-of-normal VCF file
--artifact_detection_mode | --detect_pon | Turn on mode to detect artifacts in normal sample, used to generate the panel-of-normal
-vcf OUTPUT.VCF | N/A | Output tumor variants file
-o CALL_STATS.OUT | --call_stats_out CALL_STATS.OUT | Output call statistics file
--coverage_file COVERAGE_FILE | --stdcov_out COVERAGE_FILE | Output standard coverage wiggle file
--tumor_depth_file FILE | --tumor_depth_out FILE | Output wiggle file of depth of tumor reads
--normal_depth_file FILE | --normal_depth_out FILE | Output wiggle file of depth of normal reads
--power_file FILE | --power_out FILE | Output power file
--min_qscore QUALITY | --min_base_qual QUALITY | Filtering quality of the bases used in variant calling
--initial_tumor_lod NUMBER | --min_init_tumor_lod NUMBER | Minimum tumor log odds in the initial pass calling variants
--tumor_lod NUMBER | --min_tumor_lod NUMBER | Minimum tumor log odds in the final call of variants
--normal_lod NUMBER | --min_normal_lod NUMBER | Minimum normal log odds used to check that the tumor variant is not a normal variant
--fraction_contamination NUMBER | --contamination_frac NUMBER | Estimation of the contamination fraction from other samples
--minimum_mutation_cell_fraction NUMBER | --min_cell_mutation_frac NUMBER | Minimum fraction of cells which have mutation
--strand_artifact_lod NUMBER | --min_strand_bias_lod NUMBER | Minimum log odds for calling strand bias
--strand_artifact_power_threshold NUMBER | --min_strand_bias_power NUMBER | Minimum power for calling strand bias
--dbsnp_normal_lod NUMBER | --min_dbsnp_normal_lod NUMBER | Minimum log odds for calling normal non-variant at dbsnp sites
--minimum_normal_allele_fraction NUMBER | --min_normal_allele_frac NUMBER | Minimum allele fraction to be considered in normal
--tumor_f_pretest NUMBER | --min_tumor_allele_frac NUMBER | Minimum allelic fraction in tumor sample
--gap_events_threshold NUMBER | --max_indel NUMBER | Maximum of nearby indel events that are allowed
--heavily_clipped_read_fraction NUMBER | --max_read_clip_frac NUMBER | Maximum fraction of soft/hard clipped bases in a read
--fraction_mapq0_threshold NUMBER | --max_mapq0_frac NUMBER | Maximum ratio of reads whose mapq are 0 used to determine poor mapped area
--pir_median_threshold NUMBER | --min_pir_median NUMBER | Minimum read position median
--pir_mad_threshold NUMBER | --min_pir_mad NUMBER | Minimum read position median absolute deviation
--required_maximum_alt_allele_mapping_quality_score NUMBER | --max_alt_mapq NUMBER | Required maximum value of alt allele mapping quality score
--max_alt_alleles_in_normal_count NUMBER | --max_normal_alt_cnt NUMBER | Maximum alt alleles count in normal pileup
--max_alt_alleles_in_normal_qscore_sum NUMBER | --max_normal_alt_qsum NUMBER | Maximum quality score sum of alt allele in normal pileup
--max_alt_allele_in_normal_fraction NUMBER | --max_normal_alt_frac NUMBER | Maximum fraction of alt allele in normal pileup
--power_constant_af NUMBER | --power_allele_frac NUMBER | Allele fraction used in power calculations

MuTect2 - TNhaplotyper

GATK3 MuTect2 Best Practices command line

java -jar GenomeAnalysisTK.jar -T MuTect2 \
   -R REFERENCE.FASTA -L INTERVAL \
   -I:normal NORMAL_RECALED.BAM -I:tumor TUMOR_RECALED.BAM \
   -D DBSNP.VCF -o OUTPUT.VCF

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i NORMAL_RECALED.BAM -i TUMOR_RECALED.BAM \
   --interval INTERVAL \
   --algo TNhaplotyper --dbsnp DBSNP.VCF \
   --tumor_sample TUMOR_SM --normal_sample NORMAL_SM \
   OUTPUT.VCF

The last argument of the Sentieon® command line is the output VCF file. The tool writes a compressed VCF file when the output file name has a .gz extension.

The normal_sample and tumor_sample arguments are required in Sentieon as the BAM files are not explicitly differentiated, and could be a single co-realigned BAM file.
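The names passed to --tumor_sample and --normal_sample have to match the sample names recorded in the input BAM files. The following is a minimal sketch, assuming those names are the SM fields of the @RG header lines; the function name list_sample_names is made up for this illustration and is not part of any tool.

```shell
# Print the unique sample names (SM fields) found in the @RG lines
# of SAM header text read from stdin.
# Assumption: the sample names expected by --tumor_sample and
# --normal_sample are the SM values of the read groups.
list_sample_names() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SM:/) print substr($i, 4)
  }' | sort -u
}

# Example with an inline header; a real header could be piped in with
#   samtools view -H TUMOR_RECALED.BAM | list_sample_names
printf '@RG\tID:rg1\tSM:TUMOR_SM\n@RG\tID:rg2\tSM:NORMAL_SM\n' | list_sample_names
```

For a single co-realigned BAM file, both names should appear in the output of one header.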

Table 19 Argument correspondence - TNhaplotyper
GATK option Sentieon option Meaning
N/A -i COREALIGNED.BAM Input the co-realigned bam file
-I:normal NORMAL_RECALED.BAM -i NORMAL_RECALED.BAM Input the bam files
-I:tumor TUMOR_RECALED.BAM -i TUMOR_RECALED.BAM Input the bam files
-R REFERENCE.FASTA -r REFERENCE.FASTA Reference file
N/A --normal_sample NORMAL_SAMPLE Input normal sample name
N/A --tumor_sample TUMOR_SAMPLE Input tumor sample name
-D DBSNP.VCF --dbsnp DBSNP.VCF dbSNP file
--cosmic COSMIC.VCF --cosmic COSMIC.VCF Input cosmic VCF file
--normal_panel PON.VCF --pon PON.VCF Input panel-of-normal VCF file
--artifact_detection_mode --detect_pon Turn on mode to detect artifacts in normal sample. It is used to generate the panel-of-normals
-o OUTPUT.VCF N/A Output tumor variants file.
-mbq QUALITY --min_base_qual QUALITY Filtering quality of the bases used in variant calling
--minPruning FACTOR --prune_factor FACTOR Pruning factor
-pcrModel MODEL --pcr_indel_model MODEL PCR model
--initial_tumor_lod NUMBER --min_init_tumor_lod NUMBER Minimum tumor log odds in the initial pass calling variants
--initial_normal_lod NUMBER --min_init_normal_lod NUMBER Minimum normal log odds in the initial pass calling variants
--tumor_lod NUMBER --min_tumor_lod NUMBER Minimum tumor log odds in the final call of variants
--normal_lod NUMBER --min_normal_lod NUMBER Minimum normal log odds used to check that the tumor variant is not a normal variant
--max_alt_alleles_in_normal_count NUMBER --max_normal_alt_cnt NUMBER Maximum alt alleles count in normal pileup
--max_alt_alleles_in_normal_qscore_sum NUMBER --max_normal_alt_qsum NUMBER Maximum quality score sum of alt allele in normal pileup
--max_alt_allele_in_normal_fraction NUMBER --max_normal_alt_frac NUMBER Maximum fraction of alt allele in normal pileup
-contaminationFile TAB_FILE --tumor_contamination_frac NUMBER Estimation of the contamination fraction from other samples on the tumor sample
  --normal_contamination_frac NUMBER Estimation of the contamination fraction from other samples on the normal sample

GATK4 Mutect2 - TNhaplotyper2 and TNfilter

GATK4 Mutect2 Best Practices command line

gatk Mutect2 -R REFERENCE.FASTA -I TUMOR_RECALED.BAM \
   -tumor TUMOR_SM -I NORMAL_RECALED.BAM -normal NORMAL_SM \
   --germline-resource GNOMAD.VCF -O TMP.VCF \
   --f1r2-tar-gz F1R2.TAR.GZ -L INTERVAL
gatk GetPileupSummaries -R REFERENCE.FASTA -I TUMOR_RECALED.BAM \
   -V GNOMAD.VCF -O TUMOR.PILEUPS -L INTERVAL
gatk GetPileupSummaries -R REFERENCE.FASTA -I NORMAL_RECALED.BAM \
   -V GNOMAD.VCF -O NORMAL.PILEUPS -L INTERVAL
gatk LearnReadOrientationModel -I F1R2.TAR.GZ -O PRIORS
gatk CalculateContamination -I TUMOR.PILEUPS \
   -matched NORMAL.PILEUPS --tumor-segmentation SEGMENTS \
   -O CONTAMINATION.TABLE
gatk FilterMutectCalls -V TMP.VCF -R REFERENCE.FASTA \
   -O OUTPUT.VCF --contamination-table CONTAMINATION.TABLE \
   --tumor-segmentation SEGMENTS -ob-priors PRIORS \
   --stats TMP.VCF.stats --filtering-stats OUTPUT.VCF.stats

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i TUMOR_RECALED.BAM -i NORMAL_RECALED.BAM \
   --interval INTERVAL \
   --algo TNhaplotyper2 \
   --tumor_sample TUMOR_SM --normal_sample NORMAL_SM \
   --germline_vcf GNOMAD.VCF TMP.VCF \
   --algo OrientationBias --tumor_sample TUMOR_SM PRIORS \
   --algo ContaminationModel \
   --tumor_sample TUMOR_SM --normal_sample NORMAL_SM \
   -v GNOMAD.VCF --tumor_segments SEGMENTS CONTAMINATION.TABLE
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   --algo TNfilter -v TMP.VCF --tumor_sample TUMOR_SM \
   --normal_sample NORMAL_SM \
   --contamination CONTAMINATION.TABLE \
   --tumor_segments SEGMENTS --orientation_priors PRIORS \
   OUTPUT.VCF

Some VCFs split multi-allelic sites into separate records. Mutect2 will ignore later split multi-allelic records in VCFs passed through the --germline-resource argument while TNhaplotyper2 will process all multi-allelic records in VCFs passed through the --germline_vcf argument. This can cause different results when the GNOMAD.VCF has multi-allelic sites split into separate records.
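Whether a germline VCF is affected can be checked roughly before comparing results. The sketch below counts positions that appear in more than one record, which is only a proxy for split multi-allelic sites (distinct variants can also share a position); the function name count_shared_positions is hypothetical.

```shell
# Count CHROM/POS pairs that occur in more than one record of
# uncompressed VCF text read from stdin. Header lines are skipped.
count_shared_positions() {
  awk -F'\t' '
    !/^#/ { seen[$1 FS $2]++ }
    END {
      for (k in seen) if (seen[k] > 1) n++
      print n + 0
    }'
}

# Example: two records at chr1:100 and one at chr1:200.
printf 'chr1\t100\t.\tA\tG\nchr1\t100\t.\tA\tT\nchr1\t200\t.\tC\tT\n' | count_shared_positions
```

A result of 0 suggests the two tools see the same set of germline records; a larger value means the difference described above may apply.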

The above commands use the same VCF file, GNOMAD.VCF, for both Mutect2/TNhaplotyper2 and GetPileupSummaries/ContaminationModel. The GATK Best Practices typically use a gnomAD VCF for Mutect2, but use the same gnomAD VCF filtered for variants with AF > 0.01 && AF < 0.2 in GetPileupSummaries. Passing the unfiltered VCF to ContaminationModel with --min_af 0.01 --max_af 0.2 (the default values) has the same effect.
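For pipelines that still need a pre-filtered population VCF, the allele frequency window can be sketched with awk. This is an illustration only, assuming uncompressed VCF text with an AF entry in the INFO column and using the first AF value of multi-allelic records; the function name filter_by_af is hypothetical, and the exact boundary handling of the GATK tools is not reproduced.

```shell
# Keep header lines and records whose INFO AF value lies strictly
# between MIN_AF ($1) and MAX_AF ($2). VCF text is read from stdin.
# Records without an AF entry are dropped.
filter_by_af() {
  awk -F'\t' -v min="$1" -v max="$2" '
    /^#/ { print; next }
    {
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^AF=/) {
          af = substr(info[i], 4) + 0   # first value of a comma-separated list
          if (af > min && af < max) print
          break
        }
    }'
}

# Example: only the record with AF=0.05 passes the 0.01 to 0.2 window.
printf 'chr1\t100\t.\tA\tG\t.\t.\tAF=0.05\nchr1\t200\t.\tC\tT\t.\t.\tAF=0.5\n' | filter_by_af 0.01 0.2
```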

Table 20 Argument correspondence - Mutect2 and TNhaplotyper2 - GATK4
GATK4 option Sentieon option Meaning
-I TUMOR_RECALED.BAM -i TUMOR_RECALED.BAM Input the bam files
-I NORMAL_RECALED.BAM -i NORMAL_RECALED.BAM Input the bam files
-R REFERENCE.FASTA -r REFERENCE.FASTA Reference fasta
-L INTERVAL --interval INTERVAL Interval to restrict calculation
-tumor TUMOR_SM --tumor_sample TUMOR_SM Input tumor sample name
-normal NORMAL_SM --normal_sample NORMAL_SM Input normal sample name
-pon PON_FILE --pon PON_FILE A panel-of-normal file
-mbq MIN_BQ --min_base_qual MIN_BQ Minimum base quality
--min-pruning PRUNE --prune_factor PRUNE Pruning factor in local assembly
--pcr-indel-model INDEL_MODEL --pcr_indel_model INDEL_MODEL PCR indel error model
-init-lod INIT_T_LOD --min_init_tumor_lod INIT_T_LOD Minimum tumorLOD for candidate selection
-emit-lod T_LOD --min_tumor_lod T_LOD Minimum tumorLOD for called variants
--normal-lod N_LOD --min_normal_lod N_LOD Minimum normalLOD for called variants
--germline-resource GERMLINE.VCF --germline_vcf GERMLINE.VCF A germline VCF containing allele frequencies
--af-of-alleles-not-in-resource AF --default_af AF Allele frequency for variants not found in the germline VCF
--max-population-af MAX_AF --max_germline_af MAX_AF Maximum germline allele frequency in tumor-only mode
--genotype-pon-sites true --call_pon_sites Call candidate variants in the PoN
--callable-depth --callable_depth Minimum depth to be considered for statistics

Arguments in the OrientationBias algo map to arguments in both Mutect2 and LearnReadOrientationModel.

Table 21 Argument correspondence - Mutect2/LearnReadOrientationModel and OrientationBias
GATK4 option Sentieon option Meaning
-I TUMOR_RECALED.BAM -i TUMOR_RECALED.BAM Input the bam files
-R REFERENCE.FASTA -r REFERENCE.FASTA Reference fasta
-tumor TUMOR_SM --tumor_sample TUMOR_SM Input tumor sample name
--f1r2-min-bq MIN_BQ --min_base_qual MIN_BQ Minimum base quality
--f1r2-median-mq MIN_MAPQ --min_median_map_qual MIN_MAPQ Minimum median mapping quality
--f1r2-max-depth MAX_DEPTH --max_depth MAX_DEPTH Sites with a higher depth will be grouped

Arguments in the ContaminationModel algo map to arguments in both GetPileupSummaries and CalculateContamination.

Table 22 Argument correspondence - GetPileupSummaries/CalculateContamination and ContaminationModel
GATK4 option Sentieon option Meaning
--input TUMOR_RECALED.BAM -i TUMOR_RECALED.BAM Input the bam files
--reference REFERENCE.FASTA -r REFERENCE.FASTA Reference fasta
--intervals INTERVAL --interval INTERVAL Interval to restrict calculation
N/A --tumor_sample TUMOR_SM Input tumor sample name
N/A --normal_sample NORMAL_SM Input normal sample name
--min-mapping-quality MIN_MAPQ --min_map_qual MIN_MAPQ Minimum mapping quality
--variant GNOMAD.VCF -v GNOMAD.VCF A VCF with population allele frequencies
--minimum-population-allele-frequency MIN_AF --min_af MIN_AF Minimum population allele frequency
--maximum-population-allele-frequency MAX_AF --max_af MAX_AF Maximum population allele frequency
--tumor-segmentation SEGMENTS --tumor_segments SEGMENTS Allele frequency segmentations output

Table 23 Argument correspondence - FilterMutectCalls and TNfilter
GATK4 option Sentieon option Meaning
-V TMP.VCF -v TMP.VCF The input VCF
-R REFERENCE.FASTA -r REFERENCE.FASTA Reference fasta
N/A --tumor_sample TUMOR_SM Input tumor sample name
N/A --normal_sample NORMAL_SM Input normal sample name
--contamination-table CONTAMINATION.TABLE --contamination CONTAMINATION.TABLE The contamination table file
--tumor-segmentation SEGMENTS --tumor_segments SEGMENTS The tumor segmentation table
--ob-priors PRIORS --orientation_priors PRIORS The orientation prior table
--threshold-strategy STRATEGY --threshold_strategy STRATEGY Method to determine the filtering threshold
--f-score-beta BETA --f_score_beta BETA Relative weight of recall to precision in the F-score
--false-discovery-rate MAX_FP --max_fp_rate MAX_FP Maximum expected false-positive rate
--initial-threshold THRESHOLD --threshold THRESHOLD Threshold for the constant strategy
--min-median-base-quality MIN_BQ --min_median_base_qual MIN_BQ Minimum median base quality
--max-events-in-region MAX_COUNT --max_event_count MAX_COUNT Maximum number of events in an active region
--unique-alt-read-count MIN_READS --unique_alt_reads MIN_READS Minimum number of unique reads supporting the alt allele
--max-median-fragment-length-difference MAX_MFRL --max_mfrl_diff MAX_MFRL Maximum median fragment length difference
--distance-on-haplotype MAX_DIST --max_haplotype_distance MAX_DIST Maximum distance to determine an artifact
--min-allele-fraction MIN_AF --min_tumor_af MIN_AF Minimum alternate allele fraction in the tumor sample
--min-median-mapping-quality MIN_MAPQ --min_median_map_qual MIN_MAPQ Minimum median mapping quality
--long-indel-length MAX_LEN --long_indel_length MAX_LEN Longer indels will use the reference mapping quality
--max-alt-allele-count MAX_ALT --max_alt_count MAX_ALT Maximum number of alternate alleles at a site
--max-n-ratio MAX_N --max_n_ratio MAX_N Maximum ratio of N to alt bases
--normal-p-value-threshold P_VALUE --normal_p_value P_VALUE P-value threshold for normal artifacts
--min-median-read-position MIN_DIST --min_median_pos MIN_DIST Minimum median distance to the end of the read
--min-slippage-length MIN_SLIPPAGE --min_slippage_length MIN_SLIPPAGE Minimum length for polymerase slippage in STR regions
--pcr-slippage-rate SLIPPAGE_RATE --slippage_rate SLIPPAGE_RATE The rate of PCR slippage
--min-reads-per-strand MIN_ALT --min_alt_reads_per_strand MIN_ALT Number of reads supporting the alt allele per strand

SplitNCigarReads - RNASplitReadsAtJunction

GATK3 SplitNCigarReads Best Practices command line

java -jar GenomeAnalysisTK.jar -T SplitNCigarReads \
   -R REFERENCE.FASTA -I DEDUPED.BAM -o SPLIT.BAM \
   -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 \
   -U ALLOW_N_CIGAR_READS

GATK4 SplitNCigarReads Best Practices command line

gatk SplitNCigarReads \
   -R REFERENCE.FASTA -I DEDUPED.BAM -O SPLIT.BAM

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i DEDUPED.BAM --algo RNASplitReadsAtJunction \
   --reassign_mapq 255:60 SPLIT.BAM

The last argument of the Sentieon® command line is the output bam file.
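When converting an existing GATK3 command, the pair -RMQF/-RMQT collapses into the single --reassign_mapq IN_QUAL:OUT_QUAL argument. A small sketch of that translation follows; the helper name rmq_to_reassign_mapq is made up for this example.

```shell
# Scan a GATK3 argument list for -RMQF and -RMQT and print the
# corresponding Sentieon --reassign_mapq argument.
# The body runs in a subshell so no variables leak to the caller.
rmq_to_reassign_mapq() (
  from="" to=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -RMQF) from="${2-}" ;;
      -RMQT) to="${2-}" ;;
    esac
    shift
  done
  if [ -z "$from" ] || [ -z "$to" ]; then
    echo "both -RMQF and -RMQT are required" >&2
    exit 1
  fi
  printf '%s %s:%s\n' '--reassign_mapq' "$from" "$to"
)

# Example using the arguments from the GATK3 command line above.
rmq_to_reassign_mapq -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60
```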

Table 24 Argument correspondence - SplitNCigarReads and RNASplitReadsAtJunction - GATK3
GATK3 option Sentieon option Meaning
-I DEDUPED.BAM -i DEDUPED.BAM Input the bam files
-R REFERENCE.FASTA -r REFERENCE.FASTA Reference fasta
-rf ReassignOneMappingQuality -RMQF IN_QUAL -RMQT OUT_QUAL --reassign_mapq IN_QUAL:OUT_QUAL Reassign Mapping Quality from STAR
--doNotFixOverhangs --ignore_overhang Whether to ignore overhang
--maxBasesInOverhang NUMBER --overhang_max_bases NUMBER Maximum number of bases allowed in a hard-clipped overhang. An overhang with more bases than this value is not clipped
--maxMismatchesInOverhang NUMBER --overhang_max_mismatches NUMBER Maximum number of mismatches allowed in an overhang that is not hard-clipped. The complete overhang is hard-clipped if the number of mismatches is above this value

Table 25 Argument correspondence - SplitNCigarReads and RNASplitReadsAtJunction - GATK4
GATK4 option Sentieon option Meaning
-I DEDUPED.BAM -i DEDUPED.BAM Input the bam files
-R REFERENCE.FASTA -r REFERENCE.FASTA Reference fasta
N/A --reassign_mapq IN_QUAL:OUT_QUAL Reassign Mapping Quality from STAR
--do-not-fix-overhangs --ignore_overhang Whether to ignore overhang
--max-bases-in-overhang NUMBER --overhang_max_bases NUMBER Maximum number of bases allowed in a hard-clipped overhang. An overhang with more bases than this value is not clipped
--max-mismatches-in-overhang NUMBER --overhang_max_mismatches NUMBER Maximum number of mismatches allowed in an overhang that is not hard-clipped. The complete overhang is hard-clipped if the number of mismatches is above this value

CollectAlignmentSummaryMetrics - AlignmentStat

Picard CollectAlignmentSummaryMetrics command line

java -jar picard.jar CollectAlignmentSummaryMetrics \
   I=ALIGNED.BAM O=ALN_METRICS.TXT \
   R=REFERENCE.FASTA \
   ADAPTER_SEQUENCE=ADAPTERS_SEQ

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i ALIGNED.BAM --algo AlignmentStat \
   --adapter_seq ADAPTERS_SEQ ALN_METRICS.TXT

Table 26 Argument correspondence - CollectAlignmentSummaryMetrics and AlignmentStat
Picard option Sentieon option Meaning
I=ALIGNED.BAM -i ALIGNED.BAM Input the bam files
O=ALN_METRICS.TXT N/A Output metrics
R=REFERENCE.FASTA -r REFERENCE.FASTA Reference file
ADAPTER_SEQUENCE=ADAPTERS_SEQ --adapter_seq ADAPTERS_SEQ A string of adapters

CollectBaseDistributionByCycle - BaseDistributionByCycle

Picard CollectBaseDistributionByCycle command line

java -jar picard.jar CollectBaseDistributionByCycle \
   I=ALIGNED.BAM O=BASE_DISTRIBUTION_METRICS.TXT \
   CHART_OUTPUT=BASE_DISTRIBUTION.PDF

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i ALIGNED.BAM --algo BaseDistributionByCycle \
   BASE_DISTRIBUTION_METRICS.TXT

Table 27 Argument correspondence - CollectBaseDistributionByCycle and BaseDistributionByCycle
Picard option Sentieon option Meaning
I=ALIGNED.BAM -i ALIGNED.BAM Input the bam files
O=BASE_DISTRIBUTION_METRICS.TXT N/A Output metrics
CHART_OUTPUT=BASE_DISTRIBUTION.PDF N/A Output chart
ALIGNED_READS_ONLY=true --aligned_reads_only true Calculate the base distribution over aligned reads only
PF_READS_ONLY=true --pf_reads_only true Calculate the base distribution over PF reads only

CollectVariantCallingMetrics - CollectVCMetrics

Picard CollectVariantCallingMetrics command line

java -jar picard.jar CollectVariantCallingMetrics \
   I=CALLS.VCF O=VC_METRICS_OUT DBSNP=DBSNP.VCF

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   --algo CollectVCMetrics -d DBSNP.VCF -v CALLS.VCF \
   VC_METRICS_OUT

Table 28 Argument correspondence - CollectVCMetrics and CollectVariantCallingMetrics
Picard option Sentieon option Meaning
I=CALLS.VCF -v CALLS.VCF vcf input file
O=VC_METRICS_OUT N/A Output basename
DBSNP=DBSNP.VCF -d DBSNP.VCF dbSNP file

ContEst - ContaminationAssessment

GATK3 Best Practices command line

java -jar GenomeAnalysisTK.jar -T ContEst -I TUMOR_RECALED.BAM \
   -R REFERENCE.FASTA -pf POPULATION.VCF --genotypes GENOTYPES.VCF \
   -o OUTPUT.TXT

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i TUMOR_RECALED.BAM \
   --algo ContaminationAssessment --pop_vcf POPULATION.VCF \
   --genotype_vcf GENOTYPES.VCF OUTPUT.TXT

Table 29 Argument correspondence - ContaminationAssessment and ContEst
GATK option Sentieon option Meaning
-I TUMOR_RECALED.BAM -i TUMOR_RECALED.BAM Input the bam files
-R REFERENCE.FASTA -r REFERENCE.FASTA Reference fasta
-pf POPULATION.VCF --pop_vcf POPULATION.VCF The VCF file containing allele frequency information for the population
--genotypes GENOTYPES.VCF --genotype_vcf GENOTYPES.VCF The VCF file containing variants reported for the individual
-llc [META/SAMPLE/READGROUP] --type [META/SAMPLE/READGROUP] Assess contamination by sample, lane or all reads
--min_qscore MIN_BQ --min_base_qual MIN_BQ Any bases with a quality less than MIN_BQ will be ignored
--min_mapq MIN_MAPQ --min_map_qual MIN_MAPQ Any reads with a mapping quality less than MIN_MAPQ will be ignored
-mbc MINIMUM_BASE_COUNT --min_basecount MINIMUM_BASE_COUNT The minimum number of bases present at a locus for contamination to be assessed
--beta_threshold TRIM --trim_thresh TRIM Threshold that will be used to trim sites
--trim_fraction TRIM_FRACTION --trim_frac TRIM_FRACTION Maximum fraction of sites that may be trimmed
-pc PRECISION --precision PRECISION The precision on the output percent number
-br BASE_REPORT --base_report BASE_REPORT The output file that will contain an extended report on the processed data
-population POPULATION --population POPULATION A population for the baseline allele frequency of the sample
-o OUTPUT.TXT N/A The output file

DepthOfCoverage - CoverageMetrics

GATK3 Best Practices command

java -jar GenomeAnalysisTK.jar -T DepthOfCoverage \
   -R REFERENCE.FASTA -I DEDUPED.BAM \
   -geneList GENE_LIST.REFSEQ -ct THRESHOLD \
   -o OUTPUT_BASE

GATK4 Best Practices command

gatk DepthOfCoverage \
   -R REFERENCE.FASTA -I DEDUPED.BAM \
   -gene-list GENE_LIST.REFSEQ \
   --summary-coverage-threshold THRESHOLD \
   -O OUTPUT_BASE

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i DEDUPED.BAM --algo CoverageMetrics \
   --gene_list GENE_LIST.REFSEQ --cov_thresh THRESHOLD \
   OUTPUT_BASE

Table 30 Argument correspondence - CoverageMetrics and DepthOfCoverage - GATK3
GATK3 option Sentieon option Meaning
-R REFERENCE.FASTA -r REFERENCE.FASTA Reference file
-I DEDUPED.BAM -i DEDUPED.BAM Input the bam files
-geneList GENE_LIST.REFSEQ --gene_list GENE_LIST.REFSEQ RefSeq file used to aggregate results to the gene level
--countType COUNT_TYPE --count_type COUNT_TYPE Determines how to deal with overlapping paired ends
-pt PARTITION --partition PARTITION Determines how to partition the data
-ct THRESHOLD --cov_thresh THRESHOLD Add aggregation metrics for the percentage of bases with coverage greater than THRESHOLD
--start MIN_DEPTH --histogram_low MIN_DEPTH The smallest histogram bin
--stop MAX_DEPTH --histogram_high MAX_DEPTH The largest histogram bin
--nBins NUM_BINS --histogram_bin_count NUM_BINS The number of histogram bins
-mmq MIN_MAPQ --min_map_qual MIN_MAPQ Minimum mapping quality of reads used
--maxMappingQuality MAX_MAPQ --max_map_qual MAX_MAPQ Maximum mapping quality of reads used
-mbq MIN_BASEQ --min_base_qual MIN_BASEQ Minimum base quality of bases used
--maxBaseQuality MAX_BASEQ --max_base_qual MAX_BASEQ Maximum base quality of bases used
-omitBaseOutput --omit_base_output Omit output of the per locus coverage
-omitSampleSummary --omit_sample_stat Omit output of the summary results
-omitLocusTable --omit_locus_stat Omit output of histogram files
-omitIntervals --omit_interval_stat Omit output of interval statistics
-baseCounts --print_base_counts Include the number of "ACGTND" in the output per locus coverage
--includeRefNSites --include_ref_N Include coverage data in loci where the reference genome is set to N
--ignoreDeletionSites --ignore_del_sites Ignore coverage data in loci where there are deletions
-dels --include_del Include deletions and add deletion counts
-o OUTPUT_BASE N/A Output file basename

Table 31 Argument correspondence - CoverageMetrics and DepthOfCoverage - GATK4
GATK4 option Sentieon option Meaning
-R REFERENCE.FASTA -r REFERENCE.FASTA Reference file
-I DEDUPED.BAM -i DEDUPED.BAM Input the bam files
-gene-list GENE_LIST.REFSEQ --gene_list GENE_LIST.REFSEQ RefSeq file used to aggregate results to the gene level
--count-type COUNT_TYPE --count_type COUNT_TYPE Determines how to deal with overlapping paired ends
-pt PARTITION --partition PARTITION Determines how to partition the data
--summary-coverage-threshold THRESHOLD --cov_thresh THRESHOLD Add aggregation metrics for the percentage of bases with coverage greater than THRESHOLD
--nBins NUM_BINS --histogram_bin_count NUM_BINS The number of histogram bins
-mbq MIN_BASEQ --min_base_qual MIN_BASEQ Minimum base quality of bases used
--max-base-quality MAX_BASEQ --max_base_qual MAX_BASEQ Maximum base quality of bases used
--omit-depth-output-at-each-base --omit_base_output Omit output of the per locus coverage
--omit-per-sample-statistics --omit_sample_stat Omit output of the summary results
--omit-locus-table --omit_locus_stat Omit output of histogram files
--omit-interval-statistics --omit_interval_stat Omit output of interval statistics
--print-base-counts --print_base_counts Include the number of "ACGTND" in the output per locus coverage
--include-ref-n-sites --include_ref_N Include coverage data in loci where the reference genome is set to N
--ignore-deletion-sites --ignore_del_sites Ignore coverage data in loci where there are deletions
--include-deletions --include_del Include deletions and add deletion counts
-O OUTPUT_BASE N/A Output file basename

CollectGcBiasMetrics - GCBias

Picard CollectGcBiasMetrics command line

java -jar picard.jar CollectGcBiasMetrics \
   I=DEDUPED.BAM O=GC_METRICS.TXT CHART=GC_BIAS.PDF \
   S=SUMMARY.TXT R=REFERENCE.FASTA ASSUME_SORTED=true

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i DEDUPED.BAM --algo GCBias --summary SUMMARY.TXT \
   GC_METRICS.TXT
sentieon plot GCBias -o GC_BIAS.PDF GC_METRICS.TXT

Table 32 Argument correspondence - GCBias and CollectGcBiasMetrics
Picard option Sentieon option Meaning
I=DEDUPED.BAM -i DEDUPED.BAM Input the bam files
R=REFERENCE.FASTA -r REFERENCE.FASTA Reference file
O=GC_METRICS.TXT N/A GC bias metrics results
CHART=GC_BIAS.PDF -o GC_BIAS.PDF GC bias metrics report
S=SUMMARY.TXT --summary SUMMARY.TXT GC bias metrics summary results
LEVEL=LEVEL --accum_level LEVEL The accumulation level

CollectHsMetrics - HsMetricAlgo

Picard CollectHsMetrics command line

java -jar picard.jar CollectHsMetrics \
   I=DEDUPED.BAM O=HS_METRICS.TXT R=REFERENCE.FASTA \
   BAIT_INTERVALS=BAITS TARGET_INTERVALS=TARGETS

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i DEDUPED.BAM --algo HsMetricAlgo --targets_list TARGETS \
   --baits_list BAITS HS_METRICS.TXT

Table 33 Argument correspondence - HsMetricAlgo and CollectHsMetrics
Picard option Sentieon option Meaning
I=DEDUPED.BAM -i DEDUPED.BAM Input the bam files
R=REFERENCE.FASTA -r REFERENCE.FASTA Reference file
O=HS_METRICS.TXT N/A HS metrics results
BAIT_INTERVALS=BAITS --baits_list BAITS Interval list input file of baits
TARGET_INTERVALS=TARGETS --targets_list TARGETS Interval list input file of targets
CLIP_OVERLAPPING_READS --clip_overlapping_reads Clip overlapping reads
MINIMUM_MAPPING_QUALITY=MIN_MAPQ --min_map_qual MIN_MAPQ Minimum read mapping quality
MINIMUM_BASE_QUALITY=MIN_BASEQ --min_base_qual MIN_BASEQ Minimum base quality
COVERAGE_CAP=COVERAGE --coverage_cap COVERAGE Maximum coverage limit in the histogram

CollectInsertSizeMetrics - InsertSizeMetricAlgo

Picard CollectInsertSizeMetrics command line

java -jar picard.jar CollectInsertSizeMetrics \
   I=DEDUPED.BAM O=IS_METRICS.TXT R=REFERENCE.FASTA \
   H=IS_METRICS.PDF

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i DEDUPED.BAM --algo InsertSizeMetricAlgo \
   IS_METRICS.TXT
sentieon plot InsertSizeMetricAlgo -o IS_METRICS.PDF IS_METRICS.TXT

Table 34 Argument correspondence - InsertSizeMetricAlgo and CollectInsertSizeMetrics
Picard option Sentieon option Meaning
I=DEDUPED.BAM -i DEDUPED.BAM Input the bam files
O=IS_METRICS.TXT N/A IS metrics results
R=REFERENCE.FASTA -r REFERENCE.FASTA Reference file
H=IS_METRICS.PDF -o IS_METRICS.PDF Insert size metrics report

MeanQualityByCycle - MeanQualityByCycle

Picard MeanQualityByCycle command line

java -jar picard.jar MeanQualityByCycle \
   I=DEDUPED.BAM O=MQ_METRICS.TXT R=REFERENCE.FASTA \
   CHART=MQ_METRICS.PDF

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i DEDUPED.BAM --algo MeanQualityByCycle \
   MQ_METRICS.TXT
sentieon plot MeanQualityByCycle -o MQ_METRICS.PDF MQ_METRICS.TXT

Table 35 Argument correspondence - MeanQualityByCycle and MeanQualityByCycle
Picard option Sentieon option Meaning
I=DEDUPED.BAM -i DEDUPED.BAM Input the bam files
O=MQ_METRICS.TXT N/A MQ metrics results
R=REFERENCE.FASTA -r REFERENCE.FASTA Reference file
CHART=MQ_METRICS.PDF -o MQ_METRICS.PDF Mean quality metrics report

QualityScoreDistribution - QualDistribution

Picard QualityScoreDistribution command line

java -jar picard.jar QualityScoreDistribution \
   I=DEDUPED.BAM O=QD_METRICS.TXT \
   CHART=QD_METRICS.PDF

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i DEDUPED.BAM --algo QualDistribution \
   QD_METRICS.TXT
sentieon plot QualDistribution -o QD_METRICS.PDF QD_METRICS.TXT

Table 36 Argument correspondence - QualDistribution and QualityScoreDistribution
Picard option Sentieon option Meaning
I=DEDUPED.BAM -i DEDUPED.BAM Input the bam files
O=QD_METRICS.TXT N/A QD metrics results
N/A -r REFERENCE.FASTA Reference file
CHART=QD_METRICS.PDF -o QD_METRICS.PDF Quality distribution metrics report

CollectQualityYieldMetrics - QualityYield

Picard CollectQualityYieldMetrics command line

java -jar picard.jar CollectQualityYieldMetrics \
   I=DEDUPED.BAM O=YIELD_METRICS.TXT

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i DEDUPED.BAM --algo QualityYield \
   YIELD_METRICS.TXT

Table 37 Argument correspondence - QualityYield and CollectQualityYieldMetrics
Picard option Sentieon option Meaning
I=DEDUPED.BAM -i DEDUPED.BAM Input the bam files
O=YIELD_METRICS.TXT N/A Quality yield metrics results
INCLUDE_SECONDARY_ALIGNMENTS=true --include_secondary Include secondary alignments in the calculation
INCLUDE_SUPPLEMENTAL_ALIGNMENTS=true --include_supplementary Include supplementary alignments in the calculation

CollectSequencingArtifactMetrics - SequenceArtifactMetricsAlgo

Picard CollectSequencingArtifactMetrics command line

java -jar picard.jar CollectSequencingArtifactMetrics \
   I=DEDUPED.BAM O=ARTIFACT_METRICS_BASE R=REFERENCE.FASTA \
   DB_SNP=DBSNP.VCF
java -jar picard.jar ConvertSequencingArtifactToOxoG \
   I=DEDUPED.BAM O=ARTIFACT_METRICS_BASE R=REFERENCE.FASTA \
   OUTPUT_BASE=oxog_metrics

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i DEDUPED.BAM --algo SequenceArtifactMetricsAlgo \
   --dbsnp DBSNP.VCF ARTIFACT_METRICS_BASE

Table 38 Argument correspondence - SequenceArtifactMetricsAlgo and CollectSequencingArtifactMetrics
Picard option Sentieon option Meaning
I=DEDUPED.BAM -i DEDUPED.BAM Input the bam files
O=ARTIFACT_METRICS_BASE N/A Artifact metrics output base
R=REFERENCE.FASTA -r REFERENCE.FASTA Reference file
DB_SNP=DBSNP.VCF --dbsnp DBSNP.VCF A dbSNP file to exclude known polymorphisms
Q=MIN_BQ --min_base_qual MIN_BQ Minimum base quality for a base to be included
MQ=MIN_MAPQ --min_map_qual MIN_MAPQ Minimum mapping quality for a read to be included
MIN_INS=MIN_INSERT --min_insert_size MIN_INSERT Minimum insert size to include a read
MAX_INS=MAX_INSERT --max_insert_size MAX_INSERT Maximum insert size to include a read
UNPAIRED=true --include_unpaired Include unpaired reads
TANDEM=true --tandem_reads Include tandem reads
INCLUDE_DUPLICATES=true --include_duplicates Include duplicate reads
INCLUDE_NON_PF_READS=true --include_non_pf_reads Include non-PF reads
CONTEXT_SIZE=CONTEXT --context_size CONTEXT The number of context bases to include on each side

CollectWgsMetrics - WgsMetricsAlgo

Picard CollectWgsMetrics command line

java -jar picard.jar CollectWgsMetrics \
   I=DEDUPED.BAM O=WGS_METRICS.TXT R=REFERENCE.FASTA

Sentieon® command line

sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
   -i DEDUPED.BAM --algo WgsMetricsAlgo \
   WGS_METRICS.TXT

Table 39 Argument correspondence - WgsMetricsAlgo and CollectWgsMetrics
Picard option Sentieon option Meaning
I=DEDUPED.BAM -i DEDUPED.BAM Input the bam files
O=WGS_METRICS.TXT N/A WGS metrics results
R=REFERENCE.FASTA -r REFERENCE.FASTA Reference file
MQ=MIN_MAPQ --min_map_qual MIN_MAPQ Minimum mapping quality for a read to be included
Q=MIN_BQ --min_base_qual MIN_BQ Minimum base quality for a base to be included
CAP=COVERAGE_CAP --coverage_cap COVERAGE_CAP Maximum coverage limit for the histogram
COUNT_UNPAIRED=true --include_unpaired true Count unpaired reads and paired reads with one end unmapped
INCLUDE_BQ_HISTOGRAM=true --base_qual_histogram true Report a base quality histogram
SAMPLE_SIZE=SAMPLE_SIZE --sample_size SAMPLE_SIZE Sample size used for theoretical het sensitivity sampling
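
The rows of Table 39 can be applied mechanically when porting an existing Picard invocation. The sketch below covers only the algorithm-level options listed in the table; the function name picard_wgs_to_sentieon is hypothetical, and the input, output and reference arguments are skipped because they are placed elsewhere on the Sentieon® command line.

```shell
# Translate Picard CollectWgsMetrics KEY=VALUE arguments into the
# WgsMetricsAlgo options listed in Table 39, one option per line.
picard_wgs_to_sentieon() {
  for kv in "$@"; do
    key=${kv%%=*}   # text before the first "="
    val=${kv#*=}    # text after the first "="
    case "$key" in
      I|O|R) ;;     # handled by -i, the last argument, and -r
      MQ)  printf '%s %s\n' '--min_map_qual' "$val" ;;
      Q)   printf '%s %s\n' '--min_base_qual' "$val" ;;
      CAP) printf '%s %s\n' '--coverage_cap' "$val" ;;
      COUNT_UNPAIRED)       printf '%s %s\n' '--include_unpaired' "$val" ;;
      INCLUDE_BQ_HISTOGRAM) printf '%s %s\n' '--base_qual_histogram' "$val" ;;
      SAMPLE_SIZE)          printf '%s %s\n' '--sample_size' "$val" ;;
      *) echo "no Sentieon option listed in Table 39 for $key" >&2 ;;
    esac
  done
}

# Example with placeholder values.
picard_wgs_to_sentieon I=DEDUPED.BAM MQ=MIN_MAPQ Q=MIN_BQ CAP=COVERAGE_CAP
```

Arguments that are not in the table are reported on stderr rather than silently dropped, so missing correspondences are visible.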

Other differences in usage

Sentieon® refers to tools as algorithms, so the option -T in GATK3 corresponds to the option --algo in Sentieon®.

Sentieon® writes log output directly to stdout and stderr, so the option -log is not available.

Sentieon® tries to use as many threads as the system has available, while GATK uses 1 thread by default. As such, omitting the option -nt in GATK is not the same as omitting the option -t in Sentieon®.
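Because the defaults differ, it can help to pass the thread count explicitly when comparing the two pipelines. A minimal sketch of choosing a value for NUMBER_THREADS follows, assuming a system where getconf reports the number of online processors; the helper name pick_threads is made up for this example.

```shell
# Print the number of threads to request: the number of online
# processors, capped at the limit given as $1.
pick_threads() {
  cap="$1"
  # Fall back to 1 if getconf cannot report the processor count.
  avail=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
  if [ "$avail" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$avail"
  fi
}

# The result would then be passed to the driver, for example:
#   sentieon driver -t "$(pick_threads 16)" -r REFERENCE.FASTA ...
pick_threads 16
```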

Sentieon® does not do any down-sampling, so the following options are not available: --downsample_to_coverage, --downsample_to_fraction, --downsampling_type, …

Other general-level arguments that are currently supported by Sentieon® are:

  • --bam_compression: for algorithms that output a bam
  • --cram_write_options: for algorithms that output a cram