Arguments Correspondence
Introduction
This document describes how to execute the Broad Institute GATK Best Practices described in https://www.broadinstitute.org/gatk/guide/best-practices using the Sentieon® Genomics software. It also describes the correspondence between the arguments of the different tools used.
This document should help you convert your existing pipelines to Sentieon®. It should also help you tell the Sentieon® team which arguments your work requires that are unavailable in the Sentieon® Genomics software.
Correspondence of tools
The table below shows the Sentieon® tools that implement functionality consistent with existing GATK pipeline tools, together with the tool versions they correspond to.
Sentieon tool | GATK pipeline tool | Version correspondence for 202308.03 |
---|---|---|
Sentieon BWA | BWA | BWA 0.7.17 |
Sentieon STAR | STAR | STAR 2.7.10b |
Sentieon minimap2 | minimap2 | minimap2 2.26 (r1175) |
Dedup and LocusCollector | Picard MarkDuplicates | Picard 2.9.0 |
Realigner | RealignerTargetCreator and IndelRealigner | GATK 3.7/GATK 3.8 |
QualCal | BaseRecalibrator | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1 |
ReadWriter | PrintReads | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1 |
QualCal | AnalyzeCovariates | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1 |
Genotyper | UnifiedGenotyper | GATK 3.7/GATK 3.8 |
Haplotyper | HaplotypeCaller | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1 |
GVCFtyper | GenotypeGVCFs | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1 |
VarCal | VariantRecalibrator | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1 |
ApplyVarCal | ApplyRecalibration/ApplyVQSR | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1 |
TNsnv | MuTect | MuTect 1.1.5 |
TNhaplotyper | MuTect2 | GATK 3.7/GATK 3.8 |
TNhaplotyper2 | GATK4 Mutect2 | GATK 4.2.0.0 |
RNASplitReadsAtJunction | SplitNCigarReads | GATK 3.7/GATK 3.8/GATK 4.0/GATK 4.1 |
AlignmentStat | Picard CollectAlignmentSummaryMetrics | Picard 2.9.0 |
BaseDistributionByCycle | Picard CollectBaseDistributionByCycle | Picard 2.9.0 |
CollectVCMetrics | Picard CollectVariantCallingMetrics | Picard 2.9.0 |
ContaminationAssessment | ContEst | GATK 3.7/GATK 3.8 |
CoverageMetrics | DepthOfCoverage | GATK 3.7/GATK 3.8 |
GCBias | Picard CollectGcBiasMetrics | Picard 2.9.0 |
HsMetricAlgo | Picard CollectHsMetrics | Picard 2.9.0 |
InsertSizeMetricAlgo | Picard CollectInsertSizeMetrics | Picard 2.9.0 |
MeanQualityByCycle | Picard MeanQualityByCycle | Picard 2.9.0 |
QualDistribution | Picard QualityScoreDistribution | Picard 2.9.0 |
QualityYield | Picard CollectQualityYieldMetrics | Picard 2.9.0 |
SequenceArtifactMetricsAlgo | Picard CollectSequencingArtifactMetrics | Picard 2.9.0 |
WgsMetricsAlgo | Picard CollectWgsMetrics | Picard 2.9.0 |
ContaminationModel | GetPileupSummaries and CalculateContamination | GATK 4.2.0.0 |
OrientationBias | Mutect2 and LearnReadOrientationModel | GATK 4.2.0.0 |
TNfilter | FilterMutectCalls | GATK 4.2.0.0 |
Detailed description per stage
Map to Reference - Alignment
GATK Best Practices command line
bwa mem -M -R '@RG\tID:GROUP_NAME\tSM:SAMPLE_NAME\tPL:PLATFORM' -p \
-t NUMBER_THREADS REFERENCE.FASTA SAMPLE.FQ > ALIGNED.SAM
java -jar picard.jar SortSam INPUT=ALIGNED.SAM \
OUTPUT=SORTED.SAM SORT_ORDER=coordinate
samtools view -bS SORTED.SAM > SORTED.BAM
samtools index SORTED.BAM
Sentieon® command line
sentieon bwa mem -M -R '@RG\tID:GROUP_NAME\tSM:SAMPLE_NAME\tPL:PLATFORM' -p \
-t NUMBER_THREADS REFERENCE.FASTA SAMPLE.FQ | sentieon util sort \
-o SORTED.BAM -t NUMBER_THREADS --sam2bam -i -
The BWA alignment command is identical. However, we recommend piping the BWA output directly into the Sentieon® sorting stage instead of writing it to an intermediate SAM file.
Sentieon® sorting supports coordinate order only.
Sentieon® automatically creates an index file for the sorted BAM file.
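BWA copies the `-R` string into the BAM header as given. A space before a `\t` therefore becomes part of the ID or SM value. The sketch below is a hypothetical bash helper, `make_rg`, that is not part of BWA or Sentieon®. It builds the read-group string and rejects values that are empty or contain whitespace. It assumes, as the BWA manual describes, that BWA itself expands the literal `\t` sequences into tabs.

```bash
# Hypothetical helper, not part of BWA or Sentieon®: build the value passed to
# "bwa mem -R". BWA expands the two-character sequence \t into a tab itself,
# so the output deliberately keeps a literal backslash followed by "t".
make_rg() {
  local id="$1" sm="$2" pl="$3" field
  for field in "$id" "$sm" "$pl"; do
    # Reject empty values and values containing whitespace, which would
    # otherwise be copied verbatim into the @RG fields of the BAM header.
    if [[ -z $field || $field =~ [[:space:]] ]]; then
      echo "make_rg: invalid read-group value '$field'" >&2
      return 1
    fi
  done
  printf '@RG\\tID:%s\\tSM:%s\\tPL:%s\n' "$id" "$sm" "$pl"
}

# Example (requires Sentieon® on PATH). pipefail makes the pipeline fail when
# BWA fails, instead of reporting only the exit status of the sort step.
#   set -o pipefail
#   sentieon bwa mem -M -R "$(make_rg GROUP_NAME SAMPLE_NAME PLATFORM)" -p \
#     -t NUMBER_THREADS REFERENCE.FASTA SAMPLE.FQ | sentieon util sort \
#     -o SORTED.BAM -t NUMBER_THREADS --sam2bam -i -
```

Without `pipefail`, the exit status of the pipeline comes from `sentieon util sort` alone. An alignment failure could then go unnoticed.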
Mark Duplicates - Dedup
GATK Best Practices command line
java -jar picard.jar MarkDuplicates INPUT=SORTED.BAM \
OUTPUT=DEDUP.BAM METRICS_FILE=DEDUP_METRICS.TXT \
REMOVE_DUPLICATES=true
java -jar picard.jar BuildBamIndex INPUT=DEDUP.BAM
Sentieon® command line
sentieon driver -t NUMBER_THREADS -i SORTED.BAM --algo LocusCollector \
--fun score_info SCORE.TXT.GZ
sentieon driver -t NUMBER_THREADS -i SORTED.BAM --algo Dedup --rmdup \
--score_info SCORE.TXT.GZ --metrics DEDUP_METRICS.TXT DEDUP.BAM
The last argument of the Sentieon® command line is the output BAM file. Sentieon® automatically creates an index file for the deduplicated BAM file.
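The two Sentieon® passes above depend on each other. Dedup reads the score file that LocusCollector writes. The sketch below wraps both passes in a single bash function so that the handoff cannot be missed. The `DRY_RUN` switch and the score-file naming scheme belong to this sketch only. They are not Sentieon® features.

```bash
# Sketch only: wraps the two Sentieon® passes above in one function.
# DRY_RUN=1 is a convention of this sketch (not a Sentieon feature) that
# prints each command instead of executing it.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: sentieon_dedup SORTED.BAM DEDUP.BAM DEDUP_METRICS.TXT [THREADS]
sentieon_dedup() {
  local in_bam="$1" out_bam="$2" metrics="$3" threads="${4:-1}"
  # Hypothetical naming scheme: derive the score file from the output name.
  local score="${out_bam%.bam}.score.txt.gz"
  # Pass 1: LocusCollector writes the score file consumed by Dedup.
  run sentieon driver -t "$threads" -i "$in_bam" \
    --algo LocusCollector --fun score_info "$score" || return
  # Pass 2: Dedup reads the score file; --rmdup removes the duplicates,
  # like REMOVE_DUPLICATES=true in Picard.
  run sentieon driver -t "$threads" -i "$in_bam" \
    --algo Dedup --rmdup --score_info "$score" --metrics "$metrics" "$out_bam"
}
```

For example, `DRY_RUN=1 sentieon_dedup sorted.bam dedup.bam metrics.txt 8` prints both commands without running them. This lets you review a ported pipeline before executing it.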
Picard option | Sentieon option | Meaning |
---|---|---|
INPUT=SORTED.BAM | -i SORTED.BAM | Input BAM file |
OUTPUT=DEDUP.BAM | N/A | Output BAM file |
METRICS_FILE=METRICS.TXT | --metrics METRICS.TXT | Output metrics file |
REMOVE_DUPLICATES=true | --rmdup | Remove duplicates from the BAM file |
OPTICAL_DUPLICATE_PIXEL_DISTANCE=DISTANCE | --optical_dup_pix_dist DISTANCE | Optical duplicate pixel distance |
Realign Indels - Realignment
GATK3 Best Practices command line
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator \
-R REFERENCE.FASTA -I DEDUP.BAM -L INTERVAL \
-known KNOWN_SITES.VCF -o REALIGNMENT_TARGETS.LIST
java -jar GenomeAnalysisTK.jar -T IndelRealigner \
-R REFERENCE.FASTA -I DEDUP.BAM \
-targetIntervals REALIGNMENT_TARGETS.LIST \
-known KNOWN_SITES.VCF -o REALIGNED.BAM
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i DEDUP.BAM \
--algo Realigner -k KNOWN_SITES.VCF --interval_list INTERVAL \
REALIGNED.BAM
The last argument of the Sentieon® command line is the output bam file.
GATK option | Sentieon option | Meaning |
---|---|---|
-I DEDUP.BAM | -i DEDUP.BAM | Input BAM file |
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-o REALIGNED.BAM | N/A | Output BAM file |
-known KNOWN_SITES.VCF | -k KNOWN_SITES.VCF | Known sites |
-L INTERVAL | --interval_list INTERVAL | Interval to restrict the calculation to |
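The table above can also be applied mechanically. The sketch below is a hypothetical bash translator, not a Sentieon® or GATK tool, that rewrites GATK3 IndelRealigner options into the Sentieon® command. It follows the layout of the Sentieon® command line above. Driver-level options (`-r`, `-i`) come before `--algo`, and the Realigner options (`-k`, `--interval_list`) and the output BAM come after it. `-targetIntervals` is dropped because the single Realigner command replaces both GATK3 steps. `-t` is omitted for brevity.

```bash
# Hypothetical translator (not a Sentieon® or GATK tool) for the options in the
# table above. Driver options (-r, -i) go before --algo; Realigner options
# (-k, --interval_list) and the output BAM go after it.
gatk3_realign_to_sentieon() {
  local -a drv=() alg=()
  local out=""
  while (( $# )); do
    (( $# >= 2 )) || { echo "missing value for $1" >&2; return 1; }
    case $1 in
      -R) drv+=(-r "$2") ;;
      -I) drv+=(-i "$2") ;;
      -known) alg+=(-k "$2") ;;
      -L) alg+=(--interval_list "$2") ;;
      -o) out="$2" ;;
      # Dropped: the single Realigner command replaces both GATK3 steps.
      -targetIntervals) ;;
      *) echo "unsupported option: $1" >&2; return 1 ;;
    esac
    shift 2
  done
  [[ -n $out ]] || { echo "missing -o" >&2; return 1; }
  echo sentieon driver "${drv[@]}" --algo Realigner "${alg[@]}" "$out"
}
```

The function rejects any option that is not in the table. An unmapped GATK argument is reported rather than silently lost, which makes it easier to tell the Sentieon® team which arguments your pipeline needs.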
Recalibrate Bases - BQSR
BQSR - calculate recalibration
GATK3 Best Practices command line to generate the recalibration table
java -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
-R REFERENCE.FASTA -I REALIGNED.BAM -L INTERVAL \
-knownSites KNOWN_SITES.VCF -o RECAL_DATA.TABLE
GATK4 Best Practices command line to generate the recalibration table
gatk BaseRecalibrator \
-R REFERENCE.FASTA -I REALIGNED.BAM -L INTERVAL \
--enable-baq --known-sites KNOWN_SITES.VCF \
-O RECAL_DATA.TABLE
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i REALIGNED.BAM \
--interval INTERVAL \
--algo QualCal -k KNOWN_SITES.VCF RECAL_DATA.TABLE
The last argument of the Sentieon® command line is the recalibrated data table.
GATK4 removed the calculation of Per-Base Alignment Qualities (BAQ) to reduce the runtime of BaseRecalibrator (see https://github.com/broadinstitute/gatk/issues/2060). Adding the --enable-baq argument turns BAQ calculation back on in GATK4, which matches the behavior of the Sentieon® QualCal algorithm.
GATK3 option | Sentieon option | Meaning |
---|---|---|
-I REALIGNED.BAM | -i REALIGNED.BAM | Input BAM file |
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-o RECAL_DATA.TABLE | N/A | Output recalibration table |
-knownSites KNOWN_SITES.VCF | -k KNOWN_SITES.VCF | Known sites |
-L INTERVAL | --interval INTERVAL | Interval to restrict the calculation to |
GATK4 option | Sentieon option | Meaning |
---|---|---|
-I REALIGNED.BAM | -i REALIGNED.BAM | Input BAM file |
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-O RECAL_DATA.TABLE | N/A | Output recalibration table |
--known-sites KNOWN_SITES.VCF | -k KNOWN_SITES.VCF | Known sites |
-L INTERVAL | --interval INTERVAL | Interval to restrict the calculation to |
BQSR - apply recalibration
GATK3 Best Practices command line to apply recalibration
java -jar GenomeAnalysisTK.jar -T PrintReads \
-R REFERENCE.FASTA -I REALIGNED.BAM -L INTERVAL \
-BQSR RECAL_DATA.TABLE -o RECALED.BAM
GATK4 Best Practices command line to apply recalibration
gatk ApplyBQSR \
-R REFERENCE.FASTA -I REALIGNED.BAM -L INTERVAL \
-bqsr RECAL_DATA.TABLE -O RECALED.BAM
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i REALIGNED.BAM \
-q RECAL_DATA.TABLE --interval INTERVAL \
--algo QualCal -k KNOWN_SITES.VCF RECAL_DATA.TABLE.POST \
--algo ReadWriter RECALED.BAM
The last argument of the Sentieon® command line is the output bam file.
The Sentieon® ReadWriter algorithm can run in the same command as either the step that generates RECAL_DATA.TABLE.POST (as shown above) or the variant calling step. Both options speed up the pipeline.
With default settings, GATK4 no longer recalibrates the base quality scores of INDELs. As a result, the GATK4 Best Practices command line shown above does not produce the same results as Sentieon®. In particular, the Sentieon® BAM output contains BI/BD tags from the INDEL recalibration that are missing from the GATK4 BAM output. GATK4 dropped INDEL recalibration to reduce runtime at the expense of accuracy. With Sentieon® the speed gain is negligible, so skipping it is not recommended. To fully match the results of GATK4 ApplyBQSR, you can replace the -q RECAL_DATA.TABLE option in the ReadWriter command with the --read_filter QualCalFilter option, which skips the INDEL recalibration:
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i REALIGNED.BAM \
--read_filter QualCalFilter,table=RECAL_DATA.TABLE,indel=false \
--interval INTERVAL \
--algo ReadWriter RECALED.BAM
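The two ReadWriter variants differ in one option only, so a pipeline can switch between them. The minimal bash sketch below builds either the default command (`-q`, which keeps INDEL recalibration and BI/BD tags) or the GATK4-compatible command (`QualCalFilter` with `indel=false`). `MATCH_GATK4`, `THREADS` and the function name are hypothetical. The function only prints the command line.

```bash
# Sketch: print the ReadWriter command in either the default Sentieon® mode
# (-q, which also recalibrates INDEL qualities and writes BI/BD tags) or a
# GATK4 ApplyBQSR-compatible mode that skips INDEL recalibration.
# MATCH_GATK4 and THREADS are hypothetical variables of this sketch.
readwriter_cmd() {
  local ref="$1" in_bam="$2" table="$3" interval="$4" out_bam="$5"
  local -a recal
  if [[ ${MATCH_GATK4:-0} == 1 ]]; then
    recal=(--read_filter "QualCalFilter,table=$table,indel=false")
  else
    recal=(-q "$table")
  fi
  echo sentieon driver -t "${THREADS:-1}" -r "$ref" -i "$in_bam" \
    "${recal[@]}" --interval "$interval" --algo ReadWriter "$out_bam"
}
```

Because the choice is confined to one array, the rest of the command line stays identical in both modes. This keeps comparisons against GATK4 output focused on the INDEL recalibration difference.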
GATK3 option | Sentieon option | Meaning |
---|---|---|
-I REALIGNED.BAM | -i REALIGNED.BAM | Input BAM file |
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-o RECALED.BAM | N/A | Output BAM file |
-L INTERVAL | --interval INTERVAL | Interval to restrict the calculation to |
-BQSR RECAL_DATA.TABLE | -q RECAL_DATA.TABLE | Recalibration table |
GATK4 option | Sentieon option | Meaning |
---|---|---|
-I REALIGNED.BAM | -i REALIGNED.BAM | Input BAM file |
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-O RECALED.BAM | N/A | Output BAM file |
-L INTERVAL | --interval INTERVAL | Interval to restrict the calculation to |
-bqsr RECAL_DATA.TABLE | -q RECAL_DATA.TABLE | Recalibration table |
BQSR - plot recalibration
GATK3 Best Practices command line to plot BQSR metrics
java -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
-R REFERENCE.FASTA -I REALIGNED.BAM -L INTERVAL \
-knownSites KNOWN_SITES.VCF -BQSR RECAL_DATA.TABLE \
-o RECAL_DATA.TABLE.POST
java -jar GenomeAnalysisTK.jar -T AnalyzeCovariates \
-R REFERENCE.FASTA -before RECAL_DATA.TABLE \
-after RECAL_DATA.TABLE.POST -csv RECAL_RESULT.CSV -plots BQSR.PDF
GATK4 Best Practices command line to plot BQSR metrics
gatk BaseRecalibrator \
-R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
--enable-baq --known-sites KNOWN_SITES.VCF \
-O RECAL_DATA.TABLE.POST
gatk AnalyzeCovariates \
-before RECAL_DATA.TABLE -after RECAL_DATA.TABLE.POST \
-csv RECAL_RESULT.CSV -plots BQSR.PDF
Sentieon® command line
sentieon driver -t NUMBER_THREADS --algo QualCal --plot --before \
RECAL_DATA.TABLE --after RECAL_DATA.TABLE.POST RECAL_RESULT.CSV
sentieon plot QualCal -o BQSR.PDF RECAL_RESULT.CSV
GATK option | Sentieon option | Meaning |
---|---|---|
-R REFERENCE.FASTA | N/A | Reference file |
-before RECAL_DATA.TABLE | --before RECAL_DATA.TABLE | Before-recalibration table |
-after RECAL_DATA.TABLE.POST | --after RECAL_DATA.TABLE.POST | After-recalibration table |
-plots BQSR.PDF | -o BQSR.PDF | Output report file |
-csv RECAL_RESULT.CSV | N/A | Output CSV file |
Unified Genotyper - Genotyper
GATK3 Best Practices command line
java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
-R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
-D DBSNP.VCF --glm [SNP/INDEL/BOTH] -mbq QUALITY \
-stand_emit_conf CONFIDENCE -stand_call_conf CONFIDENCE \
--output_mode [EMIT_VARIANTS_ONLY/EMIT_ALL_CONFIDENT_SITES/EMIT_ALL_SITES] \
-o OUTPUT.VCF
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i RECALED.BAM \
--interval INTERVAL \
--algo Genotyper \
-d DBSNP.VCF --var_type [SNP/INDEL/BOTH] --min_base_qual QUALITY \
--emit_conf CONFIDENCE --call_conf CONFIDENCE \
--emit_mode [VARIANT/CONFIDENT/ALL] \
OUTPUT.VCF
The last argument of the Sentieon® command line is the output VCF file. The tool writes a compressed VCF file when the output file name has a .gz extension.
Bear in mind that since GATK 3.7, stand_emit_conf is no longer supported and the default value of stand_call_conf has changed from 30 to 10. The default value of the Sentieon® call_conf has stayed at 30.
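When porting a UnifiedGenotyper command, the `--output_mode` values must be renamed. The sketch below is a hypothetical bash helper that performs the renaming. It assumes that the GATK and Sentieon® values correspond in the order in which they are listed in the two command lines above. The commented example passes `--call_conf 10` explicitly to reproduce the GATK 3.7+ default, because Sentieon® keeps 30.

```bash
# Hypothetical helper mapping GATK --output_mode values to Sentieon®
# --emit_mode values, assuming they correspond in the order listed in the
# command lines above.
gatk_to_emit_mode() {
  case $1 in
    EMIT_VARIANTS_ONLY)       echo VARIANT ;;
    EMIT_ALL_CONFIDENT_SITES) echo CONFIDENT ;;
    EMIT_ALL_SITES)           echo ALL ;;
    *) echo "unknown GATK output mode: $1" >&2; return 1 ;;
  esac
}

# Example: match the GATK >= 3.7 default stand_call_conf of 10 explicitly.
#   sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i RECALED.BAM \
#     --algo Genotyper --call_conf 10 \
#     --emit_mode "$(gatk_to_emit_mode EMIT_VARIANTS_ONLY)" OUTPUT.VCF
```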
GATK option | Sentieon option | Meaning |
---|---|---|
-I RECALED.BAM | -i RECALED.BAM | Input BAM file |
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-D DBSNP.VCF | -d DBSNP.VCF | dbSNP file |
--glm [SNP/INDEL/BOTH] | --var_type [SNP/INDEL/BOTH] | Variant output type |
-mbq QUALITY | --min_base_qual QUALITY | Minimum base quality |
-stand_emit_conf CONFIDENCE | --emit_conf CONFIDENCE | Emit confidence threshold |
-stand_call_conf CONFIDENCE | --call_conf CONFIDENCE | Call confidence threshold |
--output_mode MODE | --emit_mode MODE | Emit mode |
-ploidy PLOIDY | --ploidy PLOIDY | Ploidy of the sample |
-o OUTPUT.VCF | N/A | Output variant file |
-alleles GIVEN.VCF -gt_mode GENOTYPE_GIVEN_ALLELES | --given GIVEN.VCF | Perform variant calling using only the variants provided in GIVEN.VCF |
HaplotypeCaller - Haplotyper
GATK3 Best Practices command line - VCF output
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
-R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
-D DBSNP.VCF -mbq QUALITY --minPruning FACTOR \
-stand_emit_conf CONFIDENCE -stand_call_conf CONFIDENCE \
-pcrModel [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
--output_mode [EMIT_VARIANTS_ONLY/EMIT_ALL_CONFIDENT_SITES/EMIT_ALL_SITES] \
-o OUTPUT.VCF
GATK4 Best Practices command line - VCF output
gatk HaplotypeCaller \
-R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
-D DBSNP.VCF -mbq QUALITY --min-pruning FACTOR \
-stand-call-conf CONFIDENCE -new-qual false \
--pcr-indel-model [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
--output-mode [EMIT_VARIANTS_ONLY/EMIT_ALL_CONFIDENT_SITES/EMIT_ALL_SITES] \
-O OUTPUT.VCF
Sentieon® command line - VCF output
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i RECALED.BAM \
--interval INTERVAL \
--algo Haplotyper -d DBSNP.VCF \
--min_base_qual QUALITY --prune_factor FACTOR \
--emit_conf CONFIDENCE --call_conf CONFIDENCE \
--emit_mode [VARIANT/CONFIDENT/ALL] \
--pcr_indel_model [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
OUTPUT.VCF
GATK3 Best Practices command line - gVCF output
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
-R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
-D DBSNP.VCF -mbq QUALITY --minPruning FACTOR \
-stand_emit_conf CONFIDENCE -stand_call_conf CONFIDENCE \
-pcrModel [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
--emitRefConfidence GVCF \
-o OUTPUT.VCF
GATK4 Best Practices command line - gVCF output
gatk HaplotypeCaller \
-R REFERENCE.FASTA -I RECALED.BAM -L INTERVAL \
-D DBSNP.VCF -mbq QUALITY --min-pruning FACTOR \
-stand-call-conf CONFIDENCE \
--pcr-indel-model [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
-ERC GVCF \
-O OUTPUT.VCF
Sentieon® command line - gVCF output
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i RECALED.BAM \
--interval INTERVAL \
--algo Haplotyper -d DBSNP.VCF \
--min_base_qual QUALITY --prune_factor FACTOR \
--emit_conf CONFIDENCE --call_conf CONFIDENCE \
--emit_mode GVCF \
--pcr_indel_model [HOSTILE/AGGRESSIVE/CONSERVATIVE/NONE] \
OUTPUT.VCF
The last argument of the Sentieon® command line is the output VCF file. The tool writes a compressed VCF file when the output file name has a .gz extension.
Bear in mind that since GATK 3.7, stand_emit_conf is no longer supported. The default value of stand_call_conf was also changed from 30 to 10 in GATK 3.7 through GATK 4.0 and reverted to 30 in GATK 4.1. The default value of the Sentieon® call_conf has remained at 30.
Since GATK 4.1, the new quality model (-newQual in GATK3, -new-qual in GATK4) is the default genotyping model.
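Because the GATK default for `stand_call_conf` depends on the version, a port that must reproduce a specific GATK release should set `--call_conf` explicitly. The bash sketch below is hypothetical. It encodes the defaults stated in the note above: 30 before 3.7, 10 from 3.7 through 4.0, and 30 again from 4.1. It assumes version strings of the form `MAJOR.MINOR[.…]`.

```bash
# Sketch: the stand_call_conf default of the GATK version you want to match,
# following the note above (30 before 3.7, 10 from 3.7 through 4.0, 30 again
# from 4.1). Pass it explicitly, because Sentieon's call_conf default is 30.
gatk_default_call_conf() {
  local major minor rest
  IFS=. read -r major minor rest <<<"$1"
  if (( major < 3 || (major == 3 && minor < 7) )); then
    echo 30
  elif (( major == 3 || (major == 4 && minor == 0) )); then
    echo 10
  else
    echo 30
  fi
}

# Example:
#   sentieon driver ... --algo Haplotyper \
#     --call_conf "$(gatk_default_call_conf 4.0.12.0)" ... OUTPUT.VCF
```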
GATK3 option | Sentieon option | Meaning |
---|---|---|
-I RECALED.BAM | -i RECALED.BAM | Input BAM file |
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-D DBSNP.VCF | -d DBSNP.VCF | dbSNP file |
-mbq QUALITY | --min_base_qual QUALITY | Minimum base quality |
-stand_emit_conf CONFIDENCE | --emit_conf CONFIDENCE | Emit confidence threshold |
-stand_call_conf CONFIDENCE | --call_conf CONFIDENCE | Call confidence threshold |
--output_mode MODE | --emit_mode MODE | Emit mode |
--emitRefConfidence GVCF | --emit_mode gvcf | Produce a gVCF output |
-ploidy PLOIDY | --ploidy PLOIDY | Ploidy of the sample |
-o OUTPUT.VCF | N/A | Output variant file |
-alleles GIVEN.VCF -gt_mode GENOTYPE_GIVEN_ALLELES | --given GIVEN.VCF | Perform variant calling using only the variants provided in GIVEN.VCF |
-L INTERVAL | --interval INTERVAL | Interval to restrict the calculation to |
-mmq QUALITY | --min_map_qual QUALITY | Minimum mapping quality |
--minPruning FACTOR | --prune_factor FACTOR | Pruning factor |
-pcrModel MODEL | --pcr_indel_model MODEL | PCR model |
--dontUseSoftClippedBases | --trim_soft_clip | Trim off soft-clipped bases |
--annotation ANNOTATION | --annotation ANNOTATION | Annotations to apply to the variant calls |
--excludeAnnotation ANNOTATION | --annotation !ANNOTATION | Annotations to exclude from the variant calls, using the ‘!’ prefix |
-newQual | --genotype_model multinomial | Use the new simplified allele count model |
GATK4 option | Sentieon option | Meaning |
---|---|---|
-I RECALED.BAM | -i RECALED.BAM | Input BAM file |
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-D DBSNP.VCF | -d DBSNP.VCF | dbSNP file |
-mbq QUALITY | --min_base_qual QUALITY | Minimum base quality |
N/A | --emit_conf CONFIDENCE | Emit confidence threshold |
-stand-call-conf CONFIDENCE | --call_conf CONFIDENCE | Call confidence threshold |
--output-mode MODE | --emit_mode MODE | Emit mode |
-ERC GVCF | --emit_mode gvcf | Produce a gVCF output |
-ploidy PLOIDY | --ploidy PLOIDY | Ploidy of the sample |
-O OUTPUT.VCF | N/A | Output variant file |
--alleles GIVEN.VCF --genotyping-mode GENOTYPE_GIVEN_ALLELES | --given GIVEN.VCF | Perform variant calling using only the variants provided in GIVEN.VCF |
-L INTERVAL | --interval INTERVAL | Interval to restrict the calculation to |
--minimum-mapping-quality QUALITY | --min_map_qual QUALITY | Minimum mapping quality |
--min-pruning FACTOR | --prune_factor FACTOR | Pruning factor |
--pcr-indel-model MODEL | --pcr_indel_model MODEL | PCR model |
--dont-use-soft-clipped-bases | --trim_soft_clip | Trim off soft-clipped bases |
--annotation ANNOTATION | --annotation ANNOTATION | Annotations to apply to the variant calls |
--annotations-to-exclude ANNOTATION | --annotation !ANNOTATION | Annotations to exclude from the variant calls, using the ‘!’ prefix |
-new-qual | --genotype_model multinomial | Use the new simplified allele count model |
Joint Genotype - GVCFtyper
GATK3 Best Practices command line
java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
-R REFERENCE.FASTA -L INTERVAL \
-D DBSNP.VCF \
-stand_emit_conf CONFIDENCE -stand_call_conf CONFIDENCE \
-V INPUT_GVCF_1 -V INPUT_GVCF_2 -V INPUT_GVCF_3 \
-o OUTPUT.VCF
GATK4 Best Practices command line
gatk GenotypeGVCFs \
-R REFERENCE.FASTA -L INTERVAL \
-D DBSNP.VCF -new-qual false \
-stand-call-conf CONFIDENCE \
-V INPUT_GVCF_1 -V INPUT_GVCF_2 -V INPUT_GVCF_3 \
-O OUTPUT.VCF
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
--interval INTERVAL \
--algo GVCFtyper \
-d DBSNP.VCF \
--emit_conf CONFIDENCE --call_conf CONFIDENCE \
--emit_mode [VARIANT/CONFIDENT/ALL] \
-v INPUT_GVCF_1 -v INPUT_GVCF_2 -v INPUT_GVCF_3 \
OUTPUT.VCF
The last argument of the Sentieon® command line is the output VCF file. The tool writes a compressed VCF file when the output file name has a .gz extension.
Bear in mind that since GATK 3.7, stand_emit_conf is no longer supported. The default value of stand_call_conf was also changed from 30 to 10 in GATK 3.7 through GATK 4.0 and reverted to 30 in GATK 4.1. The default value of the Sentieon® call_conf has remained at 30.
Since GATK 4.1, the new quality model (-newQual in GATK3, -new-qual in GATK4) is the default genotyping model.
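GVCFtyper takes one `-v` argument per input gVCF, as shown in the command above. For cohorts with many samples, it is convenient to generate these arguments from a list. The sketch below is a hypothetical bash helper that stores the result in a global array named `GVCF_ARGS`. Paths that contain spaces therefore stay intact. The list file in the example is also hypothetical.

```bash
# Hypothetical helper: turn a list of gVCF paths into the repeated "-v FILE"
# arguments that GVCFtyper expects (GATK uses "-V FILE"). The result is stored
# in the global array GVCF_ARGS so that paths containing spaces survive intact.
build_gvcf_args() {
  GVCF_ARGS=()
  local f
  for f in "$@"; do
    GVCF_ARGS+=(-v "$f")
  done
}

# Example: read one path per line from a (hypothetical) list file.
#   mapfile -t gvcfs < gvcf.list
#   build_gvcf_args "${gvcfs[@]}"
#   sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
#     --algo GVCFtyper -d DBSNP.VCF "${GVCF_ARGS[@]}" OUTPUT.VCF
```

The array is used instead of a space-joined string because word-splitting a string would break any path that contains whitespace.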
GATK3 option | Sentieon option | Meaning |
---|---|---|
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-L INTERVAL | --interval INTERVAL | Interval to restrict the calculation to |
-D DBSNP.VCF | -d DBSNP.VCF | dbSNP file |
-stand_emit_conf CONFIDENCE | --emit_conf CONFIDENCE | Emit confidence threshold |
-stand_call_conf CONFIDENCE | --call_conf CONFIDENCE | Call confidence threshold |
N/A | --emit_mode MODE | Emit mode |
-V INPUT_GVCF_X | -v INPUT_GVCF_X | Input gVCF files |
-o OUTPUT.VCF | N/A | Output variant file |
-newQual | --genotype_model multinomial | Use the new simplified allele count model |
GATK4 option | Sentieon option | Meaning |
---|---|---|
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-L INTERVAL | --interval INTERVAL | Interval to restrict the calculation to |
-D DBSNP.VCF | -d DBSNP.VCF | dbSNP file |
N/A | --emit_conf CONFIDENCE | Emit confidence threshold |
-stand-call-conf CONFIDENCE | --call_conf CONFIDENCE | Call confidence threshold |
N/A | --emit_mode MODE | Emit mode |
-V INPUT_GVCF_X | -v INPUT_GVCF_X | Input gVCF files |
-O OUTPUT.VCF | N/A | Output variant file |
-new-qual | --genotype_model multinomial | Use the new simplified allele count model |
Filter Variants - VQSR
VQSR - calculate recalibration
GATK3 Best Practices command line
java -jar GenomeAnalysisTK.jar -T VariantRecalibrator \
-R REFERENCE.FASTA -input INPUT.VCF \
-an ANNOTATION_1 -an ANNOTATION_2 … \
-mode [SNP/INDEL] \
--resource:RESOURCE_PARAM RESOURCE.VCF … \
-tranche TRANCH_THRES -tranche TRANCH_THRES … \
--maxGaussians MAX_GAUSS --maxNegativeGaussians MAX_GAUSS \
--maxIterations MAX_ITERATIONS \
--aggregate AGGREGATE_VCF \
-tranchesFile TRANCHES_FILE \
-rscriptFile R_PLOT_FILE \
-recalFile RECAL_FILE
GATK4 Best Practices command line
gatk VariantRecalibrator \
-R REFERENCE.FASTA -V INPUT.VCF \
-an ANNOTATION_1 -an ANNOTATION_2 … \
-mode [SNP/INDEL] \
--resource:RESOURCE_PARAM RESOURCE.VCF … \
-tranche TRANCH_THRES -tranche TRANCH_THRES … \
--max-gaussians MAX_GAUSS --max-negative-gaussians MAX_GAUSS \
--max-iterations MAX_ITERATIONS \
--aggregate AGGREGATE_VCF \
--tranches-file TRANCHES_FILE \
--rscript-file R_PLOT_FILE \
-O RECAL_FILE
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
--algo VarCal -v INPUT.VCF \
--annotation ANNOTATION_1 --annotation ANNOTATION_2 … \
--var_type [SNP/INDEL] \
--resource RESOURCE.VCF --resource_param RESOURCE_PARAM … \
--tranche TRANCH_THRES --tranche TRANCH_THRES … \
--max_gaussians MAX_GAUSS --max_neg_gaussians MAX_GAUSS \
--max_iter MAX_ITERATIONS \
--nthr NUMBER_THREADS_EM --srand RANDOM_SEED \
--aggregate_data AGGREGATE_VCF \
--tranches_file TRANCHES_FILE \
--plot_file PLOT_FILE \
RECAL_FILE
The last argument of the Sentieon® command line is the output recal file.
The resource argument in Sentieon® is split into two consecutive arguments: --resource with the resource file, followed by --resource_param with the resource parameters.
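The split described above can be done mechanically. The hypothetical bash sketch below turns each GATK `--resource:PARAMS FILE` pair into `--resource FILE --resource_param PARAMS`. The HapMap parameter string in the test is only an illustration of the GATK syntax and is not a recommended setting. The function prints the result for inspection. In a real script you would append the array elements to the command instead of word-splitting the printed text.

```bash
# Hypothetical translator for the note above: each GATK "--resource:PARAMS FILE"
# pair becomes the two Sentieon® arguments "--resource FILE --resource_param PARAMS".
gatk_resources_to_sentieon() {
  local -a out=()
  while (( $# >= 2 )); do
    case $1 in
      --resource:*) out+=(--resource "$2" --resource_param "${1#--resource:}") ;;
      *) echo "expected --resource:PARAMS, got: $1" >&2; return 1 ;;
    esac
    shift 2
  done
  # An odd number of arguments means a resource is missing its file.
  (( $# == 0 )) || { echo "dangling argument: $1" >&2; return 1; }
  printf '%s\n' "${out[*]}"
}
```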
GATK3 option | Sentieon option | Meaning |
---|---|---|
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-input INPUT.VCF | -v INPUT.VCF | Input VCF file |
-an ANNOTATION | --annotation ANNOTATION | Annotation to use |
-mode [SNP/INDEL] | --var_type [SNP/INDEL] | Mode to use |
--resource | --resource/--resource_param | Resources to use |
-tranche TRANCH_THRES | --tranche TRANCH_THRES | Thresholds for tranches |
--maxGaussians MAX_GAUSS | --max_gaussians MAX_GAUSS | Maximum number of Gaussians used for the positive model |
--maxNegativeGaussians MAX_GAUSS | --max_neg_gaussians MAX_GAUSS | Maximum number of Gaussians used for the negative model |
--maxIterations MAX_ITERATIONS | --max_iter MAX_ITERATIONS | Maximum number of iterations |
N/A | --srand RANDOM_SEED | Random seed for the EM calculation |
--aggregate AGGREGATE_VCF | --aggregate_data AGGREGATE_VCF | Input aggregate data |
-tranchesFile TRANCHES_FILE | --tranches_file TRANCHES_FILE | Output tranches file |
-rscriptFile R_PLOT_FILE | --plot_file PLOT_FILE | Output file for plotting |
-recalFile RECAL_FILE | N/A | Output recalibration file |
-MQCap NUMBER | --max_mq NUMBER | Maximum MQ in the data |
GATK4 option | Sentieon option | Meaning |
---|---|---|
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
--variant INPUT.VCF | -v INPUT.VCF | Input VCF file |
-an ANNOTATION | --annotation ANNOTATION | Annotation to use |
-mode [SNP/INDEL] | --var_type [SNP/INDEL] | Mode to use |
--resource | --resource/--resource_param | Resources to use |
-tranche TRANCH_THRES | --tranche TRANCH_THRES | Thresholds for tranches |
--max-gaussians MAX_GAUSS | --max_gaussians MAX_GAUSS | Maximum number of Gaussians used for the positive model |
--max-negative-gaussians MAX_GAUSS | --max_neg_gaussians MAX_GAUSS | Maximum number of Gaussians used for the negative model |
--max-iterations MAX_ITERATIONS | --max_iter MAX_ITERATIONS | Maximum number of iterations |
N/A | --srand RANDOM_SEED | Random seed for the EM calculation |
--aggregate AGGREGATE_VCF | --aggregate_data AGGREGATE_VCF | Input aggregate data |
--tranches-file TRANCHES_FILE | --tranches_file TRANCHES_FILE | Output tranches file |
--rscript-file R_PLOT_FILE | --plot_file PLOT_FILE | Output file for plotting |
-O RECAL_FILE | N/A | Output recalibration file |
-mq-cap NUMBER | --max_mq NUMBER | Maximum MQ in the data |
VQSR - apply recalibration
GATK3 Best Practices command line
java -jar GenomeAnalysisTK.jar -T ApplyRecalibration \
-R REFERENCE.FASTA -input INPUT.VCF \
-mode [SNP/INDEL] --ts_filter_level SENSITIVITY \
-tranchesFile TRANCHES_FILE -recalFile RECAL_FILE \
-o OUTPUT.VCF
GATK4 Best Practices command line
gatk ApplyVQSR \
-R REFERENCE.FASTA -V INPUT.VCF \
-mode [SNP/INDEL] -ts-filter-level SENSITIVITY \
--tranches-file TRANCHES_FILE --recal-file RECAL_FILE \
-O OUTPUT.VCF
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
--algo ApplyVarCal -v INPUT.VCF \
--var_type [SNP/INDEL] --sensitivity SENSITIVITY \
--tranches_file TRANCHES_FILE --recal RECAL_FILE \
OUTPUT.VCF
The last argument of the Sentieon® command line is the output VCF file. The tool writes a compressed VCF file when the output file name has a .gz extension.
GATK3 option | Sentieon option | Meaning |
---|---|---|
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-input INPUT.VCF | -v INPUT.VCF | Input VCF file |
--ts_filter_level SENSITIVITY | --sensitivity SENSITIVITY | Truth sensitivity filter level |
-mode [SNP/INDEL] | --var_type [SNP/INDEL] | Mode to use |
-tranchesFile TRANCHES_FILE | --tranches_file TRANCHES_FILE | Input tranches file |
-recalFile RECAL_FILE | --recal RECAL_FILE | Input recalibration file |
-o OUTPUT.VCF | N/A | Output variant file |
GATK4 option | Sentieon option | Meaning |
---|---|---|
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
-V INPUT.VCF | -v INPUT.VCF | Input VCF file |
-ts-filter-level SENSITIVITY | --sensitivity SENSITIVITY | Truth sensitivity filter level |
-mode [SNP/INDEL] | --var_type [SNP/INDEL] | Mode to use |
--tranches-file TRANCHES_FILE | --tranches_file TRANCHES_FILE | Input tranches file |
--recal-file RECAL_FILE | --recal RECAL_FILE | Input recalibration file |
-O OUTPUT.VCF | N/A | Output variant file |
MuTect - TNsnv
MuTect Best Practices command line
java -jar mutect.jar -T MuTect \
-R REFERENCE.FASTA -L INTERVAL \
-I:normal NORMAL_RECALED.BAM -I:tumor TUMOR_RECALED.BAM \
--dbsnp DBSNP.VCF -o CALL_STATS_OUTPUT.TXT -vcf OUTPUT.VCF
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i NORMAL_RECALED.BAM -i TUMOR_RECALED.BAM \
--interval INTERVAL \
--algo TNsnv --dbsnp DBSNP.VCF \
--tumor_sample TUMOR_SM --normal_sample NORMAL_SM \
--call_stats_out CALL_STATS_OUTPUT.TXT OUTPUT.VCF
The last argument of the Sentieon® command line is the output VCF file. The tool writes a compressed VCF file when the output file name has a .gz extension.
The normal_sample and tumor_sample arguments are required in Sentieon® because the input BAM files are not explicitly labeled as tumor or normal. The input could even be a single co-realigned BAM file.
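The values for `--tumor_sample` and `--normal_sample` are the SM fields of the read groups in each BAM header. The hypothetical bash helper below lists the unique SM values from a SAM header read on standard input, for example the output of `samtools view -H`. For a co-realigned BAM it prints both samples. You still have to decide which one is the tumor.

```bash
# Hypothetical helper: list the unique SM (sample name) values in the @RG lines
# of a SAM header read from stdin, to fill in --tumor_sample/--normal_sample.
#   samtools view -H TUMOR_RECALED.BAM | header_samples
header_samples() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SM:/) print substr($i, 4)   # strip the "SM:" prefix
  }' | sort -u
}

# Example (requires samtools and Sentieon®):
#   TUMOR_SM=$(samtools view -H TUMOR_RECALED.BAM | header_samples)
#   NORMAL_SM=$(samtools view -H NORMAL_RECALED.BAM | header_samples)
```

If a single BAM yields more than one name, the variable holds several lines. Choose the sample name explicitly rather than passing that output straight to the driver.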
GATK option | Sentieon option | Meaning |
---|---|---|
N/A | -i COREALIGNED.BAM | Input co-realigned BAM file |
-I:normal NORMAL_RECALED.BAM | -i NORMAL_RECALED.BAM | Input normal BAM file |
-I:tumor TUMOR_RECALED.BAM | -i TUMOR_RECALED.BAM | Input tumor BAM file |
-R REFERENCE.FASTA | -r REFERENCE.FASTA | Reference file |
--normal_sample_name NORMAL_SAMPLE | --normal_sample NORMAL_SAMPLE | Input normal sample name |
--tumor_sample_name TUMOR_SAMPLE | --tumor_sample TUMOR_SAMPLE | Input tumor sample name |
--dbsnp DBSNP.VCF | --dbsnp DBSNP.VCF | dbSNP file |
--cosmic COSMIC.VCF | --cosmic COSMIC.VCF | Input COSMIC VCF file |
--normal_panel PON.VCF | --pon PON.VCF | Input panel-of-normals VCF file |
--artifact_detection_mode | --detect_pon | Turn on the mode that detects artifacts in the normal sample, used to generate the panel of normals |
-vcf OUTPUT.VCF | N/A | Output tumor variants file |
-o CALL_STATS.OUT | --call_stats_out CALL_STATS.OUT | Output call statistics file |
--coverage_file COVERAGE_FILE | --stdcov_out COVERAGE_FILE | Output standard coverage wiggle file |
--tumor_depth_file FILE | --tumor_depth_out FILE | Output wiggle file of tumor read depth |
--normal_depth_file FILE | --normal_depth_out FILE | Output wiggle file of normal read depth |
--power_file FILE | --power_out FILE | Output power file |
--min_qscore QUALITY | --min_base_qual QUALITY | Minimum quality of the bases used in variant calling |
--initial_tumor_lod NUMBER | --min_init_tumor_lod NUMBER | Minimum tumor log odds in the initial variant-calling pass |
--tumor_lod NUMBER | --min_tumor_lod NUMBER | Minimum tumor log odds in the final variant call |
--normal_lod NUMBER | --min_normal_lod NUMBER | Minimum normal log odds used to check that the tumor variant is not a normal variant |
--fraction_contamination NUMBER | --contamination_frac NUMBER | Estimated fraction of contamination from other samples |
--minimum_mutation_cell_fraction NUMBER | --min_cell_mutation_frac NUMBER | Minimum fraction of cells that carry the mutation |
--strand_artifact_lod NUMBER | --min_strand_bias_lod NUMBER | Minimum log odds for calling strand bias |
--strand_artifact_power_threshold NUMBER | --min_strand_bias_power NUMBER | Minimum power for calling strand bias |
--dbsnp_normal_lod NUMBER | --min_dbsnp_normal_lod NUMBER | Minimum log odds for calling the normal non-variant at dbSNP sites |
--minimum_normal_allele_fraction NUMBER | --min_normal_allele_frac NUMBER | Minimum allele fraction to be considered in the normal sample |
--tumor_f_pretest NUMBER | --min_tumor_allele_frac NUMBER | Minimum allele fraction in the tumor sample |
--gap_events_threshold NUMBER | --max_indel NUMBER | Maximum number of nearby indel events allowed |
--heavily_clipped_read_fraction NUMBER | --max_read_clip_frac NUMBER | Maximum fraction of soft- or hard-clipped bases in a read |
--fraction_mapq0_threshold NUMBER | --max_mapq0_frac NUMBER | Maximum fraction of reads with a MAPQ of 0, used to identify poorly mapped regions |
--pir_median_threshold NUMBER | --min_pir_median NUMBER | Minimum read position median |
--pir_mad_threshold NUMBER | --min_pir_mad NUMBER | Minimum read position median absolute deviation |
--required_maximum_alt_allele_mapping_quality_score NUMBER | --max_alt_mapq NUMBER | Required maximum value of the alt allele mapping quality score |
--max_alt_alleles_in_normal_count NUMBER | --max_normal_alt_cnt NUMBER | Maximum alt allele count in the normal pileup |
--max_alt_alleles_in_normal_qscore_sum NUMBER | --max_normal_alt_qsum NUMBER | Maximum quality score sum of the alt allele in the normal pileup |
--max_alt_allele_in_normal_fraction NUMBER | --max_normal_alt_frac NUMBER | Maximum fraction of the alt allele in the normal pileup |
--power_constant_af NUMBER | --power_allele_frac NUMBER | Allele fraction used in power calculations |
MuTect2 - TNhaplotyper
GATK3 MuTect2 Best Practices command line
java -jar GenomeAnalysisTK.jar -T MuTect2 \
-R REFERENCE.FASTA -L INTERVAL \
-I:normal NORMAL_RECALED.BAM -I:tumor TUMOR_RECALED.BAM \
-D DBSNP.VCF -o OUTPUT.VCF
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i NORMAL_RECALED.BAM -i TUMOR_RECALED.BAM \
--interval INTERVAL \
--algo TNhaplotyper --dbsnp DBSNP.VCF \
--tumor_sample TUMOR_SM --normal_sample NORMAL_SM \
OUTPUT.VCF
The last argument of the Sentieon® command line is the output VCF file. If its name ends in .gz, the tool writes a compressed VCF file.
The --normal_sample and --tumor_sample arguments are required in Sentieon® because the BAM files are not explicitly labeled as normal or tumor, and the input could be a single co-realigned BAM file.
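The values given to --tumor_sample and --normal_sample are the sample names stored in the SM field of the @RG lines in the BAM header. A minimal sketch for listing them, assuming the header text is piped in (for example from samtools view -H, a separate tool that is not part of Sentieon®); rg_sample_names is a hypothetical helper name:

```sh
# Hypothetical helper: print each distinct SM (sample name) value found in
# the @RG lines of SAM header text read on stdin, in order of first appearance.
rg_sample_names() {
  awk -F '\t' '
    $1 == "@RG" {
      for (i = 2; i <= NF; i++) {
        # a field such as "SM:tumor1" carries the sample name after "SM:"
        if (substr($i, 1, 3) == "SM:" && !seen[substr($i, 4)]++) {
          print substr($i, 4)
        }
      }
    }'
}

# Usage (assumes samtools is installed):
#   TUMOR_SM=$(samtools view -H TUMOR_RECALED.BAM | rg_sample_names)
#   NORMAL_SM=$(samtools view -H NORMAL_RECALED.BAM | rg_sample_names)
```

For a single co-realigned BAM the helper prints both names; deciding which one is the tumor is still up to the user.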
GATK option |
Sentieon option |
Meaning |
---|---|---|
N/A |
-i COREALIGNED.BAM |
Input the corealigned bam file |
-I:normal NORMAL_RECALED.BAM |
-i NORMAL_RECALED.BAM |
Input the bam files |
-I:tumor TUMOR_RECALED.BAM |
-i TUMOR_RECALED.BAM |
Input the bam files |
-R REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference file |
N/A |
--normal_sample NORMAL_SAMPLE |
Input normal sample name |
N/A |
--tumor_sample TUMOR_SAMPLE |
Input tumor sample name |
-D DBSNP.VCF |
--dbsnp DBSNP.VCF |
dbSNP file |
--cosmic COSMIC.VCF |
--cosmic COSMIC.VCF |
Input cosmic VCF file |
--normal_panel PON.VCF |
--pon PON.VCF |
Input panel-of-normal VCF file |
--artifact_detection_mode |
--detect_pon |
Enable artifact detection mode for the normal sample; used to generate the panel of normals |
-o OUTPUT.VCF |
N/A |
Output tumor variants file. |
-mbq QUALITY |
--min_base_qual QUALITY |
Minimum quality of the bases used in variant calling |
--minPruning FACTOR |
--prune_factor FACTOR |
Pruning factor |
-pcrModel MODEL |
--pcr_indel_model MODEL |
PCR model |
--initial_tumor_lod NUMBER |
--min_init_tumor_lod NUMBER |
Minimum tumor log odds in the initial pass calling variants |
--initial_normal_lod NUMBER |
--min_init_normal_lod NUMBER |
Minimum normal log odds in the initial pass calling variants |
--tumor_lod NUMBER |
--min_tumor_lod NUMBER |
Minimum tumor log odds in the final call of variants |
--normal_lod NUMBER |
--min_normal_lod NUMBER |
Minimum normal log odds used to check that the tumor variant is not a normal variant |
--max_alt_alleles_in_normal_count NUMBER |
--max_normal_alt_cnt NUMBER |
Maximum alt alleles count in normal pileup |
--max_alt_alleles_in_normal_qscore_sum NUMBER |
--max_normal_alt_qsum NUMBER |
Maximum quality score sum of alt allele in normal pileup |
--max_alt_allele_in_normal_fraction NUMBER |
--max_normal_alt_frac NUMBER |
Maximum fraction of alt allele in normal pileup |
-contaminationFile TAB_FILE |
--tumor_contamination_frac NUMBER |
Estimated fraction of contamination from other samples in the tumor sample |
-contaminationFile TAB_FILE |
--normal_contamination_frac NUMBER |
Estimated fraction of contamination from other samples in the normal sample |
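When porting an existing GATK3 MuTect2 command, the option table can be applied mechanically. The sketch below is a hypothetical helper (mutect2_to_tnhaplotyper is not a Sentieon® tool) that handles only options placed after --algo TNhaplotyper: it rewrites the valued options from a subset of the rows above, maps the --artifact_detection_mode flag to --detect_pon, and moves the -o file to the end, where Sentieon® expects the output as a positional argument. Input, reference and sample arguments belong to the driver or are not present in GATK, and -contaminationFile has to be converted into the two per-sample fractions by hand, so all of these are reported as unmapped rather than guessed.

```sh
# Hypothetical helper: translate GATK3 MuTect2 algorithm options into
# Sentieon TNhaplotyper options, following the option table.
# Prints the translated options, then the output VCF (positional in Sentieon).
mutect2_to_tnhaplotyper() {
  args="" out_file=""
  while [ "$#" -gt 0 ]; do
    opt=$1
    if [ "$opt" = "--artifact_detection_mode" ]; then
      args="$args --detect_pon"          # boolean flag, takes no value
      shift
      continue
    fi
    if [ "$#" -lt 2 ]; then
      echo "missing value for $opt" >&2
      return 1
    fi
    case $opt in
      -o) out_file=$2 ;;
      -D) args="$args --dbsnp $2" ;;
      --cosmic) args="$args --cosmic $2" ;;
      --normal_panel) args="$args --pon $2" ;;
      -mbq) args="$args --min_base_qual $2" ;;
      --minPruning) args="$args --prune_factor $2" ;;
      -pcrModel) args="$args --pcr_indel_model $2" ;;
      --initial_tumor_lod) args="$args --min_init_tumor_lod $2" ;;
      --initial_normal_lod) args="$args --min_init_normal_lod $2" ;;
      --tumor_lod) args="$args --min_tumor_lod $2" ;;
      --normal_lod) args="$args --min_normal_lod $2" ;;
      --max_alt_alleles_in_normal_count) args="$args --max_normal_alt_cnt $2" ;;
      --max_alt_alleles_in_normal_qscore_sum) args="$args --max_normal_alt_qsum $2" ;;
      --max_alt_allele_in_normal_fraction) args="$args --max_normal_alt_frac $2" ;;
      *) echo "unmapped option: $opt" >&2; return 1 ;;
    esac
    shift 2
  done
  # Drop the leading space; append the output file only if -o was given.
  echo "${args# }${out_file:+ $out_file}"
}
```

For example, mutect2_to_tnhaplotyper -D DBSNP.VCF -mbq 20 -o OUTPUT.VCF prints --dbsnp DBSNP.VCF --min_base_qual 20 OUTPUT.VCF, which can follow --algo TNhaplotyper and the sample arguments in the driver command.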
GATK4 Mutect2 - TNhaplotyper2 and TNfilter¶
GATK4 Mutect2 Best Practices command line
gatk Mutect2 -R REFERENCE.FASTA -I TUMOR_RECALED.BAM \
-tumor TUMOR_SM -I NORMAL_RECALED.BAM -normal NORMAL_SM \
--germline-resource GNOMAD.VCF -O TMP.VCF \
--f1r2-tar-gz F1R2.TAR.GZ -L INTERVAL
gatk GetPileupSummaries -R REFERENCE.FASTA -I TUMOR_RECALED.BAM \
-V GNOMAD.VCF -O TUMOR.PILEUPS -L INTERVAL
gatk GetPileupSummaries -R REFERENCE.FASTA -I NORMAL_RECALED.BAM \
-V GNOMAD.VCF -O NORMAL.PILEUPS -L INTERVAL
gatk LearnReadOrientationModel -I F1R2.TAR.GZ -O PRIORS
gatk CalculateContamination -I TUMOR.PILEUPS \
-matched NORMAL.PILEUPS --tumor-segmentation SEGMENTS \
-O CONTAMINATION.TABLE
gatk FilterMutectCalls -V TMP.VCF -R REFERENCE.FASTA \
-O OUTPUT.VCF --contamination-table CONTAMINATION.TABLE \
--tumor-segmentation SEGMENTS -ob-priors PRIORS \
--stats TMP.VCF.stats --filtering-stats OUTPUT.VCF.stats
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i TUMOR_RECALED.BAM -i NORMAL_RECALED.BAM \
--interval INTERVAL \
--algo TNhaplotyper2 \
--tumor_sample TUMOR_SM --normal_sample NORMAL_SM \
--germline_vcf GNOMAD.VCF TMP.VCF \
--algo OrientationBias --tumor_sample TUMOR_SM PRIORS \
--algo ContaminationModel \
--tumor_sample TUMOR_SM --normal_sample NORMAL_SM \
-v GNOMAD.VCF --tumor_segments SEGMENTS CONTAMINATION.TABLE
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
--algo TNfilter -v TMP.VCF --tumor_sample TUMOR_SM \
--normal_sample NORMAL_SM \
--contamination CONTAMINATION.TABLE \
--tumor_segments SEGMENTS --orientation_priors PRIORS \
OUTPUT.VCF
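The two Sentieon® runs above depend on each other: TNfilter reads the raw calls, orientation priors, contamination table and segmentation written by the first run. The sketch below chains them through a common output prefix. It is a hypothetical wrapper rather than a Sentieon-provided script; the function names, the prefix-based file names and the DRY_RUN switch are assumptions, --interval is omitted for brevity, and the thread count falls back to nproc (GNU coreutils) when NUMBER_THREADS is unset.

```sh
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: run both Sentieon passes, deriving every
# intermediate file name from one output prefix.
tnhaplotyper2_pipeline() {
  ref=$1 tumor_bam=$2 normal_bam=$3 tumor_sm=$4 normal_sm=$5 gnomad=$6 prefix=$7
  threads=${NUMBER_THREADS:-$(nproc)}
  # Pass 1: raw calls, orientation-bias priors and contamination in one driver run.
  run sentieon driver -t "$threads" -r "$ref" \
    -i "$tumor_bam" -i "$normal_bam" \
    --algo TNhaplotyper2 --tumor_sample "$tumor_sm" --normal_sample "$normal_sm" \
    --germline_vcf "$gnomad" "$prefix.tmp.vcf.gz" \
    --algo OrientationBias --tumor_sample "$tumor_sm" "$prefix.priors" \
    --algo ContaminationModel --tumor_sample "$tumor_sm" --normal_sample "$normal_sm" \
    -v "$gnomad" --tumor_segments "$prefix.segments" "$prefix.contamination" || return
  # Pass 2: filtering reads every file written by pass 1.
  run sentieon driver -t "$threads" -r "$ref" \
    --algo TNfilter -v "$prefix.tmp.vcf.gz" \
    --tumor_sample "$tumor_sm" --normal_sample "$normal_sm" \
    --contamination "$prefix.contamination" \
    --tumor_segments "$prefix.segments" --orientation_priors "$prefix.priors" \
    "$prefix.vcf.gz"
}
```

With DRY_RUN=1 the commands are printed rather than executed, which makes it easy to compare them with an existing GATK pipeline before running anything.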
Some VCFs split multi-allelic sites into separate records. For such sites, Mutect2 uses the first record and ignores the later ones in the VCF passed through the --germline-resource argument, while TNhaplotyper2 processes all of the records in the VCF passed through the --germline_vcf argument. As a result, the two tools can produce different results when GNOMAD.VCF has multi-allelic sites split into separate records.
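If Mutect2-like handling of split records is wanted, one option is to drop the later records at each site before passing the VCF to --germline_vcf. A minimal awk sketch, assuming an uncompressed, coordinate-sorted VCF on stdin so that records at the same CHROM/POS are adjacent; first_record_per_site is a hypothetical name, and keying on CHROM/POS only approximates, rather than reproduces, Mutect2's internal logic (distinct variants that happen to share a position are also collapsed).

```sh
# Hypothetical pre-filter: keep header lines and only the first record at each
# CHROM/POS of a sorted, uncompressed VCF read on stdin.
first_record_per_site() {
  awk -F '\t' '
    /^#/ { print; next }                 # pass header lines through
    $1 == chrom && $2 == pos { next }    # later record at the same site: drop
    { chrom = $1; pos = $2; print }      # first record at a new site: keep
  '
}

# Usage (assumes bgzip and tabix from htslib):
#   gzip -dc GNOMAD.VCF.gz | first_record_per_site | bgzip > GNOMAD.first.vcf.gz
#   tabix -p vcf GNOMAD.first.vcf.gz
```

Only the previous site is remembered, so memory use stays constant even for a genome-wide gnomAD file.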
The commands above use the same VCF file, GNOMAD.VCF, for both Mutect2/TNhaplotyper2 and GetPileupSummaries/ContaminationModel. The GATK Best Practices typically use a gnomAD VCF for Mutect2, but for GetPileupSummaries they use the same gnomAD VCF filtered to variants with 0.01 < AF < 0.2. Passing the unfiltered VCF to ContaminationModel with --min_af 0.01 --max_af 0.2 (the defaults) has the same effect.
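To build a pre-filtered resource the way the GATK Best Practices describe instead, for example to compare GetPileupSummaries and ContaminationModel on identical input, the AF window can be applied with awk. A hypothetical sketch, assuming an uncompressed VCF on stdin with a single AF value per record; it keeps records with MIN_AF < AF < MAX_AF (strict at both bounds) and drops records without an AF INFO field. How --min_af and --max_af treat values exactly on the bounds is not covered here, so borderline sites may differ.

```sh
# Hypothetical filter: keep header lines and the records whose INFO AF value
# lies strictly between MIN_AF and MAX_AF. Reads an uncompressed VCF on stdin.
af_range_filter() {
  min_af=$1 max_af=$2
  awk -F '\t' -v min="$min_af" -v max="$max_af" '
    BEGIN { min += 0; max += 0 }             # force numeric comparison
    /^#/ { print; next }                     # keep header lines
    {
      af = ""
      n = split($8, info, ";")               # INFO is column 8
      for (i = 1; i <= n; i++) {
        if (substr(info[i], 1, 3) == "AF=") {
          split(substr(info[i], 4), vals, ",")
          af = vals[1] + 0                   # first ALT only (biallelic sites assumed)
          break
        }
      }
      if (af != "" && af > min && af < max) print
    }'
}

# Usage: gzip -dc GNOMAD.VCF.gz | af_range_filter 0.01 0.2 > GNOMAD.common.vcf
```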
GATK4 option |
Sentieon option |
Meaning |
---|---|---|
-I TUMOR_RECALED.BAM |
-i TUMOR_RECALED.BAM |
Input the bam files |
-I NORMAL_RECALED.BAM |
-i NORMAL_RECALED.BAM |
Input the bam files |
-R REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference fasta |
-L INTERVAL |
--interval INTERVAL |
Interval to restrict calculation |
-tumor TUMOR_SM |
--tumor_sample TUMOR_SM |
Input tumor sample name |
-normal NORMAL_SM |
--normal_sample NORMAL_SM |
Input normal sample name |
-pon PON_FILE |
--pon PON_FILE |
A panel-of-normal file |
-mbq MIN_BQ |
--min_base_qual MIN_BQ |
Minimum base quality |
--min-pruning PRUNE |
--prune_factor PRUNE |
Pruning factor in local assembly |
--pcr-indel-model INDEL_MODEL |
--pcr_indel_model INDEL_MODEL |
PCR indel error model |
-init-lod INIT_T_LOD |
--min_init_tumor_lod INIT_T_LOD |
Minimum tumorLOD for candidate selection |
-emit-lod T_LOD |
--min_tumor_lod T_LOD |
Minimum tumorLOD for called variants |
--normal-lod N_LOD |
--min_normal_lod N_LOD |
Minimum normalLOD for called variants |
--germline-resource GERMLINE.VCF |
--germline_vcf GERMLINE.VCF |
A germline VCF containing allele frequencies |
--af-of-alleles-not-in-resource AF |
--default_af AF |
Allele frequency for variants not found in the germline VCF |
--max-population-af MAX_AF |
--max_germline_af MAX_AF |
Maximum germline allele frequency in tumor-only mode |
--genotype-pon-sites true |
--call_pon_sites |
Call candidate variants in the PoN |
--callable-depth |
--callable_depth |
Minimum depth to be considered for statistics |
Arguments in the OrientationBias algo map to arguments in both Mutect2 and LearnReadOrientationModel.
GATK4 option |
Sentieon option |
Meaning |
---|---|---|
-I TUMOR_RECALED.BAM |
-i TUMOR_RECALED.BAM |
Input the bam files |
-R REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference fasta |
-tumor TUMOR_SM |
--tumor_sample TUMOR_SM |
Input tumor sample name |
--f1r2-min-bq MIN_BQ |
--min_base_qual MIN_BQ |
Minimum base quality |
--f1r2-median-mq MIN_MAPQ |
--min_median_map_qual MIN_MAPQ |
Minimum median mapping quality |
--f1r2-max-depth MAX_DEPTH |
--max_depth MAX_DEPTH |
Sites with a higher depth will be grouped |
Arguments in the ContaminationModel algo map to arguments in both GetPileupSummaries and CalculateContamination.
GATK4 option |
Sentieon option |
Meaning |
---|---|---|
--input TUMOR_RECALED.BAM |
-i TUMOR_RECALED.BAM |
Input the bam files |
--reference REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference fasta |
--intervals INTERVAL |
--interval INTERVAL |
Interval to restrict calculation |
N/A |
--tumor_sample TUMOR_SM |
Input tumor sample name |
N/A |
--normal_sample NORMAL_SM |
Input normal sample name |
--min-mapping-quality MIN_MAPQ |
--min_map_qual MIN_MAPQ |
Minimum mapping quality |
--variant GNOMAD.VCF |
-v GNOMAD.VCF |
A VCF with population allele frequencies |
--minimum-population-allele-frequency MIN_AF |
--min_af MIN_AF |
Minimum population allele frequency |
--maximum-population-allele-frequency MAX_AF |
--max_af MAX_AF |
Maximum population allele frequency |
--tumor-segmentation SEGMENTS |
--tumor_segments SEGMENTS |
Output file for the tumor allele-fraction segmentation |
Arguments in the TNfilter algo map to arguments in FilterMutectCalls.
GATK4 option |
Sentieon option |
Meaning |
---|---|---|
-V TMP.VCF |
-v TMP.VCF |
The input VCF |
-R REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference fasta |
N/A |
--tumor_sample TUMOR_SM |
Input tumor sample name |
N/A |
--normal_sample NORMAL_SM |
Input normal sample name |
--contamination-table CONTAMINATION.TABLE |
--contamination CONTAMINATION.TABLE |
The contamination table file |
--tumor-segmentation SEGMENTS |
--tumor_segments SEGMENTS |
The tumor segmentation table |
-ob-priors PRIORS |
--orientation_priors PRIORS |
The orientation prior table |
--threshold-strategy STRATEGY |
--threshold_strategy STRATEGY |
Method to determine the filtering threshold |
--f-score-beta BETA |
--f_score_beta BETA |
Relative weight of recall to precision in the F-score |
--false-discovery-rate MAX_FP |
--max_fp_rate MAX_FP |
Maximum expected false-positive rate |
--initial-threshold THRESHOLD |
--threshold THRESHOLD |
Threshold for the constant strategy |
--min-median-base-quality MIN_BQ |
--min_median_base_qual MIN_BQ |
Minimum median base quality |
--max-events-in-region MAX_COUNT |
--max_event_count MAX_COUNT |
Maximum number of events in an active region |
--unique-alt-read-count MIN_READS |
--unique_alt_reads MIN_READS |
Minimum number of unique reads supporting the alt allele |
--max-median-fragment-length-difference MAX_MFRL |
--max_mfrl_diff MAX_MFRL |
Maximum median fragment length difference |
--distance-on-haplotype MAX_DIST |
--max_haplotype_distance MAX_DIST |
Maximum distance to determine an artifact |
--min-allele-fraction MIN_AF |
--min_tumor_af MIN_AF |
Minimum alternate allele fraction in the tumor sample |
--min-median-mapping-quality MIN_MAPQ |
--min_median_map_qual MIN_MAPQ |
Minimum median mapping quality |
--long-indel-length MAX_LEN |
--long_indel_length MAX_LEN |
Longer indels will use the reference mapping quality |
--max-alt-allele-count MAX_ALT |
--max_alt_count MAX_ALT |
Maximum number of alternate alleles at a site |
--max-n-ratio MAX_N |
--max_n_ratio MAX_N |
Maximum ratio of N to alt bases |
--normal-p-value-threshold P_VALUE |
--normal_p_value P_VALUE |
P-value threshold for normal artifacts |
--min-median-read-position MIN_DIST |
--min_median_pos MIN_DIST |
Minimum median distance to the end of the read |
--min-slippage-length MIN_SLIPPAGE |
--min_slippage_length MIN_SLIPPAGE |
Minimum length for polymerase slippage in STR regions |
--pcr-slippage-rate SLIPPAGE_RATE |
--slippage_rate SLIPPAGE_RATE |
The rate of PCR slippage |
--min-reads-per-strand MIN_ALT |
--min_alt_reads_per_strand MIN_ALT |
Minimum number of reads supporting the alt allele on each strand |
SplitNCigarReads - RNASplitReadsAtJunction¶
GATK3 SplitNCigarReads Best Practices command line
java -jar GenomeAnalysisTK.jar -T SplitNCigarReads \
-R REFERENCE.FASTA -I DEDUPED.BAM -o SPLIT.BAM \
-rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 \
-U ALLOW_N_CIGAR_READS
GATK4 SplitNCigarReads Best Practices command line
gatk SplitNCigarReads \
-R REFERENCE.FASTA -I DEDUPED.BAM -O SPLIT.BAM
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i DEDUPED.BAM --algo RNASplitReadsAtJunction \
--reassign_mapq 255:60 SPLIT.BAM
The last argument of the Sentieon® command line is the output BAM file.
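Conceptually, --reassign_mapq 255:60 rewrites the mapping quality of alignments that STAR reports with MAPQ 255 (STAR's default value for uniquely mapped reads) to 60, mirroring the GATK3 ReassignOneMappingQuality filter with -RMQF 255 -RMQT 60. The awk sketch below only illustrates that transformation on SAM text; it is not how Sentieon® implements the option, and reassign_mapq is a hypothetical name.

```sh
# Hypothetical illustration of --reassign_mapq FROM:TO on SAM text read from
# stdin: alignments whose MAPQ (column 5) equals FROM get MAPQ TO.
reassign_mapq() {
  from=$1 to=$2
  awk -F '\t' -v OFS='\t' -v from="$from" -v to="$to" '
    /^@/ { print; next }      # header lines pass through unchanged
    $5 == from { $5 = to }    # e.g. STAR unique-mapper MAPQ 255 -> 60
    { print }
  '
}

# Usage (assumes samtools is installed):
#   samtools view -h DEDUPED.BAM | reassign_mapq 255 60 | samtools view -b -o REASSIGNED.BAM -
```

In Sentieon® the reassignment happens inside RNASplitReadsAtJunction, so no separate pass over the BAM is needed.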
GATK3 option |
Sentieon option |
Meaning |
---|---|---|
-I DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
-R REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference fasta |
-rf ReassignOneMappingQuality -RMQF IN_QUAL -RMQT OUT_QUAL |
--reassign_mapq IN_QUAL:OUT_QUAL |
Reassign Mapping Quality from STAR |
--doNotFixOverhangs |
--ignore_overhang |
Do not hard-clip overhanging sections of reads |
--maxBasesInOverhang NUMBER |
--overhang_max_bases NUMBER |
Maximum number of bases allowed in a hard-clipped overhang; overhangs with more bases than this are not clipped |
--maxMismatchesInOverhang NUMBER |
--overhang_max_mismatches NUMBER |
Maximum number of mismatches allowed in a non-hard-clipped overhang; if the number of mismatches exceeds this value, the complete overhang is hard-clipped |
GATK4 option |
Sentieon option |
Meaning |
---|---|---|
-I DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
-R REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference fasta |
N/A |
--reassign_mapq IN_QUAL:OUT_QUAL |
Reassign Mapping Quality from STAR |
--do-not-fix-overhangs |
--ignore_overhang |
Do not hard-clip overhanging sections of reads |
--max-bases-in-overhang NUMBER |
--overhang_max_bases NUMBER |
Maximum number of bases allowed in a hard-clipped overhang; overhangs with more bases than this are not clipped |
--max-mismatches-in-overhang NUMBER |
--overhang_max_mismatches NUMBER |
Maximum number of mismatches allowed in a non-hard-clipped overhang; if the number of mismatches exceeds this value, the complete overhang is hard-clipped |
CollectAlignmentSummaryMetrics - AlignmentStat¶
Picard CollectAlignmentSummaryMetrics command line
java -jar picard.jar CollectAlignmentSummaryMetrics \
I=ALIGNED.BAM O=ALN_METRICS.TXT \
R=REFERENCE.FASTA \
ADAPTER_SEQUENCE=ADAPTERS_SEQ
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i ALIGNED.BAM --algo AlignmentStat \
--adapter_seq ADAPTERS_SEQ ALN_METRICS.TXT
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=ALIGNED.BAM |
-i ALIGNED.BAM |
Input the bam files |
O=ALN_METRICS.TXT |
N/A |
Output metrics |
R=REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference file |
ADAPTER_SEQUENCE=ADAPTERS_SEQ |
--adapter_seq ADAPTERS_SEQ |
A string of adapters |
CollectBaseDistributionByCycle - BaseDistributionByCycle¶
Picard CollectBaseDistributionByCycle command line
java -jar picard.jar CollectBaseDistributionByCycle \
I=ALIGNED.BAM O=BASE_DISTRIBUTION_METRICS.TXT \
CHART_OUTPUT=BASE_DISTRIBUTION.PDF
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i ALIGNED.BAM --algo BaseDistributionByCycle \
BASE_DISTRIBUTION_METRICS.TXT
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=ALIGNED.BAM |
-i ALIGNED.BAM |
Input the bam files |
O=BASE_DISTRIBUTION_METRICS.TXT |
N/A |
Output metrics |
CHART_OUTPUT=BASE_DISTRIBUTION.PDF |
N/A |
Output chart |
ALIGNED_READS_ONLY=true |
--aligned_reads_only true |
Calculate the base distribution over aligned reads only |
PF_READS_ONLY=true |
--pf_reads_only true |
Calculate the base distribution over PF reads only |
CollectVariantCallingMetrics - CollectVCMetrics¶
Picard CollectVariantCallingMetrics command line
java -jar picard.jar CollectVariantCallingMetrics \
I=CALLS.VCF O=VC_METRICS_OUT DBSNP=DBSNP.VCF
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
--algo CollectVCMetrics -d DBSNP.VCF -v CALLS.VCF \
VC_METRICS_OUT
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=CALLS.VCF |
-v CALLS.VCF |
Input VCF file |
O=VC_METRICS_OUT |
N/A |
Output basename |
DBSNP=DBSNP.VCF |
-d DBSNP.VCF |
dbSNP file |
ContEst - ContaminationAssessment¶
GATK3 Best Practices command line
java -jar GenomeAnalysisTK.jar -T ContEst -I TUMOR_RECALED.BAM \
-R REFERENCE.FASTA -pf POPULATION.VCF --genotypes GENOTYPES.VCF \
-o OUTPUT.TXT
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA -i TUMOR_RECALED.BAM \
--algo ContaminationAssessment --pop_vcf POPULATION.VCF \
--genotype_vcf GENOTYPES.VCF OUTPUT.TXT
GATK option |
Sentieon option |
Meaning |
---|---|---|
-I TUMOR_RECALED.BAM |
-i TUMOR_RECALED.BAM |
Input the bam files |
-R REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference fasta |
-pf POPULATION.VCF |
--pop_vcf POPULATION.VCF |
The VCF file containing allele frequency information for the population |
--genotypes GENOTYPES.VCF |
--genotype_vcf GENOTYPES.VCF |
The VCF file containing variants reported for the individual |
-llc [META/SAMPLE/READGROUP] |
--type [META/SAMPLE/READGROUP] |
Assess contamination by sample, lane or all reads |
--min_qscore MIN_BQ |
--min_base_qual MIN_BQ |
Any bases with a quality less than MIN_BQ will be ignored |
--min_mapq MIN_MAPQ |
--min_map_qual MIN_MAPQ |
Any reads with a mapping quality less than MIN_MAPQ will be ignored |
-mbc MINIMUM_BASE_COUNT |
--min_basecount MINIMUM_BASE_COUNT |
The minimum number of bases present at a locus for contamination to be assessed |
--beta_threshold TRIM |
--trim_thresh TRIM |
Threshold used to trim sites |
--trim_fraction TRIM_FRACTION |
--trim_frac TRIM_FRACTION |
Maximum fraction of sites that may be trimmed |
-pc PRECISION |
--precision PRECISION |
Precision of the reported contamination percentage |
-br BASE_REPORT |
--base_report BASE_REPORT |
The output file that will contain an extended report on the processed data |
-population POPULATION |
--population POPULATION |
A population for the baseline allele frequency of the sample |
-o OUTPUT.TXT |
N/A |
The output file |
DepthOfCoverage - CoverageMetrics¶
GATK3 Best Practices command
java -jar GenomeAnalysisTK.jar -T DepthOfCoverage \
-R REFERENCE.FASTA -I DEDUPED.BAM \
-geneList GENE_LIST.REFSEQ -ct THRESHOLD \
-o OUTPUT_BASE
GATK4 Best Practices command
gatk DepthOfCoverage \
-R REFERENCE.FASTA -I DEDUPED.BAM \
-gene-list GENE_LIST.REFSEQ \
--summary-coverage-threshold THRESHOLD \
-O OUTPUT_BASE
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i DEDUPED.BAM --algo CoverageMetrics \
--gene_list GENE_LIST.REFSEQ --cov_thresh THRESHOLD \
OUTPUT_BASE
GATK3 option |
Sentieon option |
Meaning |
---|---|---|
-R REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference file |
-I DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
-geneList GENE_LIST.REFSEQ |
--gene_list GENE_LIST.REFSEQ |
RefSeq file used to aggregate results to the gene level |
--countType COUNT_TYPE |
--count_type COUNT_TYPE |
Determines how to handle overlapping paired ends |
-pt PARTITION |
--partition PARTITION |
Determines how to partition the data |
-ct THRESHOLD |
--cov_thresh THRESHOLD |
Add aggregation metrics for the percentage of bases with coverage greater than THRESHOLD |
--start MIN_DEPTH |
--histogram_low MIN_DEPTH |
The smallest histogram bin |
--stop MAX_DEPTH |
--histogram_high MAX_DEPTH |
The largest histogram bin |
--nBins NUM_BINS |
--histogram_bin_count NUM_BINS |
The number of histogram bins |
-mmq MIN_MAPQ |
--min_map_qual MIN_MAPQ |
Minimum mapping quality of reads used |
--maxMappingQuality MAX_MAPQ |
--max_map_qual MAX_MAPQ |
Maximum mapping quality of reads used |
-mbq MIN_BASEQ |
--min_base_qual MIN_BASEQ |
Minimum base quality of bases used |
--maxBaseQuality MAX_BASEQ |
--max_base_qual MAX_BASEQ |
Maximum base quality of bases used |
-omitBaseOutput |
--omit_base_output |
Omit output of the per locus coverage |
-omitSampleSummary |
--omit_sample_stat |
Omit output of the summary results |
-omitLocusTable |
--omit_locus_stat |
Omit output of histogram files |
-omitIntervals |
--omit_interval_stat |
Omit output of interval statistics |
-baseConts |
--print_base_counts |
Include the number of "ACGTND" in the output per locus coverage |
--includeRefNSites |
--include_ref_N |
Include coverage data in loci where the reference genome is set to N |
--ignoreDeletionSites |
--ignore_del_sites |
Ignore coverage data in loci where there are deletions |
-dels |
--include_del |
Include deletions and add deletion counts |
-o OUTPUT_BASE |
N/A |
Output file basename |
GATK4 option |
Sentieon option |
Meaning |
---|---|---|
-R REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference file |
-I DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
-gene-list GENE_LIST.REFSEQ |
--gene_list GENE_LIST.REFSEQ |
RefSeq file used to aggregate results to the gene level |
--count-type COUNT_TYPE |
--count_type COUNT_TYPE |
Determines how to handle overlapping paired ends |
-pt PARTITION |
--partition PARTITION |
Determines how to partition the data |
--summary-coverage-threshold THRESHOLD |
--cov_thresh THRESHOLD |
Add aggregation metrics for the percentage of bases with coverage greater than THRESHOLD |
--nBins NUM_BINS |
--histogram_bin_count NUM_BINS |
The number of histogram bins |
-mbq MIN_BASEQ |
--min_base_qual MIN_BASEQ |
Minimum base quality of bases used |
--max-base-quality MAX_BASEQ |
--max_base_qual MAX_BASEQ |
Maximum base quality of bases used |
--omit-depth-output-at-each-base |
--omit_base_output |
Omit output of the per locus coverage |
--omit-per-sample-statistics |
--omit_sample_stat |
Omit output of the summary results |
--omit-locus-table |
--omit_locus_stat |
Omit output of histogram files |
--omit-interval-statistics |
--omit_interval_stat |
Omit output of interval statistics |
--print-base-counts |
--print_base_counts |
Include the number of "ACGTND" in the output per locus coverage |
--include-ref-n-sites |
--include_ref_N |
Include coverage data in loci where the reference genome is set to N |
--ignore-deletion-sites |
--ignore_del_sites |
Ignore coverage data in loci where there are deletions |
--include-deletions |
--include_del |
Include deletions and add deletion counts |
-O OUTPUT_BASE |
N/A |
Output file basename |
CollectGcBiasMetrics - GCBias¶
Picard CollectGcBiasMetrics command line
java -jar picard.jar CollectGcBiasMetrics \
I=DEDUPED.BAM O=GC_METRICS.TXT CHART=GC_BIAS.PDF \
S=SUMMARY.TXT R=REFERENCE.FASTA ASSUME_SORTED=true
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i DEDUPED.BAM --algo GCBias --summary SUMMARY.TXT \
GC_METRICS.TXT
sentieon plot GCBias -o GC_BIAS.PDF GC_METRICS.TXT
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
R=REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference file |
O=GC_METRICS.TXT |
N/A |
GC bias metrics results |
CHART=GC_BIAS.PDF |
-o GC_BIAS.PDF |
GC bias metrics report |
S=SUMMARY.TXT |
--summary SUMMARY.TXT |
GC bias metrics summary results |
LEVEL=LEVEL |
--accum_level LEVEL |
The accumulation level |
CollectHsMetrics - HsMetricAlgo¶
Picard CollectHsMetrics command line
java -jar picard.jar CollectHsMetrics \
I=DEDUPED.BAM O=HS_METRICS.TXT R=REFERENCE.FASTA \
BAIT_INTERVALS=BAITS TARGET_INTERVALS=TARGETS
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i DEDUPED.BAM --algo HsMetricAlgo --targets_list TARGETS \
--baits_list BAITS HS_METRICS.TXT
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
R=REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference file |
O=HS_METRICS.TXT |
N/A |
HS metrics results |
BAIT_INTERVALS=BAITS |
--baits_list BAITS |
Interval list input file of baits |
TARGET_INTERVALS=TARGETS |
--targets_list TARGETS |
Interval list input file of targets |
CLIP_OVERLAPPING_READS |
--clip_overlapping_reads |
Clip overlapping reads |
MINIMUM_MAPPING_QUALITY=MIN_MAPQ |
--min_map_qual MIN_MAPQ |
Minimum read mapping quality |
MINIMUM_BASE_QUALITY=MIN_BASEQ |
--min_base_qual MIN_BASEQ |
Minimum base quality |
COVERAGE_CAP=COVERAGE |
--coverage_cap COVERAGE |
Maximum coverage limit in the histogram |
CollectInsertSizeMetrics - InsertSizeMetricAlgo¶
Picard CollectInsertSizeMetrics command line
java -jar picard.jar CollectInsertSizeMetrics \
I=DEDUPED.BAM O=IS_METRICS.TXT R=REFERENCE.FASTA \
H=IS_METRICS.PDF
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i DEDUPED.BAM --algo InsertSizeMetricAlgo \
IS_METRICS.TXT
sentieon plot InsertSizeMetricAlgo -o IS_METRICS.PDF IS_METRICS.TXT
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
O=IS_METRICS.TXT |
N/A |
IS metrics results |
R=REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference file |
H=IS_METRICS.PDF |
-o IS_METRICS.PDF |
Insert size metrics report |
MeanQualityByCycle - MeanQualityByCycle¶
Picard MeanQualityByCycle command line
java -jar picard.jar MeanQualityByCycle \
I=DEDUPED.BAM O=MQ_METRICS.TXT R=REFERENCE.FASTA \
CHART=MQ_METRICS.PDF
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i DEDUPED.BAM --algo MeanQualityByCycle \
MQ_METRICS.TXT
sentieon plot MeanQualityByCycle -o MQ_METRICS.PDF MQ_METRICS.TXT
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
O=MQ_METRICS.TXT |
N/A |
MQ metrics results |
R=REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference file |
CHART=MQ_METRICS.PDF |
-o MQ_METRICS.PDF |
Mean quality metrics report |
QualityScoreDistribution - QualDistribution¶
Picard QualityScoreDistribution command line
java -jar picard.jar QualityScoreDistribution \
I=DEDUPED.BAM O=QD_METRICS.TXT \
CHART=QD_METRICS.PDF
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i DEDUPED.BAM --algo QualDistribution \
QD_METRICS.TXT
sentieon plot QualDistribution -o QD_METRICS.PDF QD_METRICS.TXT
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
O=QD_METRICS.TXT |
N/A |
QD metrics results |
N/A |
-r REFERENCE.FASTA |
Reference file |
CHART=QD_METRICS.PDF |
-o QD_METRICS.PDF |
Quality distribution metrics report |
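The GCBias, InsertSizeMetricAlgo, MeanQualityByCycle and QualDistribution sections share one pattern: Picard writes the chart in the same run, whereas Sentieon® writes the metrics file from sentieon driver and the PDF from a separate sentieon plot call. Because one driver invocation can carry several --algo stages (as the TNhaplotyper2 command earlier in this document does), the metrics can also be collected together. The hypothetical helper below only prints the commands; the metrics_commands name and the file-name scheme are assumptions, and file names are assumed to contain no spaces.

```sh
# Hypothetical dry-run helper: one driver run computing four metrics,
# followed by one `sentieon plot` call per metrics file.
metrics_commands() {
  ref=$1 bam=$2 prefix=$3
  threads=${NUMBER_THREADS:-$(nproc)}
  echo "sentieon driver -t $threads -r $ref -i $bam" \
    "--algo GCBias --summary $prefix.gc_summary.txt $prefix.gc_metrics.txt" \
    "--algo InsertSizeMetricAlgo $prefix.is_metrics.txt" \
    "--algo MeanQualityByCycle $prefix.mq_metrics.txt" \
    "--algo QualDistribution $prefix.qd_metrics.txt"
  # Each entry is ALGO_NAME:file_tag; the plot reads the matching metrics file.
  for algo in GCBias:gc InsertSizeMetricAlgo:is MeanQualityByCycle:mq QualDistribution:qd; do
    name=${algo%%:*} tag=${algo#*:}
    echo "sentieon plot $name -o $prefix.${tag}_metrics.pdf $prefix.${tag}_metrics.txt"
  done
}

# Usage: review the output, then run it, e.g.
#   metrics_commands REFERENCE.FASTA DEDUPED.BAM sample1 | sh
```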
CollectQualityYieldMetrics - QualityYield¶
Picard CollectQualityYieldMetrics command line
java -jar picard.jar CollectQualityYieldMetrics \
I=DEDUPED.BAM O=YIELD_METRICS.TXT
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i DEDUPED.BAM --algo QualityYield \
YIELD_METRICS.TXT
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
O=YIELD_METRICS.TXT |
N/A |
Quality yield metrics results |
INCLUDE_SECONDARY_ALIGNMENTS=true |
--include_secondary |
Include secondary alignments in the calculation |
INCLUDE_SUPPLEMENTAL_ALIGNMENTS=true |
--include_supplementary |
Include supplementary alignments in the calculation |
CollectSequencingArtifactMetrics - SequenceArtifactMetricsAlgo¶
Picard CollectSequencingArtifactMetrics command line
java -jar picard.jar CollectSequencingArtifactMetrics \
I=DEDUPED.BAM O=ARTIFACT_METRICS_BASE R=REFERENCE.FASTA \
DB_SNP=DBSNP.VCF
java -jar picard.jar ConvertSequencingArtifactToOxoG \
INPUT_BASE=ARTIFACT_METRICS_BASE R=REFERENCE.FASTA \
OUTPUT_BASE=oxog_metrics
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i DEDUPED.BAM --algo SequenceArtifactMetricsAlgo \
--dbsnp DBSNP.VCF ARTIFACT_METRICS_BASE
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
O=ARTIFACT_METRICS_BASE |
N/A |
Artifact metrics output base |
R=REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference file |
DB_SNP=DBSNP.VCF |
--dbsnp DBSNP.VCF |
A dbSNP file to exclude known polymorphisms |
Q=MIN_BQ |
--min_base_qual MIN_BQ |
Minimum base quality for a base to be included |
MQ=MIN_MAPQ |
--min_map_qual MIN_MAPQ |
Minimum mapping quality for a read to be included |
MIN_INS=MIN_INSERT |
--min_insert_size MIN_INSERT |
Minimum insert size to include a read |
MAX_INS=MAX_INSERT |
--max_insert_size MAX_INSERT |
Maximum insert size to include a read |
UNPAIRED=true |
--include_unpaired |
Include unpaired reads |
TANDEM=true |
--tandem_reads |
Include tandem reads |
INCLUDE_DUPLICATES=true |
--include_duplicates |
Include duplicate reads |
INCLUDE_NON_PF_READS=true |
--include_non_pf_reads |
Include non-PF reads |
CONTEXT_SIZE=CONTEXT |
--context_size CONTEXT |
The number of context bases to include on each side |
CollectWgsMetrics - WgsMetricsAlgo¶
Picard CollectWgsMetrics command line
java -jar picard.jar CollectWgsMetrics \
I=DEDUPED.BAM O=WGS_METRICS.TXT R=REFERENCE.FASTA
Sentieon® command line
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i DEDUPED.BAM --algo WgsMetricsAlgo \
WGS_METRICS.TXT
Picard option |
Sentieon option |
Meaning |
---|---|---|
I=DEDUPED.BAM |
-i DEDUPED.BAM |
Input the bam files |
O=WGS_METRICS.TXT |
N/A |
WGS metrics results |
R=REFERENCE.FASTA |
-r REFERENCE.FASTA |
Reference file |
MQ=MIN_MAPQ |
--min_map_qual MIN_MAPQ |
Minimum mapping quality for a read to be included |
Q=MIN_BQ |
--min_base_qual MIN_BQ |
Minimum base quality for a base to be included |
CAP=COVERAGE_CAP |
--coverage_cap COVERAGE_CAP |
Maximum coverage limit for the histogram |
COUNT_UNPAIRED=true |
--include_unpaired true |
Count unpaired reads and paired reads with one end unmapped |
INCLUDE_BQ_HISTOGRAM=true |
--base_qual_histogram true |
Report a base quality histogram |
SAMPLE_SIZE=SAMPLE_SIZE |
--sample_size SAMPLE_SIZE |
Sample size used for theoretical het sensitivity sampling |
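Picard passes options as KEY=VALUE pairs rather than flags. A hypothetical helper (picard_wgs_to_sentieon) that converts the CollectWgsMetrics arguments in the table above into WgsMetricsAlgo options might look like this. Besides the short names in the table it accepts the long Picard names MINIMUM_MAPPING_QUALITY, MINIMUM_BASE_QUALITY and COVERAGE_CAP, skips I, O and R because they correspond to -i, the positional output and -r, and rejects anything else instead of guessing.

```sh
# Hypothetical helper: translate Picard CollectWgsMetrics KEY=VALUE arguments
# into Sentieon WgsMetricsAlgo options, following the table above.
picard_wgs_to_sentieon() {
  args=""
  for kv in "$@"; do
    key=${kv%%=*} val=${kv#*=}
    case $key in
      MQ|MINIMUM_MAPPING_QUALITY) args="$args --min_map_qual $val" ;;
      Q|MINIMUM_BASE_QUALITY) args="$args --min_base_qual $val" ;;
      CAP|COVERAGE_CAP) args="$args --coverage_cap $val" ;;
      COUNT_UNPAIRED) args="$args --include_unpaired $val" ;;
      INCLUDE_BQ_HISTOGRAM) args="$args --base_qual_histogram $val" ;;
      SAMPLE_SIZE) args="$args --sample_size $val" ;;
      # Input, output and reference map to -i, the positional output and -r.
      I|INPUT|O|OUTPUT|R|REFERENCE_SEQUENCE) ;;
      *) echo "unmapped Picard option: $key" >&2; return 1 ;;
    esac
  done
  echo "${args# }"
}
```

For example, picard_wgs_to_sentieon MQ=20 Q=10 CAP=500 prints --min_map_qual 20 --min_base_qual 10 --coverage_cap 500, to be placed after --algo WgsMetricsAlgo.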
Other differences in usage¶
Sentieon® refers to tools as algorithms, so the GATK3 option -T corresponds to the Sentieon® option --algo.
Sentieon® writes its logs directly to stdout and stderr, so the -log option is not available.
By default, Sentieon® uses as many threads as the system has available, while GATK uses 1 thread. Omitting the -nt option in GATK is therefore not the same as omitting the -t option in Sentieon®.
Sentieon® does not perform any downsampling, so the following options are not available: --downsample_to_coverage, --downsample_to_fraction, --downsampling_type, …
Other general-level arguments currently supported by Sentieon® are:
--bam_compression: for algorithms that output a BAM file
--cram_write_options: for algorithms that output a CRAM file