Description of output files and fields¶
Introduction¶
This document describes the output files of Sentieon® TNsnv, TNhaplotyper, TNhaplotyper2 and TNscope® algorithms and the meaning of the fields in those files. You can use the information in this document to better understand the files produced by Sentieon® tumor-normal variant calling software.
TNsnv¶
Introduction¶
An example command with TNsnv is as follows:
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i NORMAL_RECALED.BAM -i TUMOR_RECALED.BAM \
--interval INTERVAL \
--algo TNsnv --dbsnp DBSNP.VCF \
--tumor_sample TUMOR_SM --normal_sample NORMAL_SM \
--call_stats_out CALL_STATS_OUTPUT.TXT \
--stdcov_out STD_COVERAGE.TXT \
--q20cov_out Q20_COVERAGE.TXT \
--power_out POWER.TXT --tumor_depth_out TUMOR_DP.TXT \
--normal_depth_out NORMAL_DP.TXT OUTPUT.VCF
This command line produces the following required output files:
OUTPUT.VCF
In addition, the following optional output files are produced:
CALL_STATS_OUTPUT.TXT
STD_COVERAGE.TXT
Q20_COVERAGE.TXT
POWER.TXT
TUMOR_DP.TXT
NORMAL_DP.TXT
The OUTPUT.VCF of TNsnv contains only limited output information. Users who want a more detailed output format should examine the CALL_STATS_OUTPUT.TXT file.
OUTPUT.VCF¶
The OUTPUT.VCF file conforms to the VCF 4.2 specification. More information on the VCF format can be found at https://samtools.github.io/hts-specs/VCFv4.2.pdf. The INFO field annotations are described in detail below.
| INFO annotation | Description |
|---|---|
| DB | The variant is present in the VCF file supplied with the --dbsnp option |
| MQ0 | Total number of reads with Mapping Quality equal to 0 |
| SOMATIC | The variant occurs uniquely in the sample supplied with the --tumor_sample option |
| VT | Variant type, can be SNP, INS or DEL |
TNsnv also populates the FILTER field of the output VCF file. Variants are filtered using TNsnv internal quality filters. More information on the applied filters can be found in the failure_reasons row of the table in the CALL_STATS_OUTPUT.TXT section below.
| FILTER | Description |
|---|---|
| PASS | The variant passes TNsnv internal quality filters |
| REJECT | The variant fails TNsnv internal quality filters |
Standard genotype fields are defined by the format specification. However, TNsnv also outputs the following non-standard fields.
| GENOTYPE field | Description |
|---|---|
| BQ | Average base quality of bases supporting the alternate alleles |
| FA | Fraction of reads supporting the alternate allele |
| SS | Status of the variant. Not currently implemented, always set to 2 |
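The INFO, FILTER and genotype fields described above can be pulled out of a VCF data line with a few lines of Python. The following is a minimal sketch, not part of the Sentieon® software: the helper name `parse_vcf_record` and the example record are made up for illustration, and the sample names must be taken from the `#CHROM` header line of the real file.

```python
def parse_vcf_record(line, sample_names):
    """Split one tab-separated VCF data line into its main parts."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, filt, info_str, fmt = cols[:9]

    # INFO: ';'-separated entries; flags such as DB or SOMATIC carry no value.
    info = {}
    for entry in info_str.split(";"):
        if entry and entry != ".":
            key, sep, value = entry.partition("=")
            info[key] = value if sep else True

    # Genotype fields: FORMAT keys are ':'-separated and align with each sample column.
    keys = fmt.split(":")
    samples = {
        name: dict(zip(keys, col.split(":")))
        for name, col in zip(sample_names, cols[9:])
    }
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "filter": filt, "info": info, "samples": samples}


# Made-up record for illustration only; the values are not real caller output.
example = "\t".join([
    "chr1", "12345", ".", "C", "T", ".", "PASS",
    "DB;MQ0=0;SOMATIC;VT=SNP",
    "GT:BQ:FA", "0/1:31:0.25", "0:.:0.00",
])
record = parse_vcf_record(example, ["TUMOR_SM", "NORMAL_SM"])
print(record["filter"], record["info"]["VT"], record["samples"]["TUMOR_SM"]["FA"])
```

All values are kept as strings so that the caller decides how to convert them; flags are stored as `True`.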
CALL_STATS_OUTPUT.TXT¶
The CALL_STATS_OUTPUT.TXT file is a tab-separated text file with the following columns for each candidate variant. The core statistic of the software is t_lod_fstar which is a measurement of the support for the mutation relative to the expected level of sequencing noise at the candidate site.
| Column | Description |
|---|---|
| Contig | The contig (chromosome) with the candidate |
| Position | The genomic coordinate of the candidate along the contig |
| Context | The sequence 3bp to either side of the candidate |
| Ref_allele | The reference allele at the candidate site |
| Alt_allele | The alternate allele at the candidate site |
| Tumor_name | The name of the tumor sample with the candidate mutation |
| Normal_name | The name of the paired normal sample |
| Score | Variant score. Not currently implemented, always set to 0.0 |
| Dbsnp_site | The variant is present in the VCF file supplied with the --dbsnp option (DBSNP) or is novel (NOVEL) |
| Covered | The site has sufficient read coverage to detect a variant with a 0.3 allele fraction at 80% power |
| Power | The product of tumor power and normal power, described below. |
| Tumor_power | The power to detect a mutation at a 0.3 allele fraction at the observed sequencing depth in the tumor sample |
| Normal_power | The power to detect a germline mutation at this site taking into account the presence of the site in dbSNP at the observed sequencing depth in the normal sample |
| Normal_power_nsp | The power to detect a germline mutation in the normal sample given that the mutation is not in dbSNP |
| Normal_power_wsp | The power to detect a germline mutation in the normal sample given that the mutation is in dbSNP |
| Total_reads | Total number of reads in both the tumor and normal samples at this site |
| Map_Q0_reads | Total number of reads in both the tumor and normal samples with mapping quality 0 at this site |
| Init_t_lod | Log odds of the likelihood that the candidate mutation is real over the likelihood that the candidate mutation is a sequencing error before any read-based filters are applied |
| t_lod_fstar | Log odds of the likelihood that the candidate mutation is real over the likelihood that the candidate mutation is a sequencing error |
| t_lod_fstar_forward | t_lod_fstar calculated using only reads on the forward strand |
| t_lod_fstar_reverse | t_lod_fstar calculated using only reads on the reverse strand |
| tumor_f | Estimated allele fraction of the candidate mutation in the tumor sample |
| Contaminant_fraction | Estimate of contamination of normal cells in the tumor sample |
| Contaminant_lod | Log odds of the likelihood that the candidate is contamination over the likelihood that the candidate is a sequencing error |
| t_q20_count | Count of the number of reads in the tumor sample with a base quality of at least 20 |
| t_ref_count | Number of reads supporting the reference allele in the tumor sample |
| t_alt_count | Number of reads supporting the alternate allele in the tumor sample |
| t_ref_sum | Sum of the quality scores of the bases supporting the reference allele in the tumor sample |
| t_alt_sum | Sum of the quality scores of the bases supporting the alternate allele in the tumor sample |
| t_ref_max_mapq | The maximum mapping quality of tumor reads supporting the reference allele |
| t_alt_max_mapq | The maximum mapping quality of tumor reads supporting the alternate allele |
| t_ins_count | The number of reads in the tumor sample that have an insertion in the surrounding five bases |
| t_del_count | The number of reads in the tumor sample that have a deletion in the surrounding five bases |
| Normal_best_gt | The most likely genotype of the normal sample |
| Init_n_lod | Log odds of the likelihood that the normal sample is reference over the normal sample having the variant before any read-based filters are applied |
| normal_f | Estimated allele fraction of the candidate mutation in the normal sample |
| n_q20_count | Count of the number of reads in the normal sample with a base quality of at least 20 |
| n_ref_count | Number of reads supporting the reference allele in the normal sample |
| n_alt_count | Number of reads supporting the alternate allele in the normal sample |
| n_ref_sum | Sum of the quality scores of the bases supporting the reference allele in the normal sample |
| n_alt_sum | Sum of the quality scores of the bases supporting the alternate allele in the normal sample |
| power_to_detect_positive_strand_artifact | The power to detect strand bias to the positive strand at the given sequencing depth |
| power_to_detect_negative_strand_artifact | The power to detect strand bias to the negative strand at the given sequencing depth |
| strand_bias_counts | A vector of counts for the tumor sample in the order of (tumor_ref_pos, tumor_ref_neg, tumor_alt_pos, tumor_alt_neg) where ref and alt specify the reference and alternate alleles and pos and neg specify the positive and negative strands. The numbers do not match those in earlier columns due to differential filtering |
| tumor_alt_fpir_median | Median position along forward strand reads for bases supporting the alternate allele in the tumor sample |
| tumor_alt_fpir_mad | Median absolute deviation of the positions along forward strand reads for bases supporting the alternate allele in the tumor sample |
| tumor_alt_rpir_median | Median position along reverse strand reads for bases supporting the alternate allele in the tumor sample |
| tumor_alt_rpir_mad | Median absolute deviation of the positions along reverse strand reads for bases supporting the alternate allele in the tumor sample |
| observed_in_normals_count | The number of reads supporting the candidate mutation in the normal sample |
| failure_reasons | Reasons for rejecting the candidate somatic mutation. Possibilities include: (1) alt_allele_in_normal - The alternate allele has significant support in the normal sample. (2) clustered_read_position - The alternate allele is not distributed evenly over the length of the read. (3) fstar_tumor_lod - The candidate does not have significant support above noise. (4) germline_risk - There is evidence for the mutation in the normal sample at a dbSNP site. (5) nearby_gap_events - Insertion and deletion events were identified at the locus. (6) normal_lod - There is evidence for the mutation in the normal sample. (7) poor_mapping_region_alternate_allele_mapq - Low mapping quality for the alternate allele. (8) poor_mapping_region_mapq0 - Too many reads with a mapping quality of 0 at the locus. (9) possible_contamination - Possible contamination of the normal sample with tumor. (10) strand_artifact - The mutation is likely a strand bias artifact. (11) triallelic_site - The site is not biallelic. |
| judgement | The candidate is a true somatic variant (KEEP) or the candidate is not a likely somatic variant (REJECT). |
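Because CALL_STATS_OUTPUT.TXT is tab-separated, it can be read with Python's standard `csv` module. The sketch below makes several assumptions that should be checked against a real file: that there is a header row whose names are spelled as in the table above, that any comment lines start with `#`, and that multiple entries in failure_reasons are comma-separated. The sample text is made up and contains only a subset of the columns.

```python
import csv
import io


def read_call_stats(handle):
    """Yield one dict per candidate, skipping '#' comment lines (assumed layout)."""
    rows = (line for line in handle if not line.startswith("#"))
    yield from csv.DictReader(rows, delimiter="\t")


def kept_candidates(rows):
    """Keep only rows whose judgement column is KEEP."""
    return [row for row in rows if row["judgement"] == "KEEP"]


# Tiny made-up table with a subset of columns; values are illustrative only.
sample_text = (
    "## hypothetical comment line\n"
    "Contig\tPosition\tTumor_power\tNormal_power\tPower\tfailure_reasons\tjudgement\n"
    "chr1\t100\t0.9\t0.8\t0.72\t\tKEEP\n"
    "chr1\t200\t0.5\t0.9\t0.45\tfstar_tumor_lod,normal_lod\tREJECT\n"
)

rows = list(read_call_stats(io.StringIO(sample_text)))
for row in rows:
    # Power is described as the product of tumor power and normal power.
    product = float(row["Tumor_power"]) * float(row["Normal_power"])
    # Assumed separator for multiple failure reasons: ','.
    reasons = [r for r in row["failure_reasons"].split(",") if r]
    print(row["Contig"], row["Position"], round(product, 4), reasons, row["judgement"])

kept = kept_candidates(rows)
```

For a real file, replace `io.StringIO(sample_text)` with an open file handle, for example `open("CALL_STATS_OUTPUT.TXT", newline="")`.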
STD_COVERAGE.TXT¶
A WIGGLE format file describing whether there is sufficient coverage to detect somatic variants at a 0.3 allele fraction in the tumor with 80% power. 1 indicates that the coverage at the locus passes this threshold, 0 otherwise.
Q20_COVERAGE.TXT¶
A WIGGLE format file describing whether there is sufficient coverage to detect somatic variants at a 0.3 allele fraction in the tumor with 80% power examining only bases with a quality of greater than 20. 1 indicates that the coverage at the locus passes this threshold, 0 otherwise.
POWER.TXT¶
A WIGGLE format file describing the power to detect a somatic variant at the observed coverage in the tumor and normal samples.
TUMOR_DP.TXT¶
A WIGGLE format file describing the observed sequence read depth in the tumor sample.
NORMAL_DP.TXT¶
A WIGGLE format file describing the observed sequence read depth in the normal sample.
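The WIGGLE files above can be summarized with a short reader. The following sketch assumes the files use the `fixedStep` flavor of the wiggle format (a declaration line such as `fixedStep chrom=chr1 start=100 step=1` followed by one value per line); that layout is an assumption here and should be confirmed on a real output file, since wiggle also allows `variableStep` blocks, which this sketch does not handle. The example content is made up.

```python
def read_fixed_step_wiggle(lines):
    """Yield (chrom, position, value) tuples from fixedStep wiggle text."""
    chrom, pos, step = None, None, 1
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue  # skip track definitions, browser lines and comments
        if line.startswith("fixedStep"):
            # Declaration line, e.g. "fixedStep chrom=chr1 start=100 step=1"
            params = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = params["chrom"]
            pos = int(params["start"])
            step = int(params.get("step", 1))
            continue
        if chrom is None:
            raise ValueError("data line found before any fixedStep declaration")
        yield chrom, pos, float(line)
        pos += step


# Made-up coverage-style content: 1 = threshold passed, 0 = not passed.
example = """track type=wiggle_0 name=hypothetical
fixedStep chrom=chr1 start=100 step=1
1
1
0
1
""".splitlines()

records = list(read_fixed_step_wiggle(example))
covered = sum(1 for _, _, value in records if value == 1)
print(f"{covered}/{len(records)} positions pass the coverage threshold")
```

The same reader works for the power and depth files, where the values are real numbers or read depths rather than 0/1 indicators.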
TNhaplotyper¶
Introduction¶
An example command with TNhaplotyper is as follows:
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i NORMAL_RECALED.BAM -i TUMOR_RECALED.BAM \
--interval INTERVAL \
--algo TNhaplotyper --dbsnp DBSNP.VCF \
--tumor_sample TUMOR_SM --normal_sample NORMAL_SM \
OUTPUT.VCF
This command line produces the following required output files:
OUTPUT.VCF
OUTPUT.VCF¶
The OUTPUT.VCF file conforms to the VCF 4.2 specification. More information on the VCF format can be found at https://samtools.github.io/hts-specs/VCFv4.2.pdf. The INFO field annotations are described in detail below.
The core statistics of the software are TLOD, which is a measure of the support for the mutation relative to the expected level of sequencing noise at the candidate site, and NLOD, which is a measure of the odds that the mutation is absent from the normal sample.
| INFO annotation | Description |
|---|---|
| DB | The variant is present in the VCF file supplied with the --dbsnp option |
| ECNT | Number of candidate variants in the active region, typically the number of candidate variants in the +/- 50 to 300 bp region |
| HCNT | Number of haplotypes observed in the active region after assembly of the sequence reads |
| MAX_ED | Maximum edit distance between the observed haplotypes in the active region |
| MIN_ED | Minimum edit distance between the observed haplotypes in the active region |
| NLOD | Log odds that the variant is not present in the normal sample (confidence that the variant is not a germline variant) |
| PON | Number of times the variant is observed in the panel of normal samples |
| RPA | The number of times the repeat is present for each allele for an indel within a short tandem repeat |
| RU | The sequence of the repeated nucleotides for an indel within a short tandem repeat |
| STR | The variant is an expansion or contraction of a short tandem repeat |
| TLOD | Log odds that the variant is present in the tumor sample relative to the expected noise |
TNhaplotyper also populates the FILTER field for the variants.
| FILTER | Description |
|---|---|
| PASS | The variant is confidently a somatic mutation |
| alt_allele_in_normal | The alternate allele is present in the paired normal sample and is unlikely to be a somatic variant |
| clustered_events | Multiple events are present on the same haplotype as the variant which is indicative of a false-positive call |
| germline_risk | There is evidence that the variant is present in the normal sample given that the variant is present in the supplied dbSNP VCF and not present in the supplied COSMIC VCF |
| homologous_mapping_event | More than three events are present at this locus in the tumor which is indicative of a false-positive call |
| low_t_alt_frac | The variant is filtered due to a low alternate allele fraction in the tumor sample |
| multi_event_alt_allele_in_normal | Multiple events are present in the tumor sample and the alternate allele appears in the normal sample |
| panel_of_normals | The mutation is present in at least two samples in the panel of normals. |
| str_contraction | The mutation is a contraction of a short tandem repeat |
| t_lod_fstar | The mutation does not have significant support above noise |
| triallelic_site | The mutation occurs at a triallelic site |
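A quick way to see which filters dominate a call set is to tally the FILTER column. The sketch below relies only on the VCF convention that FILTER is the seventh tab-separated column and that multiple filter names are joined with `;`. The function name `tally_filters` and the example lines are made up for illustration.

```python
from collections import Counter


def tally_filters(vcf_lines):
    """Count how often each FILTER value appears across VCF data lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        filter_field = line.rstrip("\n").split("\t")[6]
        # A record that fails several filters lists them separated by ';'.
        counts.update(filter_field.split(";"))
    return counts


# Made-up records for illustration only.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\tTLOD=12.5",
    "chr1\t200\t.\tC\tT\t.\tt_lod_fstar\tTLOD=3.1",
    "chr1\t300\t.\tG\tA\t.\talt_allele_in_normal;germline_risk\tTLOD=8.0",
]
counts = tally_filters(example_lines)
for name, n in counts.most_common():
    print(name, n)
```

Passing an open file handle instead of a list works the same way, because the function only iterates over lines.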
Standard genotype fields are defined by the format specification. However, TNhaplotyper also outputs the following non-standard fields.
| GENOTYPE | Description |
|---|---|
| AF | Fraction of reads supporting the alternate allele |
| ALT_F1R2 | The number of reads in the F1R2 orientation supporting the alternate allele |
| ALT_F2R1 | The number of reads in the F2R1 orientation supporting the alternate allele |
| FOXOG | The fraction of alt reads indicating OxoG error. OxoG error is induced by DNA oxidation during library preparation and is a frequent source of false-positive calls. See PMID: 23303777. |
| PGT | Physical phasing haplotype information describing how the alternate alleles are phased in relation to one another |
| PID | Physical phasing ID information, connecting records within a phasing group by using unique IDs within a given sample, but not across samples |
| QSS | Sum of base quality scores for each allele |
| REF_F1R2 | The number of reads in the F1R2 orientation supporting the reference allele |
| REF_F2R1 | The number of reads in the F2R1 orientation supporting the reference allele |
TNhaplotyper2¶
Introduction¶
An example command with TNhaplotyper2 is as follows:
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i NORMAL_RECALED.BAM -i TUMOR_RECALED.BAM \
--algo TNhaplotyper2 --tumor_sample TUMOR_SM \
--normal_sample NORMAL_SM \
TMP.VCF \
--algo OrientationBias --tumor_sample TUMOR_SM \
ORIENTATION_DATA \
--algo ContaminationModel --tumor_sample TUMOR_SM \
--normal_sample NORMAL_SM \
--vcf GERMLINE_RESOURCE \
--tumor_segments CONTAMINATION_DATA.segments \
CONTAMINATION_DATA
sentieon driver -r REFERENCE.FASTA \
--algo TNfilter --tumor_sample TUMOR_SM \
--normal_sample NORMAL_SM \
-v TMP.VCF \
--contamination CONTAMINATION_DATA \
--tumor_segments CONTAMINATION_DATA.segments \
--orientation_priors ORIENTATION_DATA \
OUTPUT.VCF
These command lines produce the following required output files:
OUTPUT.VCF
OUTPUT.VCF¶
The OUTPUT.VCF file conforms to the VCF 4.2 specification. More information on the VCF format can be found at https://samtools.github.io/hts-specs/VCFv4.2.pdf. The INFO field annotations are described in detail below.
The core statistics of the software are TLOD, which is a measure of the support for the mutation relative to the expected level of sequencing noise at the candidate site, and NLOD, which is a measure of the odds that the mutation is absent from the normal sample.
| INFO annotation | Description |
|---|---|
| AS_FilterStatus | The filter status of each allele, with alleles separated by the pipe character |
| AS_SB_TABLE | Forward and reverse read counts for each allele, with alleles separated by the pipe character |
| AS_UNIQ_ALT_READ_COUNT | The number of reads with unique start and mate-end positions for each alternate allele |
| CONTQ | Phred-scaled probability that the variant alleles are not due to contamination |
| DP | Approximate read depth |
| ECNT | Number of candidate variants in the active region, typically the number of candidate variants in the +/- 50 to 300 bp region |
| GERMQ | The phred-scaled posterior probability that the alternate allele(s) are not germline variants |
| MBQ | Median base quality of each allele |
| MFRL | Median fragment length of each allele |
| MMQ | Median mapping quality of each allele |
| MPOS | Median distance from the end of the read for each alternate allele |
| NALOD | Negative log 10 odds of the variant being an artifact in the normal sample with the same allele fraction as the tumor sample for each alternate allele |
| NCount | Count of N-bases in the read pileup |
| NLOD | Log odds that the variant is not present in the normal sample (confidence that the variant is not a germline variant) for each alternate allele |
| OCM | Number of reads supporting the alternate allele whose original alignment does not match the current contig |
| PON | The variant is found in the panel of normal samples |
| POPAF | Population allele frequency of the alternate alleles |
| ROQ | Phred-scaled probability that the variant alleles are not due to a read orientation artifact |
| RPA | The number of times the repeat is present for each allele for an indel within a short tandem repeat |
| RU | The sequence of the repeated nucleotides for an indel within a short tandem repeat |
| SEQQ | Phred-scaled probability that the variant alleles are not due to sequencing error |
| STR | The variant is an expansion or contraction of a short tandem repeat |
| STRANDQ | Phred-scaled probability of a strand-bias artifact |
| STRQ | Phred-scaled probability that the alternate alleles are errors due to polymerase slippage |
| TLOD | Log odds that the variant is present in the tumor sample relative to the expected noise |
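Several of the annotations above are Phred-scaled. Under the usual Phred convention a score Q corresponds to a probability of 10^(-Q/10), so Q = 10, 20 and 30 correspond to 0.1, 0.01 and 0.001. The sketch below shows that conversion. It assumes each listed annotation holds a single numeric value, and it does not decide which event the resulting probability describes; that should be read from the annotation's description and the VCF header. The helper names and the example dictionary are made up for illustration.

```python
import math

PHRED_KEYS = ("CONTQ", "GERMQ", "ROQ", "SEQQ", "STRANDQ", "STRQ")


def phred_to_probability(q):
    """Convert a Phred-scaled score Q to the probability 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def probability_to_phred(p):
    """Convert a probability in (0, 1] to its Phred-scaled score."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


def phred_annotations(info, keys=PHRED_KEYS):
    """Convert whichever Phred-scaled INFO values are present to probabilities."""
    return {k: phred_to_probability(float(info[k])) for k in keys if k in info}


# Made-up INFO values for illustration; DP is not Phred-scaled and is ignored.
info = {"GERMQ": "30", "SEQQ": "20", "DP": "85"}
probabilities = phred_annotations(info)
print(probabilities)
```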
TNfilter also populates the FILTER field for the variants.
| FILTER | Description |
|---|---|
| PASS | The site contains at least one allele that passes all filters |
| FAIL | All variant alleles are filtered, but for different reasons |
| base_qual | The median base quality of bases supporting the alternate allele is too low |
| clustered_events | Multiple events are present on the same haplotype as the variant which is indicative of a false-positive call |
| contamination | The alternate allele is present due to contamination |
| duplicate | The alternate allele is overrepresented by apparent sequencing duplicates |
| fragment | A large difference is observed in the median fragment length for reads supporting the reference and alternate alleles |
| germline | There is evidence that the variant is germline |
| haplotype | Variant is on the same haplotype as other filtered variants |
| low_allele_frac | The variant allele fraction is below the threshold |
| map_qual | The median mapping quality of reads supporting the alternate allele is too low |
| multiallelic | The mutation occurs at a multiallelic site |
| n_ratio | Too many 'N' bases at the site |
| normal_artifact | The variant is likely an artifact in the normal sample |
| orientation | The variant is likely an artifact due to orientation bias |
| panel_of_normals | The site is present in the panel of normals |
| position | The allele is close to the ends of the reads |
| slippage | The variant is likely an artifact due to polymerase slippage |
| strand_bias | Evidence for the alternate allele comes from only one read direction |
| strict_strand | Evidence for the alternate allele is not significant in both directions |
| weak_evidence | The mutation does not have significant support above noise |
Standard genotype fields are defined by the format specification. However, TNhaplotyper2 also outputs the following non-standard fields.
| GENOTYPE | Description |
|---|---|
| AF | Fraction of reads supporting the alternate allele |
| AD | Allelic depths for the reference and alternate alleles |
| DP | Approximate read depth |
| F1R2 | The number of reads in the F1R2 orientation supporting each allele |
| F2R1 | The number of reads in the F2R1 orientation supporting each allele |
| PGT | Physical phasing haplotype information describing how the alternate alleles are phased in relation to one another |
| PID | Physical phasing ID information, connecting records within a phasing group by using unique IDs within a given sample, but not across samples |
| PS | Phasing set; typically the position of the first variant in the set |
| SB | The forward and reverse read counts for the reference and alternate alleles |
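The per-sample AD and SB values are comma-separated integer lists and can be unpacked as sketched below. Two assumptions are made here and labeled in the code: that AD is ordered reference first and then each alternate allele, and that SB is ordered reference-forward, reference-reverse, alternate-forward, alternate-reverse. The "naive" fraction computed from AD is only a read-count ratio and may differ from the AF value reported by the caller. The function name and the example values are made up.

```python
def parse_int_list(value):
    """Turn a comma-separated string such as '30,10' into a list of ints."""
    return [int(x) for x in value.split(",")]


def summarize_sample(format_keys, sample_column):
    """Unpack AD and SB from one sample column of a VCF record."""
    fields = dict(zip(format_keys.split(":"), sample_column.split(":")))
    ad = parse_int_list(fields["AD"])  # assumed order: ref, alt1, alt2, ...
    sb = parse_int_list(fields["SB"])  # assumed order: ref_fwd, ref_rev, alt_fwd, alt_rev
    total = sum(ad)
    return {
        "ref_depth": ad[0],
        "alt_depths": ad[1:],
        # Plain read-count ratio; not necessarily equal to the reported AF.
        "naive_alt_fraction": [a / total for a in ad[1:]] if total else [],
        "alt_forward": sb[2],
        "alt_reverse": sb[3],
    }


# Made-up FORMAT and sample columns for illustration only.
summary = summarize_sample("GT:AD:AF:DP:SB", "0/1:30,10:0.26:40:16,14,6,4")
print(summary)
```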
TNscope®¶
Introduction¶
An example command with TNscope® is as follows:
sentieon driver -t NUMBER_THREADS -r REFERENCE.FASTA \
-i NORMAL_RECALED.BAM -i TUMOR_RECALED.BAM \
--interval INTERVAL \
--algo TNscope --tumor_sample TUMOR_SM \
--normal_sample NORMAL_SM --dbsnp DBSNP.VCF OUTPUT.VCF
This command line produces the following required output files:
OUTPUT.VCF
OUTPUT.VCF¶
The OUTPUT.VCF file conforms to the VCF 4.2 specification. More information on the VCF format can be found at https://samtools.github.io/hts-specs/VCFv4.2.pdf. The INFO field annotations are described in detail below.
The core statistics of the software are TLOD, which is a measure of the support for the mutation relative to the expected level of sequencing noise at the candidate site, and NLOD, which is a measure of the odds that the mutation is absent from the normal sample.
| INFO annotation | Description |
|---|---|
| CIEND | The confidence interval around the END position for imprecise structural variants |
| CIPOS | Confidence interval around POS for imprecise structural variants |
| DB | The variant is present in the VCF file supplied with the --dbsnp option |
| DPR | Average depth in the region surrounding the variant (+/-1bp) |
| ECNT | Number of candidate variants in the active region, typically the number of candidate variants in the +/- 50 to 300 bp region |
| END | The end position of the structural variant |
| FS | Phred-scaled p-value using Fisher's exact test to detect strand bias |
| HCNT | The number of haplotypes observed in the active region after assembly of the sequence reads |
| IMPRECISE | The breakpoints of the structural variant are not precisely known |
| MATEID | Breakend mate |
| MAX_ED | Maximum edit distance between the observed haplotypes in the active region |
| MIN_ED | Minimum edit distance between the observed haplotypes in the active region |
| NLOD | Log odds that the variant is not present in the normal sample (confidence that the variant is not a germline variant) |
| NLODF | Log odds that the variant is not present in the normal sample (confidence that the variant is not a germline variant) given the allele fraction in the tumor sample |
| PON | Number of times the variant is observed in the panel of normal samples |
| PV | The p-value from a Fisher's exact test of the number of reads supporting the reference and alternate alleles in the tumor and normal samples |
| PV2 | The p-value from a Fisher's exact test of the number of reads supporting the reference and alternate alleles in the tumor and normal samples using only high-confidence reads |
| RPA | The number of times the repeat is present for each allele for an indel within a short tandem repeat |
| RU | The sequence of the repeated nucleotides for an indel within a short tandem repeat |
| SOMATIC | The variant occurs uniquely in the sample supplied with the --tumor_sample option |
| SOR | Symmetric Odds Ratio of 2x2 contingency table to detect strand bias |
| STR | The variant is an expansion or contraction of a short tandem repeat |
| SVLEN | The difference in length between REF and ALT alleles of structural variants |
| SVTYPE | The type of structural variant |
| TLOD | Log odds that the variant is present in the tumor sample relative to expected noise |
| VAF | The variant allele frequency. The fraction of reads supporting the alternate allele in the tumor sample. |
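To make the PV and FS descriptions more concrete, the sketch below implements a generic two-sided Fisher's exact test on a 2x2 table of reference and alternate read counts in the tumor and normal samples, followed by Phred scaling of the p-value. This is a textbook illustration using only the Python standard library. It is not claimed to reproduce the values TNscope® reports, because the exact read counts used and whether the test is one- or two-sided are not stated here. The counts are made up.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def phred_scale(p):
    """Phred-scale a p-value: -10 * log10(p)."""
    return -10 * log10(p)


# Made-up counts: tumor ref/alt = 40/10, normal ref/alt = 50/0.
p_value = fisher_exact_two_sided(40, 10, 50, 0)
print(p_value, phred_scale(p_value))
```

A small p-value here indicates that the alternate allele is unevenly distributed between the two samples, which is the pattern expected for a somatic variant.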
TNscope® also populates the FILTER field for the variants.
| FILTER | Description |
|---|---|
| PASS | The variant is confidently a somatic mutation |
| alt_allele_in_normal | The alternate allele is present in the paired normal sample and is unlikely to be a somatic variant |
| clustered_events | Multiple events are present on the same haplotype as the variant which is indicative of a false-positive call |
| germline_risk | There is evidence that the variant is present in the normal sample given that the variant is present in the supplied dbSNP VCF and not present in the supplied COSMIC VCF |
| homologous_mapping_event | More than three events are present at this locus in the tumor which is indicative of a false-positive call |
| low_t_alt_frac | The variant is filtered due to a low alternate allele fraction in the tumor sample |
| multi_event_alt_allele_in_normal | Multiple events are present in the tumor sample and the alternate allele appears in the normal sample |
| panel_of_normals | The mutation is observed in at least two samples in the panel of normals |
| str_contraction | The mutation is a contraction of a short tandem repeat |
| t_lod_fstar | The mutation does not have significant support above noise |
| triallelic_site | The mutation occurs at a triallelic site |
Standard genotype fields are defined by the format specification. However, TNscope® also outputs the following non-standard fields.
| GENOTYPE field | Description |
|---|---|
| AF | Fraction of reads supporting the alternate allele |
| AFDP | Read depth used to calculate AF |
| AFDPLOWMQ | Read depth used to calculate AF including reads with low mapping quality |
| AFLOWMQ | Allele fraction of the event in the tumor including low mapq reads |
| ALT_F1R2 | The number of reads in the F1R2 orientation supporting the alternate allele |
| ALT_F2R1 | The number of reads in the F2R1 orientation supporting the alternate allele |
| ALTHC | Depth of reads supporting the highest confidence alternate allele |
| ALTHCLOWMQ | Depth of reads supporting the highest confidence alternate allele including reads with low mapping quality |
| BaseQRankSumPS | Z-score from Wilcoxon rank sum test of Alt vs. Ref base qualities per sample |
| ClippingRankSumPS | Z-score from Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases per sample |
| DPHC | Depth of high-confidence reads supporting the reference or alternate allele |
| DPHCLOWMQ | Depth of high-confidence reads supporting the reference or alternate allele including reads with low mapping quality |
| FOXOG | The fraction of alt reads indicating OxoG error. OxoG error is induced by DNA oxidation during library preparation and is a frequent source of false-positive calls. See PMID: 23303777. |
| MQRankSumPS | Z-score from Wilcoxon rank sum test of Alt vs. Ref read mapping qualities per sample |
| NBQPS | Mean Neighboring Base Quality, including 5bp on both sides per sample |
| PGT | Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another |
| PID | Physical phasing ID information, connecting records within a phasing group by using unique IDs within a given sample, but not across samples |
| QSS | Sum of base quality scores for each allele |
| ReadPosEndDistPS | Z-score from Wilcoxon rank sum test of mean distance from either end of read per sample |
| ReadPosRankSumPS | Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias per sample |
| REF_F1R2 | The number of reads in the F1R2 orientation supporting the reference allele |
| REF_F2R1 | The number of reads in the F2R1 orientation supporting the reference allele |