Calculate Variant Allele Fraction (VAF)
Estimate raw VAF, confidence interval, and purity-adjusted cancer cell fraction from sequencing read counts.
Expert Guide: How to Calculate Variant Allele Fraction Correctly
Variant allele fraction, often abbreviated VAF, is one of the most important quantitative values in modern genomics. It represents the proportion of sequencing reads that support a specific variant at a genomic position. Clinicians, molecular pathologists, translational researchers, and bioinformatics teams use VAF to interpret somatic mutations in cancer, track measurable residual disease, evaluate assay sensitivity, and prioritize variants for downstream reporting. Even though the formula is simple, high-quality interpretation requires careful thinking about sequencing depth, statistical uncertainty, tumor purity, copy number state, and technology-specific error rates.
At its most basic level, VAF answers this question: out of all reads that cover this locus, what fraction carries the alternate allele? If 50 out of 500 reads contain the variant, raw VAF is 10%. However, that 10% can reflect very different biological scenarios depending on context. In plasma cell-free DNA, 10% might indicate substantial disease burden. In tissue with low tumor purity, 10% may represent a clonal mutation diluted by normal DNA. In highly amplified loci, the same VAF can correspond to different cellular fractions because copy number changes alter expected allele proportions.
Core Formula for Raw VAF
The direct calculation is:
- VAF = ALT reads / Total reads
- Percent VAF = (ALT / Total) × 100
Here, total reads means all high-quality reads that pass your filtering logic at that position, and ALT reads means reads that support the variant allele after quality controls. Reference reads are simply total minus ALT.
Step-by-Step Process
- Confirm locus-level read filters are applied consistently, including mapping quality and base quality thresholds.
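As a minimal sketch of the formula above, the function below computes raw VAF from two counts. The name `raw_vaf` and its input checks are illustrative assumptions, not part of any particular pipeline.

```python
def raw_vaf(alt_reads: int, total_reads: int) -> float:
    """Return raw VAF as a fraction between 0 and 1 (hypothetical helper)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


# Example from the text: 50 ALT reads out of 500 total reads.
vaf = raw_vaf(50, 500)
print(f"VAF = {vaf:.3f} ({vaf * 100:.1f}%)")  # VAF = 0.100 (10.0%)
```

Both counts are assumed to be taken after quality filtering, so the function itself does no filtering.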
- Count total usable reads at the variant site.
- Count ALT supporting reads for the candidate variant.
- Compute raw VAF from ALT divided by total depth.
- Estimate confidence intervals because sampling noise matters, especially at lower depth.
- Compare observed VAF against assay detection limits and known background error levels.
- If analyzing tumor tissue, adjust interpretation using tumor purity and copy number.
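The first four steps can be sketched as follows. The `ReadObservation` record, the `count_alleles` helper, and the thresholds of 20 for mapping quality and base quality are all hypothetical choices for illustration. A real pipeline would take these values from its aligner output and its validated filter settings.

```python
from dataclasses import dataclass


@dataclass
class ReadObservation:
    """One read's evidence at the locus (hypothetical record layout)."""
    mapq: int       # mapping quality of the read
    base_qual: int  # base quality at the variant position
    base: str       # base observed at the variant position


def count_alleles(reads, alt_base, min_mapq=20, min_base_qual=20):
    """Apply the same filters to every read, then count ALT and total reads."""
    usable = [
        r for r in reads
        if r.mapq >= min_mapq and r.base_qual >= min_base_qual
    ]
    alt = sum(1 for r in usable if r.base == alt_base)
    return alt, len(usable)


reads = [
    ReadObservation(60, 35, "T"),
    ReadObservation(60, 30, "C"),
    ReadObservation(60, 32, "C"),
    ReadObservation(10, 35, "T"),  # dropped: low mapping quality
    ReadObservation(60, 12, "T"),  # dropped: low base quality
    ReadObservation(55, 33, "C"),
]
alt, total = count_alleles(reads, alt_base="T")
print(f"ALT={alt}, total={total}, raw VAF={alt / total:.2f}")
```

Note that two of the three reads carrying the ALT base are filtered out here, which changes the raw VAF from 3/6 to 1/4. This is why filter consistency matters before any counting.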
Why Confidence Intervals Matter
A reported VAF without uncertainty can be misleading. Sequencing depth creates binomial sampling variation: at shallow depth, the same true allele fraction can produce a wide range of observed values. For this reason, robust pipelines often report a confidence interval around the estimated VAF. This calculator uses a Wilson-style 95% confidence interval, which behaves better than the simple normal (Wald) approximation when the proportion is near 0 or 1 or when depth is modest. Interpretation improves when you can report, for example, a 3.2% VAF with a 95% confidence interval of 2.4% to 4.3% rather than a single point estimate.
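A minimal sketch of a Wilson score interval is shown below, using the standard formula with z = 1.96 for a 95% level. The function name `wilson_interval` is an assumption, and the calculator's own implementation may differ in detail.

```python
import math


def wilson_interval(alt_reads: int, total_reads: int, z: float = 1.96):
    """Return (lower, upper) Wilson score bounds for the VAF as fractions."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    p = alt_reads / total_reads
    z2 = z * z
    denom = 1.0 + z2 / total_reads
    center = (p + z2 / (2.0 * total_reads)) / denom
    half_width = (
        z
        * math.sqrt(p * (1.0 - p) / total_reads + z2 / (4.0 * total_reads**2))
        / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


low, high = wilson_interval(50, 500)
print(f"95% CI: {low * 100:.1f}% to {high * 100:.1f}%")
```

For 50 ALT reads out of 500, this gives an interval of roughly 7.7% to 12.9% around the 10% point estimate. Unlike the Wald interval, the bounds stay inside 0 to 1 and do not collapse to zero width when no ALT reads are observed.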
Purity and Copy Number Adjustments
Raw VAF is a read-level quantity, not directly a cell fraction. In tumor samples, non-tumor cells dilute the signal. Copy number gains or losses further distort expected allele balance. A common approximation for cancer cell fraction (CCF) uses:
CCF ≈ [VAF × (purity × total copy number + (1 − purity) × 2)] / (purity × mutated copies)
Here, purity is expressed as a decimal, total copy number is the locus-specific copy number in tumor cells, and mutated copies is the number of tumor copies that carry the variant. The factor of 2 is the copy number assumed for normal cells at the locus. This formula helps distinguish clonal from subclonal events, though full clonality analysis generally requires integrated models across many variants and segments.
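The approximation can be written as a short function. This is a sketch under the same assumptions as the formula (normal cells are diploid at the locus), and the name `cancer_cell_fraction` is illustrative.

```python
def cancer_cell_fraction(
    vaf: float,
    purity: float,
    total_copy_number: float,
    mutated_copies: float,
) -> float:
    """Approximate CCF from raw VAF, assuming diploid normal cells."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if mutated_copies <= 0:
        raise ValueError("mutated_copies must be positive")
    # Average number of copies of the locus per cell across the whole sample.
    average_copies = purity * total_copy_number + (1.0 - purity) * 2.0
    return vaf * average_copies / (purity * mutated_copies)


# 10% VAF, 40% purity, diploid tumor locus, one mutated copy.
ccf = cancer_cell_fraction(0.10, 0.40, 2, 1)
print(f"CCF = {ccf:.2f}")  # CCF = 0.50
```

The function deliberately does not cap the result at 1.0, so an estimate above 100% stays visible as a sign that the purity, copy number, or mutated-copy inputs may be wrong.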
| Method | Typical Depth Range | Common Practical Detection Range | Typical Use Case |
|---|---|---|---|
| Sanger Sequencing | Not read-based (electropherogram trace) | About 15% to 20% VAF | Orthogonal confirmation for higher VAF variants |
| Standard WES/WGS Somatic Calling | About 100x to 300x in many workflows | Often around 2% to 10% VAF depending on pipeline | Broad discovery with moderate sensitivity |
| Targeted Amplicon NGS | About 500x to 5000x+ | Roughly 0.5% to 2% VAF in validated assays | Focused panels and hotspot surveillance |
| UMI Error-Corrected NGS | Often 5000x to 30000x raw reads | About 0.1% to 1% VAF depending on locus and chemistry | Low-frequency somatic monitoring and MRD contexts |
| Droplet Digital PCR (ddPCR) | Partition-based molecular counting | Near 0.01% to 0.1% in optimized settings | Very low VAF tracking for known variants |
The values above are approximate practical ranges commonly cited in molecular diagnostics and translational studies. Exact performance depends on assay validation, DNA quality, locus context, and bioinformatics thresholds.
Depth and Statistical Precision: A Quantitative View
Suppose the true VAF is 5%. Sampling variation shrinks with depth according to binomial statistics. The approximate standard deviation is sqrt(p(1-p)/n), where p is the true fraction and n is the total depth, and roughly 95% of observed values fall within about 1.96 standard deviations of p. Because the standard deviation scales with 1/sqrt(n), ultra-deep targeted assays are better suited to low-frequency detection.
| Total Depth (n) | Expected SD at True VAF = 5% | Approximate 95% Sampling Range | Interpretation Impact |
|---|---|---|---|
| 100 | About 2.18% | Roughly 0.7% to 9.3% | Wide uncertainty, low confidence for fine trend tracking |
| 500 | About 0.97% | Roughly 3.1% to 6.9% | Better precision for clinical trend use |
| 1000 | About 0.69% | Roughly 3.6% to 6.4% | Strong routine precision for many applications |
| 5000 | About 0.31% | Roughly 4.4% to 5.6% | High precision for low VAF longitudinal monitoring |
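The table can be recomputed with a few lines of code. This sketch assumes a normal approximation to the binomial, so the range is p plus or minus 1.96 standard deviations. The helper names are illustrative.

```python
import math


def binomial_sd(true_vaf: float, depth: int) -> float:
    """Standard deviation of the observed VAF at a given depth."""
    return math.sqrt(true_vaf * (1.0 - true_vaf) / depth)


def sampling_range(true_vaf: float, depth: int, z: float = 1.96):
    """Approximate central 95% range of the observed VAF (normal approximation)."""
    sd = binomial_sd(true_vaf, depth)
    return max(0.0, true_vaf - z * sd), min(1.0, true_vaf + z * sd)


for depth in (100, 500, 1000, 5000):
    sd = binomial_sd(0.05, depth)
    low, high = sampling_range(0.05, depth)
    print(
        f"n={depth:>5}  SD={sd * 100:.2f}%  "
        f"range={low * 100:.1f}% to {high * 100:.1f}%"
    )
```

Changing the first argument from 0.05 to another true VAF shows how quickly precision degrades for rarer variants at the same depth.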
Common Interpretation Scenarios
- Germline heterozygous expectation: near 50% VAF in diploid regions, often with some technical spread.
- Germline homozygous expectation: near 100% ALT support if no mapping bias and adequate quality.
- Somatic clonal variant in pure diploid tumor: often around 50% for single-copy heterozygous mutations.
- Somatic variant in mixed sample: VAF decreases as normal cell contamination increases.
- Copy number gain at locus: expected VAF may decrease or increase depending on which copies carry the mutation.
- Subclonal architecture: lower VAF can reflect biological heterogeneity, not just technical noise.
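The somatic scenarios above can be explored by rearranging the CCF approximation to predict the expected VAF. The sketch below assumes diploid normal cells and applies only to somatic variants, since germline variants are also present in normal cells. The function name and parameters are illustrative.

```python
def expected_somatic_vaf(
    purity: float,
    total_copy_number: float,
    mutated_copies: float,
    ccf: float = 1.0,
) -> float:
    """Expected VAF of a somatic variant, assuming diploid normal cells."""
    # Mutated copies contributed per cell, averaged over the sample.
    mutant = purity * ccf * mutated_copies
    # All copies of the locus per cell, averaged over the sample.
    total = purity * total_copy_number + (1.0 - purity) * 2.0
    return mutant / total


print(expected_somatic_vaf(1.0, 2, 1))           # pure diploid, clonal
print(expected_somatic_vaf(0.4, 2, 1))           # diluted by normal cells
print(expected_somatic_vaf(1.0, 3, 1))           # gain of the wild-type copy
print(expected_somatic_vaf(1.0, 3, 2))           # gain of the mutated copy
print(expected_somatic_vaf(0.8, 2, 1, ccf=0.3))  # subclonal variant
```

The third and fourth calls illustrate why a copy number gain can push VAF in either direction: one mutated copy out of three gives about 33%, while two mutated copies out of three give about 67%.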
Quality Controls You Should Not Skip
- Set minimum depth thresholds for calling and reporting tiers.
- Use strand balance checks to avoid orientation artifacts.
- Filter problematic regions with known mapping ambiguity.
- Account for sequence context artifacts, including homopolymers and oxidative damage signatures.
- For very low VAF claims, use molecular barcodes or orthogonal confirmation when possible.
- Review batch-level contamination metrics and index hopping risk in multiplexed runs.
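One common way to implement a strand balance check is a two-sided Fisher's exact test on a 2x2 table of REF and ALT reads split by forward and reverse strand. The sketch below uses only the standard library. The function name and the example counts are assumptions for illustration, and this is not the only valid strand metric.

```python
from math import comb


def strand_bias_p_value(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Two-sided Fisher's exact test p-value for allele versus strand."""
    ref_total = ref_fwd + ref_rev
    alt_total = alt_fwd + alt_rev
    fwd_total = ref_fwd + alt_fwd
    n = ref_total + alt_total
    if n == 0:
        return 1.0

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x forward-strand REF reads.
        return comb(ref_total, x) * comb(alt_total, fwd_total - x) / comb(n, fwd_total)

    observed = table_prob(ref_fwd)
    lowest = max(0, fwd_total - alt_total)
    highest = min(ref_total, fwd_total)
    # Sum every table that is no more likely than the observed one.
    p = sum(
        prob
        for prob in (table_prob(x) for x in range(lowest, highest + 1))
        if prob <= observed * (1.0 + 1e-9)
    )
    return min(1.0, p)


balanced = strand_bias_p_value(240, 210, 27, 23)
skewed = strand_bias_p_value(240, 210, 48, 2)
print(f"balanced ALT strands: p = {balanced:.3f}")
print(f"skewed ALT strands:   p = {skewed:.2e}")
```

A very small p-value, as in the second case where 48 of 50 ALT reads sit on one strand, suggests an orientation artifact. The cutoff used to flag a variant is an assay-specific decision and should come from validation data.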
Clinical and Research Applications
In oncology practice, VAF helps with treatment monitoring, resistance mutation tracking, and serial liquid biopsy interpretation. For hematologic malignancies, quantitative mutation trajectories can align with response depth and relapse risk. In solid tumors, VAF patterns can support clonal decomposition when integrated with purity and copy number. In pharmacogenomic and inherited disease settings, VAF can assist mosaicism assessment when high-depth assays are used, although technical validation is essential before making clinical decisions.
In research, VAF is central to phylogenetic reconstruction and tumor evolution modeling. Multi-region sampling combined with VAF and copy number lets teams infer truncal and branch mutations. VAF values from a single time point are informative, but longitudinal sampling often delivers stronger insight by separating stable clonal drivers from therapy-selected resistant subclones.
Practical Mistakes That Distort VAF
- Using total raw reads without quality filtering.
- Comparing VAF values across assays with very different error profiles as if they were equivalent.
- Ignoring tumor purity and copy number when inferring clonality.
- Overinterpreting small changes that fall inside expected confidence intervals.
- Assuming low VAF always means low biological relevance.
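To guard against reading too much into small changes, two measurements can be compared with a pooled two-proportion z-test. The sketch below is a normal-approximation check that accounts only for sampling noise, not run-to-run technical variation, and it becomes unreliable when ALT counts are very small. The function name is illustrative.

```python
import math


def vaf_change_test(alt1: int, depth1: int, alt2: int, depth2: int):
    """Return (z, two-sided p-value) for a change in VAF between two samples."""
    p1, p2 = alt1 / depth1, alt2 / depth2
    pooled = (alt1 + alt2) / (depth1 + depth2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / depth1 + 1.0 / depth2))
    if se == 0.0:
        return 0.0, 1.0
    z = (p2 - p1) / se
    # Two-sided tail probability of a standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# The same 3% to 4% change at two different depths.
for depth in (500, 5000):
    alt_before, alt_after = int(0.03 * depth), int(0.04 * depth)
    z, p = vaf_change_test(alt_before, depth, alt_after, depth)
    print(f"depth={depth:>5}  z={z:.2f}  p={p:.4f}")
```

A rise from 3% to 4% is well within sampling noise at 500x but is unlikely to be noise at 5000x, which is the same point the depth table makes from a different angle.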
How to Use This Calculator Output
Use raw VAF as the foundational measurement. Then inspect the 95% confidence interval to understand statistical spread. Compare your observed VAF to assay-specific limits of detection. Finally, if you are working with tumor samples, inspect the purity-adjusted cancer cell fraction estimate as a biologically richer metric. If the adjusted fraction exceeds 100%, that usually indicates model mismatch, inaccurate purity or copy number assumptions, or inconsistent mutation copy assignment.
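One simple way to compare an observation against background error is a one-sided binomial tail probability: how likely is it to see at least this many ALT reads if only sequencing error were present? The sketch below assumes a single per-base error rate for the locus, which is a simplification, and the rate of 0.1% used in the example is hypothetical rather than a property of any specific assay.

```python
import math


def background_tail_probability(alt_reads: int, total_reads: int, error_rate: float) -> float:
    """P(X >= alt_reads) for X ~ Binomial(total_reads, error_rate)."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    log_p = math.log(error_rate)
    log_q = math.log1p(-error_rate)
    log_n_factorial = math.lgamma(total_reads + 1)
    tail = 0.0
    for k in range(alt_reads, total_reads + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_pmf = (
            log_n_factorial
            - math.lgamma(k + 1)
            - math.lgamma(total_reads - k + 1)
            + k * log_p
            + (total_reads - k) * log_q
        )
        tail += math.exp(log_pmf)
    return min(1.0, tail)


# Hypothetical background error rate of 0.1% at 5000x depth.
for alt in (6, 12):
    p = background_tail_probability(alt, 5000, 0.001)
    print(f"ALT={alt:>2}  VAF={alt / 5000 * 100:.2f}%  P(noise alone) = {p:.4f}")
```

With about 5 error reads expected at this depth, 6 ALT reads are entirely compatible with noise, while 12 are not. A validated limit of detection should still take precedence over this kind of single-locus calculation.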
Bottom line: calculating VAF is mathematically straightforward, but interpreting it correctly is an expert task that combines statistics, assay performance, and tumor biology. If you apply depth-aware confidence intervals, validate against method-specific sensitivity, and account for purity and copy number, your VAF interpretation will be more reliable and clinically meaningful.