Calculate Variant Allele Fraction (VAF)

Estimate raw VAF, confidence interval, and purity-adjusted cancer cell fraction from sequencing read counts.

Expert Guide: How to Calculate Variant Allele Fraction Correctly

Variant allele fraction, often abbreviated VAF, is one of the most important quantitative values in modern genomics. It represents the proportion of sequencing reads that support a specific variant at a genomic position. Clinicians, molecular pathologists, translational researchers, and bioinformatics teams use VAF to interpret somatic mutations in cancer, track measurable residual disease, evaluate assay sensitivity, and prioritize variants for downstream reporting. Even though the formula is simple, high-quality interpretation requires careful thinking about sequencing depth, statistical uncertainty, tumor purity, copy number state, and technology-specific error rates.

At its most basic level, VAF answers this question: out of all reads that cover this locus, what fraction carries the alternate allele? If 50 out of 500 reads contain the variant, raw VAF is 10%. However, that 10% can reflect very different biological scenarios depending on context. In plasma cell-free DNA, 10% might indicate substantial disease burden. In tissue with low tumor purity, 10% may represent a clonal mutation diluted by normal DNA. At highly amplified loci, the same VAF can correspond to different cellular fractions because copy number changes alter expected allele proportions.

Core Formula for Raw VAF

The direct calculation is:

VAF = ALT reads / Total reads
Percent VAF = (ALT / Total) × 100

Here, total reads means all high-quality reads passing your filtering logic at that position, and ALT reads means reads supporting the variant allele after quality control. Reference reads are simply total minus ALT.
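
The following minimal Python sketch implements the core formula. It computes raw VAF from counts that have already been filtered and reproduces the 50-of-500 example. The function name `raw_vaf` is illustrative and does not come from any library.

```python
def raw_vaf(alt_reads: int, total_reads: int) -> float:
    """Raw variant allele fraction: ALT-supporting reads / total usable reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


# Worked example: 50 ALT reads out of 500 filtered reads.
alt_reads, total_reads = 50, 500
vaf = raw_vaf(alt_reads, total_reads)
ref_reads = total_reads - alt_reads  # reference reads = total minus ALT
print(f"VAF = {vaf:.3f} ({vaf * 100:.1f}%), REF reads = {ref_reads}")
# prints: VAF = 0.100 (10.0%), REF reads = 450
```

The validation checks catch a common input error: an ALT count that exceeds total depth usually means the two counts came from different filtering steps.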

Step-by-Step Process

Confirm that locus-level read filters, including mapping quality and base quality thresholds, are applied consistently.
Count total usable reads at the variant site.
Count ALT supporting reads for the candidate variant.
Compute raw VAF from ALT divided by total depth.
Estimate confidence intervals because sampling noise matters, especially at lower depth.
Compare observed VAF against assay detection limits and known background error levels.
If analyzing tumor tissue, adjust interpretation using tumor purity and copy number.
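
Steps 1 to 4 can be sketched as follows. The sketch assumes that each read's evidence at the locus has already been extracted into a simple record. The `ReadObservation` structure, the `count_site` helper, and the threshold of 20 for both mapping and base quality are hypothetical choices made for illustration. Real pipelines read alignments from BAM/CRAM files and use validated thresholds.

```python
from dataclasses import dataclass


@dataclass
class ReadObservation:
    """Evidence from one read at the variant position (hypothetical structure)."""
    base: str             # base called in this read at the locus
    mapping_quality: int  # MAPQ of the read alignment
    base_quality: int     # Phred base quality at the locus


def count_site(reads, alt_base, min_mapq=20, min_baseq=20):
    """Apply identical MAPQ and base-quality filters to every read, then return
    (ALT-supporting reads, total usable depth). Thresholds are illustrative."""
    usable = [
        r for r in reads
        if r.mapping_quality >= min_mapq and r.base_quality >= min_baseq
    ]
    total = len(usable)
    alt = sum(1 for r in usable if r.base == alt_base)
    return alt, total


reads = [
    ReadObservation("T", 60, 35),  # ALT, passes filters
    ReadObservation("C", 60, 35),  # REF, passes filters
    ReadObservation("T", 5, 35),   # ALT, but MAPQ too low: excluded
    ReadObservation("C", 60, 10),  # REF, but base quality too low: excluded
    ReadObservation("C", 40, 30),  # REF, passes filters
]
alt, total = count_site(reads, alt_base="T")
print(alt, total, alt / total)  # prints: 1 3 0.3333333333333333
```

The ALT count and the depth both come from the same filtered set, so the numerator and denominator stay consistent. Without filtering, this toy locus would give 2 of 5 reads (40%) rather than 1 of 3.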

Why Confidence Intervals Matter

A reported VAF without uncertainty can be misleading. Finite sequencing depth creates binomial sampling variation, so at shallow depth the same true allele fraction can produce a wide range of observed values. For this reason, robust pipelines often report a confidence interval around the estimated VAF. This calculator uses a Wilson-style 95% confidence interval. Unlike the simple normal-approximation interval, the Wilson interval stays within 0 to 1 and performs well for proportions near 0 or 1 and at moderate sample sizes. Interpretation improves considerably when you can say, for example, 3.2% VAF with a 95% confidence interval of 2.4% to 4.3%, rather than quoting a single point estimate.
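
The Wilson score interval can be computed directly from the ALT count and depth. The sketch below is an independent illustration of that technique, not the calculator's own code. The name `wilson_interval` is hypothetical, and the default z of about 1.96 targets 95% coverage.

```python
import math


def wilson_interval(alt_reads, total_reads, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    n = total_reads
    p = alt_reads / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    # Clamp tiny floating-point excursions outside [0, 1].
    return max(0.0, center - half_width), min(1.0, center + half_width)


for alt, total in [(32, 1000), (8, 250), (0, 200)]:
    lo, hi = wilson_interval(alt, total)
    print(f"{alt}/{total}: VAF {alt / total:.2%}, 95% CI {lo:.2%} to {hi:.2%}")
```

The interval is centered slightly toward 50% rather than on the raw VAF. When no ALT reads are seen, it still returns a nonzero upper bound, which indicates how large a VAF the assay could plausibly have missed at that depth.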

Purity and Copy Number Adjustments

Raw VAF is a read-level quantity, not directly a cell fraction. In tumor samples, DNA from non-tumor cells dilutes the signal, and copy number gains or losses further distort the expected allele balance. A common approximation for cancer cell fraction (CCF) uses:

CCF ≈ [VAF × (purity × total copy number + (1 − purity) × 2)] / (purity × mutated copies)

Here, purity is expressed as a decimal, total copy number is the locus-specific copy number in tumor cells, and mutated copies is the number of tumor copies that carry the variant. The factor 2 assumes that normal cells are diploid at the locus. This formula helps distinguish clonal from subclonal events, although full clonality analysis generally requires integrated models across many variants and segments.
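
The approximation translates directly into code. This sketch assumes that purity is supplied as a decimal, so a calculator entry of 40% becomes 0.40, and that normal cells are diploid. The names `cancer_cell_fraction` and `normal_cn` are illustrative.

```python
def cancer_cell_fraction(vaf, purity, total_cn, mutated_copies, normal_cn=2):
    """CCF ≈ VAF × (purity × CN_tumor + (1 − purity) × CN_normal) / (purity × mutated copies).

    purity is a decimal in (0, 1]; normal cells are assumed diploid (normal_cn=2).
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be a decimal in (0, 1]")
    if mutated_copies < 1:
        raise ValueError("mutated_copies must be at least 1")
    # Average number of locus copies per cell across tumor and normal DNA.
    mean_copies = purity * total_cn + (1 - purity) * normal_cn
    return vaf * mean_copies / (purity * mutated_copies)


purity = 40 / 100  # a purity entered as 40% must be converted to 0.40
print(round(cancer_cell_fraction(0.20, purity, total_cn=2, mutated_copies=1), 3))  # 1.0
print(round(cancer_cell_fraction(0.10, purity, total_cn=2, mutated_copies=1), 3))  # 0.5
```

Under these assumptions, at 40% purity in a diploid region, a 20% VAF is consistent with a clonal heterozygous mutation (CCF near 1.0). A 10% VAF would suggest that roughly half of tumor cells carry the variant.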

Method	Typical Depth Range	Common Practical Detection Range	Typical Use Case
Sanger Sequencing	Low to moderate read support	About 15% to 20% VAF	Orthogonal confirmation for higher VAF variants
Standard WES/WGS Somatic Calling	About 100x to 300x in many workflows	Often around 2% to 10% VAF depending on pipeline	Broad discovery with moderate sensitivity
Targeted Amplicon NGS	About 500x to 5000x+	Roughly 0.5% to 2% VAF in validated assays	Focused panels and hotspot surveillance
UMI Error-Corrected NGS	Often 5000x to 30000x raw reads	About 0.1% to 1% VAF depending on locus and chemistry	Low-frequency somatic monitoring and MRD contexts
Digital PCR (ddPCR)	Partition based molecular counting	Near 0.01% to 0.1% in optimized settings	Very low VAF tracking for known variants

The values above are approximate practical ranges commonly seen in molecular diagnostics and translational studies. Exact performance depends on assay validation, DNA quality, locus context, and bioinformatics thresholds.

Depth and Statistical Precision: A Quantitative View

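Suppose the true VAF is 5%. Sampling variation shrinks with depth according to binomial statistics: the approximate standard deviation of the observed VAF is sqrt(p(1 − p)/n), where p is the true fraction and n is the total depth. Because the standard deviation falls with the square root of depth, quadrupling depth halves it. This is why ultra-deep targeted assays are better suited to low-frequency detection. The 95% ranges below use the normal approximation p ± 1.96 × SD.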

Total Depth (n)	Expected SD at True VAF = 5%	Approximate 95% Sampling Range	Interpretation Impact
100	About 2.18%	Roughly 0.7% to 9.3%	Wide uncertainty, low confidence for fine trend tracking
500	About 0.97%	Roughly 3.1% to 6.9%	Better precision for clinical trend use
1000	About 0.69%	Roughly 3.6% to 6.4%	Strong routine precision for many applications
5000	About 0.31%	Roughly 4.4% to 5.6%	High precision for low-VAF longitudinal monitoring
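
The table can be reproduced, to within rounding, with a few lines of Python. This sketch uses the same binomial standard deviation and a normal-approximation range of p ± 1.96 × SD. The helper name `sampling_sd` is hypothetical. The approximation is rough at 100x, where only about five ALT reads are expected.

```python
import math


def sampling_sd(p, n):
    """Binomial standard deviation of an observed fraction: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


true_vaf = 0.05
for depth in (100, 500, 1000, 5000):
    sd = sampling_sd(true_vaf, depth)
    # Normal-approximation 95% range, clamped to [0, 1]; rough when depth × VAF is small.
    low = max(0.0, true_vaf - 1.96 * sd)
    high = min(1.0, true_vaf + 1.96 * sd)
    print(f"n={depth:>5}: SD {sd:.2%}, ~95% range {low:.1%} to {high:.1%}")
```

For interval estimates on real data, the Wilson interval is preferable. The normal approximation is used here only because it matches how the table was constructed.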

Common Interpretation Scenarios

Germline heterozygous expectation: near 50% VAF in diploid regions, often with some technical spread.
Germline homozygous expectation: near 100% ALT support if no mapping bias and adequate quality.
Somatic clonal variant in pure diploid tumor: often around 50% for single copy heterozygous mutations.
Somatic variant in mixed sample: VAF decreases as normal cell contamination increases.
Copy number gain at locus: expected VAF may decrease or increase depending on how many of the copies carry the mutation.
Subclonal architecture: lower VAF can reflect biologic heterogeneity, not just technical noise.
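
Several of these scenarios follow from inverting the CCF approximation to predict the VAF that a somatic variant should show. The sketch assumes diploid normal cells that do not carry the variant. The name `expected_vaf` refers to a hypothetical helper. Real samples add sampling noise and technical spread on top of these values.

```python
def expected_vaf(purity, total_cn, mutated_copies, ccf=1.0, normal_cn=2):
    """Expected VAF for a somatic variant, from inverting the CCF approximation:
    VAF = purity × mutated copies × CCF / (purity × CN + (1 − purity) × normal CN).
    Assumes normal cells do not carry the variant."""
    mean_copies = purity * total_cn + (1 - purity) * normal_cn
    return purity * mutated_copies * ccf / mean_copies


scenarios = {
    "clonal heterozygous, pure diploid tumor": expected_vaf(1.0, 2, 1),
    "same mutation, 50% purity": expected_vaf(0.5, 2, 1),
    "gain to 4 copies, 1 mutated copy": expected_vaf(1.0, 4, 1),
    "gain to 4 copies, 3 mutated copies": expected_vaf(1.0, 4, 3),
    "subclone in 40% of cells, pure diploid": expected_vaf(1.0, 2, 1, ccf=0.4),
}
for label, vaf in scenarios.items():
    print(f"{label}: {vaf:.0%}")
```

The loop prints 50%, 25%, 25%, 75%, and 20%. The two copy-gain cases show that the same amplification can lower or raise the expected VAF, depending on how many copies carry the mutation.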

Quality Controls You Should Not Skip

Set minimum depth thresholds for calling and reporting tiers.
Use strand balance checks to avoid orientation artifacts.
Filter problematic regions with known mapping ambiguity.
Account for sequence context artifacts, including homopolymers and oxidative damage signatures.
For claims of very low VAF, use molecular barcodes or orthogonal confirmation when possible.
Review batch-level contamination metrics and index hopping risk in multiplexed runs.
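
Two of these checks, minimum depth and strand balance, can be expressed as simple per-variant flags. Every threshold in this sketch (100x depth, 5 ALT reads, at least 10% of ALT reads on the minor strand) is an illustrative placeholder, not a recommended value. `qc_flags` is a hypothetical helper.

```python
def qc_flags(total_depth, alt_forward, alt_reverse,
             min_depth=100, min_alt_reads=5, min_strand_fraction=0.1):
    """Return QC flags for one candidate variant. All thresholds are illustrative
    placeholders; real values come from assay validation."""
    flags = []
    alt_total = alt_forward + alt_reverse
    if total_depth < min_depth:
        flags.append("LOW_DEPTH")
    if alt_total < min_alt_reads:
        flags.append("LOW_ALT_SUPPORT")
    if alt_total > 0:
        # Fraction of ALT reads on the less-represented strand.
        minor_strand = min(alt_forward, alt_reverse) / alt_total
        if minor_strand < min_strand_fraction:
            flags.append("ALT_STRAND_IMBALANCE")
    return flags


print(qc_flags(800, 12, 10))  # prints: []
print(qc_flags(800, 20, 1))   # prints: ['ALT_STRAND_IMBALANCE']
print(qc_flags(50, 2, 1))     # prints: ['LOW_DEPTH', 'LOW_ALT_SUPPORT']
```

This simplified strand check looks only at ALT reads. A fuller check would compare ALT and REF strand distributions, for example with Fisher's exact test, because some library designs produce strand-skewed coverage for every read at a locus.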

Clinical and Research Applications

In oncology practice, VAF helps with treatment monitoring, resistance mutation tracking, and serial liquid biopsy interpretation. For hematologic malignancies, quantitative mutation trajectories can align with response depth and relapse risk. In solid tumors, VAF patterns can support clonal decomposition when integrated with purity and copy number. In pharmacogenomic and inherited disease settings, VAF can assist mosaicism assessment when high-depth assays are used, although technical validation is essential before making clinical decisions.

In research, VAF is central to phylogenetic reconstruction and tumor evolution modeling. Multi-region sampling combined with VAF and copy number lets teams infer truncal and branch mutations. VAF values from a single time point are informative, but longitudinal sampling often delivers stronger insight by separating stable clonal drivers from therapy-selected resistant subclones.

Practical Mistakes That Distort VAF

Using total raw reads without quality filtering.
Comparing VAF values across assays with very different error profiles as if they were equivalent.
Ignoring tumor purity and copy number when inferring clonality.
Overinterpreting tiny changes that fall inside expected confidence intervals.
Assuming low VAF always means low biological relevance.
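
To avoid overinterpreting small changes between serial samples, one option is a two-proportion z-test on the ALT counts. This is a standard statistical test chosen for illustration, not a feature of the calculator. It assumes independent binomial sampling and ignores extra technical variance, so real assay noise makes its p-values optimistic. The name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(alt1, depth1, alt2, depth2):
    """Two-sided pooled z-test for a difference between two observed VAFs.
    Assumes independent binomial sampling with no extra technical noise."""
    if depth1 <= 0 or depth2 <= 0:
        raise ValueError("depths must be positive")
    p1, p2 = alt1 / depth1, alt2 / depth2
    pooled = (alt1 + alt2) / (depth1 + depth2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / depth1 + 1 / depth2))
    if se == 0:
        return 0.0, 1.0  # both samples all-REF or all-ALT: no evidence of change
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 × (1 − Φ(|z|))
    return z, p_value


# 3.0% -> 3.6% at 1000x each: a shift that sits within sampling noise.
print(two_proportion_z_test(30, 1000, 36, 1000))
# 3.0% -> 9.0% at 1000x each: far outside sampling noise.
print(two_proportion_z_test(30, 1000, 90, 1000))
```

The first comparison gives a p-value around 0.45, so a move from 3.0% to 3.6% at this depth is not evidence of a real change. At low ALT counts, an exact test is a safer choice than this normal approximation.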

How to Use This Calculator Output

Use raw VAF as the foundational measurement. Then inspect the 95% confidence interval to understand the statistical spread, and compare the observed VAF with assay-specific limits of detection. Finally, if you are working with tumor samples, inspect the purity-adjusted cancer cell fraction estimate as a biologically richer metric. If the adjusted fraction exceeds 100%, that usually indicates model mismatch, inaccurate purity or copy number assumptions, or inconsistent mutation copy assignment.
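
The workflow above can be combined into one summary function. This sketch is not the calculator's source. It mirrors the inputs described on this page, except that the assay-specific limit of detection is passed explicitly as `lod_pct`, a hypothetical parameter. `vaf_report` is likewise an illustrative name. The interval uses the Wilson score formula with z = 1.96, and normal cells are assumed diploid.

```python
import math


def vaf_report(alt, total, purity_pct, total_cn, mutated_copies, lod_pct, z=1.96):
    """Summarize raw VAF, Wilson 95% CI, and purity-adjusted CCF, with flags.
    lod_pct is a hypothetical assay-specific limit of detection, in percent."""
    if total <= 0 or not 0 <= alt <= total:
        raise ValueError("need total > 0 and 0 <= alt <= total")
    if not 0 < purity_pct <= 100:
        raise ValueError("purity_pct must be in (0, 100]")
    vaf = alt / total
    # Wilson score interval for the raw VAF.
    z2 = z * z
    denom = 1 + z2 / total
    center = (vaf + z2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(vaf * (1 - vaf) / total + z2 / (4 * total * total))
    ci = (max(0.0, center - half), min(1.0, center + half))
    # Purity-adjusted CCF, assuming diploid normal cells.
    purity = purity_pct / 100
    ccf = vaf * (purity * total_cn + (1 - purity) * 2) / (purity * mutated_copies)
    notes = []
    if vaf * 100 < lod_pct:
        notes.append("VAF below assay limit of detection")
    if ccf > 1.0:
        notes.append("CCF > 100%: revisit purity, copy number, or mutated-copy inputs")
    return {"vaf": vaf, "ci95": ci, "ccf": ccf, "notes": notes}


report = vaf_report(120, 400, purity_pct=40, total_cn=2, mutated_copies=1, lod_pct=2.0)
lo, hi = report["ci95"]
print(f"VAF {report['vaf']:.1%}, 95% CI {lo:.1%} to {hi:.1%}, CCF {report['ccf']:.0%}")
print(report["notes"])
```

In this example, a 30% VAF at an assumed 40% purity implies a CCF of about 150%. The function flags this as the kind of model mismatch described above rather than silently capping it at 100%.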

Bottom line: calculating VAF is mathematically straightforward, but interpreting it correctly is an expert task that combines statistics, assay physics, and tumor biology. If you apply depth-aware confidence intervals, validate against method-specific sensitivity, and account for purity and copy number, your VAF interpretation will be considerably more reliable and clinically meaningful.