FST Calculator Between Two Populations
Enter allele frequencies for the same loci in Population 1 and Population 2. The calculator uses the heterozygosity framework: FST = (HT – HS) / HT.
How to Calculate FST Between Two Populations: Expert Guide
FST, often called the fixation index, is one of the most widely used statistics in population genetics for quantifying how genetically differentiated populations are from each other. If you are comparing two populations, FST gives you a compact way to summarize whether allele frequencies are similar or divergent. In practical terms, an FST of 0 means the two populations have identical allele frequencies at the loci analyzed, while values closer to 1 indicate increasing structure and reduced sharing of genetic variation. An FST of 1 means the populations are fixed for different alleles.
This guide walks you through the conceptual meaning, the equations, a step-by-step manual workflow, interpretation boundaries, and common pitfalls that cause incorrect estimates. You can use the calculator above for rapid multi-locus analysis, then use this section to understand what the numbers mean and when they can be trusted.
Why FST Matters in Real Research
Researchers use FST to answer questions such as: Are populations isolated? Is there evidence for migration barriers? Do candidate loci show unusually high differentiation consistent with local adaptation? Is conservation management mixing distinct units? In human genetics, FST supports discussions about population history and ancestry clines. In wildlife and plant studies, it helps define management units and infer gene flow.
- Conservation genetics: Distinguish fragmented subpopulations and prioritize corridors.
- Evolutionary biology: Evaluate drift, migration, and selection across landscapes.
- Medical and human genomics: Characterize background population structure in association studies.
- Agriculture: Compare breeding lines and landraces for diversity partitioning.
Core Formula for Two Populations
For a biallelic locus with allele frequency p1 in Population 1 and p2 in Population 2:
- Calculate expected heterozygosity in each population:
  - H1 = 2p1(1 – p1)
  - H2 = 2p2(1 – p2)
- Compute HS, the average within-population heterozygosity, where n1 and n2 are the sample sizes of the two populations:
  - Equal weighting: HS = (H1 + H2) / 2
  - Sample-size weighting: HS = (n1H1 + n2H2) / (n1 + n2)
- Compute the pooled allele frequency pbar, using the same weighting scheme as for HS:
  - Unweighted: pbar = (p1 + p2) / 2
  - Weighted: pbar = (n1p1 + n2p2) / (n1 + n2)
- Compute total heterozygosity: HT = 2pbar(1 – pbar).
- Compute FST: FST = (HT – HS) / HT, as long as HT is greater than 0.
If HT equals zero, there is no variation at that locus in the pooled sample, so locus-level FST is typically set to 0 or treated as undefined for reporting.
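The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function names `expected_heterozygosity` and `fst_two_pops` are hypothetical, and returning `None` when HT is zero is one of the two reporting conventions mentioned above, chosen here as an assumption.

```python
from typing import Optional


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) for a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1: float, p2: float,
                 n1: Optional[float] = None,
                 n2: Optional[float] = None) -> Optional[float]:
    """FST = (HT - HS) / HT for one biallelic locus.

    If both n1 and n2 are given, HS and pbar are sample-size weighted;
    otherwise the two populations are weighted equally.
    Returns None when HT == 0 (assumed convention: undefined, not 0).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")

    if n1 is None or n2 is None:
        w1 = w2 = 0.5                      # equal weighting
    else:
        w1 = n1 / (n1 + n2)                # sample-size weighting
        w2 = n2 / (n1 + n2)

    hs = w1 * expected_heterozygosity(p1) + w2 * expected_heterozygosity(p2)
    pbar = w1 * p1 + w2 * p2
    ht = expected_heterozygosity(pbar)

    if ht == 0.0:                          # no variation in the pooled sample
        return None
    return (ht - hs) / ht
```

Using the same weights for HS and pbar keeps the two heterozygosity terms consistent, so the result cannot go negative under this formula.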
Step-by-Step Example by Hand
Assume one SNP where p1 = 0.20 and p2 = 0.50, and sample sizes are equal:
- H1 = 2(0.20)(0.80) = 0.32
- H2 = 2(0.50)(0.50) = 0.50
- HS = (0.32 + 0.50)/2 = 0.41
- pbar = (0.20 + 0.50)/2 = 0.35
- HT = 2(0.35)(0.65) = 0.455
- FST = (0.455 – 0.41)/0.455 = 0.0989
An FST near 0.10 for that locus suggests moderate differentiation at that marker. You should never interpret a single SNP in isolation for broad conclusions. Multi-locus and genome-wide context are essential.
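If you want to check the hand calculation, the short script below repeats each step for the same SNP. It also uses a handy identity for two equally weighted populations, HT – HS = (p1 – p2)² / 2, as a cross-check; the variable names are illustrative only.

```python
# Worked example: p1 = 0.20, p2 = 0.50, equal sample sizes.
p1, p2 = 0.20, 0.50

h1 = 2 * p1 * (1 - p1)          # within-population heterozygosity, pop 1
h2 = 2 * p2 * (1 - p2)          # within-population heterozygosity, pop 2
hs = (h1 + h2) / 2              # average within-population heterozygosity
pbar = (p1 + p2) / 2            # pooled allele frequency
ht = 2 * pbar * (1 - pbar)      # total heterozygosity
fst = (ht - hs) / ht

# Cross-check: for two equally weighted populations, HT - HS = (p1 - p2)^2 / 2.
assert abs((ht - hs) - (p1 - p2) ** 2 / 2) < 1e-12

print(f"H1={h1:.3f} H2={h2:.3f} HS={hs:.3f} pbar={pbar:.2f} HT={ht:.3f} FST={fst:.4f}")
# prints: H1=0.320 H2=0.500 HS=0.410 pbar=0.35 HT=0.455 FST=0.0989
```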
How to Aggregate Across Loci Correctly
When you have many loci, there are multiple averaging choices. The most stable practical strategy is to aggregate heterozygosity terms first, then compute a global estimate:
- Compute HS and HT for each locus.
- Average HS across loci and average HT across loci.
- Compute global FST = (mean HT – mean HS) / mean HT.
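This ratio-of-averages approach avoids the distortions that arise when per-locus ratios are averaged directly: loci with very low HT produce noisy FST values, yet each would count as much as a highly informative locus. The calculator above reports per-locus values and a global estimate derived from the aggregate heterozygosity components.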
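The difference between the two averaging choices is easy to see in code. The sketch below assumes equal population weights, and the function names (`locus_components`, `global_fst`, `mean_of_ratios`) are hypothetical helpers written for this illustration.

```python
from typing import Optional, Sequence, Tuple


def locus_components(p1: float, p2: float) -> Tuple[float, float]:
    """Return (HS, HT) for one biallelic locus, equal population weights."""
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    pbar = (p1 + p2) / 2
    ht = 2 * pbar * (1 - pbar)
    return hs, ht


def global_fst(freqs1: Sequence[float], freqs2: Sequence[float]) -> Optional[float]:
    """Ratio of averages: (mean HT - mean HS) / mean HT across loci."""
    if len(freqs1) != len(freqs2):
        raise ValueError("both populations need the same number of loci")
    comps = [locus_components(a, b) for a, b in zip(freqs1, freqs2)]
    if not comps:
        return None
    mean_hs = sum(hs for hs, _ in comps) / len(comps)
    mean_ht = sum(ht for _, ht in comps) / len(comps)
    if mean_ht == 0.0:
        return None
    return (mean_ht - mean_hs) / mean_ht


def mean_of_ratios(freqs1: Sequence[float], freqs2: Sequence[float]) -> Optional[float]:
    """Naive alternative: average per-locus FST, skipping loci with HT == 0."""
    ratios = []
    for a, b in zip(freqs1, freqs2):
        hs, ht = locus_components(a, b)
        if ht > 0.0:
            ratios.append((ht - hs) / ht)
    return sum(ratios) / len(ratios) if ratios else None


# One common SNP and one rare SNP (illustrative frequencies).
pop1 = [0.20, 0.01]
pop2 = [0.50, 0.03]
print(global_fst(pop1, pop2), mean_of_ratios(pop1, pop2))
```

With these illustrative frequencies, the global estimate comes out near 0.091 while the mean of per-locus ratios is near 0.052, because the rare SNP contributes little heterozygosity but counts as a full vote in the naive average.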
Interpreting Magnitude: Useful Benchmarks
Interpretation depends on species, marker type, demographic history, and sampling design, but the classic guideposts usually attributed to Sewall Wright are still widely used:
- 0.00 to 0.05: little differentiation
- 0.05 to 0.15: moderate differentiation
- 0.15 to 0.25: great differentiation
- above 0.25: very great differentiation
These are rough bins, not strict laws. In high-dispersal marine species, FST around 0.02 may already indicate meaningful structure. In strongly subdivided plant systems, values above 0.30 can be common.
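If you want to label results automatically, the bins above translate into a simple lookup. This is only a sketch: the function name `describe_fst` is hypothetical, and treating each lower bound as inclusive (so 0.05 counts as moderate) is an assumption, since the listed ranges share their endpoints.

```python
def describe_fst(fst: float) -> str:
    """Map an FST value to the classic qualitative bins.

    Assumption: lower bounds are inclusive, upper bounds exclusive.
    Negative estimates are reported as little differentiation.
    """
    if fst < 0.05:
        return "little differentiation"
    if fst < 0.15:
        return "moderate differentiation"
    if fst < 0.25:
        return "great differentiation"
    return "very great differentiation"


for value in (0.02, 0.0989, 0.20, 0.30):
    print(f"{value:.4f}: {describe_fst(value)}")
```

Treat the label as a reporting convenience, not a biological conclusion, since the same value can mean different things in different species.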
Comparison Table: Human Population-Scale Pairwise FST (Approximate Genome-Wide Values)
The following values are representative ranges frequently reported in large-scale human datasets and synthesis papers. Exact numbers depend on marker filtering, estimator, and population definitions.
| Population Pair (Broad Groups) | Approximate Genome-Wide FST | Interpretation |
|---|---|---|
| African vs European | 0.10 to 0.13 | Moderate to high differentiation |
| African vs East Asian | 0.13 to 0.16 | High differentiation in global human context |
| European vs East Asian | 0.08 to 0.11 | Moderate differentiation |
| European vs South Asian | 0.03 to 0.06 | Low to moderate differentiation |
| East Asian vs South Asian | 0.05 to 0.09 | Moderate differentiation |
Comparison Table: Typical FST Ranges Across Organisms (Published Study Ranges)
| Organism / System | Reported FST Range | General Context |
|---|---|---|
| Humans (continental-scale comparisons) | ~0.05 to 0.16 | Structure shaped by migration history and geography |
| Drosophila melanogaster (regional to continental contrasts) | ~0.08 to 0.20 | Demography and local adaptation both contribute |
| Arabidopsis thaliana (regional ecotypes) | ~0.20 to 0.70 | Strong subdivision can occur with selfing and geography |
| Salmonid river populations | ~0.01 to 0.15 | Fine-scale natal homing can create measurable structure |
Common Mistakes That Inflate or Deflate FST
- Using mismatched loci: Population 1 and Population 2 must list the same markers in the same order, so that each pair of frequencies refers to the same locus and the same allele.
- Ignoring sample size imbalance: Weighted estimates can be more stable when n differs strongly.
- Combining low-quality genotypes: Missingness and genotyping error can bias allele frequencies.
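- Interpreting negative FST literally: Bias-corrected estimators such as Weir and Cockerham's theta or Hudson's estimator can return small negative values. These usually reflect sampling noise in finite samples and are commonly treated as 0.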
- Over-interpreting small datasets: A handful of loci cannot represent genome-wide structure reliably.
- Ignoring ascertainment bias: SNP panel design may distort differentiation patterns across populations.
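Several of these mistakes can be caught before any FST is computed. The sketch below is a hypothetical pre-flight check (the function name `check_inputs` and the idea of passing locus IDs alongside frequencies are assumptions for illustration); it flags mismatched marker order, missing values, and out-of-range frequencies.

```python
import math
from typing import List, Optional, Sequence


def check_inputs(ids1: Sequence[str], freqs1: Sequence[Optional[float]],
                 ids2: Sequence[str], freqs2: Sequence[Optional[float]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []

    if len(ids1) != len(freqs1) or len(ids2) != len(freqs2):
        problems.append("each population needs exactly one frequency per locus ID")
    if list(ids1) != list(ids2):
        problems.append("locus IDs differ or are listed in a different order")

    for label, freqs in (("population 1", freqs1), ("population 2", freqs2)):
        for i, p in enumerate(freqs):
            if p is None or math.isnan(p):
                problems.append(f"{label}, locus {i}: missing frequency")
            elif not 0.0 <= p <= 1.0:
                problems.append(f"{label}, locus {i}: frequency {p} outside [0, 1]")

    return problems


# Example: the second population lists the markers in a different order.
print(check_inputs(["rs1", "rs2"], [0.20, 0.50],
                   ["rs2", "rs1"], [0.30, 0.40]))
```

A check like this cannot detect allele flips, where both files list the same marker but report the frequency of opposite alleles, so that still needs to be verified against the source data.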
Estimator Notes: Nei, Weir and Cockerham, and Hudson
The equation used in this calculator follows the heterozygosity partitioning logic that is often introduced as Nei-style GST/FST for biallelic loci. In many research pipelines, you may instead use Weir and Cockerham theta or Hudson’s estimator, especially for sequence data and unequal samples. These alternatives differ in bias properties and finite-sample behavior but target related concepts of population differentiation. For publication-grade analysis, explicitly report the estimator, filtering thresholds, confidence intervals, and bootstrap strategy.
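For comparison, here is a minimal sketch of a commonly used sample-size-corrected form of Hudson's estimator for biallelic loci. It is not what the calculator above computes. The function names are hypothetical, and n1 and n2 are assumed to be the number of sampled allele copies (twice the number of diploid individuals) in each population, taken as constant across loci for simplicity.

```python
from typing import Optional, Sequence, Tuple


def hudson_components(p1: float, p2: float, n1: int, n2: int) -> Tuple[float, float]:
    """Per-locus numerator and denominator of Hudson's FST.

    n1, n2: number of sampled allele copies in each population (assumed).
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two sampled allele copies per population")
    num = ((p1 - p2) ** 2
           - p1 * (1 - p1) / (n1 - 1)      # finite-sample correction, pop 1
           - p2 * (1 - p2) / (n2 - 1))     # finite-sample correction, pop 2
    den = p1 * (1 - p2) + p2 * (1 - p1)    # between-population diversity
    return num, den


def hudson_fst(freqs1: Sequence[float], freqs2: Sequence[float],
               n1: int, n2: int) -> Optional[float]:
    """Multi-locus estimate as a ratio of summed numerators and denominators."""
    total_num = 0.0
    total_den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        num, den = hudson_components(p1, p2, n1, n2)
        total_num += num
        total_den += den
    return total_num / total_den if total_den > 0.0 else None


# Identical sample frequencies give a slightly negative estimate,
# which illustrates why small negative values are sampling noise.
print(hudson_fst([0.5, 0.3], [0.5, 0.3], n1=20, n2=20))
```

The correction terms shrink as sample sizes grow, so with large samples this estimate and the heterozygosity-based one tend to tell a similar story, though they are not numerically identical.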
Best Practices for Publication-Quality FST Analysis
- Use large marker panels after strict quality control.
- Report both per-locus and genome-wide summaries.
- Provide confidence intervals using block-bootstrap or jackknife methods.
- Include sensitivity analyses across minor allele frequency cutoffs.
- Cross-check structure with PCA, ADMIXTURE-like models, or clustering methods.
- Interpret FST jointly with demographic and ecological context, not as a standalone proof of adaptation.
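As one way to attach uncertainty to a global estimate, the sketch below applies a simple delete-one-block jackknife to the heterozygosity-based FST. It makes several assumptions: loci are already ordered along the genome so that contiguous blocks approximate linked segments, populations are weighted equally, blocks are of nearly equal size and weighted equally, and every leave-one-block-out subset retains some variation. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _ratio_fst(hs: List[float], ht: List[float]) -> float:
    """Global FST from summed heterozygosity components."""
    total_ht = sum(ht)
    return (total_ht - sum(hs)) / total_ht


def block_jackknife_fst(freqs1: Sequence[float], freqs2: Sequence[float],
                        n_blocks: int = 10) -> Tuple[float, float]:
    """Return (global FST, jackknife standard error)."""
    hs, ht = [], []
    for p1, p2 in zip(freqs1, freqs2):
        pbar = (p1 + p2) / 2
        hs.append(p1 * (1 - p1) + p2 * (1 - p2))   # equals (H1 + H2) / 2
        ht.append(2 * pbar * (1 - pbar))

    n = len(hs)
    if n_blocks < 2 or n_blocks > n:
        raise ValueError("n_blocks must be between 2 and the number of loci")

    estimate = _ratio_fst(hs, ht)

    # Contiguous, non-empty blocks of nearly equal size.
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]
    leave_one_out = []
    for b in range(n_blocks):
        start, end = bounds[b], bounds[b + 1]
        leave_one_out.append(_ratio_fst(hs[:start] + hs[end:],
                                        ht[:start] + ht[end:]))

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)


est, se = block_jackknife_fst([0.2, 0.1, 0.4, 0.6, 0.3, 0.8],
                              [0.5, 0.15, 0.35, 0.9, 0.3, 0.7], n_blocks=3)
print(f"FST = {est:.4f}, jackknife SE = {se:.4f}")
```

With real data you would use many more loci and choose block sizes large enough to span typical linkage disequilibrium. A weighted jackknife is a common refinement when blocks contain very different numbers of markers.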
Practical takeaway: FST is most informative when computed across many loci, with transparent estimator choice, careful QC, and context-aware interpretation. Use the calculator for rapid exploration, then validate with full population-genetic workflows for final conclusions.