How To Calculate Fst Between Two Populations

FST Calculator Between Two Populations

Enter allele frequencies for the same loci in Population 1 and Population 2. The calculator uses the heterozygosity framework: FST = (HT – HS) / HT.

How to Calculate FST Between Two Populations: Expert Guide

FST, often called the fixation index, is one of the most widely used statistics in population genetics for quantifying how genetically differentiated populations are from each other. If you are comparing two populations, FST gives you a compact way to summarize whether allele frequencies are similar or divergent. In practical terms, an FST of 0 means the two populations have the same allele frequencies at the loci analyzed (not that individuals are genetically identical), while values approaching 1 indicate that the populations are close to fixed for different alleles and share little genetic variation.

This guide walks you through the conceptual meaning, the equations, a step-by-step manual workflow, interpretation boundaries, and common pitfalls that cause incorrect estimates. You can use the calculator above for rapid multi-locus analysis, then use this section to understand what the numbers mean and when they can be trusted.

Why FST Matters in Real Research

Researchers use FST to answer questions such as: Are populations isolated? Is there evidence for migration barriers? Do candidate loci show unusually high differentiation consistent with local adaptation? Is conservation management mixing distinct units? In human genetics, FST supports discussions about population history and ancestry clines. In wildlife and plant studies, it helps define management units and infer gene flow.

  • Conservation genetics: Distinguish fragmented subpopulations and prioritize corridors.
  • Evolutionary biology: Evaluate drift, migration, and selection across landscapes.
  • Medical and human genomics: Characterize background population structure in association studies.
  • Agriculture: Compare breeding lines and landraces for diversity partitioning.

Core Formula for Two Populations

For a biallelic locus with allele frequency p1 in Population 1 and p2 in Population 2:

  1. Calculate expected heterozygosity in each population:
    • H1 = 2p1(1 – p1)
    • H2 = 2p2(1 – p2)
  2. Compute HS, the average within-population heterozygosity. Use the same weighting scheme here as for pbar in step 3:
    • Unweighted: HS = (H1 + H2) / 2
    • Weighted: HS = (n1H1 + n2H2) / (n1 + n2)
  3. Compute the pooled allele frequency pbar:
    • Unweighted: pbar = (p1 + p2) / 2
    • Weighted: pbar = (n1p1 + n2p2) / (n1 + n2)
  4. Compute total heterozygosity: HT = 2pbar(1 – pbar).
  5. Compute FST: FST = (HT – HS) / HT, as long as HT is greater than 0.

If HT equals zero, there is no variation at that locus in the pooled sample, so locus-level FST is typically set to 0 or treated as undefined for reporting.
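The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function names `heterozygosity` and `fst_biallelic` are hypothetical, and returning `None` when HT is 0 is one of the two reporting conventions mentioned above.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(p1, p2, n1=1, n2=1):
    """Per-locus FST = (HT - HS) / HT for two populations.

    n1 and n2 act as weights (for example sample sizes); the defaults
    give the unweighted version. Returns None when HT is 0, i.e. the
    locus is monomorphic in the pooled sample (assumed convention).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    total = n1 + n2
    hs = (n1 * heterozygosity(p1) + n2 * heterozygosity(p2)) / total
    p_bar = (n1 * p1 + n2 * p2) / total
    ht = heterozygosity(p_bar)
    if ht == 0.0:
        return None
    return (ht - hs) / ht


print(round(fst_biallelic(0.20, 0.50), 4))   # equal weights
print(fst_biallelic(1.0, 1.0, n1=30, n2=70)) # monomorphic locus
```

Passing real sample sizes as `n1` and `n2` switches both HS and pbar to the weighted form at once, which keeps the two terms consistent.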

Step-by-Step Example by Hand

Assume one SNP where p1 = 0.20 and p2 = 0.50, and sample sizes are equal:

  • H1 = 2(0.20)(0.80) = 0.32
  • H2 = 2(0.50)(0.50) = 0.50
  • HS = (0.32 + 0.50)/2 = 0.41
  • pbar = (0.20 + 0.50)/2 = 0.35
  • HT = 2(0.35)(0.65) = 0.455
  • FST = (0.455 – 0.41)/0.455 = 0.0989

An FST near 0.10 places this marker in the conventional "moderate differentiation" range. That describes one locus only: a single SNP should never carry broad conclusions on its own, so multi-locus and genome-wide context are essential.
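As a cross-check on the arithmetic, the equal-weight case collapses to a single expression in p1 and p2. The derivation below uses only the definitions from the core formula section and applies to the unweighted case only:

```latex
\begin{aligned}
H_T - H_S &= 2\bar{p}(1-\bar{p}) - \bigl[p_1(1-p_1) + p_2(1-p_2)\bigr]
           = \frac{(p_1 - p_2)^2}{2}, \qquad \bar{p} = \frac{p_1 + p_2}{2}, \\[4pt]
F_{ST} &= \frac{H_T - H_S}{H_T} = \frac{(p_1 - p_2)^2}{4\,\bar{p}\,(1-\bar{p})}, \\[4pt]
F_{ST} &= \frac{(0.20 - 0.50)^2}{4\,(0.35)(0.65)} = \frac{0.09}{0.91} \approx 0.0989 .
\end{aligned}
```

The shortcut makes the behavior easy to see: FST grows with the squared frequency difference and is largest, for a given difference, when the pooled frequency is far from 0.5.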

How to Aggregate Across Loci Correctly

When you have many loci, there are multiple averaging choices. The most stable practical strategy is to aggregate heterozygosity terms first, then compute a global estimate:

  1. Compute HS and HT for each locus.
  2. Average HS across loci and average HT across loci.
  3. Compute global FST = (mean HT – mean HS) / mean HT.

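This avoids the distortion that arises when per-locus ratios are averaged directly: loci with very small HT produce noisy ratios, yet they would count as much as highly informative loci. The calculator above reports per-locus values and a global estimate derived from the aggregated heterozygosity components.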
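A short sketch of the difference between the two averaging choices, assuming equal population weights and made-up example frequencies. The function names are hypothetical; `mean_of_locus_fst` is included only for comparison and is not the recommended estimate.

```python
def heterozygosity_terms(p1, p2):
    """Return (HS, HT) for one biallelic locus with equal weighting."""
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    return hs, ht


def global_fst(freqs1, freqs2):
    """Ratio of averages: (mean HT - mean HS) / mean HT across loci."""
    if not freqs1 or len(freqs1) != len(freqs2):
        raise ValueError("need the same non-empty set of loci in both populations")
    terms = [heterozygosity_terms(a, b) for a, b in zip(freqs1, freqs2)]
    mean_hs = sum(hs for hs, _ in terms) / len(terms)
    mean_ht = sum(ht for _, ht in terms) / len(terms)
    if mean_ht == 0:
        return None  # no variation at any locus
    return (mean_ht - mean_hs) / mean_ht


def mean_of_locus_fst(freqs1, freqs2):
    """Average of per-locus ratios, skipping loci with HT = 0 (comparison only)."""
    ratios = []
    for a, b in zip(freqs1, freqs2):
        hs, ht = heterozygosity_terms(a, b)
        if ht > 0:
            ratios.append((ht - hs) / ht)
    return sum(ratios) / len(ratios) if ratios else None


# Illustrative frequencies only; the second locus is nearly monomorphic.
pop1 = [0.20, 0.02, 0.50]
pop2 = [0.50, 0.00, 0.45]
print("ratio of averages:", global_fst(pop1, pop2))
print("average of ratios:", mean_of_locus_fst(pop1, pop2))
```

In the ratio-of-averages version, the nearly monomorphic locus contributes almost nothing to either mean, so it cannot pull the global estimate around the way it does in the average of ratios.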

Interpreting Magnitude: Useful Benchmarks

Interpretation depends on species, marker type, demographic history, and sampling design, but classic guideposts are still widely used:

  • 0.00 to 0.05: little differentiation
  • 0.05 to 0.15: moderate differentiation
  • 0.15 to 0.25: great differentiation
  • above 0.25: very great differentiation

These are rough bins, not strict laws. In high-dispersal marine species, FST around 0.02 may already indicate meaningful structure. In strongly subdivided plant systems, values above 0.30 can be common.

Comparison Table: Human Population-Scale Pairwise FST (Approximate Genome-Wide Values)

The following values are representative ranges frequently reported in large-scale human datasets and synthesis papers. Exact numbers depend on marker filtering, estimator, and population definitions.

Population Pair (Broad Groups) | Approximate Genome-Wide FST | Interpretation
African vs European            | 0.10 to 0.13                | Moderate to high differentiation
African vs East Asian          | 0.13 to 0.16                | High differentiation in global human context
European vs East Asian         | 0.08 to 0.11                | Moderate differentiation
European vs South Asian        | 0.03 to 0.06                | Low to moderate differentiation
East Asian vs South Asian      | 0.05 to 0.09                | Moderate differentiation

Comparison Table: Typical FST Ranges Across Organisms (Published Study Ranges)

Organism / System                                           | Reported FST Range | General Context
Humans (continental-scale comparisons)                      | ~0.05 to 0.16      | Structure shaped by migration history and geography
Drosophila melanogaster (regional to continental contrasts) | ~0.08 to 0.20      | Demography and local adaptation both contribute
Arabidopsis thaliana (regional ecotypes)                    | ~0.20 to 0.70      | Strong subdivision can occur with selfing and geography
Salmonid river populations                                  | ~0.01 to 0.15      | Fine-scale natal homing can create measurable structure

Common Mistakes That Inflate or Deflate FST

  1. Using mismatched loci: Population 1 and Population 2 must list the same markers in the same order, with frequencies reported for the same allele at each marker.
  2. Ignoring sample size imbalance: Weighted estimates can be more stable when n differs strongly.
  3. Combining low-quality genotypes: Missingness and genotyping error can bias allele frequencies.
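  4. Interpreting negative FST literally: With consistent weighting, the (HT – HS) / HT formula cannot go below 0, but bias-corrected estimators can. Small negative values from those estimators are usually sampling noise in finite samples and are read as no detectable differentiation.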
  5. Over-interpreting small datasets: A handful of loci cannot represent genome-wide structure reliably.
  6. Ignoring ascertainment bias: SNP panel design may distort differentiation patterns across populations.

Estimator Notes: Nei, Weir and Cockerham, and Hudson

The equation used in this calculator follows the heterozygosity partitioning logic that is often introduced as Nei-style GST/FST for biallelic loci. In many research pipelines, you may instead use Weir and Cockerham theta or Hudson’s estimator, especially for sequence data and unequal samples. These alternatives differ in bias properties and finite-sample behavior but target related concepts of population differentiation. For publication-grade analysis, explicitly report the estimator, filtering thresholds, confidence intervals, and bootstrap strategy.
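For comparison, here is a minimal sketch of Hudson's estimator in the form commonly attributed to Bhatia et al. (2013). It is not the estimator used by the calculator. It assumes biallelic loci, that `n1` and `n2` count sampled allele copies (twice the number of diploid individuals), and that the same sample sizes apply to every locus. The function names are hypothetical.

```python
def hudson_terms(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's FST for one biallelic locus.

    p1, p2: sample allele frequencies.
    n1, n2: sampled allele copies per population (assumption: 2 x individuals).
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each population needs at least 2 sampled allele copies")
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1 - p1) / (n1 - 1)   # finite-sample correction, population 1
        - p2 * (1 - p2) / (n2 - 1)   # finite-sample correction, population 2
    )
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator, denominator


def hudson_fst(freqs1, freqs2, n1, n2):
    """Multi-locus estimate: sum of numerators over sum of denominators."""
    if len(freqs1) != len(freqs2):
        raise ValueError("both populations need the same loci, in the same order")
    pairs = [hudson_terms(a, b, n1, n2) for a, b in zip(freqs1, freqs2)]
    total_num = sum(num for num, _ in pairs)
    total_den = sum(den for _, den in pairs)
    return total_num / total_den if total_den > 0 else None


print(hudson_fst([0.20], [0.50], n1=40, n2=40))
print(hudson_fst([0.30], [0.30], n1=20, n2=20))  # negative: identical sample frequencies
```

For the same frequencies (0.20 and 0.50), this estimator returns a noticeably larger value than the heterozygosity-based formula, and it goes slightly negative when the sample frequencies are identical. Both behaviors illustrate why values from different estimators should not be compared directly.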

Best Practices for Publication-Quality FST Analysis

  • Use large marker panels after strict quality control.
  • Report both per-locus and genome-wide summaries.
  • Provide confidence intervals using block-bootstrap or jackknife methods.
  • Include sensitivity analyses across minor allele frequency cutoffs.
  • Cross-check structure with PCA, ADMIXTURE-like models, or clustering methods.
  • Interpret FST jointly with demographic and ecological context, not as a standalone proof of adaptation.
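The confidence-interval recommendation can be illustrated with a delete-one-block jackknife around the ratio-of-averages estimate. This is a simplified sketch with hypothetical function names. It assumes loci are already ordered along the genome, uses equal population weights, splits loci into contiguous blocks of near-equal size, and requires that at least one polymorphic locus remains after each block is removed. Production analyses typically define blocks by physical or genetic distance and may weight unequal blocks.

```python
import math


def _hs_ht(p1, p2):
    """(HS, HT) for one biallelic locus with equal weighting."""
    hs = p1 * (1 - p1) + p2 * (1 - p2)   # equals (H1 + H2) / 2
    p_bar = (p1 + p2) / 2
    return hs, 2 * p_bar * (1 - p_bar)


def _ratio(terms):
    """Global FST from summed heterozygosity terms."""
    sum_hs = sum(hs for hs, _ in terms)
    sum_ht = sum(ht for _, ht in terms)
    return (sum_ht - sum_hs) / sum_ht


def block_jackknife_fst(freqs1, freqs2, n_blocks=5):
    """Return (global FST, jackknife standard error)."""
    if len(freqs1) != len(freqs2):
        raise ValueError("both populations need the same loci, in the same order")
    terms = [_hs_ht(a, b) for a, b in zip(freqs1, freqs2)]
    if n_blocks < 2 or n_blocks > len(terms):
        raise ValueError("n_blocks must be between 2 and the number of loci")
    estimate = _ratio(terms)
    # Contiguous block boundaries computed with integer arithmetic.
    bounds = [i * len(terms) // n_blocks for i in range(n_blocks + 1)]
    leave_one_out = [
        _ratio(terms[:start] + terms[stop:])
        for start, stop in zip(bounds, bounds[1:])
    ]
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (value - mean_loo) ** 2 for value in leave_one_out
    )
    return estimate, math.sqrt(variance)


# Illustrative frequencies only.
pop1 = [0.20, 0.10, 0.50, 0.35, 0.80, 0.62, 0.05, 0.44, 0.71, 0.29]
pop2 = [0.50, 0.12, 0.45, 0.60, 0.75, 0.30, 0.09, 0.40, 0.90, 0.33]
fst, se = block_jackknife_fst(pop1, pop2, n_blocks=5)
print(f"FST = {fst:.4f}, jackknife SE = {se:.4f}")
```

An approximate 95% interval is the estimate plus or minus 1.96 standard errors, which is only reasonable when there are many blocks. Ten loci in five blocks, as in this toy input, is far too few for real inference.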

Practical takeaway: FST is most informative when computed across many loci, with transparent estimator choice, careful QC, and context-aware interpretation. Use the calculator for rapid exploration, then validate with full population-genetic workflows for final conclusions.
