How To Calculate The Fraction Of Missing Information

Fraction of Missing Information Calculator

Estimate how much uncertainty in your pooled estimate is due to missing data using multiple imputation metrics.

Formula uses pooled total variance: T = W + (1 + 1/m)B.

How to Calculate the Fraction of Missing Information (FMI): An Expert Practical Guide

The fraction of missing information, usually abbreviated as FMI, is one of the most useful diagnostics in modern missing-data analysis. If you already use multiple imputation, FMI tells you something deeper than simple percent missing. It measures the share of uncertainty in your final estimate that comes from not observing all values. Two datasets can both have 20% missing values, yet have very different FMI if the missingness pattern, model strength, and variable relationships differ.

In practical terms, FMI helps you answer a decision-making question: How much of my inferential uncertainty is driven by missingness rather than natural sampling variation? A low FMI means most uncertainty is intrinsic to the data-generating process. A high FMI means missingness is a major source of inferential instability, and results can change materially depending on imputation assumptions.

Core Concept: Why FMI Is Better Than Percent Missing Alone

Percent missing is descriptive. FMI is inferential. Percent missing tells you how many data points are absent. FMI tells you how those absences propagate into confidence intervals, hypothesis tests, and model precision after pooling across imputations. This is why FMI is routinely reported in strong multiple-imputation workflows, particularly in epidemiology, biostatistics, economics, and social science.

  • Percent missing: raw data completeness metric.
  • FMI: variance-based inferential metric tied to pooled uncertainty.
  • Use case: deciding whether to increase imputations, run sensitivity analyses, or strengthen auxiliary variables.

The Formula You Need

Under Rubin’s combining rules, let:

  • m = number of imputations
  • W = average within-imputation variance
  • B = between-imputation variance
  • T = pooled total variance = W + (1 + 1/m)B

Then the large-sample FMI is:

FMI = ((1 + 1/m)B) / T

This can be interpreted as the proportion of total variance attributable to missingness. If FMI = 0.30, then about 30% of your uncertainty is due to incomplete information.
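The formula above is simple enough to compute by hand, but a small helper keeps the pieces labeled. Here is a minimal Python sketch; the function name `fmi_large_sample` is just an illustrative choice:

```python
def fmi_large_sample(w: float, b: float, m: int) -> float:
    """Large-sample fraction of missing information under Rubin's rules.

    w: average within-imputation variance (W)
    b: between-imputation variance (B)
    m: number of imputations
    """
    b_adj = (1 + 1 / m) * b   # between component, inflated for finite m
    t = w + b_adj             # pooled total variance T
    return b_adj / t

# With W = 0.85, B = 0.22, m = 20 this returns about 0.214,
# matching the worked example later in this guide.
```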

Finite-m Adjustment (Optional but Useful)

When m is not very large, some analysts prefer a finite-m adjusted form based on the relative increase in variance:

  • r = ((1 + 1/m)B) / W
  • df ≈ (m − 1)(1 + 1/r)²
  • Adjusted FMI ≈ (r + 2/(df + 3)) / (r + 1)

This usually gives a slightly larger value than the large-sample formula when m is small.
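The finite-m adjustment translates directly into code. A sketch, again with an illustrative function name:

```python
def fmi_adjusted(w: float, b: float, m: int) -> float:
    """Finite-m adjusted FMI based on the relative increase in variance."""
    r = (1 + 1 / m) * b / w          # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2  # large-sample degrees of freedom
    return (r + 2 / (df + 3)) / (r + 1)

# For W = 0.85, B = 0.22, m = 20 this gives about 0.217,
# slightly above the large-sample value of about 0.214.
```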

Step-by-Step Manual Calculation

  1. Fit your analysis model in each imputed dataset.
  2. Extract each model’s variance estimate for the parameter of interest.
  3. Compute W as the average of those variances.
  4. Compute B as the variance of the parameter estimates across imputations.
  5. Choose m (the number of imputations already run).
  6. Compute T = W + (1 + 1/m)B.
  7. Compute FMI = ((1 + 1/m)B)/T.
  8. Report FMI as fraction or percentage and interpret in context.
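The steps above can be sketched end to end. This minimal Python example assumes you have already extracted per-imputation point estimates and squared standard errors; the function name `pool_fmi` is hypothetical:

```python
from statistics import mean, variance

def pool_fmi(estimates: list[float], variances: list[float]) -> dict:
    """Steps 3-7: pool per-imputation results and compute the large-sample FMI.

    estimates: point estimates of the parameter, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    """
    m = len(estimates)
    w = mean(variances)      # step 3: average within-imputation variance
    b = variance(estimates)  # step 4: between-imputation variance (denominator m - 1)
    b_adj = (1 + 1 / m) * b
    t = w + b_adj            # step 6: pooled total variance
    return {"m": m, "W": w, "B": b, "T": t, "FMI": b_adj / t}

result = pool_fmi([1.0, 1.2, 0.9, 1.1, 1.0], [0.04] * 5)
```

Mainstream implementations (for example, `mice::pool()` in R or `MICEData` in statsmodels) report these same quantities automatically; the sketch is only meant to make the arithmetic transparent.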

Worked Example

Suppose you have m = 20 imputations for a regression coefficient. From pooling diagnostics, you obtain W = 0.85 and B = 0.22.

  • Adjustment factor: (1 + 1/m) = 1.05
  • Adjusted between component: 1.05 × 0.22 = 0.231
  • Total variance: T = 0.85 + 0.231 = 1.081
  • FMI = 0.231 / 1.081 = 0.2137

Interpretation: roughly 21.4% of uncertainty in this parameter estimate is attributable to missing information.

Comparison Table 1: Relative Efficiency by m and FMI

A useful companion metric is relative efficiency (RE), computed as RE = 1 / (1 + FMI/m). This helps choose how many imputations are enough for stable inference.

| FMI  | m = 5 | m = 10 | m = 20 | m = 40 |
|------|-------|--------|--------|--------|
| 0.10 | 0.980 | 0.990  | 0.995  | 0.998  |
| 0.30 | 0.943 | 0.971  | 0.985  | 0.993  |
| 0.50 | 0.909 | 0.952  | 0.976  | 0.988  |
| 0.70 | 0.877 | 0.935  | 0.966  | 0.983  |

These values are exact outputs from the RE formula. They show why high-FMI settings benefit from larger m. If FMI is near 0.50 or higher, using only 5 imputations can leave noticeable Monte Carlo loss in efficiency.
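The table can be reproduced in a few lines of Python, which is also a convenient way to check other (FMI, m) combinations for your own study:

```python
def relative_efficiency(fmi: float, m: int) -> float:
    """Relative efficiency of m imputations versus infinitely many."""
    return 1 / (1 + fmi / m)

# Regenerate the comparison table above.
for fmi in (0.10, 0.30, 0.50, 0.70):
    row = [round(relative_efficiency(fmi, m), 3) for m in (5, 10, 20, 40)]
    print(fmi, row)
```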

Comparison Table 2: Public-Sector Nonresponse Statistics That Motivate FMI Analysis

Missing information is not a niche issue. Major U.S. data systems routinely face nonresponse and incomplete observations. The following published statistics illustrate why analysts should quantify uncertainty due to missingness, not just list missing counts.

| Survey / Program | Reported Statistic | Latest Public Figure (from source pages) | Why FMI Matters |
|------------------|--------------------|------------------------------------------|-----------------|
| 2020 U.S. Census | National self-response rate | 66.8% | Substantial non-self-response requires follow-up and modeling; uncertainty decomposition is critical. |
| National Health Interview Survey (CDC) | Sample adult response rate | About 47% in recent release years | Health estimates can be sensitive to item and unit nonresponse assumptions. |
| Household Pulse Survey (Census) | Weekly/phase response rates | Single-digit percentages in several phases | Low response intensifies risk that missingness influences precision and bias. |

These numbers are drawn from federal technical documentation and release pages and are included to demonstrate scale. Always verify the exact current value for your cycle and instrument because rates vary by year, phase, and component.

How to Interpret FMI in Practice

Rule-of-thumb ranges

  • FMI under 0.10: missingness contributes modestly to uncertainty.
  • FMI 0.10 to 0.30: meaningful but often manageable with robust imputation design.
  • FMI 0.30 to 0.50: inference is materially dependent on missing-data handling.
  • FMI above 0.50: missingness is dominant; sensitivity analyses are strongly recommended.

These are pragmatic guidelines, not hard cutoffs. Context matters. In high-stakes domains such as clinical outcomes or policy targeting, even moderate FMI can justify deeper robustness checks.

Common Analyst Mistakes

  1. Confusing FMI with percent missing. They are related but not interchangeable.
  2. Using too few imputations when FMI is high. Increase m to improve stability.
  3. Ignoring model compatibility. Imputation and analysis models should align on key structure and transformations.
  4. Skipping diagnostics by parameter. FMI differs across coefficients, not just across datasets.
  5. Reporting pooled estimates without uncertainty decomposition. Include W, B, T, and FMI where possible.

Improving FMI Outcomes

You cannot always reduce missingness after data collection, but you can improve inferential quality:

  • Add strong auxiliary predictors in imputation models.
  • Respect variable distributions and constraints (bounds, skew, categories).
  • Increase m when FMI is elevated.
  • Run sensitivity analyses under plausible MNAR scenarios.
  • Document assumptions transparently.

MCAR, MAR, MNAR and Why FMI Alone Is Not a Bias Test

FMI quantifies uncertainty inflation due to missingness, but it does not prove your estimates are unbiased. Under MCAR or MAR with a correctly specified imputation model, pooled inference is often reliable. Under MNAR, even small FMI can coexist with meaningful bias if missingness depends on unobserved values in ways your model does not capture.

Treat FMI as one key diagnostic in a broader missing-data workflow that includes design knowledge, missingness mechanism reasoning, model checks, and sensitivity analysis.

Recommended Reporting Template

  1. State missing-data handling strategy (for example, multiple imputation with chained equations).
  2. Report m, average W, between B, and total T for principal parameters.
  3. Report FMI per key parameter, not only overall.
  4. Report relative efficiency and rationale for chosen m.
  5. Include at least one sensitivity analysis if FMI is moderate or high.


Final Takeaway

If you are serious about valid inference with incomplete data, calculating FMI should be standard. It translates missingness into a directly interpretable variance share and helps you choose imputation depth, communicate uncertainty honestly, and prioritize sensitivity checks. Use the calculator above to get quick estimates, then pair those results with domain-informed model diagnostics and transparent reporting.
