Safe Failure Fraction Calculator
Compute SFF, diagnostic coverage, and expected failure distribution for functional safety decisions.
How to Calculate Safe Failure Fraction: Expert Guide for Practical Functional Safety Engineering
Safe Failure Fraction, usually abbreviated as SFF, is one of the most important metrics in hardware functional safety. If you design, evaluate, or procure safety-related systems, you will often see SFF used alongside terms such as dangerous undetected failures, diagnostic coverage, proof test interval, and SIL capability. At a practical level, SFF tells you how much of your total failure behavior is either inherently safe or detected before it can create an uncontrolled hazardous event. The higher the SFF, the smaller the portion of hidden dangerous behavior in your device architecture.
In many industries, teams focus too heavily on a single headline percentage and too little on data quality, assumptions, and proof test realism. A better approach is to treat SFF as a decision metric inside a broader safety lifecycle. That means combining failure mode analysis, realistic diagnostic claims, tested failure rates, and periodic validation. This guide walks through the full method in a way that can be used by controls engineers, reliability specialists, process safety teams, and compliance auditors.
Core definition and formula
The classic hardware level expression is:
SFF (%) = (lambda S + lambda DD) / (lambda S + lambda DD + lambda DU) x 100
- lambda S: safe failure rate. Failures that do not lead to a dangerous state.
- lambda DD: dangerous detected failure rate. Hazardous failures that diagnostics can identify.
- lambda DU: dangerous undetected failure rate. Hazardous failures not revealed by diagnostics in normal operation.
You can enter these rates in failures per hour, or in FIT. One FIT is one failure per one billion hours. The calculator above accepts either unit and normalizes internally. The interpretation remains the same: more contribution from safe and detected modes means higher SFF, while high lambda DU pulls SFF down.
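The formula and the unit note above can be sketched in a few lines. This is an illustrative helper, not part of any standard library; the function name `sff` and the constant are assumptions for the example. Note that because SFF is a ratio, FIT inputs give the same result as failures-per-hour inputs, as long as all three rates share one unit.

```python
FIT_TO_PER_HOUR = 1e-9  # 1 FIT = 1 failure per 10^9 hours

def sff(lambda_s: float, lambda_dd: float, lambda_du: float) -> float:
    """Safe Failure Fraction in percent; all rates must share one unit."""
    total = lambda_s + lambda_dd + lambda_du
    if total == 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total * 100.0

# Same answer whether rates are entered in FIT or in failures per hour:
print(sff(120, 60, 20))                                  # FIT inputs
print(sff(120e-9, 60e-9, 20e-9))                         # failures/hour inputs
```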
Why SFF matters in real projects
A high SFF typically indicates that your architecture and diagnostics are doing a good job of preventing hidden dangerous behavior. In standards based design, this influences hardware fault tolerance expectations and achievable integrity claims. But SFF is not a substitute for complete risk assessment. You still need demand rate assumptions, common cause analysis, proof testing strategy, and operational constraints.
In practice, teams use SFF in three ways:
- Concept and architecture screening: compare sensor technologies, final elements, or logic solver options before final design freeze.
- Detailed verification: justify safety function capability with documented failure mode split and diagnostics.
- Lifecycle governance: monitor whether field behavior and maintenance quality support original assumptions.
Step-by-step calculation workflow
- Build a failure mode inventory. Start with FMEDA or equivalent evidence. Every known hardware failure mode should be classified as safe, dangerous detected, or dangerous undetected.
- Assign rates with traceability. Pull base failure rates from validated sources, vendor reliability data, stress based estimates, or field feedback. Keep assumptions explicit.
- Normalize units. Convert all rates to the same basis before adding them. If using FIT, divide by 1,000,000,000 to get failures per hour.
- Compute SFF. Apply the formula and document the exact numbers used.
- Compute diagnostic coverage as companion metric. Diagnostic coverage can be estimated as lambda DD / (lambda DD + lambda DU).
- Check proof test realism. Even with good SFF, long or poorly executed proof testing can leave unacceptable latent risk.
- Validate against operational context. Duty cycle, environmental stress, bypass behavior, and maintenance discipline can materially change practical risk.
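The inventory, classification, normalization, and computation steps above can be sketched as follows. The failure mode names and the list layout are hypothetical examples, not data from any real FMEDA:

```python
# Hedged sketch of the workflow: classify each failure mode, normalize
# units, then compute SFF and diagnostic coverage (DC).
FIT_TO_PER_HOUR = 1e-9

# (mode name, rate in FIT, category) with category in {"S", "DD", "DU"};
# every mode belongs to exactly one category (no double counting).
modes = [
    ("output stuck at fail-safe state",   80, "S"),
    ("drift within alarm limits",         40, "S"),
    ("ADC fault caught by self-test",     60, "DD"),
    ("reference drift, unrevealed",       20, "DU"),
]

totals = {"S": 0.0, "DD": 0.0, "DU": 0.0}
for _, fit, cat in modes:
    totals[cat] += fit * FIT_TO_PER_HOUR  # normalize FIT to failures/hour

sff = (totals["S"] + totals["DD"]) / sum(totals.values()) * 100
dc = totals["DD"] / (totals["DD"] + totals["DU"]) * 100
print(f"SFF = {sff:.1f}%, DC = {dc:.1f}%")
```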
Worked example
Assume a transmitter subsystem with the following rates in FIT:
- lambda S = 120 FIT
- lambda DD = 60 FIT
- lambda DU = 20 FIT
Total lambda = 200 FIT. SFF = (120 + 60) / 200 x 100 = 90%. Diagnostic coverage = 60 / (60 + 20) = 75%. Over a mission time of 8,760 hours (one year), the expected number of dangerous undetected failures is lambda DU x mission time = 20 x 10^-9 per hour x 8,760 hours, or about 0.000175 per device. That is low in absolute terms, but it is not zero, which is why proof tests, alarms, and independent layers remain essential.
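A few lines of Python reproduce the worked example; all numbers come from the text above, converted from FIT to failures per hour:

```python
lam_s, lam_dd, lam_du = 120e-9, 60e-9, 20e-9  # FIT converted to failures/hour
total = lam_s + lam_dd + lam_du               # 200 FIT total

sff = (lam_s + lam_dd) / total * 100          # safe failure fraction, percent
dc = lam_dd / (lam_dd + lam_du) * 100         # diagnostic coverage, percent
expected_du = lam_du * 8760                   # expected DU failures in one year

print(sff, dc, expected_du)                   # roughly 90.0, 75.0, 1.75e-4
```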
What counts as a strong SFF value
Engineers often classify SFF qualitatively:
- Below 60%: high concern. Hidden dangerous fraction is usually too large for demanding applications.
- 60% to less than 90%: moderate. May be acceptable in lower demand contexts with stronger compensating measures.
- 90% to less than 99%: strong. Often seen in mature designs with diagnostics and disciplined maintenance.
- 99% and above: very strong on paper, but verify assumptions and test coverage carefully.
These ranges are screening guides. Formal acceptance still depends on applicable standards, architecture constraints, demand mode, and documented lifecycle evidence.
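One way to encode the screening bands above is a small classifier. The thresholds mirror the qualitative ranges in this article, not any normative standard clause:

```python
def sff_band(sff_percent: float) -> str:
    """Qualitative screening band for an SFF value, per the ranges above."""
    if sff_percent < 60:
        return "high concern"
    if sff_percent < 90:
        return "moderate"
    if sff_percent < 99:
        return "strong"
    return "very strong (verify assumptions)"

print(sff_band(90.0))  # -> strong
```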
Comparison table: public safety indicators that motivate robust failure management
| Domain metric | Recent value | Why it matters to SFF thinking | Public source |
|---|---|---|---|
| US private industry recordable injury incidence rate | 2.4 cases per 100 full-time workers (2023) | Demonstrates that even mature systems need continuous risk controls and prevention design. | BLS IIF |
| US manufacturing recordable injury incidence rate | 2.8 cases per 100 full-time workers (2023) | Industrial operations remain sensitive to equipment reliability and safety instrumentation quality. | BLS IIF |
| US occupational fatal injuries | 5,283 fatalities (2023) | Highlights why latent dangerous failures must be aggressively minimized in safety-critical design. | BLS Census of Fatal Occupational Injuries |
| US roadway fatalities | 42,514 deaths (2022) | Large scale safety outcomes reinforce the importance of fail-safe and detectable failure behavior. | NHTSA |
Statistics above are included to provide broader risk context for engineering decision makers. Always check the latest releases when preparing compliance packages.
Comparison table: SFF interpretation and design focus
| SFF band | Hidden dangerous share | Typical engineering priority | Operational implication |
|---|---|---|---|
| Less than 60% | High | Add diagnostics, redesign weak components, reduce single point vulnerabilities. | High dependence on external protection layers and strict maintenance controls. |
| 60% to less than 90% | Moderate | Increase diagnostic test depth and improve fault detection latency. | May be feasible with conservative proof tests and architecture constraints. |
| 90% to less than 99% | Low | Maintain quality of diagnostics and protect against common cause failures. | Often aligns with robust safety function design when other requirements are met. |
| 99% and above | Very low | Challenge assumptions, verify claims under field conditions, avoid overconfidence. | Strong paper performance, but lifecycle governance remains mandatory. |
Frequent mistakes and how to avoid them
- Mixing units: some rates entered as FIT, others as failures per hour. Always normalize first.
- Double counting diagnostics: one failure mode should belong to one category only.
- Ignoring maintenance quality: poor proof testing can erase theoretical SFF benefits.
- Assuming vendor data is universal: environment, duty cycle, and integration details matter.
- Treating SFF as final compliance proof: SFF is necessary in many cases, but never sufficient alone.
How to improve safe failure fraction in existing equipment
- Increase on-line diagnostics and self test coverage for dangerous fault modes.
- Improve fault annunciation so dangerous detected faults are quickly acted on.
- Reduce common cause exposure through separation, diversity, and environmental controls.
- Use higher quality components in dominant dangerous undetected pathways.
- Shorten proof test intervals where operationally practical.
- Strengthen technician procedures and closed loop maintenance feedback.
- Continuously recalibrate assumptions with field return and incident data.
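The first lever, adding diagnostics, works by moving failure rate from the DU column into the DD column. A purely illustrative before/after with the worked-example rates (the 50% detection figure is an assumption, not a recommendation):

```python
def sff(s: float, dd: float, du: float) -> float:
    """Safe Failure Fraction in percent; ratio is unit-invariant."""
    return (s + dd) / (s + dd + du) * 100

lam_s, lam_dd, lam_du = 120.0, 60.0, 20.0   # FIT, from the worked example

before = sff(lam_s, lam_dd, lam_du)
# Suppose a new self-test detects half of the previously undetected
# dangerous rate: that rate moves from DU to DD, raising SFF and DC.
newly_detected = 0.5 * lam_du
after = sff(lam_s, lam_dd + newly_detected, lam_du - newly_detected)

print(before, after)  # 90.0 -> 95.0
```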
Authority resources for deeper validation
- NIST Engineering Statistics Handbook (.gov)
- US Bureau of Labor Statistics Injury and Illness Data (.gov)
- NHTSA Research and Traffic Safety Data (.gov)
Final takeaways
If you remember only one thing, remember this: safe failure fraction is a ratio of failure quality, not just failure quantity. Two devices can have similar total failure rates but very different risk profiles depending on how much of that failure behavior is safe or diagnosable. Use the calculator to quantify your current design, then use the results to drive practical improvements in diagnostics, testing, and lifecycle controls. The strongest safety programs combine sound mathematics with disciplined operational execution.