Calculate Mean Average Precision Example
Use this calculator to compute Average Precision for multiple ranked result lists and then derive Mean Average Precision (mAP). Enter one query per line using binary relevance values such as 1,0,1,1, and optionally provide the true number of relevant documents for each query.
How to Calculate Mean Average Precision: A Deep Example-Driven Guide
If you are searching for a practical, example-driven way to calculate mean average precision, you are usually working in information retrieval, search relevance evaluation, recommendation systems, ranking models, or machine learning for document retrieval. Mean Average Precision, commonly written as mAP, is one of the most useful ranking metrics because it does more than count how many relevant items appear. It also measures where those relevant items appear in the ranked list. That position-sensitive behavior makes it especially valuable when the order of results matters.
In plain terms, mean average precision rewards systems that place relevant results near the top. Two search engines might return the same number of relevant documents, but the engine that ranks them earlier should receive a better score. That is exactly where mAP becomes powerful. It combines precision values at each relevant position and averages them across multiple queries so you can evaluate system quality with more nuance.
What Mean Average Precision Actually Measures
Before computing mAP, it helps to understand its two parts:
- Average Precision (AP): Calculated for a single query. It averages the precision at every rank where a relevant item appears.
- Mean Average Precision (mAP): Calculated across many queries. It is simply the arithmetic mean of all AP scores.
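Written as formulas, with n the length of the ranked list, R the total number of relevant documents in the ground truth for a query, P(k) the precision over the top k results, rel(k) a 0/1 indicator of relevance at rank k, and Q the number of evaluation queries:

```latex
AP  = \frac{1}{R} \sum_{k=1}^{n} P(k) \cdot \mathrm{rel}(k)

mAP = \frac{1}{Q} \sum_{q=1}^{Q} AP_q
```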
Suppose a query returns five ranked documents and the relevance labels in order are 1, 0, 1, 1, 0. The relevant results appear at ranks 1, 3, and 4. To calculate AP, you compute precision at each of those relevant ranks:
- Rank 1: 1 relevant out of 1 retrieved = 1.0000
- Rank 3: 2 relevant out of 3 retrieved = 0.6667
- Rank 4: 3 relevant out of 4 retrieved = 0.7500
If the total number of relevant documents for that query is 3, then:
AP = (1.0000 + 0.6667 + 0.7500) / 3 = 0.8056
That single-query score is already informative. It says that relevant results were found and were positioned reasonably high in the ranking. But real evaluation almost always uses multiple queries. Once you calculate AP for each query, you average them to get mAP.
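Here is a minimal Python sketch of that single-query calculation. The function name and signature are purely illustrative, not taken from any particular library:

```python
def average_precision(relevance, total_relevant=None):
    """Compute AP for one query from a binary relevance list ordered by rank."""
    if total_relevant is None:
        total_relevant = sum(relevance)   # fall back to the number of retrieved relevant items
    if total_relevant == 0:
        return 0.0                        # one possible policy for queries with no relevant documents
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / total_relevant

print(round(average_precision([1, 0, 1, 1, 0], total_relevant=3), 4))  # 0.8056
```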
Step-by-Step Mean Average Precision Example
Let us walk through a more complete example using three queries. This mirrors the kind of input accepted by the calculator above.
| Query | Ranked Relevance List | Total Relevant Documents | Relevant Ranks |
|---|---|---|---|
| Q1 | 1, 0, 1, 1, 0 | 3 | 1, 3, 4 |
| Q2 | 0, 1, 1, 0, 1 | 4 | 2, 3, 5 |
| Q3 | 1, 1, 0, 0, 1 | 3 | 1, 2, 5 |
Now calculate AP for each query.
Query 1 AP Calculation
For Q1, the ranking is 1, 0, 1, 1, 0.
- Precision at rank 1 = 1/1 = 1.0000
- Precision at rank 3 = 2/3 = 0.6667
- Precision at rank 4 = 3/4 = 0.7500
AP(Q1) = (1.0000 + 0.6667 + 0.7500) / 3 = 0.8056
Query 2 AP Calculation
For Q2, the ranking is 0, 1, 1, 0, 1, but the total number of relevant documents is 4. That means one relevant document exists but was not retrieved in the shown ranking.
- Precision at rank 2 = 1/2 = 0.5000
- Precision at rank 3 = 2/3 = 0.6667
- Precision at rank 5 = 3/5 = 0.6000
AP(Q2) = (0.5000 + 0.6667 + 0.6000) / 4 = 0.4417
This is an important example because it shows why the denominator matters. If you divided by only the retrieved relevant results, you would inflate the score and hide the fact that the system missed one relevant item entirely.
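To make the difference concrete, here is a short sketch that reuses the illustrative average_precision helper from above; the only change between the two calls is the denominator:

```python
relevance_q2 = [0, 1, 1, 0, 1]

# Correct: divide by the 4 relevant documents in the ground truth.
print(round(average_precision(relevance_q2, total_relevant=4), 4))  # 0.4417

# Inflated: dividing by the 3 retrieved relevant hits hides the missed document.
print(round(average_precision(relevance_q2), 4))                    # 0.5889
```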
Query 3 AP Calculation
For Q3, the ranking is 1, 1, 0, 0, 1.
- Precision at rank 1 = 1/1 = 1.0000
- Precision at rank 2 = 2/2 = 1.0000
- Precision at rank 5 = 3/5 = 0.6000
AP(Q3) = (1.0000 + 1.0000 + 0.6000) / 3 = 0.8667
Final mAP Calculation
Now average the three AP values:
- AP(Q1) = 0.8056
- AP(Q2) = 0.4417
- AP(Q3) = 0.8667
mAP = (0.8056 + 0.4417 + 0.8667) / 3 = 0.7046 (computed from the unrounded AP values; the figures shown are rounded to four decimals)
That means the ranking system has a mean average precision of approximately 0.7046 over the three evaluation queries.
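The whole three-query example can be reproduced in a few lines of Python, again using the illustrative average_precision helper defined earlier:

```python
queries = [
    ([1, 0, 1, 1, 0], 3),  # Q1
    ([0, 1, 1, 0, 1], 4),  # Q2
    ([1, 1, 0, 0, 1], 3),  # Q3
]

ap_scores = [average_precision(rel, total) for rel, total in queries]
mean_ap = sum(ap_scores) / len(ap_scores)

print([round(ap, 4) for ap in ap_scores])  # [0.8056, 0.4417, 0.8667]
print(round(mean_ap, 4))                   # 0.7046
```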
Why mAP Is So Widely Used
Mean Average Precision is popular because it captures multiple qualities at once. It rewards retrieval systems that:
- Return many relevant items
- Place relevant items earlier in the ranking
- Perform consistently across multiple queries
- Do not hide weak performance on difficult queries
Unlike accuracy, which is often too coarse for ranked retrieval, mAP recognizes that the first relevant result at rank 1 is more valuable than the first relevant result at rank 20. In search engines, legal document retrieval, biomedical literature search, product discovery, and academic information systems, this ranking sensitivity is essential.
mAP Versus Precision, Recall, and NDCG
People often compare mAP with other retrieval metrics. Each metric answers a slightly different question, so choosing the right one depends on your use case.
| Metric | Best For | Strength | Limitation |
|---|---|---|---|
| Precision | Snapshot quality of retrieved items | Simple and intuitive | Ignores where relevant items sit within the ranking |
| Recall | Coverage of all relevant items | Useful when missing relevant documents is costly | Does not reflect ranking order |
| Average Precision | Single-query ranked relevance | Rewards early relevant hits | Needs reliable relevance judgments |
| Mean Average Precision | Multi-query ranking evaluation | Balances relevance and rank across queries | Can be harder to explain to non-technical stakeholders |
| NDCG | Graded relevance scenarios | Handles multiple relevance levels well | Less direct when labels are binary only |
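To see how these metrics answer different questions, the sketch below scores the same Q1 ranking (1, 0, 1, 1, 0 with 3 relevant documents) with Precision@3, Recall@5, and AP; the helper names are again illustrative rather than standard library functions:

```python
def precision_at_k(relevance, k):
    """Fraction of the top k results that are relevant."""
    return sum(relevance[:k]) / k

def recall_at_k(relevance, k, total_relevant):
    """Fraction of all relevant documents that appear in the top k."""
    return sum(relevance[:k]) / total_relevant

q1 = [1, 0, 1, 1, 0]

print(round(precision_at_k(q1, 3), 4))                 # 0.6667 -> ignores where the hits sit inside the cutoff
print(round(recall_at_k(q1, 5, total_relevant=3), 4))  # 1.0    -> full coverage, order ignored
print(round(average_precision(q1, 3), 4))              # 0.8056 -> rewards the early hits at ranks 1, 3, 4
```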
Common Mistakes When You Calculate Mean Average Precision
Even experienced practitioners make avoidable mistakes with AP and mAP. Here are the most common pitfalls:
- Using the wrong denominator: AP should divide by the total number of relevant documents for the query, not just the number retrieved.
- Ignoring queries with no relevant documents: Depending on the benchmark, those queries may be excluded or specially handled. You should define your policy clearly.
- Confusing AP with precision at k: Precision@k looks only at the top k results, while AP considers the set of relevant hits across the ranking.
- Assuming mAP is the same as the average of raw precision values: It is not. mAP averages AP scores, and AP itself is built from precision at relevant ranks.
- Using inconsistent relevance judgments: mAP is only as trustworthy as your labeling process.
Interpreting a Good Mean Average Precision Score
There is no universal threshold for a “good” mAP score. A strong value in one domain may be poor in another. Enterprise search, web search, medical search, and image retrieval all have different baselines and data complexity. Instead of asking whether 0.70 is objectively good, ask these questions:
- How does the score compare with your current production baseline?
- Does the new model improve mAP on the same test set?
- Is the gain statistically and operationally meaningful?
- Do the hardest or most valuable queries improve?
In practice, relative improvement often matters more than an isolated absolute score. A move from 0.42 to 0.49 can be a major business win if the dataset is difficult and the benchmark is rigorous.
When to Use Optional Relevant Counts
The calculator above includes an optional field for total relevant counts because many real-world rankings show only the top N retrieved documents, not the entire collection. In that situation, your ranking might display three relevant hits while the ground truth says there are five relevant documents overall. AP should still divide by five. This prevents retrieval systems from looking artificially stronger than they are.
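As a hedged illustration with the same average_precision sketch from earlier (the specific ranking below is invented for the example): suppose the system displays only its top five results, three of which are relevant, while the judgments list five relevant documents in the collection.

```python
top_5 = [1, 0, 1, 1, 0]   # only the displayed ranking; two judged-relevant documents were never retrieved

# Divide by the 5 judged relevant documents, not the 3 visible hits.
print(round(average_precision(top_5, total_relevant=5), 4))  # 0.4833
```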
This distinction becomes critical in benchmarking tasks such as ad hoc search, passage retrieval, e-commerce ranking, and test collections used in evaluation campaigns. Official benchmarking programs, including those associated with retrieval research communities, typically define relevance judgments independently of the returned ranking.
Practical Uses of Mean Average Precision
mAP appears in many areas of applied machine learning and search engineering:
- Search engines: Evaluating how well relevant documents appear near the top of results.
- Recommendation systems: Measuring ranked recommendation quality when relevance can be treated as binary.
- Question answering retrieval: Testing whether supporting passages are surfaced early.
- Legal and compliance discovery: Ranking critical records for review workflows.
- Academic and scientific search: Comparing ranking models across many queries.
How This Calculator Helps You Work Faster
This page is built to make working through a mean average precision example faster. Instead of manually computing precision at every relevant hit on paper, you can paste ranked relevance lines and immediately get:
- Per-query Average Precision
- Total query count
- Final mAP score
- A visual chart of AP by query
- A compact breakdown you can use in reports, audits, or model reviews
That combination is useful for students, SEO analysts, machine learning engineers, IR researchers, and product teams comparing ranking experiments.
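If you prefer to reproduce the workflow offline, a small script can mimic the calculator. The input format below (comma-separated relevance values, optionally followed by a pipe and the total relevant count) is an assumption made for this sketch, not the calculator's exact syntax:

```python
def parse_query_line(line):
    """Parse a line such as '1,0,1,1,0 | 3' (the '|' separator is an illustrative assumption)."""
    if "|" in line:
        rel_part, total_part = line.split("|", 1)
        total = int(total_part.strip())
    else:
        rel_part, total = line, None
    relevance = [int(x) for x in rel_part.strip().split(",")]
    return relevance, total

lines = ["1,0,1,1,0 | 3", "0,1,1,0,1 | 4", "1,1,0,0,1 | 3"]
ap_scores = [average_precision(rel, total) for rel, total in map(parse_query_line, lines)]
print(round(sum(ap_scores) / len(ap_scores), 4))  # 0.7046
```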
Additional Learning Resources and References
For broader context on evaluation methodology, retrieval benchmarks, and academic foundations, these resources are worth reviewing:
- NIST TREC — a long-running benchmark initiative from the U.S. National Institute of Standards and Technology focused on information retrieval evaluation.
- Stanford Introduction to Information Retrieval — a respected educational reference from Stanford that explains ranking metrics, precision, recall, and retrieval systems.
- Cornell Information Retrieval course materials — university-level resources that help connect ranking theory with implementation and evaluation practice.
Final Takeaway
If you want a reliable metric for ranked retrieval, mAP remains one of the most practical and interpretable choices. It captures not only whether your system found relevant documents, but whether it ranked them early enough to matter. A strong mean average precision example should always show three things clearly: the ranked relevance list, the precision at each relevant position, and the final averaging step across queries. Once you understand those building blocks, evaluating search quality becomes much more systematic and much less mysterious.