Calculate Mean Average Precision Example
Use this calculator to compute Average Precision for multiple ranked result lists and then derive Mean Average Precision (mAP). Enter one query per line using binary relevance values such as 1,0,1,1, and optionally provide the true number of relevant documents for each query.
How to Calculate Mean Average Precision: A Deep Example-Driven Guide
If you are searching for a practical, example-driven way to calculate mean average precision, you are usually working in information retrieval, search relevance evaluation, recommendation systems, ranking models, or machine learning for document retrieval. Mean Average Precision, commonly written as mAP, is one of the most useful ranking metrics because it does more than count how many relevant items appear. It also measures where those relevant items appear in the ranked list. That position-sensitive behavior makes it especially valuable when the order of results matters.
In plain terms, mean average precision rewards systems that place relevant results near the top. Two search engines might return the same number of relevant documents, but the engine that ranks them earlier should receive a better score. That is exactly where mAP becomes powerful. It combines precision values at each relevant position and averages them across multiple queries so you can evaluate system quality with more nuance.
What Mean Average Precision Actually Measures
Before computing mAP, it helps to understand its two parts:
- Average Precision (AP): Calculated for a single query. It averages the precision at every rank where a relevant item appears.
- Mean Average Precision (mAP): Calculated across many queries. It is simply the arithmetic mean of all AP scores.
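Written as formulas, with n the length of the ranked list, R the total number of relevant documents in the ground truth for a query, P(k) the precision over the top k results, rel(k) a 0/1 indicator of relevance at rank k, and Q the number of evaluation queries:

```latex
AP  = \frac{1}{R} \sum_{k=1}^{n} P(k) \cdot \mathrm{rel}(k)

mAP = \frac{1}{Q} \sum_{q=1}^{Q} AP_q
```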
Suppose a query returns five ranked documents and the relevance labels in order are 1, 0, 1, 1, 0. The relevant results appear at ranks 1, 3, and 4. To calculate AP, you compute precision at each of those relevant ranks:
- Rank 1: 1 relevant out of 1 retrieved = 1.0000
- Rank 3: 2 relevant out of 3 retrieved = 0.6667
- Rank 4: 3 relevant out of 4 retrieved = 0.7500
If the total number of relevant documents for that query is 3, then:
AP = (1.0000 + 0.6667 + 0.7500) / 3 = 0.8056
That single-query score is already informative. It says that relevant results were found and were positioned reasonably high in the ranking. But real evaluation almost always uses multiple queries. Once you calculate AP for each query, you average them to get mAP.
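Here is a minimal Python sketch of that single-query calculation. The function name and signature are purely illustrative, not taken from any particular library:

```python
def average_precision(relevance, total_relevant=None):
    """Compute AP for one query from a binary relevance list ordered by rank."""
    if total_relevant is None:
        total_relevant = sum(relevance)   # fall back to the number of retrieved relevant items
    if total_relevant == 0:
        return 0.0                        # one possible policy for queries with no relevant documents
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / total_relevant

print(round(average_precision([1, 0, 1, 1, 0], total_relevant=3), 4))  # 0.8056
```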
Step-by-Step Mean Average Precision Example
Let us walk through a more complete example using three queries. This mirrors the kind of input accepted by the calculator above.
| Query | Ranked Relevance List | Total Relevant Documents | Relevant Ranks |
|---|---|---|---|
| Q1 | 1, 0, 1, 1, 0 | 3 | 1, 3, 4 |
| Q2 | 0, 1, 1, 0, 1 | 4 | 2, 3, 5 |
| Q3 | 1, 1, 0, 0, 1 | 3 | 1, 2, 5 |
Now calculate AP for each query.
Query 1 AP Calculation
For Q1, the ranking is 1, 0, 1, 1, 0.
- Precision at rank 1 = 1/1 = 1.0000
- Precision at rank 3 = 2/3 = 0.6667
- Precision at rank 4 = 3/4 = 0.7500
AP(Q1) = (1.0000 + 0.6667 + 0.7500) / 3 = 0.8056
Query 2 AP Calculation
For Q2, the ranking is 0, 1, 1, 0, 1, but the total number of relevant documents is 4. That means one relevant document exists but was not retrieved in the shown ranking.
- Precision at rank 2 = 1/2 = 0.5000
- Precision at rank 3 = 2/3 = 0.6667
- Precision at rank 5 = 3/5 = 0.6000
AP(Q2) = (0.5000 + 0.6667 + 0.6000) / 4 = 0.4417
This is an important example because it shows why the denominator matters. If you divided by only the retrieved relevant results, you would inflate the score and hide the fact that the system missed one relevant item entirely.
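To make the difference concrete, here is a short sketch that reuses the illustrative average_precision helper from above; the only change between the two calls is the denominator:

```python
relevance_q2 = [0, 1, 1, 0, 1]

# Correct: divide by the 4 relevant documents in the ground truth.
print(round(average_precision(relevance_q2, total_relevant=4), 4))  # 0.4417

# Inflated: dividing by the 3 retrieved relevant hits hides the missed document.
print(round(average_precision(relevance_q2), 4))                    # 0.5889
```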
Query 3 AP Calculation
For Q3, the ranking is 1, 1, 0, 0, 1.
- Precision at rank 1 = 1/1 = 1.0000
- Precision at rank 2 = 2/2 = 1.0000
- Precision at rank 5 = 3/5 = 0.6000
AP(Q3) = (1.0000 + 1.0000 + 0.6000) / 3 = 0.8667
Final mAP Calculation
Now average the three AP values:
- AP(Q1) = 0.8056
- AP(Q2) = 0.4417
- AP(Q3) = 0.8667
mAP = (0.8056 + 0.4417 + 0.8667) / 3 = 0.7046 (computed from the unrounded AP values; the figures shown are rounded to four decimals)
That means the ranking system has a mean average precision of approximately 0.7046 over the three evaluation queries.
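The whole three-query example can be reproduced in a few lines of Python, again using the illustrative average_precision helper defined earlier:

```python
queries = [
    ([1, 0, 1, 1, 0], 3),  # Q1
    ([0, 1, 1, 0, 1], 4),  # Q2
    ([1, 1, 0, 0, 1], 3),  # Q3
]

ap_scores = [average_precision(rel, total) for rel, total in queries]
mean_ap = sum(ap_scores) / len(ap_scores)

print([round(ap, 4) for ap in ap_scores])  # [0.8056, 0.4417, 0.8667]
print(round(mean_ap, 4))                   # 0.7046
```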
Why mAP Is So Widely Used
Mean Average Precision is popular because it captures multiple qualities at once. It rewards retrieval systems that:
- Return many relevant items
- Place relevant items earlier in the ranking
- Perform consistently across multiple queries
- Do not hide weak performance on difficult queries
Unlike accuracy, which is often too coarse for ranked retrieval, mAP recognizes that the first relevant result at rank 1 is more valuable than the first relevant result at rank 20. In search engines, legal document retrieval, biomedical literature search, product discovery, and academic information systems, this ranking sensitivity is essential.
mAP Versus Precision, Recall, and NDCG
People often compare mAP with other retrieval metrics. Each metric answers a slightly different question, so choosing the right one depends on your use case.
| Metric | Best For | Strength | Limitation |
|---|---|---|---|
| Precision | Snapshot quality of retrieved items | Simple and intuitive | Ignores where relevant items sit within the ranking |
| Recall | Coverage of all relevant items | Useful when missing relevant documents is costly | Does not reflect ranking order |
| Average Precision | Single-query ranked relevance | Rewards early relevant hits | Needs reliable relevance judgments |
| Mean Average Precision | Multi-query ranking evaluation | Balances relevance and rank across queries | Can be harder to explain to non-technical stakeholders |
| NDCG | Graded relevance scenarios | Handles multiple relevance levels well | Less direct when labels are binary only |
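To see how these metrics answer different questions, the sketch below scores the same Q1 ranking (1, 0, 1, 1, 0 with 3 relevant documents) with Precision@3, Recall@5, and AP; the helper names are again illustrative rather than standard library functions:

```python
def precision_at_k(relevance, k):
    """Fraction of the top k results that are relevant."""
    return sum(relevance[:k]) / k

def recall_at_k(relevance, k, total_relevant):
    """Fraction of all relevant documents that appear in the top k."""
    return sum(relevance[:k]) / total_relevant

q1 = [1, 0, 1, 1, 0]

print(round(precision_at_k(q1, 3), 4))                 # 0.6667 -> ignores where the hits sit inside the cutoff
print(round(recall_at_k(q1, 5, total_relevant=3), 4))  # 1.0    -> full coverage, order ignored
print(round(average_precision(q1, 3), 4))              # 0.8056 -> rewards the early hits at ranks 1, 3, 4
```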
Common Mistakes When You Calculate Mean Average Precision
Even experienced practitioners make avoidable mistakes with AP and mAP. Here are the most common pitfalls:
- Using the wrong denominator: AP should divide by the total number of relevant documents for the query, not just the number retrieved.
- Ignoring queries with no relevant documents: Depending on the benchmark, those queries may be excluded or specially handled. You should define your policy clearly.
- Confusing AP with precision at k: Precision@k looks only at the top k results, while AP considers the set of relevant hits across the ranking.
- Assuming mAP is the same as the average of raw precision values: It is not. mAP averages AP scores, and AP itself is built from precision at relevant ranks.
- Using inconsistent relevance judgments: mAP is only as trustworthy as your labeling process.
Interpreting a Good Mean Average Precision Score
There is no universal threshold for a “good” mAP score. A strong value in one domain may be poor in another. Enterprise search, web search, medical search, and image retrieval all have different baselines and data complexity. Instead of asking whether 0.70 is objectively good, ask these questions:
- How does the score compare with your current production baseline?
- Does the new model improve mAP on the same test set?
- Is the gain statistically and operationally meaningful?
- Do the hardest or most valuable queries improve?
In practice, relative improvement often matters more than an isolated absolute score. A move from 0.42 to 0.49 can be a major business win if the dataset is difficult and the benchmark is rigorous.
When to Use Optional Relevant Counts
The calculator above includes an optional field for total relevant counts because many real-world rankings show only the top N retrieved documents, not the entire collection. In that situation, your ranking might display three relevant hits while the ground truth says there are five relevant documents overall. AP should still divide by five. This prevents retrieval systems from looking artificially stronger than they are.
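As a hedged illustration with the same average_precision sketch from earlier (the specific ranking below is invented for the example): suppose the system displays only its top five results, three of which are relevant, while the judgments list five relevant documents in the collection.

```python
top_5 = [1, 0, 1, 1, 0]   # only the displayed ranking; two judged-relevant documents were never retrieved

# Divide by the 5 judged relevant documents, not the 3 visible hits.
print(round(average_precision(top_5, total_relevant=5), 4))  # 0.4833
```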
This distinction becomes critical in benchmarking tasks such as ad hoc search, passage retrieval, e-commerce ranking, and test collections used in evaluation campaigns. Official benchmarking programs, including those associated with retrieval research communities, typically define relevance judgments independently of the returned ranking.
Practical Uses of Mean Average Precision
mAP appears in many areas of applied machine learning and search engineering:
- Search engines: Evaluating how well relevant documents appear near the top of results.
- Recommendation systems: Measuring ranked recommendation quality when relevance can be treated as binary.
- Question answering retrieval: Testing whether supporting passages are surfaced early.
- Legal and compliance discovery: Ranking critical records for review workflows.
- Academic and scientific search: Comparing ranking models across many queries.
How This Calculator Helps You Work Faster
This page is built to make working through a mean average precision example faster. Instead of manually computing precision at every relevant hit on paper, you can paste ranked relevance lines and immediately get:
- Per-query Average Precision
- Total query count
- Final mAP score
- A visual chart of AP by query
- A compact breakdown you can use in reports, audits, or model reviews
That combination is useful for students, SEO analysts, machine learning engineers, IR researchers, and product teams comparing ranking experiments.
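If you prefer to reproduce the workflow offline, a small script can mimic the calculator. The input format below (comma-separated relevance values, optionally followed by a pipe and the total relevant count) is an assumption made for this sketch, not the calculator's exact syntax:

```python
def parse_query_line(line):
    """Parse a line such as '1,0,1,1,0 | 3' (the '|' separator is an illustrative assumption)."""
    if "|" in line:
        rel_part, total_part = line.split("|", 1)
        total = int(total_part.strip())
    else:
        rel_part, total = line, None
    relevance = [int(x) for x in rel_part.strip().split(",")]
    return relevance, total

lines = ["1,0,1,1,0 | 3", "0,1,1,0,1 | 4", "1,1,0,0,1 | 3"]
ap_scores = [average_precision(rel, total) for rel, total in map(parse_query_line, lines)]
print(round(sum(ap_scores) / len(ap_scores), 4))  # 0.7046
```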
Additional Learning Resources and References
For broader context on evaluation methodology, retrieval benchmarks, and academic foundations, these resources are worth reviewing:
- NIST TREC — a long-running benchmark initiative from the U.S. National Institute of Standards and Technology focused on information retrieval evaluation.
- Stanford Introduction to Information Retrieval — a respected educational reference from Stanford that explains ranking metrics, precision, recall, and retrieval systems.
- Cornell Information Retrieval course materials — university-level resources that help connect ranking theory with implementation and evaluation practice.
Final Takeaway
If you want a reliable metric for ranked retrieval, mAP remains one of the most practical and interpretable choices. It captures not only whether your system found relevant documents, but whether it ranked them early enough to matter. A strong mean average precision example should always show three things clearly: the ranked relevance list, the precision at each relevant position, and the final averaging step across queries. Once you understand those building blocks, evaluating search quality becomes much more systematic and much less mysterious.