mean Average Precision (mAP)

If you have ever worked on an Object Detection, Instance Segmentation, or Semantic Segmentation task, you have probably heard of the popular mean Average Precision (mAP) Machine Learning (ML) metric. On this page, we will:

Cover the logic behind the metric;
Check out the mean Average Precision formula;
Find out how to interpret the metric’s value;
Present a simple mean Average Precision calculation example;
And see how to work with mAP using Python.

Let’s jump in.

As the name suggests, the mean Average Precision metric is based on the Average Precision metric derived from Precision and Recall. Also, it uses the Precision-Recall curve and Intersection over Union concepts. So, to better grasp the metric, please check out the Confusion matrix, Precision score, Recall score, Intersection over Union, Precision-Recall curve, and Average Precision pages first.

What is mean Average Precision?

To define the term, mean Average Precision (or mAP) is a Machine Learning metric designed to evaluate Object Detection algorithms. Nowadays, you can also use mAP to evaluate Instance and Semantic Segmentation models. Still, we will not talk much about these use cases on this page as we will focus on mean Average Precision for Object Detection tasks.

mean Average Precision formula

From the mathematical standpoint, computing mAP requires summing up the Average Precision scores across all the classes and dividing the result by the total number of classes.
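Written as a formula, the sentence above reads as follows. The symbols are our own notation and do not come from a specific paper: N is the total number of classes and AP_i is the Average Precision score of class i.

```latex
\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i
```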

Still, it is not that easy. In Object Detection papers, you may come across notations such as mAP@0.5 or mAP@0.75. In short, this notation specifies the IoU threshold used to calculate mAP. Let’s check out how it works.

Objects	Predicted Intersection over Union	IoU threshold > 0.9	IoU threshold > 0.71	IoU threshold > 0.3
Object 1	0.95	Correct Prediction	Correct Prediction	Correct Prediction
Object 2	0.9	Incorrect Prediction	Correct Prediction	Correct Prediction
Object 3	0.85	Incorrect Prediction	Correct Prediction	Correct Prediction
Object 4	0.8	Incorrect Prediction	Correct Prediction	Correct Prediction
Object 5	0.7	Incorrect Prediction	Incorrect Prediction	Correct Prediction
Object 6	0.4	Incorrect Prediction	Incorrect Prediction	Correct Prediction

If we set the IoU threshold at 0.9, then Precision is equal to 16.67% as only 1 out of 6 predictions has an IoU above that score;
If the threshold is 0.71, then Precision is 66.67% because 4 out of 6 predictions are above that score;
And if the threshold is 0.3, then Precision rises to 100% as all the predictions have an IoU above 0.3!
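The three cases above can be reproduced with a few lines of Python. This is a minimal sketch that assumes, as the table does, that every prediction is matched to exactly one object and counts as correct only when its IoU is strictly above the threshold. The function name `precision_at_iou` is ours, not part of any library.

```python
# IoU values from the table above, one per predicted object.
ious = [0.95, 0.9, 0.85, 0.8, 0.7, 0.4]


def precision_at_iou(ious, threshold):
    """Share of predictions whose IoU is strictly above the threshold."""
    correct = sum(iou > threshold for iou in ious)
    return correct / len(ious)


for threshold in (0.9, 0.71, 0.3):
    precision = precision_at_iou(ious, threshold)
    print(f"IoU threshold > {threshold}: Precision = {precision:.2%}")
```

In a real evaluation, unmatched and duplicate detections also count as false positives, so the numbers would usually be lower than in this simplified setting.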

So, the IoU threshold can significantly affect the final mean Average Precision value. This dependency introduces variability to a model evaluation. This is bad because there can be a scenario when one model performs well under one IoU threshold and massively underperforms under another.

Data Scientists identified this weakness. They wanted the evaluation metric to be as robust as possible. So, they suggested measuring Average Precision for every class and every IoU threshold first and then calculating the average of the obtained scores.

The picture above shows Precision-Recall curves drawn for 4 IoU thresholds for three different classes. In the example, the IoU threshold of 0.9 is the most stringent (as at least 90% overlap between the predicted and ground truth bounding boxes is required), and 0.6 is the most lenient.

As you can see, the difference between each IoU threshold value is 0.1. This measure is called a step. So, the abbreviation mAP@0.6:0.1:0.9 means that the mAP was calculated for each IoU threshold in the range [0.6, 0.7, 0.8, 0.9] and each class. Then the obtained values were averaged to get the final result.

Notably, for the popular Common Objects in Context (COCO) dataset, mAP is benchmarked by averaging it over IoU thresholds from 0.5 to 0.95 in steps of 0.05.
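To make the notation concrete, here is a small sketch that builds the threshold lists for mAP@0.6:0.1:0.9 and for the COCO-style range. The helper `iou_thresholds` is a hypothetical name introduced only for this illustration.

```python
import numpy as np


def iou_thresholds(start, stop, step):
    """Evenly spaced IoU thresholds from start to stop, both inclusive."""
    count = int(round((stop - start) / step)) + 1
    # np.linspace avoids the floating-point drift np.arange can show with fractional steps
    return np.round(np.linspace(start, stop, count), 2)


example_thresholds = iou_thresholds(0.6, 0.9, 0.1)   # thresholds behind mAP@0.6:0.1:0.9
coco_thresholds = iou_thresholds(0.5, 0.95, 0.05)    # COCO-style thresholds

print(example_thresholds)
print(coco_thresholds)
```

The first call yields 4 thresholds and the second yields 10, so a COCO-style mAP averages ten times as many Average Precision scores per class as a single-threshold mAP@0.5.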

mean Average Precision calculation algorithm

1. Define the IoU thresholds;
2. Compute the Average Precision score for each IoU threshold for a specific class;
3. Calculate the mean Average Precision value for the given class by summing up the scores from step 2 and dividing the sum by the number of IoU threshold values;
4. Apply steps 2 and 3 to all the classes;
5. Sum up the obtained per-class mAP scores and divide the sum by the total number of classes to get the final result.
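The averaging part of the algorithm can be sketched as follows. The sketch assumes the Average Precision scores from step 2 are already computed and stored in a table with one row per class and one column per IoU threshold. The function name and the AP values are made up for illustration.

```python
import numpy as np


def mean_average_precision(ap_scores):
    """ap_scores: array-like of shape (num_classes, num_iou_thresholds)."""
    ap_scores = np.asarray(ap_scores, dtype=float)
    # Step 3: average over the IoU thresholds for each class
    per_class_map = ap_scores.mean(axis=1)
    # Step 5: average the per-class values over all classes
    return per_class_map.mean(), per_class_map


# Hypothetical AP scores: rows are classes, columns are IoU thresholds 0.6, 0.7, 0.8, 0.9
ap_scores = [
    [0.90, 0.80, 0.70, 0.40],
    [0.80, 0.70, 0.50, 0.20],
    [0.95, 0.90, 0.80, 0.55],
]

final_map, per_class_map = mean_average_precision(ap_scores)
print("Per-class mAP:", per_class_map)
print("Final mAP:", final_map)
```

Because every class has the same number of thresholds, averaging per class first and then across classes gives the same result as averaging the whole table at once.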

What is Mask mean Average Precision?

To define the term, mask mean Average Precision (or mask mAP) is a variety of mean Average Precision. It is still a Machine Learning metric but designed to evaluate Instance Segmentation algorithms.

In short, mAP and mask mAP share the same calculation algorithm. The main difference is that IoU, in the mask mAP case, is calculated between segmentation masks instead of bounding boxes. Beyond this, the mean Average Precision calculation algorithm stays the same.
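The difference between the two IoU flavors can be shown with a short sketch. It assumes boxes are given as (x_min, y_min, x_max, y_max) and masks as binary arrays of the same shape. Both function names are ours and not taken from a library.

```python
import numpy as np


def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def mask_iou(mask_a, mask_b):
    """IoU of two binary masks of the same shape, counted pixel by pixel."""
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = np.asarray(mask_b, dtype=bool)
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes that overlap in a single 1x1 cell
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))

# The same two shapes drawn as masks on a 3x3 grid
mask_a = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
mask_b = [[0, 0, 0], [0, 1, 1], [0, 1, 1]]
print(mask_iou(mask_a, mask_b))
```

For rectangular shapes the two values agree. They diverge as soon as the object does not fill its bounding box, which is why mask mAP is the stricter check for Instance Segmentation.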

To visualize the difference, please take a look at the examples below. The first picture features an Object Detection task, as the ground truth labels are represented by bounding boxes. The second picture portrays an Instance Segmentation task, as the labels are pixel-perfect masks.

Object Detection mean Average Precision metric uses bounding boxes IoU as thresholds

Segmentation Mask mean Average Precision uses segmentation masks IoU as thresholds

Interpreting mean Average Precision

Since mAP is calculated across multiple Precision-Recall curves, the best case scenario you can get is when both precision and recall metrics on every IoU threshold are equal to their best possible value – one. However, such a case is a fantasy, so you should stay realistic and expect a significantly lower metric value.

Notably, it is hard to provide unified mAP benchmarks that would suit any Object Detection problem since there are too many variables to consider, for instance:

the number of classes;
the expected tradeoff between Precision and Recall;
the IoU threshold, etc.

The truth is that based on the task, the same metric value might be both good and bad. So, if you spend some time identifying the desired value for your case and estimating your goals and resources, it would benefit you greatly.

Nevertheless, the SOTA (State-of-the-Art) mAP for the Object Detection task on the COCO dataset's test part is currently 63.1.

Object Detection benchmarks on COCO’s test part

As you can see, SOTA is relatively low. Still, it does not mean you should stop if you see a value similar to it in your task. From our experience and the research conducted when writing this page, if your task does not include millions of classes and your IoU threshold is set to 0.5, you should reach at least 0.7 mAP before you are more or less satisfied with the model's performance.

To summarize, if your Object Detection task is not similar to COCO, you should not worry that much about SOTA and should try to achieve 0.8 mAP at the 0.5 IoU threshold before calling the job done. Also, do not trust the metric alone and always visualize the model's predictions to ensure your solution works as intended.

mean Average Precision calculation example

For this page, we went a step further and prepared a Google Colab notebook that calculates mAP in Python on a simple example. We used the formal Average Precision and mean Average Precision formulas, NumPy and Sklearn functionalities, and some imagination.

If you want a well-commented yet simple example of computing mean Average Precision in Python, check out this example.

mean Average Precision in Python

The mean Average Precision score is widely used in the industry, so Machine and Deep Learning libraries either have their own implementation of this metric or can be used to code it quickly. For this page, we prepared three code blocks that calculate mAP in Python. In detail, you can check out:

mean Average Precision in NumPy;
mean Average Precision in TensorFlow;
mean Average Precision in PyTorch.

mean Average Precision in NumPy

  
Hello, thank you for using the code provided by CloudFactory. Please note that some code blocks might not be 100% complete and ready to be run as is. This is done intentionally as we focus on implementing only the most challenging parts that might be tough to pick up from scratch. View our code block as a LEGO block - you can’t use it as a standalone solution, but you can take it and add it to your system to complement it.

      python
      
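  import numpy as np


  def apk(actual, predicted, k=10):
      """Compute the average precision at k between two lists of items."""
      if not actual:
          return 0.0

      # Only the top-k predictions are taken into account
      predicted = predicted[:k]

      score = 0.0
      num_hits = 0
      for i, p in enumerate(predicted):
          # Count a hit only the first time a relevant item is predicted
          if p in actual and p not in predicted[:i]:
              num_hits += 1
              # Precision at this rank: hits so far divided by the rank
              score += num_hits / (i + 1)

      return score / min(len(actual), k)


  def mapk(actual, predicted, k=10):
      """Compute the mean average precision at k between two lists of lists of items."""
      return np.mean([apk(a, p, k) for a, p in zip(actual, predicted)])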

mean Average Precision in TensorFlow

  

      python
      
    
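  !pip install tensorflow==1.15

  # This example relies on the TensorFlow 1.x API (tf.Session and
  # tf.metrics.average_precision_at_k), hence the pinned version above
  import numpy as np
  import tensorflow as tf

  # Ground truth: one class index per sample
  y_true = np.array([[2], [1], [0], [3], [0]]).astype(np.int64)
  y_true = tf.identity(y_true)

  # Predictions: one score per class for each sample
  y_pred = np.array([[0.1, 0.2, 0.6, 0.1],
                     [0.8, 0.05, 0.1, 0.05],
                     [0.3, 0.4, 0.1, 0.2],
                     [0.6, 0.25, 0.1, 0.05],
                     [0.1, 0.2, 0.6, 0.1]
                     ]).astype(np.float32)
  y_pred = tf.identity(y_pred)

  # Returns a (value, update_op) pair for the streaming average precision at k=3
  m_ap = tf.metrics.average_precision_at_k(y_true, y_pred, 3)

  sess = tf.Session()
  # The metric keeps its running totals in local variables
  sess.run(tf.local_variables_initializer())

  stream_vars = [i for i in tf.local_variables()]

  tf_map = sess.run(m_ap)
  print(tf_map)

  # Inspect the running totals behind the metric
  print(sess.run(stream_vars))

  # Top-3 predicted classes per sample, for comparison with y_true
  tmp_rank = tf.nn.top_k(y_pred, 3)

  print(sess.run(tmp_rank))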
    

mean Average Precision in PyTorch

  

      python
      
    
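  from pprint import pprint

  import torch
  from torchmetrics.detection.mean_ap import MeanAveragePrecision

  # One dict per image: predicted boxes as (x_min, y_min, x_max, y_max),
  # confidence scores, and class labels
  preds = [
      dict(
          boxes=torch.tensor([[258.0, 41.0, 606.0, 285.0]]),
          scores=torch.tensor([0.536]),
          labels=torch.tensor([0]),
      )
  ]
  # One dict per image: ground truth boxes and class labels
  target = [
      dict(
          boxes=torch.tensor([[214.0, 41.0, 562.0, 285.0]]),
          labels=torch.tensor([0]),
      )
  ]

  metric = MeanAveragePrecision()
  metric.update(preds, target)
  # compute() returns a dict with mAP and related scores
  pprint(metric.compute())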
    
