- Getting started
- Introduction to the Wiki
- Overview of topics
- How to contribute
- General best practices
- Key principles of Computer Vision
- Convolution
- Advanced convolution techniques and layers
- Pooling
- Overfitting
- Underfitting
- Overfitting Vs. Underfitting in Machine Learning
- Upsampling and Downsampling techniques in Machine Learning
- Computer Vision tasks
- The complete glossary of the modern Computer Vision tasks
- Classification / Tagging
- Object Detection
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Attribute Prediction
- Computer Vision model architectures
- ResNet
- Faster R-CNN
- Mask R-CNN
- DeepLabv3+
- U-Net
- FBNetV3
- U-Net++
- Efficient Net
- PAN
- PSPNet
- LinkNet
- FPN
- RetinaNet
- Cascade R-CNN
- FBNetV3IS
- FBNetV3OD
- CascadeMask R-CNN
- HybridTask Cascade
- Computer Vision metrics
- Confusion Matrix
- Intersection over Union (IoU)
- Accuracy
- Hamming score
- Precision
- Recall
- Precision-Recall curve and AUC-PR
- F-score
- Average Precision
- mean Average Precision (mAP)
- Loss functions in Machine Learning
- Comprehensive overview of loss functions in Machine Learning
- Cross-Entropy Loss
- Binary Cross-Entropy Loss
- Focal loss
- Bounding Box Regression Loss
- CrossEntropyIoULoss2D
- Average Loss
- Solver / Optimizer
- Comprehensive overview of solvers/optimizers in Deep Learning
- Adam
- SGD
- Adadelta
- Adagrad
- AdaMax
- Adamw
- ASGD
- Rprop
- RMSprop
- Lion
- Weight Decay
- Base Learning Rate
- Momentum (SGD)
- Epsilon Coefficient
- Training Parameters
- Patience
- Min delta
- Seed
- Everything you need to know about batches in Machine Learning
- Iterations
- Epoch
- Scheduler
- Comprehensive overview of learning rate schedulers in Machine Learning
- ExponentialLR
- CyclicLR
- StepLR
- MultiStepLR
- ReduceLROnPlateau
- CosineAnnealingLR
- Computer Vision augmentations
- Comprehensive overview of augmentations in Machine Learning
- Horizontal Flip
- Vertical Flip
- Random Crop
- Random Sized Crop
- Rotate
- Resize
- Blur
- Smallest max size
- Center Crop
- Color Jitter
- Gaussian Noise
- Shift Scale Rotate
- Longest max size
- Equalize
- To gray
- Shear
- Mosaic
- Copy Paste
- Extrapolation methods
- Interpolation methods
- Deployment
- Primitive deployment using web frameworks
- Commonly used web frameworks
- Containerized Deployment
- Orchestrated Deployment
- Challenges of Deployment
- Splits
- Data Splitting in Machine Learning

If you have ever tried solving a Classification task using a Machine Learning (ML) algorithm, you might have heard of the popular Accuracy score ML metric. On this page, we will:

- Cover the logic behind the metric (both for the binary and multiclass cases);
- Check out the metric’s formula;
- Find out how to interpret the Accuracy value;
- Talk about the disadvantages of the metric;
- Calculate Accuracy on a simple example (or two);
- And see how to work with the Accuracy score using Python.

Let’s jump in.

The Accuracy score is firmly based on the Confusion matrix. So, to better grasp the metric, please check out the Confusion matrix page first.

The most intuitive way to evaluate the performance of any Classification algorithm is to calculate the percentage of its correct predictions. And this is precisely the logic behind the Accuracy score.

To define the term, in Machine Learning, the Accuracy score (or just Accuracy) is a Classification metric equal to the fraction of predictions that a model got right. The metric is prevalent as it is easy to calculate and interpret. Also, it measures the model’s performance with a single value.

So, to evaluate a Classification model using the Accuracy score, you need to have:

- The ground truth classes;
- And the model’s predictions.

Fortunately, Accuracy is a highly intuitive metric, so you should not experience any challenges in understanding it. The Accuracy score is calculated by dividing the number of correct predictions by the total number of predictions.

The more formal formula is the following one:

**Accuracy** = (TP + TN) / (TP + TN + FP + FN)

As you can see, Accuracy can be easily described using the Confusion matrix terms such as True Positive, True Negative, False Positive, and False Negative. Still, as described on the Confusion matrix page, these terms are mainly used for the **binary** Classification tasks.

So, the Accuracy score algorithm for the binary Classification task is as follows:

- Get predictions from your model;
- Calculate the number of True Positives, True Negatives, False Positives, and False Negatives;
- Use the Accuracy formula for the binary case;
- And analyze the obtained value.
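The binary steps above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function names `confusion_counts` and `binary_accuracy` and the sample labels are assumptions made up for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task; `positive` marks the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def binary_accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return (tp + tn) / total


# Hypothetical ground truth and predictions, made up for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)
print(binary_accuracy(tp, tn, fp, fn))
```

Keeping the four counts separate from the final division makes it easy to reuse them for other metrics later.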

Yes, it is as simple as that. But what about the multiclass case? Well, there is no specific formula, so we suggest using the basic logic behind the metric to get the result. The Accuracy score algorithm for the **multiclass** Classification task is as follows:

- Get predictions from your model;
- Calculate the number of correct predictions;
- Divide it by the total prediction number;
- And analyze the obtained value.
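For the multiclass case, the same steps reduce to counting matches between two label lists. The sketch below assumes labels are directly comparable values (here strings); the helper name `accuracy` and the sample data are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth, for any number of classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical three-class labels, made up for illustration only
y_true = ["cat", "dog", "bird", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "bird"]

multiclass_accuracy = accuracy(y_true, y_pred)
print(multiclass_accuracy)
```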

In the Accuracy case, the metric value interpretation is more or less straightforward. If you are getting more correct predictions, it results in a higher Accuracy score. The higher the metric value, the better. The best possible value is 1 (if a model got all the predictions right), and the worst is 0 (if a model did not make a single correct prediction).

From our experience, you should consider **Accuracy > 0.9** an excellent score, **Accuracy > 0.7** a good one, and any other score a poor one. Still, you can set your own thresholds, as your logic and task might differ greatly from ours (for example, in medicine, you might need an Accuracy score of 0.99 or higher before calling the job done).

Still, this metric has two massive drawbacks that must be considered when using it. Let’s cover them one by one.

The greatest problem is that Accuracy is utterly useless if the class distribution in your set is skewed. Let’s check out a simple example.

For example, we want to evaluate the performance of a mail spam filter. We have **100** non-spam emails. Our classifier correctly predicted **90** of them (**True Negative = 90**, **False Positive = 10**). Out of **10** spam emails, the classifier identified only **5** (**True Positive = 5**, **False Negative = 5**). In this case, the Accuracy score will be:

**Accuracy** = (5 + 90) / (90 + 10 + 5 + 5) ≈ 0.864

However, if we predict all emails as non-spam, we will get a higher Accuracy (**True Negative = 100**, **False Positive = 0**, **True Positive = 0**, **False Negative = 10**):

**Accuracy** = (0 + 100) / (0 + 100 + 0 + 10) ≈ 0.909

The second model has a better metric value but does not have any predictive power. So, be very careful and always check whether your data has a class imbalance problem before applying Accuracy.

To be fair, Data Scientists came up with a solution to this problem by developing the Balanced Accuracy metric. Check its page in the sklearn documentation to learn more.
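The spam filter numbers can be checked with a short sketch. It assumes the common binary definition of Balanced Accuracy as the mean of the recall on the positive class and the recall on the negative class; the function and variable names are made up for this example.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of the recall on the positive class and the recall on the negative class."""
    recall_positive = tp / (tp + fn)
    recall_negative = tn / (tn + fp)
    return (recall_positive + recall_negative) / 2


# Counts taken from the spam filter example above
filter_model = dict(tp=5, tn=90, fp=10, fn=5)
all_non_spam = dict(tp=0, tn=100, fp=0, fn=10)

for name, counts in [("spam filter", filter_model), ("always non-spam", all_non_spam)]:
    print(name, round(accuracy(**counts), 3), round(balanced_accuracy(**counts), 3))
```

Under this definition, the model that labels everything as non-spam scores 0.5, the same as random guessing on a binary task, while the real filter scores higher. Plain Accuracy ranks the two the other way around.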

The other disadvantage is that Accuracy is not that informative when used as the only metric. For example, it does not tell you what types of errors your model makes.

At a 1% misclassification rate (99% Accuracy), the errors could be False Positives, False Negatives, or a mix of both. Such information is essential when evaluating a model for a specific use case. Take COVID tests as an example: you would rather have FPs (the test says that a person has COVID, but they actually do not) than FNs (the test says that a person does not have COVID, but they actually do).

Overall, it is not a massive problem as you can solve it in a few lines of code by calculating some other metrics, but you still should keep in mind that relying only on the Accuracy value is a bad idea.
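To illustrate, the sketch below compares two hypothetical models that both reach 99% Accuracy on 1000 samples but make opposite kinds of errors. The counts are invented for this example, and Precision and Recall are used as the complementary metrics.

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Share of positive predictions that are actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)


# Hypothetical counts: 100 positive and 900 negative samples in both cases
model_a = dict(tp=100, tn=890, fp=10, fn=0)   # all 10 errors are False Positives
model_b = dict(tp=90, tn=900, fp=0, fn=10)    # all 10 errors are False Negatives

for name, m in [("model A", model_a), ("model B", model_b)]:
    print(
        name,
        round(accuracy(**m), 3),
        round(precision(m["tp"], m["fp"]), 3),
        round(recall(m["tp"], m["fn"]), 3),
    )
```

Both models share the same Accuracy, yet model A never misses a positive case while model B misses one in ten, which is exactly the difference that matters in the COVID test scenario.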

Let’s say we have a binary Classification task. For example, you are trying to determine whether a cat or a dog is on an image. You have a model and want to evaluate its performance using Accuracy. You pass **15** pictures with a cat and **20** images with a dog to the model. From the given **15** cat images, the algorithm predicts **9** pictures as the dog ones, and from the **20** dog images - **6** pictures as the cat ones. Let’s build a Confusion matrix first (you can check the detailed calculation on the Confusion matrix page).

Excellent, now let’s calculate the Accuracy score using the formula for the binary Classification task. Taking the cat class as the positive one, the correct predictions are **TP = 6** (cats predicted as cats) and **TN = 14** (dogs predicted as dogs), and the incorrect ones are **FN = 9** (cats predicted as dogs) and **FP = 6** (dogs predicted as cats).

**Accuracy** = (TN + TP) / (TP + FP + TN + FN) = (14 + 6) / (6 + 6 + 14 + 9) ≈ 0.57
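The cat and dog example can be reproduced in a few lines. This sketch rebuilds the label lists from the counts in the text and treats cat as the positive class, which is an assumption consistent with the numbers in the formula above.

```python
from collections import Counter

# 15 cat images: 6 predicted as cat, 9 as dog; 20 dog images: 14 as dog, 6 as cat
y_true = ["cat"] * 15 + ["dog"] * 20
y_pred = ["cat"] * 6 + ["dog"] * 9 + ["dog"] * 14 + ["cat"] * 6

# Each (truth, prediction) pair is one cell of the Confusion matrix
cells = Counter(zip(y_true, y_pred))
tp = cells[("cat", "cat")]
fn = cells[("cat", "dog")]
fp = cells[("dog", "cat")]
tn = cells[("dog", "dog")]

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tp, fn, fp, tn)      # 6 9 6 14
print(round(accuracy, 2))  # 0.57
```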

Ok, great. Let’s expand the task and add another class, for example, the bird one. You pass **15** pictures with a cat, **20** images with a dog, and **12** pictures with a bird to the model. The predictions are as follows:

- **15** cat images: **9** dog pictures, **3** bird ones, and **15 - 9 - 3 = 3** cat images;
- **20** dog images: **6** cat pictures, **4** bird ones, and **20 - 6 - 4 = 10** dog images;
- **12** bird images: **4** dog pictures, **2** cat ones, and **12 - 4 - 2 = 6** bird images.

Let’s build the matrix.

Let’s use the basic logic behind the Accuracy metric to calculate the value for the multiclass case.

- **Number of correct predictions** = 10 (dog) + 6 (bird) + 3 (cat) = 19;
- **Total number of predictions** = 10 + 4 + 9 + 4 + 6 + 3 + 6 + 2 + 3 = 47;
- **Accuracy** = 19 / 47 ≈ 0.4
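The same multiclass calculation can be written against the Confusion matrix directly: correct predictions sit on the diagonal. In this sketch, the row and column order (cat, dog, bird) and the variable names are assumptions chosen for the example; the counts come from the text above.

```python
# Rows are the true classes, columns are the predicted classes
classes = ["cat", "dog", "bird"]
confusion = [
    [3, 9, 3],    # true cat:  3 cat, 9 dog, 3 bird
    [6, 10, 4],   # true dog:  6 cat, 10 dog, 4 bird
    [2, 4, 6],    # true bird: 2 cat, 4 dog, 6 bird
]

correct = sum(confusion[i][i] for i in range(len(classes)))
total = sum(sum(row) for row in confusion)
multiclass_accuracy = correct / total

print(correct, total)                 # 19 47
print(round(multiclass_accuracy, 2))  # 0.4
```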

The Accuracy score is widely used in the industry, so most major Machine and Deep Learning libraries have their own implementation of this metric. For this page, we prepared three code blocks that calculate Accuracy in Python. In detail, you can check out:

- Accuracy in Scikit-learn (Sklearn);
- Accuracy in TensorFlow;
- Accuracy in PyTorch.

Scikit-learn is the most popular Python library for classical Machine Learning. From our experience, Sklearn is the tool you will likely use the most to calculate Accuracy (especially if you are working with tabular data). Fortunately, you can do it in just a few lines of code.

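```
# Importing the function
from sklearn.metrics import accuracy_score
# Initializing the arrays (multiclass case)
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
# Calculating and printing the fraction of correct predictions
print(accuracy_score(y_true, y_pred))
# With normalize=False, the function returns the number of correct predictions instead
print(accuracy_score(y_true, y_pred, normalize=False))
```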

Beyond the basic functionality, Sklearn has various Accuracy options implemented. You should definitely check them out to simplify your workflow.

In the vision AI field, the Accuracy score algorithm is slightly different. For instance segmentors, semantic segmentors, and object detectors, a prediction is correct if the predicted class equals the ground truth one and the prediction's IoU is above a certain threshold (often, a threshold of 0.5 is used).
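A minimal sketch of this rule for object detection is shown below. It assumes that every prediction has already been matched one-to-one with a ground truth box, that boxes are given as `(x_min, y_min, x_max, y_max)`, and that "above" means strictly greater than the threshold; the function names and sample boxes are made up for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_accuracy(pairs, iou_threshold=0.5):
    """A prediction counts as correct if the class matches and IoU exceeds the threshold."""
    correct = sum(
        1
        for truth, pred in pairs
        if truth["label"] == pred["label"]
        and iou(truth["box"], pred["box"]) > iou_threshold
    )
    return correct / len(pairs)


# Hypothetical, already matched (ground truth, prediction) pairs
pairs = [
    ({"label": "cat", "box": (0, 0, 10, 10)}, {"label": "cat", "box": (1, 1, 10, 10)}),
    ({"label": "dog", "box": (0, 0, 10, 10)}, {"label": "dog", "box": (5, 5, 15, 15)}),
    ({"label": "bird", "box": (0, 0, 10, 10)}, {"label": "cat", "box": (0, 0, 10, 10)}),
    ({"label": "dog", "box": (20, 20, 30, 30)}, {"label": "dog", "box": (20, 20, 30, 28)}),
]

print(detection_accuracy(pairs))
```

Here the second pair fails on IoU and the third fails on class, so only two of the four predictions count as correct. Real detection pipelines also need a matching step between predictions and ground truth, which this sketch leaves out.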

```
# Importing the library
import tensorflow as tf
# Calculating the metric value
m = tf.keras.metrics.Accuracy()
m.update_state([1, 2, 3, 4], [0, 2, 3, 4])
# Printing the result
print('Final result: ', m.result().numpy())
```

```
# Install the dependency first: pip install torchmetrics
# Importing the libraries
import torch
from torchmetrics import Accuracy
# Initializing the input tensors
target = torch.tensor([0, 1, 2, 3])
preds = torch.tensor([0, 2, 1, 3])
# Calculating and printing the result
# (torchmetrics 0.11 and later require the task type and, for multiclass, the number of classes)
accuracy = Accuracy(task="multiclass", num_classes=4)
print(accuracy(preds, target))
```


© 2010-2024 CloudFactory Limited. All rights reserved.