Binary Classification Metrics

date: Jul 4, 2023
slug: bin-classification-metrics
status: Published
tags: Machine Learning
summary: The 3-dimensional equation that formulates 8 different metrics of binary classification
type: Post
It's magical how many metrics exist to evaluate a problem that outputs only two possibilities - 0 or 1. Here we will derive a systematic way of understanding the relationships between all of these evaluation methods.
Reviewing the confusion matrix, we see that supervised binary classification leads to one of four possible outcomes: True Negative, False Positive, False Negative, or True Positive.
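Since the original figure isn't reproduced here, a common layout (rows = actual label, columns = predicted label, which is the convention assumed in the rest of this post) looks like:

$$
\begin{array}{c|cc}
 & \text{Predicted } 0 & \text{Predicted } 1 \\
\hline
\text{Actual } 0 & TN & FP \\
\text{Actual } 1 & FN & TP
\end{array}
$$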
 
Summing the matrix horizontally, we get the actual number of negative and positive examples:
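With the layout above (the count notation here is mine, since the original equation isn't shown):

$$N^{\text{actual}}_{0} = TN + FP, \qquad N^{\text{actual}}_{1} = FN + TP$$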
Summing vertically, we get the predicted number of negative and positive examples:
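Similarly, in the same notation, the column sums are:

$$N^{\text{pred}}_{0} = TN + FN, \qquad N^{\text{pred}}_{1} = FP + TP$$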
 
There are a few different binary classification metrics we can use to assess the quality of the predicted outcomes.
 
A 3-dimensional question we can use to derive any of the metrics is:
💡 When x y, how often are we z?
 
Then, the 3 dimensions of a binary classification metric are:
  1. x - the actual label is / the model predicts
  2. y - 1 / 0
  3. z - Correct / Incorrect
 
From these 3 dimensions, each with 2 options, we can derive 2 × 2 × 2 = 8 different metrics for evaluating a binary classification problem.
The first 4 metrics focus on the correct predictions - the numerator counts what the model got right (the True Positives and True Negatives); their formulas are sketched just after the list:
1: True Positive Rate (TPR) / Recall / Sensitivity - When the actual label is 1, how often are we correct?
2: Precision - When the model predicts 1, how often are we correct?
3: True Negative Rate (TNR) / Specificity - When the actual label is 0, how often are we correct?
4: Negative Predictive Value (NPV) - When the model predicts 0, how often are we correct?
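In terms of the confusion-matrix counts, these four work out to (a sketch based on the definitions above, as the original formula images aren't shown):

$$\text{TPR} = \frac{TP}{TP + FN} \qquad \text{Precision} = \frac{TP}{TP + FP} \qquad \text{TNR} = \frac{TN}{TN + FP} \qquad \text{NPV} = \frac{TN}{TN + FN}$$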
The next 4 metrics focus on the incorrect predictions - the numerator counts what the model got wrong (the False Negatives and False Positives); again, formulas follow the list:
5: False Negative Rate (FNR) - When the actual label is 1, how often are we incorrect?
6: False Discovery Rate (FDR) - When the model predicts 1, how often are we incorrect?
7: False Positive Rate (FPR) - When the actual label is 0, how often are we incorrect?
8: False Omission Rate (FOR) - When the model predicts 0, how often are we incorrect?
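Again as a sketch from the definitions above, the count-based formulas are:

$$\text{FNR} = \frac{FN}{TP + FN} \qquad \text{FDR} = \frac{FP}{TP + FP} \qquad \text{FPR} = \frac{FP}{TN + FP} \qquad \text{FOR} = \frac{FN}{TN + FN}$$

Notice that each one is simply 1 minus its counterpart from the first group: FNR = 1 - TPR, FDR = 1 - Precision, FPR = 1 - TNR, and FOR = 1 - NPV.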
And there we have it: the 8 key metrics fundamental for evaluating a binary classification problem at a fixed threshold. Note, however, that further metrics exist that combine these foundational ones (the F1 score, which is the harmonic mean of Precision and Recall) or evaluate a model across multiple thresholds (the AUC of the ROC curve, the AUC of the Precision-Recall curve, etc.).
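To make these definitions concrete, here is a minimal Python sketch that computes all 8 metrics from raw confusion-matrix counts; the function and variable names are my own, not from the original post:

```python
def binary_classification_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Compute the 8 fixed-threshold binary classification metrics
    from confusion-matrix counts (assumes no denominator is zero)."""
    actual_pos = tp + fn  # examples whose actual label is 1
    actual_neg = tn + fp  # examples whose actual label is 0
    pred_pos = tp + fp    # examples the model predicts as 1
    pred_neg = tn + fn    # examples the model predicts as 0

    return {
        # Metrics 1-4: how often are we correct?
        "TPR / Recall / Sensitivity": tp / actual_pos,
        "Precision": tp / pred_pos,
        "TNR / Specificity": tn / actual_neg,
        "NPV": tn / pred_neg,
        # Metrics 5-8: how often are we incorrect?
        # (each equals 1 minus the corresponding metric above)
        "FNR": fn / actual_pos,
        "FDR": fp / pred_pos,
        "FPR": fp / actual_neg,
        "FOR": fn / pred_neg,
    }


# Hypothetical example: 1,000 predictions with tn=850, fp=50, fn=40, tp=60
print(binary_classification_metrics(tn=850, fp=50, fn=40, tp=60))
```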
I hope this post gave you a better foundation for understanding the relationships between the different metrics for evaluating binary classification problems.
Please don't hesitate to reach out to me on LinkedIn or Twitter if you have any questions or comments!
 

© Chris Settes 2023 - 2024