Accuracy, Precision, Recall, or F1
How do I get the most accurate model?
When I am working on machine learning models, the most common question that comes up is “How do I get the most accurate model?”. Well, to answer this question we first need to ask, “What business challenge are you trying to solve with the model?”, because accuracy isn’t the be-all and end-all metric by which we should choose our “best” model.
Therefore, I thought I would explain in this blog post why Accuracy need not be the one and only model metric data scientists chase, and include a simple explanation of the other metrics as well.
First of all, let us look at the following confusion matrix. Can you tell what the accuracy of the model is?
You can easily notice that the accuracy of this model is very high, at 99.9%! But you haven’t hit the jackpot: what if the positive class here represents a fraud case, or a terrorist that the model labels as a non-terrorist? You get the idea. In circumstances like these, the cost of a misclassified actual positive, i.e. a false negative, is very high. Hence by now you have realized that accuracy is not the be-all and end-all metric to use when selecting the best model.
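Here is a minimal sketch of how this happens, with purely hypothetical counts: an imbalanced dataset of 100,000 transactions where only 100 are fraudulent, and a model that never predicts the positive class at all still scores 99.9% accuracy.

```python
# Hypothetical fraud-detection counts: 100,000 transactions, 100 of them fraud,
# and a model that simply predicts "not fraud" for everything.
tp, fp = 0, 0          # the model never predicts the positive (fraud) class
fn, tn = 100, 99_900   # so every real fraud case becomes a false negative

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"Accuracy: {accuracy:.1%}")  # 99.9%, yet every single fraud case is missed
```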
Precision and Recall
In pattern recognition, information retrieval and classification, precision and recall are performance metrics that apply to data retrieved from a collection, corpus, or sample space.
Precision, also called positive predictive value, is the fraction of relevant instances among the retrieved instances, while recall, also known as sensitivity, is the fraction of relevant instances that were retrieved. Both precision and recall are therefore based on relevance.
The formulas for calculating Precision and Recall are as follows:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)
Let’s see the confusion matrix and its parts here: TP (True Positive), FP (False Positive), FN (False Negative), and TN (True Negative).
- Precision
Now let us look at Precision first.
The denominator, TP + FP, is the Total Predicted Positive.
Therefore, you can immediately see that Precision tells us how precise/accurate the model is: out of all the instances predicted positive, how many are actually positive. Precision is a good measure to use when the cost of a False Positive is high. For instance, in email spam detection, a false positive means that a non-spam email has been identified as spam. If precision is not high for the spam-detection model, the email user might lose important emails.
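As a small sketch of this (the labels below are made up for illustration), scikit-learn’s precision_score computes exactly this ratio for a toy spam-filter output:

```python
from sklearn.metrics import precision_score

# Hypothetical spam-filter labels: 1 = spam, 0 = not spam.
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1, 1, 0]  # two legitimate emails were flagged as spam

# Precision = TP / (TP + FP): of everything flagged as spam, how much really is spam?
print(precision_score(y_true, y_pred))   # 3 TP / (3 TP + 2 FP) = 0.6
```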
- Recall
Let’s see how Recall is calculated. Here the denominator, TP + FN, is the Total Actual Positive.
We can clearly see that Recall calculates how many of the Actual Positives our model captures by labeling them as Positive. Applying the same understanding, Recall should be the metric we use to select our best model when there is a high cost associated with a False Negative.
For instance, consider fraud detection or sick-patient detection. If a fraudulent transaction is predicted as non-fraudulent, the consequences can be very bad for the bank. Similarly, in sick-patient detection, if a sick patient goes through the test and is predicted as not sick, the cost associated with that False Negative will be extremely high, especially if the sickness is contagious.
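Here is the same kind of sketch for Recall, with made-up labels for a sick-patient test, using scikit-learn’s recall_score:

```python
from sklearn.metrics import recall_score

# Hypothetical test results: 1 = sick, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # two sick patients are predicted as healthy

# Recall = TP / (TP + FN): of all the truly sick patients, how many did the model catch?
print(recall_score(y_true, y_pred))      # 2 TP / (2 TP + 2 FN) = 0.5
```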
F1 Score
F1 is a function of Precision and Recall; the formula is as follows:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
F1 Score is needed when you want to seek a balance between Precision and Recall. The difference between F1 Score and Accuracy is that accuracy can be largely driven by a large number of True Negatives, which in most business circumstances we do not focus on much, whereas False Negatives and False Positives usually carry real business costs. Thus F1 Score might be a better measure to use when we need to seek a balance between Precision and Recall and there is an uneven class distribution (a large number of Actual Negatives).
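To make the contrast concrete, here is a small hypothetical example of an imbalanced dataset where accuracy looks flattering while the F1 Score exposes how poorly the positive class is actually handled:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced data: only 2 positives out of 20 samples.
# The model catches one positive, misses the other, and raises one false alarm.
y_true = [1, 1] + [0] * 18
y_pred = [1, 0] + [0] * 17 + [1]

print(accuracy_score(y_true, y_pred))  # 0.90 -- inflated by the many True Negatives
print(f1_score(y_true, y_pred))        # 0.50 -- Precision and Recall are both only 0.5
```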
I hope this explanation helps those starting out in Machine Learning and working on classification problems to see that accuracy is not always the metric to select the best model by.
Please consider following me to get my latest blog posts related to Data Science.