Differences from single-label classification
In single-label classification, we interpret the final model scores as probabilities, favouring one class as the maximum.
Hence we use Softmax to generate probabilities summing to 1,
and Cross-Entropy loss to maximize the probability of the correct class.
In multi-label classification, we want to interpret each score as the degree to which the instance belongs to that class (a probability between 0 and 1).
Therefore we apply a Sigmoid per class,
and penalize with a per-class Binary Cross-Entropy loss.
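The contrast above can be sketched in NumPy. The scores and target labels below are hypothetical; the point is that Softmax couples the outputs into one distribution, while Sigmoid treats each class independently.

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability; the outputs sum to 1.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def sigmoid(scores):
    # Each output is an independent probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-scores))

scores = np.array([2.0, 1.0, 0.5])

probs_single = softmax(scores)   # one distribution: a single class "wins"
probs_multi = sigmoid(scores)    # each class judged on its own

# Single-label: cross-entropy for the true class (say, class 0).
ce = -np.log(probs_single[0])

# Multi-label: binary cross-entropy summed over classes
# (hypothetical targets: the instance belongs to classes 0 and 2).
targets = np.array([1.0, 0.0, 1.0])
bce = -(targets * np.log(probs_multi)
        + (1 - targets) * np.log(1 - probs_multi)).sum()
```

In practice a framework loss such as a sigmoid-plus-BCE layer is used for stability, but the arithmetic is the same.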
Accuracy as a metric gives us good feedback only when the dataset is balanced with respect to class instances; it fails when the distribution is skewed.
Ex: say class A - 80%, B - 10%, C - 5%, D - 5%.
A model that only gets class A instances correct would still report 80% accuracy.
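A quick sketch of that failure mode, using the hypothetical 80/10/5/5 split: a degenerate model that always predicts the majority class still scores 80% accuracy.

```python
import numpy as np

# Hypothetical skewed label distribution: A = 80%, B = 10%, C = 5%, D = 5%.
labels = np.array(["A"] * 80 + ["B"] * 10 + ["C"] * 5 + ["D"] * 5)

# A degenerate model that always predicts the majority class.
preds = np.array(["A"] * 100)

accuracy = (preds == labels).mean()
# accuracy is 0.8, yet classes B, C and D are never predicted correctly.
```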
We want to evaluate model performance with respect to all classes.
Therefore we improve the metrics:
Class-wise Accuracy, or Precision: correct predictions of class C / all predictions of class C.
Precision alone can't give a complete picture: a model that predicts class C rarely, but correctly, still gets a high value.
Hence we add Recall,
Recall, or Coverage: correct predictions of class C / all instances of class C.
Both combined give a complete picture of the model's performance on each class.
Ideally, the higher both values, the better.
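The two definitions above can be written directly in NumPy. The predictions and labels here are a small hypothetical example; note how class A gets precision 2/3 (one of its three predictions is wrong) and recall 2/3 (one of its three instances is missed).

```python
import numpy as np

labels = np.array(["A", "A", "A", "B", "B", "C"])
preds  = np.array(["A", "A", "B", "B", "A", "C"])

def precision(cls, preds, labels):
    # Correct predictions of cls / all predictions of cls.
    predicted = preds == cls
    if predicted.sum() == 0:
        return 0.0
    return ((preds == labels) & predicted).sum() / predicted.sum()

def recall(cls, preds, labels):
    # Correct predictions of cls / all instances of cls.
    actual = labels == cls
    return ((preds == labels) & actual).sum() / actual.sum()
```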
The calculation of these values depends on what we count as a 'correct prediction' from the model scores. We decide this using a threshold value (Ex: p >= 0.5).
Since the threshold is a hyper-parameter, we would like to express the robustness of the model by observing precision-recall (PR) values over various thresholds.
We express this as a precision-recall curve, summarized by a single number, Average Precision (AP).
Averaging AP over all classes gives mean Average Precision (mAP).
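One common way to compute AP (a sketch, not the only convention) ranks predictions by score, so every threshold is visited implicitly, and averages the precision at each rank where a true positive occurs. The scores and ground truth below are hypothetical.

```python
import numpy as np

def average_precision(scores, truth):
    # Rank predictions by descending score; each rank k corresponds to
    # thresholding at the k-th highest score.
    order = np.argsort(-scores)
    truth = truth[order]
    cum_tp = np.cumsum(truth)                  # true positives up to rank k
    ranks = np.arange(1, len(truth) + 1)
    precision_at_k = cum_tp / ranks
    # Average precision over the ranks where a positive is recovered.
    return precision_at_k[truth == 1].mean()

# Hypothetical per-class scores and binary ground truth for two classes.
scores_a = np.array([0.9, 0.8, 0.3, 0.1])
truth_a  = np.array([1, 0, 1, 0])
scores_b = np.array([0.7, 0.6, 0.5, 0.2])
truth_b  = np.array([0, 1, 1, 0])

ap_a = average_precision(scores_a, truth_a)
ap_b = average_precision(scores_b, truth_b)
mAP = (ap_a + ap_b) / 2
```

Libraries such as scikit-learn provide `average_precision_score` with the same spirit, though the exact interpolation conventions vary between benchmarks.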
mAP expresses model performance better than accuracy and allows us to analyze the model on individual classes.
The Jupyter notebook for this post can be found here.