In this section, we present the results of the experimental evaluation. Table 1 shows model performance for each evaluation measure from the example-based, label-based, and ranking-based groups. The best performance for each evaluation measure (the best value in each row) is shown in bold, while the worst performance in each row is shown in italics.
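To make the three groups of measures concrete, the following minimal sketch computes one representative of each on toy data. The choice of Hamming loss (example-based) and one-error (ranking-based) is an assumption made for illustration, as are the toy values and the 0.5 decision threshold; the measures actually reported are those listed in Table 1.

```python
# Toy multi-label problem: 3 examples, 4 labels (hypothetical values).
y_true = [[1, 0, 1, 0],
          [0, 1, 0, 0],
          [1, 1, 0, 1]]
scores = [[0.9, 0.2, 0.4, 0.1],
          [0.1, 0.6, 0.8, 0.3],
          [0.7, 0.9, 0.2, 0.6]]
# Binary predictions obtained with an assumed threshold of 0.5.
y_pred = [[int(s >= 0.5) for s in row] for row in scores]


def hamming_loss(y_true, y_pred):
    """Example-based: fraction of (example, label) slots predicted wrongly."""
    wrong = sum(t != p for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp))
    return wrong / (len(y_true) * len(y_true[0]))


def label_counts(y_true, y_pred, label):
    """True positives and false positives for a single label."""
    tp = sum(1 for yt, yp in zip(y_true, y_pred) if yp[label] == 1 and yt[label] == 1)
    fp = sum(1 for yt, yp in zip(y_true, y_pred) if yp[label] == 1 and yt[label] == 0)
    return tp, fp


def micro_precision(y_true, y_pred):
    """Label-based: pool the counts over all labels, then compute precision."""
    counts = [label_counts(y_true, y_pred, j) for j in range(len(y_true[0]))]
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    return tp / (tp + fp) if tp + fp else 0.0


def macro_precision(y_true, y_pred):
    """Label-based: compute precision per label, then average over labels."""
    per_label = []
    for j in range(len(y_true[0])):
        tp, fp = label_counts(y_true, y_pred, j)
        per_label.append(tp / (tp + fp) if tp + fp else 0.0)
    return sum(per_label) / len(per_label)


def one_error(y_true, scores):
    """Ranking-based: fraction of examples whose top-ranked label is not relevant."""
    misses = 0
    for yt, sc in zip(y_true, scores):
        top = max(range(len(sc)), key=lambda j: sc[j])
        misses += yt[top] == 0
    return misses / len(y_true)


print(hamming_loss(y_true, y_pred))
print(micro_precision(y_true, y_pred), macro_precision(y_true, y_pred))
print(one_error(y_true, scores))
```

Micro-averaging pools the counts over all labels and is therefore dominated by frequent labels, whereas macro-averaging weights every label equally; this is why the two can rank the same models differently.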
The results in Table 1 clearly show that the use of hierarchies (both expert-provided and data-derived) improves predictive performance compared to the use of a “flat” output space. Namely, models that did not exploit any hierarchical information (No Hierarchy column) performed worse on every evaluation measure. This means that both expert-provided and data-derived knowledge have a positive impact on model performance. The comparison among the different hierarchies on the output space (expert-provided and data-derived) is less clear-cut, but some patterns emerge from the experimental results. In particular, PCTs with the domain-knowledge hierarchy perform better on the label-based and ranking-based evaluation measures.
Based on the results in Table 1, data-derived hierarchies achieved better example-based performance than the expert-provided hierarchy. On average, models with data-derived hierarchies were better by 3.96\%; specifically, k-means hierarchy based models and hierarchical clustering based models were better by 2.76\% and 6.35\%, respectively.
However, on the label-based evaluation measures, the knowledge-driven hierarchy based model achieved better performance, by 10.63\% on average. More specifically, the performance of the data-driven hierarchy based models was 4.95\% and 2.20\% worse for k-means hierarchy based models and hierarchical clustering based models, respectively. It is worth noting that the k-means hierarchy based models had the best micro and macro precision and yet were still the worst-performing group of models.
Finally, on the ranking-based evaluation measures, the knowledge-based hierarchy model performed considerably better (by 8.98\%) than the data-driven hierarchy based models. It is worth noting that the k-means hierarchy group of models performed better than the hierarchical clustering models.
The reasoning behind these results is as follows. Clustering examples produces groups of similar cases, and similar cases are expected to have similar outputs; predicting all outputs of an example from similar cases therefore works better than with the expert-based hierarchy. The expert-driven hierarchy groups similar diagnoses, not cases, and most often a case does not have diagnoses from the same group, which yields worse performance than the data-driven hierarchies. However, on the label-based and ranking-based evaluation measures, the expert-based hierarchy is more general (its depth is four), which yields better performance: even if the model misses a diagnosis, the predicted diagnosis is close to the true one in the expert-driven hierarchy (e.g., the model predicted viral pneumonia while the true output was infective pneumonia). Data-driven hierarchies, by contrast, do not take medical similarity into account. Therefore, for example, viral and infective pneumonia can end up very dissimilar in a data-driven hierarchy (e.g., viral pneumonia is grouped with dehydration).
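The notion of a missed diagnosis being "close" in the hierarchy can be illustrated with a small sketch that counts the edges between two labels in a tree. Both hierarchies below are hypothetical toy structures built only from the diagnoses named above; the intermediate node names (`pneumonia`, `cluster A`, and so on) are assumptions and do not reproduce the hierarchies used in the experiments.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges on the path between labels `a` and `b` in the tree."""
    path_a = path_to_root(a, parent)
    steps_from_a = {n: i for i, n in enumerate(path_a)}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first common ancestor
            return steps_from_a[n] + steps_from_b
    raise ValueError("labels are not in the same tree")


# Hypothetical expert-style hierarchy: diagnoses grouped by medical similarity.
expert = {
    "viral pneumonia": "pneumonia",
    "infective pneumonia": "pneumonia",
    "pneumonia": "respiratory diseases",
    "dehydration": "metabolic disorders",
    "respiratory diseases": "root",
    "metabolic disorders": "root",
}

# Hypothetical data-derived hierarchy: diagnoses grouped by co-occurrence in cases.
data_derived = {
    "viral pneumonia": "cluster A",
    "dehydration": "cluster A",
    "infective pneumonia": "cluster B",
    "cluster A": "root",
    "cluster B": "root",
}

d_expert = tree_distance("viral pneumonia", "infective pneumonia", expert)
d_data = tree_distance("viral pneumonia", "infective pneumonia", data_derived)
print(d_expert, d_data)
```

In this toy setting the two pneumonia diagnoses are siblings in the expert-style tree but meet only at the root of the data-derived tree, so a confusion between them counts as a near miss only under the expert-style hierarchy.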
For the sake of clarity, the next section presents only the k-means (k=5) based model. The other hierarchy models are presented in the Appendix.