Calibrating probability estimation trees using Venn-Abers predictors
2019 (English) In: SIAM International Conference on Data Mining, SDM 2019, Society for Industrial and Applied Mathematics, 2019, p. 28-36. Conference paper, Published paper (Refereed)
Abstract [en]
Class labels output by standard decision trees are not very useful for making informed decisions, e.g., when comparing the expected utility of various alternatives. In contrast, probability estimation trees (PETs) output class probability distributions rather than single class labels. It is well known that estimating class probabilities in PETs by relative frequencies often leads to extreme probability estimates, and a number of approaches to provide better-calibrated estimates have been proposed. In this study, a recent model-agnostic calibration approach, called Venn-Abers predictors, is, for the first time, considered in the context of decision trees. Results from a large-scale empirical investigation are presented, comparing the novel approach to previous calibration techniques with respect to several different performance metrics, targeting both predictive performance and reliability of the estimates. All approaches are considered both with and without Laplace correction. The results show that using Venn-Abers predictors for calibration is a highly competitive approach, significantly outperforming Platt scaling, isotonic regression and no calibration with respect to almost all performance metrics used, independently of whether Laplace correction is applied or not. The only exception is AUC, where using non-calibrated PETs together with Laplace correction is actually the best option, which can be explained by the fact that AUC is affected not by the absolute values of the probability estimates but only by their relative values.
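Two points in the abstract can be illustrated with a minimal sketch: how Laplace correction softens the extreme estimates produced by relative frequencies in a leaf, and why AUC depends only on the ordering of the estimates. The function names, the two-class setting, and the toy scores below are assumptions made for illustration; they are not taken from the paper, and the AUC here is a plain pairwise (Mann-Whitney) computation rather than the paper's evaluation code.

```python
def relative_frequency(k, n):
    """Estimate P(positive) in a leaf holding k positives among n instances."""
    return k / n


def laplace(k, n, num_classes=2):
    """Laplace-corrected estimate: add one pseudo-count per class."""
    return (k + 1) / (n + num_classes)


def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly,
    with ties counted as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# A pure leaf with 3 positives out of 3 instances (hypothetical counts).
extreme = relative_frequency(3, 3)   # 1.0, an extreme estimate
smoothed = laplace(3, 3)             # (3 + 1) / (3 + 2) = 0.8

# Toy scores and labels (hypothetical). Cubing is strictly increasing on
# [0, 1], so it changes the absolute values but not the ranking.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 0, 1, 1, 0]
distorted = [s ** 3 for s in scores]

auc_original = auc(scores, labels)
auc_distorted = auc(distorted, labels)
```

In this sketch `auc_original` and `auc_distorted` coincide even though the distorted scores are poorly calibrated, which is the sense in which a ranking metric such as AUC cannot reward calibration on its own.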
Place, publisher, year, edition, pages Society for Industrial and Applied Mathematics, 2019. p. 28-36
Keywords [en]
Calibration, Data mining, Decision trees, Forestry, Laplace transforms, Calibration techniques, Class probabilities, Empirical investigation, Performance metrics, Predictive performance, Probability estimate, Probability estimation trees, Relative frequencies, Probability distributions
National Category
Probability Theory and Statistics
Identifiers URN: urn:nbn:se:hj:diva-44355 DOI: 10.1137/1.9781611975673.4 Scopus ID: 2-s2.0-85066082095 ISBN: 9781611975673 (print) OAI: oai:DiVA.org:hj-44355 DiVA, id: diva2:1322841
Conference 19th SIAM International Conference on Data Mining, SDM 2019, Hyatt Regency Calgary, Calgary, Canada, 2 - 4 May 2019
2019-06-11 2019-06-11 2021-03-15 Bibliographically approved