Well-calibrated and specialized probability estimation trees
Jönköping University, School of Engineering, JTH, Department of Computing, Jönköping AI Lab (JAIL). ORCID iD: 0000-0003-0412-6199
Jönköping University, School of Engineering, JTH, Department of Computing, Jönköping AI Lab (JAIL). ORCID iD: 0000-0003-0274-9026
2020 (English). In: Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020 / [ed] C. Demeniconi and N. Chawla, Society for Industrial and Applied Mathematics, 2020, p. 415-423. Conference paper, Published paper (Refereed)
Abstract [en]

In many predictive modeling scenarios, the production set inputs that will later be used for the actual predictions are available and could be utilized in the modeling process. In fact, many predictive models are generated with an existing production set in mind. Despite this, few approaches utilize this information to produce models optimized for the production set at hand. If these models need to be comprehensible, the oracle coaching framework can be applied, often resulting in interpretable models, e.g., decision trees and rule sets, with accuracies on par with opaque models like neural networks and ensembles on the specific production set. In oracle coaching, a strong but opaque predictive model is used to label instances, including the production set, which are later learned by a weaker but interpretable model. In this paper, oracle coaching is, for the first time, used for improving the calibration of probabilistic predictors. More specifically, setups where oracle coaching is combined with Platt scaling, isotonic regression and Venn-Abers are suggested and evaluated for calibrating probability estimation trees (PETs). A key contribution is the setup designs ensuring that the oracle-coached PETs, which by definition utilize knowledge about production data, remain well-calibrated. In the experimentation, using 23 publicly available data sets, it is shown that oracle-coached models are not only more accurate, but also significantly better calibrated, than models produced by standard induction. Interestingly enough, this holds both for the uncalibrated PETs and for all calibration techniques evaluated, i.e., Platt scaling, isotonic regression and Venn-Abers. As expected, all three external techniques significantly improved the calibration of the original PETs. Finally, a direct comparison between the three external calibration techniques showed that Venn-Abers significantly outperformed the alternatives in most setups.
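
The abstract describes three external calibration techniques applied to probability estimation trees. Below is a minimal sketch, not the authors' implementation, of that external calibration step only, using scikit-learn: a decision tree stands in for the PET, Platt scaling and isotonic regression are applied via CalibratedClassifierCV, and a simplified inductive Venn-Abers predictor is built from IsotonicRegression. The oracle-coaching setups are not reproduced, and the synthetic data, split sizes and tree parameters are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

# Illustrative binary data standing in for one of the 23 data sets; a held-out
# partition plays the role of the production set described in the paper.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_prod, y_train, y_prod = train_test_split(X, y, test_size=0.3, random_state=0)
X_proper, X_cal, y_proper, y_cal = train_test_split(X_train, y_train, test_size=0.3, random_state=0)

def make_pet():
    # A PET here is simply a decision tree whose leaf class frequencies serve
    # as probability estimates (no Laplace or m-estimate smoothing shown).
    return DecisionTreeClassifier(min_samples_leaf=5, random_state=0)

# Uncalibrated PET trained on all available training data.
pet = make_pet().fit(X_train, y_train)

# Platt scaling ("sigmoid") and isotonic regression via cross-validated calibration.
platt = CalibratedClassifierCV(make_pet(), method="sigmoid", cv=5).fit(X_train, y_train)
isotonic = CalibratedClassifierCV(make_pet(), method="isotonic", cv=5).fit(X_train, y_train)

def venn_abers(cal_scores, cal_labels, test_score):
    # Simplified inductive Venn-Abers: fit one isotonic regression with the test
    # object hypothetically labelled 0 and one with it labelled 1, then merge
    # the resulting pair (p0, p1) into a single probability estimate.
    p = []
    for hypothetical_label in (0, 1):
        ir = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        ir.fit(np.append(cal_scores, test_score),
               np.append(cal_labels, hypothetical_label))
        p.append(ir.predict([test_score])[0])
    p0, p1 = p
    return p1 / (1.0 - p0 + p1)

# Venn-Abers uses a PET trained on the proper training set plus a disjoint calibration fold.
scorer = make_pet().fit(X_proper, y_proper)
cal_scores = scorer.predict_proba(X_cal)[:, 1]
va_probs = np.array([venn_abers(cal_scores, y_cal, s)
                     for s in scorer.predict_proba(X_prod)[:, 1]])

# Lower Brier score on the production set indicates better calibration.
print(f"PET        {brier_score_loss(y_prod, pet.predict_proba(X_prod)[:, 1]):.4f}")
print(f"Platt      {brier_score_loss(y_prod, platt.predict_proba(X_prod)[:, 1]):.4f}")
print(f"Isotonic   {brier_score_loss(y_prod, isotonic.predict_proba(X_prod)[:, 1]):.4f}")
print(f"Venn-Abers {brier_score_loss(y_prod, va_probs):.4f}")

Venn-Abers, like isotonic regression when applied inductively, requires a calibration set disjoint from the data used to fit the underlying scorer, which is why the sketch carves out a separate calibration fold; in the paper, analogous design choices are what keep the oracle-coached PETs well-calibrated.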

Place, publisher, year, edition, pages
Society for Industrial and Applied Mathematics, 2020. p. 415-423
Keywords [en]
Calibration, Data mining, Decision trees, Forestry, Space division multiple access, Calibration techniques, External calibration, Isotonic regression, Modeling process, Predictive modeling, Predictive models, Probability estimation trees, Production data, Predictive analytics
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:hj:diva-50325
DOI: 10.1137/1.9781611976236.47
ISI: 000627117200047
Scopus ID: 2-s2.0-85089193470
ISBN: 9781611976236 (electronic)
OAI: oai:DiVA.org:hj-50325
DiVA id: diva2:1459603
Conference
2020 SIAM International Conference on Data Mining, SDM 2020, 7-9 May 2020
Funder
Vinnova, 2018-03581; Knowledge Foundation, DATAKIND 20190194, DATAMINE HJ 2016/874-51
Available from: 2020-08-20. Created: 2020-08-20. Last updated: 2021-04-22. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Johansson, Ulf; Löfström, Tuwe
