Customized interpretable conformal regressors
2019 (English)
In: Proceedings - 2019 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 221-230, article id 8964179
Conference paper, Published paper (Refereed)
Abstract [en]
Interpretability is recognized as a key property of trustworthy predictive models. Only interpretable models make it straightforward to explain individual predictions and allow inspection and analysis of the model itself. In real-world scenarios, these explanations and insights are often needed for a specific batch of predictions, i.e., a production set. If the input vectors for this production set are available when generating the predictive model, a methodology called oracle coaching can be used to produce highly accurate and interpretable models optimized for the specific production set. In this paper, oracle coaching is, for the first time, combined with the conformal prediction framework for predictive regression. A conformal regressor, which is built on top of a standard regression model, outputs valid prediction intervals, i.e., the error rate on novel data is bounded by a preset significance level, as long as the labeled data used for calibration is exchangeable with production data. Since validity is guaranteed for all conformal predictors, the key performance metric is the size of the prediction intervals, where tighter (more efficient) intervals are preferred. The efficiency of a conformal model depends on several factors, but more accurate underlying models will generally also lead to improved efficiency in the corresponding conformal predictor. A key contribution of this paper is the design of setups ensuring that conformal regressors remain valid when their underlying models are oracle-coached regression trees, which by definition utilize knowledge about production data. The experiments, using 20 publicly available regression data sets, demonstrate the validity of the suggested setups. Results also show that utilizing oracle-coached underlying models will generally lead to significantly more efficient conformal regressors, compared to when these are built on top of models induced using only training data.
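The notions of validity and efficiency described in the abstract can be illustrated with a minimal sketch of a split (inductive) conformal regressor. This is not the paper's setup: the underlying model here is an ordinary least-squares line used as a stand-in for a regression tree, no oracle coaching is performed, the data are synthetic, and all names (`conformal_halfwidth`, `make_data`, `predict`) are hypothetical. The sketch only shows how absolute residuals on a calibration set yield a fixed interval half-width, and how the error rate and interval size are then measured on a production set.

```python
import numpy as np


def conformal_halfwidth(cal_residuals, significance):
    """Half-width of the prediction interval from calibration residuals.

    Uses the ceil((1 - significance) * (n + 1))-th smallest absolute
    residual. If the calibration set is too small for the requested
    significance level, the interval is unbounded.
    """
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = len(scores)
    k = int(np.ceil((1.0 - significance) * (n + 1)))
    if k > n:
        return np.inf
    return scores[k - 1]


rng = np.random.default_rng(0)


def make_data(n):
    # Synthetic, exchangeable data: y = 2x + 1 + Gaussian noise (assumption).
    x = rng.uniform(-3.0, 3.0, size=n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=n)
    return x, y


x_train, y_train = make_data(500)   # used to fit the underlying model
x_cal, y_cal = make_data(500)       # labeled calibration set
x_prod, y_prod = make_data(2000)    # production set (labels used only to evaluate)

# Underlying model: a least-squares line, standing in for a regression tree.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def predict(x):
    return slope * x + intercept


significance = 0.1
half = conformal_halfwidth(y_cal - predict(x_cal), significance)

lower = predict(x_prod) - half
upper = predict(x_prod) + half

# Validity: fraction of production targets outside their interval.
error_rate = np.mean((y_prod < lower) | (y_prod > upper))
# Efficiency: mean interval size (smaller is better).
interval_size = np.mean(upper - lower)

print(f"error rate: {error_rate:.3f}, mean interval size: {interval_size:.3f}")
```

Under exchangeability, the error rate on the production set should stay close to or below the chosen significance level, while a more accurate underlying model would shrink the residuals and therefore the interval size.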
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019. p. 221-230, article id 8964179
Keywords [en]
Conformal prediction, Interpretability, Oracle coaching, Predictive modeling, Regression trees, Advanced Analytics, Efficiency, Forecasting, Forestry, Labeled data, Regression analysis, Conformal predictions, Trees (mathematics)
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:hj:diva-47936
DOI: 10.1109/DSAA.2019.00037
ISI: 000540890900022
Scopus ID: 2-s2.0-85079278508
ISBN: 9781728144931 (print)
OAI: oai:DiVA.org:hj-47936
DiVA, id: diva2:1412086
Conference
6th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019, Washington, United States, 5 - 8 October, 2019
Funder
Knowledge Foundation, DATAKIND 20190194
Note
This work was supported by the Swedish Knowledge Foundation (DATAKIND 20190194) and by Region Jönköping (DATAMINE HJ 2016/874-51).
2020-03-05 2020-03-05 2021-03-15 Bibliographically approved