Publications (10 of 70)
Buendia, R., Kogej, T., Engkvist, O., Carlsson, L., Linusson, H., Johansson, U., . . . Ahlberg, E. (2019). Accurate Hit Estimation for Iterative Screening Using Venn-ABERS Predictors. Journal of Chemical Information and Modeling, 59(3), 1230-1237
2019 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 59, no 3, p. 1230-1237. Article in journal (Refereed). Published.
Abstract [en]

Iterative screening has emerged as a promising approach to increase the efficiency of high-throughput screening (HTS) campaigns in drug discovery. By learning from a subset of the compound library, inferences on what compounds to screen next can be made by predictive models. One of the challenges of iterative screening is to decide how many iterations to perform. This is mainly related to difficulties in estimating the prospective hit rate in any given iteration. In this article, a novel method based on Venn-ABERS predictors is proposed. The method provides accurate estimates of the number of hits retrieved in any given iteration during an HTS campaign. The estimates provide the necessary information to support the decision on the number of iterations needed to maximize the screening outcome. Thus, this method offers a prospective screening strategy for early-stage drug discovery.

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2019
National Category
Computer Sciences
Identifiers
urn:nbn:se:hj:diva-43510 (URN), 10.1021/acs.jcim.8b00724 (DOI), 000462943700027 (), 30726080 (PubMedID), 2-s2.0-85063371683 (Scopus ID), JTHDatateknikIS (Local ID), JTHDatateknikIS (Archive number), JTHDatateknikIS (OAI)
Funder
Knowledge Foundation, 20150185
Available from: 2019-04-23 Created: 2019-04-23 Last updated: 2019-08-22. Bibliographically approved
Johansson, U. & Gabrielsson, P. (2019). Are Traditional Neural Networks Well-Calibrated?. In: Proceedings of the International Joint Conference on Neural Networks: . Paper presented at 2019 International Joint Conference on Neural Networks, IJCNN 2019, Budapest, Hungary, 14 - 19 July 2019. IEEE, July, Article ID 8851962.
2019 (English). In: Proceedings of the International Joint Conference on Neural Networks, IEEE, 2019, Vol. July, article id 8851962. Conference paper, Published paper (Refereed).
Abstract [en]

Traditional neural networks are generally considered to be well-calibrated. Consequently, the established best practice is not to try to improve their calibration using general techniques like Platt scaling. In this paper, it is demonstrated, using 25 publicly available two-class data sets, that both single multilayer perceptrons and ensembles of multilayer perceptrons are in fact often poorly calibrated. Furthermore, the experimental results make it obvious that the calibration can be significantly improved by using either Platt scaling or Venn-Abers predictors. These results stand in sharp contrast to the standard recommendations for using neural networks as probabilistic classifiers. The empirical investigation also shows that for bagged ensembles it is beneficial to calibrate on the out-of-bag instances, despite the fact that this leads to using substantially smaller ensembles for the predictions. Finally, a direct comparison between Platt scaling and Venn-Abers predictors shows that the latter most often produced significantly better calibration, especially when calibrated on out-of-bag instances.
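The Platt scaling mentioned in the abstract can be sketched in a few lines: a sigmoid is fitted to raw classifier scores on a held-out calibration set. The sketch below uses scikit-learn's `LogisticRegression` as the sigmoid fitter; the function name and data are illustrative, not the paper's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def platt_scale(cal_scores, cal_labels, test_scores):
    """Platt scaling sketch: fit p = sigmoid(A*s + B) on held-out
    calibration scores, then map test scores to probabilities."""
    lr = LogisticRegression()
    lr.fit(np.asarray(cal_scores).reshape(-1, 1), cal_labels)
    return lr.predict_proba(np.asarray(test_scores).reshape(-1, 1))[:, 1]
```

Note that scikit-learn's logistic regression is L2-regularized by default, so this is an approximation of Platt's original unregularized fit.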

Place, publisher, year, edition, pages
IEEE, 2019
Keywords
Bagging, Calibration, Classification, Multilayer perceptrons, Probabilistic prediction, Venn-Abers predictors, Classification (of information), Multilayer neural networks, Multilayers, Best practices, Empirical investigation, Probabilistic classifiers, Sharp contrast
National Category
Computer Sciences
Identifiers
urn:nbn:se:hj:diva-46689 (URN), 10.1109/IJCNN.2019.8851962 (DOI), 2-s2.0-85073208584 (Scopus ID), 9781728119854 (ISBN)
Conference
2019 International Joint Conference on Neural Networks, IJCNN 2019, Budapest, Hungary, 14 - 19 July 2019
Available from: 2019-10-25 Created: 2019-10-25 Last updated: 2019-10-25. Bibliographically approved
Johansson, U., Löfström, T. & Boström, H. (2019). Calibrating probability estimation trees using Venn-Abers predictors. In: SIAM International Conference on Data Mining, SDM 2019: . Paper presented at 19th SIAM International Conference on Data Mining, SDM 2019, Hyatt Regency Calgary, Calgary, Canada, 2 - 4 May 2019 (pp. 28-36). Society for Industrial and Applied Mathematics
2019 (English). In: SIAM International Conference on Data Mining, SDM 2019, Society for Industrial and Applied Mathematics, 2019, p. 28-36. Conference paper, Published paper (Refereed).
Abstract [en]

Class labels output by standard decision trees are not very useful for making informed decisions, e.g., when comparing the expected utility of various alternatives. In contrast, probability estimation trees (PETs) output class probability distributions rather than single class labels. It is well known that estimating class probabilities in PETs by relative frequencies often leads to extreme probability estimates, and a number of approaches to provide better-calibrated estimates have been proposed. In this study, a recent model-agnostic calibration approach, called Venn-Abers predictors, is for the first time considered in the context of decision trees. Results from a large-scale empirical investigation are presented, comparing the novel approach to previous calibration techniques with respect to several different performance metrics, targeting both predictive performance and reliability of the estimates. All approaches are considered both with and without Laplace correction. The results show that using Venn-Abers predictors for calibration is a highly competitive approach, significantly outperforming Platt scaling, isotonic regression and no calibration with respect to almost all performance metrics used, independently of whether Laplace correction is applied or not. The only exception is AUC, where using non-calibrated PETs together with Laplace correction is actually the best option, which can be explained by the fact that AUC is affected not by the absolute but only by the relative values of the probability estimates.
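The inductive Venn-Abers idea can be sketched as follows: isotonic regression is fitted twice on the calibration scores, once with the test object hypothetically labelled 0 and once labelled 1, and the two fitted values at the test score form the multiprobability interval. This simplified sketch uses scikit-learn's `IsotonicRegression`; names and details are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def venn_abers_interval(cal_scores, cal_labels, test_score):
    """Simplified inductive Venn-Abers predictor for one test object:
    returns the multiprobability interval (p0, p1) for class 1."""
    probs = []
    for assumed_label in (0, 1):
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        # Augment the calibration set with the test object under the
        # hypothetical label, then read off the fitted value at its score.
        iso.fit(np.append(cal_scores, test_score),
                np.append(cal_labels, assumed_label))
        probs.append(float(iso.predict([test_score])[0]))
    return probs[0], probs[1]
```

In practice p0 ≤ p1, and the gap between them reflects how much the calibration data pin down the probability near the test score.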

Place, publisher, year, edition, pages
Society for Industrial and Applied Mathematics, 2019
Keywords
Calibration, Data mining, Decision trees, Forestry, Laplace transforms, Calibration techniques, Class probabilities, Empirical investigation, Performance metrics, Predictive performance, Probability estimate, Probability estimation trees, Relative frequencies, Probability distributions
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:hj:diva-44355 (URN), 10.1137/1.9781611975673.4 (DOI), 2-s2.0-85066082095 (Scopus ID), 9781611975673 (ISBN)
Conference
19th SIAM International Conference on Data Mining, SDM 2019, Hyatt Regency Calgary, Calgary, Canada, 2 - 4 May 2019
Available from: 2019-06-11 Created: 2019-06-11 Last updated: 2019-08-22. Bibliographically approved
Johansson, U., Sönströd, C., Löfström, T. & Boström, H. (2019). Customized interpretable conformal regressors. In: Proceedings - 2019 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019: . Paper presented at 6th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019, Washington, United States, 5 - 8 October, 2019 (pp. 221-230). Institute of Electrical and Electronics Engineers (IEEE), Article ID 8964179.
2019 (English). In: Proceedings - 2019 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 221-230, article id 8964179. Conference paper, Published paper (Refereed).
Abstract [en]

Interpretability is recognized as a key property of trustworthy predictive models. Only interpretable models make it straightforward to explain individual predictions, and allow inspection and analysis of the model itself. In real-world scenarios, these explanations and insights are often needed for a specific batch of predictions, i.e., a production set. If the input vectors for this production set are available when generating the predictive model, a methodology called oracle coaching can be used to produce highly accurate and interpretable models optimized for the specific production set. In this paper, oracle coaching is, for the first time, combined with the conformal prediction framework for predictive regression. A conformal regressor, which is built on top of a standard regression model, outputs valid prediction intervals, i.e., the error rate on novel data is bounded by a preset significance level, as long as the labeled data used for calibration is exchangeable with production data. Since validity is guaranteed for all conformal predictors, the key performance metric is the size of the prediction intervals, where tighter (more efficient) intervals are preferred. The efficiency of a conformal model depends on several factors, but more accurate underlying models will generally also lead to improved efficiency in the corresponding conformal predictor. A key contribution in this paper is the design of setups ensuring that when oracle-coached regression trees, which by definition utilize knowledge about production data, are used as underlying models for conformal regressors, the regressors remain valid. The experiments, using 20 publicly available regression data sets, demonstrate the validity of the suggested setups. Results also show that utilizing oracle-coached underlying models will generally lead to significantly more efficient conformal regressors, compared to when these are built on top of models induced using only training data.
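The interval construction the abstract describes (a split conformal regressor on top of a point model) can be sketched independently of the underlying model: the interval half-width is an order statistic of the absolute calibration residuals. Names and the significance default below are illustrative, not the paper's code.

```python
import numpy as np

def conformal_interval(cal_residuals, point_pred, significance=0.1):
    """Split conformal regression sketch: the half-width is the
    ceil((1 - significance) * (n + 1))-th smallest absolute residual
    on the calibration set; validity holds under exchangeability."""
    resid = np.sort(np.abs(cal_residuals))
    n = len(resid)
    k = min(int(np.ceil((1.0 - significance) * (n + 1))), n)
    half_width = resid[k - 1]
    return point_pred - half_width, point_pred + half_width
```

With significance 0.1, at most 10% of new exchangeable instances should fall outside the returned interval, regardless of how good the underlying point predictions are; a better model only shrinks the residuals and hence the interval.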

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
Conformal prediction, Interpretability, Oracle coaching, Predictive modeling, Regression trees, Advanced Analytics, Efficiency, Forecasting, Forestry, Labeled data, Regression analysis, Conformal predictions, Trees (mathematics)
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:hj:diva-47936 (URN), 10.1109/DSAA.2019.00037 (DOI), 2-s2.0-85079278508 (Scopus ID), 9781728144931 (ISBN)
Conference
6th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2019, Washington, United States, 5 - 8 October, 2019
Note

This work was supported by the Swedish Knowledge Foundation (DATAKIND 20190194) and by Region Jönköping (DATAMINE HJ 2016/874-51).

Available from: 2020-03-05 Created: 2020-03-05 Last updated: 2020-03-05. Bibliographically approved
Linusson, H., Johansson, U. & Boström, H. (2019). Efficient conformal predictor ensembles. Neurocomputing
2019 (English). In: Neurocomputing, ISSN 0925-2312, E-ISSN 1872-8286. Article in journal (Refereed). Epub ahead of print.
Abstract [en]

In this paper, we study a generalization of a recently developed strategy for generating conformal predictor ensembles: out-of-bag calibration. The ensemble strategy is evaluated, both theoretically and empirically, against a commonly used alternative ensemble strategy, bootstrap conformal prediction, as well as common non-ensemble strategies. A thorough analysis is provided of out-of-bag calibration, with respect to theoretical validity, empirical validity (error rate), efficiency (prediction region size) and p-value stability (the degree of variance observed over multiple predictions for the same object). Empirical results show that out-of-bag calibration displays favorable characteristics with regard to these criteria, and we propose that out-of-bag calibration be adopted as a standard method for constructing conformal predictor ensembles.
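The out-of-bag calibration strategy studied above can be sketched with a random forest: every training instance also serves as a calibration instance, scored by the trees that did not see it in their bootstrap sample. The sketch uses scikit-learn's `oob_prediction_` attribute; it is a simplified illustration, not the paper's evaluated implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def oob_conformal_forest(X_train, y_train, X_test, significance=0.1):
    """Out-of-bag conformal calibration sketch: nonconformity scores
    are absolute out-of-bag residuals, so no separate calibration
    set is held out and all data trains the forest."""
    rf = RandomForestRegressor(n_estimators=300, oob_score=True,
                               random_state=0)
    rf.fit(X_train, y_train)
    resid = np.sort(np.abs(y_train - rf.oob_prediction_))
    n = len(resid)
    k = min(int(np.ceil((1.0 - significance) * (n + 1))), n)
    half_width = resid[k - 1]
    pred = rf.predict(X_test)
    return pred - half_width, pred + half_width
```

Note that with few trees some instances may never be out-of-bag; a few hundred trees on a moderately sized training set avoids this in practice.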

Place, publisher, year, edition, pages
Elsevier, 2019
Keywords
Classification, Conformal prediction, Ensembles, Classification (of information), Forecasting, Statistical methods, Conformal predictions, Conformal predictors, Ensemble strategies, Error rate, P-values, Region size, Calibration, article, bootstrapping, prediction, theoretical study, validity
National Category
Computer Engineering
Identifiers
urn:nbn:se:hj:diva-47219 (URN), 10.1016/j.neucom.2019.07.113 (DOI), 2-s2.0-85076549331 (Scopus ID)
Funder
Knowledge Foundation, 20150185
Available from: 2020-01-02 Created: 2020-01-02 Last updated: 2020-01-02
Johansson, U., Löfström, T., Linusson, H. & Boström, H. (2019). Efficient Venn Predictors using Random Forests. Machine Learning, 108(3), 535-550
2019 (English). In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 108, no 3, p. 535-550. Article in journal (Refereed). Published.
Abstract [en]

Successful use of probabilistic classification requires well-calibrated probability estimates, i.e., the predicted class probabilities must correspond to the true probabilities. In addition, a probabilistic classifier must, of course, also be as accurate as possible. In this paper, Venn predictors, and their special case Venn-Abers predictors, are evaluated for probabilistic classification, using random forests as the underlying models. Venn predictors output multiple probabilities for each label, i.e., the predicted label is associated with a probability interval. Since all Venn predictors are valid in the long run, the size of the probability intervals is very important, with tighter intervals being more informative. The standard solution when calibrating a classifier is to employ an additional step, transforming the outputs from a classifier into probability estimates, using a labeled data set not employed for training of the models. For random forests, and other bagged ensembles, it is, however, possible to use the out-of-bag instances for calibration, making all training data available for both model learning and calibration. This procedure has previously been successfully applied to conformal prediction, but is here evaluated for the first time for Venn predictors. The empirical investigation, using 22 publicly available data sets, showed that all four versions of the Venn predictors were better calibrated than both the raw estimates from the random forest and the standard techniques Platt scaling and isotonic regression. Regarding both informativeness and accuracy, the standard Venn predictor calibrated on out-of-bag instances was the best setup evaluated. Most importantly, calibrating on out-of-bag instances, instead of using a separate calibration set, resulted in tighter intervals and more accurate models on every data set, for both the Venn predictors and the Venn-Abers predictors.

Place, publisher, year, edition, pages
Springer, 2019
Keywords
Probabilistic prediction, Venn predictors, Venn-Abers predictors, Random forests, Out-of-bag calibration
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:hj:diva-41127 (URN), 10.1007/s10994-018-5753-x (DOI), 000459945900008 (), 2-s2.0-85052523706 (Scopus ID), HOA JTH 2019 (Local ID), HOA JTH 2019 (Archive number), HOA JTH 2019 (OAI)
Available from: 2018-08-13 Created: 2018-08-13 Last updated: 2019-08-22. Bibliographically approved
Johansson, U., Löfström, T., Boström, H. & Sönströd, C. (2019). Interpretable and Specialized Conformal Predictors. In: Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Evgueni Smirnov (Ed.), Conformal and Probabilistic Prediction and Applications: . Paper presented at Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications, 9-11 September 2019, Golden Sands, Bulgaria (pp. 3-22).
2019 (English). In: Conformal and Probabilistic Prediction and Applications / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Evgueni Smirnov, 2019, p. 3-22. Conference paper, Published paper (Refereed).
Abstract [en]

In real-world scenarios, interpretable models are often required to explain predictions, and to allow for inspection and analysis of the model. The overall purpose of oracle coaching is to produce highly accurate, but interpretable, models optimized for a specific test set. Oracle coaching is applicable to the very common scenario where explanations and insights are needed for a specific batch of predictions, and the input vectors for this test set are available when building the predictive model. In this paper, oracle coaching is used for generating underlying classifiers for conformal prediction. The resulting conformal classifiers output valid label sets, i.e., the error rate on the test data is bounded by a preset significance level, as long as the labeled data used for calibration is exchangeable with the test set. Since validity is guaranteed for all conformal predictors, the key performance metric is efficiency, i.e., the size of the label sets, where smaller sets are more informative. The main contribution of this paper is the design of setups making sure that when oracle-coached decision trees, which by definition utilize knowledge about test data, are used as underlying models for conformal classifiers, the exchangeability between calibration and test data is maintained. Consequently, the resulting conformal classifiers retain the validity guarantees. In the experiments, using a large number of publicly available data sets, the validity of the suggested setups is empirically demonstrated. Furthermore, the results show that the more accurate underlying models produced by oracle coaching also improved the efficiency of the corresponding conformal classifiers.

Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 105
Keywords
Interpretability, Decision trees, Classification, Oracle coaching, Conformal prediction
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:hj:diva-46804 (URN)
Conference
Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications, 9-11 September 2019, Golden Sands, Bulgaria
Available from: 2019-11-11 Created: 2019-11-11 Last updated: 2019-11-11. Bibliographically approved
Boström, H., Johansson, U. & Vesterberg, A. (2019). Predicting with Confidence from Survival Data. In: Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Evgueni Smirnov (Ed.), Conformal and Probabilistic Prediction and Applications: . Paper presented at Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications, 9-11 September 2019, Golden Sands, Bulgaria (pp. 123-141).
2019 (English). In: Conformal and Probabilistic Prediction and Applications / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Evgueni Smirnov, 2019, p. 123-141. Conference paper, Published paper (Refereed).
Abstract [en]

Survival modeling concerns predicting whether or not an event will occur on or before a given point in time. In a recent study, the conformal prediction framework was applied to this task, and the so-called conformal random survival forest was proposed. It was empirically shown that the error level of this model indeed is very close to the provided confidence level, and also that the error for predicting each outcome, i.e., event or no-event, can be controlled separately by employing a Mondrian approach. The addressed task concerned making predictions for time points as provided by the underlying distribution. However, if one instead is interested in making predictions with respect to some specific time point, the guarantee of the conformal prediction framework no longer holds, as one is effectively considering a sample from another distribution than the one from which the calibration instances have been drawn. In this study, we propose a modification of the approach for specific time points, which transforms the problem into a binary classification task, thereby allowing the error level to be controlled. The latter is demonstrated by an empirical investigation using both a collection of publicly available datasets and two in-house datasets from a truck manufacturing company.
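The transformation into a binary classification task can be sketched as follows: each instance is labelled by whether the event occurred by the chosen time point, and instances censored before that point carry no usable label. The function name and label encoding below are illustrative, not taken from the paper.

```python
def binary_labels_at(t, times, events):
    """Recast (time, event) survival data as binary labels at time t:
    1 = event observed at or before t,
    0 = followed past t without an event by t,
    None = censored before t (outcome at t unknown; dropped later)."""
    labels = []
    for time, event in zip(times, events):
        if event and time <= t:
            labels.append(1)      # event happened by t
        elif time > t:
            labels.append(0)      # known event-free at t
        else:
            labels.append(None)   # censored before t
    return labels
```

After dropping the `None` instances, any binary conformal classifier can be calibrated on the resulting labels, restoring the error-rate guarantee at that specific time point.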

Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 105
Keywords
Conformal prediction, survival modeling, random forests
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:hj:diva-46802 (URN)
Conference
Proceedings of the Eighth Symposium on Conformal and Probabilistic Prediction and Applications, 9-11 September 2019, Golden Sands, Bulgaria
Available from: 2019-11-11 Created: 2019-11-11 Last updated: 2019-11-11. Bibliographically approved
Giri, C., Johansson, U. & Löfström, T. (2019). Predictive modeling of campaigns to quantify performance in fashion retail industry. In: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019: . Paper presented at 2019 IEEE International Conference on Big Data, Big Data 2019, Los Angeles, United States, 9-12 December 2019 (pp. 2267-2273). IEEE
2019 (English). In: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019, IEEE, 2019, p. 2267-2273. Conference paper, Published paper (Refereed).
Abstract [en]

Managing campaigns and promotions effectively is vital for the fashion retail industry. While retailers invest heavily in campaigns, customer retention is often very low. Innovative retailers are introducing data-driven methods aimed at understanding and ultimately optimizing campaigns. In this application paper, machine learning techniques are employed to analyze data about campaigns and promotions from a leading Swedish e-retailer. More specifically, predictive modeling is used to forecast the profitability and activation of campaigns using different kinds of promotions. In the empirical investigation, regression models are generated to estimate profitability, and classification models are used to predict the overall success of the campaigns. In both cases, random forests are compared to individual tree models. As expected, the more complex ensembles are more accurate, but the use of interpretable tree models makes it possible to analyze the underlying relationships simply by inspecting the trees. In conclusion, the accuracy of the predictive models must be deemed high enough to make these data-driven methods attractive.

Place, publisher, year, edition, pages
IEEE, 2019
Keywords
Campaign Prediction, Decision Trees, Fashion retail, Machine Learning, Predictive Modeling, Random Forest, Big data, Forecasting, Forestry, Learning systems, Profitability, Random forests, Regression analysis, Classification models, Customer retention, Data-driven methods, Empirical investigation, Individual tree model, Machine learning techniques, Sales
National Category
Computer Sciences
Identifiers
urn:nbn:se:hj:diva-48026 (URN), 10.1109/BigData47090.2019.9005492 (DOI), 2-s2.0-85081295913 (Scopus ID), 9781728108582 (ISBN), 9781728108599 (ISBN)
Conference
2019 IEEE International Conference on Big Data, Big Data 2019, Los Angeles, United States, 9-12 December 2019
Funder
Knowledge Foundation, 20160035
Available from: 2020-03-30 Created: 2020-03-30 Last updated: 2020-03-30. Bibliographically approved
Löfström, T., Johansson, U., Balkow, J. & Sundell, H. (2018). A data-driven approach to online fitting services. In: Liu, J, Lu, J, Xu, Y, Martinez, L & Kerre, EE (Ed.), Data Science And Knowledge Engineering For Sensing Decision Support: . Paper presented at 13th International Conference on Fuzzy Logic and Intelligent Technologies in Nuclear Science (FLINS), Belfast, Ireland, 21-24 August, 2018 (pp. 1559-1566). World Scientific, 11
2018 (English). In: Data Science And Knowledge Engineering For Sensing Decision Support / [ed] Liu, J, Lu, J, Xu, Y, Martinez, L & Kerre, EE, World Scientific, 2018, Vol. 11, p. 1559-1566. Conference paper, Published paper (Refereed).
Abstract [en]

Being able to accurately predict several attributes related to size is vital for services supporting online fitting. In this paper, we investigate a data-driven approach, comparing two different supervised modeling techniques for predictive regression: standard multiple linear regression and neural networks. Using a fairly large, publicly available data set of high quality, the main results are somewhat discouraging. Specifically, it is questionable whether key attributes like sleeve length, neck size, waist and chest can be modeled accurately enough using easily accessible input variables such as sex, weight and height. This is despite the fact that several online services offer exactly this functionality. For this specific task, the results show that standard linear regression was as accurate as the potentially more powerful neural networks. Most importantly, comparing the predictions to reasonable levels for acceptable errors, it was found that an overwhelming majority of all instances had at least one attribute with an unacceptably high prediction error. In fact, if requiring that all variables are predicted with acceptable accuracy, less than 5% of all instances met that criterion. Specifically, for females, the success rate was as low as 1.8%.

Place, publisher, year, edition, pages
World Scientific, 2018
Series
World Scientific Proceedings Series on Computer Engineering and Information Science ; 11
Keywords
Predictive regression; online fitting; fashion
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:hj:diva-44183 (URN), 10.1142/9789813273238_0194 (DOI), 000468160600194 (), 978-981-3273-24-5 (ISBN), 978-981-3273-22-1 (ISBN)
Conference
13th International Conference on Fuzzy Logic and Intelligent Technologies in Nuclear Science (FLINS), Belfast, Ireland, 21-24 August, 2018
Available from: 2019-06-11 Created: 2019-06-11 Last updated: 2019-08-22. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-0412-6199