Duras, Toni
Publications (8 of 8)
Duras, T., Javed, F., Månsson, K., Sjölander, P. & Söderberg, M. (2023). Using machine learning to select variables in data envelopment analysis: Simulations and application using electricity distribution data. Energy Economics, 120, Article ID 106621.
2023 (English). In: Energy Economics, ISSN 0140-9883, E-ISSN 1873-6181, Vol. 120, article id 106621. Article in journal (Refereed). Published.
Abstract [en]

Agencies that regulate electricity providers often apply nonparametric data envelopment analysis (DEA) to assess the relative efficiency of each firm. The reliability and validity of DEA are contingent upon selecting relevant input variables. In the era of big (wide) data, the assumptions of traditional variable selection techniques are often violated due to challenges posed by high-dimensional data and their empirical properties. Currently, regulators have access to a large number of potential input variables. Therefore, our aim is to introduce new machine learning methods for regulators of the energy market. We also propose a new two-step analytical approach where, in the first step, the machine learning-based adaptive least absolute shrinkage and selection operator (ALASSO) is used to select variables and, in the second step, the selected variables are used in a DEA model. In contrast to previous research, we find, by using more realistic data-generating processes common for production functions (i.e., Cobb–Douglas and Translog), that the performance of different machine learning techniques differs substantially across empirically relevant situations. Simulations also reveal that the ALASSO is superior to other machine learning and regression-based methods when collinearity is low or moderate, whereas under strong multicollinearity the LASSO approach exhibits the best performance. We also use real data from the Swedish electricity distribution market to illustrate the empirical relevance of selecting the most appropriate variable selection method.
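
As a rough illustration of the two-step idea, the sketch below implements only the first step: an adaptive LASSO written as a weighted LASSO on synthetic data. The data, the weighting exponent gamma, and all variable names are illustrative assumptions rather than the authors' code or settings, and the DEA step is only indicated in a comment.

    # Hypothetical sketch of step 1: adaptive LASSO (ALASSO) variable selection,
    # implemented as a weighted LASSO. Synthetic data; not the authors' code.
    import numpy as np
    from sklearn.linear_model import LinearRegression, LassoCV

    rng = np.random.default_rng(0)
    n, p = 200, 30
    X = rng.normal(size=(n, p))
    true_beta = np.zeros(p)
    true_beta[:4] = [1.5, -2.0, 1.0, 0.5]      # only four inputs truly matter
    y = X @ true_beta + rng.normal(scale=0.5, size=n)

    # Pilot OLS estimates give the adaptive weights w_j = 1 / |b_j|**gamma.
    gamma = 1.0
    pilot = LinearRegression().fit(X, y).coef_
    w = 1.0 / (np.abs(pilot) ** gamma + 1e-8)

    # A weighted LASSO is an ordinary LASSO on the rescaled columns X_j / w_j.
    lasso = LassoCV(cv=5).fit(X / w, y)
    beta_alasso = lasso.coef_ / w              # map back to the original scale

    selected = np.flatnonzero(beta_alasso != 0)
    print("Selected inputs:", selected)
    # Step 2 (not shown): use only the selected inputs in a DEA model to
    # compute each firm's relative efficiency score.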

Place, publisher, year, edition, pages
Elsevier, 2023
Keywords
Clustering algorithms, Commerce, Data envelopment analysis, Electric utilities, Regression analysis, Curse of dimensionality, Electricity distribution, Input variables, Least absolute shrinkage and selection operators, Machine-learning, Nonparametrics, Performance, Regulation, Relative efficiency, Variables selections, Machine learning, Variable selection
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:hj:diva-60028 (URN), 10.1016/j.eneco.2023.106621 (DOI), 000972681000001 (ISI), 2-s2.0-85150299788 (Scopus ID), HOA;intsam;868713 (Local ID)
Available from: 2023-03-27. Created: 2023-03-27. Last updated: 2023-05-29. Bibliographically approved.
Duras, T. (2022). The fixed effects PCA model in a common principal component environment. Communications in Statistics - Theory and Methods, 51(6), 1653-1673
2022 (English). In: Communications in Statistics - Theory and Methods, ISSN 0361-0926, E-ISSN 1532-415X, Vol. 51, no. 6, p. 1653-1673. Article in journal (Refereed). Published.
Abstract [en]

This paper explores multivariate data using principal component analysis (PCA). Traditionally, two different approaches to PCA have been considered: an algebraic, descriptive one and a probabilistic one. Here, a third type of PCA approach, lying somewhere between the two traditional approaches and called the fixed effects PCA model, is considered. This model rests mainly on geometrical, rather than probabilistic, assumptions, such as the optimal choice of dimensionality and metric. The model is designed to account for any possible prior information about the noise in the data to yield better estimates. Parameters are estimated by minimizing a least-squares criterion with respect to a specified metric. It is suggested how the fixed effects PCA estimates can be improved in a common principal component (CPC) environment: if the CPC assumption is fulfilled, the fixed effects PCA model can exploit more information by incorporating common principal component analysis (CPCA) theory into the estimation procedure.
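
For orientation, one common way to write the fixed effects PCA criterion (a sketch in our own notation, assuming the standard Caussinus-type formulation; none of the symbols are quoted from the paper) is

    \min_{\theta_1, \dots, \theta_n \in \mathcal{A}_q} \; \sum_{i=1}^{n} w_i \, \lVert x_i - \theta_i \rVert_M^2,

where each observation x_i in R^p is approximated by a fixed (non-random) point \theta_i constrained to lie in a q-dimensional affine subspace \mathcal{A}_q, M is the chosen metric, and the weights w_i encode prior information about the noise.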

Place, publisher, year, edition, pages
Taylor & Francis, 2022
Keywords
Common principal component analysis, exploratory data analysis, fixed effects principal component analysis model, principal component analysis, Communication, Mathematical techniques, Statistical methods, Statistics, Estimation procedures, Least squares criterion, Multivariate data, Optimal choice, Principal Components, Prior information, Probabilistic assumptions, Traditional approaches
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:hj:diva-50079 (URN), 10.1080/03610926.2020.1765255 (DOI), 000536982400001 (ISI), 2-s2.0-85085552681 (Scopus ID), HOA;;1454344 (Local ID)
Available from: 2020-07-16. Created: 2020-07-16. Last updated: 2022-04-09. Bibliographically approved.
Duras, T. (2019). A comparison of two estimation methods for common principal components. Communications in Statistics: Case Studies, Data Analysis and Applications, 5(4), 366-393
2019 (English). In: Communications in Statistics: Case Studies, Data Analysis and Applications, E-ISSN 2373-7484, Vol. 5, no. 4, p. 366-393. Article in journal (Refereed). Published.
Abstract [en]

Common principal components (CPCs) are often estimated using maximum likelihood estimation through an algorithm called the Flury–Gautschi (FG) Algorithm. Krzanowski proposed a simpler estimation method via a principal component analysis of a weighted sum of the sample covariance matrices. These methods are compared for real-world datasets and in a Monte Carlo simulation. The real-world data is used to compare the selection of a common eigenvector model and the estimated coefficients. The simulation study investigates how the accuracy of the methods is affected by autocorrelation, the number of covariance matrices, dimensions, and sample sizes for multivariate normal and chi-square distributed data. The findings in this article support the use of Krzanowski’s method in situations where the CPC assumption is appropriate. 
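
As a concrete handle on the simpler estimator described above, the sketch below takes the eigenvectors of a weighted sum of group covariance matrices as the common eigenvectors; the data, weights, and dimensions are made up for the example and are not taken from the paper.

    # Hypothetical sketch of Krzanowski-style CPC estimation as described in
    # the abstract: a PCA of a weighted sum of the sample covariance matrices.
    import numpy as np

    rng = np.random.default_rng(1)
    p = 5
    samples = [rng.normal(size=(n_k, p)) for n_k in (80, 120, 100)]

    covs = [np.cov(X, rowvar=False) for X in samples]
    weights = [X.shape[0] for X in samples]    # e.g. weight by sample size

    pooled = sum(w * S for w, S in zip(weights, covs)) / sum(weights)
    _, Q = np.linalg.eigh(pooled)              # columns: estimated common eigenvectors

    # Under the CPC assumption the group-specific eigenvalues are diag(Q' S_k Q).
    group_eigenvalues = [np.diag(Q.T @ S @ Q) for S in covs]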

Place, publisher, year, edition, pages
Taylor & Francis, 2019
Keywords
Common principal components, identification of common eigenvector models, maximum likelihood estimation, Monte Carlo simulation
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:hj:diva-38578 (URN), 10.1080/23737484.2019.1656117 (DOI), 2-s2.0-85080129292 (Scopus ID), HOA;;1174952 (Local ID)
Available from: 2018-01-17. Created: 2018-01-17. Last updated: 2021-02-26. Bibliographically approved.
Duras, T. (2019). Applications of common principal components in multivariate and high-dimensional analysis. (Doctoral dissertation). Jönköping: Jönköping University, Jönköping International Business School
2019 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

This thesis consists of four papers, all exploring some aspect of common principal component analysis (CPCA), the generalization of PCA to multiple groups. The basic assumption of the CPC model is that the space spanned by the eigenvectors is identical across several groups, whereas eigenvalues associated with the eigenvectors can vary. CPCA is used in essentially the same areas and applications as PCA.
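
In the usual notation (Flury's formulation; the symbols below are ours, not quoted from the thesis), the CPC assumption states that the group covariance matrices share a single orthogonal matrix of eigenvectors,

    \Sigma_k = Q \Lambda_k Q^{\top}, \qquad k = 1, \dots, g,

where Q is common to all g groups while the diagonal eigenvalue matrices \Lambda_k may differ between groups.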

The first paper compares the performance of the maximum likelihood and Krzanowski's estimators of the CPC model for two real-world datasets and in a Monte Carlo simulation study. The simplicity and intuition of Krzanowski's estimator, together with the findings in this paper, support the use of this estimator for CPC models over the maximum likelihood estimator.

Paper number two uses CPCA as a tool for imposing restrictions on system-wise regression models. The paper contributes to the field by proposing a variety of explicit estimators, deriving their properties and identifying the appropriate amount of smoothing that should be imposed on the estimator. 

In the third paper, a generalization of the fixed effects PCA model to multiple populations in a CPC environment is proposed. The model includes mainly geometrical, rather than probabilistic, assumptions, and is designed to account for any possible prior information about the noise in the data to yield better estimates, obtained by minimizing a least squares criterion with respect to a specified metric.

The fourth paper surveys some properties of the orthogonal group and the associated Haar measure on it. It is demonstrated how seemingly abstract results contribute to applied statistics and, in particular, to PCA.
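
As a small, self-contained illustration of the object that paper studies, the snippet below draws an orthogonal matrix Haar-uniformly on O(p) via the standard QR construction with a sign correction; this construction is a textbook device and is not claimed to come from the paper itself.

    # Illustration only: sample a Haar-distributed orthogonal matrix on O(p)
    # by QR-decomposing a Gaussian matrix and fixing the column signs.
    import numpy as np

    def haar_orthogonal(p, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        Z = rng.normal(size=(p, p))
        Q, R = np.linalg.qr(Z)
        return Q * np.sign(np.diag(R))         # sign correction gives exact Haar

    Q = haar_orthogonal(4)
    print(np.allclose(Q.T @ Q, np.eye(4)))     # True: Q is orthogonal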

Place, publisher, year, edition, pages
Jönköping: Jönköping University, Jönköping International Business School, 2019. p. 56
Series
JIBS Dissertation Series, ISSN 1403-0470 ; 131
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:hj:diva-43519 (URN), 978-91-86345-93-8 (ISBN)
Public defence
2019-05-17, B1014, Jönköping International Business School, Jönköping, 10:00 (English)
Available from: 2019-04-23. Created: 2019-04-23. Last updated: 2019-04-23. Bibliographically approved.
Duras, T. (2017). Aspects of common principal components. (Licentiate dissertation). Jönköping: Jönköping University, Jönköping International Business School
2017 (English). Licentiate thesis, comprehensive summary (Other academic).
Abstract [en]

The focus of this thesis is the common principal component (CPC) model, the generalization of principal components to several populations. Common principal components refer to a group of multidimensional datasets whose inner products share the same eigenvectors and are therefore simultaneously diagonalized by a common decorrelator matrix. Common principal component analysis is applied in essentially the same areas and analyses as its one-population counterpart. The generalization to multiple populations comes at the cost of being more mathematically involved, and many problems in the area remain to be solved.

This thesis consists of three individual papers and an introduction chapter.

In the first paper, the performance of two different estimation methods of the CPC model is compared for two real-world datasets and in a Monte Carlo simulation study.

The second paper shows that the orthogonal group and the Haar measure on it play an important role in PCA, in both single- and multi-population principal component analysis.

The last paper considers using common principal component analysis as a tool for imposing restrictions on system-wise regression models. When the exogenous variables of a multi-dimensional model share common principal components, each of the marginal models in the system is, up to its eigenvalues, identical. The marginal models hence form a class of regression models situated between classical seemingly unrelated regressions, where each set of explanatory variables is unique, and multivariate regression, where every marginal model shares the same set of regressors.
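
Stated loosely in symbols (our notation, meant only to paraphrase the description above, not quoted from the thesis), the last paper studies systems of the form

    y_k = X_k \beta_k + \varepsilon_k, \qquad \tfrac{1}{n_k} X_k^{\top} X_k = Q \Lambda_k Q^{\top}, \qquad k = 1, \dots, g,

so the marginal design matrices differ only through the eigenvalue matrices \Lambda_k, placing the system between seemingly unrelated regressions (all X_k distinct) and multivariate regression (one shared design matrix).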

Place, publisher, year, edition, pages
Jönköping: Jönköping University, Jönköping International Business School, 2017. p. 80
Series
JIBS Research Reports, ISSN 1403-0462 ; 2017-2
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:hj:diva-38587 (URN), 978-91-86345-79-2 (ISBN)
Available from: 2018-01-17. Created: 2018-01-17. Last updated: 2018-01-17. Bibliographically approved.
Duras, T. & Holgersson, T. A small excursion on the Haar measure on Op.
(English). Manuscript (preprint) (Other academic).
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:hj:diva-38584 (URN)
Available from: 2018-01-17 Created: 2018-01-17 Last updated: 2019-04-23
Duras, T. An extension of the fixed effects principal component model to a common principal component environment.
(English). Manuscript (preprint) (Other academic).
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:hj:diva-43517 (URN)
Available from: 2019-04-23 Created: 2019-04-23 Last updated: 2019-04-23
Duras, T. & Holgersson, T. Common principal components with applications in regression.
(English). Manuscript (preprint) (Other academic).
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:hj:diva-38586 (URN)
Available from: 2018-01-17 Created: 2018-01-17 Last updated: 2019-04-23