Comparison of five dimensionality reduction methods on the multidimensional “Forest Cover Type” dataset


Published: Jan 2, 2026
Keywords:
Multivariate data, Big data, Principal Components Analysis, Factor Analysis, Correspondence Analysis, Categorical Principal Components Analysis, Factor Analysis for Mixed Data
Zacharenia Kyrana
https://orcid.org/0000-0001-9269-0675
Emmanouil Pratsinakis
Nikolaos Papafilippou
Angelos Markos
George Menexes
Abstract

A multidimensional, multivariate data structure with mixed data types gives researchers the opportunity to apply many statistical dimensionality reduction methods, which aim at a reduced representation of the original data set that is smaller in “volume” but still contains the critical and useful information. In this study, five such methods were compared on an appropriate data set: Principal Components Analysis, Factor Analysis, Correspondence Analysis, Categorical Principal Components Analysis and Factor Analysis for Mixed Data. Various strategies were applied for the comparisons. The aims of the study were to compare the results of the five methods, to examine their application to multidimensional mixed-type data, and to compare the time needed to extract results in different statistical software, in order to highlight significant computational and interpretive disadvantages. The software used was Python and SPSS. Important disadvantages of these methods were the “curse of dimensionality”, in the sense of determining the number of important dimensions; the increased computing power required; the lack of software code for certain methods; differences in calculations between the software packages and, by extension, in the results obtained; the inability of some software to manage many pseudo-variables; and the difficulty of identifying the most appropriate method for reducing the mathematical dimensions.
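To make the handling of mixed data and pseudo-variables concrete, the following is a minimal sketch of the preprocessing commonly described for Factor Analysis for Mixed Data (standardized quantitative columns plus indicator columns weighted by the inverse square root of each category's proportion), followed by a simple timing of the first principal axis. It is not the code used in the study: the data are synthetic, the variable names (`elevation`, `slope`, area and soil labels) are hypothetical stand-ins loosely inspired by the Forest Cover Type data, and power iteration is used here only to keep the example dependency-free.

```python
import math
import random
import time


def famd_preprocess(quant_rows, cat_rows):
    """Build the matrix on which FAMD runs an ordinary PCA (sketch)."""
    n, n_quant = len(quant_rows), len(quant_rows[0])
    # Standardize each quantitative column (mean 0, population sd 1).
    means = [sum(r[j] for r in quant_rows) / n for j in range(n_quant)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in quant_rows) / n)
           for j in range(n_quant)]
    # One pseudo-variable (indicator column) per category of each variable.
    levels = [sorted({r[j] for r in cat_rows}) for j in range(len(cat_rows[0]))]
    props = [{lv: sum(r[j] == lv for r in cat_rows) / n for lv in lvls}
             for j, lvls in enumerate(levels)]
    out = []
    for q, c in zip(quant_rows, cat_rows):
        row = [(q[j] - means[j]) / sds[j] for j in range(n_quant)]
        for j, lvls in enumerate(levels):
            for lv in lvls:
                p = props[j][lv]
                # Centered indicator divided by sqrt of the category proportion.
                row.append(((1.0 if c[j] == lv else 0.0) - p) / math.sqrt(p))
        out.append(row)
    return out


def leading_inertia_share(z, iters=500, seed=0):
    """Share of total inertia carried by the first axis (power iteration)."""
    n, p = len(z), len(z[0])
    cov = [[sum(r[a] * r[b] for r in z) / n for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(p)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue / sum(cov[a][a] for a in range(p))


# Synthetic mixed data: two quantitative and two categorical variables.
rng = random.Random(42)
n = 300
areas, soils = ["A1", "A2", "A3"], ["S1", "S2", "S3", "S4"]
quant, cat = [], []
for _ in range(n):
    area = rng.choice(areas)
    elevation = rng.gauss(2500 + 200 * areas.index(area), 150)
    slope = rng.gauss(15, 5)
    quant.append([elevation, slope])
    cat.append([area, rng.choice(soils)])

start = time.perf_counter()
z = famd_preprocess(quant, cat)
share = leading_inertia_share(z)
elapsed = time.perf_counter() - start
print(f"{len(z[0])} columns, first-axis share = {share:.3f}, {elapsed:.4f} s")
```

With this weighting, each quantitative variable contributes an inertia of 1 and each categorical variable contributes its number of categories minus 1, so the column count grows with every additional category. This is one way to see why many pseudo-variables can strain some software.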

Article Details
  • Section
  • Empirical studies
References
Anderson, T. W. (1984). An Introduction to Multivariate Statistical Analysis (2nd ed.). New York: John Wiley & Sons, Inc.
Bellman, R. E. (1961). Adaptive Control Processes: A Guided Tour. Princeton University Press.
Cunningham, J. P., & Ghahramani, Z. (2015). Linear Dimensionality Reduction: Survey, Insights, and Generalizations. Journal of Machine Learning Research, 16(89), 2859-2900. https://doi.org/10.48550/arXiv.1406.0873
Dash, M., Liu, H., & Yao, J. (1997). Dimensionality reduction of unsupervised data. In Proceedings of the International Conference on Tools with Artificial Intelligence. IEEE.
Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2010). Multivariate Data Analysis: A Global Perspective (7th ed.). New Jersey: Pearson Education, Inc.
Hendrickson, J. L. (2014). Methods for Clustering Mixed Data. Doctoral dissertation, University of South Carolina, Columbia.
Kassambara, A. (2017). Practical Guide to Cluster Analysis in R: Unsupervised Machine Learning (vol. 1). STHDA.
Linting, M., Meulman, J. J., Groenen, P. J., & van der Kooij, A. J. (2007). Nonlinear principal components analysis: introduction and application. Psychological Methods, 12(3), 336-358. https://doi.org/10.1037/1082-989X.12.3.336
Menexes, G. (2006). Experimental Designs in Data Analysis [in Greek]. Doctoral dissertation, Department of Applied Informatics, University of Macedonia, Thessaloniki.
Messaoud, R. B., Boussaïd, O., & Loudcher-Rabaseda, S. (2007). A Multiple Correspondence Analysis to Organize Data Cubes. Frontiers in Artificial Intelligence and Applications, 155(1), 133-146. https://halshs.archives-ouvertes.fr/halshs-00476483
Nguyen, L. H., & Holmes, S. (2019). Ten quick tips for effective dimensionality reduction. PLoS Computational Biology, 15(6). https://doi.org/10.1371/journal.pcbi.1006907
Pagès, J. (2014). Multiple Factor Analysis by Example Using R (1st ed.). USA: Chapman & Hall/CRC.
Sharma, S. (1996). Applied Multivariate Techniques. New York: John Wiley & Sons, Inc.
Tabachnick, B. G., & Fidell, L. S. (2007). Using Multivariate Statistics (5th ed.). New York: Allyn & Bacon/Pearson Education.
UCI Machine Learning Repository. Covertype Data Set. https://archive.ics.uci.edu/ml/datasets/covertype.