Comparison of dimensionality reduction methods and strategies on multivariate categorical data
Abstract
The analysis of ‘high’-dimensional categorical data poses significant challenges in Data Science, Machine Learning, and Statistics, particularly regarding the study of the variability (inertia) of the measured characteristics, its structure and components, and the interpretation of the results. This paper addresses these issues by investigating and comparing various methods and strategies for the dimensionality reduction of categorical data. The strategies were applied to the "Forest Cover Type" dataset (n = 581,012) from the UCI Machine Learning Repository. The proposed strategies, which provided more, and sometimes different, information about the structure of variability, were evaluated by applying and comparing several methods: Multiple Correspondence Analysis (MCA), Non-Linear Categorical Principal Components Analysis with Optimal Scaling (CATPCA), Principal Components Analysis (PCA), Factor Analysis for Mixed Data (FAMD), Nonlinear Canonical Correlation Analysis (NLCCA), and Multiple Factor Analysis (MFA). The results suggest that different strategies are likely to be required depending on the nature of the data and the research objectives. They also demonstrated the applicability of each method in different contexts and revealed that, while no single approach is “universally” superior, strategies tailored to the nature of the data offer alternative solutions; examples include Singular Value Decomposition (SVD) on several correlation matrices followed by PCA, the combination of MCA and CATPCA, and advanced methods such as FAMD, NLCCA, or MFA. In general, it is wiser to apply different analysis strategies depending on the objectives of the study and on how the researcher chooses to treat the variables (nominal, ordinal, scale) within a specific scientific framework.
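As an illustration of one of the methods named in the abstract, the following is a minimal sketch of MCA computed as an SVD of the standardized residuals of an indicator (one-hot) matrix. It is not the authors' pipeline: the toy variables, their labels, and the function names `indicator_matrix` and `mca_inertias` are hypothetical, and the Forest Cover Type data is not used.

```python
import numpy as np

def indicator_matrix(columns):
    """One-hot encode a list of categorical columns into one 0/1 matrix."""
    blocks = []
    for col in columns:
        col = np.asarray(col)
        levels = np.unique(col)  # observed categories only
        blocks.append((col[:, None] == levels[None, :]).astype(float))
    return np.hstack(blocks)

def mca_inertias(columns):
    """Return (principal inertias, J categories, Q variables) for an MCA
    of the indicator matrix built from `columns`."""
    Z = indicator_matrix(columns)
    J = Z.shape[1]                 # total number of categories
    Q = len(columns)               # number of categorical variables
    P = Z / Z.sum()                # correspondence matrix (Z.sum() == n * Q)
    r = P.sum(axis=1)              # row masses
    c = P.sum(axis=0)              # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    singular_values = np.linalg.svd(S, compute_uv=False)
    return singular_values ** 2, J, Q

# Hypothetical toy data: three categorical variables, six observations.
toy = [
    ["a", "b", "a", "c", "b", "a"],
    ["x", "y", "x", "y", "x", "y"],
    ["p", "p", "q", "q", "r", "r"],
]
inertias, J, Q = mca_inertias(toy)
share = inertias / inertias.sum()  # proportion of inertia per dimension
print(np.round(share, 3))
```

For an indicator matrix the total inertia equals J/Q − 1, where J is the total number of categories and Q the number of variables, so at most J − Q dimensions carry non-zero inertia. This is one reason raw MCA percentages of inertia tend to look low, and why the way variables are coded (nominal, ordinal, scale) affects the variability structure that is recovered.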
Article Details
- How to Cite
- ΠΑΠΑΦΙΛΙΠΠΟΥ Ν., Kyrana, Z., Pratsinakis, E., Dordas, C., Markos, A., & Menexes, G. (2026). Comparison of dimensionality reduction methods and strategies on multivariate categorical data. Data Analysis Bulletin, 21(1). Retrieved from https://ejournals.epublishing.ekt.gr/index.php/dab/article/view/39563
- Section
- Empirical studies

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish their work in the journal DATA ANALYSIS BULLETIN agree to the following terms:
1. Authors will not be charged any submission, processing or publication fees for their work. These costs are covered by the Greek Society of Data Analysis.
2. The copyright of papers published in the journal DATA ANALYSIS BULLETIN is protected by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license. The Authors retain the Copyright and grant the journal the right of first publication. This license allows third party licensees to use the work in any form for non-commercial purposes only. If third parties modify or adapt the content, they must license the modified material for non-commercial purposes only and under identical terms.
3. The above applies provided that the terms of the licence concerning the reference to the original author and the original publication in the journal DATA ANALYSIS BULLETIN are maintained.
4. Authors may enter into separate and additional contracts and agreements for the non-exclusive distribution of the work as published in the DATA ANALYSIS BULLETIN journal (e.g., deposit in academic repositories), provided that the first publication in the DATA ANALYSIS BULLETIN journal is acknowledged and cited.
5. The DATA ANALYSIS BULLETIN journal allows and encourages authors to deposit their work in institutional (e.g., the repository of the National Documentation Centre) or thematic repositories, after publication in DATA ANALYSIS BULLETIN and under Open Access conditions, as determined by their research funders and/or the institutions with which they collaborate, as appropriate. When submitting their work, authors should provide information on the publication of the work in the journal and the sources of funding for their research. Lists of institutional and thematic repositories by country are available at http://opendoar.org/countrylist.php. Authors can deposit their work free of charge in the repository www.zenodo.org, which is supported by OpenAIRE (www.openaire.eu), as part of the European Commission's policies to support Open Academic Research.