Comparison of methods and analysis’ strategies for clustering the objects of the multidimensional dataset “Forest Cover Type”
Abstract
A multidimensional and multivariate structure with mixed-type data gives researchers the opportunity to apply various statistical approaches and data clustering (classification) methods through statistical packages and programming languages. The choice of clustering method can affect the results obtained, with the distance measure and the joining method playing a key role. In this study, Partitioning Clustering (k-means) and Hierarchical Cluster Analysis were compared using the "Forest Cover Type" dataset. The main objective was to apply and compare these methods in dividing the dataset into groups (clusters). The results showed that there are several analysis strategies for clustering mixed-type data, based on the coding of the data and the choice of measurement scale for the input variables. Python produced the results faster than SPSS, in some cases by more than 100%, and the results of the two were similar except for minor differences due to numerical rounding. Hierarchical Cluster Analysis could not be performed on this dataset, or on other datasets of similar size, with the specific PC configuration used for the analyses, since both software packages crashed. This is probably a disadvantage, as Hierarchical Cluster Analysis allows the number of clusters to be determined through dendrograms, by combining various distances and joining methods, which cannot be achieved with the k-means method. Finally, the results of the classification were found to depend on the coding strategy and on the choice of measurement scale for the variables used in the analysis.
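One plausible reason for the crashes reported above is memory. Hierarchical clustering commonly works from one distance per pair of objects, so storage grows roughly with the square of the number of objects, whereas k-means mainly needs the data matrix and the cluster centres. The sketch below is a back-of-the-envelope estimate, not the authors' procedure. It assumes the public UCI version of the dataset (581,012 objects, 54 variables), 7 clusters, and 8-byte floating-point values; the abstract does not state these figures, and the function names are hypothetical.

```python
# Rough memory estimate for clustering n objects (illustrative only).
# Assumption: every stored value is an 8-byte float; real packages may differ.

BYTES_PER_VALUE = 8


def condensed_pairs(n: int) -> int:
    """Number of distinct object pairs, n * (n - 1) / 2."""
    return n * (n - 1) // 2


def hierarchical_bytes(n: int) -> int:
    """Bytes needed to hold one distance for every pair of objects."""
    return condensed_pairs(n) * BYTES_PER_VALUE


def kmeans_bytes(n: int, n_vars: int, k: int) -> int:
    """Bytes for the data matrix, k cluster centres, and one label per object."""
    return (n * n_vars + k * n_vars + n) * BYTES_PER_VALUE


def to_gib(n_bytes: int) -> float:
    """Convert bytes to gibibytes."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Assumed sizes: the public UCI dataset, with 7 clusters chosen for illustration.
    n, n_vars, k = 581_012, 54, 7
    print(f"pairwise distances: {to_gib(hierarchical_bytes(n)):,.1f} GiB")
    print(f"k-means working set: {to_gib(kmeans_bytes(n, n_vars, k)):,.3f} GiB")
```

Under these assumptions the pairwise distances alone need on the order of a terabyte, while the k-means working set stays well under one gigabyte. Some implementations avoid storing every distance at once, so this is an upper-bound style estimate rather than a statement about any particular package.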
Article Details
- How to Cite
Pratsinakis, E., Kyrana, Z., Papafilippoy, N., Markos, A., & Menexes, G. (2026). Comparison of methods and analysis’ strategies for clustering the objects of the multidimensional dataset “Forest Cover Type”. Data Analysis Bulletin, 20(1). Retrieved from https://ejournals.epublishing.ekt.gr/index.php/dab/article/view/34017
- Section
- Empirical studies

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Authors who publish their work in the journal DATA ANALYSIS BULLETIN agree to the following terms:
1. Authors will not be charged any submission, processing or publication fees for their work. These costs are covered by the Greek Society of Data Analysis.
2. The copyright of papers published in the journal DATA ANALYSIS BULLETIN is protected by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license. The Authors retain the Copyright and grant the journal the right of first publication. This license allows third-party licensees to use the work in any form for non-commercial purposes only. If third parties modify or adapt the content, they must license the modified material for non-commercial purposes only. If others modify or adapt the material, they must license the modified material under identical terms.
3. Provided that the terms of the licence concerning the reference to the original author and the original publication in the journal DATA ANALYSIS BULLETIN are maintained.
4. Authors may enter into separate and additional contracts and agreements for the non-exclusive distribution of the work as published in the DATA ANALYSIS BULLETIN journal (e.g., deposit in academic repositories), provided that the first publication in the DATA ANALYSIS BULLETIN journal is acknowledged and cited.
5. The DATA ANALYSIS BULLETIN journal allows and encourages authors to deposit their work in institutional (e.g., the repository of the National Documentation Centre) or thematic repositories, after publication in DATA ANALYSIS BULLETIN and under Open Access conditions, as determined by their research funders and/or the institutions with which they collaborate, as appropriate. When submitting their work, authors should provide information on the publication of the work in the journal and the sources of funding for their research. Lists of institutional and thematic repositories by country are available at http://opendoar.org/countrylist.php. Authors can deposit their work free of charge in the repository www.zenodo.org, which is supported by OpenAIRE (www.openaire.eu), as part of the European Commission's policies to support Open Academic Research.