Hypatia Digital Library: A novel text classification approach for small text fragments
Abstract
Purpose - This paper extends the authors' prior work on text classification in Hypatia, the digital library of the University of Western Attica. The main objective is to provide an accurate automated classification tool as an alternative to manual assignment.
Design/methodology/approach - The crucial step in text classification is selecting the most important term-words for document representation. The document collection consists of 718 abstracts in Medicine, Tourism and Food Technology. Two weighting methods were investigated: classic TF.IDF and DEVMAX.DF. The latter was proposed by the authors as a more accurate term-word selection tool for smaller text fragments. Classification was conducted by applying 14 classifiers available in WEKA.
Findings - The classification process yielded a precision of approximately 97%, and DEVMAX.DF outperformed classic TF.IDF.
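As an illustration of the classic TF.IDF weighting mentioned above, the following is a minimal sketch only. It assumes relative term frequency and a natural-log inverse document frequency, which may differ from the variant used in the paper. The function names `tf_idf` and `top_terms` and the toy documents are hypothetical and are not taken from Hypatia. DEVMAX.DF is not sketched because its definition is not given in this abstract.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = (count / doc length) * ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(doc_weights, k=3):
    """Return the k highest-weighted terms of one document."""
    return sorted(doc_weights, key=doc_weights.get, reverse=True)[:k]


# Toy abstracts, invented for illustration (one per subject area).
docs = [
    "patient therapy clinical study patient".split(),
    "hotel tourism destination study".split(),
    "food fermentation quality study".split(),
]
weights = tf_idf(docs)
```

Under this variant, a term that occurs in every document (here "study") receives a weight of zero, while a term concentrated in a single short abstract (here "patient") ranks highest for that abstract. This is the sense in which term weighting selects the most discriminating term-words.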
Article Details
- How to Cite
- Triantafyllou, I., Vorgia, F., & Koulouris, A. (2019). Hypatia Digital Library: A novel text classification approach for small text fragments. Journal of Integrated Information Management, 4(2), 16–23. Retrieved from https://ejournals.epublishing.ekt.gr/index.php/jiim/article/view/37872
- Section
- Research Articles
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright Notice
Authors who publish with JIIM agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution Non-Commercial License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are permitted and encouraged to post their work online (preferably in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.