ΣΝΕΛ (CMGL): A new linguistic resource for the study of literature in Greek


Published: May 13, 2026
Dionisis Goutsos
Χριστιάνα Νίκα
Κωνσταντίνος Περήφανος
Γεωργία Φραγκάκη
Abstract

This paper presents the principles and procedures behind a new linguistic resource for Greek, the Corpus of Modern Greek Literature (CMGL), designed to support the systematic diachronic study of twentieth-century Greek literature. We first outline the conceptual framework behind CMGL and situate it among comparable resources in other languages, which we briefly survey. We then describe how the corpus was compiled, with particular emphasis on the Logios platform, developed specifically for the digitization of polytonic texts (Perifanos & Goutsos 2025). At its current stage, CMGL contains 133 works of modern Greek literature, written in either the polytonic or the monotonic spelling system and published between 1927 and 1999, amounting to approximately 5.5 million words. The target size is 146 literary works. The corpus includes novels, short-story collections, poetry collections and theatrical plays. The article concludes with preliminary findings that illustrate the analytical possibilities of corpus-based stylistic methods. Specifically, we present frequency lists derived from CMGL, including lexical bundles, together with measurements of lexical density, average sentence length and readability scores for the texts in the corpus. We also provide charts of the diachronic development of grammatical and lexical forms, which point to directions for further research.

Article Details
  • Section: Articles