Experimenting with Machine Interpreting in the PL-EN Language Pair: Are We (Getting) Close to “Human-Like” Quality?
Abstract
Recent claims from technology companies suggesting that machine interpreting (MI) technology is approaching human-level quality remain largely unsubstantiated by ecologically valid empirical evidence. To address this gap, an experiment was conducted in June 2025 at the Institute of Applied Linguistics, University of Warsaw, comparing human simultaneous interpreting with two leading MI services in the Polish–English language pair. The experimental design simulated a real-life conference setting: a tandem of EU-accredited interpreters and two MA-level interpreting students worked alongside the two MI systems during a live event comprising an introductory speech in Polish, a 40-minute lecture in English, and a bi-directional Q&A session. Eleven student observers provided subjective perception data through a non-controlled online survey, while recordings and transcripts served as the basis for a detailed error analysis. This paper focuses on the latter: to assess interpretation accuracy across the four outputs, we used a simplified error-based approach adapted from Barik’s typology of errors and from later methods such as the NER and NTR models (Romero-Fresco and Pöchhacker 2017) and their adaptations for interpreting research (Davitti and Sandrelli 2020; Korybski and Davitti 2024).
A count of error instances in the transcripts reveals a clear quality sequence: accredited interpreters outperformed student interpreters, who in turn outperformed the MI systems. As regards error weight, machine-generated errors were more frequently of a major or meaning-distorting nature. This paper presents examples of common MI error types, including misrecognitions propagated from automatic speech recognition, literal translations causing syntactic and stylistic distortions, redundant voicing of punctuation, random language switches, and gender bias. Some errors clearly stem from a lack of contextual memory. In short, the overall speech-to-speech performance of the two MI systems lacked the flexibility, contextual awareness, and reformulation strategies characteristic of human interpreters. The findings suggest that, as of mid-2025 and under this experimental setup, MI in the PL-EN language pair remains far from human-like performance despite clear technological progress.
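For readers unfamiliar with the scoring models mentioned above, the sketch below shows how an NTR-style accuracy score can be computed from weighted error counts. It assumes the standard NTR formulation (Romero-Fresco and Pöchhacker 2017), NTR = (N − T − R) / N × 100, where N is the target-text word count, T the weighted sum of translation errors, and R the weighted sum of recognition errors; the severity weights follow the commonly cited minor/major/critical scale, and the sample counts are illustrative assumptions, not figures from this study.

```python
# NTR-style accuracy sketch (assumed standard formulation):
#   NTR = (N - T - R) / N * 100
# Severity weights follow the commonly cited scale; all counts below
# are hypothetical and do not reproduce the study's data.

ERROR_WEIGHTS = {"minor": 0.25, "major": 0.50, "critical": 1.00}

def weighted_errors(counts: dict[str, int]) -> float:
    """Sum error instances weighted by severity."""
    return sum(ERROR_WEIGHTS[severity] * n for severity, n in counts.items())

def ntr_score(n_words: int,
              translation: dict[str, int],
              recognition: dict[str, int]) -> float:
    """Return NTR accuracy as a percentage of the target-text word count."""
    t = weighted_errors(translation)  # T: translation errors
    r = weighted_errors(recognition)  # R: recognition errors
    return (n_words - t - r) / n_words * 100

# Illustrative comparison of two hypothetical outputs of equal length:
human = ntr_score(6000, {"minor": 12, "major": 4}, {"minor": 2})
machine = ntr_score(6000, {"minor": 30, "major": 18, "critical": 6},
                    {"minor": 25, "major": 10})
print(f"Human:   {human:.2f}%")    # 99.91%
print(f"Machine: {machine:.2f}%")  # 99.44%
```

Because small percentage differences can hide large differences in the number of meaning-distorting errors, severity weighting of this kind matters more than raw error counts when comparing human and machine outputs.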
Article Details
- How to Cite: Korybski, T., Figiel, W., Tryuk, M., & Górnik, M. (2026). Experimenting with Machine Interpreting in the PL-EN Language Pair: Are We (Getting) Close to “Human-Like” Quality?. International Journal of Language, Translation and Intercultural Communication, 11, 1–14. https://doi.org/10.12681/ijltic.43557
- Section: Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright Notice
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).