SENTENCE STRUCTURE IN HUMAN AND AI-GENERATED TEXTS: A COMPARATIVE STUDY

  • Elena Shalevska

Abstract

This mixed-method study analyzes the syntactic differences between human-written and AI-generated text. To this end, it draws on a corpus of 20 essays (10 human-written, 10 ChatGPT-generated) across 10 topics, with each sentence in those essays manually coded for structure (simple, compound, complex, or compound-complex). Sentence length, total word count, and number of sentences are also measured to gain further insight. Preliminary results indicate that: (1) human-written sentences are longer on average; (2) both human-written and AI-generated texts rarely include compound-complex sentences; (3) 60% of the AI-generated texts contain no compound-complex sentences whatsoever; and (4) both AI-generated and human-written texts rely heavily on simple sentences, though the human-written essays show more variation in their use of simple sentences from one essay to another.
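The surface measures named in the abstract (number of sentences, total word count, average sentence length, and the share of each coded sentence type) could be computed along the following lines. This is only a minimal sketch, not the study's actual procedure: the function names, the naive punctuation-based sentence splitter, and the whitespace word count are all assumptions made for illustration, and the structure labels are assumed to come from manual coding, as in the study.

```python
import re
from collections import Counter

# The four structure categories used for manual coding in the abstract.
TYPES = ("simple", "compound", "complex", "compound-complex")


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def length_measures(text):
    """Return sentence count, word count, and mean sentence length in words."""
    sentences = split_sentences(text)
    total_words = sum(len(s.split()) for s in sentences)
    mean_length = total_words / len(sentences) if sentences else 0.0
    return {
        "sentences": len(sentences),
        "words": total_words,
        "mean_sentence_length": mean_length,
    }


def structure_shares(labels):
    """Return the proportion of each structure type among manually assigned labels."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {t: 0.0 for t in TYPES}
    return {t: counts.get(t, 0) / total for t in TYPES}


if __name__ == "__main__":
    essay = "The cat sat. It purred, and the dog left! Why?"
    coded = ["simple", "compound", "simple"]  # hypothetical manual codes
    print(length_measures(essay))
    print(structure_shares(coded))
```

Keeping the structure labels as an input, rather than predicting them automatically, mirrors the manual coding described above; an automatic classifier would be a separate design choice with its own error rate.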

Keywords: Syntactic features; Syntax; Artificial Intelligence; Academic Writing; Comparative analysis.

Published
2025-06-30
How to Cite
Shalevska, E. (2025). SENTENCE STRUCTURE IN HUMAN AND AI-GENERATED TEXTS: A COMPARATIVE STUDY. PALIMPSEST / ПАЛИМПСЕСТ, 10(19), 15-24. Retrieved from https://js.ugd.edu.mk/index.php/PAL/article/view/7441
Section
ЈАЗИК / LANGUAGE