Krishnaraj L and V Vasanthi
This study investigates the extent to which large language models (LLMs), specifically GPT-4 and its contemporaries, exhibit syntactic competence by evaluating their understanding of island constraints and movement dependencies, phenomena that are foundational in generative grammar. Despite the impressive surface fluency of these models, their grasp of underlying grammatical rules remains underexplored. This paper presents a controlled experiment involving a set of syntactic minimal pairs based on classic island phenomena such as wh-islands, adjunct islands, and complex NP constraints.
For this purpose, a series of acceptability judgment and sentence completion tasks using both grammatical and ungrammatical constructions was designed. These stimuli were presented to the LLMs, and the results were quantitatively compared with native-speaker judgments collected through human surveys. The models' responses were analyzed using preference metrics, perplexity scores, and error distributions. The findings reveal significant mismatches between human and model judgments, particularly in contexts requiring sensitivity to hierarchical syntactic structure. While the models show some probabilistic sensitivity to surface-level cues, they fail to robustly generalize abstract syntactic constraints such as Subjacency and Relativized Minimality. The study contributes to ongoing discussions about the limits of statistical learning in language modeling and calls for a renewed integration of formal syntactic theory into NLP model evaluation.
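The perplexity-based preference metric mentioned above can be illustrated with a short sketch: for each minimal pair, a language model's perplexity is computed for the grammatical and the ungrammatical member, and the model counts as "preferring" whichever sentence receives the lower score. The code below is a hypothetical illustration only; it uses GPT-2 through the Hugging Face transformers library as a stand-in scorer (GPT-4 does not expose full token log-probabilities), and the wh-island pair is an invented example rather than an item from the paper's stimuli.

    # Hypothetical sketch: perplexity-based preference on one syntactic minimal pair.
    # GPT-2 is used here only as an open stand-in scorer; the model choice is an assumption.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def perplexity(sentence: str) -> float:
        # Score a sentence by exponentiating its average per-token cross-entropy loss.
        enc = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        return torch.exp(out.loss).item()

    # Invented wh-island minimal pair (illustrative, not from the paper's materials).
    grammatical = "What do you think that Mary bought?"
    ungrammatical = "What do you wonder whether Mary bought?"

    ppl_g = perplexity(grammatical)
    ppl_u = perplexity(ungrammatical)
    # The model "prefers" the grammatical member if it assigns it lower perplexity.
    print(f"grammatical: {ppl_g:.2f}  ungrammatical: {ppl_u:.2f}  "
          f"preferred: {'grammatical' if ppl_g < ppl_u else 'ungrammatical'}")

Aggregating such pairwise preferences over many minimal pairs, and comparing them with the human acceptability ratings, is one way the mismatch between model and native-speaker judgments described in the abstract can be quantified.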
Pages: 153-158