IS THE MULTILINGUAL CORPUS INTERCORP A SUITABLE SOURCE OF MATERIAL FOR CONTRASTIVE ANALYSES OF PHRASEME CONSTRUCTIONS?
DOI:
https://doi.org/10.46763/PALIM25101955p
Abstract
The PhraConRep (COST) project is creating a repository of German phraseme constructions (sentential or non-sentential structures with free slots) and their equivalents in 14 project languages (Albanian, Bosnian, Bulgarian, Croatian, Macedonian, Polish, Russian, Serbian, Slovak, Slovenian, Czech, Ukrainian, Hungarian and Belarusian). Authentic evidence of these structures will be collected in the repository and described in detail. This article aims to determine whether the multilingual parallel corpus InterCorp (with Czech as the "pivot" language) can serve as a suitable and sufficient source for all, or at least some, of the project languages. The analysis focuses on the size of the subcorpora for each language with German as the source language, the availability of the necessary meta-information about the texts, and the occurrence of three phrasal constructions within these subcorpora. As expected, satisfactory results are obtained for the German-Czech corpus; however, InterCorp can also be very useful for Polish, Ukrainian and Croatian. For the remaining languages, material can be extracted from the so-called collections, i.e. multilingual, automatically processed corpora, which, however, lack complete meta-information (the source language is unknown).
Keywords: Phraseme constructions; anchors; slots; parallel corpora; InterCorp; contrastive analyses.