Frequency Distribution of Letters, Bigrams and Trigrams in the Macedonian Language

  • Aleksandra Mileva
  • Stojanče Panov
  • Vesna Dimitrova


Frequency analysis in cryptanalysis is based on the fact that, in any given piece of written text, certain letters and combinations of two or three letters occur with varying frequencies. In this paper we present the average frequency distribution of letters, bigrams and trigrams in the Macedonian language. The frequencies of the most common first and last letters of words are also given. Our results are based on approximately 15,000 pages of written text covering the following subjects: poetry, prose, drama, natural sciences, social sciences, law, the texts of various laws, economics, and computer science. The obtained letter frequency sequence is “А О И Е Т Н Р С В Д К Л П М У З Ј Г Б Ч Ш Ц Ж Њ Ф Ќ Х Ѓ Џ Љ Ѕ”, the most common letter pairs are “НА АТ ТА НИ ТЕ РА ОТ СТ ТО КО”, and the most common trigrams are “ИТЕ АТА УВА ИЈА АЊЕ СТА ОСТ ВАЊ ПРО ПРЕ”.
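The kind of counting behind these letter, bigram, trigram and first/last-letter tables can be sketched in Python. This is a minimal illustration, not the authors' tool. It rests on several assumptions of our own: bigrams and trigrams are counted only inside words (never across a word boundary), any character that is not a Macedonian letter separates words, the accented forms Ѐ and Ѝ are folded into Е and И, and frequencies are reported as percentages of all counted items. The function names `words`, `percentages`, `ngram_frequencies` and `first_last_letter_frequencies` are hypothetical.

```python
import re
from collections import Counter

# The 31 letters of the Macedonian Cyrillic alphabet, in alphabetical order.
MK_ALPHABET = "АБВГДЃЕЖЗЅИЈКЛЉМНЊОПРСТЌУФХЦЧЏШ"
# Assumption: a word is a maximal run of Macedonian letters; digits,
# punctuation, whitespace and Latin characters all act as separators.
WORD_RE = re.compile(f"[{MK_ALPHABET}]+")
# Assumption: the accented forms Ѐ and Ѝ are folded into Е and И.
FOLD_ACCENTS = str.maketrans("ЀЍ", "ЕИ")


def words(text: str) -> list[str]:
    """Split text into uppercase Macedonian words."""
    return WORD_RE.findall(text.upper().translate(FOLD_ACCENTS))


def percentages(counts: Counter) -> dict[str, float]:
    """Convert raw counts to percentages, most frequent first."""
    total = sum(counts.values())
    return {key: 100 * c / total for key, c in counts.most_common()}


def ngram_frequencies(text: str, n: int) -> dict[str, float]:
    """Percentage of each n-gram (n=1 letters, n=2 bigrams, n=3 trigrams).

    n-grams are taken inside words only, never across a word boundary.
    """
    counts = Counter()
    for word in words(text):
        # Words shorter than n contribute no n-grams.
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return percentages(counts)


def first_last_letter_frequencies(text: str) -> tuple[dict[str, float], dict[str, float]]:
    """Percentages of the first letter and of the last letter of each word.

    A one-letter word such as И counts as both a first and a last letter.
    """
    ws = words(text)
    first = Counter(w[0] for w in ws)
    last = Counter(w[-1] for w in ws)
    return percentages(first), percentages(last)
```

Applied to a large corpus, `" ".join(list(ngram_frequencies(corpus, 2))[:10])` yields a ranked list of the ten most common bigrams in the same form as the sequences above. Ties are broken by order of first occurrence, which matters only for rare items.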


