Ukraine creates National Corpus of the Crimean Tatar language

Ukraine's Reintegration Ministry has announced the creation of the National Corpus of the Crimean Tatar language, a database of Crimean Tatar texts intended for language research, according to a statement on its website.

Crimean Tatars are indigenous to the Crimean Peninsula in Ukraine's south, which has been occupied by Russia since its annexation in 2014. Following the annexation, the Kremlin began a targeted campaign against Crimean Tatars, who have been outspoken against the Russian occupation regime. Decades earlier, in 1944, Joseph Stalin forcibly deported hundreds of thousands of Crimean Tatars, many of whom died during the process.

According to the ministry, the process of collecting print and digital sources in Crimean Tatar for the database has been ongoing for four months, and 675 works by more than 180 authors have already been included in the catalog.

Among the sources in the database are works by well-known authors and texts from newspapers, magazines, textbooks, scientific articles, and international legal documents.

The oldest work dates back to the 13th century, and the most recent to the 21st century. The catalog contains materials in Crimean Tatar written in the Arabic, pre-war Latin, Cyrillic, and modern Latin scripts, the ministry wrote.

Collecting the texts is part of the ministry's 2022-2023 Strategy for the Development of the Crimean Tatar language. The project is being implemented along with the Swiss-Ukrainian EGAP program by the East Europe Foundation and the National University of Kyiv.