Turkish Translation and Turkish Localization

Facts about Turkey

Turkey is a democratic, secular, unitary, constitutional republic with an ancient cultural heritage. Turkey has become increasingly integrated with the West through membership in organizations such as the Council of Europe, NATO, the OECD, the OSCE and the G-20 major economies. Turkey began full membership negotiations with the European Union in 2005, having been an associate member of the European Economic Community since 1963 and having reached a customs union agreement in 1995. Turkey has also fostered close cultural, political, economic and industrial relations with the Middle East, the Turkic states of Central Asia and African countries through membership in organizations such as the Organisation of the Islamic Conference and the Economic Cooperation Organization. Given its strategic location, large economy and army, Turkey is classified as a regional power in Europe. Many economists group Turkey with the BRIC countries as one of the major emerging economies of the new era.

Facts about Turkish

Turkish is the official language of Turkey, North Cyprus, the Prizren District in Kosovo and several municipalities of the Republic of Macedonia. It is also spoken by native speakers in several other countries, including Albania, Azerbaijan, Bosnia and Herzegovina, Bulgaria, Cyprus, Greece, Moldova, Montenegro, Romania, Russia, Serbia, Syria, Turkmenistan and Uzbekistan, and by immigrant communities in Austria, Belgium, Canada, France, Germany, the Netherlands, Sweden, Switzerland, the United Kingdom and the United States.

Turkish is an agglutinative language: speakers add suffixes to word stems to indicate the grammatical function of the word. These suffixes can indicate number, case, possession, person and tense; for example, ev (house), evler (houses), evlerim (my houses), evlerimde (in my houses). Turkish has no grammatical gender.

The Turkish Language Association (TDK) undertook an initiative to remove Arabic and Persian loanwords from Turkish. It replaced several hundred of them with older Turkish words that had fallen into disuse or with new words coined from Turkish roots. Approximately 14% of the words listed in the 2005 official dictionary are of foreign origin. The TDK is currently promoting Turkish terminology for the information technology sector in place of English loanwords.

There are several words of Turkish origin in the English language, including baklava, Balkan, yogurt and shish kebab.

Written Language

Turkish adopted a modified version of the Latin alphabet in the writing reform of 1928. Before the reform, the language was written in a modified Arabic script known as the Ottoman Turkish script. The alphabet includes the following additional characters: ç, ğ, ı, İ, ö, ş and ü (plus the uppercase forms Ç, Ğ, Ö, Ş and Ü). The current alphabet contains 29 uppercase and 29 lowercase letters. Punctuation is the same as in other languages written in the Latin script. Because the character repertoire is small, Turkish text can be encoded in an 8-bit (single-byte) encoding. The Latin letters q, w and x, which are not part of the Turkish alphabet, may collate either after z or in their English positions.
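
To make the alphabet order concrete, here is a minimal Python sketch of a case-insensitive sort key that follows the 29-letter Turkish order and, as one of the two options described above, places q, w and x after z. The names TR_LOWER, TR_UPPER and tr_sort_key are illustrative; production code would normally rely on a collation library such as ICU rather than a hand-written table.

```python
# Turkish alphabet order: 29 letters. Position n in TR_LOWER and TR_UPPER is the same letter.
TR_LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"
TR_UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"

# Latin letters outside the Turkish alphabet; this sketch sorts them after z.
EXTRA = "qwx"


def tr_sort_key(word: str) -> list:
    """Case-insensitive sort key following Turkish alphabetical order."""
    key = []
    for ch in word:
        if ch in TR_LOWER:
            key.append(TR_LOWER.index(ch))
        elif ch in TR_UPPER:
            key.append(TR_UPPER.index(ch))
        elif ch.lower() in EXTRA:
            key.append(len(TR_LOWER) + EXTRA.index(ch.lower()))
        else:
            # Digits, punctuation, etc.: after all letters, by code point.
            key.append(1000 + ord(ch))
    return key


words = ["dağ", "incir", "çay", "ılık", "cam", "hava"]
print(sorted(words))                   # plain code-point order: ç and ı land at the end
print(sorted(words, key=tr_sort_key))  # Turkish order
```

Plain code-point sorting puts ç and ı after every ASCII letter, while the Turkish key places ç right after c and ı right before i.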

Turkish Language Statistics

  • Turkey has a literacy rate of about 90%.
  • There are 80 million speakers of Turkish in Turkey, with an additional 10 million worldwide.
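  • Writing system: Latin script (Turkish-specific letters fall in the Unicode Latin-1 Supplement and Latin Extended-A blocks)
  • Code pages: Windows ANSI code page 1254 (Turkish), ISO-8859-9 (Latin-5)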
  • Unicode Supported: Yes
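
The single-byte claim can be checked with Python's standard codecs. This sketch assumes the built-in cp1254 (Windows-1254) and iso8859_9 (ISO-8859-9) codecs; fits_single_byte and the sample phrase are illustrative. Latin-1 (ISO-8859-1) is included for contrast, since it lacks ğ, ı, İ and ş.

```python
# Sample phrase using ı, İ, ğ, ç, ş and ü.
SAMPLE = "Işık, ağaç, şüphe, İstanbul"


def fits_single_byte(text: str, codec: str) -> bool:
    """True if every character of text encodes to exactly one byte in codec."""
    try:
        return len(text.encode(codec)) == len(text)
    except UnicodeEncodeError:
        # The codec has no byte for at least one character.
        return False


for codec in ("cp1254", "iso8859_9", "latin-1"):
    print(codec, fits_single_byte(SAMPLE, codec))
```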

Turkish Translation and Localization Challenges

The Loc.PRO team has extensive experience with the ins and outs of the Turkish language, and we have a long and flawless record of success with complex Turkish translation and localization projects. Loc.PRO stands out as a leading Turkish translation company with proven success.

Text expands substantially when translated from English into Turkish, so authors and designers should allow for the extra length during the authoring stage.

Although Turkish has western and eastern dialects, they are highly mutually intelligible. Most differences are in vocabulary rather than in structure and grammar, so dialect variation does not pose a major translation issue.

Why Applications Fail With The Turkish Language

Turkish Has An Important Difference

Turkish has four forms of the letter I, while English has only two: a lowercase dotted i and an uppercase dotless I. Turkish has both a lowercase and an uppercase form of the dotted letter (i, İ) and of the dotless letter (ı, I).
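
The four letters are distinct Unicode characters, which a short Python sketch using the standard unicodedata module makes visible. The list name TURKISH_I is illustrative; the characters are written as escapes so they cannot be confused with one another.

```python
import unicodedata

# The four Turkish I letters: i, I, İ, ı.
TURKISH_I = ["\u0069", "\u0049", "\u0130", "\u0131"]

for ch in TURKISH_I:
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")

# U+0069  i  LATIN SMALL LETTER I
# U+0049  I  LATIN CAPITAL LETTER I
# U+0130  İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
# U+0131  ı  LATIN SMALL LETTER DOTLESS I
```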

Modifying or extending the Latin alphabet does not by itself make Turkish unusual; many languages have done so. Usually, however, new characters are added in uppercase/lowercase pairs, such as German ä and Ä. Because the characters are added in pairs, the properties and mappings (other than collation) of the original English letters are unaffected, and so are dependencies on internal keywords built from English letters.

Turkish Case Mappings and Case-Insensitivity

Turkish, instead, added letters that change the relationship between two existing English letters. Where English maps lowercase dotted i to uppercase dotless I, Turkish maps lowercase dotted i to the new uppercase dotted İ, and the new lowercase dotless ı to uppercase dotless I.
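
Python's built-in str.upper() and str.lower() apply the default (English-style) Unicode mappings regardless of locale, so this minimal sketch applies the two Turkish-specific mappings by hand. tr_upper and tr_lower are illustrative names, and the sketch deliberately ignores edge cases such as I followed by U+0307 COMBINING DOT ABOVE.

```python
def tr_upper(s: str) -> str:
    """Uppercase with Turkish rules: i -> İ, ı -> I."""
    # Map dotted i to İ first; str.upper() already maps ı -> I and leaves İ unchanged.
    return s.replace("i", "\u0130").upper()


def tr_lower(s: str) -> str:
    """Lowercase with Turkish rules: I -> ı, İ -> i."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


print("quit".upper(), tr_upper("quit"))  # English vs Turkish uppercase
print("QUIT".lower(), tr_lower("QUIT"))  # English vs Turkish lowercase
```

Note that Python's default "\u0130".lower() returns two code points (i followed by U+0307), which is why tr_lower handles İ explicitly.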

The change in the case rules for the letter i frequently breaks software logic. Internationalized applications are designed to accept new characters, collations, case rules, encodings and other character-based properties and relationships. Their design, however, often does not anticipate that the properties or rules of the English letters themselves will change.

How Applications Fail With The Turkish Language

Many applications have an internal table of English keywords. For example, a product may have a command language and a table that identifies the commands and associates them with the procedures that implement the corresponding functions. When given a command, the product searches the table for a match. For ease of use, the lookup is usually case-insensitive.

When support for the Turkish language is added, the case rules are changed to use the Turkish mapping for the letter i. Within the application, programmers rely on case-insensitivity and may write keywords in either case. To see how Turkish case rules break the program logic, suppose the application needs to look up the keyword “quit” written in lowercase, and the keyword table stores it as “QUIT”. Under Turkish rules, “quit” uppercases to “QUİT”, so the lookup finds no match: the lowercase dotted i no longer has the uppercase dotless I as its uppercase equivalent. Note that this problem affects internal text and lookups. The user interface is generally translated and works well with Turkish rules, but the program logic maps the translated terms to internal keywords, whose casing may or may not match the internal tables.
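
The failing lookup can be reproduced in a short Python sketch. The command table and find_command are hypothetical, and tr_upper is a simplified Turkish uppercase that maps only the dotted i specially, relying on Python's default str.upper() for everything else (including ı to I). The only difference between the calls is the case rule passed in.

```python
# Hypothetical internal command table, keys stored in uppercase.
COMMANDS = {"QUIT": "do_quit", "LIST": "do_list", "HELP": "do_help"}


def tr_upper(s: str) -> str:
    """Simplified Turkish uppercase: i -> İ; str.upper() handles the rest."""
    return s.replace("i", "\u0130").upper()


def find_command(cmd: str, upper=str.upper):
    """Case-insensitive lookup: normalize to uppercase, then look up."""
    return COMMANDS.get(upper(cmd))


print(find_command("quit"))                  # English rules: do_quit
print(find_command("quit", upper=tr_upper))  # Turkish rules: None, key became "QUİT"
print(find_command("help", upper=tr_upper))  # no i in the word: do_help
```

Only keywords containing an i break, which is why the bug often slips through testing.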

Databases can also fail when Turkish support is added. Although database software is designed to work with different collations and case rules, it often assumes that metaschema names are English and follow English case rules. When the rules are changed to Turkish, the database software may be unable to find schema objects with names such as “files”, because under Turkish rules “files” uppercases to “FİLES” rather than “FILES”.
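
A minimal sketch with Python's built-in sqlite3 module shows the schema lookup failing. SQLite itself matches identifiers with ASCII-only case folding, so the sketch simulates a locale-sensitive layer by normalizing the table name with a simplified Turkish tr_upper before querying. can_query is an illustrative helper; it interpolates the name into SQL only because the names here are fixed literals.

```python
import sqlite3


def tr_upper(s: str) -> str:
    """Simplified Turkish uppercase: i -> İ (U+0130); str.upper() handles the rest."""
    return s.replace("i", "\u0130").upper()


def can_query(conn: sqlite3.Connection, table: str) -> bool:
    """Return True if SQLite resolves the table name (demo only: no untrusted input)."""
    try:
        conn.execute(f"SELECT * FROM {table}")
        return True
    except sqlite3.OperationalError:  # e.g. "no such table"
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT)")

print(can_query(conn, "files".upper()))    # "FILES": found
print(can_query(conn, tr_upper("files")))  # "FİLES": no such table
```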


Companies often adopt a quick workaround: they use the Turkish case rules, except that the letter i keeps its English case mapping. This fixes the logic problems but irritates Turkish end users, because functions that depend on case now handle the four i letters incorrectly.
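
The workaround's effect on users can be sketched in Python, where the default str.upper() and str.lower() already follow the English rule for i. tr_upper, tr_lower and the workaround_* names are illustrative simplifications that only handle the four i letters. The sample word ılık (“lukewarm”) shows how the wrong mapping can even produce a different word, ilik (“marrow”).

```python
def tr_upper(s: str) -> str:
    """Simplified Turkish uppercase: i -> İ, ı -> I."""
    return s.replace("i", "\u0130").upper()


def tr_lower(s: str) -> str:
    """Simplified Turkish lowercase: I -> ı, İ -> i."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


# The workaround keeps English behavior for i/I, which in Python
# is simply the default str.upper()/str.lower().
workaround_upper = str.upper
workaround_lower = str.lower

print(tr_upper("istanbul"), workaround_upper("istanbul"))  # İSTANBUL vs ISTANBUL
print(tr_lower("ILIK"), workaround_lower("ILIK"))          # ılık vs ilik
```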

Some of the internal table problems can be worked around by adding an entry for every spelling of each keyword that contains the letter i. For example, quit might have two entries, one with a dotted i and one with a dotless ı: “quit” and “quıt”. However, this uses more memory and can hurt performance, and a keyword such as “mississippi” would need 16 entries (two choices for each of its four i letters). The approach is also error-prone: when the table is modified, programmers may forget to add the extra i variants.
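
The growth is easy to see when the variants are generated mechanically: each i in a keyword can be dotted or dotless, so a keyword with n i letters needs 2^n entries. i_variants is a hypothetical helper written for illustration, using itertools.product from the standard library.

```python
from itertools import product


def i_variants(keyword: str) -> list:
    """All spellings of keyword with each 'i' as dotted i or dotless ı."""
    # Splitting on 'i' leaves one gap per i between consecutive parts.
    parts = keyword.split("i")
    variants = []
    for choice in product("i\u0131", repeat=len(parts) - 1):
        word = parts[0]
        for letter, part in zip(choice, parts[1:]):
            word += letter + part
        variants.append(word)
    return variants


print(i_variants("quit"))              # two entries: dotted and dotless
print(len(i_variants("mississippi")))  # 16
```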

The correct solution is to have internal program logic use separate collation and case rules that do not change when the user selects international settings. This ensures that lookups of internal tables or database schemas work consistently.

However, this solution is difficult to implement, since it can require specifying the locale (which selects the collation and case rules) on most function calls and knowing which locale to use in every instance.
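
A minimal sketch of that approach, assuming a hypothetical upper(s, locale) helper that always takes the locale explicitly, with internal logic pinned to an invariant locale tagged "und" (the BCP 47 “undetermined” tag). Real platforms offer the same idea, for example Java's String.toUpperCase(Locale.ROOT) and .NET's String.ToUpperInvariant(); the registry and function names below are illustrative only.

```python
from typing import Callable, Dict, Optional


def tr_upper(s: str) -> str:
    """Simplified Turkish uppercase: i -> İ; str.upper() handles the rest."""
    return s.replace("i", "\u0130").upper()


# Hypothetical registry of case rules keyed by locale tag.
UPPER_RULES: Dict[str, Callable[[str], str]] = {
    "und": str.upper,  # invariant rules for internal logic
    "tr": tr_upper,    # Turkish rules for user-visible text
}

INTERNAL_LOCALE = "und"
COMMANDS = {"QUIT": "do_quit", "LIST": "do_list"}  # hypothetical keyword table


def upper(s: str, locale: str) -> str:
    """Uppercase s with the case rules of an explicitly named locale."""
    return UPPER_RULES[locale](s)


def find_command(cmd: str) -> Optional[str]:
    # Internal lookups always use the invariant locale, whatever the user chose.
    return COMMANDS.get(upper(cmd, INTERNAL_LOCALE))


user_locale = "tr"
print(find_command("list"))         # do_list, even for a Turkish user
print(upper("liste", user_locale))  # LİSTE, for display
```

The cost the text mentions is visible here: every call site has to decide which locale to pass.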