A Profile of Arabic Script Languages
Bushra Zawaydeh, Ph.D., Senior Linguist
June 7, 2007
Proprietary Information of Basis Technology Corp.
History of the Arabic Script
The Arabic script derived from the Nabataean script, which was used in Petra in the 2nd century BC.
History
The Nabataean script is an offshoot of the Aramaic script, which in turn developed from the Phoenician script. The Phoenician script was also the model from which the Greeks developed the Greek writing system (around 1000 B.C.), on which English and all other Western alphabets are based.
Development of Phoenician Script
Development of Arabic Script
Arabic inscriptions became widely available after the birth of Islam. The Quran was revealed to the Prophet Mohammad in A.D. 612 (Khan, 2001).
History
Before the revelation of the Quran, Arabic was primarily an oral language. Arabic is considered a holy language because it is the language of the Quran; hence it is the primary prayer language for Muslims. Arabic spread with the spread of Islam. By the 11th century, Arabic had become the common medium of expression from China to France.
Types of Arabic Calligraphy: Kufic
The earliest manuscripts of the Quran (8th to 10th centuries) were written in the Kufic style of Arabic writing (Campbell, 1997). Kufic script is angular, most likely a product of inscribing on hard surfaces such as wood or stone.
Types of Arabic Calligraphy: Naskhi
From the 11th century on, the cursive style known as Naskhi developed.
Arabic Abjad
Languages use different types of writing systems, such as:
• Alphabet: characters denote both consonants and vowels. Ex: English.
• Abjad: characters denote consonants. Ex: Arabic, Hebrew.
• Syllabary: characters denote syllables. Ex: Japanese Hiragana.
Spread of Arabic
The Muslim Arab civilization flourished in the Arabian Peninsula and was embraced by the Turks, Iranians, Afghans, Indians, North Africans, and Spanish Andalusians. Arabic became the language of art, science, and technology. Islamic calligraphy became a noble art, valued more highly than any other art form.
Samples of Arabic Calligraphy
Cursive Arabic Calligraphy
Features of the Arabic Script
The Arabic alphabet contains 28 letters.
Arabic is a complex text language because its script is bidirectional: it is written right to left, except that numbers and Latin words are written left to right.
Many letters change their form depending on whether they appear alone or at the beginning, middle, or end of a word.
Letters that change form are always joined, in both handwritten and printed Arabic. Hence the script is cursive, like English handwriting.
Only the 3 long vowels (ا, و, ي) are written.
Diacritics indicate features such as short vowels and gemination.
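In software, this mixing of right-to-left letters and left-to-right numbers is driven by the Unicode Bidirectional Algorithm, which assigns every character a bidirectional class. The minimal sketch below uses Python's standard `unicodedata` module to show the classes of an Arabic letter, a Latin letter, and two kinds of digits; the sample characters and the helper name `bidi_classes` are illustrative choices, not part of the slides.

```python
import unicodedata

# Illustrative sample characters (chosen for this sketch).
SAMPLES = {
    "Arabic letter beh": "\u0628",      # ب
    "Latin letter a": "a",
    "European digit 4": "4",
    "Arabic-Indic digit 4": "\u0664",   # ٤
}

def bidi_classes(samples: dict[str, str]) -> dict[str, str]:
    """Return the Unicode bidirectional class of each sample character.

    'AL' = Arabic letter (strong right-to-left), 'L' = strong left-to-right,
    'EN' = European number, 'AN' = Arabic number. Numbers are weak types, and
    the digits of a number are laid out left to right even inside Arabic text.
    """
    return {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, bidi_class in bidi_classes(SAMPLES).items():
    print(f"{label}: {bidi_class}")
```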
Arabic Abjad
Arabic Letters in Different Positions
Letters in Different Positions
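In Unicode text, each letter is normally stored once, as its base code point, and the rendering engine chooses the isolated, initial, medial, or final shape. For legacy compatibility, Unicode also encodes the shapes themselves in the Arabic Presentation Forms-B block (U+FE70 to U+FEFF), and each shape's compatibility decomposition names its position and base letter. The sketch below is an illustration, not a shaping engine: it collects those forms for a given letter, and the function name `positional_forms` is hypothetical.

```python
import unicodedata

POSITION_TAGS = {"<isolated>", "<initial>", "<medial>", "<final>"}

def positional_forms(letter: str) -> dict[str, str]:
    """Collect Presentation Forms-B characters that are positional variants of `letter`.

    A decomposition such as "<initial> 0628" means "initial form of U+0628".
    Multi-letter ligatures (e.g. "<isolated> 0644 0627") are skipped.
    """
    base_hex = f"{ord(letter):04X}"
    forms: dict[str, str] = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] in POSITION_TAGS and parts[1] == base_hex:
            forms[parts[0].strip("<>")] = chr(cp)
    return forms

beh = "\u0628"   # ب joins on both sides: four forms
alef = "\u0627"  # ا does not join to a following letter: two forms
for letter in (beh, alef):
    print(letter, {pos: f"U+{ord(ch):04X}" for pos, ch in positional_forms(letter).items()})
```

The contrast between the two letters shows why some letters have only two shapes: a letter that never connects to the following letter needs no initial or medial form.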
Arabic Diacritics
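Because short vowels and gemination are written as optional marks on top of the consonant skeleton, software commonly removes them before searching or comparing words. A minimal sketch, assuming it is acceptable to drop every nonspacing mark (Unicode category `Mn`), which also removes other marks such as the superscript alef; the helper name `strip_diacritics` is illustrative.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop nonspacing marks (category 'Mn'): fatha, damma, kasra, sukun, shadda, etc."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # كَتَبَ kataba "he wrote": three fathas
print(strip_diacritics(kataba))                  # كتب
```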
More Features of the Arabic Script
No capital letters.
No word division (hyphenation) at the end of a line.
Unlike many other alphabetic scripts, it represents pronunciation with high phonetic accuracy when diacritics are added.
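The lack of capital letters is visible in the Unicode data: Arabic letters have the general category `Lo` (other letter) rather than `Lu` or `Ll`, so case conversion leaves them unchanged. A quick check with Python's standard library, using an illustrative sample word:

```python
import unicodedata

kitab = "\u0643\u062A\u0627\u0628"  # كتاب kitaab "book"

categories = {unicodedata.category(ch) for ch in kitab}
print(categories)                               # {'Lo'}
print(kitab.upper() == kitab == kitab.lower())  # True
```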
Arabic Ligatures
Arabic script uses ligatures. A compulsory one is lam followed by alef: لا
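In Unicode, the compulsory lam-alef ligature is normally produced at display time from the two ordinary letters. A precomposed ligature character also exists for compatibility, and NFKC normalization folds it back to lam (U+0644) plus alef (U+0627). A minimal check with Python's standard library:

```python
import unicodedata

LAM_ALEF = "\uFEFB"  # compatibility ligature character

print(unicodedata.name(LAM_ALEF))           # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
print(unicodedata.decomposition(LAM_ALEF))  # <isolated> 0644 0627

# NFKC replaces the ligature with its two component letters.
normalized = unicodedata.normalize("NFKC", LAM_ALEF)
print([f"U+{ord(ch):04X}" for ch in normalized])  # ['U+0644', 'U+0627']
```

Normalizing this way before indexing lets text that contains the ligature character match text stored as two separate letters.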
Ligatures: Optional/Stylistic
Arabic Language
Arabic is a Semitic language with 221 million speakers. Countries where it is spoken: Afghanistan, Algeria, Bahrain, Chad, Cyprus, Djibouti, Egypt, Eritrea, Iran, Iraq, Israel, Jordan, Kenya, Kuwait, Lebanon, Libya, Mali, Mauritania, Morocco, Niger, Oman, Palestinian West Bank & Gaza, Qatar, Saudi Arabia, Somalia, Sudan, Syria, Tajikistan, Tanzania, Tunisia, Turkey, UAE, Uzbekistan, and Yemen.
Worldwide use of the Arabic Abjad
•Dark green → Countries where the Arabic script is the only official orthography. •Light green → Countries where the Arabic script is used alongside other orthographies.
Arabic Abjad Usage in Other Languages
The Arabic abjad is used for a large number of languages other than Arabic. It spread through the Islamic conquests (7th to 8th centuries) and is the second most widespread script in the world.
Writing Systems of the World Today
Languages Using the Arabic Script Presently
1. Arabic
2. Persian/Dari
3. Urdu
4. Pashto
5. Baluchi
6. Kurdish
7. Lahnda
8. Kashmiri
9. Sindhi
10. Uyghur
11. Berber languages
12. Moplah (dialect of Malayalam)
13. Malagasy
14. Sulu
Languages that Abandoned the Arabic Script
Languages now using the Roman script: Indonesian (Malay), Hausa, Somali, Sudanese, Swahili, Turkish
Caucasian languages now using Cyrillic: Chechen, Kabardian, Lak, Avar, Lezgi
Adoption of the Arabic Script
When other languages adopted the Arabic abjad, they augmented it to fit their non-Semitic phonologies. Each language extended the alphabet in its own way, and the 28 basic Arabic letters grew to more than 100 letters (Esfahbod, 2004).
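One way to see the scale of this extension is to count the characters that Unicode names `ARABIC LETTER ...` in its main Arabic block (U+0600 to U+06FF). The sketch below does that with `unicodedata`; the exact count depends on the Unicode version bundled with Python and covers letters for many languages, so it illustrates the claim rather than reproducing the cited figure. The helper name `arabic_block_letters` is hypothetical.

```python
import unicodedata

def arabic_block_letters(start: int = 0x0600, end: int = 0x06FF) -> list[str]:
    """Return characters in [start, end] whose Unicode name begins with 'ARABIC LETTER'.

    Unassigned code points have no name and are skipped.
    """
    letters = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")
        if name.startswith("ARABIC LETTER"):
            letters.append(chr(cp))
    return letters

letters = arabic_block_letters()
print(len(letters))  # well over 100; the exact number varies by Unicode version
```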
Method of Adoption
All the Arabic letters are borrowed directly, to preserve the Arabic orthography. The pronunciation of a borrowed Arabic loanword depends on the phonology of the borrowing language: Arabic-specific sounds that the borrowing language lacks are pronounced as a sound that it does have. Ex: the Arabic gutturals and interdentals.
Arabic Gutturals
Sounds produced with a constriction in the back part of the vocal tract (Zawaydeh, 1999):
Emphatics (T, D, S, Z): ط، ض، ص، ظ
Uvulars (q, X): ق، خ
Pharyngeals (H, Eiyn): ح، ع
Laryngeals (glottal stop, h): ء، ه
Rendition of Arabic Gutturals and Interdentals
The Arabic emphatics are not pronounced as uvularized, but as plain, non-uvularized sounds.
Persian:
The pharyngeal ع is pronounced as a glottal stop.
The pharyngeal ح is pronounced as [h].
Persian phonetic redundancies:
Persian /s/ is written as ص, س, or ث.
Persian /z/ is written as ز, ذ, ض, or ظ.
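These many-to-one correspondences matter for transliteration and name matching: a Persian speaker hears one sound, but the spelling may use any of several letters. The hypothetical table below encodes only the correspondences listed on this slide and inverts them to find every letter that can spell a given Persian sound; `PERSIAN_SOUND` and `letters_for_sound` are illustrative names.

```python
# Hypothetical table limited to the correspondences listed above
# (Arabic letter -> its Persian pronunciation).
PERSIAN_SOUND = {
    "\u062B": "s",  # ث
    "\u0633": "s",  # س
    "\u0635": "s",  # ص
    "\u0630": "z",  # ذ
    "\u0632": "z",  # ز
    "\u0636": "z",  # ض
    "\u0638": "z",  # ظ
    "\u0639": "ʔ",  # ع pharyngeal -> glottal stop
    "\u062D": "h",  # ح pharyngeal -> [h]
}

def letters_for_sound(sound: str) -> list[str]:
    """All letters in the table that Persian pronounces as `sound`, in code-point order."""
    return sorted(letter for letter, value in PERSIAN_SOUND.items() if value == sound)

print(letters_for_sound("s"))  # ['ث', 'س', 'ص']
print(letters_for_sound("z"))  # ['ذ', 'ز', 'ض', 'ظ']
```

Going from letter to sound is a single lookup, but going from sound back to spelling is one-to-many, which is why a Persian pronunciation alone cannot recover the Arabic spelling of a loanword.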
Nastaliq Script
A writing style used, with extra letters, to write Farsi, Urdu, Pashto, Kashmiri, Sindhi, and Turkish (under the Ottoman Empire, before 1920).
Nastaliq Samples
Persian
Locally called:
Farsi in Iran
Dari in Afghanistan
Tajiki in Central Asia (former Soviet Union countries)
Dialects: Lari (in Iran), Hazaragi (in Afghanistan), Darwazi (in Afghanistan and Tajikistan)
Persian Language Map
Status of Languages in Iran
Main languages:
Persian and its dialects: 58%
Azeri and other Turkic languages: 26%
Kurdish: 9%
Balochi: 1%
Arabic: 1%
The official language is Persian. Ethnologue reports 71 languages!
Strategies for Modifying Arabic Script: Persian
Basic strategy: add more dots to certain letters to create new letters. Persian added 4 letters:
Persian /p/ is پ, while Arabic /b/ is ب.
Persian /ʧ/ is چ.
Persian /ʒ/ is ژ, while ج is /ʤ/.
Persian /g/ is گ; this letter originally had three dots.
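Unicode encodes each of the four Persian additions as a letter in its own right, not as a base letter plus extra dots. The sketch below pairs each letter with its sound as given on this slide and prints its standard Unicode name; the table name `PERSIAN_ADDITIONS` is illustrative.

```python
import unicodedata

# The four Persian additions and their sounds as listed above.
PERSIAN_ADDITIONS = {
    "\u067E": "p",  # پ
    "\u0686": "ʧ",  # چ
    "\u0698": "ʒ",  # ژ
    "\u06AF": "g",  # گ
}

for letter, sound in PERSIAN_ADDITIONS.items():
    print(f"U+{ord(letter):04X}  {unicodedata.name(letter):<20}  /{sound}/")
```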
Persian/Dari Alphabet
32 letters. The additional Persian letters are shown in red.
Persian vs. Arabic
A distinct letter form is used for izafet compounds.
Persian kaf (ک) and ya (ی) differ from Arabic kaf (ك) and ya (ي).
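The kaf and ya difference is a practical problem in software: Persian text typed with an Arabic keyboard layout can contain Arabic kaf (U+0643) and yeh (U+064A) instead of Persian keheh (U+06A9) and Farsi yeh (U+06CC), so words that look alike fail to match. A minimal normalization sketch, assuming the Persian forms are the target; the function name `normalize_persian_kaf_ya` is hypothetical.

```python
# Map Arabic kaf and yeh to their Persian counterparts.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u0643": "\u06A9",  # ك ARABIC LETTER KAF -> ک ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ي ARABIC LETTER YEH -> ی ARABIC LETTER FARSI YEH
})

def normalize_persian_kaf_ya(text: str) -> str:
    """Rewrite Arabic-encoded kaf and yeh as the Persian letters."""
    return text.translate(ARABIC_TO_PERSIAN)

typed_with_arabic_layout = "\u064A\u0643"  # "yek" ("one") spelled with Arabic code points
print(normalize_persian_kaf_ya(typed_with_arabic_layout) == "\u06CC\u06A9")  # True
```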
Other Persian Orthographic Modifications
إ → ا
ة → ه or ت
Arabic words with hamza may be spelled in various ways; for example, مسؤول is spelled as مسئول.
Damma is pronounced as [o], not [u] as in Arabic.
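For name matching, spelling variants like these can be folded into a shared key before comparison. The sketch below is a hypothetical, deliberately lossy key: it applies the إ → ا rule and folds hamza on waw (ؤ) to hamza on yeh (ئ), so that the Arabic and Persian spellings of the example word compare equal. It leaves ة alone because that letter corresponds to two different Persian spellings. The names `FOLD` and `match_key` are illustrative.

```python
# Hypothetical matching-key folding for the variants listed above.
FOLD = str.maketrans({
    "\u0625": "\u0627",  # إ alef with hamza below -> ا bare alef
    "\u0624": "\u0626",  # ؤ hamza on waw -> ئ hamza on yeh
})

def match_key(word: str) -> str:
    """Lossy key for comparing Arabic and Persian spellings; not for display."""
    return word.translate(FOLD)

arabic_spelling = "\u0645\u0633\u0624\u0648\u0644"   # مسؤول
persian_spelling = "\u0645\u0633\u0626\u0648\u0644"  # مسئول
print(match_key(arabic_spelling) == match_key(persian_spelling))  # True
```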
Languages Extending the Persian Alphabet
Some languages took the Persian alphabet, itself based on Arabic, as their base and added letters found in neither Persian nor Arabic. Examples: Urdu, Pashto, Sindhi.
Status of Languages in Pakistan
Major languages in Pakistan are Punjabi, Saraiki, Sindhi, Pashto, Urdu, Balochi, Hindko, and Brahui.
The official language is English; the national language is Urdu.
Language distribution:
Punjabi: 44%
Pashto: 15%
Sindhi: 14%
Saraiki: 11%
Urdu: 8%
Balochi: 4%
Others: 4%
Languages in Pakistan
Status of Languages in Pakistan
Urdu and Sindhi have standardized spellings. Speakers of the other languages who need to write their language use either Urdu or Sindhi. In Pakistan, the classical spelling standard of Pashto is not always followed; there is a tendency to use the Urdu forms of letters instead of the Pashto forms (UCLA Language Materials Project).
Urdu Alphabet
Persian letters are shown in red; Urdu letters are shown in blue.
Urdu Alphabet
Uses a small ط above a letter to mark the retroflex sounds “d, t, and r” (ڈ, ٹ, ڑ).
Uses the shape of the Arabic nun ن without the dot (ں) to indicate nasalized vowels: ماں mãː “mother”.
For aspirated consonants, do-chashmi he (ھ) follows the letter.
Urdu [h] appears in several forms.
Distinguishes between [i] (ی) and [e, ɛ] (ے) word finally:
لڑکی laɽkiː “girl”
لڑکے laɽke “boys”
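Two of these Urdu conventions are easy to detect in encoded text: an aspirated consonant is the consonant followed by do-chashmi he (ھ, U+06BE), and vowel nasalization is marked with noon ghunna (ں, U+06BA). A small sketch; the helper names and the sample word بھائی bʰaːiː “brother” are illustrative additions, not from the slides.

```python
import unicodedata

DO_CHASHMI_HE = "\u06BE"  # ھ, marks aspiration after a consonant
NOON_GHUNNA = "\u06BA"    # ں, dotless nun marking a nasalized vowel

def aspirated_consonants(word: str) -> list[str]:
    """Letters immediately followed by do-chashmi he."""
    return [cur for cur, nxt in zip(word, word[1:]) if nxt == DO_CHASHMI_HE]

def has_nasal_marker(word: str) -> bool:
    """True if the word contains noon ghunna."""
    return NOON_GHUNNA in word

bhai = "\u0628\u06BE\u0627\u0626\u06CC"  # بھائی bʰaːiː "brother"
maan = "\u0645\u0627\u06BA"              # ماں mãː "mother"

print(unicodedata.name(DO_CHASHMI_HE))   # ARABIC LETTER HEH DOACHASHMEE
print(aspirated_consonants(bhai))        # ['ب']
print(has_nasal_marker(maan))            # True
```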
Status of Languages in Afghanistan
The official languages are Pashto and Dari (Afghan Persian). Turkic languages: Uzbek and Turkmen. Other languages: Baluchi, Pashai, Nuristani, etc.
Pashto
Uses a modified form of the Perso-Arabic script.
Extended the Perso-Arabic script by adding letters that do not appear in any other script.
Uses the 4 Persian letters.
Added 8 more letters:
4 retroflex consonants /t/, /d/, /r/, /n/, written with “pandak”, “gharwandah”, or “skarraen”: ټ, ډ, ړ, and ڼ
Letters “ge” and “xin”: ږ and ښ
Dental affricates /dz/ ځ and /ts/ څ
[g] is written either in the Persian style (گ) or as ګ.
ا ب پ ت ټ ث ج ځ چ څ ح خ د ډ ذ ر ړ ز ژ ږ س ش ښ ص ض ط ظ ع غ ف ق ک ګ ل م ن ڼ ه ۀ و ؤ ى ئ ي ې ۍ
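Each of the Pashto additions is a separate Unicode character. The sketch below prints their standard names; the four retroflex letters are all named `... WITH RING`, matching the small circle (pandak) that marks them. The grouping mirrors the slide, and the list names are illustrative.

```python
import unicodedata

RETROFLEX = ["\u067C", "\u0689", "\u0693", "\u06BC"]  # ټ ډ ړ ڼ
GE_XIN = ["\u0696", "\u069A"]                         # ږ ښ
AFFRICATES = {"\u0681": "dz", "\u0685": "ts"}         # ځ څ
PASHTO_GAF = "\u06AB"                                 # ګ

for ch in RETROFLEX + GE_XIN + list(AFFRICATES) + [PASHTO_GAF]:
    print(f"{ch}  U+{ord(ch):04X}  {unicodedata.name(ch)}")
```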
Pashto Zwarakay
Pashto has a 4th vowel diacritic, which looks like a horizontal line.
Pashto diacritics
Arabic Numbers
The decimal numbering system originated in India. It was adapted by the Arab world, and Europeans in turn adopted the Arabic numerals.
Arabic Numbers
The digits 4, 5, 6, and 7 have different forms in the languages of Iran, Pakistan, and India.
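Arabic-script text can carry digits from three Unicode sets: European digits, Arabic-Indic digits (U+0660 to U+0669, used with Arabic), and Extended Arabic-Indic digits (U+06F0 to U+06F9, used for Persian and Urdu). Python's `int()` accepts any Unicode decimal digits, so the sketch below parses the same number written in each set. In Unicode's model, Persian and Urdu share the extended code points, and their differing digit shapes are left to fonts and language tagging rather than encoded as separate characters.

```python
import unicodedata

WESTERN = "4567"
ARABIC_INDIC = "\u0664\u0665\u0666\u0667"           # ٤٥٦٧
EXTENDED_ARABIC_INDIC = "\u06F4\u06F5\u06F6\u06F7"  # ۴۵۶۷

for digits in (WESTERN, ARABIC_INDIC, EXTENDED_ARABIC_INDIC):
    # int() accepts any characters of Unicode category Nd (decimal digit).
    print(digits, int(digits), unicodedata.name(digits[0]))
```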
Basis Technology Products Handling Arabic Script
Arabic: Base Linguistics, Arabic Chatroom Reverse Transliterator, Entity Extractor, Name Matching, Name Translation, Arabic Editor, Transliteration Assistant, Digital Forensics, Language Identification
Persian: Base Linguistics, Entity Extractor, Transliteration Assistant, Name Matching, Name Translation, Digital Forensics, Language Identification
Urdu: Base Linguistics, Entity Extractor, Name Matching, Name Translation, Language Identification
Pashto: Transliteration Assistant, Name Matching, Name Translation, Language Identification
References
Afghan Transitional Islamic Administration, Ministry of Communications, and United Nations Development Program. Computer Local Requirements for Afghanistan.
Bhurghi, Abdul-Majid. Enabling Pakistani Languages through Unicode. (Written for Microsoft.)
Campbell, George. 1997. Handbook of Scripts and Alphabets. New York: Routledge.
Eid, Mushira, et al. 2006. Encyclopedia of Arabic Language and Linguistics. Volume I.
Ishida, Richard. 2004. Urdu script notes [Draft]. http://people.w3.org/rishida/scripts/urdu/urdu-in-unicode.html
Kew, Jonathan. 2005. Notes on some Unicode Arabic characters: recommendations for usage. Draft 2.
Khan, Gabriel Mandel. 2001. Arabic Script. New York: Abbeville Press.
Milo, Thomas. 2002. Authentic Arabic: A case study. 20th International Unicode Conference. Washington, DC.
Salloum, Habeeb. The Odyssey of the Arabic Language and its Script. http://www.alhewar.com/habeeb_salloum_arabic_language.htm
UZT 1.01 & Unicode Mapping for Urdu. Center for Research in Urdu Language Processing, National University of Computer and Emerging Sciences.
Unicode Standard 4.0.
Zawaydeh. 1999. The Phonetics and Phonology of Gutturals in Arabic. Ph.D. dissertation. Indiana University.