Pashto/Pashto

From the LDC Language Resource Wiki

Revision as of 17:21, 28 June 2011 by Mamandel



پښتو


PASHTO



General

Pashto belongs to the Iranian subfamily of Indo-European. Its vocabulary includes many loans, chiefly from its Persian and Indo-Aryan neighbors, and from Arabic, especially via Islam.

Dialects

ISO 639-3 treats Pashto as a macrolanguage with three varieties: Central, Northern, and Southern (see the language summary below). Ethnologue treats Waneci as a fourth, and others (e.g., MacKenzie and UCLA) analyze the dialectology differently again.

Language summary

(Information based on Ethnologue and ISO 639, 2009-08-06)

  • ISO 639-3 code: pus (macrolanguage)
  • Population: 20,304,734
  • Alternate names: Pushto
  • Dialects: Central Pashto [pst], Northern Pashto [pbu], Southern Pashto [pbt]
  • Classification: Indo-European, Indo-Iranian, Iranian, Eastern, Southeastern

Central Pashto

Information based on Ethnologue, 2009-08-06

  • ISO 639-3 code: pst
  • Spoken in: Southern Pakistan (Waziristan, Bannu, Karak, southern ethnic group territories and adjacent areas)
  • Population: 7,920,000.
  • Alternate names: Mahsudi
  • Dialects: Waciri (Waziri), Bannuchi (Bannochi, Bannu).
  • Script: Arabic.

Northern Pashto

Information based on Ethnologue, 2009-08-06

  • ISO 639-3 code: pbu
  • Spoken in: Pakistan (Afghanistan border, most of NWFP, Yusufzai, and Peshawar), Afghanistan (Central Ghilzai area), United Arab Emirates
  • Population: 9,720,700
  • Alternate names:
    • Pakistan: Pakhto, Pashtu, Pushto, Yusufzai Pashto
    • Afghanistan: Afghan, Pakhtoo, Pakhtu, Paktu. Called ‘Pakhtoon’ in the north, ‘Pashtoon’ in the south.
    • United Arab Emirates: Pakhtoo, Pashtu, Passtoo, Pushto, Pusto
  • Dialects:
    • Pakistan: Ningraharian Pashto, Northeastern Pashto.
    • Afghanistan: Northwestern Pakhto, Ghilzai, Durani.
  • Script: Arabic.

Southern Pashto

Information based on Ethnologue, 2009-08-06

  • ISO 639-3 code: pbt
  • Spoken in: Pakistan (Balochistan, Quetta area), Afghanistan (Kandahar area), Iran (Khorasan on Afghanistan border east of Qa’en), Tajikistan, United Arab Emirates
  • Population: 2,680,100.
  • Alternate names:
    • Pakistan: Pashtu, Pushto, Pushtu, Quetta-Kandahar Pashto
    • Iran: Afghani, Paktu, Pashtu
    • UAE: Afghan, Pakhtoo, Pakhtu, Paktu
  • Dialects:
    • Pakistan: Southeastern Pashto, Quetta Pashto
    • Afghanistan: Southwestern Pashto, Kandahar Pashto (Qandahar Pashto)
  • Script: Arabic.
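
When tagging corpus files by language, the code relationships listed in the summaries above can be captured in a small lookup table. The following is a minimal sketch only: the names `PASHTO_VARIETIES`, `MACROLANGUAGE`, and `to_macrolanguage` are hypothetical and are not part of any ISO or Ethnologue tooling.

```python
from typing import Optional

# ISO 639-3 codes as listed in the language summary above.
MACROLANGUAGE = "pus"
PASHTO_VARIETIES = {
    "pst": "Central Pashto",
    "pbu": "Northern Pashto",
    "pbt": "Southern Pashto",
}


def to_macrolanguage(code: str) -> Optional[str]:
    """Return "pus" for any Pashto code listed above, otherwise None."""
    code = code.strip().lower()
    if code == MACROLANGUAGE or code in PASHTO_VARIETIES:
        return MACROLANGUAGE
    return None


if __name__ == "__main__":
    for tag in ("pbt", "PUS", "fas"):
        print(tag, "->", to_macrolanguage(tag))
```

Waneci, which Ethnologue treats separately, is deliberately left out of the table because this page does not list a code for it.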

Linguistic notes

Writing

Pashto is written with a Perso-Arabic script, adapted from Persian script, which in turn was adapted from Arabic. There is a classical standard, but practice has diverged, in different directions, in Pakistan and Afghanistan. Afghanistan has instituted a number of orthographic innovations since making Pashto an official language, while Pakistani writing shows occasional influence from Urdu and sometimes represents the "hard" dialect forms phonetically instead of phonologically. In addition, the educational level and varying dialect background of writers inevitably introduce further variation in texts.

Like other scripts of Semitic origin, Pashto script normally indicates consonants only; fuller marking appears mainly in special-purpose texts such as educational materials. As a result, the script is phonologically underspecified, and it is not generally possible to infer pronunciation from spelling.

See Identifying Pashto, below, for a list of Unicode Perso-Arabic characters that are probably unique to Pashto.

  • Omniglot
  • Pashto Alphabets [i.e., letters] in Detail. Perso-Arabic letters in all four positional forms, with Unicode name and code point, Pashto name with Roman transliteration, and languages using (Arabic, Pashto, Farsi).
    Note: Not complete with respect to Unicode 5.1: lists only code points named "ARABIC LETTER ...", and not all of those. Has at least one typo ("U+0623 Arabic Letter Zain" [should be U+0632]).
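
A listing like the one noted above can be checked mechanically against the Unicode character database. The sketch below uses Python's standard `unicodedata` module; the helper name `check_listing` is hypothetical, and the result depends on the Unicode version bundled with the Python interpreter rather than on Unicode 5.1 specifically.

```python
import unicodedata


def check_listing(code_point: int, claimed_name: str) -> bool:
    """Return True if claimed_name matches the Unicode name of code_point.

    The comparison ignores case, since charts often print names in mixed case.
    Unnamed or unassigned code points never match.
    """
    actual = unicodedata.name(chr(code_point), "")
    return actual.upper() == claimed_name.strip().upper()


if __name__ == "__main__":
    # The entry as printed in the chart, and the corrected code point.
    print(check_listing(0x0623, "Arabic Letter Zain"))
    print(check_listing(0x0632, "Arabic Letter Zain"))
    # What U+0623 actually is, according to the local Unicode database.
    print(unicodedata.name(chr(0x0623)))
```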

Linguistic resources

Overview

Linguistic portals and bibliographies

Grammar

  • Chavarría-Aguilar, O.L. 1962. Pashto Basic Course. University of Michigan. [Afghanistan. Transcription only.] ERIC #ED014717. Prepared under Contract No. SAE-8888 between The University Of Michigan and the United States Office of Education.
  • Lorenz, Manfred. 1979, 1982. Lehrbuch des Pashto (Afghanisch). [Afghanistan] VEB Verlag Enzyklopädie Leipzig.
  • Penzl, Herbert. 1955. A grammar of Pashto; a descriptive study of the dialect of Kandahar, Afghanistan. [Afghanistan. Transcription only.] Washington, American Council of Learned Societies.
  • Roos-Keppel, George Olof, and Qazi Abdul Ghani Khan. 1901. A manual of Pushtu. London: Sampson Low, Marston.
  • Shafeev, D.A. 1964. A Short Grammatical Outline of Pashto. Translated and edited by Herbert H. Paper. [Afghanistan. Transcription only.] Bloomington: Indiana University; The Hague: Mouton.
  • Tegey, Habibullah, and Robson, Barbara. 1996. A Reference Grammar of Pashto. [Afghanistan] Washington, DC: Center for Applied Linguistics. ERIC #ED399825. Developed with funding from Grant No. P017A50047-95 from the International Research and Studies Program of the US Department Of Education.

Lexicon

  • Morgenstierne, Georg. 2003. A new etymological vocabulary of Pashto; compiled and edited by J. Elfenbein, D.N. MacKenzie and Nicholas Sims-Williams. Transliteration only. Wiesbaden: Reichert.
  • Qamosona English-Pashto Dictionary. Ver. 1.0. 2005. [Afghanistan]. Based on the English to Pashto Dictionary, by Pashto Academy, Kabul, Afghanistan. About 22,000 words. Free download. Requirements: Windows XP, 2000, or Windows ME Arabic edition.
  • Penzl, Herbert. [Afghanistan] Online Version 1.0, released November 1998. "This dictionary contains all of the words from the glossary of Herbert Penzl's A grammar of Pashto: A descriptive study of the dialect of Kandahar, Afghanistan (Washington, DC: American Council of Learned Societies, 1955), pp. 154-165, which is available from Schoenhof's Foreign Books." Transliteration only. "An on-line key to the orthography is not yet available. In the meantime, please download the Access database version of this file, or consult Penzl (1955)."
  • Tegey, Habibullah, and Robson, Barbara. 1993. Pashto-English Glossary for the CAL Pashto Materials. [Afghanistan] Washington, DC: Center for Applied Linguistics. Contract P017A90055. ERIC #ED364083. Pashto script and transliteration. PDF, imaged text. [Note on pagination: Original page 87 (PDF p.96) is followed by original pp.97, 88-96, 98, and the rest in sequence.]
  • Wiktionary. Unicode. Monolingual. 613 entries (CC-BY-SA, GFDL). [Mamandel 16:31, 3 May 2010 (UTC)]
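
The pagination note on the Tegey and Robson glossary above can be turned into a small page lookup. This is a sketch under one stated assumption: the offset of 9 between original and PDF page numbers, which the note gives only for page 87, is assumed to hold for all earlier pages as well. The function name `pdf_page` is hypothetical.

```python
def pdf_page(original_page: int) -> int:
    """Map an original page number of the glossary to its PDF page number.

    Known from the note: original p.87 is PDF p.96, followed by original
    pp.97, 88-96, 98, and then the rest in sequence.
    Assumed: the same offset of 9 applies to every page before 87.
    """
    if original_page == 97:
        # Misplaced page: sits directly after original p.87 (PDF p.96).
        return 97
    if 88 <= original_page <= 96:
        # These nine pages are pushed back by one because p.97 precedes them.
        return original_page + 10
    # Pages up to 87 (assumed) and from 98 onward (normal sequence resumes).
    return original_page + 9


if __name__ == "__main__":
    for page in (87, 88, 96, 97, 98):
        print(f"original p.{page} -> PDF p.{pdf_page(page)}")
```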

Topical word lists

  • Babynology: List of Pashto names in Roman transliteration

Monographs

  • Ijaz, Madiha. Phonemic Inventory of Pashto. 2003. [Pakistan: Yusufzai dialect, in and around Peshawar. Transcription only.] Annual Student Report 2002-2003, Center for Research in Urdu Language Processing. The PDF apparently does not include fonts; many of the transcription characters are missing.

Educational software

  • Kodakan. Educational software in Pashto and Dari.

Encoding and Fonts

The main Unicode block for Arabic script is U+0600-U+06FF. See also Writing.
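
As a quick check that downloaded text is actually in Arabic script, one can measure how many of its characters fall in that range. The following is a minimal sketch; `ARABIC_BLOCK` and `arabic_fraction` are hypothetical names, and the check covers only the range cited above, not the supplementary Arabic blocks or presentation forms that Unicode defines elsewhere.

```python
# U+0600 through U+06FF inclusive, the range cited above.
ARABIC_BLOCK = range(0x0600, 0x0700)


def arabic_fraction(text: str) -> float:
    """Fraction of non-whitespace characters that lie in the Arabic block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    in_block = sum(1 for c in chars if ord(c) in ARABIC_BLOCK)
    return in_block / len(chars)


if __name__ == "__main__":
    print(arabic_fraction("پښتو"))
    print(arabic_fraction("Pashto"))
```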

Input

Data Sources

Most of the text available is evidently monolingual. Parallel text is noted for some entries.

Magazines

  • FARDA. (Description from Library of Congress) Published bimonthly by the Afghans’ Pen Club in Stockholm. A critical, social, and cultural magazine committed to democratic ideals. Links to articles, contributors, activities, and more. Articles in Swedish, Pashto, and Dari; general information in English.

News

  • Azadi Radio. Pashto service of Radio Free Europe / Radio Liberty
  • BBC.
  • Benawa. Noted as "very close to Yusufzai Pashto" (June 2006).
  • CRI (China Radio International).
  • Deutsche Welle.
  • Killid news portal.
  • RTA: National Radio and Television of Afghanistan.
  • Sabawoon Online. [Afghanistan]
  • Voice of America.
  • Wahdat. Islamic Unity Party of Afghanistan. Dated by Persian calendar. [Our 2006 downloads contain an expected proportion of Pashto-specific characters, but fresh downloads as of 2009-08-11 contain none, even for archived articles from the same period. This may be an artifact of text conversion.]

Literature

  • Dastanona. Bimonthly magazine of Pashto fiction submitted by Afghan authors worldwide. Published in Kabul. Also has index pages and content in English, German, French, Russian, and Dari; some is parallel.
  • D'Zra Dardoona. "Afflictions of the Heart: Poetic collections of Aminullah Zmaryalai". (imaged book pages)
  • Khyber.org. Proverbs, short stories (some in English), jokes, poetry.
  • Landay. Traditional couplets, with English translation. (imaged text) (The "table of contents" lists the categories in Pashto, transcription, and English translation, but the links to the text pages are in the corresponding frame on the right, labeled only in Pashto.)

Miscellaneous

Speech

See also News.

Portals

Tools and Other NLP Resources

Identifying Pashto

The following Perso-Arabic characters are probably unique to Pashto:

Code Glyph Unicode Name
U+0659 ٙ ARABIC ZWARAKAY
U+067C ټ ARABIC LETTER TEH WITH RING
U+0685 څ ARABIC LETTER HAH WITH THREE DOTS ABOVE
U+0689 ډ ARABIC LETTER DAL WITH RING
U+0693 ړ ARABIC LETTER REH WITH RING
U+0696 ږ ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE
U+069A ښ ARABIC LETTER SEEN WITH DOT BELOW AND DOT ABOVE
U+06AB ګ ARABIC LETTER KAF WITH RING
U+06BC ڼ ARABIC LETTER NOON WITH RING
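
The table above suggests a simple heuristic for telling Pashto from other languages written in Perso-Arabic script: count how often these characters occur. The sketch below is illustrative only. The names `PASHTO_CHARS`, `pashto_char_ratio`, and `looks_like_pashto` are hypothetical, and the default threshold is an arbitrary placeholder, not a value derived from LDC data; it would need tuning on real text.

```python
# The nine code points from the table above.
PASHTO_CHARS = frozenset(
    "\u0659\u067C\u0685\u0689\u0693\u0696\u069A\u06AB\u06BC"
)


def pashto_char_ratio(text: str) -> float:
    """Share of Arabic-block characters (U+0600-U+06FF) that are in PASHTO_CHARS."""
    arabic = [c for c in text if 0x0600 <= ord(c) <= 0x06FF]
    if not arabic:
        return 0.0
    hits = sum(1 for c in arabic if c in PASHTO_CHARS)
    return hits / len(arabic)


def looks_like_pashto(text: str, threshold: float = 0.01) -> bool:
    """Guess whether text is Pashto; threshold is a hypothetical placeholder."""
    return pashto_char_ratio(text) >= threshold


if __name__ == "__main__":
    print(pashto_char_ratio("پښتو"))    # one of four letters is in the table
    print(looks_like_pashto("فارسی"))   # Persian word, none of the nine
```

A ratio of zero on text that is known to be Pashto, as in the Wahdat note under News, would suggest that the characters were lost or substituted during conversion rather than that the text is in another language.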