From the LDC Language Resource Wiki


Revision as of 20:20, 23 June 2009



ਪੰਜਾਬੀ PANJABI

(Eastern Panjabi, Gurmukhi)



General

This document deals primarily with Eastern Panjabi (Gurmukhi script). It also includes some material on Western Panjabi.


Eastern Panjabi

(Information from Ethnologue, 2009-05-13)

  • ISO 639-3 code: pan
  • Spoken in: India: Punjab, Majhi in Gurdaspur and Amritsar districts, Bhatyiana in south Firozpur District; Rajasthan, Bhatyiana in north Ganganagar District; Haryana; Delhi; Jammu and Kashmir. Also spoken in Bangladesh and by diaspora communities.
  • Population: 27,109,000 in India
  • Alternate names: Punjabi, Gurmukhi, Gurumukhi
  • Dialects: Panjabi Proper, Majhi, Doab, Bhatyiana (Bhatneri, Bhatti), Powadhi, Malwa, Bathi. Western Panjabi is distinct from Eastern Panjabi, although there is a chain of dialects to Western Hindi (Urdu).
  • Classification: Indo-European, Indo-Iranian, Indo-Aryan, Central zone, Panjabi
  • Script: Gur(u)mukhi and Devanagari

Western Panjabi

(Information from Ethnologue, 2009-05-13)

  • ISO 639-3 code: pnb
  • Spoken in: Mainly in the Punjab area of Pakistan.
  • Population: 60,647,207 in Pakistan (2000 WCD).
  • Alternate names: Western Punjabi, Lahnda, Lahanda, Lahndi
  • Dialects: There is a continuum of varieties between Eastern and Western Panjabi, and with Western Hindi and Urdu. 'Lahnda' is a name formerly given to Western Panjabi, in an attempt to cover the dialect continuum between Hindko, Pahari-Potwari, and Western Panjabi in the north and Sindhi in the south.
  • Classification: Indo-European, Indo-Iranian, Indo-Aryan, Northwestern zone, Lahnda
  • Script: Perso-Arabic

Linguistic notes


Eastern Panjabi is usually written in the Brahmi-derived Gurmukhi script and sometimes, especially by Hindus, in Devanagari. Western Panjabi is usually written in Shahmukhi, a variant of the Perso-Arabic script very similar to the one used for Urdu.

Linguistic resources


Dictionaries

  • Digital Dictionaries of South Asia, U. of Chicago. "Singh, Maya. The Panjabi dictionary. Lahore, Munshi Gulab Singh & Sons, 1895. This title is currently being entered by a data entry contractor. The dictionary will be functional on this site by January 2009." [Accessed 2009-05-18]
  • Punjabonline English <-> Punjabi Dictionary. Online; size unknown. Input can be toggled between English and Punjabi. Gurbani 8-bit encoding.
  • English to Punjabi Dictionary. Online; apparently medium-sized. Gurbani 8-bit encoding.
  • Wiktionary (Panjabi). Monolingual. Gurmukhi script, Unicode.


Names

These sites do not distinguish names by sex.

  • 5abi: 8-bit Gurbani encoding.
  • Babynology: List of Panjabi baby names in Roman transliteration. (Each name appears twice, once for each sex.)
  • Sikh Names. Transliteration, with meanings.
  • Sushmajee: About 1000 names. Transliteration.


Linguistic portals and bibliographies

Encoding and Fonts

Before Unicode was developed and came into general use, using Panjabi and other South Asian languages on computers required special single-byte (8-bit) fonts. Many of these fonts were specific to one website or another and used idiosyncratic encodings. This is still the case to some extent, so this page lists some such sites (see News) as well as resources for specific fonts and encoding converters.


Unicode

The Unicode range for Gurmukhi is U+0A00 to U+0A7F.
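As an illustration of what that range means in practice, here is a minimal Python sketch (standard library only) that flags characters in the Gurmukhi block and lists each character's code point and Unicode name. The function names are illustrative and not taken from any tool listed on this page. The block also contains some unassigned code points, so "in the range" does not by itself guarantee a valid Gurmukhi character.

```python
import unicodedata

# Gurmukhi block boundaries (U+0A00 to U+0A7F).
GURMUKHI_START = 0x0A00
GURMUKHI_END = 0x0A7F


def is_gurmukhi(ch: str) -> bool:
    """Return True if the single character ch lies in the Gurmukhi block."""
    return GURMUKHI_START <= ord(ch) <= GURMUKHI_END


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, Unicode name, block label) for each character."""
    rows = []
    for ch in text:
        # Unassigned code points have no name; fall back to a placeholder.
        name = unicodedata.name(ch, "<unassigned>")
        label = "Gurmukhi" if is_gurmukhi(ch) else "other"
        rows.append((f"U+{ord(ch):04X}", name, label))
    return rows


if __name__ == "__main__":
    for row in describe("ਗੁਰਮੁਖੀ"):
        print(*row)
```

Running it lists the seven code points of ਗੁਰਮੁਖੀ, all labelled Gurmukhi.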


ISCII

The Bureau of Indian Standards supports its own encoding standard, ISCII (Indian Script Code for Information Interchange). See ISCII.


Gurbani

An 8-bit font encoding used by a number of sites.
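Text in an 8-bit font encoding like this one stores Gurmukhi glyphs under ordinary Latin code points (mostly ASCII), so it only looks like Gurmukhi when the matching font is installed. Anyone harvesting such sites therefore has to tell font-encoded pages apart from Unicode pages. A rough heuristic, sketched below under the assumption that the page has already been decoded to a Python string, is to compare the share of characters in the Gurmukhi block with the share of ASCII letters. The function name, the labels, and the 0.5 threshold are hypothetical choices, not taken from any existing tool.

```python
def guess_gurmukhi_encoding(text: str, threshold: float = 0.5) -> str:
    """Guess whether text is Unicode Gurmukhi or font-encoded (8-bit) Gurmukhi.

    Returns "unicode", "legacy-8bit?" or "unknown". Heuristic only: ordinary
    English text in Latin letters will also be labelled "legacy-8bit?".
    """
    # Gurmukhi vowel signs are combining marks, not letters, so count the
    # whole Gurmukhi block explicitly alongside alphabetic characters.
    letters = [ch for ch in text if ch.isalpha() or 0x0A00 <= ord(ch) <= 0x0A7F]
    if not letters:
        return "unknown"
    gurmukhi = sum(1 for ch in letters if 0x0A00 <= ord(ch) <= 0x0A7F)
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    if gurmukhi / len(letters) >= threshold:
        return "unicode"
    if ascii_letters / len(letters) >= threshold:
        return "legacy-8bit?"
    return "unknown"
```

The label says nothing about which 8-bit mapping applies; a harvester would still need to identify the font a page requests before converting it.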



Converters

  • GUCA (Gurmukhi Unicode Conversion Application). Licensed under the GNU GPL; requires the Microsoft .NET Framework. Converts ASCII-encoded, font-based Gurmukhi text that uses Dr. Thind's fonts (e.g. AnmolLipi, GurbaniLipi) into Unicode. It also includes a custom mapping engine for adding further encodings. Although the site for Dr. Thind's fonts now uses Unicode, many other sites still use these 8-bit encodings. See SikhNet, above.
  • Unicodify: From Lancaster University, producers of the EMILLE corpus. For Windows; source code available.
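At their core, converters of this kind map each character of a font encoding to one or more Unicode characters. One known complication is the short-i vowel sign (sihari, U+0A3F): it is displayed before its consonant, and in many such fonts it is typed and stored in that visual order, whereas Unicode stores it after the consonant. The sketch below shows a table-driven conversion with that reordering step. The mapping table is a tiny hypothetical example for illustration only; it is not claimed to match AnmolLipi, GurbaniLipi, or any other real font, and a real converter needs a complete, font-specific table.

```python
# Hypothetical legacy-font-to-Unicode table (illustrative only; NOT the real
# AnmolLipi/GurbaniLipi layout). Keys are characters as stored in the
# font-encoded text; values are single Unicode Gurmukhi characters.
LEGACY_TO_UNICODE = {
    "k": "\u0A15",  # GURMUKHI LETTER KA
    "K": "\u0A16",  # GURMUKHI LETTER KHA
    "g": "\u0A17",  # GURMUKHI LETTER GA
    "r": "\u0A30",  # GURMUKHI LETTER RA
    "m": "\u0A2E",  # GURMUKHI LETTER MA
    "s": "\u0A38",  # GURMUKHI LETTER SA
    "u": "\u0A41",  # GURMUKHI VOWEL SIGN U
    "I": "\u0A40",  # GURMUKHI VOWEL SIGN II
    "i": "\u0A3F",  # GURMUKHI VOWEL SIGN I (sihari), stored before its consonant
}

SIHARI = "\u0A3F"


def is_consonant(ch: str) -> bool:
    """True for code points from KA (U+0A15) through HA (U+0A39)."""
    return 0x0A15 <= ord(ch) <= 0x0A39


def convert(legacy: str) -> str:
    """Map legacy font text to Unicode, then move each sihari after its consonant."""
    # Characters missing from the table are passed through unchanged.
    mapped = [LEGACY_TO_UNICODE.get(ch, ch) for ch in legacy]
    out: list[str] = []
    i = 0
    while i < len(mapped):
        if mapped[i] == SIHARI and i + 1 < len(mapped) and is_consonant(mapped[i + 1]):
            # Visual order (sihari, consonant) -> logical order (consonant, sihari).
            out.append(mapped[i + 1])
            out.append(SIHARI)
            i += 2
        else:
            out.append(mapped[i])
            i += 1
    return "".join(out)
```

With this toy table, `convert("gurmuKI")` yields ਗੁਰਮੁਖੀ, and `convert("isK")` yields ਸਿਖ (U+0A38 U+0A3F U+0A16): the sihari typed first ends up after ਸ.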


  • Indian Language Converter. Type Roman characters according to the Gurmukhi character chart on the page to get Gurmukhi text and HTML. Available on the web or as a download under the GNU GPL. For example:
    Roman input: guramukhee
    Gurmukhi output: ਗੁਰਮੁਖੀ
    HTML output: &#2583;&#2625;&#2608;&#2606;&#2625;&#2582;&#2624;<br/>
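The example above can be reproduced with a greedy longest-match transliterator followed by conversion to HTML decimal character references. The romanization table below is a small hypothetical subset chosen to cover the example word; it is not the converter's own chart. Vowels written after a consonant become vowel signs (with "a" standing for the inherent vowel, which is not written), while vowels elsewhere become independent vowel letters.

```python
import html

# Hypothetical romanization subset (NOT the Indian Language Converter's chart).
CONSONANTS = {
    "kh": "\u0A16", "gh": "\u0A18",
    "k": "\u0A15", "g": "\u0A17", "j": "\u0A1C", "n": "\u0A28",
    "p": "\u0A2A", "b": "\u0A2C", "m": "\u0A2E", "r": "\u0A30",
    "l": "\u0A32", "s": "\u0A38", "h": "\u0A39",
}
# Vowel signs used after a consonant; "a" is the inherent vowel (no sign).
VOWEL_SIGNS = {"aa": "\u0A3E", "ee": "\u0A40", "oo": "\u0A42",
               "a": "", "i": "\u0A3F", "u": "\u0A41"}
# Independent vowel letters used at word start or after another vowel.
INDEPENDENT_VOWELS = {"aa": "\u0A06", "ee": "\u0A08", "oo": "\u0A0A",
                      "a": "\u0A05", "i": "\u0A07", "u": "\u0A09"}


def transliterate(roman: str) -> str:
    """Greedy longest-match romanization -> Unicode Gurmukhi."""
    out = []
    i = 0
    after_consonant = False
    while i < len(roman):
        for size in (2, 1):  # try two-letter units such as "kh" or "ee" first
            chunk = roman[i:i + size]
            if len(chunk) < size:
                continue
            if chunk in CONSONANTS:
                out.append(CONSONANTS[chunk])
                after_consonant = True
                break
            if chunk in VOWEL_SIGNS:
                table = VOWEL_SIGNS if after_consonant else INDEPENDENT_VOWELS
                out.append(table[chunk])
                after_consonant = False
                break
        else:
            # Not in the table: copy the character through unchanged.
            out.append(roman[i])
            size = 1
            after_consonant = False
        i += size
    return "".join(out)


def to_ncr(text: str) -> str:
    """Encode non-ASCII characters as HTML decimal references, e.g. &#2583;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


if __name__ == "__main__":
    gurmukhi = transliterate("guramukhee")
    ncr = to_ncr(gurmukhi)
    print(gurmukhi)
    print(ncr)
    assert html.unescape(ncr) == gurmukhi  # the references decode back to the text
```

Run as a script, this prints ਗੁਰਮੁਖੀ followed by the same seven references shown in the HTML output above.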

Data Sources

Monolingual Text

  • EMILLE corpus. Free license for non-profit research use.



Parallel Text

  • EMILLE corpus. 200,000 words of English text (information leaflets from the UK Government and various local authorities) with Eastern Panjabi translations. Free license for non-profit research use.

Copied from Lwiki:LCTL Panjabi harvest (Bilingual Text); still being edited.

  • GUCA. Does this belong here too? A Panjabi computing resource that is also parallel English and Panjabi.
  • Law Society. These titles look familiar; check whether they are already included in the EMILLE corpus.
  • Bible. Links to print editions.
  • Guru Granth Sahib. Sikh holy texts, word lists, concordances, interlinear translations. Panjabi (Gurmukhi, Shahmukhi, Devanagari) and English. Some files are in Unicode, others in Gurbani encoding.
  • Punjabi Online: Mool Mantar. Religious text. Interlinear, with Panjabi in Gurbani encoding, grouped as (Panjabi1, transliteration1, Panjabi2, English). Panjabi2 may be a commentary on Panjabi1, in which case only the commentary is translated.


  • Punjabilok: Font
    • The site's font can be downloaded; with it installed, the translated version of the site can be viewed.
  • Sridasam
    • Religious text
    • 1,466 of 2,326 pages have parallel segments



still checking and editing

  • Apna Channel (Pakistan)
  • RAVi TV (US: "programmed by and for Punjabi natives residing in the United States")
  • DD Punjabi (India)
  • Balle Balle (India. "Currently not transmitting" [2009-05-15])
  • Alpha ETC Punjabi (UK; Zee): Alpha Punjabi was launched in India in October 1999. In 2002 a majority stake in ETC was acquired, giving rise to Alpha ETC Punjabi. The channel was brought to the USA in July 2005 as the only Punjabi-language channel on the platform. It offers a range of Punjabi-language entertainment programming for the whole family.
  • MH1 (India)
  • Punjab Today (India)
  • Vectone Punjab (UK)
  • NRI
  • WatchIndia.TV
    • ETC Channel Punjabi (India)
    • Zee Punjabi (India)

IPR notes


Tools and Other NLP Resources

