From the LDC Language Resource Wiki


Revision as of 17:42, 23 June 2009





Language summary

(Information from Ethnologue, 2009-05-13)

  • ISO 639-3 code: ben
  • Spoken in: Western Bangladesh, India, Nepal, and diaspora
  • Population: 100,000,000 in Bangladesh (1994 UBS). 211,000,000 including second-language speakers (1999 WA). Population total all countries: 171,070,202.
  • Alternate names: Banga-Bhasa, Bangala, Bangla
  • Dialects: Languages or dialects in the Bengali group according to Grierson (1903-1928): Central (Standard) Bengali, Western Bengali (Kharia Thar, Mal Paharia, Saraki), Southwestern Bengali, Northern Bengali (Koch, Siripuria), Rajbanshi, Bahe, Eastern Bengali (East Central, including Sylhetti), Haijong, Southeastern Bengali (Chakma), Ganda, Vanga, Chittagonian (possible dialect of Southeastern Bengali).
  • Classification: Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bengali-Assamese
  • Script: Bengali

Linguistic notes

  • Morphology
    • Bengali marks plural number with several different suffixes, the choice depending on the noun, and likewise marks case with several different suffixes.
  • Writing System

Linguistic resources

  • Bhattacharya, Tanmoy. (2001): Bengali. In Facts About the World's Languages: An Encyclopedia of the World's Major Languages: Past and Present, ed. Jane Garry and Carl Rubino. New York / Dublin: H.W. Wilson Press. ISBN 0824209702.
  • Klaiman, M. H. (1987): Bengali. In The World's Major Languages, ed. Bernard Comrie, pp. 490-513. Oxford University Press. ISBN 978-0195065114.

Grammars

  • Milne, W. S. (1993): A Practical Bengali Grammar. Laurier Books Ltd. 562 pp. ISBN 8120608771.
  • Mojumder, Atindra (1973): Bengali Language Historical Grammar. Calcutta: Firma K. L. Mukhopadhyay.
  • Smith, W. L. (1997): Bengali Reference Grammar. Stockholm: Association of International Studies. Stockholm Oriental Textbook Series 1. 197 pp.

Dictionaries

  • Akkhor: An English-to-Bengali "translation" tool is included in a large package downloadable from this site. It appears to be essentially a large dictionary, so it could be fed a list of English words and would output a list of Bengali words. The encoding is unclear.
  • Carey, William (1761-1834). A Dictionary of the Bengali Language. Laurier Books Ltd. 2160 pp. ISBN 8120600940.
  • Dev, Ashu Tosh (1961): Students' Favourite Dictionary: Bengali to English. 28th ed. Dev Sahitya Kutir. 998 pp. ISBN 8173043418. Reprint. USD 7.75.
    • Also: 1961, 1291 pp. Calcutta: S.C. Mazumder.
  • Digital Dictionaries of South Asia at the University of Chicago:
  • GNU Dictionary: Monolingual dictionary used by the GNU project's spell checker.
  • LTRC: Two Bengali-Hindi dictionaries, but no English glosses. ISCII encoding. There is also an English-Hindi dictionary, so it might be possible to cross-correlate entries through Hindi. GPL license.
  • Online Bangla Obhidhan: Online lexicon with at least 6,000 unique entries. The Akkhor encoding converter appears to work. This page gives GIF images of the Bangla characters instead.
  • Virtual Bangladesh: Dictionary: Online lexicon; claims 3,000 words. Results are displayed in a romanization and "often" in Bangla script using a Unicode font.
  • Wiktionary (Bengali). Monolingual. Unicode.
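The LTRC note above suggests cross-correlating the Bengali-Hindi and English-Hindi dictionaries through their shared Hindi words. A minimal sketch of that pivot join follows, assuming each dictionary has already been loaded as a mapping from headword to a list of Hindi glosses and converted from ISCII to Unicode. The toy entries and the function name `pivot` are hypothetical and are not taken from the LTRC files.

```python
from collections import defaultdict

# Toy entries for illustration only (NOT from the LTRC dictionaries).
bengali_hindi = {
    "জল": ["जल", "पानी"],      # water
    "বই": ["किताब", "पुस्तक"],  # book
}
english_hindi = {
    "water": ["पानी", "जल"],
    "book": ["किताब"],
}

def pivot(src_to_pivot, tgt_to_pivot):
    """Link source and target headwords that share at least one pivot (Hindi) gloss."""
    # Invert the target dictionary: Hindi gloss -> set of target headwords.
    by_pivot = defaultdict(set)
    for tgt, glosses in tgt_to_pivot.items():
        for g in glosses:
            by_pivot[g].add(tgt)
    # For each source headword, collect every target reachable through any gloss.
    result = {}
    for src, glosses in src_to_pivot.items():
        candidates = set()
        for g in glosses:
            candidates |= by_pivot.get(g, set())
        if candidates:
            result[src] = sorted(candidates)
    return result

print(pivot(bengali_hindi, english_hindi))
# prints {'জল': ['water'], 'বই': ['book']}
```

Links obtained this way would still need checking, since an ambiguous Hindi word can connect unrelated senses of the Bengali and English entries.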

Topical word lists

  • Babynology: List of Bengali baby names in Roman transliteration



  • Bagchi, Tista (1996): "Bengali Writing," in William Bright and Peter Daniels (eds.) The World's Writing Systems. New York: Oxford University Press. pp. 399-403. ISBN 0-19-507993-0.


  • Bayer, Josef (2001): "Two grammars in one: sentential complements and complementizers in Bengali and other South-Asian languages," in Peri Bhaskararao and Karamuri Venkata Subbarao (eds.) The Yearbook of South Asian Languages and Linguistics: Tokyo Symposium on South-Asian Languages - Contact, Convergence and Typology. New Delhi: Sage Publications. PDF
  • Butt, Miriam (2001): "Case, agreement, pronoun incorporation, and pro-drop in South Asian languages," handout for talk presented at the workshop The Role of Agreement in Argument Structure, August 31-September 1, 2001, Utrecht.
  • Dirdal, Hildegunn: "The acquisition of articles by Bengali learners of English". Handout?
  • Fitzpatrick-Cole, Jennifer (1994): The Prosodic Domain Hierarchy in Reduplication. Ph.D. dissertation, Stanford.
  • Fitzpatrick-Cole, Jennifer (1996): "Reduplication meets the phonological phrase in Bengali," Linguistic Review 13.305-356.
  • Ghosh, Sanjukta and Probal Dasgupta: "The role of classifiers in quantification," Handout for talk presented at the 21st South Asian Language Analysis Roundtable, October 7-10, University of Konstanz.
  • Keane, Elinor (2001): Echo Words in Tamil. Unpublished Ph.D. dissertation, Merton College, Oxford. Contains a fairly extensive discussion of echo words in Bengali.
  • Khan, Zeeshan (1994): "Bangla Verb Classes and Alternations," in Douglas A. Jones, Robert C. Berwick, Franklin Cho, Zeeshan Khan, Karen T. Kohl, Naoyuki Nomura, Anand Radhakrishnan, Ulrich Sauerland, and Brian Ulicny, Verb Classes and Alternations in Bangla, German, English, and Korean (AI Memo # 1517). Cambridge, Massachusetts: MIT. pp. 36-50.
  • Lahiri, Aditi and Jennifer Fitzpatrick-Cole (1999): "Emphatic Clitics and Focus Intonation in Bengali," in René Kager & Wim Zonneveld (eds.) Phrasal Phonology. pp. 119-144. Dordrecht: Foris.

Linguistic portals and bibliographies

Encoding and Fonts

Before the development and general use of Unicode, computer use of Bengali and other South Asian languages required special single-byte (8-bit) fonts. Many of these fonts were specific to one website or another and used idiosyncratic encodings. To some extent this is still the case, so this page lists some such sites (see News and portals) as well as resources for specific fonts and encoding converters. See, for example, eThikana below.

In addition, the Bureau of Indian Standards supports its own ISCII standard (below), which provides an 8-bit encoding that uses escape sequences to announce the script of the following coded character sequence.



The Unicode range for Bengali is U+0980 to U+09FF.
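One practical use of this range is a quick script check, for example when deciding whether a downloaded page is Unicode Bengali or one of the 8-bit font encodings discussed above. The sketch below is a hypothetical helper, not part of any listed tool; it tests block membership only, so it does not distinguish assigned from unassigned code points in the block.

```python
BENGALI_START, BENGALI_END = 0x0980, 0x09FF  # Unicode Bengali block

def is_bengali(ch: str) -> bool:
    """True if the single character ch falls inside the Unicode Bengali block."""
    return BENGALI_START <= ord(ch) <= BENGALI_END

def bengali_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in text that lie in the Bengali block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_bengali(c) for c in chars) / len(chars)

print(bengali_ratio("\u09AC\u09BE\u0982\u09B2\u09BE abc"))  # বাংলা abc -> 0.625
```

Bengali text also commonly uses the danda (U+0964), which Unicode places in the Devanagari block, so a check based only on this range counts it as non-Bengali.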


The Bureau of Indian Standards supports its own encoding standard. See ISCII.

Fonts

  • eThikana: Free download of fonts used by a number of Bengali websites. All are 8-bit. Grouped by encoding:
    • Oporajita (≠ Aparajita), Sulekha, MahouaMJ. Same encoding as Sutonny.
    • AdarshaBangla
    • AdarshaLipi, Moina
    • Aparajita
    • Basundhara
    • Boishakhi
    • Ekush, Falgun
    • Progoty
    • SonarGaon
  • Omicron Lab. Unicode fonts for Bengali.
  • Penn State University (2009): Browser and Font Recommendations for Bengali. Also includes information on setting up keyboard input.
  • Rezaul: The site is a portal of sorts; this directory has links to fonts and encoding documentation.
  • South Asia Language Resource Center of the University of Chicago. Links to Bengali fonts (most of them available for free download), input schemes and keyboard layouts, and information about Mac vs. PC vs. Linux rendering issues.


  • Microsoft. Creating and supporting OpenType fonts for the Bengali script: Microsoft doc on Unicode 3.1 for Bengali. "Registered features of the Bengali script are defined and illustrated, encodings are listed, and templates are included for compiling Bengali layout tables for OpenType fonts. This document also presents information about the Bengali OpenType shaping engine of Uniscribe, an operating system component responsible for text layout."

Conversion

  • Unicodify: From Lancaster University, producers of the EMILLE corpus. Includes conversion for the AdarshaLipi, AdarshaLipiExp, and AdarshaLipiNormal2 fonts, among others. Runs on Windows (source code available).
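Converting text in one of the 8-bit font encodings described above to Unicode generally needs a per-font byte-to-character table, plus some reordering, because such fonts typically store a pre-base vowel sign (for example ি) before its consonant, in visual order, while Unicode stores it after the consonant. The sketch below illustrates only that general idea; it is not how Unicodify works, and the byte values in the table are invented and do not match Sutonny, AdarshaLipi, or any real font.

```python
# HYPOTHETICAL mapping table: byte values are invented for illustration only.
LEGACY_TO_UNICODE = {
    0x41: "\u0995",  # ক KA
    0x42: "\u09AE",  # ম MA
    0x43: "\u09BF",  # ি vowel sign I (stored before its consonant in the legacy text)
    0x44: "\u09BE",  # া vowel sign AA
}

# Vowel signs that render before the consonant but follow it in Unicode order.
PRE_BASE = {"\u09BF", "\u09C7", "\u09C8"}  # ি, ে, ৈ

def legacy_to_unicode(data: bytes) -> str:
    """Map bytes through the table, then move pre-base vowel signs after the next character.

    Unmapped bytes are passed through as chr(b). Conjunct clusters are not handled:
    only the single following character is swapped with the vowel sign.
    """
    chars = [LEGACY_TO_UNICODE.get(b, chr(b)) for b in data]
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c in PRE_BASE and i + 1 < len(chars):
            # Visual order (sign, consonant) -> logical order (consonant, sign).
            out.append(chars[i + 1])
            out.append(c)
            i += 2
        else:
            out.append(c)
            i += 1
    return "".join(out)

print(legacy_to_unicode(bytes([0x43, 0x41])))  # কি
```

A real converter would need the actual table for each font and proper handling of conjuncts and two-part vowel signs.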


  • ALA-LC Transliteration Table (American Library Assn. - Library of Congress)
  • Indian Language Converter: Type in Roman characters according to the Bengali character chart on the page and get Bengali text and HTML. On-web or download with GNU GPL. E.g.:
    Roman input: bMlaa
    Bengali output: বংলা
    HTML output: &#2476;&#2434;&#2482;&#2494;
  • ITRANS 5.30: Chopde, Avinash. "A package for printing text in Indian languages using English-encoded input [in a package-specific encoding system]. This page is here for historical purposes, this package is no longer under active development, nor is there any support available. All major operating systems now support Unicode, and have built-in input methods to enter Indic script letters, so there is no need for pre-processors for Indic scripts. - January 2006"
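The HTML output in the Indian Language Converter example is each Unicode code point of বংলা written as a decimal numeric character reference. A minimal sketch reproducing it with Python's standard library follows; the helper name `to_html_ncr` is hypothetical and not part of that tool.

```python
import html

def to_html_ncr(text: str) -> str:
    """Encode every character as a decimal HTML numeric character reference."""
    return "".join(f"&#{ord(c)};" for c in text)

s = "\u09AC\u0982\u09B2\u09BE"  # বংলা: BA, ANUSVARA, LA, vowel sign AA
print(to_html_ncr(s))  # &#2476;&#2434;&#2482;&#2494;

# Round trip: the standard html module decodes the references back to Bengali.
assert html.unescape(to_html_ncr(s)) == s
```

References of this kind render correctly in any browser with a Unicode Bengali font, whatever the page's declared encoding.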

Data Sources

Monolingual Text

  • EMILLE corpus. Free license for non-profit research use. Approximately 5,520,000 words
    • from the Bengalnet news website: 1,980,000
    • from miscellaneous sources (incorporated from the CIIL Corpus): 3,540,000

News and portals

  • See Poroshmoni under Conversion, and Sutonny under Fonts.


Parallel Text


  • EMILLE corpus. Free license for non-profit research use. 442,000 words from radio broadcasts, plus "small amounts of demographically-sampled speech"


Tools and Other NLP Resources
