From the LDC Language Resource Wiki
[Mamandel 17:06, 7 February 2012 (UTC)]
(Information from Ethnologue, 2012-02-01)
- ISO 639-3 code: ben
- Spoken in: Western Bangladesh, India, Nepal, and diaspora
- Population: 110,000,000 in Bangladesh (2001 census). 250,000,000 including L2 speakers. Population total all countries: 181,272,900.
- Alternate names: Banga-Bhasa, Bangala, Bangla
- Dialects: Languages or dialects in the Bengali group according to Grierson (1903-1928): Central (Standard) Bengali, Western Bengali (Kharia Thar, Mal Paharia, Saraki), Southwestern Bengali, Northern Bengali (Koch, Siripuria), Rajbanshi, Bahe, Eastern Bengali (East Central, including Sylhetti), Haijong, Southeastern Bengali (Chakma), Ganda, Vanga, Chittagonian (possible dialect of Southeastern Bengali).
- Classification: Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bengali-Assamese
- Script: Bengali
- LLMAP: Bengali
Bengali marks the plural by means of several different suffixes, the choice depending on the noun. There are several cases, also marked by suffixes.
The Bengali script is an abugida, a Brahmi derivative with ligatures and with vowel markers located in various positions around the consonant.
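As a rough illustration of the abugida structure just described, the following Python sketch lists the code points that make up a few consonant-plus-vowel-sign syllables. It uses only the standard `unicodedata` module. The sample syllables and the `describe` helper are chosen here for illustration and do not come from the sources listed on this page. In Unicode the vowel sign is stored after the consonant, even when it is displayed to the consonant's left.

```python
import unicodedata

def describe(syllable):
    """Return (code point, Unicode name, general category) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
            for ch in syllable]

# Illustrative syllables: the consonant KA followed by a dependent vowel sign.
samples = [
    "\u0995\u09BF",  # ki: vowel sign displayed to the left of the consonant
    "\u0995\u09C1",  # ku: vowel sign displayed below the consonant
    "\u0995\u09CB",  # ko: vowel sign displayed on both sides of the consonant
]

for s in samples:
    # Only ASCII descriptions are printed, so this works on any terminal.
    print(describe(s))
```

In each case the stored sequence is consonant first, vowel sign second; positioning around the consonant is left to the font and rendering engine.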
- Bagchi, Tista (1996): "Bengali Writing," in William Bright and Peter Daniels (eds.) The World's Writing Systems. New York: Oxford University Press. pp. 399-403. ISBN 0-19-507993-0.
- Ishida, Richard. Bengali script notes. [This version 2010-08-17]
- Unicode 0980-09FF
- Unicode Standard v6.0, Chapter 9: South Asian Scripts - I [PDF]
- Wikipedia: Bengali script (CC-BY-SA),(GFDL)
- MultiTree Digital Library of Language Relationships
- UCLA Language Materials Project Language profile.
- World Atlas of Language Structures Online
- Bhattacharya, Tanmoy. (2001): Bengali. In Facts About the World's Languages: An Encyclopedia of the World's Major Languages: Past and Present, ed. Jane Garry and Carl Rubino. New York / Dublin: H.W. Wilson Press. ISBN 0824209702.
- Klaiman, M. H. (1987): Bengali. In The World's Major Languages, ed. Bernard Comrie, pp. 490-513. Oxford University Press. ISBN 978-0195065114.
- Milne, W. S. (1993): A Practical Bengali Grammar. Laurier Books Ltd. ISBN 8120608771. 562 pp.
- Mojumder, Atindra (1973): Bengali Language Historical Grammar. Calcutta: Firma K. L. Mukhopadhyay.
- Smith, W. L. (1997): Bengali Reference Grammar. Stockholm: Association of International Studies. Stockholm Oriental Textbook Series 1. ISBN 9197085472. 197 pp.
- Akkhor: An English to Bengali "translation" tool is included in a large package downloadable from this site. It appears to be a large dictionary, so it could be fed a list of English words, and would output a list of Bengali words. Encoding is unclear.
- Bengali-Dictionary.com A Bilingual Dictionary of Words & Phrases (English-Bengali). Lookup in both directions.
- Carey, William (1761-1834). A Dictionary of the Bengali Language. Laurier Books Ltd. 2160 pp. ISBN 8120600940.
- Dev, Ashu Tosh (1961): Students' Favourite Dictionary: Bengali to English. 28th ed. Dev Sahitya Kutir. 998 p. Reprint. USD 7.75.
- Also: 1961, 1291 pp. Calcutta: S.C. Mazumder.
- Digital Dictionaries of South Asia at the University of Chicago:
- GNU Dictionary: Monolingual dictionary, used by GNU project spell checker.
- Online Bangla Obhidhan: Online lexicon, at least 6k unique entries. Akkhor encoding converter appears to work. This page gives GIF images of Bangla characters instead.
- Wiktionary. Unicode. Monolingual. 713 entries. (CC-BY-SA),(GFDL)
Topical word lists
- Pandanus Database of Indian Plants: 146 names of Indian plants in Bengali. Also has five other Indian languages (all in romanization), English, and Latin. Most of the other languages have more terms (e.g., Hindi has 670).
- Display Bengali names. Each name links to a list of names for the plant in all the languages and to a detailed set of descriptions from a number of sources.
- COMMON NAMES OF PLANTS GROWING IN BANGLADESH AND WEST BENGAL (BENGALI). PDF image of typewritten document. About 450 "common names", all in Roman letters, matched to botanical name(s). Common names are tagged for language: Bengali, English, unknown, a few others. Looks like around 35-40% are English, and some of those are misspelled.
- Common names of fish in Bangla: “215” common names, matched with scientific names.
- Babynology: List of Bengali given names in Roman transliteration, for naming babies.
- Bengali Script
- Roman transliteration
- travlang.com: Phrases for travelers: transliteration with sound files
- Bayer, Josef (2001): "Two grammars in one: sentential complements and complementizers in Bengali and other South-Asian languages," in Peri Bhaskararao and Karamuri Venkata Subbarao (eds.) The Yearbook of South Asian Languages and Linguistics: Tokyo Symposium on South-Asian Languages - Contact, Convergence and Typology. New Delhi: Sage Publications. PDF
- Butt, Miriam (2001): "Case, Agreement, Pronoun Incorporation and Pro-Drop in South Asian Languages". Talk held at the Workshop on The Role of Agreement in Argument Structure, August 31-September 1, 2001, Utrecht. PS, PDF
- Dirdal, Hildegunn (2005): "The acquisition of articles by Bengali learners of English". Presented at Forum for flerspråklig forskning — høstsemesteret 2005. Handout
- Fitzpatrick-Cole, Jennifer (1990): The minimal word in Bengali. In The Proceedings of the Ninth West Coast Conference on Formal Linguistics, Stanford Linguistics Association, 1990.
- Fitzpatrick-Cole, Jennifer (1994): The Prosodic Domain Hierarchy in Reduplication. Ph.D. dissertation, Stanford.
- Fitzpatrick-Cole, Jennifer (1996): "Reduplication meets the phonological phrase in Bengali," Linguistic Review 13.305-356.
- Ghosh, Sanjukta and Probal Dasgupta: "The role of classifiers in quantification," Handout for talk presented at the 21st South Asian Language Analysis Roundtable, October 7-10, University of Konstanz.
- Keane, Elinor (2001): Echo Words in Tamil. Unpublished Ph.D. dissertation, Merton College, Oxford. Contains a fairly extensive discussion of echo words in Bengali. Abstract
- Khan, Zeeshan (1994): "Bangla Verb Classes and Alternations," in Douglas A. Jones, Robert C. Berwick, Franklin Cho, Zeeshan Khan, Karen T. Kohl, Naoyuki Nomura, Anand Radhakrishnan, Ulrich Sauerland, and Brian Ulicny, Verb Classes and Alternations in Bangla, German, English, and Korean (AI Memo # 1517). Cambridge, Massachusetts: MIT. pp. 36-50.
- Lahiri, Aditi and Jennifer Fitzpatrick-Cole (1999): "Emphatic Clitics and Focus Intonation in Bengali," in René Kager & Wim Zonneveld (eds.) Phrasal Phonology. Pp. 119-144. Dordrecht: Foris.
Linguistic portals and bibliographies
- The LINGUIST List home page offers many options for search, some of which are listed in appropriate sections of this page.
- LINGUIST List publications page. Search form is at foot of page. Choose Publication Category (Books, Reviews, Dissertations, or All) and Subject Language.
- OLAC, the Open Language Archives Community
- SIL Bibliography from Ethnologue
- UCLA Language Materials Project. In the "Materials" field, select ALL MATERIALS, ALL TEACHING MATERIALS, ALL AUTHENTIC MATERIALS, or one of the listed subtypes.
Encoding and Fonts
Before the development and general use of Unicode, computer use of Bengali and other South Asian languages required special fonts that encoded each character in a single byte. Many of these fonts were specific to one website or another and used idiosyncratic encodings. To some extent that is still the case, so this page includes some such sites (see News) and some resources for specific fonts and encoding converters. See, for example, eThikana below.
In addition, the Bureau of Indian Standards supports its own ISCII standard (below), which provides an 8-bit encoding using escape sequences to announce the language of the following coded character sequence.
The Unicode range for Bengali is 0980-09FF.
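When collecting text from the sites listed on this page, it can help to check whether a downloaded page is Unicode Bengali or something else. The sketch below is a minimal heuristic, not a real encoding detector: `is_bengali` tests the 0980-09FF range given above, and `guess_encoding` (a hypothetical helper, with labels invented here) sorts a byte string into coarse categories. Text in a legacy 8-bit font encoding can occasionally also be valid UTF-8, so the result is only a hint.

```python
BENGALI_START, BENGALI_END = 0x0980, 0x09FF  # Unicode block for Bengali

def is_bengali(ch):
    """True if the character's code point lies in the Bengali block."""
    return BENGALI_START <= ord(ch) <= BENGALI_END

def guess_encoding(data):
    """Return a coarse, heuristic label for a byte string (assumption-laden)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes that are not valid UTF-8 may come from an 8-bit font encoding.
        return "not UTF-8 (possibly a legacy 8-bit font encoding)"
    if any(is_bengali(ch) for ch in text):
        return "UTF-8 with Bengali-block characters"
    if all(b < 0x80 for b in data):
        return "7-bit ASCII only"
    return "UTF-8 without Bengali-block characters"

# Example: the word "Bangla" written with Bengali-block code points.
sample = "\u09AC\u09BE\u0982\u09B2\u09BE".encode("utf-8")
print(guess_encoding(sample))
```

A page labelled "7-bit ASCII only" might still be romanized Bengali, and a page labelled "not UTF-8" would need one of the font-specific converters listed on this page before further processing.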
- Constable, Peter (2004): Encoding of Bengali Khanda Ta in Unicode. Unicode Consortium Public Review Issue #30.
- Pennsylvania State University information page. Includes information about fonts and rendering issues.
- Penn State chart of Unicode Entity Codes for the Bengali (Bangla) Script (including OS X and Windows keyboard entry)
- White, Andy. (2003). Exnet. “This site hosts documents relating to the encoding of Indic scripts. Most documents contain a bias towards the Bengali script (due to my own preferances).” (Last updated 10th March 2003)
Indian Script Code for Information Exchange. See ISCII.
The 8-bit Sutonny encoding is used by a number of news sites. See eThikana: Sutonny under Fonts.
ITRANS and CS/CSX
ITRANS is a transliteration package, no longer supported [latest: Minor Updates Version 5.32, February 2011] but still available as freeware. “A package for printing text in Indian languages using English-encoded input.” Provides two input encodings:
- ITRANS: 7-bit; uses multi-character English codes to represent each Bengali letter
- CS/CSX: 8-bit; uses one-character codes
The 8-bit encoding, at least, is still used on some sites. See ItxBeng under Fonts.
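To make the difference between the two input styles concrete: a 7-bit scheme with multi-character codes has to be split into codes before it can be converted, whereas a one-character-per-letter 8-bit scheme can be mapped byte by byte. The sketch below shows one common way to do the splitting, greedy longest match. The `CODES` inventory is a small illustrative subset assumed for this example, not the actual ITRANS table, and `tokenize` is a hypothetical helper rather than part of the ITRANS package.

```python
# Illustrative subset of romanization codes; NOT the full ITRANS table.
CODES = {"k", "kh", "g", "gh", "a", "aa", "i", "ii"}
MAX_LEN = max(len(code) for code in CODES)

def tokenize(text):
    """Split romanized text into codes, preferring the longest match each time."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in CODES:
                tokens.append(candidate)
                i += n
                break
        else:
            # No code of any length matched at this position.
            raise ValueError(f"no code matches at position {i}: {text[i:]!r}")
    return tokens

print(tokenize("khaagii"))  # ['kh', 'aa', 'g', 'ii']
```

Longest match is needed because shorter codes are prefixes of longer ones (here "k" and "kh", "a" and "aa"); a real converter would then map each code to its Bengali letter or vowel sign using the package's own tables.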
- Ekushey. 24 plain and fancy Unicode fonts, free download. [Mamandel 120626]
- eThikana. All are 8-bit. Grouped by encoding:
- Sutonny encoding: Oporajita (≠ Aparajita), Sulekha, MahouaMJ.
- AdarshaLipi, Moina
- Aparajita (≠ Oporajita)
- Ekush, Falgun
- ItxBeng is part of the ITRANS transliteration package. The itxbeng.ttf TrueType font uses an 8-bit encoding. It is also available from RabindraSangeet.org.
- Omicron Lab. Unicode fonts for Bengali. “You will need Avro Keyboard ... to use these Unicode Bangla Fonts. Click here for FREE download”
- Penn State University. Browser and Font Recommendations for Bengali. Also information on Setup for Keyboarding. (Last Modified: Monday, 29-Aug-2011)
- Rezaul: The site is a portal of sorts; this directory has links to fonts and encoding documentation as well as typing software and other tools.
- South Asia Language Resource Center of the University of Chicago. Links to Bengali fonts (most of them available for free download), input schemes and keyboard layouts, and information about Mac vs. PC vs. Linux rendering issues.
- Microsoft. (2002) Creating and supporting OpenType fonts for the Bengali script: Microsoft doc on Unicode 3.1 for Bengali. “Registered features of the Bengali script are defined and illustrated, encodings are listed, and templates are included for compiling Bengali layout tables for OpenType fonts. This document also presents information about the Bengali OpenType shaping engine of Uniscribe, an operating system component responsible for text layout.”
- Unicodify: From Lancaster University, producers of Emille corpus. Includes conversion for AdarshaLipi, AdarshaLipiExp, and AdarshaLipiNormal2 fonts among others. Runs on Windows. Source code available (last updated on the 5th September 2004).
- Unicode Consortium: FAQ on Indic Scripts and Languages lists these conversion tools:
- ALA-LC Transliteration Table (American Library Assn. - Library of Congress)
- ITRANS: See ITRANS and CS/CSX
- EMILLE corpus. Free license for non-profit research use. Approximately 5,520,000 words. Documentation
- Wikipedia. Unicode. 23,014 articles. (CC-BY-SA),(GFDL)
News and portals
- BBC (British Broadcasting Corporation)
- Bhorerkagoj (Sutonny encoding*)
- CRI (China Radio International)
- The Daily Amadershomoy (Sutonny encoding*)
- Daily Jugantor (Sutonny encoding*)
- Deutsche Welle
- The Jaijaidin
- Daily Manab Zamin
- NHK in Bengali
- UKBD News (UK and Bangladesh)
- VOA (Voice of America)
* see eThikana: Sutonny under Fonts
- Banglablogs: Blog portal. Some of the blogs are in Bengali, some in English.
- Banglapundit: “This blog is open to invited readers only ...you might want to contact the blog author and request an invitation.”
- Deutsche Welle Bengali (ডয়চে ভেলে বাংলা বিভাগ). “News and information for Bangladesh and India and all ex-pats living abroad”. All posts are in Bengali script; some comments are in romanization. [Mamandel 120813]
- International Marxist Tendency: Multilingual parallel text, but apparently not systematic. A few articles are in Bengali as well as English and other languages.
- Bangladesh Awami League.
- Bengali in GNU/Linux HOWTO; Bangla Translation
- Colonel Taher: commemorating a colonel in the Bangladesh war of independence; some political and news content. Bengali is in Unicode. English is not as complete as Bengali.
- EMILLE corpus. 200,000 words of text in English (information leaflets from the UK Government and various local authorities) with Bengali translation. Free license for non-profit research use.
- ODIN list of documents and pages with interlinear examples for Bengali
- EMILLE corpus. Free license for non-profit research use. 442,000 words from radio broadcasts, plus “small amounts of demographically-sampled speech”
- VOA podcasts
- Washington Bangla Radio
- Bengali Language Directory
- Bengali Language Resources on the Web
- Virtual Bangladesh
- Cyber Bangladesh: a variety of links to news profiles, literature, encyclopedias, songs, tourist information, fonts, etc.
- University of Cambridge International Examinations website: The following PDFs are lists of printed materials, both dated 07.02.14 (= Feb. 14, 2007)
- Periodicals to support O Level Bengali: some with postal address, none with web info
- Language Skills Practice to support O Level Bengali: Suggested resources for teachers to support the delivery of the syllabus, with postal address of publisher
Tools and Other NLP Resources
- Bangla in GNU/Linux HOWTO: A document on developing Bengali resources for GNU/Linux, and also about setting locales etc. for Bengali.
- Bhattacharya, Samit, & Choudhury, Monojit, & Sarkar, Sudeshna, & Basu, Anupam (2005): Inflectional morphology synthesis for Bengali noun, pronoun, and verb systems. National Conference on Computer Processing of Bangla. Independent University, Bangladesh.
- Dasgupta, Sajib and Vincent Ng (2007). Unsupervised morphological parsing of Bengali. Language Resources and Evaluation 40:3-4, pp. 311-330. Springer. DOI 10.1007/s10579-007-9031-y
- Faridee, Abu Zaher Md., and Francis M. Tyers. 2009. Development of a morphological analyser for Bengali. In Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, 2-3 November 2009, Universitat d’Alacant, Alacant, Spain; ed. Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Francis M. Tyers; pp. 43-50. PDF, 351KB.
- Language Technologies Research Centre (LTRC) of the International Institute of Information Technology, Hyderabad. “Basic and applied research on various aspects of natural language technology. The focus is on developing technologies in three major areas: Language Access and Machine Translation among English and Indian languages; Speech Processing for Indian languages; Search, Information Extraction and Retrieval for English and Indian languages”
- Archive of papers and keynote lectures from ICON-2008 through ICON-2011 (6th through 9th International Conference on Natural Language Processing)
- Tools: Shallow Parser for Hindi; Hindi WordNet
- Downloads: bilingual dictionaries, morphological analyzers, font converters, corpus management resources, papers...
- Lekho: “a small collection of tools and resources to using bangla on computers... It is still likely (2011) that your computer does not support Bangla "out of the box". In order to read and write Bangla on your computer you will need apropriate fonts and standards aware software.” Links, downloads, information.
- Part-of-Speech Tagset: Indian Language Part-of-Speech Tagset: Bengali. Kalika Bali, Monojit Choudhury, Priyanka Biswas. 2010. “A corpus developed by Microsoft Research (MSR) India to support the task of Part-of-Speech Tagging (POS) and other data-driven linguistic research on Indian Languages in general. It is created as a part of the Indian Language Part-of-Speech Tagset (IL-POST) project” (LDC catalog #LDC2010T16)