Bengali/Bengali

From the LDC Language Resource Wiki

(Difference between revisions)
m (Tools and Other NLP Resources)
m
Line 140: Line 140:
  * [http://unicode.org/faq/indic.html#10 Unicode Consortium]: FAQ on Indic Scripts and Languages lists these conversion tools:
  ** Microsoft's [http://www.microsoft.com/typography/developers/volt/default.htm Visual OpenType Layout Tool] (VOLT)
- ** Apple's [http://fonts.apple.com/Tools/index.html Font Tools]
+ ** Apple's [http://developer.apple.com/fonts/ Font Tools]
  ** Adobe's [http://www.adobe.com/devnet/opentype/ Font Development Kit]
  ** Pyrus' [http://www.fontlab.com/ FontLab]
Line 199: Line 199:
  * [http://lekho.sourceforge.net/probad.html Bangla Proverbs]
  * [http://tldp.org/HOWTO/Bangla-HOWTO Bengali in GNU/Linux HOWTO]; [http://www.bengalinux.org/bn/documents/gnuhowtobn.html Bangla Translation]
- * [http://geocities.com/aboltabol_new/overview.htm Bengali Poetry]
  * [http://www.col-taher.com/ Colonel Taher]: commemorating a colonel in Bangladesh war of independence; some political and news content. Bengali is in Unicode. [http://www.col-taher.com/english/default.asp English] is not as complete as Bengali.
  * [http://www.ling.lancs.ac.uk/corplang/emille/ EMILLE] corpus. 200,000 words of text in English (information leaflets from the UK Government and various local authorities) with Bengali translation. Free license for non-profit research use.
Line 232: Line 231:
  * [http://www.tldp.org/HOWTO/Bangla-HOWTO/ Bangla in GNU/Linux HOWTO]: A document on developing Bengali resources for GNU/Linux, and also about setting locales etc. for Bengali.
  * [http://www.cel.iitkgp.ernet.in/~monojit/papers/nccpb.pdf Bhattacharya, Samit, & Choudhury, Monojit, & Sarkar, Sudeshna, & Basu, Anupam (2005)]:  Inflectional morphology synthesis for Bengali noun, pronoun, and verb systems. National Conference on Computer Processing of Bangla. Independent University, Bangladesh.
- * [http://in.geocities.com/ad_rab/bengali/bwedit/ bwedit]: Bengali text editor for X11, writtin in Tcl/Tk. Can export ISCII.
  * Dasgupta, Sajib and Vincent Ng (2007). [http://www.springerlink.com/content/d616315213617600/ Unsupervised morphological parsing of Bengali]. ''Language Resources and Evaluation'' 40:3-4, pp. 311-330. Springer. DOI 10.1007/s10579-007-9031-y
  * [http://www.mt-archive.info/FreeRBMT-2009-Faridee.pdf Faridee, Abu Zaher Md., and Francis M. Tyers. 2009.]  Development of a morphological analyser for Bengali. In ''Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation'', 2-3 November 2009, Universitat d’Alacant, Alacant, Spain; ed. Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Francis M.Tyers; pp. 43-50.  PDF, 351KB.

Revision as of 20:38, 22 April 2011

বাংলা


BENGALI


General

Language summary

(Information from Ethnologue, 2009-05-13)

  • ISO 639-3 code: ben
  • Spoken in: Western Bangladesh, India, Nepal, and diaspora
  • Population: 100,000,000 in Bangladesh (1994 UBS). 211,000,000 including second-language speakers (1999 WA). Population total all countries: 171,070,202.
  • Alternate names: Banga-Bhasa, Bangala, Bangla
  • Dialects: Languages or dialects in the Bengali group according to Grierson (1903-1928): Central (Standard) Bengali, Western Bengali (Kharia Thar, Mal Paharia, Saraki), Southwestern Bengali, Northern Bengali (Koch, Siripuria), Rajbanshi, Bahe, Eastern Bengali (East Central, including Sylhetti), Haijong, Southeastern Bengali (Chakma), Ganda, Vanga, Chittagonian (possible dialect of Southeastern Bengali).
  • Classification: Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bengali-Assamese
  • Script: Bengali

Linguistic notes

Morphology

Bengali marks the plural with one of several suffixes, the choice depending on the noun. It also distinguishes several cases, each likewise marked by one of several suffixes.

Writing System

Bengali is written in a script derived from Brahmi. Consonant clusters are written as ligatures, and vowel markers are placed in various positions around the consonant they belong to.
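
The difference between how Bengali text is drawn and how it is stored can be seen by listing code points. The following is a minimal sketch using Python's standard unicodedata module; the helper name describe is an assumption made for illustration. In Unicode, a vowel sign is stored after its consonant even when it is drawn to the left of it, and a conjunct is stored as consonant + virama + consonant, with the ligature left to the font.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs in stored (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Consonant KA followed by vowel sign I. The sign is drawn to the left of
# the consonant, but it is stored after it.
ki = "\u0995\u09BF"

# KA + VIRAMA + SSA: a conjunct that fonts normally draw as one ligature.
kssa = "\u0995\u09CD\u09B7"

for label, text in (("ki", ki), ("kssa", kssa)):
    print(label, describe(text))
```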

Linguistic resources

  • Bhattacharya, Tanmoy. (2001): Bengali. In Facts About the World's Languages: An Encyclopedia of the World's Major Languages: Past and Present, ed. Jane Garry and Carl Rubino. New York / Dublin: H.W. Wilson Press. ISBN 0824209702.
  • Klaiman, M. H. (1987): Bengali. In The World's Major Languages, ed. Bernard Comrie, pp. 490-513. Oxford University Press. ISBN 978-0195065114.


Grammar

  • Milne, W. S. (1993): A Practical Bengali Grammar. Laurier Books Ltd. ISBN 8120608771. 562 pp.
  • Mojumder, Atindra (1973): Bengali Language Historical Grammar. Calcutta: Firma K. L. Mukhopadhyay.
  • Smith, W. L. (1997): Bengali Reference Grammar. Stockholm: Association of International Studies. Stockholm Oriental Textbook Series 1. ISBN 9197085472. 197 pp.

Lexicon

Topical word lists

Names
  • Babynology: List of Bengali baby names in Roman transliteration

Phrasebooks

Writing

  • Bagchi, Tista (1996): "Bengali Writing," in William Bright and Peter Daniels (eds.) The World's Writing Systems. New York: Oxford University Press. pp. 399-403. ISBN 0-19-507993-0.

Monographs

  • Bayer, Josef (2001): "Two grammars in one: sentential complements and complementizers in Bengali and other South-Asian languages," in Peri Bhaskararao and Karamuri Venkata Subbarao (eds.) The Yearbook of South Asian Languages and Linguistics: Tokyo Symposium on South-Asian Languages - Contact, Convergence and Typology. New Delhi: Sage Publications.
  • Butt, Miriam (2001): "Case, agreement, pronoun incorporation, and pro-drop in South Asian languages," handout for talk presented at the workshop The Role of Agreement in Argument Structure, August 31-September 1, 2001, Utrecht.
  • Dirdal, Hildegunn: "The acquisition of articles by Bengali learners of English". Handout?
  • Fitzpatrick-Cole, Jennifer (1994): The Prosodic Domain Hierarchy in Reduplication. Ph.D. dissertation, Stanford.
  • Fitzpatrick-Cole, Jennifer (1996): "Reduplication meets the phonological phrase in Bengali," Linguistic Review 13.305-356.
  • Ghosh, Sanjukta and Probal Dasgupta: "The role of classifiers in quantification," Handout for talk presented at the 21st South Asian Language Analysis Roundtable, October 7-10, University of Konstanz.
  • Keane, Elinor (2001): Echo Words in Tamil. Unpublished Ph.D. dissertation, Merton College, Oxford. Contains a fairly extensive discussion of echo words in Bengali.
  • Khan, Zeeshan (1994): "Bangla Verb Classes and Alternations," in Douglas A. Jones, Robert C. Berwick, Franklin Cho, Zeeshan Khan, Karen T. Kohl, Naoyuki Nomura, Anand Radhakrishnan, Ulrich Sauerland, and Brian Ulicny, Verb Classes and Alternations in Bangla, German, English, and Korean (AI Memo # 1517). Cambridge, Massachusetts: MIT. pp. 36-50.
  • Lahiri, Aditi and Jennifer Fitzpatrick-Cole (1999): "Emphatic Clitics and Focus Intonation in Bengali," in René Kager & Wim Zonneveld (eds.) Phrasal Phonology. pp. 119-144. Dordrecht: Foris.

Linguistic portals and bibliographies

Encoding and Fonts

Before Unicode was developed and came into general use, computer use of Bengali and other South Asian languages required special fonts that used only one byte per character. Many of these fonts were specific to a single website and used idiosyncratic encodings. To some extent that is still the case, so this page lists some such sites (see News) along with resources for specific fonts and encoding converters. See, for example, eThikana below.

In addition, the Bureau of Indian Standards supports its own ISCII standard (below), an 8-bit encoding that uses escape sequences to announce the language of the coded character sequence that follows.

Encodings

Unicode

The Unicode range for Bengali is U+0980-U+09FF.
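
As a rough illustration of how this range can be used, the sketch below tests whether characters fall inside the Bengali block and estimates how much of a string is Bengali. The function names is_bengali and bengali_ratio are assumptions made for this example. Note that it checks the block only: some punctuation used in Bengali text, such as the danda (U+0964), is encoded in the Devanagari block and would not be counted.

```python
BENGALI_START = 0x0980
BENGALI_END = 0x09FF

def is_bengali(ch):
    """True if the single character ch lies in the Bengali Unicode block."""
    return BENGALI_START <= ord(ch) <= BENGALI_END

def bengali_ratio(text):
    """Fraction of non-whitespace characters that are in the Bengali block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_bengali(ch) for ch in chars) / len(chars)

# "Bangla" written in Bengali script, followed by a Latin word.
sample = "\u09AC\u09BE\u0982\u09B2\u09BE text"
print(bengali_ratio(sample))
```

A ratio like this can serve as a quick filter when sorting pages into Unicode Bengali versus pages that rely on a legacy 8-bit font, which will show up as Latin-range characters.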

ISCII

See ISCII.

Sutonny

This 8-bit encoding is used by a number of news sites. See Poroshmoni under Conversion, and Bangla.com: Sutonny under Fonts.

ITRANS and CS/CSX

ITRANS is a transliteration package, no longer supported but still available. It provides two input encodings:

  • ITRANS: 7-bit; uses multi-character English codes to represent each Bengali letter
  • CS/CSX: 8-bit; uses one-character codes

The 8-bit encoding, at least, is still used on some sites. See ItxBeng under Fonts.
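
The general idea of a 7-bit scheme with multi-character codes can be sketched as a longest-match lookup. The table below is a tiny illustrative assumption and is not the actual ITRANS table; it handles only a few consonants, treats "a" as the unwritten inherent vowel, and ignores conjuncts and independent vowels.

```python
# Illustrative codes only: NOT the real ITRANS scheme.
TOKENS = {
    "kh": "\u0996",  # BENGALI LETTER KHA
    "k": "\u0995",   # BENGALI LETTER KA
    "gh": "\u0998",  # BENGALI LETTER GHA
    "g": "\u0997",   # BENGALI LETTER GA
    "aa": "\u09BE",  # BENGALI VOWEL SIGN AA
    "a": "",         # inherent vowel: nothing is written
}
MAX_LEN = max(len(code) for code in TOKENS)

def transliterate(text):
    """Convert Latin codes to Bengali, preferring the longest matching code."""
    out = []
    i = 0
    while i < len(text):
        for length in range(MAX_LEN, 0, -1):
            chunk = text[i:i + length]
            if chunk in TOKENS:
                out.append(TOKENS[chunk])
                i += len(chunk)
                break
        else:
            # No code matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

print(transliterate("khaa kaa"))
```

Trying the longest code first matters because shorter codes are prefixes of longer ones: without it, "kh" would be read as "k" followed by an unknown "h".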

Fonts

  • Ekushey. Twelve Unicode fonts.
  • eThikana. All are 8-bit. Grouped by encoding:
    • Sutonny encoding: Oporajita (≠ Aparajita), Sulekha, MahouaMJ.
    • AdarshaBangla
    • AdarshaLipi, Moina
    • Aparajita (≠ Oporajita)
    • Basundhara
    • Boishakhi
    • Ekush, Falgun
    • Progoty
    • SonarGaon
  • ItxBeng is part of the ITRANS transliteration package. The itxbeng.ttf TrueType font uses an 8-bit encoding. It is also available from RabindraSangeet.org.
  • Omicron Lab. Unicode fonts for Bengali
  • Penn State University. (2009.) Browser and Font Recommendations for Bengali. Also information on Setup for Keyboarding.
  • Rezaul: The site is a portal of sorts; this directory has links to fonts and encoding documentation.
  • South Asia Language Resource Center of the University of Chicago. Links to Bengali fonts (most of them available for free download), input schemes and keyboard layouts, and information about Mac vs. PC vs. Linux rendering issues.

Conversion

  • Unicodify: From Lancaster University, producers of the EMILLE corpus. Includes conversion for the AdarshaLipi, AdarshaLipiExp, and AdarshaLipiNormal2 fonts, among others. Runs on Windows (source code available).
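
To give a sense of what a font-encoding converter has to do, here is a minimal sketch. It is not how Unicodify or any listed tool works, and the byte table is entirely hypothetical; real fonts each need their own, much larger table. It assumes, as is typical of legacy 8-bit fonts, that text is stored in visual order, so a vowel sign drawn to the left of a consonant comes before it in the byte stream and must be moved after it for Unicode.

```python
# Hypothetical byte-to-Unicode table for an imaginary 8-bit font.
LEGACY_TO_UNICODE = {
    0x6B: "\u0995",  # BENGALI LETTER KA
    0x67: "\u0997",  # BENGALI LETTER GA
    0x69: "\u09BF",  # BENGALI VOWEL SIGN I (drawn to the left)
    0x65: "\u09C7",  # BENGALI VOWEL SIGN E (drawn to the left)
    0x61: "\u09BE",  # BENGALI VOWEL SIGN AA (drawn to the right)
}

# Vowel signs that are drawn before the consonant but stored after it.
PRE_BASE_SIGNS = {"\u09BF", "\u09C7", "\u09C8"}

def legacy_to_unicode(data: bytes) -> str:
    """Map bytes through the table, then move pre-base signs after the
    character that follows them. Unmapped bytes are kept as Latin-1.
    Conjuncts are not handled in this sketch."""
    chars = [LEGACY_TO_UNICODE.get(b, chr(b)) for b in data]
    out = []
    i = 0
    while i < len(chars):
        if chars[i] in PRE_BASE_SIGNS and i + 1 < len(chars):
            out.append(chars[i + 1])
            out.append(chars[i])
            i += 2
        else:
            out.append(chars[i])
            i += 1
    return "".join(out)

print(legacy_to_unicode(b"ik"))
```

A production converter would also need to move the sign past a whole consonant cluster rather than a single character, which is one reason font-specific converters are listed separately here.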

Transliteration

Data Sources

Monolingual Text

News and portals

  • See Poroshmoni under Conversion, and Bangla.com: Sutonny under Fonts.

Blogs

Parallel Text

Speech

Portals

Tools and Other NLP Resources
