Bengali/Bengali

From the LDC Language Resource Wiki

(Difference between revisions)
m (Monolingual Text)
m
 
(52 intermediate revisions not shown)
Line 7: Line 7:
-
 
+
{{si|[[User:Mamandel|Mamandel]] 17:48, 3 May 2011 (UTC)}}
==General==
==General==
Line 27: Line 27:
Bengali is written in a Brahmi derivative, with ligatures, and with vowel markers located in various positions around the consonant.  
Bengali is written in a Brahmi derivative, with ligatures, and with vowel markers located in various positions around the consonant.  
* [http://www.omniglot.com/writing/bengali.htm Omniglot]
* [http://www.omniglot.com/writing/bengali.htm Omniglot]
-
* [http://en.wikipedia.org/wiki/Bengali_script Wikipedia: Bengali script]
+
* [http://en.wikipedia.org/wiki/Bengali_script Wikipedia: Bengali script] {{CC-BY-SA}},{{GFDL}}
* [http://people.w3.org/rishida/scripts/bengali/bengali-script/ Bengali script notes <nowiki>[Draft]</nowiki>], Richard Ishida. {{attrib|Last update 2005-02-04}}
* [http://people.w3.org/rishida/scripts/bengali/bengali-script/ Bengali script notes <nowiki>[Draft]</nowiki>], Richard Ishida. {{attrib|Last update 2005-02-04}}
Line 52: Line 52:
** [http://dsal.uchicago.edu/dictionaries/dasa/ Dasa Bengali beta]
** [http://dsal.uchicago.edu/dictionaries/dasa/ Dasa Bengali beta]
* [http://www.bengalinux.org/projects/dictionary/ GNU Dictionary]: Monolingual dictionary, used by GNU project spell checker.
* [http://www.bengalinux.org/projects/dictionary/ GNU Dictionary]: Monolingual dictionary, used by GNU project spell checker.
-
* [http://ltrc.iiit.net/onlineServices/Dictionaries/Dict_Frame.html LTRC]: Two Bengali-Hindi dictionaries, but no English glosses. ISCII encodings. There is also an English-Hindi dict, so it might be possible to cross-correlate entries. GPL license.
 
* [http://www.bangladict.org/dictext.php Online Bangla Obhidhan]: Online lexicon, at least 6k unique entries. Akkhor encoding converter appears to work. [http://www.bangladict.org/dicimage.php This page] gives gif of Bangla chars instead.
* [http://www.bangladict.org/dictext.php Online Bangla Obhidhan]: Online lexicon, at least 6k unique entries. Akkhor encoding converter appears to work. [http://www.bangladict.org/dicimage.php This page] gives gif of Bangla chars instead.
-
* [http://www.virtualbangladesh.com/dictionary.php Virtual Bangladesh: Dictionary]: On-line lexicon, claims 3k words. Results are displayed in a romanization and "often" in Bangla script using a Unicode font.
+
* [http://www.virtualbangladesh.com/dictionary.php Virtual Bangladesh: Dictionary]: On-line lexicon, claims 3k words. Results are displayed in a romanization and {{hq|often}} in Bangla script using a Unicode font. {{hq|The dictionary is still work in progress.}}
-
* [http://bn.wiktionary.org/wiki/ Wiktionary (Bengali)]. Monolingual. Unicode.
+
* [http://bn.wikipedia.org/wiki/প্রধান_পাতা Wiktionary]. Unicode. Monolingual. 21,347 entries. {{CC-BY-SA}},{{GFDL}} {{si|[[User:Mamandel|Mamandel]] 16:27, 3 May 2010 (UTC)}}
-
 
+
-
 
+
====Topical word lists====
====Topical word lists====
 +
{{si|[[User:Mamandel|Mamandel]] 16:40, 3 May 2011 (UTC)}}
-
* [http://filaman.ifm-geomar.de/comnames/scriptlist.cfm?script=bangla Common Names in Bangla]: Names of plants in Bengali with their scientific names
+
* [http://iu.ff.cuni.cz/pandanus/ Pandanus]  [http://iu.ff.cuni.cz/pandanus/database/ Database of Indian Plants]: {{hq|146}} names of Indian plants in Bengali, five other Indian languages (all in romanization), English, and Latin.
 +
*:[http://iu.ff.cuni.cz/pandanus/database/?enc=utf&sort=ka&display=20&reswind=this&ben=on&start=0 Display Bengali names.] Each name links to a list of names for the plant in all the languages and to a detailed set of descriptions from a number of sources. {{si
 +
* [http://www.beesfordevelopment.org/uploads/Common%20names%20of%20plants%20in%20Bangladesh.pdf COMMON NAMES OF PLANTS GROWING IN BANGLADESH AND WEST BENGAL (BENGALI)]. PDF image of typewritten document. About 450 "common names", all in Roman letters, matched to botanical name(s). Common names are tagged for language: Bengali, English, unknown, a few others. Looks like around 35-40% are English, and some of those are misspelled.  
 +
* [http://www.fishbase.us/comnames/scriptlist.cfm?script=bangla Common names of fish in Bangla]: {{hq|215}} common names, matched with scientific names.
=====Names=====
=====Names=====
Line 68: Line 69:
====Phrasebooks====
====Phrasebooks====
* Bengali Script
* Bengali Script
-
** http://www.virtualbangladesh.com/bd_phrases.html
 
** http://www.bengalitranslator.net/ENGLISH_WORDS_PHRASES_INTO%20BENGALI/index.html  
** http://www.bengalitranslator.net/ENGLISH_WORDS_PHRASES_INTO%20BENGALI/index.html  
* Roman transliteration
* Roman transliteration
Line 78: Line 78:
===Monographs===
===Monographs===
-
* Bayer, Josef (2001):  "Two grammars in one: sentential complements and complementizers in Bengali and other South-Asian languages," in Peri Bhaskararao and Karamuri Venkata Subbarao (eds.) ''The Yearbook of South Asian Languages and Linguistics: Tokyo Symposium on South-Asian Languages - Contact, Convergence and Typology''. New Delhi: Sage Publications. [http://ling.uni-konstanz.de/pages/home/bayer/pdf/two-grammars-in-one.pdf PDF]
+
* [http://ling.uni-konstanz.de/pages/home/bayer/pdf/two-grammars-in-one.pdf Bayer, Josef (2001)]:  "Two grammars in one: sentential complements and complementizers in Bengali and other South-Asian languages," in Peri Bhaskararao and Karamuri Venkata Subbarao (eds.) ''The Yearbook of South Asian Languages and Linguistics: Tokyo Symposium on South-Asian Languages - Contact, Convergence and Typology''. New Delhi: Sage Publications.
* Butt, Miriam (2001):  "Case, agreement, pronoun incorporation, and pro-drop in South Asian languages," [http://ling.uni-konstanz.de/pages/home/butt/utrecht01-hnd.pdf handout] for talk presented at the workshop ''The Role of Agreement in Argument Structure'', August 31-September 1, 2001, Utrecht.  
* Butt, Miriam (2001):  "Case, agreement, pronoun incorporation, and pro-drop in South Asian languages," [http://ling.uni-konstanz.de/pages/home/butt/utrecht01-hnd.pdf handout] for talk presented at the workshop ''The Role of Agreement in Argument Structure'', August 31-September 1, 2001, Utrecht.  
-
* Dirdal, Hildegunn:  "The acquisition of articles by Bengali learners of English". [http://www.hf.uio.no/forskningsprosjekter/sprik/docs/doc/Handout_Hildegunn_Dirdal.doc Handout?]
+
* Dirdal, Hildegunn:  "The acquisition of articles by Bengali learners of English". [http://www.hf.uio.no/ilos/forskning/prosjekter/sprik/docs/doc/Handout_Hildegunn_Dirdal.doc Handout]
* Fitzpatrick-Cole, Jennifer (1994):  ''The Prosodic Domain Hierarchy in Reduplication''. Ph.D. dissertation, Stanford.
* Fitzpatrick-Cole, Jennifer (1994):  ''The Prosodic Domain Hierarchy in Reduplication''. Ph.D. dissertation, Stanford.
* Fitzpatrick-Cole, Jennifer (1996):  "Reduplication meets the phonological phrase in Bengali," ''Linguistic Review'' 13.305-356.
* Fitzpatrick-Cole, Jennifer (1996):  "Reduplication meets the phonological phrase in Bengali," ''Linguistic Review'' 13.305-356.
* Ghosh, Sanjukta and Probal Dasgupta:  "The role of classifiers in quantification," [http://ling.uni-konstanz.de/pages/conferences/sala01/abstracts/ghosh_etal.pdf Handout] for talk presented at the 21st South Asian Language Analysis Roundtable, October 7-10, University of Konstanz.
* Ghosh, Sanjukta and Probal Dasgupta:  "The role of classifiers in quantification," [http://ling.uni-konstanz.de/pages/conferences/sala01/abstracts/ghosh_etal.pdf Handout] for talk presented at the 21st South Asian Language Analysis Roundtable, October 7-10, University of Konstanz.
* Keane, Elinor (2001):  ''Echo Words in Tamil''. [http://users.ox.ac.uk/~sjoh0535/Thesis/ Unpublished Ph.D. dissertation], Merton College, Oxford. Contains a fairly extensive discussion of echo words in Bengali.
* Keane, Elinor (2001):  ''Echo Words in Tamil''. [http://users.ox.ac.uk/~sjoh0535/Thesis/ Unpublished Ph.D. dissertation], Merton College, Oxford. Contains a fairly extensive discussion of echo words in Bengali.
-
* Khan, Zeeshan (1994):  "Bangla Verb Classes and Alternations," in Douglas A. Jones, Robert C. Berwick, Franklin Cho, Zeeshan Khan, Karen T. Kohl, Naoyuki Nomura, Anand Radhakrishnan, Ulrich Sauerland, and Brian Ulicny, ''Verb Classes and Alternations in Bangla, German, English, and Korean'' ([http://hdl.handle.net/1721.1/7197 AI Memo # 1517]). Cambridge, Massachusetts: MIT. pp. 36-50.  
+
* Khan, Zeeshan (1994):  "Bangla Verb Classes and Alternations," in Douglas A. Jones, Robert C. Berwick, Franklin Cho, Zeeshan Khan, Karen T. Kohl, Naoyuki Nomura, Anand Radhakrishnan, Ulrich Sauerland, and Brian Ulicny, ''Verb Classes and Alternations in Bangla, German, English, and Korean'' ([http://dspace.mit.edu/handle/1721.1/7197 AI Memo # 1517]). Cambridge, Massachusetts: MIT. pp. 36-50.  
-
* Lahiri, Aditi and Jennifer Fitzpatrick-Cole (1999):  "Emphatic Clitics and Focus Intonation in Bengali," in Ren&eacute; Kager &amp; Wim Zonneveld (eds.) ''Phrasal Phonology''. [http://ling.uni-konstanz.de/pages/home/fitzpatrick/lahirifitz99.pdf Pp. 119-144] Dordrecht: Foris.
+
* Lahiri, Aditi and Jennifer Fitzpatrick-Cole (1999):  "Emphatic Clitics and Focus Intonation in Bengali," in René Kager & Wim Zonneveld (eds.) ''Phrasal Phonology''. Pp. 119-144. Dordrecht: Foris.
===Linguistic portals and bibliographies===
===Linguistic portals and bibliographies===
Line 99: Line 99:
The Unicode range for Bengali is [http://www.unicode.org/charts/PDF/U0980.pdf 0980-09FF].  
The Unicode range for Bengali is [http://www.unicode.org/charts/PDF/U0980.pdf 0980-09FF].  
-
* Constable, Peter (2004):  [http://www.unicode.org/review/pr-30.pdf ''Encoding of Bengali Khanda Ta in Unicode''.] Unicode Consortium Public Review Issue #30.
+
* [http://www.unicode.org/review/pr-30.pdf Constable, Peter (2004)]:  ''Encoding of Bengali Khanda Ta in Unicode''. Unicode Consortium Public Review Issue #30.
* Pennsylvania State University [http://tlt.its.psu.edu/suggestions/international/bylanguage/bengali.html information page].  
* Pennsylvania State University [http://tlt.its.psu.edu/suggestions/international/bylanguage/bengali.html information page].  
-
**[http://tlt.psu.edu/suggestions/international/bylanguage/bengalichart.html Penn State chart of Unicode Entity Codes for the Bengali (Bangla) Script] (including OS X and Windows keyboard entry)
+
**[http://tlt.its.psu.edu/suggestions/international/bylanguage/bengalichart.html Penn State chart of Unicode Entity Codes for the Bengali (Bangla) Script] (including OS X and Windows keyboard entry)
-
* White, Andy. (2003). [http://www.exnet.btinternet.co.uk/ Exnet]. "This site hosts documents relating to the encoding of Indic scripts. Most documents contain a bias towards the Bengali script (due to my own preferances)." (Last updated 10th March 2003)
+
* White, Andy. (2003). [http://www.exnet.btinternet.co.uk/ Exnet]. {{hq|This site hosts documents relating to the encoding of Indic scripts. Most documents contain a bias towards the Bengali script (due to my own preferances).}} (Last updated 10th March 2003)
====ISCII====
====ISCII====
Line 108: Line 108:
====Sutonny====
====Sutonny====
-
This 8-bit encoding is used by a number of news sites. See ''Poroshmoni'' under [[#Conversion|Conversion]], and ''Bangla.com: Sutonny'' under [[#Fonts|Fonts]].
+
This 8-bit encoding is used by a number of news sites. See
 +
<!-- 110422 mam''Poroshmoni'' under [[#Conversion|Conversion]], and -->
 +
''Bangla.com: Sutonny'' under [[#Fonts|Fonts]].
====ITRANS and CS/CSX====
====ITRANS and CS/CSX====
Line 135: Line 137:
Also:
Also:
-
* Microsoft. [http://www.microsoft.com/typography/otfntdev/bengalot/ Creating and supporting OpenType fonts for the Bengali script]: Microsoft doc on Unicode 3.1 for Bengali. "Registered features of the Bengali script are defined and illustrated, encodings are listed, and templates are included for compiling Bengali layout tables for OpenType fonts. This document also presents information about the Bengali OpenType shaping engine of Uniscribe, an operating system component responsible for text layout."
+
* Microsoft. [http://www.microsoft.com/typography/otfntdev/bengalot/ Creating and supporting OpenType fonts for the Bengali script]: Microsoft doc on Unicode 3.1 for Bengali. {{hq|Registered features of the Bengali script are defined and illustrated, encodings are listed, and templates are included for compiling Bengali layout tables for OpenType fonts. This document also presents information about the Bengali OpenType shaping engine of Uniscribe, an operating system component responsible for text layout.}}
===Conversion===
===Conversion===
Line 142: Line 144:
* [http://unicode.org/faq/indic.html#10 Unicode Consortium]: FAQ on Indic Scripts and Languages lists these conversion tools:  
* [http://unicode.org/faq/indic.html#10 Unicode Consortium]: FAQ on Indic Scripts and Languages lists these conversion tools:  
** Microsoft's [http://www.microsoft.com/typography/developers/volt/default.htm Visual OpenType Layout Tool] (VOLT)
** Microsoft's [http://www.microsoft.com/typography/developers/volt/default.htm Visual OpenType Layout Tool] (VOLT)
-
** Apple's [http://fonts.apple.com/Tools/index.html Font Tools]
+
** Apple's [http://developer.apple.com/fonts/ Font Tools]
-
** Adobe's [http://www.adobe.com/devnet/opentype/ Font Development Kit]
+
** Adobe's [http://www.mediawiki.org/wiki/MediaWiki Font Development Kit]
** Pyrus' [http://www.fontlab.com/ FontLab]
** Pyrus' [http://www.fontlab.com/ FontLab]
** [http://pfaedit.fontforge.net/ PFAEDIT (X-11-based, for Mac OSX, Cygwin, etc.)] (for the Linux OS)
** [http://pfaedit.fontforge.net/ PFAEDIT (X-11-based, for Mac OSX, Cygwin, etc.)] (for the Linux OS)
-
 
-
* [https://addons.mozilla.org/en-US/firefox/addon/9975 Poroshmoni]: Firefox plug-in by Rifat Nabi: convert specific non-Unicode newspaper sites to Unicode 4.2. Ver.0.3 (2008-12-22) works with Firefox 1.5–3.0 but not newer versions. See also [http://www.vistaarc.com/downloads/poroshmoni developer's site] with user discussion blog.<br/> All these sites but Prathom-Alo are in Sutonny encoding now. Prathom-Alo is in Unicode (see [http://www.vistaarc.com/downloads/poroshmoni#comment-937 this comment] on the Poroshmoni site) but their archives may not be.
 
-
** [http://www.bhorerkagoj.net/ Bhorerkagoj]
 
-
** [http://www.amadershomoy.com/ The Daily Amadershomoy]
 
-
** [http://www.ittefaq.com/ Ittefaq]
 
-
** [http://www.jaijaidin.com/ The Jaijaidin]
 
-
** [http://www.manabzamin.net/ Manabzamin]
 
-
** [http://www.prothom-alo.com/ Prathom-Alo]
 
===Transliteration===
===Transliteration===
* [http://www.loc.gov/catdir/cpso/romanization/bengali.pdf ALA-LC Transliteration Table (American Library Assn. - Library of Congress)]
* [http://www.loc.gov/catdir/cpso/romanization/bengali.pdf ALA-LC Transliteration Table (American Library Assn. - Library of Congress)]
-
* [http://www.iit.edu/~laksvij/language/bengali.html Indian Language Converter]: Type in Roman characters according to the Bengali character chart on the page and get Bengali text and HTML. On-web or [http://www.iit.edu/~laksvij/language/index.html download] with GNU GPL. E.g.:
+
* [http://www.aczoom.com/itrans/  ITRANS 5.30]: Chopde, Avinash. Uses ItxBeng 8-bit font. No longer supported but still available  <small>''[Accessed 2009-09-5]''</small>. {{hq|A package for printing text in Indian languages using English-encoded input.}} {{attrib|domain, aczoom.com, was formerly www.ac<u>zone</u>.com}}
-
*: Roman input: <tt>bMlaa</tt><br>Bengali output: বংলা<br>HTML output: <tt>&amp;#2476;&amp;#2434;&amp;#2482;&amp;#2494;&lt;br/&gt;</tt>
+
-
* [http://www.aczoom.com/itrans/  ITRANS 5.30]: Chopde, Avinash. Uses ItxBeng 8-bit font. No longer supported but still available  <small>''[Accessed 2009-09-5]''</small>. "A package for printing text in Indian languages using English-encoded input." {{attrib|domain, aczoom.com, was formerly www.ac<u>zone</u>.com}}
+
==Data Sources==
==Data Sources==
Line 166: Line 158:
* [http://www.ling.lancs.ac.uk/corplang/emille/ EMILLE] corpus. Free license for non-profit research use. Approximately 5,520,000 words.  [http://www.emille.lancs.ac.uk/manual.pdf Documentation]
* [http://www.ling.lancs.ac.uk/corplang/emille/ EMILLE] corpus. Free license for non-profit research use. Approximately 5,520,000 words.  [http://www.emille.lancs.ac.uk/manual.pdf Documentation]
-
* [http://bn.wikipedia.org Wikipedia]. Unicode.
+
* [http://bn.wikipedia.org/wiki/প্রধান_পাতা Wikipedia]. Unicode. 22,250 articles. {{CC-BY-SA}},{{GFDL}} {{si|[[User:Mamandel|Mamandel]] 17:42, 3 May 2011 (UTC)}}
====News and portals====
====News and portals====
Line 183: Line 175:
* [http://www.jaijaidin.com/ The Jaijaidin] (Sutonny encoding*)
* [http://www.jaijaidin.com/ The Jaijaidin] (Sutonny encoding*)
* [http://www.dailyjanakantha.com/ Janakantha]
* [http://www.dailyjanakantha.com/ Janakantha]
-
* [http://www.manabzamin.net/ Manabzamin] (Sutonny encoding*)
+
* [http://www.mzamin.net/ Daily Manab Zamin] (Sutonny encoding*)
* [http://www.nhk.or.jp/bengali/ NHK in Bengali]
* [http://www.nhk.or.jp/bengali/ NHK in Bengali]
* [http://www.prothom-alo.com/ Prathom-Alo] (Unicode, but see note*)
* [http://www.prothom-alo.com/ Prathom-Alo] (Unicode, but see note*)
* [http://www.ukbdnews.com/ UKBD News] (UK and Bangladesh)
* [http://www.ukbdnews.com/ UKBD News] (UK and Bangladesh)
-
<nowiki>*</nowiki> see ''Poroshmoni'' under [[#Conversion|Conversion]], and ''Bangla.com: Sutonny'' under [[#Fonts|Fonts]]
+
<nowiki>*</nowiki> see ''Bangla.com: Sutonny'' under [[#Fonts|Fonts]]
====Blogs====
====Blogs====
* [http://www.banglablogs.org/ Banglablogs]: Blog portal. Some of the blogs are in Bengali, some in English.
* [http://www.banglablogs.org/ Banglablogs]: Blog portal. Some of the blogs are in Bengali, some in English.
-
* [http://ashikshams.blogspot.com Ashikshams]
+
* [http://banglapundit.blogspot.com Banglapundit]: {{hq|This blog is open to invited readers only ...you might want to contact the blog author and request an invitation.}}
-
* [http://banglapundit.blogspot.com Banglapundit]
+
===Parallel Text===
===Parallel Text===
-
* [http://www.ahsania.info/bangla/ahsania/index.htm Ahsania]: Religious (English, French, Spanish versions)
+
* [http://www.marxist.com/ In Defence of Marxism]: Argentina - The Revolution has begun. Approx 13 pg article: [http://www.marxist.com/languages/bengali/argentina.doc Bengali], [http://www.marxist.com/Latinam/argentina_revolution_has_begun.html English]
-
* [http://www.marxist.com/languages/bengali/argentina.doc Argentina - The Revolution has begun]: Approx 13 pg article, from the website "In Defence of Marxism". [http://www.marxist.com/Latinam/argentina_revolution_has_begun.html English version]
+
* [http://www.albd.org/ Bangladesh Awami League]: A pair of parallel articles: [http://www.albd.org/aldoc/15ptdemand.pdf Bengali] (unknown encoding) and [http://www.albd.org/aldoc/15point_en.htm English].
-
* [http://www.albd.org/ Bangladesh Awami League]: A pair of parallel articles: [http://www.albd.org/aldoc/15ptdemand.pdf Bengali] and [http://www.albd.org/aldoc/15point_en.htm English]. Bengali texts seem to be in PDF, with an unknown encoding. [http://www.albd.org/news/Archive.htm Archives]
+
* [http://lekho.sourceforge.net/probad.html Bangla Proverbs]
* [http://lekho.sourceforge.net/probad.html Bangla Proverbs]
* [http://tldp.org/HOWTO/Bangla-HOWTO Bengali in GNU/Linux HOWTO]; [http://www.bengalinux.org/bn/documents/gnuhowtobn.html Bangla Translation]
* [http://tldp.org/HOWTO/Bangla-HOWTO Bengali in GNU/Linux HOWTO]; [http://www.bengalinux.org/bn/documents/gnuhowtobn.html Bangla Translation]
-
* [http://geocities.com/aboltabol_new/overview.htm Bengali Poetry]
 
* [http://www.col-taher.com/ Colonel Taher]: commemorating a colonel in Bangladesh war of independence; some political and news content. Bengali is in Unicode. [http://www.col-taher.com/english/default.asp English] is not as complete as Bengali.
* [http://www.col-taher.com/ Colonel Taher]: commemorating a colonel in Bangladesh war of independence; some political and news content. Bengali is in Unicode. [http://www.col-taher.com/english/default.asp English] is not as complete as Bengali.
* [http://www.ling.lancs.ac.uk/corplang/emille/ EMILLE] corpus. 200,000 words of text in English (information leaflets from the UK Government and various local authorities) with Bengali translation. Free license for non-profit research use.  
* [http://www.ling.lancs.ac.uk/corplang/emille/ EMILLE] corpus. 200,000 words of text in English (information leaflets from the UK Government and various local authorities) with Bengali translation. Free license for non-profit research use.  
Line 207: Line 196:
===Speech===
===Speech===
-
* [http://www.ling.lancs.ac.uk/corplang/emille/ EMILLE] corpus. Free license for non-profit research use. 442,000 words from radio broadcasts, plus "small amounts of demographically-sampled speech"
+
* [http://www.ling.lancs.ac.uk/corplang/emille/ EMILLE] corpus. Free license for non-profit research use. 442,000 words from radio broadcasts, plus {{hq|small amounts of demographically-sampled speech}}
* [http://www.bbc.co.uk/bengali/ BBC].
* [http://www.bbc.co.uk/bengali/ BBC].
-
* [http://www.voanews.com/bangla/webcasts.cfm VOA].
+
* [http://www.voanews.com/bangla/podcasts VOA] {{si|110422 mam}}
==Portals==
==Portals==
Line 223: Line 212:
** [http://www.cie.org.uk/docs/dynamic/883.pdf Language Skills Practice to support O Level Bengali]: Suggested resources for teachers to support the delivery of the syllabus
** [http://www.cie.org.uk/docs/dynamic/883.pdf Language Skills Practice to support O Level Bengali]: Suggested resources for teachers to support the delivery of the syllabus
**http://www.Bangladesh.net
**http://www.Bangladesh.net
-
**http://www.bbc.co.uk/bengali  
+
**http://www.bbc.co.uk/bengali/
**http://www.indiamatch.com
**http://www.indiamatch.com
==Tools and Other NLP Resources==
==Tools and Other NLP Resources==
-
* [http://bengalinux.sourceforge.net/new/index.php Ankur]: Project to support Bengali, mostly on XServer but some on GNU Linux. {{attrib|Page last updated 2006-08-17; accessed 2009-04-20}}<br/>Includes
 
-
** [http://sourceforge.net/project/shownotes.php?group_id=43331&release_id=370749 OpenOffice 2.0]
 
-
** [http://sourceforge.net/project/showfiles.php?group_id=43331&package_id=68198 Bengali word list with over 100,000 words for GNU Aspell]
 
-
** [http://sourceforge.net/project/showfiles.php?group_id=43331&package_id=66357 bspeller]: "Light weight text editor with a Bengali spell checker"; runs under Linux. (The spell checker uses the Gnu Aspell program, and might be runnable as a stand-alone.)
 
* [http://www.tldp.org/HOWTO/Bangla-HOWTO/ Bangla in GNU/Linux HOWTO]: A document on developing Bengali resources for GNU/Linux, and also about setting locales etc. for Bengali.
* [http://www.tldp.org/HOWTO/Bangla-HOWTO/ Bangla in GNU/Linux HOWTO]: A document on developing Bengali resources for GNU/Linux, and also about setting locales etc. for Bengali.
-
* Bhattacharya, Samit, & Choudhury, Monojit, & Sarkar, Sudeshna, & Basu, Anupam (2005):  [http://www.cel.iitkgp.ernet.in/~monojit/papers/nccpb.pdf Inflectional morphology synthesis for Bengali noun, pronoun, and verb systems]. National Conference on Computer Processing of Bangla. Independent University, Bangladesh.
+
* [http://www.cel.iitkgp.ernet.in/~monojit/papers/nccpb.pdf Bhattacharya, Samit, & Choudhury, Monojit, & Sarkar, Sudeshna, & Basu, Anupam (2005)]:  Inflectional morphology synthesis for Bengali noun, pronoun, and verb systems. National Conference on Computer Processing of Bangla. Independent University, Bangladesh.  
-
* [http://in.geocities.com/ad_rab/bengali/bwedit/ bwedit]: Bengali text editor for X11, written in Tcl/Tk. Can export ISCII.
+
* Dasgupta, Sajib and Vincent Ng (2007). [http://www.springerlink.com/content/d616315213617600/ Unsupervised morphological parsing of Bengali]. ''Language Resources and Evaluation'' 40:3-4, pp. 311-330. Springer. DOI 10.1007/s10579-007-9031-y
* Dasgupta, Sajib and Vincent Ng (2007). [http://www.springerlink.com/content/d616315213617600/ Unsupervised morphological parsing of Bengali]. ''Language Resources and Evaluation'' 40:3-4, pp. 311-330. Springer. DOI 10.1007/s10579-007-9031-y
-
* Faridee, Abu Zaher Md., and Francis M. Tyers. 2009. [http://www.mt-archive.info/FreeRBMT-2009-Faridee.pdf Development of a morphological analyser for Bengali]. In ''Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation'', 2-3 November 2009, Universitat d’Alacant, Alacant, Spain; ed. Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Francis M.Tyers; pp. 43-50.  PDF, 351KB.
+
* [http://www.mt-archive.info/FreeRBMT-2009-Faridee.pdf Faridee, Abu Zaher Md., and Francis M. Tyers. 2009.]  Development of a morphological analyser for Bengali. In ''Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation'', 2-3 November 2009, Universitat d’Alacant, Alacant, Spain; ed. Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Francis M.Tyers; pp. 43-50.  PDF, 351KB.
-
* [http://www.janabhaaratii.org.in:9673/indicbhaaratii/ indicbhaaratii]: portal is meant for a collaboration effort for Indian Language Computing
+
* [http://ltrc.iiit.ac.in/ Language Technologies Research Centre (LTRC)] of the International Institute of Information Technology, Hyderabad. {{hq|Basic and applied research on various aspects of natural language technology. The focus is on developing technologies in three major areas: Language Access and Machine Translation among English and Indian languages; Speech Processing for Indian languages; Search, Information Extraction and Retrieval for English and Indian languages}} {{si|[[User:Mamandel|Mamandel]] 17:58, 3 May 2011 (UTC)}}
-
* [http://www.janabhaaratii.org.in/ Janabhaaratii]: Localisation of Free/Open Source Software for Indian languages
+
**Archive of papers and keynote lectures from ICON-2008 thru 2010 (6th thru 8th International Conference on Natural Language Processing, IIT Kharagpur)
 +
**[http://ltrc.iiit.ac.in/showfile.php?filename=release/ Tools]: Shallow Parser for Hindi; Hindi WordNet
* [http://lekho.sourceforge.net/ Lekho]: Bengali Unicode text editor, runs under UNIX and MS Windows.
* [http://lekho.sourceforge.net/ Lekho]: Bengali Unicode text editor, runs under UNIX and MS Windows.
 +
* [http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2010T16 Part-of-Speech Tagset]: ''Indian Language Part-of-Speech Tagset: Bengali''. Kalika Bali, Monojit Choudhury, Priyanka Biswas. 2010. LDC2010T16. {{hq|A corpus developed by Microsoft Research (MSR) India to support the task of Part-of-Speech Tagging (POS) and other data-driven linguistic research on Indian Languages in general. It is created as a part of the [http://research.microsoft.com/en-us/groups/mls/default.aspx Indian Language Part-of-Speech Tagset (IL-POST)] project}} {{si|[[User:Mamandel|Mamandel]] 20:41, 1 March 2011 (UTC)}}
 +
 +
<!-- ==Miscellaneous== -->
<!-- ==Miscellaneous== -->
[[Category:Bengali]]
[[Category:Bengali]]

Latest revision as of 12:18, 10 May 2011


বাংলা


BENGALI


[Mamandel 17:48, 3 May 2011 (UTC)]


General

Language summary

(Information from Ethnologue, 2009-05-13)

  • ISO 639-3 code: ben
  • Spoken in: Western Bangladesh, India, Nepal, and diaspora
  • Population: 100,000,000 in Bangladesh (1994 UBS). 211,000,000 including second-language speakers (1999 WA). Population total all countries: 171,070,202.
  • Alternate names: Banga-Bhasa, Bangala, Bangla
  • Dialects: Languages or dialects in the Bengali group according to Grierson (1903-1928): Central (Standard) Bengali, Western Bengali (Kharia Thar, Mal Paharia, Saraki), Southwestern Bengali, Northern Bengali (Koch, Siripuria), Rajbanshi, Bahe, Eastern Bengali (East Central, including Sylhetti), Haijong, Southeastern Bengali (Chakma), Ganda, Vanga, Chittagonian (possible dialect of Southeastern Bengali).
  • Classification: Indo-European, Indo-Iranian, Indo-Aryan, Eastern zone, Bengali-Assamese
  • Script: Bengali

Linguistic notes

Morphology

Bengali marks the plural with several different suffixes, the choice depending on the noun. It also marks several cases, again with a variety of suffixes.

Writing System

Bengali is written in a Brahmi derivative, with ligatures, and with vowel markers located in various positions around the consonant.

Linguistic resources

  • Bhattacharya, Tanmoy. (2001): Bengali. In Facts About the World's Languages: An Encyclopedia of the World's Major Languages: Past and Present, ed. Jane Garry and Carl Rubino. New York / Dublin: H.W. Wilson Press. ISBN 0824209702.
  • Klaiman, M. H. (1987): Bengali. In The World's Major Languages, ed. Bernard Comrie, pp. 490-513. Oxford University Press. ISBN 978-0195065114.


Grammar

  • Milne, W. S. (1993): A Practical Bengali Grammar. Laurier Books Ltd. ISBN 8120608771. 562 pp.
  • Mojumder, Atindra (1973): Bengali Language Historical Grammar. Calcutta: Firma K. L. Mukhopadhyay.
  • Smith, W. L. (1997): Bengali Reference Grammar. Stockholm: Association of International Studies. Stockholm Oriental Textbook Series 1. ISBN 9197085472. 197 pp.

Lexicon

Topical word lists

[Mamandel 16:40, 3 May 2011 (UTC)]

Names
  • Babynology: List of Bengali baby names in Roman transliteration

Phrasebooks

Writing

  • Bagchi, Tista (1996): "Bengali Writing," in William Bright and Peter Daniels (eds.) The World's Writing Systems. New York: Oxford University Press. pp. 399-403. ISBN 0-19-507993-0.

Monographs

  • Bayer, Josef (2001): "Two grammars in one: sentential complements and complementizers in Bengali and other South-Asian languages," in Peri Bhaskararao and Karamuri Venkata Subbarao (eds.) The Yearbook of South Asian Languages and Linguistics: Tokyo Symposium on South-Asian Languages - Contact, Convergence and Typology. New Delhi: Sage Publications.
  • Butt, Miriam (2001): "Case, agreement, pronoun incorporation, and pro-drop in South Asian languages," handout for talk presented at the workshop The Role of Agreement in Argument Structure, August 31-September 1, 2001, Utrecht.
  • Dirdal, Hildegunn: "The acquisition of articles by Bengali learners of English". Handout
  • Fitzpatrick-Cole, Jennifer (1994): The Prosodic Domain Hierarchy in Reduplication. Ph.D. dissertation, Stanford.
  • Fitzpatrick-Cole, Jennifer (1996): "Reduplication meets the phonological phrase in Bengali," Linguistic Review 13.305-356.
  • Ghosh, Sanjukta and Probal Dasgupta: "The role of classifiers in quantification," Handout for talk presented at the 21st South Asian Language Analysis Roundtable, October 7-10, University of Konstanz.
  • Keane, Elinor (2001): Echo Words in Tamil. Unpublished Ph.D. dissertation, Merton College, Oxford. Contains a fairly extensive discussion of echo words in Bengali.
  • Khan, Zeeshan (1994): "Bangla Verb Classes and Alternations," in Douglas A. Jones, Robert C. Berwick, Franklin Cho, Zeeshan Khan, Karen T. Kohl, Naoyuki Nomura, Anand Radhakrishnan, Ulrich Sauerland, and Brian Ulicny, Verb Classes and Alternations in Bangla, German, English, and Korean (AI Memo # 1517). Cambridge, Massachusetts: MIT. pp. 36-50.
  • Lahiri, Aditi and Jennifer Fitzpatrick-Cole (1999): "Emphatic Clitics and Focus Intonation in Bengali," in René Kager & Wim Zonneveld (eds.) Phrasal Phonology. Pp. 119-144. Dordrecht: Foris.

Linguistic portals and bibliographies

Encoding and Fonts

Before the development and general use of Unicode, computer use of Bengali and other South Asian languages required special fonts with one-byte (8-bit) encodings. Many of these fonts were specific to one website or another and used idiosyncratic encodings. To some extent that is still the case, so this page lists some such sites (see News) along with resources for specific fonts and encoding converters. See, for example, eThikana below.

In addition, the Bureau of Indian Standards supports its own ISCII standard (below), which provides an 8-bit encoding using escape sequences to announce the language of the following coded character sequence.

Encodings

Unicode

The Unicode range for Bengali is 0980-09FF.
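As a rough illustration of how this range can be used, the Python sketch below tests whether characters fall in U+0980 to U+09FF and estimates what share of a string is Bengali-script text. The function names are hypothetical, and using the ratio to tell Unicode pages apart from pages that depend on a legacy 8-bit font is only an assumed heuristic, not a method described on this page.

```python
# Bounds of the Unicode Bengali block, as stated above.
BENGALI_START = 0x0980
BENGALI_END = 0x09FF


def is_bengali_char(ch: str) -> bool:
    """Return True if the single character lies in the Bengali block."""
    return BENGALI_START <= ord(ch) <= BENGALI_END


def bengali_ratio(text: str) -> float:
    """Share of non-whitespace characters that lie in the Bengali block."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(is_bengali_char(ch) for ch in visible) / len(visible)
```

Text that is displayed as Bengali but scores near zero here is probably stored in a font-specific encoding and would need conversion before further processing.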

ISCII

See ISCII.

Sutonny

This 8-bit encoding is used by a number of news sites. See Bangla.com: Sutonny under Fonts.

ITRANS and CS/CSX

ITRANS is a transliteration package that is no longer supported but is still available. It provides two input encodings:

  • ITRANS: 7-bit; uses multi-character English codes to represent each Bengali letter
  • CS/CSX: 8-bit; uses one-character codes

The 8-bit encoding, at least, is still used on some sites. See ItxBeng under Fonts.
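Because the 7-bit scheme spells each letter with one or more ASCII characters, a converter has to prefer the longest matching code, so that kh is read as one letter rather than k followed by h. The Python sketch below shows only that greedy longest-match step. The four-entry table is an illustrative subset assumed to follow the usual ITRANS consonant spellings; it is not the package's full table, and vowels, the inherent vowel, and conjuncts are deliberately left out.

```python
# Illustrative subset only (assumed spellings); not the full ITRANS table.
CODES = {
    "k": "\u0995",   # BENGALI LETTER KA
    "kh": "\u0996",  # BENGALI LETTER KHA
    "g": "\u0997",   # BENGALI LETTER GA
    "gh": "\u0998",  # BENGALI LETTER GHA
}
MAX_CODE_LEN = max(len(code) for code in CODES)


def convert(text: str) -> str:
    """Replace known codes with Bengali letters, longest code first."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(MAX_CODE_LEN, 0, -1):
            chunk = text[i:i + length]
            if chunk in CODES:
                out.append(CODES[chunk])
                i += len(chunk)
                break
        else:
            # No code starts here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Trying the longest code first is what keeps gh from being split into g plus an unmapped h.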

Fonts

  • Ekushey. Twelve Unicode fonts.
  • eThikana. All are 8-bit. Grouped by encoding:
    • Sutonny encoding: Oporajita (≠ Aparajita), Sulekha, MahouaMJ.
    • AdarshaBangla
    • AdarshaLipi, Moina
    • Aparajita (≠ Oporajita)
    • Basundhara
    • Boishakhi
    • Ekush, Falgun
    • Progoty
    • SonarGaon
  • ItxBeng is part of the ITRANS transliteration package. The itxbeng.ttf TrueType font uses an 8-bit encoding. It is also available from RabindraSangeet.org.
  • Omicron Lab. Unicode fonts for Bengali.
  • Penn State University. (2009.) Browser and Font Recommendations for Bengali. Also information on Setup for Keyboarding.
  • Rezaul: The site is a portal of sorts; this directory has links to fonts and encoding documentation.
  • South Asia Language Resource Center of the University of Chicago. Links to Bengali fonts (most of them available for free download), input schemes and keyboard layouts, and information about Mac vs. PC vs. Linux rendering issues.

Also:

  • Microsoft. Creating and supporting OpenType fonts for the Bengali script: Microsoft doc on Unicode 3.1 for Bengali. Registered features of the Bengali script are defined and illustrated, encodings are listed, and templates are included for compiling Bengali layout tables for OpenType fonts. This document also presents information about the Bengali OpenType shaping engine of Uniscribe, an operating system component responsible for text layout.

Conversion

  • Unicodify: From Lancaster University, producers of Emille corpus. Includes conversion for AdarshaLipi, AdarshaLipiExp, and AdarshaLipiNormal2 fonts among others. Runs on Windows (source code available).
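To give a sense of what converting a font-specific 8-bit encoding involves, here is a minimal Python sketch. It is not how Unicodify works. It assumes a per-font table from byte values to Unicode characters (the two byte assignments below are made up for illustration) and assumes the legacy text stores left-side vowel signs in visual order, before the consonant, whereas Unicode stores them after it.

```python
# Vowel signs drawn to the left of the consonant: I, E, AI.
PRE_BASE_SIGNS = {"\u09BF", "\u09C7", "\u09C8"}

# Hypothetical byte-to-character table; a real font needs its own full table.
HYPOTHETICAL_TABLE = {
    0x6B: "\u0995",  # made-up assignment: byte 0x6B -> BENGALI LETTER KA
    0x77: "\u09BF",  # made-up assignment: byte 0x77 -> BENGALI VOWEL SIGN I
}


def legacy_to_unicode(data: bytes, table=None) -> str:
    """Map bytes through the table, then move pre-base signs after the next character."""
    table = HYPOTHETICAL_TABLE if table is None else table
    # Unmapped bytes are passed through as the code point of the same value.
    chars = [table.get(byte, chr(byte)) for byte in data]
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PRE_BASE_SIGNS:
            # Visual order (sign, consonant) -> logical order (consonant, sign).
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)
```

The swap handles only a single following character; with conjuncts the sign would have to move past the whole consonant cluster, which this sketch does not attempt.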

Transliteration

Data Sources

Monolingual Text

News and portals

  • See Bangla.com: Sutonny under Fonts.

Blogs

  • Banglablogs: Blog portal. Some of the blogs are in Bengali, some in English.
  • Banglapundit: This blog is open to invited readers only; you might want to contact the blog author and request an invitation.

Parallel Text

Speech

  • EMILLE corpus. Free license for non-profit research use. 442,000 words from radio broadcasts, plus small amounts of demographically sampled speech.
  • BBC.
  • VOA [110422 mam]

Portals

Tools and Other NLP Resources

  • Bangla in GNU/Linux HOWTO: A document on developing Bengali resources for GNU/Linux, and also about setting locales etc. for Bengali.
  • Bhattacharya, Samit, Monojit Choudhury, Sudeshna Sarkar, and Anupam Basu (2005): Inflectional morphology synthesis for Bengali noun, pronoun, and verb systems. National Conference on Computer Processing of Bangla. Independent University, Bangladesh.
  • Dasgupta, Sajib and Vincent Ng (2007). Unsupervised morphological parsing of Bengali. Language Resources and Evaluation 40:3-4, pp. 311-330. Springer. DOI 10.1007/s10579-007-9031-y
  • Faridee, Abu Zaher Md., and Francis M. Tyers. 2009. Development of a morphological analyser for Bengali. In Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, 2-3 November 2009, Universitat d’Alacant, Alacant, Spain; ed. Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Francis M. Tyers; pp. 43-50. PDF, 351KB.
  • Language Technologies Research Centre (LTRC) of the International Institute of Information Technology, Hyderabad. Basic and applied research on various aspects of natural language technology. The focus is on developing technologies in three major areas: Language Access and Machine Translation among English and Indian languages; Speech Processing for Indian languages; Search, Information Extraction and Retrieval for English and Indian languages [Mamandel 17:58, 3 May 2011 (UTC)]
    • Archive of papers and keynote lectures from ICON-2008 through ICON-2010 (6th through 8th International Conference on Natural Language Processing, IIT Kharagpur)
    • Tools: Shallow Parser for Hindi; Hindi WordNet
  • Lekho: Bengali Unicode text editor, runs under UNIX and MS Windows.
  • Part-of-Speech Tagset: Indian Language Part-of-Speech Tagset: Bengali. Kalika Bali, Monojit Choudhury, Priyanka Biswas. 2010. LDC2010T16. A corpus developed by Microsoft Research (MSR) India to support Part-of-Speech (POS) tagging and other data-driven linguistic research on Indian languages in general. It was created as part of the Indian Language Part-of-Speech Tagset (IL-POST) project. [Mamandel 20:41, 1 March 2011 (UTC)]