Breton/Breton

From the LDC Language Resource Wiki

(Difference between revisions)
(50 intermediate revisions not shown)
Line 1: Line 1:
-
[[Main_Page|Home]] > [[LANGUAGE]]
+
{{Under construction}}
-
 
+
[[Main_Page|Home]] > [[Breton]]
<center><font size=7>BREZHONEG</font>
<center><font size=7>BREZHONEG</font>
Line 6: Line 6:
<font size=7>BRETON</font></center>
<font size=7>BRETON</font></center>
-
[[Category:Lwiki:Use|Breton]]
 
-
== <font color=red>NOTES FOR EDITING ==
+
 
-
*DELETE <u>Lwiki:</u> PREFIX FROM ALL WIKILINKS BEFORE PUBLISHING ON LRWIKI
+
-
*Remember to check [[:Category:Lwiki:Resources|Resource pages]]
+
-
*This icon [[Image:RedRx.gif]] indicates a ''general resource'' that may or may not contain useful material for this language in this category. Wherever you see this icon, check the resource listed in that entry and ''replace'' the entry with whatever is appropriate, or remove it entirely. The icon should never appear on a published page.
+
-
*Delete all notes in red type, including this section of "NOTES FOR EDITING". </font>
+
==General==
==General==
 +
<small>[[User:Ftyers|Ftyers]] 15:39, 22 April 2010 (UTC)</small>
===Language summary===
===Language summary===
-
[[Image:redRx.gif]] (Information from [http://www.ethnologue.com/show_language.asp?code=bre Ethnologue],
+
 
-
<!-- THIS AWKWARD USE OF NOWIKI SHOWS UP IN THE EDIT OF THE CREATED LANGUAGE PAGE, BUT NOT ON THE VISIBLE PAGE, AND IT DOES PREVENT PREMATURE SUBSTITUTION -->
+
 
-
<nowiki>2010</nowiki>-<nowiki>04</nowiki>-<nowiki>7</nowiki>
+
* ISO 639-3 code: bre
* ISO 639-3 code: bre
*Population:  
*Population:  
-
*Also spoken in:  
+
** 500,000 in France (1989 International Committee for the Defense of the Breton Language).
-
*Alternate names:  
+
** 1,200,000 know Breton who do not regularly use it.
-
*Dialects:  
+
** Population total all countries: 500,045.
-
*Classification: [http://www.ethnologue.com/show_lang_family.asp?code=bre ...]
+
*Also spoken in: -
 +
*Alternate names: -
 +
*Dialects: Leoneg (Leonais), Tregerieg (Tregorrois), Gwenedeg (Vannetais), Kerneveg (Cornouaillais).
 +
*Classification: [http://www.ethnologue.com/show_lang_family.asp?code=bre Indo-European, Celtic, Insular, Brythonic]
===Linguistic notes===
===Linguistic notes===
====Writing====
====Writing====
-
* [[Image:redRx.gif]] [http://www.omniglot.com/search.htm Omniglot]. Search '''www.omniglot.com''', not Web.
+
* [http://www.omniglot.com/writing/breton.htm Omniglot: Breton]
==Linguistic resources==
==Linguistic resources==
Line 39: Line 37:
===Grammar===
===Grammar===
 +
 +
* Ian Press (1986) ''A Grammar of Modern Breton'' (Mouton Grammar Library) ISBN 978-3-110105-79-7
 +
* Roparz Hemon (translated by Michael Everson) (2007) ''Breton Grammar''  (Cathair na Mart: Evertype) ISBN 978-1-904808-11-4
===Lexicon===
===Lexicon===
 +
* [http://br.wiktionary.org/ Wiktionary]. Monolingual. 12,151 entries. {{CC-BY-SA}},{{GFDL}} {{si|[[User:Mamandel|Mamandel]] 19:35, 3 May 2010 (UTC)}}
 +
====Morphological====
-
<font color=red mam>''[Dictionaries, word and phrase lists, and translation tools]''</font mam>
+
====Bilingual====
-
* [[Image:redRx.gif]] [http://meta.wikimedia.org/wiki/Wiktionary#List_of_Wiktionaries Wiktionary]; click on the "Wiki" link in the table. Monolingual. Generally (always?) Unicode.
+
* [http://meskach.free.fr/arbo/dico/tomaz.html Le Geriadur Tomaz (Breton--French)] (~33,000 entries) {{GPL}}
-
** Generally at <nowiki>http://<lg>.wiktionary.org</nowiki>. (See [http://meta.wikimedia.org/wiki/Special:SiteMatrix Wikimedia wikis] for language codes.)
+
 
 +
====Multilingual====
 +
 
 +
* [http://br.wiktionary.org/wiki/Rummad:Brezhoneg Breton Wiktionary :: Rummad:Brezhoneg] {{GFDL}}
 +
* [http://fr.wiktionary.org/wiki/Catégorie:breton French Wiktionary :: Catégorie:breton] {{GFDL}}
 +
* [http://en.wiktionary.org/wiki/Category:Breton_language English Wiktionary :: Category:Breton] {{GFDL}}
====Topical word lists====
====Topical word lists====
=====Names=====
=====Names=====
-
*  [[Image:redRx.gif]] [http://www.babynology.com/Breton_babynames.html Babynology]: List of Breton baby names in Roman transliteration
 
===Monographs===
===Monographs===
Line 56: Line 63:
===Linguistic portals and bibliographies===
===Linguistic portals and bibliographies===
-
*  [[Image:redRx.gif]] LINGUIST List resource pages
+
==Data Sources==
-
** (GET LINGUIST PAGE FOR Breton
+
-
*  SIL Bibliography
+
-
** [http://www.ethnologue.com/show_language.asp?code=bre Breton]
+
-
==Encoding and Fonts==
+
===Monolingual Text===
-
<font color=red mam>Before the development and general use of Unicode, computer use of Breton and other South Asian languages required special fonts using only one byte. Many of these fonts were specific to one website or another and used idiosyncratic encodings. To some extent that is still the case; and so this page includes some such sites (see [[#News|News]]), and some resources for specific fonts and encoding converters.</font mam>
+
-
===Encodings===
+
* [http://br.wikipedia.org/ Wikipedia]. Monolingual. 33,174 entries. {{CC-BY-SA}},{{GFDL}} {{si|[[User:Mamandel|Mamandel]] 19:39, 3 May 2010 (UTC)}}
-
====Unicode====
+
-
[[Image:redRx.gif]] The Unicode range for Breton is [http://www.unicode.org/charts/PDF/U____.pdf ____-____].
+
-
*[[Image:redRx.gif]]  [http://tlt.its.psu.edu/suggestions/international/bylanguage/Breton.html Penn State info page]; [http://tlt.psu.edu/suggestions/international/bylanguage/Bretonchart.html Penn State chart of Unicode Entity Codes for ____] (including OS X and Windows keyboard entry)
+
====News====
-
*[[Image:redRx.gif]]  <font color=red mam>[http://www.exnet.btinternet.co.uk/ Exnet], Andy White. "This site hosts documents relating to the encoding of Indic scripts. Most documents contain a bias towards the Bengali script (due to my own preferances)." (Last updated 10th March 2003)</font mam>
+
-
====ISCII====
+
* [http://www.agencebretagnepresse.com/index.php?langue=bzh Agence Bretagne Presse]
-
See [[Lwiki:ISCII]]. <font color=red>FOR SOUTH ASIAN LANGUAGES ONLY</font>
+
* [http://bremaik.free.fr/ Bremaik] (weekly news articles) {{GPL}}
-
===Fonts===
+
====Blogs====
-
<font color=red mam>
+
-
'''Lists of Unicode fonts'''
+
-
*[[Image:redRx.gif]]  [http://www.alanwood.net/unicode Unicode and Multilingual Support in HTML, Fonts, Web Browsers and Other Applications]. In Alan Wood’s Unicode Resources.
+
-
** <nowiki>http://www.alanwood.net/unicode/fonts_windows#Breton.html</nowiki>
+
-
**  ?? Search for ''Breton''.??
+
-
*[[Image:redRx.gif]]  [http://www.wazu.jp/ Wazu Japan's Gallery of Unicode fonts], and ??[http://www.wazu.jp/gallery/Test_Breton.html test page]??
+
-
** [http://www.wazu.jp/#test_pages Unicode test pages]
+
 +
===Parallel Text===
-
*[[Image:redRx.gif]]  The South Asia Language Resource Center of the University of Chicago has links to
+
* [http://elx.dlsi.ua.es/~fran/brfr_OAB_corpus Ofis ar Brezhoneg Aligned Corpus of Breton--French] (30,993 aligned sentences, [[NLP Resources#TMX|TMX]] and plain text format) {{GPL}}
-
** [http://salrc.uchicago.edu/resources/fonts/available/___/#fonts Breton fonts], most of them available for free download
+
-
** [http://salrc.uchicago.edu/resources/fonts/available/___/#layouts Input Schemes and Keyboard Layouts]
+
-
** [http://salrc.uchicago.edu/resources/fonts/available/___/#mac information about Mac vs. PC vs. Linux rendering issues]
+
-
</font mam>
+
 +
===Speech===
-
====Conversion====
+
===Video===
-
*[[Image:redRx.gif]]  [http://www.lancs.ac.uk/staff/hardiea/unicodify.htm  Unicodify]: From Lancaster University, producers of the Emille corpus. For Windows; source code available.
+
-
**<font color=red mam>MOSTLY FOR SOUTH ASIAN LANGUAGES-- CHECK BEFORE USING</font> "a suite of programs for converting text in a variety of 8-bit encodings to Unicode (using the UTF-16 encoding).<br> Unicodify was particularly designed to handle HTML-based text using non-ISCII 8-bit fonts to render South Asian scripts. However, elements of the suite can map other types of non-ASCII 8-bit encodings, such as Latin-2, ISCII and PASCII."
+
-
====Transliteration====
 
-
==Data Sources==
+
===IPR notes===
-
===Monolingual Text===
+
==Portals==
-
*[[Image:redRx.gif]]  <font color=red mam>'''''EMILLE ONLY FOR: Bengali, Panjabi, Tamil, and Urdu'''''</font mam> [http://www.ling.lancs.ac.uk/corplang/emille/ EMILLE] corpus. Approximately [[Lwiki:EMILLE corpus|'''NUMBERS HERE''']] words. Free license for non-profit research use. [http://www.emille.lancs.ac.uk/manual.pdf Documentation]
+
-
====News====
+
==Tools and Other NLP Resources==
-
* Newspaper portals:
+
-
**[[Image:redRx.gif]]  <font color=red>[http://www.indiapress.org/ India Press]: many South Asian languages</font>
+
-
* newspaper...
+
-
====Blogs====
+
===Morphological analysis===
-
===Parallel Text===
+
===Morphological disambiguation===
-
*[[Image:redRx.gif]]  [http://www.ling.lancs.ac.uk/corplang/emille/ EMILLE] corpus. <font color=red mam>'''ONLY FOR: Bengali, Panjabi, Tamil, and Urdu'''</font mam> 200,000 words of text in English (information leaflets from the UK Government and various local authorities) with Breton translation. Free license for non-profit research use.
+
-
*[[Image:redRx.gif]]  <font color=red mam> [http://www.multikulti.org.uk/__/ MultiKulti]  '''Langs listed: Albanian, Arabic, Bengali, Chinese, English, Farsi, French, Gujarati, Somali, Spanish, Portuguese, Turkish, Urdu.<br>PROB. SAME AS EMILLE BUT NOT ALL UNICODE. DON'T USE FOR EMILLE LANGUAGES (Bengali, Panjabi, Tamil, and Urdu).''' : 200k words from UK government leaflets (not news). Free for research, see license. However, some of the files are in PDF and present encoding problems when the text is copied.
+
-
** In general, a document with '''/__/''' in its pathname will have an English counterpart with '''/en/'''.
+
-
**pamphlets (PDF), e.g. http://www.multikulti.org.uk/__/education/welcome-to-your-library/public-libraries.pdf
+
-
** The [http://www.multikulti.org.uk/__/index.html Breton directory] lists directories that contain Breton pages, though not all of the pages are in Breton.
+
-
** The [http://www.multikulti.org.uk/__/racism-discrimination/index.html Breton racial discrimination directory] contains about a dozen pp. in Breton.</font mam>
+
-
===Speech===
+
===Machine translation===
 +
* [http://sourceforge.net/projects/apertium/files/ Apertium :: apertium-br-fr] {{GPL}}
-
===Video===
+
===Articles===
 +
* Tyers, F. M. (2010) "[http://www.mt-archive.info/EAMT-2010-Tyers.pdf Rule-based Breton to French machine translation]". ''Proceedings of the 14th Annual Conference of the European Association of Machine Translation, EAMT10'', pp. 174&ndash;181.
 +
* Tyers, F. M. (2009) "[http://www.mt-archive.info/EAMT-2009-Tyers-2.pdf Rule-based augmentation of training data in Breton–French statistical machine translation]". ''Proceedings of the 13th Annual Conference of the European Association of Machine Translation, EAMT09'', pp. 213&ndash;218.
-
===IPR notes===
+
==Miscellaneous==
-
==Portals==
 
-
*[[Image:redRx.gif]]  [http://www.oneindia.in/ OneIndia]. Hindi, Kannada, Malayalam, Tamil, Telugu, each at '''<nowiki>http://thats<language>.oneindia.in/</nowiki>'''
 
-
*[[Image:redRx.gif]]  <font color=red>SOUTH ASIAN LANGUAGES</font> [http://in.yahoo.com/ Yahoo! India]. Mostly http://in.Breton.yahoo.com/  (with Breton all lowercase):
 
-
** [http://in.jagran.yahoo.com Hindi ("jagran")]
 
-
** [http://in.tamil.yahoo.com Tamil]
 
-
** [http://in.gujarati.yahoo.com Gujarati]
 
-
** [http://in.kannada.yahoo.com Kannada]
 
-
** [http://in.malayalam.yahoo.com Malayalam]
 
-
** [http://in.telugu.yahoo.com Telugu]
 
-
** [http://in.punjabi.yahoo.com Punjabi]
 
-
==Tools and Other NLP Resources==
 
-
==Miscellaneous==
+
 
 +
[[Category:Breton|Breton]]

Latest revision as of 19:00, 3 May 2011

THIS PAGE IS

UNDER CONSTRUCTION



BREZHONEG


BRETON





General

Ftyers 15:39, 22 April 2010 (UTC)

Language summary

  • ISO 639-3 code: bre
  • Population:
    • 500,000 in France (1989 International Committee for the Defense of the Breton Language).
    • 1,200,000 know Breton who do not regularly use it.
    • Population total all countries: 500,045.
  • Also spoken in: -
  • Alternate names: -
  • Dialects: Leoneg (Leonais), Tregerieg (Tregorrois), Gwenedeg (Vannetais), Kerneveg (Cornouaillais).
  • Classification: Indo-European, Celtic, Insular, Brythonic

Linguistic notes

Writing

Linguistic resources

Overview

Grammar

Lexicon

Morphological

Bilingual

Multilingual

Topical word lists

Names

Monographs

Linguistic portals and bibliographies

Data Sources

Monolingual Text

News

Blogs

Parallel Text

Speech

Video

IPR notes

Portals

Tools and Other NLP Resources

Morphological analysis

Morphological disambiguation

Machine translation

Articles

Miscellaneous
