Breton/Breton
From the LDC Language Resource Wiki
NOTES FOR EDITING
- DELETE Lwiki: PREFIX FROM ALL WIKILINKS BEFORE PUBLISHING ON LRWIKI
- Remember to check Resource pages
- This icon indicates a general resource that may or may not contain useful material for this language in this category. Wherever you see this icon, check the resource listed in that entry and replace the entry with whatever is appropriate, or remove it entirely. The icon should never appear on a published page.
- Delete all notes in red type, including this section of "NOTES FOR EDITING".
General
Language summary
(Information from Ethnologue, 2010-04-07)
- ISO 639-3 code: bre
- Population:
- Also spoken in:
- Alternate names:
- Dialects:
- Classification: ...
Linguistic notes
Writing
- Omniglot. Use the site search for www.omniglot.com, not a general Web search.
Linguistic resources
Overview
Grammar
Lexicon
[Dictionaries, word and phrase lists, and translation tools]
- Wiktionary; click on the "Wiki" link in the table. Monolingual. Generally (always?) Unicode.
- Generally at http://<lg>.wiktionary.org. (See Wikimedia wikis for language codes.)
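The Wiktionary URL pattern above can be expressed as a small helper. This is a minimal sketch: the function name `wiktionary_url` is hypothetical, and it assumes the caller already knows the Wikimedia language code. For Breton that code is `br` (as in br.wikipedia.org), not the ISO 639-3 code `bre` listed in the language summary.

```python
import re


def wiktionary_url(wikimedia_code: str) -> str:
    """Return the base URL of the Wiktionary edition for a Wikimedia language code.

    Hypothetical helper illustrating the http://<lg>.wiktionary.org pattern.
    """
    code = wikimedia_code.strip().lower()
    # Assumption: codes are lowercase letters, optionally with hyphenated parts.
    if not re.fullmatch(r"[a-z]+(?:-[a-z]+)*", code):
        raise ValueError(f"not a plausible Wikimedia language code: {wikimedia_code!r}")
    return f"http://{code}.wiktionary.org"


print(wiktionary_url("br"))  # http://br.wiktionary.org
```

The check rejects obviously malformed input; it does not confirm that a Wiktionary edition actually exists for the code.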
Topical word lists
Names
- Babynology: List of Breton baby names in Roman transliteration
Monographs
Linguistic portals and bibliographies
Encoding and Fonts
Before the development and general use of Unicode, computer use of Breton and many other languages depended on single-byte (8-bit) fonts and encodings. Many of these fonts were specific to one website or another and used idiosyncratic encodings. To some extent that is still the case, so this page lists some such sites (see News) as well as resources for specific fonts and encoding converters.
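Why idiosyncratic single-byte encodings cause trouble can be seen in a few lines: the same byte means different characters depending on which encoding the reader assumes. A minimal sketch, assuming the legacy text was stored as ISO-8859-1 (Latin-1); the sample word `bremañ` and the Latin-2 misreading are illustrative choices, not taken from any particular site.

```python
# Assumption: the legacy file was written in ISO-8859-1 (Latin-1).
legacy = "bremañ".encode("latin-1")  # 'ñ' is stored as the single byte 0xF1

as_latin1 = legacy.decode("latin-1")    # correct guess of the encoding
as_latin2 = legacy.decode("iso8859_2")  # wrong guess: 0xF1 is 'ń' in Latin-2

print(legacy)     # b'brema\xf1'
print(as_latin1)  # bremañ
print(as_latin2)  # bremań
```

Nothing in the bytes themselves records which encoding was intended, which is why converters need to be told the source encoding (or font) explicitly.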
Encodings
Unicode
The Unicode range for Breton is ____-____.
- Penn State info page; Penn State chart of Unicode Entity Codes for ____ (including OS X and Windows keyboard entry)
- Exnet, Andy White. "This site hosts documents relating to the encoding of Indic scripts. Most documents contain a bias towards the Bengali script (due to my own preferances)." (Last updated 10th March 2003)
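Entries like the Penn State chart list code points and HTML entity codes per character. The same information can be generated with the Python standard library. This is a sketch: `entity_codes` is a hypothetical helper, and the sample word is illustrative only (it is not meant to fill in the Unicode range above).

```python
import html.entities


def entity_codes(ch: str) -> dict:
    """Code point and HTML entity forms for a single character."""
    cp = ord(ch)
    # Named entity from HTML 4.01's list; None if the character has no name there.
    name = html.entities.codepoint2name.get(cp)
    return {
        "code_point": f"U+{cp:04X}",
        "decimal": f"&#{cp};",
        "hex": f"&#x{cp:X};",
        "named": f"&{name};" if name else None,
    }


# Report only the non-ASCII characters of a sample Breton word.
sample = "bremañ"
for ch in sample:
    if ord(ch) > 0x7F:
        print(ch, entity_codes(ch))
```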
ISCII
See Lwiki:ISCII. FOR SOUTH ASIAN LANGUAGES ONLY
Fonts
Lists of Unicode fonts
- Unicode and Multilingual Support in HTML, Fonts, Web Browsers and Other Applications. In Alan Wood’s Unicode Resources.
- http://www.alanwood.net/unicode/fonts_windows.html#Breton
- ?? Search for Breton.??
- Wazu Japan's Gallery of Unicode fonts, and ??test page??
- The South Asia Language Resource Center of the University of Chicago has links to
- Breton fonts, most of them available for free download
- Input Schemes and Keyboard Layouts
- information about Mac vs. PC vs. Linux rendering issues
Conversion
- Unicodify: From Lancaster University, producers of the Emille corpus. For Windows; source code available.
- MOSTLY FOR SOUTH ASIAN LANGUAGES-- CHECK BEFORE USING "a suite of programs for converting text in a variety of 8-bit encodings to Unicode (using the UTF-16 encoding).
Unicodify was particularly designed to handle HTML-based text using non-ISCII 8-bit fonts to render South Asian scripts. However, elements of the suite can map other types of non-ASCII 8-bit encodings, such as Latin-2, ISCII and PASCII."
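The core step Unicodify describes (8-bit text in, UTF-16 out) can be sketched in Python. This is not Unicodify's interface; `to_utf16` is a hypothetical helper, and it assumes the caller knows the source encoding. Unicodify's font-specific mappings for non-ISCII fonts are not reproduced here.

```python
def to_utf16(data: bytes, source_encoding: str) -> bytes:
    """Convert 8-bit encoded text to UTF-16 (with a byte-order mark).

    source_encoding is chosen by the caller (e.g. "latin-1", "iso8859_2");
    nothing is auto-detected. Decoding is strict, so bytes the chosen codec
    leaves undefined (e.g. 0x81 in "cp1252") raise UnicodeDecodeError.
    """
    text = data.decode(source_encoding)
    return text.encode("utf-16")


utf16 = to_utf16(b"brema\xf1", "latin-1")
print(utf16.decode("utf-16"))  # bremañ
```

Failing loudly on undefined bytes is deliberate: a silent replacement character would hide a wrong encoding guess.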
Transliteration
Data Sources
Monolingual Text
- EMILLE ONLY FOR: Bengali, Panjabi, Tamil, and Urdu EMILLE corpus. Approximately NUMBERS HERE words. Free license for non-profit research use. Documentation
News
- Newspaper portals:
- India Press: many South Asian languages
- newspaper...
Blogs
Parallel Text
- EMILLE corpus. ONLY FOR: Bengali, Panjabi, Tamil, and Urdu 200,000 words of text in English (information leaflets from the UK Government and various local authorities) with Breton translation. Free license for non-profit research use.
- MultiKulti Langs listed: Albanian, Arabic, Bengali, Chinese, English, Farsi, French, Gujarati, Somali, Spanish, Portuguese, Turkish, Urdu.
PROB. SAME AS EMILLE BUT NOT ALL UNICODE. DON'T USE FOR EMILLE LANGUAGES (Bengali, Panjabi, Tamil, and Urdu). 200k words from UK government leaflets (not news). Free for research; see license. However, some of the files are in PDF and present encoding problems when the text is copied.
- In general, a document with /__/ in its pathname will have an English counterpart with /en/.
- pamphlets (PDF), e.g. http://www.multikulti.org.uk/__/education/welcome-to-your-library/public-libraries.pdf
- The Breton directory lists directories that contain Breton pages, though not all of the pages are in Breton.
- The Breton racial discrimination directory contains about a dozen pp. in Breton.
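The MultiKulti naming convention (a language segment in the path, with the English version under /en/) makes it possible to pair documents automatically. A minimal sketch: `english_counterpart` is hypothetical, the segment `xx` is a placeholder because the actual segment for Breton is not given on this page, and the rule is only "in general" true, so each pair should still be checked.

```python
from urllib.parse import urlsplit, urlunsplit


def english_counterpart(url: str, lang_segment: str) -> str:
    """Swap the first path segment `lang_segment` for `en` (hypothetical helper)."""
    parts = urlsplit(url)
    segments = parts.path.split("/")  # path starts with "/", so segments[0] == ""
    if len(segments) < 2 or segments[1] != lang_segment:
        raise ValueError(f"expected /{lang_segment}/ as the first path segment: {url}")
    segments[1] = "en"
    return urlunsplit(parts._replace(path="/".join(segments)))


# "xx" is a placeholder language segment, not a real MultiKulti directory.
print(english_counterpart(
    "http://www.multikulti.org.uk/xx/education/welcome-to-your-library/public-libraries.pdf",
    "xx",
))
```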
Speech
Video
IPR notes
Portals
- OneIndia. Hindi, Kannada, Malayalam, Tamil, Telugu, each at http://thats<language>.oneindia.in/
- SOUTH ASIAN LANGUAGES Yahoo! India. Mostly http://in.Breton.yahoo.com/ (with Breton all lowercase):