Main Page/description of wiki

From the LDC Language Resource Wiki

About the author

Mark Mandel received his PhD in Linguistics from the University of California at Berkeley in 1981; his dissertation was titled Phonotactics and Morphophonology in American Sign Language. From 1990 through 2001 he was Senior Linguist at Dragon Systems, and briefly at Lernout & Hauspie. Since 2002 he has been a Research Administrator at the Linguistic Data Consortium of the University of Pennsylvania.

Revision as of 19:54, 31 October 2013

This description of the wiki was set up as a recruiting letter for editors, last modified Sept. 23, 2012. It was created as a Word document with hyperlinks, which are mostly to lost pages.


The LDC's Language Resource Wiki

Overview

The Linguistic Data Consortium of the University of Pennsylvania has established a Language Resource Wiki, a structured list of language resources for the research community, concentrating on languages that are underserved in terms of research resources, including NLP. We are seeding it from our own lists of found resources, but we hope that other researchers will enlarge it, both in breadth of languages covered and in depth of detail. The wiki also has pages for special areas of interest, such as language-independent NLP.

The wiki is intended as a listing of long-term resources (a few of which are available through our catalog), rather than a calendar of events.

Unlike Wikipedia and many similar wikis, the LRW is publicly readable but writable only by approved editors. We are actively seeking editors who are knowledgeable about the languages and topics we cover, as well as about other languages or topics that could usefully be added. If you are interested in becoming an editor, please write to lrwiki@ldc.upenn.edu, with "lrwiki" in the subject line.

To date we have entries for the following languages. Each oral language has a page and a category; sign languages, for which we have far fewer resources, are currently listed on a single page, but will be moved to individual pages as listings increase. The symbol ed after a language name means we have an editor for that language.

Oral Languages

  • Ancient Greek ed
  • Bengali
  • Berber
  • Breton ed
  • Ewe
  • Indonesian ed
  • Latin ed
  • Panjabi
  • Pashto
  • Tagalog
  • Tamil
  • Urdu

Sign Languages

  • American Sign Language (ASL) ed
  • British Sign Language (BSL)
  • Catalan Sign Language (LSC)
  • Dutch Sign Language (NGT)
  • Flemish Sign Language (VGT)
  • German Sign Language (DGS)
  • Japanese Sign Language (JSL)
  • New Zealand Sign Language (NZSL)
  • Polish Sign Language (PJM) ed
  • Spanish Sign Language (LSE) ed
  • Swiss German Sign Language (DSGS)


Language-independent resources

  • General Meta-resources (Resource organizations, Multilingual resources, and Metadata standards and infrastructure)
  • NLP Resources (Software, and NLP literature)
  • Resources specific to the signed medium (General, Areal, and NLP resources)

Resource types

The resources we list fall into the following general categories, with some variation between languages and some overlap (resources listed in more than one category):

  1. General: Basic information, usually based on the Ethnologue catalog; the writing system, dialects, and other linguistic notes of interest.
  2. Linguistic resources: Overviews, grammars, lexicons, monographs, and linguistic portals and bibliographies.
  3. Encoding and Fonts: The Unicode range for the language's writing system(s), and fonts that support it. In some cases, also non-Unicode encodings still in use; see, e.g., Tamil Encoding and Fonts and Input in older encodings.
  4. Data sources: Usually the largest section; includes monolingual text, parallel text, speech, and video.
  5. Portals to materials in the language, targeted at its speakers or at second-language learners or teachers. These can overlap with data sources, especially the monolingual text section.
  6. Tools and other NLP Resources specific to this language or group of languages.
  7. Miscellaneous.