Main Page/description of wiki

From the LDC Language Resource Wiki

Revision as of 19:55, 31 October 2013 by Mamandel

This description of the wiki was written as a recruiting letter for editors and was last modified Sept. 23, 2012. It was created as a Word document with hyperlinks, most of which now point to pages that no longer exist.


The LDC's Language Resource Wiki

Overview

The Linguistic Data Consortium of the University of Pennsylvania has established a Language Resource Wiki, a structured list of language resources for the research community, concentrating on languages that are underserved in terms of research resources, including NLP. We are seeding it from our own lists of found resources, but we hope that other researchers will enlarge it, both in breadth of languages covered and in depth of detail. The wiki also has pages for special areas of interest, such as language-independent NLP.

The wiki is intended as a listing of long-term resources (a few of which are available through our catalog), rather than a calendar of events.

Unlike Wikipedia and many similar wikis, the LRW is publicly readable but writable only by approved editors. We are actively seeking editors who are knowledgeable about the languages and topics we cover, as well as about other languages or topics that could usefully be added. If you are interested in becoming an editor, please write to lrwiki@ldc.upenn.edu, with "lrwiki" in the subject line.

To date we have entries for the following languages. Each oral language has a page and a category; sign languages, for which we have far fewer resources, are currently listed on a single page, but will be moved to individual pages as listings increase. The symbol ed means we have an editor.

Oral Languages

  • Ancient Greek ed
  • Bengali
  • Berber
  • Breton ed
  • Ewe
  • Indonesian ed
  • Latin ed
  • Panjabi
  • Pashto
  • Tagalog
  • Tamil
  • Urdu

Sign Languages

  • American Sign Language (ASL) ed
  • British Sign Language (BSL)
  • Catalan Sign Language (LSC)
  • Dutch Sign Language (NGT)
  • Flemish Sign Language (VGT)
  • German Sign Language (DGS)
  • Japanese Sign Language (JSL)
  • New Zealand Sign Language (NZSL)
  • Polish Sign Language (PJM) ed
  • Spanish Sign Language (LSE) ed
  • Swiss German Sign Language (DSGS)


Language-independent resources

  • General Meta-resources (Resource organizations, Multilingual resources, and Metadata standards and infrastructure)
  • NLP Resources (Software, and NLP literature)
  • Resources specific to the signed medium (General, Areal, and NLP resources)


Resource types

The resources we list fall into the following general categories, with some variation between languages and some overlap (resources listed in more than one category):

  1. General: Basic information, usually based on the Ethnologue catalog; the writing system, dialects, and other linguistic notes of interest.
  2. Linguistic resources: Overviews, grammars, lexicons, monographs, and linguistic portals and bibliographies.
  3. Encoding and Fonts: The Unicode range for the language's writing system(s), and fonts that support it. In some cases, also non-Unicode encodings still in use; see, e.g., Tamil Encoding and Fonts and Input in older encodings.
  4. Data sources: Usually the largest section; includes monolingual text, parallel text, speech, and video.
  5. Portals: Gateways to materials in the language, targeted at its speakers or at second-language learners or teachers. These can overlap with data sources, especially the monolingual text section.
  6. Tools and other NLP Resources specific to this language or group of languages.
  7. Miscellaneous.