Sandbox

From the LDC Language Resource Wiki

* [http://hypermedia.ids-mannheim.de/pls/lexpublic/bib_en.ansicht OBELEX: Online Bibliography of Electronic Lexicography]. All relevant articles, monographs, anthologies and reviews since 2000 and some older relevant works. Focus is on online lexicography. Dictionaries not included, but included in a supplementary database now under construction. Search by full text, keyword, person, analysed languages, or publication year. {{si|[[User:Mamandel|Mamandel]] 22:26, 28 April 2010 (UTC)}}

Revision as of 21:19, 10 May 2011

The Sandbox is a place to play. Use this page for practicing wiki editing, making links, anything! Don't expect anything you put here to last.

  • Learn how to manipulate the Wiki.
    • What Can I Do?
      • I can make things bold ('''bold''').
      • I can italicize (''italicize'').
      • I can timestamp and sign: Mamandel 14:52, 22 April 2010 (UTC) (four tildes: ~~~~)
        • or just timestamp: 14:52, 22 April 2010 (UTC) (five tildes: ~~~~~)
        • or just sign: Mamandel (three tildes: ~~~)
      • I can make an external link ([http://ldc.upenn.edu external link] -- space between URL and text).
      • I can make an internal link ([[Bengali/Bengali|internal link]] -- pipe character '|' between page title and text).
      • I can make text preformatted and in a box (note: no auto-wrapping) by putting white space at the beginning of the line.

Some magic words and what they produce:

For much, much more info see MediaWiki's editing help.

FEEL FREE TO DELETE ANYTHING BELOW THE DOUBLE LINE,
BUT DON'T TOUCH THE DOUBLE LINE OR ANYTHING ABOVE IT. THANKS.
-- The Mgt.

WELCOME TO THE SANDBOX



THIS PAGE IS UNDER CONSTRUCTION


[Mamandel 20:19, 10 May 2011 (UTC)]

This page is for language-independent resources for computational natural language processing.
Language-independent General Meta-resources that are not specific to NLP have their own page.

Software

  • An Crúbadán: Corpus building for minority languages. Web crawling software designed to exploit the vast quantities of text freely available on the web as a way of bringing the benefits of statistical NLP to languages with small numbers of speakers and/or limited computational resources. Kevin P. Scannell. [Mamandel 00:25, 14 May 2010 (UTC)]
  • Apertium. A free/open-source rule-based machine translation platform offering free linguistic data (morphological analysers, bilingual dictionaries, etc.) in XML formats for a range of languages.
  • Foma: a finite-state compiler and library. Hulden, Mans. 2009. Proceedings of the EACL 2009 Demonstrations Session, pages 29–32, Athens, Greece, 3 April 2009. PDF
  • Universal Networking Language (UNL). An artificial language for representing, describing, summarizing, refining, storing and disseminating information in a natural-language-independent format. It is a kind of mark-up language which represents not the formatting but the core information of a text. Just as HTML annotations can be realized differently in the context of different applications, machines, displays, etc., UNL expressions can have different realizations in different human languages.
  • VISL Constraint Grammar. A free/open-source software reimplementation and extension of Fred Karlsson's Constraint Grammar formalism.


NLP Literature

  • Machine Translation Archive. Electronic repository and bibliography of articles, books and papers on topics in machine translation, computer translation systems, and computer-based translation tools. More than 6,400 items. Aims to be comprehensive on English-language publications since 1990; earlier papers and books are being added to provide partial coverage from the 1950s.
  • Corpus-linguistic applications, ed. Stefan Th. Gries, Stefanie Wulff, and Mark Davies. 2010. Electronic: ISBN 9789042028012; hardback: ISBN 9789042028005.
    • Probabilistic tagging of minority language data: a case study using Qtag. Christopher Cox. Reviewed in LINGUIST List 21.3318 by Andrew Caines (2010-08-17): "Cox ... considers the tagging process, and evaluates the time-accuracy trade-off in using (a) normalized/unnormalized orthography; (b) various chunk sizes for rounds of iterative, interactive tagging; (c) tagset size. He does so in the context of corpus building for minority languages which are on the whole associated with more modest resources than major language projects. ... a well-written paper with well-defined research questions and conclusions which are explicitly linked back to them -- an attribute which cannot be taken for granted in academic literature."