NLP Resources

(Difference between revisions)

Revision as of 14:05, 22 May 2011

THIS PAGE IS UNDER CONSTRUCTION

[Mamandel 22:40, 10 May 2011 (UTC)]

An Crúbadán: Corpus building for minority languages. Web crawling software “designed to exploit the vast quantities of text freely available on the web as a way of bringing the benefits of statistical NLP to languages with small numbers of speakers and/or limited computational resources.” Kevin P. Scannell. [Mamandel 00:25, 14 May 2010 (UTC)]

Apertium. A free/open-source rule-based machine translation platform offering free linguistic data (morphological analysers, bilingual dictionaries, etc.) in XML formats for a range of languages.

Foma. “a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological analyzers.”
- Foma: a finite-state compiler and library. Hulden, Mans. 2009. Proceedings of the EACL 2009 Demonstrations Session, pages 29–32, Athens, Greece, 3 April 2009.

Helsinki Finite-State Transducer Technology (HFST). A free/open-source rewrite of the Xerox finite-state tools. It implements both the lexc and twolc formalisms.

Universal Networking Language (UNL). “an artificial language for representing, describing, summarizing, refining, storing and disseminating information in a natural-language-independent format. It is a kind of mark-up language which represents not the formatting but the core information of a text. As HTML annotations can be realized differently in the context of different applications, machines, displays, etc., so UNL expressions can have different realizations in different human languages.”

VISL Constraint Grammar. A free/open-source software reimplementation and extension of Fred Karlsson's Constraint Grammar formalism.

ISLE Meta Data Initiative (IMDI): “a proposed metadata standard to describe multi-media and multi-modal language resources. The standard provides interoperability for browsable and searchable corpus structures and resource descriptions with help of specific tools.... The web-based Browsable Corpus at the Max Planck Institute for Psycholinguistics allows you to browse through IMDI corpora and search for language resources.” [Mamandel 14:05, 22 May 2011 (UTC)]

OLAC: Open Language Archives Community: “an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources.” [Mamandel 14:01, 22 May 2011 (UTC)]

Machine Translation Archive. “Electronic repository and bibliography of articles, books and papers on topics in machine translation, computer translation systems, and computer-based translation tools. Latest update: 30 April 2011 [now containing over 7700 items]” [2011-05-10]
“aims to cover comprehensively English-language publications since 1990. Papers and books from previous years are being added in order to provide good coverage from the beginnings of MT in the 1950s to 1990.”

Probabilistic tagging of minority language data: a case study using Qtag. Christopher Cox. 2010. In Corpus-linguistic applications, ed. Stefan Th. Gries, Stefanie Wulff, and Mark Davies. Electronic: ISBN 9789042028012; hardback: ISBN 9789042028005.
Reviewed in LINGUIST List 21.3318 (2010-08-17) by Andrew Caines.

OBELEX: Online Bibliography of Electronic Lexicography. “Articles, monographs, anthologies, and reviews from the field of electronic lexicography with a special focus on online lexicography.” Search by full text, keyword, person, analysed languages, or publication year. “c. 600 entries” [2011-05-10] (German home page)

@@ Line 23: / Line 23: @@
 ==Standards and Best Practices==
+* [http://www.mpi.nl/IMDI/ ISLE Meta Data Initiative] (IMDI): {{hq|a proposed metadata standard to describe multi-media and multi-modal language resources. The standard provides interoperability for browsable and searchable corpus structures and resource descriptions with help of specific tools.... The web-based Browsable Corpus at the Max Planck Institute for Psycholinguistics allows you to browse through IMDI corpora and search for language resources.}} {{si|[[User:Mamandel|Mamandel]] 14:05, 22 May 2011 (UTC)}}
 * [http://www.language-archives.org/ OLAC: Open Language Archives Community]: {{Hq|an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources.}} {{si|[[User:Mamandel|Mamandel]] 14:01, 22 May 2011 (UTC)}}
 ==NLP Literature==