NLP Resources

From the LDC Language Resource Wiki


[Mamandel 14:18, 22 May 2011 (UTC)]


This page is for language-independent resources for computational natural language processing.
Language-independent meta-resources that are not specific to NLP are listed on their own page, General Meta-resources.
For metadata standards and infrastructure see the General Meta-resources page.


  • An Crúbadán: Corpus building for minority languages. Web crawling software designed to exploit the vast quantities of text freely available on the web as a way of bringing the benefits of statistical NLP to languages with small numbers of speakers and/or limited computational resources. Kevin P. Scannell. [Mamandel 00:25, 14 May 2010 (UTC)]
  • Apertium. A free/open-source rule-based machine translation platform offering free linguistic data (morphological analysers, bilingual dictionaries, etc.) in XML formats for a range of languages.
  • Foma. A compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications, such as producing morphological analyzers.
  • Universal Networking Language (UNL). An artificial language for representing, describing, summarizing, refining, storing and disseminating information in a natural-language-independent format. It is a kind of mark-up language that represents the core information of a text rather than its formatting. Just as HTML annotations can be realized differently in the context of different applications, machines, displays, etc., UNL expressions can have different realizations in different human languages.
  • VISL Constraint Grammar. A free/open-source software reimplementation and extension of Fred Karlsson's Constraint Grammar formalism.

NLP Literature

  • Machine Translation Archive. Electronic repository and bibliography of articles, books and papers on topics in machine translation, computer translation systems, and computer-based translation tools. Latest update: 30 April 2011 [now containing over 7700 items] [2011-05-10]
    It aims to cover English-language publications since 1990 comprehensively. Papers and books from earlier years are being added to provide good coverage from the beginnings of MT in the 1950s up to 1990.