NLP Resources

From the LDC Language Resource Wiki

[Mamandel 14:18, 22 May 2011 (UTC)]


This page lists language-independent resources for computational natural language processing.
Language-independent resources that are not specific to NLP, including metadata standards and infrastructure, are listed on the General Meta-resources page.


  • An Crúbadán: Corpus building for minority languages. Web crawling software designed to exploit the vast quantities of text freely available on the web, bringing the benefits of statistical NLP to languages with small numbers of speakers and/or limited computational resources. By Kevin P. Scannell. [Mamandel 00:25, 14 May 2010 (UTC)]
  • Apertium. A free/open-source rule-based machine translation platform offering free linguistic data (morphological analysers, bilingual dictionaries, etc.) in XML formats for a range of languages.
  • Foma. A compiler, programming language, and C library for constructing finite-state automata and transducers. It has specific support for many natural language processing applications, such as producing morphological analyzers.
  • Universal Networking Language (UNL). An artificial language for representing, describing, summarizing, refining, storing and disseminating information in a natural-language-independent format. It is a kind of mark-up language that represents not the formatting but the core information of a text. Just as HTML annotations can be realized differently in different applications, machines, displays, etc., UNL expressions can have different realizations in different human languages.
  • VISL Constraint Grammar. A free/open-source software reimplementation and extension of Fred Karlsson's Constraint Grammar formalism.

NLP Literature

  • Machine Translation Archive. Electronic repository and bibliography of articles, books and papers on topics in machine translation, computer translation systems, and computer-based translation tools. Latest update: 30 April 2011 [now containing over 7700 items] [2011-05-10]
    The archive aims to cover English-language publications since 1990 comprehensively. Papers and books from earlier years are being added to provide good coverage from the beginnings of MT in the 1950s up to 1990.
