NLP Resources
From the LDC Language Resource Wiki
Revision as of 14:05, 22 May 2011
UNDER CONSTRUCTION
[Mamandel 22:40, 10 May 2011 (UTC)]
This page is for language-independent resources for computational natural language processing.
Language-independent General Meta-resources that are not specific to NLP have their own page.
Software
- An Crúbadán: Corpus building for minority languages. Web crawling software “designed to exploit the vast quantities of text freely available on the web as a way of bringing the benefits of statistical NLP to languages with small numbers of speakers and/or limited computational resources.” Kevin P. Scannell. [Mamandel 00:25, 14 May 2010 (UTC)]
- Apertium. A free/open-source rule-based machine translation platform offering free linguistic data (morphological analysers, bilingual dictionaries, etc.) in XML formats for a range of languages.
- Foma. “a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological analyzers.”
- Foma: a finite-state compiler and library. Hulden, Mans. 2009. Proceedings of the EACL 2009 Demonstrations Session, pages 29–32, Athens, Greece, 3 April 2009. PDF
- Helsinki Finite-State Transducer Technology (HFST). A free/open-source rewrite of the Xerox finite-state tools. It provides an implementation both of the lexc and twolc formalisms.
- Universal Networking Language (UNL). “an artificial language for representing, describing, summarizing, refining, storing and disseminating information in a natural-language-independent format. It is a kind of mark-up language which represents not the formatting but the core information of a text. As HTML annotations can be realized differently in the context of different applications, machines, displays, etc., so UNL expressions can have different realizations in different human languages.”
- VISL Constraint Grammar. A free/open-source software reimplementation and extension of Fred Karlsson's Constraint Grammar formalism.
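Foma and HFST both compile lexicons and rules into finite-state transducers that map surface word forms to lexical analyses. The Python sketch below illustrates that idea only. It hand-builds a tiny nondeterministic transducer in the analysis direction. The `FST` class, `build_noun_analyzer`, and the `+N+Sg`/`+N+Pl` tags are hypothetical names chosen for this example and are not part of Foma's or HFST's APIs. The stem-then-suffix layout loosely mimics a lexc continuation class.

```python
from collections import defaultdict


class FST:
    """Toy nondeterministic finite-state transducer (illustrative sketch only)."""

    def __init__(self):
        # state -> list of (input_string, output_string, next_state); "" input = epsilon
        self.arcs = defaultdict(list)
        self.finals = set()
        self.start = 0
        self._next_id = 1

    def new_state(self):
        state = self._next_id
        self._next_id += 1
        return state

    def add_arc(self, src, inp, out, dst):
        self.arcs[src].append((inp, out, dst))

    def transduce(self, word):
        """Return every output string for a path that consumes all of `word`
        and ends in a final state. Assumes there are no epsilon cycles."""
        results = set()
        stack = [(self.start, 0, "")]
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word) and state in self.finals:
                results.add(out)
            for inp, arc_out, dst in self.arcs[state]:
                # Epsilon arcs consume nothing; other arcs must match the input here.
                if inp == "" or word.startswith(inp, pos):
                    stack.append((dst, pos + len(inp), out + arc_out))
        return sorted(results)


def build_noun_analyzer(stems):
    """Build a surface -> analysis transducer: a stem, then a hypothetical
    'Noun' continuation offering +N+Sg (no suffix) or +N+Pl (suffix -s)."""
    fst = FST()
    noun = fst.new_state()   # continuation state shared by all stems
    final = fst.new_state()
    fst.finals.add(final)
    for stem in stems:
        state = fst.start
        for ch in stem:               # copy each stem letter to the output
            nxt = fst.new_state()
            fst.add_arc(state, ch, ch, nxt)
            state = nxt
        fst.add_arc(state, "", "", noun)  # epsilon jump into the continuation
    fst.add_arc(noun, "", "+N+Sg", final)
    fst.add_arc(noun, "s", "+N+Pl", final)
    return fst


if __name__ == "__main__":
    analyzer = build_noun_analyzer(["cat", "dog"])
    print(analyzer.transduce("cats"))  # ['cat+N+Pl']
    print(analyzer.transduce("dog"))   # ['dog+N+Sg']
    print(analyzer.transduce("cow"))   # []
```

A real analyzer would also need rules for alternations at morpheme boundaries, such as fox/foxes. That is the kind of constraint twolc rules express. This sketch omits such rules and does no minimization, so it should not be read as how Foma or HFST work internally.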
Standards and Best Practices
- ISLE Meta Data Initiative (IMDI): “a proposed metadata standard to describe multi-media and multi-modal language resources. The standard provides interoperability for browsable and searchable corpus structures and resource descriptions with help of specific tools.... The web-based Browsable Corpus at the Max Planck Institute for Psycholinguistics allows you to browse through IMDI corpora and search for language resources.” [Mamandel 14:05, 22 May 2011 (UTC)]
- OLAC: Open Language Archives Community: “an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources.” [Mamandel 14:01, 22 May 2011 (UTC)]
NLP Literature
- Machine Translation Archive. “Electronic repository and bibliography of articles, books and papers on topics in machine translation, computer translation systems, and computer-based translation tools. Latest update: 30 April 2011 [now containing over 7700 items]” [2011-05-10]
“aims to cover comprehensively English-language publications since 1990. Papers and books from previous years are being added in order to provide good coverage from the beginnings of MT in the 1950s to 1990.”
- Probabilistic tagging of minority language data: a case study using Qtag. Christopher Cox. 2010. In Corpus-linguistic applications, ed. Stefan Th. Gries, Stefanie Wulff, and Mark Davies. Electronic: ISBN 9789042028012; hardback: ISBN 9789042028005.
Reviewed in LINGUIST List 21.3318 (2010-08-17) by Andrew Caines.
- OBELEX: Online Bibliography of Electronic Lexicography. “Articles, monographs, anthologies, and reviews from the field of electronic lexicography with a special focus on online lexicography.” Search by full text, keyword, person, analysed languages, or publication year. “c. 600 entries” [2011-05-10] (German home page)