NLP Resources

From the LDC Language Resource Wiki

(Difference between revisions)
m (moved standards & best pracs to General Meta-resources, where I'd been having them)
 
(43 intermediate revisions not shown)
Line 1: Line 1:
-[[Category:General Resources]]([[User:Ftyers|Ftyers]] 19:13, 22 April 2010 (UTC))
+{{Under construction}}
+{{si|[[User:Mamandel|Mamandel]] 14:18, 22 May 2011 (UTC)}}
-This page is for language-independent NLP resources.
+__TOC__
-==Apertium==
+This page is for language-independent resources for computational natural language processing. <br>
+Language-independent [[General Meta-resources]] that are not specific to NLP have their own page. <br>
+For metadata standards and infrastructure see the [[General Meta-resources#Metadata_standards_and_infrastructure|General Meta-resources]] page.
-A free/open-source rule-based machine translation platform offering free linguistic data (morphological analysers, bilingual dictionaries, etc.) in XML formats for a range of languages.
+==Software==
-===Links===
+* [http://borel.slu.edu/crubadan/index.html An Crúbadán]: Corpus building for minority languages. Web crawling software {{Hq|designed to exploit the vast quantities of text freely available on the web as a way of bringing the benefits of statistical NLP to languages with small numbers of speakers and/or limited computational resources.}} Kevin P. Scannell. {{si|[[User:Mamandel|Mamandel]] 00:25, 14 May 2010 (UTC)}}
-* [http://www.apertium.org Apertium: Home]
+* [http://www.apertium.org Apertium]. A free/open-source rule-based machine translation platform offering free linguistic data (morphological analysers, bilingual dictionaries, etc.) in XML formats for a range of languages.
-==Foma==
+* [http://sourceforge.net/projects/foma/ Foma]. {{hq|a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological analyzers.}}
+**[http://www.aclweb.org/anthology-new/E/E09/E09-2008.pdf Foma: a finite-state compiler and library]. Hulden, Mans. 2009. ''Proceedings of the EACL 2009 Demonstrations Session'', pages 29–32, Athens, Greece, 3 April 2009. PDF
-==HFST==
+* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ Helsinki Finite-State Transducer Technology (HFST)]. A free/open-source rewrite of the Xerox finite-state tools. It provides an implementation both of the <code>lexc</code> and <code>twolc</code> formalisms.
-The Helsinki finite-state toolkit is a free/open-source rewrite of the Xerox finite-state tools. It provides an implementation both of the <code>lexc</code> and <code>twolc</code> formalisms.
+*[http://www.unlweb.net/unlweb/ Universal Networking Language (UNL)]. {{hq|an artificial language for representing, describing, summarizing, refining, storing and disseminating information in a natural-language-independent format. It is a kind of mark-up language which represents not the formatting but the core information of a text. As HTML annotations can be realized differently in the context of different applications, machines, displays, etc., so UNL expressions can have different realizations in different human languages.}}
-===Links===
+* [http://beta.visl.sdu.dk/constraint_grammar.html VISL Constraint Grammar]. A free/open-source software reimplementation and extension of Fred Karlsson's Constraint Grammar formalism.
-* [http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/ HFST: Home]
+==NLP Literature==
-==Machine Translation Archive==
+* [http://www.mt-archive.info/ Machine Translation Archive]. {{hq|Electronic repository and bibliography of articles, books and papers on topics in machine translation, computer translation systems, and computer-based translation tools. Latest update: 30 April 2011 [now containing over 7700 items]}} {{si|2011-05-10}} <br>{{hq|aims to cover comprehensively English-language publications since 1990.  Papers and books from previous years are being added in order to provide good coverage from the beginnings of MT in the 1950s to 1990.}}
-[http://www.mt-archive.info/ Machine Translation Archive]. Electronic repository and bibliography of articles, books and papers on topics in machine translation, computer translation systems, and computer-based translation tools. >6400 items. Aims to be comprehensive on English-language publications since 1990; adding earlier papers and books to provide partial coverage from the 1950s. ([[User:Mamandel|Mamandel]] 20:53, 22 April 2010 (UTC))
-==OBELEX==
+* Probabilistic tagging of minority language data: a case study using Qtag. Christopher Cox. 2010.  In ''[http://www.rodopi.nl/senj.asp?BookId=LC+71 Corpus-linguistic applications]'', ed. Stefan Th. Gries, Stefanie Wulff, and Mark Davies. 2010. Electronic: ISBN 9789042028012; hardback: ISBN 9789042028005.  <br>Reviewed in [http://linguistlist.org/issues/21/21-3318.html LINGUIST List 21.3318] (2010-08-17) by Andrew Caines.
-([[User:Mamandel|Mamandel]] 20:09, 27 April 2010 (UTC))
-[[Image:RedRx.gif]] ''This is part of an [http://linguistlist.org/issues/21/21-1915.html announcement] on the LINGUIST List. At the moment the site is not loading in Firefox. 20:09, 27 April 2010 (UTC)''
+* [http://hypermedia.ids-mannheim.de/pls/lexpublic/bib_en.ansicht OBELEX: Online Bibliography of Electronic Lexicography]. {{hq|Articles, monographs, anthologies, and reviews from the field of electronic lexicography with a special focus on online lexicography.}} Search by full text, keyword, person, analysed languages, or publication year. {{hq|c. 600 entries}} {{si|2011-05-10}} ([http://hypermedia.ids-mannheim.de/pls/lexpublic/bib.ansicht German home page])  
-The Institute for German Language in Mannheim is working in a Online Bibliography of Electronic Lexicography (OBELEX) which is available at [http://hypermedia.ids-mannheim.de/pls/lexpublic/bib_en.ansicht www.owid.de/obelex/engl] and may be of interest to linguists. In OBELEX, the research contributions from this field are consolidated and are searchable by different criteria. OBELEX includes all relevant articles, monographs, anthologies and reviews since 2000 with respect to electronic lexicography, and some older relevant works. Our particular focus is on works about online lexicography. While information on dictionaries is not included in OBELEX, we are working on a database which contains information on online dictionaries as a supplement to OBELEX.
+[[Category:Non-language-specific]]
-==TMX==
-An XML-based format for translation memories.
-===Links===
-* [http://www.lisa.org/Translation-Memory-e.34.0.html TMX: Home]
-==VISL Constraint Grammar==
-A free/open-source software reimplementation and extension of Fred Karlsson's Constraint Grammar formalism.
-===Links===
-* [http://beta.visl.sdu.dk/constraint_grammar.html VISL Constraint Grammar: Home]

Latest revision as of 14:18, 22 May 2011

THIS PAGE IS UNDER CONSTRUCTION


[Mamandel 14:18, 22 May 2011 (UTC)]



This page is for language-independent resources for computational natural language processing.
Language-independent General Meta-resources that are not specific to NLP have their own page.
For metadata standards and infrastructure see the General Meta-resources page.

Software

  • An Crúbadán: Corpus building for minority languages. Web crawling software designed to exploit the vast quantities of text freely available on the web as a way of bringing the benefits of statistical NLP to languages with small numbers of speakers and/or limited computational resources. Kevin P. Scannell. [Mamandel 00:25, 14 May 2010 (UTC)]
  • Apertium. A free/open-source rule-based machine translation platform offering free linguistic data (morphological analysers, bilingual dictionaries, etc.) in XML formats for a range of languages.
  • Foma: "a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses. It has specific support for many natural language processing applications such as producing morphological analyzers."
  • Universal Networking Language (UNL): "an artificial language for representing, describing, summarizing, refining, storing and disseminating information in a natural-language-independent format. It is a kind of mark-up language which represents not the formatting but the core information of a text. As HTML annotations can be realized differently in the context of different applications, machines, displays, etc., so UNL expressions can have different realizations in different human languages."
  • VISL Constraint Grammar. A free/open-source software reimplementation and extension of Fred Karlsson's Constraint Grammar formalism.

NLP Literature

  • Machine Translation Archive. "Electronic repository and bibliography of articles, books and papers on topics in machine translation, computer translation systems, and computer-based translation tools. Latest update: 30 April 2011 [now containing over 7700 items]" [2011-05-10]
    It "aims to cover comprehensively English-language publications since 1990. Papers and books from previous years are being added in order to provide good coverage from the beginnings of MT in the 1950s to 1990."