NLP Resources
From the LDC Language Resource Wiki
Revision as of 22:40, 10 May 2011
UNDER CONSTRUCTION
[Mamandel 22:40, 10 May 2011 (UTC)]
This page is for language-independent resources for computational natural language processing.
Language-independent General Meta-resources that are not specific to NLP have their own page.
Software
- An Crúbadán: Corpus building for minority languages. Web-crawling software “designed to exploit the vast quantities of text freely available on the web as a way of bringing the benefits of statistical NLP to languages with small numbers of speakers and/or limited computational resources.” Kevin P. Scannell. [Mamandel 00:25, 14 May 2010 (UTC)]
- Apertium. A free/open-source rule-based machine translation platform offering free linguistic data (morphological analysers, bilingual dictionaries, etc.) in XML formats for a range of languages.
- Foma: a finite-state compiler and library. Hulden, Mans. 2009. Proceedings of the EACL 2009 Demonstrations Session, pages 29–32, Athens, Greece, 3 April 2009. PDF
- Helsinki Finite-State Transducer Technology (HFST). A free/open-source rewrite of the Xerox finite-state tools. It provides implementations of both the lexc and twolc formalisms.
- Universal Networking Language (UNL). “an artificial language for representing, describing, summarizing, refining, storing and disseminating information in a natural-language-independent format. It is a kind of mark-up language which represents not the formatting but the core information of a text. As HTML annotations can be realized differently in the context of different applications, machines, displays, etc., so UNL expressions can have different realizations in different human languages.”
- VISL Constraint Grammar. A free/open-source software reimplementation and extension of Fred Karlsson's Constraint Grammar formalism.
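The VISL Constraint Grammar entry names the formalism but does not show what a rule does. The sketch below is a rough, hypothetical illustration in plain Python. It is not VISL CG syntax and not its API. Each token keeps a set of candidate readings. REMOVE-style rules discard a reading when a neighbouring token is unambiguously tagged, and a token's last reading is never deleted. The rule class, tag names, and example sentence are all invented for illustration.

```python
# Toy sketch of Constraint Grammar-style disambiguation.
# Hypothetical names and data; this is NOT VISL CG syntax or its API.
from dataclasses import dataclass


@dataclass
class RemoveRule:
    """Remove reading `target` when the token at relative position `offset`
    has exactly one reading, equal to `context` (an unambiguous context test)."""
    target: str
    offset: int
    context: str


def apply_rules(sentence, rules):
    """Apply rules repeatedly until nothing changes.

    `sentence` is a list of (word, set_of_readings) pairs; the sets are
    modified in place and the same list is returned.
    """
    changed = True
    while changed:
        changed = False
        for i, (_word, readings) in enumerate(sentence):
            for rule in rules:
                j = i + rule.offset
                if not 0 <= j < len(sentence):
                    continue  # context position falls outside the sentence
                # A token's last remaining reading is never removed.
                if (rule.target in readings and len(readings) > 1
                        and sentence[j][1] == {rule.context}):
                    readings.discard(rule.target)
                    changed = True
    return sentence


sent = [("the", {"DET"}), ("can", {"N", "V"}), ("rusts", {"N", "V"})]
rules = [RemoveRule("V", -1, "DET"), RemoveRule("N", -1, "N")]
print(apply_rules(sent, rules))
# → [('the', {'DET'}), ('can', {'N'}), ('rusts', {'V'})]
```

The rules are applied repeatedly until nothing changes, because one rule's output can enable another. Removing V from "can" is what lets the second rule fire on "rusts". Real Constraint Grammar implementations also provide SELECT rules, tag sets, and much richer context tests.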
NLP Literature
- Machine Translation Archive. Electronic repository and bibliography of articles, books and papers on topics in machine translation, computer translation systems, and computer-based translation tools. More than 6,400 items. Aims to be comprehensive for English-language publications since 1990; earlier papers and books are being added to provide partial coverage back to the 1950s.
- Probabilistic tagging of minority language data: a case study using Qtag. Christopher Cox. 2010. In Corpus-linguistic applications, eds. Stefan Th. Gries, Stefanie Wulff, and Mark Davies. Electronic: ISBN 9789042028012; hardback: ISBN 9789042028005. Reviewed in LINGUIST List 21.3318 (2010-08-17) by Andrew Caines.
- OBELEX: Online Bibliography of Electronic Lexicography. “Articles, monographs, anthologies, and reviews from the field of electronic lexicography with a special focus on online lexicography.” Dictionaries are not included; they are covered by a supplementary database now under construction. Search by full text, keyword, person, analysed languages, or publication year. “c. 600 entries” (German home page)