NLP Resources

From the LDC Language Resource Wiki

Revision as of 17:36, 12 July 2010 by Mamandel

THIS PAGE IS UNDER CONSTRUCTION (Ftyers 19:13, 22 April 2010 (UTC))

This page is for language-independent NLP resources.

Apertium

A free/open-source rule-based machine translation platform offering free linguistic data (morphological analysers, bilingual dictionaries, etc.) in XML formats for a range of languages.

Links

Apertium: Home

An Crúbadán

Corpus building for minority languages: home page for An Crúbadán, web-crawling software by Kevin P. Scannell. [Mamandel 00:25, 14 May 2010 (UTC)]

“Statistical techniques are a key part of most modern natural language processing systems. Unfortunately, such techniques require the existence of large bodies of text, and in the past corpus development has proved to be quite expensive. As a result, substantial corpora exist primarily for languages like English, French, German, etc. where there is a market-driven need for NLP tools.”

Foma

HFST

The Helsinki finite-state toolkit (HFST) is a free/open-source rewrite of the Xerox finite-state tools. It provides implementations of both the lexc and twolc formalisms.

Links

HFST: Home

Machine Translation Archive

Machine Translation Archive. Electronic repository and bibliography of articles, books and papers on topics in machine translation, computer translation systems, and computer-based translation tools. More than 6,400 items. Aims to be comprehensive for English-language publications since 1990; earlier papers and books are being added to provide partial coverage from the 1950s onward. [Mamandel 20:53, 22 April 2010 (UTC)]

OBELEX

Online Bibliography of Electronic Lexicography (OBELEX). Covers all relevant articles, monographs, anthologies and reviews since 2000, plus some older relevant works. Focus is on online lexicography. Dictionaries themselves are not included here; they are covered in a supplementary database now under construction. Search by full text, keyword, person, analysed languages, or publication year. (Mamandel 22:26, 28 April 2010 (UTC))

Home page in German.
Announcement on LINGUIST List [19-Apr-2010]

TMX

An XML-based format for translation memories.
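
As a rough illustration of what such a file looks like, the sketch below embeds a tiny hand-written TMX 1.4 document and reads its translation units with Python's standard library. The sample sentences, the header attribute values, and the helper name `read_translation_units` are assumptions made for this example; they do not come from any particular tool or from the TMX home page.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny hand-written TMX document (illustrative data only).
SAMPLE_TMX = """<?xml version="1.0"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
      <tuv xml:lang="fr"><seg>Bonjour.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you.</seg></tuv>
      <tuv xml:lang="fr"><seg>Merci.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_translation_units(tmx_text):
    """Return one {language: segment text} dict per <tu> element.

    Hypothetical helper for this sketch, not part of any TMX library.
    """
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text nested inside inline markup.
            text = "".join(seg.itertext()) if seg is not None else ""
            unit[tuv.get(XML_LANG)] = text
        units.append(unit)
    return units


if __name__ == "__main__":
    for unit in read_translation_units(SAMPLE_TMX):
        print(unit)
```

Each translation unit (`tu`) groups one variant (`tuv`) per language, so a translation memory maps naturally onto a list of language-keyed dictionaries.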

Links

TMX: Home

Universal Networking Language

[from the home page]: Template:Heavy quotes [Mamandel 20:26, 6 May 2010 (UTC)]

Links

UNLWEB

University of Western Australia Web Text Mining and NLP Tools

[From LINGUIST List 21.2867]: “We have made available a list of web services for accessing text mining and NLP tools implemented at our research group such as boilerplate removal (known as HERCULES), semantic similarity/relatedness measures (i.e. Normalised Web Distance, n-Degree of Wikipedia), noun phrase chunking, triple extraction, noisy text cleaning (known as ISSAC), simple term extraction, and access to our multi-domain, 300 million token text corpora (which are continuously growing).
--Dr Wilson Wong, School of Computer Science & Software Engineering, The University of Western Australia” [Mamandel 17:36, 12 July 2010 (UTC)]

Links

The University of Western Australia (UWA) Text Mining Group
API directory
Write to wilson@csse.uwa.edu.au to obtain a free developer key

VISL Constraint Grammar

A free/open-source software reimplementation and extension of Fred Karlsson's Constraint Grammar formalism.

Links

VISL Constraint Grammar: Home
