CET-AT Service Catalogue

LinA – Linguistic Text Analysis

Provided by:

Peter Kolb & Prochazkova GbR - Linguatools

Function: Language Technologies

Task: Language Identification, Lemmatization, Morphological Annotation, Sentence Splitting, Tokenization

About

LinA is a software tool for automatic text processing. It performs linguistic analysis of large volumes of unstructured text, including social media content. LinA's available modules are:

Text filters: extract content from text, HTML, PDF, MS Office, Open Office, XML, TMX, XLIFF and many other file formats.
Language identification: automatically determines the language of a document.
Sentence segmentation and tokenization: identifies sentence and word boundaries.
Truecasing: normalizes the casing of a text, e.g. This Sentence Is in ENGLISH. → this sentence is in English.
Part-of-speech tagging: assigns to every word its part of speech.
Morphological analysis and lemmatization: analyses unknown words according to morphological rules (including compound splitting for German), and generates the base form of a word in its current context.
Several configurable output methods: XML writer, Lucene index writer, etc.

It can process a large variety of input formats and is available for English and German. Some of the modules are available for other languages (contact the provider).

LinA is written entirely in Java, which makes it fast and lets it run on Unix as well as Windows and macOS.
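
To give a rough idea of what the sentence segmentation and tokenization module does, here is a minimal sketch in Java. It is not LinA's API or implementation, which this page does not document. It uses only the JDK's standard java.text.BreakIterator, and the class name SegmentationSketch and its two methods are hypothetical names chosen for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: NOT LinA's API. Uses the JDK's BreakIterator.
public class SegmentationSketch {

    // Splits text into sentences with the JDK's locale-aware sentence iterator.
    static List<String> splitSentences(String text, Locale locale) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    // Splits a sentence into tokens; segments consisting only of whitespace are dropped.
    static List<String> tokenize(String sentence, Locale locale) {
        List<String> tokens = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(sentence);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = sentence.substring(start, end);
            if (!token.trim().isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "LinA splits text into sentences. It also finds word boundaries.";
        // Print one token list per detected sentence.
        for (String sentence : splitSentences(text, Locale.ENGLISH)) {
            System.out.println(tokenize(sentence, Locale.ENGLISH));
        }
    }
}
```

A production tool would typically go further than this generic iterator, for example by handling abbreviations, URLs and the informal punctuation found in social media text.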

Language Coverage

English (Latin), German (Latin)

Get Started with the service

Contact the Provider

Support

Helpdesk: peter.kolb@linguatools.org
