Morphological productivity and text technology: Aspects of a lexical transducer of Portuguese capable of analyzing neologisms

Authors

  • Leonel Figueiredo de Alencar

Abstract

This paper presents LEXPOR, a prototype of a morphological component of Portuguese capable of segmenting and classifying the constituents of complex words resulting from suffixation with -ismo, -iano, -ês and -mente, as well as from prefixing the words so derived with Greek or Latin prefixes such as neo-, pseudo-, anti-, or ultra-. We assume that a representation of complex words in terms of morphemes and morphosyntactic categories plays an important role not only in corpus linguistics, but also in other subfields of text technology, such as Information Extraction and Information Retrieval. The prototype consists of a lexical transducer modeling the set of words that can potentially be built using these derivational affixes. The transducer was compiled from a morphotactic and morphophonological description of this lexicon fragment, together with orthographic alternation rules, formalized in the xfst and lexc finite-state programming languages. Its main feature is the ability to analyze neologisms built from non-lexicalized words borrowed from other languages. Since the use of foreign anthroponyms is one of the main causes of the extreme productivity of the derivational affixes we focus on, LEXPOR provides an adequate architecture for developing an automatic tagger for Portuguese capable of overcoming the shortcomings of the CETENFolha corpus and of the parser of the VISL project. In both of these cases, morphological analyses of complex words formed with the derivational affixes mentioned above are often either insufficiently detailed or simply incorrect.
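
The kind of analysis described above can be illustrated with a minimal Python sketch. It is not LEXPOR and does not use xfst or lexc; it is a plain affix-stripping toy under simplifying assumptions. The function name `analyze` and the tags `PREF`, `BASE` and `SUFF` are hypothetical, only the affixes named in the abstract are covered, at most one suffix is stripped, and the orthographic alternation rules a real transducer would apply are ignored. Any non-empty remainder is accepted as a base, which is how an unlisted foreign name can still be analyzed.

```python
# Toy affix-stripping analyzer (illustrative only; not the LEXPOR transducer).
# Affix lists come from the abstract; the tag names are hypothetical.
PREFIXES = ("neo", "pseudo", "anti", "ultra")
SUFFIXES = ("ismo", "iano", "ês", "mente")


def analyze(word):
    """Return a list of (morpheme, tag) pairs, or None if no suffix matches."""
    form = word.lower()
    segments = []

    # Strip leading prefixes repeatedly, with or without a hyphen.
    stripped = True
    while stripped:
        stripped = False
        for prefix in PREFIXES:
            if form.startswith(prefix) and len(form) > len(prefix):
                segments.append((prefix, "PREF"))
                form = form[len(prefix):].lstrip("-")
                stripped = True
                break

    # Strip a single suffix; whatever remains is treated as an open-class base.
    for suffix in SUFFIXES:
        if form.endswith(suffix) and len(form) > len(suffix):
            base = form[: -len(suffix)]
            return segments + [(base, "BASE"), (suffix, "SUFF")]

    return None


print(analyze("neodarwinismo"))
# [('neo', 'PREF'), ('darwin', 'BASE'), ('ismo', 'SUFF')]
print(analyze("pseudo-kantiano"))
# [('pseudo', 'PREF'), ('kant', 'BASE'), ('iano', 'SUFF')]
```

Because the base is unrestricted, this sketch overgenerates (any word that merely begins with "anti" loses those letters as a prefix), which is one reason a compiled morphotactic description with alternation rules is preferable to naive string stripping.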

Key words: derivation, suffixation, prefixation, automata, lexical transducers, finite-state morphology, automatic corpus annotation, corpus linguistics, computational linguistics.

Published

2021-05-27

How to Cite

Alencar, L. F. de. (2021). Morphological productivity and text technology: Aspects of a lexical transducer of Portuguese capable of analyzing neologisms. Calidoscópio, 7(3), 199–220. Retrieved from https://revistas.unisinos.br/index.php/calidoscopio/article/view/4874