Morphological productivity and text technology: Aspects of a lexical transducer of Portuguese capable of analyzing neologisms
Abstract
This paper presents LEXPOR, a prototype morphological component for Portuguese that segments and classifies the constituents of complex words formed by suffixation with -ismo, -iano, -ês and -mente, as well as by prefixing the words so derived with Greek or Latin prefixes such as neo-, pseudo-, anti-, or ultra-. We assume that a representation of complex words in terms of morphemes and morphosyntactic categories plays an important role not only in corpus linguistics, but also in other subfields of text technology, such as Information Extraction and Information Retrieval. The prototype consists of a lexical transducer that models the set of words that can potentially be built with these derivational affixes. The transducer was compiled from a morphotactic and morphophonological description of this lexicon fragment, together with orthographic alternation rules, formalized in the xfst and lexc finite-state programming languages. Its main feature is the ability to analyze neologisms built from non-lexicalized words borrowed from other languages. Since the use of foreign anthroponyms is one of the main causes of the extreme productivity of the derivational affixes we focus on, LEXPOR provides an adequate architecture for developing an automatic tagger for Portuguese that can overcome the shortcomings of the CETENFolha corpus and of the parser of the VISL project. In both of these, the morphological analyses of complex words formed with the derivational affixes mentioned above are often either insufficiently detailed or simply incorrect.
Key words: derivation, suffixation, prefixation, automata, lexical transducers, finite-state morphology, automatic corpus annotation, corpus linguistics, computational linguistics.
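To make the kind of analysis described in the abstract concrete, the following is a minimal Python sketch of affix segmentation with an open (guessed) stem. It is not the LEXPOR transducer, which is written in xfst and lexc; the function name `analyze`, the category tags, and the minimum stem length are assumptions introduced here for illustration only, and no orthographic alternation rules are modeled.

```python
# Illustrative sketch only: a naive affix stripper, not the xfst/lexc transducer.
# The tags below are hypothetical; the abstract does not specify LEXPOR's tagset.
PREFIXES = ["neo", "pseudo", "anti", "ultra"]
SUFFIXES = {"ismo": "N", "iano": "ADJ", "ês": "N/ADJ", "mente": "ADV"}


def analyze(word):
    """Return (prefixes, stem, suffix, tag), or None if no listed suffix matches.

    The stem is not looked up in any lexicon, so unknown bases such as
    foreign proper names are accepted, which is what lets the sketch
    handle neologisms.
    """
    rest = word.lower()
    prefixes = []
    stripped = True
    while stripped:  # prefixes may stack, e.g. ultra-neo-...
        stripped = False
        for prefix in PREFIXES:
            if rest.startswith(prefix) and len(rest) > len(prefix) + 1:
                rest = rest[len(prefix):].lstrip("-")  # hyphen is optional
                prefixes.append(prefix)
                stripped = True
                break
    for suffix, tag in SUFFIXES.items():
        if rest.endswith(suffix) and len(rest) > len(suffix) + 1:
            return prefixes, rest[: -len(suffix)], suffix, tag
    return None


print(analyze("neo-darwinismo"))   # (['neo'], 'darwin', 'ismo', 'N')
print(analyze("antichomskiano"))   # (['anti'], 'chomsk', 'iano', 'ADJ')
```

The second result shows why plain string stripping is not enough: the stem comes out as "chomsk" rather than "Chomsky", and recovering the base form is the job of the orthographic alternation rules that the abstract mentions.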
License
I grant the journal Calidoscópio the right of first publication of my article, licensed under a Creative Commons Attribution license (which allows the work to be shared with acknowledgement of its authorship and of its initial publication in this journal).
I confirm that my article is not being submitted to another publication and has not been published in its entirety in another journal. I take full responsibility for its originality, and I also accept responsibility for any charges arising from third-party claims concerning the authorship of the article.
I also agree that the manuscript will be submitted according to the journal’s publication rules described above.