What is a corpus and how to build it? Lessons learned from developing several linguistic corpora

Authors

  • Sandra Maria Aluísio
  • Gladis Maria de Barcellos Almeida

Abstract

Corpus-based research has developed considerably in Brazil over the last decade. Its relevance is evident in Linguistics, Applied Linguistics, and Computational Linguistics. Corpus Linguistics has emerged as an approach that systematizes procedures and accounts for this new way of doing research. The development of natural language processing tools for Brazilian Portuguese can help Corpus Linguistics advance further in Brazil. However, the advances made in Corpus Linguistics internationally have not yet reached much of the research carried out in Brazil. The reason is that internationally accepted procedures and concepts are not yet established here, even though researchers in Brazil are developing extraordinary corpus-based projects. This article therefore discusses several definitions of corpus, the requirements and procedures for building one, and the available corpora and tools, and finally presents four corpus projects whose detailed descriptions can assist other researchers in building and processing corpora.

Keywords: corpus; corpus linguistics; corpus processing.

Published

2021-05-27

How to Cite

Aluísio, S. M., & Almeida, G. M. de B. (2021). What is a corpus and how to build it? Lessons learned from developing several linguistic corpora. Calidoscópio, 4(3), 156–178. Retrieved from https://revistas.unisinos.br/index.php/calidoscopio/article/view/6002