What is a corpus and how to build it? Lessons learned from developing several linguistic corpora
Abstract
Corpus-based research has developed considerably in Brazil over the last decade. Its relevance is evident in Linguistics, Applied Linguistics and Computational Linguistics. Corpus Linguistics emerged as an approach to systematize procedures and to account for this new way of doing research. The development of natural language processing tools for Brazilian Portuguese can help Corpus Linguistics advance considerably in Brazil. However, the advances made in Corpus Linguistics internationally have not yet reached much of the research carried out in Brazil. The reason is that internationally accepted procedures and concepts are not yet well established here, even though there are researchers developing extraordinary corpus-based projects in Brazil. This article therefore discusses several definitions of corpus, the requirements and procedures for building one, and the available corpora and tools; finally, it presents four corpus projects whose detailed descriptions can assist other researchers in building and processing corpora.
Keywords: corpus; corpus linguistics; corpus processing.
License
I grant the journal Calidoscópio the right of first publication of my article, licensed under a Creative Commons Attribution license (which allows the work to be shared with recognition of its authorship and of its initial publication in this journal).
I confirm that my article is not being submitted to another publication and has not been published in its entirety in another journal. I take full responsibility for its originality, and I will also take responsibility for any charges arising from third-party claims concerning the authorship of the article.
I also agree that the manuscript will be submitted according to the journal’s publication rules described above.