My Learning Log Week 3: Learning Corpus Linguistics

 

WEEK 3

Learning Corpus Linguistics

            Corpus linguistics can be described as the study of language based on text corpora. It investigates linguistic phenomena through the analysis of data obtained from a corpus. So, what is a corpus? A corpus is a collection of machine-readable, authentic texts, chosen to characterize or represent a state or variety of a language. Corpus linguistics combines scientific methods with IT tools, which makes studying language easier and more reliable.

            Our teacher suggested many corpus websites for us, such as the British National Corpus (BNC), Corpus Leeds, and Lextutor. When designing a specialized corpus, there are five things to consider. The first is corpus size. There are no fixed rules; it depends on the research purpose, the availability of data, and time. However, corpora that are ‘too small’ have limitations, e.g. not enough hits to make decent generalizations, or not covering enough of the concepts, terms, or patterns under investigation. The second is text extracts vs. full texts. Full texts offer more coverage because the words or terms of interest may be randomly distributed throughout a text, while specific sections may be enough if we are looking for particular words or phrases. The third is the number of texts. We can choose between collecting a few large texts or many smaller texts, depending on the research focus. The fourth is medium. A corpus can consist of spoken texts, written texts, or a mix of both. The fifth is subject and text type. The corpus should mainly focus on the specialized texts under investigation, although this is less clear-cut in multidisciplinary subjects.
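To make the point about corpus size more concrete, here is a minimal sketch in Python of how one might check whether a small corpus gives enough hits for the terms under investigation. The function names (`tokenize`, `corpus_summary`) and the two sample sentences are my own made-up illustrations, not real corpus data and not part of any corpus program mentioned in class.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def corpus_summary(texts, terms):
    """Return the number of texts, tokens, types, and the hits for each term."""
    tokens = [tok for text in texts for tok in tokenize(text)]
    counts = Counter(tokens)
    return {
        "texts": len(texts),
        "tokens": len(tokens),   # running words
        "types": len(counts),    # distinct words
        "hits": {term: counts[term.lower()] for term in terms},
    }

# Invented sample sentences, used only to show the idea.
sample_texts = [
    "Christmas is a time for family and for reflection.",
    "At Christmas we think of family and friends.",
]
summary = corpus_summary(sample_texts, ["christmas", "family", "peace"])
print(summary)
```

If a term of interest gets very few hits (here "peace" gets none), that is a sign the corpus may be too small for that question.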

After that, our teacher assigned us to build our own corpus on any topic. We can gather texts from many sources, such as printed materials, Word documents, CD-ROMs, web pages, or online databases. For my corpus, the topic I chose is Queen Elizabeth’s Christmas messages. The texts can be gathered from the British royal website. I will have to collect 68 messages from the royal website and load them into the corpus program.

 Corpus linguistics is a way to study how words are used, and which words they collocate with, in authentic texts. From a corpus, we can learn how words and grammatical structures are actually used.
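To illustrate the idea of collocation, here is a small Python sketch that counts the words appearing near a chosen node word. It is only a rough approximation of what a real corpus program does: the function `collocates`, the window size of 2, and the sample sentences are my own assumptions for illustration, and the sentences are invented, not quotes from the actual messages.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def collocates(texts, node, window=2):
    """Count words that occur within `window` tokens to the left or right of `node`."""
    counts = Counter()
    node = node.lower()
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == node:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts

# Invented sample sentences, used only to show the idea.
sample_texts = [
    "We wish you a happy Christmas and a peaceful new year.",
    "A happy Christmas to you all.",
]
result = collocates(sample_texts, "christmas", window=2)
print(result.most_common(3))
```

A raw count like this favors very common words such as "a", so real corpus tools usually add a statistical measure on top of the frequency to rank collocates.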
