Friday, May 17, 2019
Corpus Linguistics Essay
Introduction

This paper includes information about corpus linguistics, its connection with lexicology, and translation. The latter is the most essential one, and I am keen on finding and introducing something which is mainly connected with my future profession. Frankly speaking, that was not an easy journey, but I am hopeful it is destined to be successful.

A corpus is an electronically stored collection of samples of naturally occurring language. Most modern corpora are at least 1 million words in size and consist either of complete texts or of large extracts from long texts. Usually the texts are selected to represent a type of communication or a variety of language; for example, a corpus may be compiled to represent the English used in history textbooks, or Canadian French, or Internet discussions of genetic modification. Corpora are investigated through the use of dedicated software.

Corpus linguistics can be regarded as a sophisticated method of finding answers to the kinds of questions linguists have always asked. A large corpus can be a test bed for hypotheses and can be used to add a quantitative dimension to many linguistic studies. It is also true, however, that corpus software presents the researcher with language in a form that is not normally encountered, and that this can highlight patterning that often goes unnoticed. Corpus linguistics has also, therefore, led to a reassessment of what language is like.

During this journey we will try to find out:
What is Corpus Linguistics
Corpus Linguistics Terms and Their Meanings
History of Corpus Linguistics
Resources and Methodologies for Corpus Linguistics, Corpora
Translation
Corpus Linguistics and Linguistic Theory, Corpus-Based Descriptions

So fasten your seat belts, we are flying!

What is Corpus Linguistics?
Corpus linguistics is a study of language and a method of linguistic analysis which uses a collection of natural or real-world texts known as a corpus. Corpus linguistics is used to analyse and research a number of linguistic questions and offers a unique insight into the dynamics of language, which has made it one of the most widely used linguistic methodologies. Since corpus linguistics involves the use of large corpora that consist of millions or sometimes even billions of words, it relies heavily on the use of computers to determine what rules govern the language and what patterns (grammatical or lexical, for instance) occur. Thus it is not surprising that corpus linguistics emerged in its modern form only after the computer revolution in the 1980s. The Brown Corpus, the first modern and electronically readable corpus, however, was created by Henry Kucera and W. Nelson Francis as early as the 1960s.

Corpus Linguistics Terms and Their Meanings

Corpus (plural corpora). It refers to a collection of systematically or randomly collected texts of natural language which is electronically stored and processed. A corpus can consist of texts in a single language or in multiple languages. It contains a large number of texts which allow researchers to analyse linguistic rules, but the corpus does not represent the entire language, no matter how large it is.

Multilingual corpus. As its name suggests, a multilingual corpus consists of texts in multiple languages.

Parsed corpus (treebank). It is a collection of texts in naturally occurring language in which each sentence is parsed, that is, syntactically analysed and annotated. The syntactic analysis is typically given in a tree-like structure, which is why a parsed corpus is also known as a treebank.

Parallel corpora. The term refers to a collection of texts which are translations of each other.

Annotation. It refers to an extension of the text by the addition of various kinds of linguistic information.
Examples include parsing, tagging, etc. The term is often used in reference to corpora: annotated corpora, as opposed to unannotated corpora, which consist of plain text in the raw state.

Collocation. It refers to a sequence or pattern in which words appear together or co-occur.

Concordance. The term encompasses a word or phrase and its immediate context. In corpus linguistics, a concordance is used to analyse the different uses of a single word, word frequency, and phrases or idioms.

Orthography. It is the standardised writing system of a particular language and includes various rules such as spelling, capitalisation and punctuation. Orthography can pose a problem in the analysis of writing systems which use accents, because native speakers of these languages sometimes use alternative characters for the accented letters or omit the accents completely.

Token. It is an occurrence of an individual word. It plays an important role in so-called tokenisation, which involves the division of a text or collection of words into tokens. This method is often used in the study of languages which do not delimit words with spaces.

Lemmatisation. The term derives from the word lemma, which refers to a set of different forms of a single word, such as laugh and laughed. Lemmatisation is the process of grouping together the different forms of a word that belong to the same lemma.

Wildcard. It refers to special characters such as the question mark (?) or asterisk (*) which can represent a character or word.

3A perspective. It is a research method used in corpus linguistics which was introduced by S. Wallis and G. Nelson. 3A stands for annotation, abstraction and analysis.

History of Corpus Linguistics

The history of corpus linguistics is typically divided into two periods: early corpus linguistics, also known as pre-Chomsky corpus linguistics, and modern corpus linguistics. The early examples of corpus linguistics date to late 19th-century Germany. In 1897, the German linguist J.
Käding used a large corpus consisting of about 11 million words to analyse the distribution of letters and their sequences in the German language. The impressively sized corpus, comparable in size to a modern corpus, was revolutionary at the time. Other early linguists to use corpora to study language include Franz Boas (Handbook of American Indian Languages, 1911), Zellig Harris (Methods in Structural Linguistics, 1951), Charles C. Fries (The Structure of English, 1952), Leonard Bloomfield (Language, 1933), Archibald A. Hill and others, mostly American structural and field linguists. Some of them, such as Fries and A. Aileen Traver, also started to use corpora in the pedagogical study of foreign languages.

In 1961, Henry Kucera and W. Nelson Francis of Brown University started to work on the Brown University Standard Corpus of Present-Day American English, commonly known simply as the Brown Corpus, which is the first modern, electronically readable corpus. It consists of 1 million words of American English texts that are organised into 15 categories. By the modern standards of corpus linguistics the Brown Corpus is rather small; however, it is widely considered one of the most important works in the history of corpus linguistics.

But this was also the time of Chomsky's criticism of corpus linguistics, which would result in a period of decline. Chomsky rejected the use of corpora as a tool for linguistic studies, arguing that linguists must model language on competence instead of performance. And according to Chomsky, a corpus does not allow language modelling on competence. Corpus linguistics was not abandoned completely; however, it was not until the 1980s that linguists began to show an increased interest in the use of corpora for research.
The revival of corpus linguistics and its emergence in the modern form was greatly influenced by the arrival of computers and network technology in the 1980s, which allowed linguists to use electronic language samples as well as electronic tools. The use of computers, however, dates back to the early 1970s, when the Montreal French Project developed the first computerised form of spoken language, while Jan Svartvik began to work on the London-Lund Corpus with the aid of the Brown Corpus and the Survey of English Usage (SEU) at University College London. All the works mentioned from before the 1980s, as well as the early examples of corpus linguistics, paved the way to the modern study of language on the basis of corpora as we know it today. The term corpus linguistics was finally adopted after J. Aarts and W. Meijs published Corpus Linguistics: Recent Developments in the Use of Computer Corpora in English Language Research in 1984.

Resources and Methodologies for Corpus Linguistics, Corpora

The basic resource for corpus linguistics is a collection of texts, called a corpus. Corpora can be of varying sizes, are compiled for different purposes, and are composed of texts of different types. All corpora are homogeneous to a certain extent: they are composed of texts from one language, or one variety of a language, or one register, etc. They are also all heterogeneous to a certain extent, in that at the very least they are composed of a number of different texts. Most corpora contain information in addition to the texts that make them up, such as information about the texts themselves, part-of-speech tags for each word, and parsing information.

What Corpus Linguistics Does

Gives access to naturalistic linguistic information. As mentioned before, corpora consist of real-world texts which are mostly a product of real-life situations. This makes corpora a valuable research source for dialectology, sociolinguistics and stylistics.

Facilitates linguistic research.
Electronically readable corpora have dramatically reduced the time needed to find particular words or phrases. Research that would take days or even years to complete manually can be done in a matter of seconds with a high degree of accuracy.

Enables the study of wider patterns and collocation of words. Before the advent of computers, corpus linguistics studied only single words and their frequency. Modern technology allowed the study of wider patterns and collocations of words.

Allows analysis of multiple parameters at the same time. Various corpus linguistics software programmes, online resources and analytical tools allow researchers to analyse a larger number of parameters simultaneously. In addition, many corpora are enriched with various kinds of linguistic information such as annotation.

Facilitates the study of a second language. Studying a second language with the use of natural language allows students to get a better feeling for the language and to learn it as it is used in real rather than invented situations.

What Corpus Linguistics Does Not

Does not explain why. The study of corpora tells us what happened and how, but it does not tell us why; for instance, it cannot say why the frequency of a particular word has increased over time.

Does not represent the entire language. Corpus linguistics studies language by using randomly or systematically selected corpora. They typically consist of a large number of naturally occurring texts; however, they do not represent the entire language. Linguistic analyses that use the methods and tools of corpus linguistics thus do not represent the entire language either.

Searches, Software, and Methodologies

Corpora are interrogated through the use of dedicated software, the nature of which inevitably reflects assumptions about methodology in corpus investigation. At the most basic level, corpus software searches the corpus for a given target item,
counts the number of instances of the target item in the corpus and calculates relative frequencies, and displays instances of the target item so that the corpus user can carry out further investigation.

It is apparent that corpus methodologies are essentially quantitative. Indeed, corpus linguistics has been criticized for allowing only the observation of relative quantity and for failing to expand the explanatory power of linguistic theory (for discussion, see Meyer, 2002: 25). It is shown in this article that corpus linguistics can indeed enrich language theory, though only if preconceptions about what that theory consists of are allowed to change. Here, however, we leave that argument aside as we review corpus investigation software in more detail.

Corpus Linguistics and Linguistic Theory, Corpus-Based Descriptions

As has been noted, corpus linguistics is essentially a methodology or set of methodologies, rather than a theory of language description. Essentially, corpus linguistics means looking at naturally occurring language; looking at relatively large amounts of such language; observing relative frequencies, either in raw form or mediated through statistical operations; and observing patterns of association, either between a feature and a text type or between groups of words.

Reduced to its essence in this way, corpus linguistics appears to be theory neutral, although the practice of doing corpus linguistics is never neutral, as each practitioner defines what is meant by a feature and what frequencies should be observed, in line with a theoretical approach to what matters in language. Approaches to the use of a corpus that essentially rely on the existence of categories derived from noncorpus investigations of language are sometimes referred to as corpus based (Tognini-Bonelli, 2001). Studies of this kind can test hypotheses arising from grammatical descriptions based on intuition or on limited data.
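
The basic software operations described above (searching for a target item, counting it and calculating relative frequencies, displaying instances in context, and observing which words associate with it) can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular corpus tool works: the function names, the crude tokeniser and the toy text are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokeniser)."""
    return re.findall(r"[a-z]+", text.lower())


def search(tokens, target):
    """Return the positions at which the target word occurs."""
    return [i for i, tok in enumerate(tokens) if tok == target]


def relative_frequency(tokens, target, per=1_000_000):
    """Occurrences of the target, normalised to a rate per `per` tokens."""
    if not tokens:
        return 0.0
    return len(search(tokens, target)) * per / len(tokens)


def concordance(tokens, target, width=3):
    """Keyword-in-context lines: `width` tokens on either side of each hit."""
    lines = []
    for i in search(tokens, target):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        lines.append(f"{left} [{tokens[i]}] {right}".strip())
    return lines


def collocates(tokens, target, window=2):
    """Count the words that occur within `window` tokens of the target."""
    counts = Counter()
    for i in search(tokens, target):
        counts.update(tokens[max(0, i - window):i])
        counts.update(tokens[i + 1:i + 1 + window])
    return counts


if __name__ == "__main__":
    # Invented toy text standing in for a real corpus of millions of words.
    toy_text = ("The corpus is large. A corpus is a collection of texts. "
                "The texts in the corpus are real.")
    toy_tokens = tokenize(toy_text)
    print(len(search(toy_tokens, "corpus")), "hits")
    print(relative_frequency(toy_tokens, "corpus"), "per million tokens")
    for line in concordance(toy_tokens, "corpus", width=2):
        print(line)
    print(collocates(toy_tokens, "corpus", window=1).most_common(3))
```

Raw co-occurrence counts like these are the simplest possible measure of association; real corpus software usually mediates them through statistical operations, which this sketch leaves out.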
Experiments have been designed specifically to test hypotheses in this way (Nelson et al., 2002: 257-283). For example, Meyer (2002: 78) describes work on ellipsis from a typological and psycholinguistic point of view that predicts that of the three possible clause locations of ellipsis in American spoken English, one will be much more frequent than the others. A corpus study reveals this to be an accurate prediction. On the other hand, the study of pseudo-titles mentioned in the section Languages and Varieties shows how assumptions about language, in this instance about the influence of one variety of English on another, can be shown to be false. Biber et al. (1999: 7) comment that corpus-based analysis of grammatical structure can uncover characteristics that were previously unsuspected. They mention as examples of this the surprisingly high frequency of complex relative clause constructions in conversation, and the frequency of simplified grammatical constructions in academic prose.

A clearer integration between linguistic theory and corpus linguistics is demonstrated by Matthiessen's work on probability (see the section Probability). This work takes its categories from an existing description of English (Halliday's (1985) systemic functional grammar), but the corpus study was more integral to the theory, as it was the only way that statements about the probability of occurrence of each item in the system could be made with accuracy.

Corpus-Driven Descriptions

However, more radical challenges to language description can be found. Sinclair (1991, 2004) argues that the kind of patterning observable in a corpus (and nowhere else) necessitates descriptions of a markedly different kind from those commonly available. Both the descriptions and the theories that they in turn inspire are, in Tognini-Bonelli's (2001) terms, corpus driven. Some of the challenges to tradition that corpus-driven theories involve are these:
Lexis and grammar are not distinct, and grammar is not an abstract system underlying language; choice of any kind is heavily restricted by choice of lexis; and meaning is not atomistic, residing in words, but prosodic, belonging to variable units of meaning and always located in texts.

Evidence for these claims is presented in the section Observing patterned behavior above. The notion of pattern grammar focuses on the way that different lexical items behave differently in terms of how they are complemented. Grammatical generalizations about complementation cannot be made without describing that individual lexical behavior. Similarly, choice between features such as positive and negative depends to some extent on the lexical item, as some verbs (such as afford) occur in the negative much more frequently than most. In other words, the probability of any grammatical category's occurring is strongly affected not only by the register but also by the lexis used. Finally, the evidence of phraseology is that it makes more sense to see meaning as belonging to phrases than to individual words.

Findings such as these have led many writers to see a need for descriptions of language that are radically different from those currently available. Sinclair (1991, 2004) proposes, for example, that meaning be seen as belonging to units of meaning, each unit being describable in the way set out in [...]. He criticized conventional grammar for distinguishing between structures (a series of slots) and lexis (the fillers), such that it appears that any slot can be filled by any filler; there are no restrictions other than what the speaker wishes to say. This is clearly sometimes the case, and when it is, Sinclair [...]

Translation

Corpora can be used to train translators, used as a resource for practicing translators, and used as a means of studying the process of translation and the kinds of choices that translators make.
Parallel corpora are often used in these applications, and software exists that will align two corpora such that the translation of each sentence in the original text is immediately identifiable. This allows one to observe how a given word has been translated in different contexts. One interesting finding is that apparently equivalent words such as English go and Swedish gå, or English with and German mit (Viberg, 1996; Schmied and Fink, 2000), occur as translations of each other in only a minority of instances. This suggests differences in the ways those languages use the items concerned. More generally, examination of parallel corpora emphasizes that what translators translate is not the word but a larger unit (Teubert and Čermáková, 2004). Although a single word may have many equivalents when translated, a word in context may well have only one such equivalent. For example, although travail as an individual word is sometimes translated as work and sometimes as labor, the phrase travaux préparatoires is translated only as preparatory work. Thus, Teubert and Čermáková argue, travaux préparatoires and preparatory work may be considered to be equivalent translation units, whereas no such claim can be made for travaux and work.

As well as giving information about languages, corpus studies have also indicated that translated language is not the same as nontranslated language. Studies of corpora of translated texts have shown that they tend to have high incidences of very frequent words and that they tend to be more explicit in terms of grammar (Baker, 1993). They may also be influenced by the structure of the source language, as was indicated in the discussion of wh- clefts in English and Swedish in the section Languages and Varieties. In communities where people read a large number of translated texts, the foreign language, via its translations, may even influence the home language.
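
The kind of lookup that a sentence-aligned parallel corpus makes possible, observing how a given word has been translated in different contexts, can be sketched as follows. This is a hedged toy example: the three sentence pairs are invented, the alignment is assumed to have been done already, and the names ALIGNED_PAIRS, words and translations_of are hypothetical rather than taken from any real alignment tool.

```python
import re
from collections import Counter

# Invented toy data: (source sentence, translation) pairs that are
# assumed to be sentence-aligned already.
ALIGNED_PAIRS = [
    ("Le travail est fini.", "The work is finished."),
    ("Le travail des enfants est interdit.", "Child labor is forbidden."),
    ("Les travaux préparatoires ont commencé.", "The preparatory work has begun."),
]


def words(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def translations_of(pairs, source_word, candidates):
    """Count which candidate equivalents appear in the translations of
    the sentences that contain `source_word`."""
    counts = Counter()
    for source, target in pairs:
        if source_word.lower() in words(source):
            target_words = words(target)
            matched = [c for c in candidates if c in target_words]
            # Record a placeholder when none of the candidates was used.
            counts.update(matched or ["<other>"])
    return counts


if __name__ == "__main__":
    print(translations_of(ALIGNED_PAIRS, "travail", ["work", "labor"]))
    print(translations_of(ALIGNED_PAIRS, "travaux", ["work", "labor"]))
```

The sketch only checks whether a candidate word occurs somewhere in the aligned sentence; it does not align individual words, so on real data the counts would need manual inspection of the matching sentence pairs.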
Gellerstam (1996) notes that some words in Swedish have taken on the meanings of English words that look similar, and argues that this is because translators tend to translate the English word with the similar-looking Swedish word, thereby using the Swedish word with a new meaning, which then enters the language. One example is the Swedish word dramatisk, which used to indicate something relating to drama but which now, like the English word dramatic, also means substantial and surprising.

Conclusion

So every journey has its end. Ours isn't an exception. It was a long journey, but it was worth it. Corpus linguistics is a relatively new discipline, and a fast-changing one. As computer resources, particularly web-based ones, develop, sophisticated corpus investigations come within the reach of the ordinary translator, language learner, or linguist. Our understanding of the ways that types of language might vary from one another, and our appreciation of the ways that words pattern in language, have been immeasurably improved by corpus studies. Even more significant, perhaps, is the development of new theories of language that take corpus research as their starting point.

The list of used literature

1. M. A. K. Halliday. Lexicology and Corpus Linguistics.
2. Teubert and Čermáková. 2004.
3. Wallis, S. and Nelson, G. Knowledge discovery in grammatically analysed corpora. Data Mining and Knowledge Discovery, 5: 307-340. 2001.