Corpus Analysis
Study Course Implementer
Dzirciema street 16, Rīga, szf@rsu.lv
About Study Course
Objective
Preliminary Knowledge
Learning Outcomes
Knowledge
1. Students are able to recognize the principles of meaning, the basic differences between paradigms of meaning, and the principles of building and using language corpora.
Skills
1. Students can use the Sketch Engine programme to find word frequencies, concordances and collocations, extract statistical data on language use, and interpret the data qualitatively and quantitatively.
Competences
1. Students formulate a problem in the field of their professional interests as a problem of language use, advance a hypothesis, extract and interpret quantitative and qualitative data, and formulate conclusions and recommendations.
Assessment
Individual work

| Title | % from total grade | Grade |
|---|---|---|
| 1. Individual work | - | - |

Interpretation of meaning of essentially contested notions in the student's major discipline using Latvian language corpora.
In order to evaluate the quality of the study course as a whole, the student must fill out the study course evaluation questionnaire on the Student Portal.
Examination

| Title | % from total grade | Grade |
|---|---|---|
| 1. Examination | - | - |

Use of social science terms and/or key words of the student's field of specialization in political, legal, media and everyday public discourse.
The report includes the following sections:
- the formulation of a research problem related to the use of language in some area of public life (justice, journalism, politics, health care, economy, public relations, education) (1/2 page);
- a list of key words (notions, terms, concepts) and the justification for their choice (1-3 words);
- statistical hypotheses about the use of key words in corpora and subcorpora (3-5 hypotheses);
- description of the corpora selected for the study and justification of choice (1/2 page);
- description of Sketch Engine queries (for one keyword, if the queries for the other words are similar) (1/2 page);
- summary of statistical data in tables (frequency, normalized frequency, relative frequency, t-value, MI-value, logDice, statistical significance in the z-test, statistical significance in Pearson's χ² test, Cramér's V, as the research requires) and a description of the formulas used;
- interpretation of concordance (up to 1 page) with examples of query results;
- interpretation of statistical data (up to 1/2 page);
- integration of qualitative and quantitative data in conclusions (1/2 page).
Report volume: 4-5 pages of text, plus 1-3 pages of example tables and concordances.

| Title | % from total grade | Grade |
|---|---|---|
| 2. Examination | - | - |

Knows the corpora, identifies a research problem, formulates a hypothesis, selects key words, extracts and interprets collocations and concordances, extracts and processes statistical data, formulates conclusions. Independent work, participation in classes, exam.
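
The frequency and association measures listed for the examination report (normalized frequency, MI-value, t-value, logDice) can be illustrated with a minimal Python sketch. It uses the commonly given definitions of these scores; the function names and all counts are hypothetical, and the formulas should be checked against the Sketch Engine statistics documentation before being described in a report.

```python
import math


def per_million(count: int, corpus_size: int) -> float:
    """Normalized frequency: occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


def mi_score(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """MI: log2 of observed co-occurrence over the count expected by chance."""
    return math.log2(f_xy * n / (f_x * f_y))


def t_score(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """t-score: (observed - expected) / sqrt(observed)."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice: 14 + log2(2 * f_xy / (f_x + f_y)); does not use corpus size."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts: keyword x, collocate y, and their co-occurrences
# in a corpus of n tokens. These numbers are invented for illustration.
n, f_x, f_y, f_xy = 10_000_000, 4_200, 1_500, 90

print("per million:", per_million(f_x, n))
print("MI:", mi_score(f_xy, f_x, f_y, n))
print("t-score:", t_score(f_xy, f_x, f_y, n))
print("logDice:", log_dice(f_xy, f_x, f_y))
```

Because logDice does not depend on corpus size, it is the score that can be compared across corpora and subcorpora of different sizes, whereas MI and t-score values are tied to the corpus in which they were computed.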
Study Course Theme Plan
- Video Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: The concept of corpus. Statistical regularity of language use. Sketch Engine programme.

- Video Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Word frequency. Data normalization and relative frequency.

- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Statement of a language use research problem, selection of key words.

- Video Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Word frequency (cont.). Concordance.

- Video Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Collocation. Statistical analysis (MI-, MS-, t-scores).

- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Defining the meaning of words: dictionary and corpus.

- Video Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Collocation (cont.). Statistical analysis (logDice).

- Video Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Corpus statistics (Pearson's chi-squared test, Cramér's V, residuals).

- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Collocation of the researched words: qualitative and quantitative analysis.

- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Presentation and discussion of the research paper.

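
The corpus statistics named in the theme plan (Pearson's chi-squared test, Cramér's V, residuals) can be sketched in plain Python for a contingency table of keyword counts. This is only an illustration under stated assumptions: the function name and the counts are hypothetical, every expected count is assumed to be greater than zero, and the residuals shown are Pearson residuals, (observed - expected) / sqrt(expected).

```python
import math


def chi_squared_summary(observed: list[list[float]]):
    """Return (chi2, degrees of freedom, Cramér's V, Pearson residuals)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    chi2 = 0.0
    residuals = []
    for i, row in enumerate(observed):
        residual_row = []
        for j, obs in enumerate(row):
            # Count expected if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            residual = (obs - expected) / math.sqrt(expected)
            residual_row.append(residual)
            chi2 += residual ** 2  # chi2 is the sum of squared residuals
        residuals.append(residual_row)

    rows, cols = len(observed), len(observed[0])
    dof = (rows - 1) * (cols - 1)
    cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))
    return chi2, dof, cramers_v, residuals


# Hypothetical table: rows are two subcorpora, columns are
# (occurrences of the keyword, all other tokens). Invented numbers.
table = [
    [120, 999_880],
    [80, 1_999_920],
]

chi2, dof, v, residuals = chi_squared_summary(table)
print("chi2:", chi2, "df:", dof)
print("Cramér's V:", v)
print("residuals:", residuals)
```

To decide statistical significance, the chi-squared value is compared with the critical value for the given degrees of freedom (for example 3.841 at the 0.05 level with one degree of freedom). With large corpora even small differences become significant, which is why Cramér's V is reported alongside the test as a measure of effect size.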
Bibliography
Required Reading
Kruks, S., I. Skulte. (2016). „Politikas izzušana Saeimas diskursā”. Latvijas Zinātņu akadēmijas Vēstis: 49-56.
Kruks, S. (2020). “Uzticības, sadarbības un vienotības konceptu izpratne Nacionālajā attīstības plānā 2021.-2027. gadam”. Akadēmiskā Dzīve 56: 131-147.
McEnery, Tony and Andrew Hardie. (2012). Corpus Linguistics. Method, Theory and Practice. Oxford: Oxford University Press.
Paquot, M. and S. Gries. (2020). Practical Handbook of Corpus Linguistics. Springer.
Additional Reading
Barczewska, Shala. (2017). Corpus-Based Analysis of US Press Discourse. Cambridge Scholars Publisher.
Cunningham, Clark D. and Jesse Egbert. (2020). Analyzing legal discourse in the United States. Pp. 462-480 in Friginal E., Hardy J. (eds) The Routledge Handbook of Corpus Approaches to Discourse Analysis. London: Routledge.
Darģis, R., G. Rābante-Buša, I. Auziņa, S. Kruks. (2016). „ParliSearch – a system for large text corpus discourse analysis”. Pp. 115-121 in I. Skadiņa, R. Rozis (eds) Human Language Technologies – The Baltic Perspective. IOS Press.
Gerring, J. (1999). "What Makes a Concept Good? A Criterial Framework for Understanding Concept Formation in the Social Sciences". Polity, Vol. 31, No. 3 (Spring), pp. 357–393.
Kaal, B., I. Maks and A. van Elfrinkhof. (2014). From Text to Political Positions. Amsterdam: John Benjamins.
Marmor, Andrei and Scott Soames. (2011). Philosophical Foundations of Language and Law. Oxford: Oxford University Press.