Categorical Data Analysis
Study Course Implementer
14 Baložu street, Block A, Riga, statistika@rsu.lv, +371 67060897
About Study Course
Objective
Preliminary Knowledge
Learning Outcomes
Knowledge
• On successful course completion, students will be familiar with a range of statistical analysis methods available for categorical data. They will know and be able to interpret both large-sample and small-sample tests.
• Students will recognise the nature of categorical data and know how to measure dependence between categorical variables based on the type of study and the type of variables (nominal or ordinal).
• Students will demonstrate how to model a binary outcome variable using continuous or categorical variables.
Skills
• Students will understand and explain the effect of different data collection methods on the random nature of the frequency table, and interpret the distributional models for the frequency table and for its rows and columns.
• Students can explain, interpret, and estimate the dependence measures defined on the joint distribution of two categorical variables (relative risk, odds ratio, etc.).
• Students can test the goodness of fit of data to an assumed distributional model and test the independence of categorical variables.
• Students can model a categorical variable (a binary variable as a special case) using other variables.
• Students can independently apply their knowledge to real data.
Competences
• On successful course completion, students will be competent to read and critically assess scientific publications that use categorical data in their analysis.
• Students will be competent to plan and execute data analysis with categorical data.
Assessment
Individual work
| Title | % from total grade | Grade |
|---|---|---|
| 1. Individual work | - | - |

1. Individual work with the course material in preparation for lectures, according to the plan.
2. Independently prepared homework assignments that practise the concepts studied in the course.
In order to evaluate the quality of the study course as a whole, the student should fill out the study course evaluation questionnaire on the Student Portal.
Examination
| Title | % from total grade | Grade |
|---|---|---|
| 1. Examination | - | - |

Assessment on the 10-point scale according to the RSU Educational Order:
• Two independent homework assignments – 50%.
• Attendance and active participation during practical classes – 25%.
• Final written exam – 25%.
Study Course Theme Plan
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |
| Topics | The nature of categorical data. Classification by purpose and scale. Types of studies. Probability distributions. Overdispersion. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |
| Topics | Joint distribution of categorical variables. Conditional and marginal distributions. Maximum likelihood estimates of probabilities. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |
| Topics | Independence. Measures of dependence: relative risk, odds, odds ratio. 2x2 frequency tables. Estimates from a frequency table. Conditional probabilities – sensitivity, specificity. True negative, false positive. |
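As an illustration of the measures named in the topics above, the following minimal Python sketch estimates relative risk, odds ratio, sensitivity, and specificity from 2x2 frequency tables. The counts and the function names (`dependence_measures`, `sensitivity_specificity`) are hypothetical and invented for this example; they are not course data or part of any course software.

```python
# Minimal sketch: point estimates from 2x2 frequency tables.
# All counts below are hypothetical, chosen only for illustration.

def dependence_measures(a, b, c, d):
    """Relative risk and odds ratio for the table
                    outcome yes   outcome no
    exposed              a            b
    unexposed            c            d
    """
    risk_exposed = a / (a + b)       # conditional probability of outcome given exposure
    risk_unexposed = c / (c + d)     # conditional probability of outcome given no exposure
    return {
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
    }

def sensitivity_specificity(tp, fn, fp, tn):
    """Conditional probabilities for a diagnostic test.
    tp/fn: diseased subjects testing positive/negative
    fp/tn: healthy subjects testing positive/negative
    """
    sensitivity = tp / (tp + fn)     # P(test positive | diseased)
    specificity = tn / (tn + fp)     # P(test negative | healthy)
    return sensitivity, specificity

measures = dependence_measures(a=30, b=70, c=10, d=90)
sens, spec = sensitivity_specificity(tp=45, fn=5, fp=20, tn=130)
print(measures)
print(round(sens, 3), round(spec, 3))
```

With these hypothetical counts the risk is 0.30 among the exposed and 0.10 among the unexposed, so the estimated relative risk is about 3, while the odds ratio (about 3.86) is further from 1, as is typical when the outcome is not rare.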
- Class/Seminar
| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |
| Topics | Introduction to "Jamovi". Visualising categorical data. Comparing distributions. Frequency tables. Conditional frequencies. Estimating measures of dependence. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |
| Topics | Tables larger than 2 x 2. Measures of dependence for ordinal and nominal data. Hypotheses about the population distribution. Chi-square test. |
- Class/Seminar
| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |
| Topics | Measures of dependence for ordinal and nominal data. Hypotheses about independence and conditional independence. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |
| Topics | Large-sample case: hypotheses about independence. Chi-square and likelihood ratio tests. Small-sample case: Fisher’s exact test. |
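The tests of independence listed in the topics above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the course's own implementation: the table is hypothetical, the helper names are invented here, the p-value helper is valid only for 1 degree of freedom (a 2x2 table), and the two-sided Fisher p-value uses one common convention (summing all tables no more probable than the observed one).

```python
import math

def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi2(table):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def likelihood_ratio_g2(table):
    """Likelihood ratio statistic: 2 * sum of observed * log(observed / expected)."""
    expected = expected_counts(table)
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0  # a zero cell contributes nothing to the sum
    )

def chi2_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-9))

table = [[3, 1], [1, 3]]  # hypothetical small-sample table
chi2 = pearson_chi2(table)
g2 = likelihood_ratio_g2(table)
p_fisher = fisher_exact_two_sided(table)
print(chi2, g2, chi2_p_value_df1(chi2), p_fisher)
```

For this table every expected count is 2, far below the usual rule of thumb for the large-sample approximation, which is why the exact p-value (34/70, about 0.49) differs noticeably from the chi-square approximation (about 0.16).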
- Class/Seminar
| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |
| Topics | Hypotheses about independence and conditional independence. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |
| Topics | Asymptotic distribution of multinomial frequencies. Confidence intervals for the odds ratio and relative risk. Testing homogeneity of marginal distributions in the case of paired observations. |
- Class/Seminar
| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |
| Topics | Interval estimation for measures of dependence. McNemar test. |
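Interval estimation and the McNemar test from the topics above can be illustrated with the sketch below. It assumes the standard large-sample (Wald) intervals on the log scale and the McNemar statistic without continuity correction; other variants exist and may be the ones used in class. All counts and function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a 95% interval

def odds_ratio_ci(a, b, c, d, z=Z_95):
    """Wald interval for the odds ratio, built on the log scale.
    Table layout: first row (a, b), second row (c, d)."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

def relative_risk_ci(a, b, c, d, z=Z_95):
    """Wald interval for the relative risk, built on the log scale."""
    log_rr = math.log((a / (a + b)) / (c / (c + d)))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return math.exp(log_rr - z * se), math.exp(log_rr + z * se)

def mcnemar(b, c):
    """McNemar test for paired binary observations.
    b and c are the two discordant counts (yes/no and no/yes pairs).
    Returns the statistic and its chi-square (1 df) p-value."""
    statistic = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical unpaired 2x2 table: exposed (30, 70), unexposed (10, 90).
or_low, or_high = odds_ratio_ci(30, 70, 10, 90)
rr_low, rr_high = relative_risk_ci(30, 70, 10, 90)

# Hypothetical paired data: 25 pairs changed one way, 15 the other.
stat, p = mcnemar(b=25, c=15)
print((or_low, or_high), (rr_low, rr_high), stat, p)
```

Only the discordant pairs enter the McNemar statistic: pairs with the same response on both occasions carry no information about a difference between the two marginal distributions.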
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |
| Topics | Models for a binary outcome variable: logit and log-linear models. Models for retrospective studies. Decision trees for classification. |
- Class/Seminar
| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |
| Topics | Modelling categorical data. Classified data and raw data. Modelling and decision trees. Classification error. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |
| Topics | The nature of categorical data. Classification by purpose and scale. Types of studies. Probability distributions. Overdispersion. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |
| Topics | Joint distribution of categorical variables. Conditional and marginal distributions. Maximum likelihood estimates of probabilities. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |
| Topics | Independence. Measures of dependence: relative risk, odds, odds ratio. 2x2 frequency tables. Estimates from a frequency table. Conditional probabilities – sensitivity, specificity. True negative, false positive. |
- Class/Seminar
| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |
| Topics | Introduction to "Jamovi". Visualising categorical data. Comparing distributions. Frequency tables. Conditional frequencies. Estimating measures of dependence. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |
| Topics | Tables larger than 2 x 2. Measures of dependence for ordinal and nominal data. Hypotheses about the population distribution. Chi-square test. |
- Class/Seminar
| Modality | Location | Contact hours |
|---|---|---|
| On site | Pool | 2 |
| Topics | Measures of dependence for ordinal and nominal data. Hypotheses about independence and conditional independence. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |
| Topics | Large-sample case: hypotheses about independence. Chi-square and likelihood ratio tests. Small-sample case: Fisher’s exact test. |
- Class/Seminar
| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |
| Topics | Hypotheses about independence and conditional independence. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |
| Topics | Asymptotic distribution of multinomial frequencies. Confidence intervals for the odds ratio and relative risk. Testing homogeneity of marginal distributions in the case of paired observations. |
- Class/Seminar
| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |
| Topics | Interval estimation for measures of dependence. McNemar test. |
- Lecture
| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |
| Topics | Models for a binary outcome variable: logit and log-linear models. Models for retrospective studies. Decision trees for classification. |
- Class/Seminar
| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |
| Topics | Modelling categorical data. Classified data and raw data. Modelling and decision trees. Classification error. |
Bibliography
Required Reading
Agresti, Alan. Categorical Data Analysis. Wiley, 2012 (or 1990, 2002 editions).
Additional Reading
Agresti, Alan. An Introduction to Categorical Data Analysis. Wiley, 2019 (or 1996, 2007 editions).