Multivariate Statistics
Study Course Implementer
14 Balozu street, Block A, Riga, +371 67060897, statistika@rsu.lv, www.rsu.lv/statlab
About Study Course
Objective
The aim of the course is to introduce the tools and concepts of multivariate data analysis, with a strong focus on applications in R.
Preliminary Knowledge
Higher mathematics, probability, statistics, linear models, basic knowledge of R programming.
Learning Outcomes
Knowledge
1. The student:
• has gained in-depth knowledge of the theoretical probabilistic concepts underlying multivariate analysis;
• illustrates visualization techniques for describing multivariate data;
• assesses the most important multivariate techniques, such as principal components analysis, factor analysis, cluster analysis and discriminant analysis.
Skills
1. The student:
• implements appropriate multivariate data visualizations in R;
• can independently apply multivariate data analysis techniques in R to carry out research activities or highly qualified professional functions.
Competences
1. The student:
• can compare and understand the aims of various multivariate data analysis methods and choose the most appropriate one for the data set at hand;
• can generate hypotheses and make analysis-based decisions related to multivariate data.
Assessment
Individual work

| Title | % from total grade | Grade |
|---|---|---|
| 1. Individual work | - | - |

1. Review of compulsory and additional literature to expand the knowledge acquired in lectures and classes.
2. Students are expected to prepare five R-based home assignments, one on each of the following topics:
a. Principal components analysis (2nd practical class).
b. Factor analysis (3rd practical class).
c. Discriminant analysis (4th practical class).
d. Cluster analysis (5th practical class).
e. Multivariable linear regression (6th practical class).
Examination

| Title | % from total grade | Grade |
|---|---|---|
| 1. Examination | - | - |

Assessment on the 10-point scale according to the RSU Educational Order:
• 5 home assignments to be handed in – 70%.
• Written exam – 30%.
Study Course Theme Plan
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Introduction to multivariate analysis and examples of multivariate datasets. Covariance, correlation and the multivariate normal distribution. Review of basic linear algebra: determinants, inverses, eigenvalues and eigenvectors, decompositions and quadratic forms.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Visualizing multivariate data in R. Practicing matrix algebra calculations in R.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Principal components (PC) analysis. A geometrical approach to reducing the dimension of a data matrix. Definition, interpretation and inference on principal components. Normalized principal components.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Analysis of a real-world data example in R: calculating PCs, determining statistical significance, drawing plots to interpret PCs.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Factor analysis. Orthogonal factor model. Interpretation of factors. Test for the number of common factors. Comparison with PC analysis.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Factor analysis in R: estimating the factor model, testing the number of factors, drawing plots to interpret factors.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Discriminant analysis. Classes, labels and classification accuracy measures. Linear and quadratic discriminant analysis.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Discriminant analysis in R: estimation, interpretation, comparison of methods.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Cluster analysis. Proximity between objects, distance functions. Various clustering algorithms.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Cluster analysis in R: implementing and comparing various clustering algorithms. Determining the optimal number of clusters.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 2 |

Topics: Multivariable linear regression. Multivariate normality tests.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Multivariable linear regression in R: estimation, testing and interpretation.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |

Topics: Introduction to multivariate analysis and examples of multivariate datasets. Covariance, correlation and the multivariate normal distribution. Review of basic linear algebra: determinants, inverses, eigenvalues and eigenvectors, decompositions and quadratic forms.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Visualizing multivariate data in R. Practicing matrix algebra calculations in R.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |

Topics: Principal components (PC) analysis. A geometrical approach to reducing the dimension of a data matrix. Definition, interpretation and inference on principal components. Normalized principal components.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Analysis of a real-world data example in R: calculating PCs, determining statistical significance, drawing plots to interpret PCs.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |

Topics: Factor analysis. Orthogonal factor model. Interpretation of factors. Test for the number of common factors. Comparison with PC analysis.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Factor analysis in R: estimating the factor model, testing the number of factors, drawing plots to interpret factors.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |

Topics: Discriminant analysis. Classes, labels and classification accuracy measures. Linear and quadratic discriminant analysis.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Discriminant analysis in R: estimation, interpretation, comparison of methods.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |

Topics: Cluster analysis. Proximity between objects, distance functions. Various clustering algorithms.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Cluster analysis in R: implementing and comparing various clustering algorithms. Determining the optimal number of clusters.

Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Auditorium | 1 |

Topics: Multivariable linear regression. Multivariate normality tests.

Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Multivariable linear regression in R: estimation, testing and interpretation.

Bibliography
Required Reading
W. K. Härdle, L. Simar. Applied Multivariate Statistical Analysis. Springer, 2015. (Suitable for English stream)
D. Zelterman. Applied Multivariate Statistics with R. Springer, Statistics for Biology and Health series, 2015. (Suitable for English stream)
Additional Reading
R. A. Johnson, D. W. Wichern. Applied Multivariate Statistical Analysis, 6th edition. Prentice Hall, 2007. (Suitable for English stream)
T. Hothorn, B. Everitt. An Introduction to Applied Multivariate Analysis with R. Springer, Use R! series, 2011. (Suitable for English stream)