Form No. M-3 (8)

Study Course Description

Data Analytics, Machine Learning, and Applications in Health Care

Main Study Course Information

Course Code

SVI_002

Branch of Science

Mathematics; Theory of probability and mathematical statistics

ECTS

5.00

Target Audience

Health Management; Public Health

LQF

Level 7

Study Type And Form

Full-Time

Study Course Implementer

Course Supervisor

Uģis Kārlis Sprūdžs

Structural Unit Manager

Daiga Behmane

Structural Unit

Health Management Teaching Group

Contacts

Riga, Kronvalda boulevard 9, +371 67338307

About Study Course

Objective

To acquire in-depth knowledge and skills in specific methods of mathematical statistics and current data science for use in studies and in work in a public health specialty, and to promote the learning and practical application of data science terminology.

Preliminary Knowledge

Research methodology; basic principles of statistics; mathematics (preferable): logarithms and differentials; computer skills; health data types and elements.

Learning Outcomes

Knowledge

1. As a result of successfully completing the study course, students will:
- recognise the terminology of time series analysis and its use;
- be familiar with the functionality offered by OxMetrics for time series analysis;
- know how to formulate, develop and deploy regression and classification models using the KNIME platform.

Skills

1. As a result of successfully completing the study course, students will be able:
- to open, create, and edit time series data in OxMetrics;
- to correctly prepare a descriptive model of a univariate series using the OxMetrics platform;
- to correctly prepare a descriptive model of a multivariate series using the OxMetrics platform;
- to open and prepare data for the development of regression and classification models using the KNIME platform;
- to set up and execute regression and classification model procedures on the KNIME platform;
- to assess the validity of models using the KNIME platform;
- to identify the main factors of a model and the form of their impact using the KNIME platform;
- to explain model deployment and monitoring practices;
- to write a description of the methods used and the results obtained.

Competences

1. As a result of successfully completing the study course, students will be able:
- to correctly interpret and evaluate the use of time series models in the public health sector;
- to plan, set up and evaluate regression and classification models using healthcare data.

Assessment

Individual work

Title	% from total grade	Grade
1. Individual work	-	-
Individual work with course materials: preparation for classes according to the thematic plan, and completion of homework.

Examination

Title	% from total grade	Grade
1. Examination	-	-
Active participation in lectures and practical classes. Submitted homework is checked. The final grade consists of: an examination at the end of the study course testing knowledge of terminology and methods and their practical application (40%); practical tasks (30%); test work (30%). For each missed class, students submit a summary of the topic based on the specified literature (min. one A4 page).

Study Course Theme Plan

FULL-TIME

Part 1

Lecture

Modality	Location	Contact hours
On site	Computer room	2

Topics

Time series data specifics. Examples. Univariate time series analysis: trend, stability, seasonality, extreme values. Univariate time series forecasting models. Trends and seasonality factors. Differencing. One-time factors. Autoregression. Model quality criteria. Notation. Forecasting interval.
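Two of the topics above, differencing and autoregression, can be illustrated with a minimal sketch. It assumes Python purely for illustration (the course itself works in OxMetrics); the function names and the numbers are hypothetical.

```python
# Illustrative only: hypothetical helper names and made-up data.

def difference(series, lag=1):
    """Return the lag-differenced series y[t] - y[t-lag].

    lag=1 removes a linear trend; lag=12 on monthly data removes
    a stable yearly seasonal pattern.
    """
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x = series[:-1]          # lagged values y[t-1]
    y = series[1:]           # current values y[t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


# Made-up monthly counts with a linear trend.
counts = [100, 103, 106, 109, 112, 115]
diffs = difference(counts)   # constant increments once the trend is removed

# Made-up series generated exactly by y[t] = 2 + 0.5 * y[t-1].
ar_series = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(ar_series)
```

On real data the fitted coefficients would be accompanied by the model quality criteria and forecast intervals listed above, which this sketch does not compute.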

Class/Seminar

Modality	Location	Contact hours
On site	Computer room	2

Topics

Lecture

Modality	Location	Contact hours
On site	Computer room	2

Topics

Multivariate time series data models. Structural and forecasting models. Simultaneous effects. Time-delayed effects. Missing effects. Spurious correlation. Logarithmic formulation.

Class/Seminar

Modality	Location	Contact hours
On site	Computer room	2

Topics

Multivariate time series data models. Structural and forecasting models. Simultaneous effects. Time-delayed effects. Missing effects. Spurious correlation. Logarithmic formulation.

Lecture

Modality	Location	Contact hours
On site	Computer room	2

Topics

Multivariate time series data model as a simulation platform. Time series models compared to traditional SIR type epidemiological models.

Class/Seminar

Modality	Location	Contact hours
On site	Computer room	2

Topics

Multivariate time series data model as a simulation platform. Time series models compared to traditional SIR type epidemiological models.
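For reference when comparing against time series models, a SIR-type model can be sketched as a simple discrete-time simulation. This is a minimal illustration assuming Python and a basic Euler step; the function name, parameter values and population sizes are hypothetical and not taken from the course materials.

```python
# Illustrative only: a basic discrete-time (Euler) SIR simulation.

def simulate_sir(s0, i0, r0, beta, gamma, steps, dt=1.0):
    """Simulate susceptible, infected and recovered counts.

    beta  : transmission rate per time unit (assumed value)
    gamma : recovery rate per time unit (assumed value)
    Returns a list of (S, I, R) tuples of length steps + 1.
    """
    n = s0 + i0 + r0                      # total population, held constant
    s, i, r = float(s0), float(i0), float(r0)
    path = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        path.append((s, i, r))
    return path


# Made-up scenario: 1000 people, 10 initially infected.
path = simulate_sir(990, 10, 0, beta=0.3, gamma=0.1, steps=100)
peak_infected = max(i for _, i, _ in path)
```

Unlike a fitted time series model, the dynamics here are fixed by the assumed structure and the two rate parameters rather than estimated from lagged observations.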

Lecture

Modality	Location	Contact hours
On site	Computer room	2

Topics

Traditional regression model compared to the random forest data science algorithm. Decision tree structure. Preparation of data. Model parameters. Comparative diagnostics of models.

Class/Seminar

Modality	Location	Contact hours
On site	Computer room	2

Topics

Traditional regression model compared to the random forest data science algorithm. Decision tree structure. Preparation of data. Model parameters. Comparative diagnostics of models.
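The contrast between a regression line and a decision tree split can be shown on a small example. The sketch below assumes Python for illustration (the course itself uses the KNIME platform) and implements only a single split, the building block of the trees in a random forest, not a random forest itself. All names and data are hypothetical.

```python
# Illustrative only: one regression-tree split versus a straight-line fit.

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best rule 'x <= threshold'."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue                      # no threshold between equal x values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [yy for _, yy in pairs[:k]]
        right = [yy for _, yy in pairs[k:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


def linear_sse(x, y):
    """Residual sum of squares of the ordinary least-squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data with a step-shaped relationship.
age = [30, 40, 50, 60, 70, 80]
cost = [1, 1, 1, 5, 5, 5]
threshold, tree_sse = best_split(age, cost)
line_sse = linear_sse(age, cost)
```

On step-shaped data like this the single split fits exactly while the straight line leaves residual error; on a genuinely linear relationship the comparison would favour the regression line.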

Lecture

Modality	Location	Contact hours
On site	Computer room	2

Topics

Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.

Lecture

Modality	Location	Contact hours
On site	Computer room	2

Topics

Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.

Class/Seminar

Modality	Location	Contact hours
On site	Computer room	2

Topics

Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.

Class/Seminar

Modality	Location	Contact hours
On site	Computer room	2

Topics

Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.
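Diagnostics of a binary classification model usually start from the confusion matrix. The following is a minimal sketch assuming Python for illustration (the course itself uses the KNIME platform); the function names and labels are hypothetical, and the ratios assume that both classes occur in the data.

```python
# Illustrative only: basic diagnostics for a binary classifier.

def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for labels coded 1 = event, 0 = no event."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def diagnostics(actual, predicted):
    """Sensitivity, specificity, precision and accuracy as a dict."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return {
        "sensitivity": tp / (tp + fn),   # share of events detected
        "specificity": tn / (tn + fp),   # share of non-events correctly cleared
        "precision": tp / (tp + fp),     # share of alarms that were events
        "accuracy": (tp + tn) / len(actual),
    }


# Made-up labels for eight patients.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
result = diagnostics(actual, predicted)
```

When events are rare, as is common with health outcomes, accuracy alone can look high even for a model that misses most events, which is why sensitivity and precision are reported alongside it.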

Lecture

Modality	Location	Contact hours
On site	Computer room	2

Topics

Deployment of predictive models. Structuring of data. Problem of future information and its prevention. Monitoring of data and models. Model corrections.

Class/Seminar

Modality	Location	Contact hours
On site	Computer room	2

Topics

Deployment of predictive models. Structuring of data. Problem of future information and its prevention. Monitoring of data and models. Model corrections.
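The problem of future information can be illustrated with two simple checks: every feature must have been recorded no later than the date of the prediction, and the test data must lie later in time than the training data. This is a minimal sketch assuming Python for illustration; the record layout, field names and dates are hypothetical.

```python
# Illustrative only: guarding against future information in model data.
from datetime import date


def leaks_future_information(record):
    """True if any feature was recorded after the prediction date."""
    return any(d > record["prediction_date"] for d in record["feature_dates"])


def temporal_split(records, cutoff):
    """Train on predictions made before the cutoff, test on the rest."""
    train = [r for r in records if r["prediction_date"] < cutoff]
    test = [r for r in records if r["prediction_date"] >= cutoff]
    return train, test


# Made-up records; record 2 uses a feature dated after its prediction date.
records = [
    {"id": 1, "prediction_date": date(2021, 1, 1),
     "feature_dates": [date(2020, 6, 1), date(2020, 11, 15)]},
    {"id": 2, "prediction_date": date(2022, 1, 1),
     "feature_dates": [date(2021, 3, 1), date(2022, 2, 10)]},
    {"id": 3, "prediction_date": date(2023, 1, 1),
     "feature_dates": [date(2022, 12, 1)]},
]

clean = [r for r in records if not leaks_future_information(r)]
train, test = temporal_split(clean, cutoff=date(2022, 6, 1))
```

A random split of the same records would let the model be evaluated on cases that precede its training data, which tends to overstate how well it will perform once deployed.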

Lecture

Modality	Location	Contact hours
On site	Computer room	2

Topics

Model engineering. Problem of missing values and its correction. Derivation of new variables. Filtering potential predictors and optimising model parameters. AutoML procedure.

Class/Seminar

Modality	Location	Contact hours
On site	Computer room	2

Topics

Model engineering. Problem of missing values and its correction. Derivation of new variables. Filtering potential predictors and optimising model parameters. AutoML procedure.
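One common correction for missing values is to fill them with the median and to derive a new indicator variable that records where the value was missing. This is a minimal sketch assuming Python for illustration (the course itself uses the KNIME platform); the function name and measurements are hypothetical, and median imputation is only one of several possible approaches.

```python
# Illustrative only: median imputation plus a derived missingness indicator.
from statistics import median


def impute_with_indicator(values):
    """Replace None with the median of the observed values.

    Returns (imputed_values, missing_flags), where missing_flags is a
    derived 0/1 variable marking the originally missing entries.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    missing_flags = [1 if v is None else 0 for v in values]
    return imputed, missing_flags


# Made-up laboratory measurements with two missing entries.
lab_values = [5.0, None, 7.0, 6.0, None]
imputed, was_missing = impute_with_indicator(lab_values)
```

In a deployed model the fill value would be computed on the training data only and then reused for new data, so that no information from later records enters the model.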

Lecture

Modality	Location	Contact hours
On site	Computer room	2

Topics

Demonstrating a model's contribution in practice. Specifics, planning and statistical analysis of the experimental method and field experiments. Detection of causal improvement factors.

Lecture

Modality	Location	Contact hours
On site	Computer room	2

Topics

Demonstrating a model's contribution in practice. Specifics, planning and statistical analysis of the experimental method and field experiments. Detection of causal improvement factors.
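The statistical analysis of a small randomised field experiment can be sketched with an exact permutation test on the difference in group means. This is a minimal illustration assuming Python; the permutation test is one possible analysis chosen here for simplicity, not necessarily the method taught in the course, and the outcome values are made up.

```python
# Illustrative only: exact one-sided permutation test for a two-group experiment.
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(treated, control):
    """Share of all possible group assignments whose difference in means
    (treated minus control) is at least as large as the observed one."""
    pooled = treated + control
    observed = mean(treated) - mean(control)
    count = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        chosen = set(idx)
        group_a = [pooled[k] for k in idx]
        group_b = [pooled[k] for k in range(len(pooled)) if k not in chosen]
        if mean(group_a) - mean(group_b) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total


# Made-up outcomes: units randomly assigned to use the model or not.
treated = [12, 14, 15]
control = [9, 10, 11]
p_value = permutation_p_value(treated, control)
```

Because assignment is randomised, a small p-value supports a causal reading of the improvement; with observational data the same difference could reflect how the groups were selected.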

Total ECTS (Credit points):

5.00

Contact hours:

36 Academic Hours

Final Examination:

Exam (Written)

Bibliography

Required Reading

Timothy L. Wiemken and Robert R. Kelley. Machine Learning in Epidemiology and Health Outcomes Research. Annual Review of Public Health, 2020, 41(1), 21-36. Suitable for English stream

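Uģis Sprūdžs. Sirds un asinsrites slimību mirstības riska prognoze nākamajam gadam no anonimizētiem Latvijas veselības aprūpes sistēmas datiem: XGBoost mašīnmācīšanās algoritma iespējamības pārbaude [Next-year mortality risk forecast for cardiovascular diseases from anonymised Latvian health care system data: a feasibility test of the XGBoost machine learning algorithm]. Akadēmiskā Dzīve (lu.lv), 2023, 59, 88-94. Suitable for English stream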

KNIME. Suitable for English stream

Additional Reading

Bradley Efron, Trevor Hastie. Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science. Cambridge University Press, 2021. Suitable for English stream

Jurgen Doornik. An Introduction to OxMetrics 9. Timberlake Press, 2021. Suitable for English stream
