Form No. M-3 (8)
Study Course Description

Data Analytics, Machine Learning, and Applications in Health Care

Main Study Course Information

Course Code
SVI_002
Branch of Science
Mathematics; Theory of probability and mathematical statistics
ECTS
5.00
Target Audience
Health Management; Public Health
LQF
Level 7
Study Type And Form
Full-Time

Study Course Implementer

Course Supervisor
Structural Unit Manager
Structural Unit
Health Management Teaching Group
Contacts

Riga, Kronvalda boulevard 9, +371 67338307

About Study Course

Objective

To acquire in-depth knowledge and skills in specific methods of mathematical statistics and current data science for use in studies and in work in a public health specialty, and to promote the learning and practical application of data science terminology.

Preliminary Knowledge

Research methodology; basic principles of statistics; mathematics (preferable): logarithms and differentials; computer skills; health data types and elements.

Learning Outcomes

Knowledge

1. As a result of successfully completing the study course, students will:
- recognise the terminology of time series analysis and its use;
- be familiar with the functionality offered by OxMetrics for time series analysis;
- know how to formulate, develop and deploy regression and classification models using the KNIME platform.

Skills

1. As a result of successfully completing the study course, students will be able:
- To open, create, and edit time series data in OxMetrics;
- To correctly prepare a descriptive model of a univariate series using the OxMetrics platform;
- To correctly prepare a descriptive model of a multivariate series using the OxMetrics platform;
- To open and prepare data for the development of regression and classification models using the KNIME platform;
- To set up and execute regression and classification model procedures on the KNIME platform;
- To assess the validity of models using the KNIME platform;
- To identify, using the KNIME platform, the main factors of a model and the form of their impact;
- To explain model deployment and monitoring practices;
- To describe the methods used and the results obtained.

Competences

1. As a result of successfully completing the study course, students will be able:
- To correctly interpret and evaluate the use of time series models in the public health sector;
- To plan, set up and evaluate regression and classification models using healthcare data.

Assessment

Individual work

Title
% from total grade
Grade
1.

Individual work

-
-
Individual work with course materials: preparation for classes according to the thematic plan, and completion of homework.

Examination

Title
% from total grade
Grade
1.

Examination

-
-
Active participation in lectures and practical classes; submitted homework is checked. The final grade consists of: an end-of-course examination testing knowledge of terminology and methods and their practical application (40%); practical tasks (30%); test work (30%). For each missed class, the student submits a summary of the topic based on the specified literature (minimum one A4 page).

Study Course Theme Plan

FULL-TIME
Part 1
  1. Lecture

Modality
Location
Contact hours
On site
Computer room
2

Topics

Time series data specifics. Examples. Univariate time series analysis: trend, stability, seasonality, extreme values. Univariate time series forecasting models. Trends and seasonality factors. Differencing. One-time factors. Autoregression. Model quality criteria. Notation. Forecasting interval.
  1. Class/Seminar

Modality
Location
Contact hours
On site
Computer room
2

Topics

Time series data specifics. Examples. Univariate time series analysis: trend, stability, seasonality, extreme values. Univariate time series forecasting models. Trends and seasonality factors. Differencing. One-time factors. Autoregression. Model quality criteria. Notation. Forecasting interval.
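The differencing and autoregression topics above can be illustrated outside OxMetrics with a minimal Python sketch. It is not course material: the function names (`difference`, `fit_ar1`, `forecast_ar1`) and the weekly counts are hypothetical, and the AR(1) coefficients are estimated by plain least squares rather than by the estimation procedures OxMetrics provides.

```python
def difference(series, lag=1):
    """Return the differenced series y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def fit_ar1(series):
    """Estimate c and phi in y[t] = c + phi * y[t-1] by ordinary least squares."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted AR(1) equation forward from the last observation."""
    forecasts, value = [], last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts


# Made-up weekly case counts with an upward trend (illustrative only).
counts = [100, 104, 109, 113, 118, 122, 127, 131]
changes = difference(counts)  # first differences remove the linear trend
c, phi = fit_ar1(changes)
next_changes = forecast_ar1(changes[-1], c, phi, steps=2)
print(changes, round(c, 3), round(phi, 3), next_changes)
```

Differencing first and fitting the autoregression to the changes mirrors the order of the topics: trend removal, then a model for what remains.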
  1. Lecture

Modality
Location
Contact hours
On site
Computer room
2

Topics

Multivariate time series data models. Structural and forecasting models. Simultaneous effects. Time-delayed effects. Missing effects. Spurious correlation. Logarithmic formulation.
  1. Class/Seminar

Modality
Location
Contact hours
On site
Computer room
2

Topics

Multivariate time series data models. Structural and forecasting models. Simultaneous effects. Time-delayed effects. Missing effects. Spurious correlation. Logarithmic formulation.
  1. Lecture

Modality
Location
Contact hours
On site
Computer room
2

Topics

Multivariate time series data model as a simulation platform. Time series models compared to traditional SIR-type epidemiological models.
  1. Class/Seminar

Modality
Location
Contact hours
On site
Computer room
2

Topics

Multivariate time series data model as a simulation platform. Time series models compared to traditional SIR-type epidemiological models.
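For reference, an SIR-type compartmental model of the kind mentioned above can be sketched in a few lines of Python. This is an illustrative assumption, not the course's own implementation: the function name `simulate_sir`, the parameter values, and the simple Euler time-stepping are all chosen only for the example.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Simulate a closed-population SIR model with Euler steps.

    beta  - transmission rate per day (assumed value in the example below)
    gamma - recovery rate per day (assumed value in the example below)
    Returns one (S, I, R) tuple per day, including day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = int(round(1 / dt))
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical population of 1000 with 10 initial infections.
trajectory = simulate_sir(990, 10, 0, beta=0.3, gamma=0.1, days=30)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print("day with most infected in the simulated window:", peak_day)
```

Unlike a time series model fitted to observed counts, this model fixes the mechanism (contact and recovery) in advance and leaves only `beta` and `gamma` to be chosen or estimated.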
  1. Lecture

Modality
Location
Contact hours
On site
Computer room
2

Topics

Traditional regression model compared to the random forest data science algorithm. Decision tree structure. Preparation of data. Model parameters. Comparative diagnostics of models.
  1. Class/Seminar

Modality
Location
Contact hours
On site
Computer room
2

Topics

Traditional regression model compared to the random forest data science algorithm. Decision tree structure. Preparation of data. Model parameters. Comparative diagnostics of models.
  1. Lecture

Modality
Location
Contact hours
On site
Computer room
2

Topics

Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.
  1. Lecture

Modality
Location
Contact hours
On site
Computer room
2

Topics

Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.
  1. Class/Seminar

Modality
Location
Contact hours
On site
Computer room
2

Topics

Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.
  1. Class/Seminar

Modality
Location
Contact hours
On site
Computer room
2

Topics

Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.
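As a small illustration of classification model diagnostics, the sketch below computes a confusion matrix and three common measures in plain Python. The function names and the labels are hypothetical; in the course these diagnostics are produced on the KNIME platform, whose nodes are not reproduced here.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = event)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def diagnostics(actual, predicted):
    """Return sensitivity, specificity and precision (None if undefined)."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    ratio = lambda num, den: num / den if den else None
    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true events detected
        "specificity": ratio(tn, tn + fp),  # share of non-events correctly cleared
        "precision": ratio(tp, tp + fp),    # share of alarms that were real events
    }


# Made-up outcomes and model predictions for eight patients.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
print(confusion_counts(actual, predicted), diagnostics(actual, predicted))
```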
  1. Lecture

Modality
Location
Contact hours
On site
Computer room
2

Topics

Deployment of predictive models. Structuring of data. Problem of future information and its prevention. Monitoring of data and models. Model corrections.
  1. Class/Seminar

Modality
Location
Contact hours
On site
Computer room
2

Topics

Deployment of predictive models. Structuring of data. Problem of future information and its prevention. Monitoring of data and models. Model corrections.
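One common way to prevent the problem of future information is to split data by time, so that a model is trained only on records dated before the period it is evaluated on. The following Python sketch assumes records stored as dictionaries with a `date` field; the field names, the helper names and the sample records are hypothetical.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Put records dated before the cutoff in training, the rest in testing."""
    train = [r for r in records if r["date"] < cutoff]
    test = [r for r in records if r["date"] >= cutoff]
    return train, test


def has_future_information(train, test):
    """True if any training record is dated on or after the earliest test record."""
    if not train or not test:
        return False
    return max(r["date"] for r in train) >= min(r["date"] for r in test)


# Made-up visit records (illustrative only).
records = [
    {"date": date(2022, 3, 1), "visits": 2},
    {"date": date(2022, 11, 15), "visits": 5},
    {"date": date(2023, 2, 10), "visits": 1},
    {"date": date(2023, 6, 30), "visits": 4},
]
train, test = temporal_split(records, cutoff=date(2023, 1, 1))
print(len(train), len(test), has_future_information(train, test))
```

A random split of the same records could place 2023 data in the training set and 2022 data in the test set, which is the situation the check is meant to flag.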
  1. Lecture

Modality
Location
Contact hours
On site
Computer room
2

Topics

Model engineering. Problem of missing values and its correction. Derivation of new variables. Filtering potential predictors and optimising model parameters. AutoML procedure.
  1. Class/Seminar

Modality
Location
Contact hours
On site
Computer room
2

Topics

Model engineering. Problem of missing values and its correction. Derivation of new variables. Filtering potential predictors and optimising model parameters. AutoML procedure.
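A minimal sketch of one possible correction for missing values, combined with a derived variable, is shown below. It assumes median imputation with a missing-value indicator as the derived variable; the function names and the data are hypothetical, and other correction methods may be preferred in practice.

```python
from statistics import median


def fit_median(train_values):
    """Compute the fill value from the observed training values only."""
    observed = [v for v in train_values if v is not None]
    return median(observed)


def impute(values, fill):
    """Replace missing values and derive a flag marking where they were."""
    filled = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return filled, was_missing


# Made-up laboratory measurements; None marks a missing value.
train_values = [5.0, None, 7.0, 6.0]
new_values = [None, 4.0]

fill = fit_median(train_values)
filled, was_missing = impute(new_values, fill)
print(fill, filled, was_missing)
```

The fill value is computed from the training data and then reused for new data, so the correction itself does not bring in information from the period being predicted.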
  1. Lecture

Modality
Location
Contact hours
On site
Computer room
2

Topics

Demonstrating a model's contribution in practice. Specifics, planning and statistical analysis of the experimental method and field experiments. Detection of causal improvement factors.
  1. Lecture

Modality
Location
Contact hours
On site
Computer room
2

Topics

Demonstrating a model's contribution in practice. Specifics, planning and statistical analysis of the experimental method and field experiments. Detection of causal improvement factors.
Total ECTS (Credit points):
5.00
Contact hours:
36 Academic Hours
Final Examination:
Exam (Written)

Bibliography

Required Reading

1.

Timothy L. Wiemken and Robert R. Kelley. Machine Learning in Epidemiology and Health Outcomes Research. Annual Review of Public Health, 2020, 41(1), 21-36. Suitable for English stream

2.

Uģis Sprūdžs. Sirds un asinsrites slimību mirstības riska prognoze nākamajam gadam no anonimizētiem Latvijas veselības aprūpes sistēmas datiem: XGBoost mašīnmācīšanās algoritma iespējamības pārbaude. Akadēmiskā Dzīve (lu.lv), 2023, 59, 88-94. Suitable for English stream

3.

KNIME. Suitable for English stream

Additional Reading

1.

Bradley Efron, Trevor Hastie. Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science. Cambridge University Press, 2021. Suitable for English stream

2.

Jurgen Doornik. An Introduction to OxMetrics 9. Timberlake Press, 2021. Suitable for English stream