Data Analytics, Machine Learning, and Applications in Health Care
Study Course Implementer
Riga, Kronvalda boulevard 9, +371 67338307
About Study Course
Objective
To acquire in-depth knowledge and skills in specific methods of mathematical statistics and modern data science for use in studies and in work in a public health specialty, and to promote the learning and practical application of data science terminology.
Preliminary Knowledge
Research methodology; basic principles of statistics; mathematics (preferable): logarithms and differentials; computer skills; health data types and elements.
Learning Outcomes
Knowledge
1. As a result of successfully completing the study course, students will:
- Recognise the terminology of time series analysis and its use;
- Be familiar with the functionality offered by OxMetrics for time series analysis;
- Know how to formulate, develop and deploy regression and classification models using the KNIME platform.
Skills
1. As a result of successfully completing the study course, students will be able:
- To open, create, and edit time series data in OxMetrics;
- To correctly prepare a descriptive model of a univariate series using the OxMetrics platform;
- To correctly prepare a descriptive model of a multivariate series using the OxMetrics platform;
- To open and prepare data for the development of regression and classification models using the KNIME platform;
- To set up and execute regression and classification model procedures on the KNIME platform;
- To assess the validity of models using the KNIME platform;
- To identify, using the KNIME platform, the main factors of a model and the form of their impact;
- To explain model deployment and monitoring practices;
- To write a description of the methods used and the results obtained.
Competences
1. As a result of successfully completing the study course, students will be able:
- To correctly interpret and evaluate the use of time series models in the public health sector;
- To plan, set up and evaluate regression and classification models using healthcare data.
Assessment
Individual work

| Title | % from total grade | Grade |
|---|---|---|
| 1. Individual work | - | - |

Individual work with course materials: preparation for classes according to the thematic plan; completion of homework.
Examination

| Title | % from total grade | Grade |
|---|---|---|
| 1. Examination | - | - |

Active participation in lectures and practical classes.
Checking of submitted homework.
- Examination at the end of the study course, testing knowledge of terminology and methods and their practical application: 40%;
- Practical tasks: 30%;
- Test work: 30%.

For each missed class: a summary of the topic using the specified literature (min. one A4 page).
Study Course Theme Plan
- Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Time series data specifics. Examples. Univariate time series analysis: trend, stability, seasonality, extreme values. Univariate time series forecasting models. Trends and seasonality factors. Differencing. One-time factors. Autoregression. Model quality criteria. Notation. Forecasting interval.
- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Time series data specifics. Examples. Univariate time series analysis: trend, stability, seasonality, extreme values. Univariate time series forecasting models. Trends and seasonality factors. Differencing. One-time factors. Autoregression. Model quality criteria. Notation. Forecasting interval.
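As a rough illustration of two of the topics above, differencing and autoregression, the following sketch uses plain Python rather than OxMetrics, which the course itself relies on. The data are synthetic and the function names `difference` and `fit_ar1` are hypothetical, chosen only for this example.

```python
import random

def difference(series, lag=1):
    """Return the lag-differenced series: y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1] + e[t]; returns (c, phi)."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi

# Synthetic monthly counts (not course data): linear trend plus AR(1) noise.
rng = random.Random(0)
noise = [0.0]
for _ in range(199):
    noise.append(0.6 * noise[-1] + rng.gauss(0, 1))
counts = [100 + 0.5 * t + e for t, e in enumerate(noise)]

diffed = difference(counts)        # first differences remove the linear trend
c, phi = fit_ar1(noise)            # estimate of the autoregressive coefficient
forecast = c + phi * noise[-1]     # one-step-ahead point forecast of the noise term
```

After differencing, the series fluctuates around the trend slope (0.5 here) instead of growing, which is the usual reason for differencing before fitting an autoregressive model.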
- Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Multivariate time series data models. Structural and forecasting models. Simultaneous effects. Time-delayed effects. Missing effects. Spurious (false) correlation. Logarithmic formulation.
- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Multivariate time series data models. Structural and forecasting models. Simultaneous effects. Time-delayed effects. Missing effects. Spurious (false) correlation. Logarithmic formulation.
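Two of the topics above, time-delayed effects and false correlation, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the course's OxMetrics workflow: the data are synthetic, and `pearson`, `lagged_correlation` and `diff` are hypothetical helper names.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def lagged_correlation(x, y, lag):
    """Correlation between x[t - lag] and y[t]."""
    return pearson(x[:len(x) - lag], y[lag:])

def diff(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(1)
n = 300

# Time-delayed effect: y responds to x two periods later (synthetic data).
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.0, 0.0] + [0.8 * x[t - 2] + rng.gauss(0, 0.5) for t in range(2, n)]
by_lag = {lag: lagged_correlation(x, y, lag) for lag in range(5)}
best_lag = max(by_lag, key=by_lag.get)

# False correlation: two unrelated series that only share an upward trend.
a = [0.5 * t + rng.gauss(0, 1) for t in range(n)]
b = [0.2 * t + rng.gauss(0, 1) for t in range(n)]
r_levels = pearson(a, b)                # inflated by the common trend
r_changes = pearson(diff(a), diff(b))   # trend removed by differencing
```

Comparing `r_levels` with `r_changes` shows why a high correlation between two trending series is weak evidence of a relationship on its own.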
- Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Multivariate time series data model as a simulation platform. Time series models compared to traditional SIR-type epidemiological models.
- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Multivariate time series data model as a simulation platform. Time series models compared to traditional SIR-type epidemiological models.
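For readers unfamiliar with the SIR-type models mentioned above, here is a minimal sketch of the classic susceptible-infected-recovered model, integrated with small Euler steps. The parameter values are hypothetical and are not taken from the course, and `simulate_sir` is a name invented for this example.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Euler integration of the SIR equations; returns one (S, I, R) tuple per day."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _day in range(days):
        for _step in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history

# Hypothetical parameters: basic reproduction number beta / gamma = 3.
history = simulate_sir(population=10_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][1])
peak_infected = history[peak_day][1]
```

Unlike a time series model, which is estimated from observed data, this model derives the epidemic curve from assumed transmission and recovery rates, and that difference is the basis of the comparison named in the topic.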
- Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Traditional regression model compared to the random forest data science algorithm. Decision tree structure. Preparation of data. Model parameters. Comparative diagnostics of models.
- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Traditional regression model compared to the random forest data science algorithm. Decision tree structure. Preparation of data. Model parameters. Comparative diagnostics of models.
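The contrast between a traditional regression and a tree-based model can be sketched with a single-split regression tree, the building block that a random forest averages many times over. This is not a random forest and not the KNIME workflow used in the course; the data and the names `fit_line`, `fit_stump` and `mse` are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def fit_stump(x, y):
    """Two-leaf regression tree: pick the threshold with the lowest squared error."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [b for a, b in zip(x, y) if a <= threshold]
        right = [b for a, b in zip(x, y) if a > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((b - left_mean) ** 2 for b in left)
               + sum((b - right_mean) ** 2 for b in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def mse(predictions, y):
    """Mean squared error."""
    return sum((p - b) ** 2 for p, b in zip(predictions, y)) / len(y)

# Invented data: a risk score that jumps at age 65 (a threshold effect).
ages = list(range(40, 90, 2))
risk = [1.0 if age < 65 else 3.0 for age in ages]

intercept, slope = fit_line(ages, risk)
threshold, left_mean, right_mean = fit_stump(ages, risk)

line_mse = mse([intercept + slope * a for a in ages], risk)
stump_mse = mse([left_mean if a <= threshold else right_mean for a in ages], risk)
```

On data with a threshold effect the tree recovers the jump exactly while the straight line cannot; on a smooth linear relationship the ranking would be reversed, which is why comparative diagnostics are needed.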
- Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.
- Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.
- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.
- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Classification models: synchronous and diachronous classification. Decision trees and decision tree ensembles. GBM and XGBoost algorithms. Diagnostics and interpretation of classification models.
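Diagnostics of a classification model usually start from the confusion matrix. The sketch below computes it, together with sensitivity, specificity, precision and accuracy, from predicted probabilities; it is independent of how the probabilities were produced (for example by a gradient boosting model). The labels and probabilities are invented, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels and predicted probabilities."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def ratio(numerator, denominator):
    """Safe division: NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def diagnostics(y_true, y_prob, threshold=0.5):
    """Common diagnostic measures derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "sensitivity": ratio(tp, tp + fn),   # share of true cases detected
        "specificity": ratio(tn, tn + fp),   # share of non-cases correctly cleared
        "precision": ratio(tp, tp + fp),     # share of alarms that are true cases
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }

# Invented example: 1 = event occurred, probabilities from some fitted model.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1, 0.3, 0.05, 0.8, 0.45]

at_default = diagnostics(y_true, y_prob)                 # threshold 0.5
at_lower = diagnostics(y_true, y_prob, threshold=0.4)    # more sensitive, less specific
```

Lowering the threshold trades specificity for sensitivity, a trade-off that matters in health care wherever missed cases and false alarms carry different costs.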
- Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Deployment of predictive models. Structuring of data. The problem of future information (data leakage) and its prevention. Monitoring of data and models. Model corrections.
- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Deployment of predictive models. Structuring of data. The problem of future information (data leakage) and its prevention. Monitoring of data and models. Model corrections.
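One common way to prevent future information from entering a model is to fix a cutoff date, build every predictor only from records before it, and take the outcome only from the window after it. The sketch below assumes a hypothetical event log; the patients, dates, field names and the `build_row` helper are all invented for illustration.

```python
from datetime import date

# Hypothetical event log: (patient_id, event_date, event_type).
events = [
    ("p1", date(2021, 3, 1), "visit"),
    ("p1", date(2021, 11, 20), "visit"),
    ("p1", date(2022, 2, 5), "hospitalisation"),
    ("p2", date(2021, 6, 15), "visit"),
    ("p2", date(2022, 7, 1), "visit"),
]

def build_row(patient_id, events, cutoff, horizon_end):
    """Predictors use only events before the cutoff; the label uses only the window after it."""
    past = [e for e in events if e[0] == patient_id and e[1] < cutoff]
    future = [e for e in events
              if e[0] == patient_id and cutoff <= e[1] < horizon_end]
    return {
        "patient_id": patient_id,
        "visits_before_cutoff": sum(1 for e in past if e[2] == "visit"),
        "hospitalised_in_horizon": int(any(e[2] == "hospitalisation" for e in future)),
    }

cutoff = date(2022, 1, 1)        # the moment the prediction would be made
horizon_end = date(2023, 1, 1)   # end of the one-year outcome window
rows = [build_row(pid, events, cutoff, horizon_end) for pid in ("p1", "p2")]
```

The visit by `p2` in July 2022 is deliberately excluded from the predictors: it would not be known at the cutoff, so counting it would leak future information into the model.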
- Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Model engineering. Problem of missing values and its correction. Derivation of new variables. Filtering potential predictors and optimising model parameters. AutoML procedure.
- Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Model engineering. Problem of missing values and its correction. Derivation of new variables. Filtering potential predictors and optimising model parameters. AutoML procedure.
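A simple treatment of missing values is median imputation combined with a "was missing" indicator, with the median learned from the training data only. The sketch below also derives one new variable. It is one possible approach among many, not necessarily the one taught in the course; the BMI values, the cut-off of 30 and the helper names are assumptions made for this example.

```python
from statistics import median

def fit_imputer(values):
    """Learn the fill value (median of the observed values) from training data."""
    return median(v for v in values if v is not None)

def impute(values, fill):
    """Replace missing values and return an indicator of where they were."""
    filled = [fill if v is None else v for v in values]
    was_missing = [int(v is None) for v in values]
    return filled, was_missing

# Invented body mass index values; None marks a missing measurement.
train_bmi = [22.0, None, 31.5, 27.0, None, 24.5]
new_bmi = [None, 29.0]

fill = fit_imputer(train_bmi)                      # learned once, on training data
train_filled, train_missing = impute(train_bmi, fill)
new_filled, new_missing = impute(new_bmi, fill)    # same fill value reused later

# Derived variable: a binary flag computed from the imputed measurement.
train_obese = [int(v >= 30) for v in train_filled]
```

Keeping the indicator lets a model learn whether missingness itself is informative, and reusing the training median on new data avoids recomputing statistics at prediction time.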
- Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Demonstration of a model's contribution in practice. Specifics, planning and statistical analysis of the experimental method and field experiments. Detection of causal improvement factors.
- Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Demonstration of a model's contribution in practice. Specifics, planning and statistical analysis of the experimental method and field experiments. Detection of causal improvement factors.
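The statistical analysis of a simple randomised field experiment can be sketched as a two-proportion z-test comparing an outcome rate between a control group and an intervention group. The counts below are hypothetical, the test is one standard choice rather than the course's prescribed method, and `two_proportion_z_test` is a name invented here.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Return (difference in proportions, z statistic, two-sided p-value)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value

# Hypothetical experiment: readmissions among 1000 control patients
# versus 1000 patients whose care was guided by a predictive model.
difference, z, p_value = two_proportion_z_test(
    events_a=120, n_a=1000,   # control group
    events_b=90, n_b=1000,    # intervention group
)
```

Because assignment to the groups is random, a difference of this kind can be read as a causal effect of the intervention, which observational model diagnostics alone cannot establish.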
Bibliography
Required Reading
Timothy L. Wiemken and Robert R. Kelley. Machine Learning in Epidemiology and Health Outcomes Research. Annual Review of Public Health, 2020, 41(1), 21-36. Suitable for English stream
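Uģis Sprūdžs. Sirds un asinsrites slimību mirstības riska prognoze nākamajam gadam no anonimizētiem Latvijas veselības aprūpes sistēmas datiem: XGBoost mašīnmācīšanās algoritma iespējamības pārbaude [Predicting next-year mortality risk from heart and circulatory diseases using anonymised Latvian healthcare system data: a feasibility test of the XGBoost machine learning algorithm]. Akadēmiskā Dzīve (lu.lv), 2023, 59, 88-94. Suitable for English stream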
Additional Reading
Bradley Efron, Trevor Hastie. Computer Age Statistical Inference, Student Edition: Algorithms, Evidence, and Data Science. Cambridge University Press, 2021. Suitable for English stream