Form No. M-3 (8)
Study Course Description

Machine Learning and Big Data

Main Study Course Information

Course Code: SL_120
Branch of Science: Mathematics; Theory of probability and mathematical statistics
ECTS: 3.00
Target Audience: Life Science
LQF: Level 7
Study Type and Form: Full-Time; Part-Time

Study Course Implementer

Course Supervisor
Structural Unit Manager
Structural Unit: Statistics Unit
Contacts: 14 Balozu street, Block A, Riga, +371 67060897, statistika@rsu.lv, www.rsu.lv/statlab

About Study Course

Objective

Machine learning (ML) is the study of algorithms that automatically extract information and induce new knowledge from data. ML tasks often involve large datasets, which create challenges in data storage, organization and processing; these challenges are addressed by the discipline of big data analytics. The aim of this course is to introduce students to the most important methods of machine learning, namely variations of regression and classification algorithms, and to introduce the concepts of deep learning and big data analytics. The methods are explored through case studies implemented in R.

Preliminary Knowledge

Higher mathematics, probability, statistics, basic knowledge of R programming.

Learning Outcomes

Knowledge

• Selects resampling methods and criteria for assessing model accuracy.
• Explains the most important regression and classification algorithms.
• Identifies the Big Data concept.

Skills

• Can independently implement regression and classification machine learning algorithms in R.
• Analytically evaluates the computational limitations of R and selects strategies to overcome them.

Competences

• Can critically compare various machine learning strategies and choose the appropriate algorithm for the problem at hand.

Assessment

Individual work

Title: Individual work
% from total grade: -
Grade: -

1. Review of compulsory and additional literature to expand the knowledge acquired in lectures and classes.
2. Students are expected to hand in four R-based computer assignments related to the course topics.

Examination

Title: Examination
% from total grade: -
Grade: -

Assessment on the 10-point scale according to the RSU Educational Order:
• Computer assignments to be handed in: 70%.
• Final exam: 30%.

Study Course Theme Plan

FULL-TIME
Part 1
  1. Lecture

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Introduction to machine learning. Assessing model accuracy, bias-variance trade-off, resampling methods (validation set approach, cross-validation and bootstrap).
  1. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

R case study: assessing the bias-variance trade-off for linear models. Setting up models with the caret library in R.
  2. Lecture

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Linear model selection: subset selection and shrinkage methods (Ridge, Lasso). Principal component regression.
  2. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Implementing regression methods in R. Comparing the performance of various regression models.
  3. Lecture

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Classification methods I: KNN, classification trees, random forests.
  3. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Implementing simple classification models in R. Comparing the performance of various models.
  4. Lecture

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Classification methods II: Ensemble methods for classification trees (bagging, boosting, XGBoost), Support Vector Machines.
  4. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Implementing classification models with ensemble methods and SVM in R. Comparing the performance of various models.
  5. Lecture

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Principles of neural networks and deep learning. Data representation via tensors, tensor operations and gradients. Layers, loss functions and optimizers.
  5. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Setting up a keras workstation. Exploring deep learning applications for regression, text classification and image classification using the keras library in R.
  6. Lecture

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Concept and history of Big Data. Limitations of R and possible solutions: parallel computing, the data.table library, Spark for R.
  6. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Setting up Spark for R. Processing large datasets with R: comparing ease of use and computation times between the base, data.table, parallel and Spark approaches.
Total ECTS (credit points): 3.00
Contact hours: 24 academic hours
Final examination: Exam (written)
PART-TIME
Part 1
  1. Lecture

Modality: On site
Location: Computer room
Contact hours: 1

Topics

Introduction to machine learning. Assessing model accuracy, bias-variance trade-off, resampling methods (validation set approach, cross-validation and bootstrap).
  1. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

R case study: assessing the bias-variance trade-off for linear models. Setting up models with the caret library in R.
  2. Lecture

Modality: On site
Location: Computer room
Contact hours: 1

Topics

Linear model selection: subset selection and shrinkage methods (Ridge, Lasso). Principal component regression.
  2. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Implementing regression methods in R. Comparing the performance of various regression models.
  3. Lecture

Modality: On site
Location: Computer room
Contact hours: 1

Topics

Classification methods I: KNN, classification trees, random forests.
  3. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Implementing simple classification models in R. Comparing the performance of various models.
  4. Lecture

Modality: On site
Location: Computer room
Contact hours: 1

Topics

Classification methods II: Ensemble methods for classification trees (bagging, boosting, XGBoost), Support Vector Machines.
  4. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Implementing classification models with ensemble methods and SVM in R. Comparing the performance of various models.
  5. Lecture

Modality: On site
Location: Computer room
Contact hours: 1

Topics

Principles of neural networks and deep learning. Data representation via tensors, tensor operations and gradients. Layers, loss functions and optimizers.
  5. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Setting up a keras workstation. Exploring deep learning applications for regression, text classification and image classification using the keras library in R.
  6. Lecture

Modality: On site
Location: Computer room
Contact hours: 1

Topics

Concept and history of Big Data. Limitations of R and possible solutions: parallel computing, the data.table library, Spark for R.
  6. Class/Seminar

Modality: On site
Location: Computer room
Contact hours: 2

Topics

Setting up Spark for R. Processing large datasets with R: comparing ease of use and computation times between the base, data.table, parallel and Spark approaches.
Total ECTS (credit points): 3.00
Contact hours: 18 academic hours
Final examination: Exam (written)

Bibliography

Required Reading

1. Chollet, F., Allaire, J.J. (2018). Deep Learning with R. Manning Publications, Shelter Island. Parts I, II and III. Suitable for English stream.

2. Luraschi, J., Kuo, K., Ruiz, E. (2019). Mastering Spark with R. O’Reilly. Chapters 1-4. Suitable for English stream.

Additional Reading

1. James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer-Verlag. Suitable for English stream.

2. Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning. Springer-Verlag. Suitable for English stream.

3. Walkowiak, S. (2016). Big Data Analytics with R: Utilize R to uncover hidden patterns in your Big Data. Packt Publishing, Birmingham. Chapters 3-7. Suitable for English stream.

4. Torgo, L. (2017). Data Mining with R: Learning with Case Studies. Chapman & Hall/CRC. Suitable for English stream.