Machine Learning and Big Data
Study Course Implementer
14 Balozu street, Block A, Riga, +371 67060897, statistika@rsu.lv, www.rsu.lv/statlab
About Study Course
Objective
Preliminary Knowledge
Learning Outcomes
Knowledge
• Selects resampling methods and criteria for assessing model accuracy.
• Explains the most important regression and classification algorithms.
• Identifies the Big Data concept.
Skills
• Independently implements regression and classification machine learning algorithms in R.
• Analytically evaluates the computational limitations of R and selects strategies to overcome them.
Competences
• Critically compares various machine learning strategies and chooses the appropriate algorithm for the problem at hand.
Assessment
Individual work
| Title | % of total grade | Grade |
|---|---|---|
| 1. Individual work | - | - |

1. Review of compulsory and additional literature to expand the knowledge acquired in lectures and classes.
2. Students are expected to hand in four R-based computer assignments related to course topics.
Examination
| Title | % of total grade | Grade |
|---|---|---|
| 1. Examination | - | - |

Assessment on the 10-point scale according to the RSU Educational Order:
• Computer assignments to be handed in – 70%.
• Final exam – 30%.
Study Course Theme Plan
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Introduction to machine learning. Assessing model accuracy, bias-variance trade-off, resampling methods (validation set approach, cross-validation and bootstrap).
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: R case study: assessing the bias-variance trade-off for linear models. Setting up models with the caret library in R.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Linear model selection: subset selection and shrinkage methods (Ridge, Lasso). Principal component regression.
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Implementing regression methods in R. Comparing the performance of various regression models.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Classification methods I: KNN, classification trees, random forests.
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Implementing simple classification models in R. Comparing the performance of various models.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Classification methods II: Ensemble methods for classification trees (bagging, boosting, XGBoost), Support Vector Machines.
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Implementing classification models with ensemble methods and SVM in R. Comparing the performance of various models.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Principles of neural networks and deep learning. Data representation via tensors, tensor operations and gradient. Layers, loss functions and optimizers.
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Setting up a keras workstation. Exploring deep learning applications for regression, text and image classification using the keras library in R.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Concept and history of Big Data. Limitations of R and possible solutions: parallel computing, data.table library, Spark for R.
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Setting up Spark for R. Analysing large data processing with R: comparing ease of use and computation times between base, data.table, parallel and Spark approaches.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 1 |

Topics: Introduction to machine learning. Assessing model accuracy, bias-variance trade-off, resampling methods (validation set approach, cross-validation and bootstrap).
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: R case study: assessing the bias-variance trade-off for linear models. Setting up models with the caret library in R.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 1 |

Topics: Linear model selection: subset selection and shrinkage methods (Ridge, Lasso). Principal component regression.
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Implementing regression methods in R. Comparing the performance of various regression models.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 1 |

Topics: Classification methods I: KNN, classification trees, random forests.
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Implementing simple classification models in R. Comparing the performance of various models.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 1 |

Topics: Classification methods II: Ensemble methods for classification trees (bagging, boosting, XGBoost), Support Vector Machines.
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Implementing classification models with ensemble methods and SVM in R. Comparing the performance of various models.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 1 |

Topics: Principles of neural networks and deep learning. Data representation via tensors, tensor operations and gradient. Layers, loss functions and optimizers.
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Setting up a keras workstation. Exploring deep learning applications for regression, text and image classification using the keras library in R.
Lecture

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 1 |

Topics: Concept and history of Big Data. Limitations of R and possible solutions: parallel computing, data.table library, Spark for R.
Class/Seminar

| Modality | Location | Contact hours |
|---|---|---|
| On site | Computer room | 2 |

Topics: Setting up Spark for R. Analysing large data processing with R: comparing ease of use and computation times between base, data.table, parallel and Spark approaches.
Bibliography
Required Reading
Chollet, F., Allaire, J.J. (2018) Deep Learning with R. Manning Publications, Shelter Island. Parts I, II and III. Suitable for English stream
Luraschi, J., Kuo, K., Ruiz, E. (2019) Mastering Spark with R. O’Reilly. Chapters 1 – 4. Suitable for English stream
Additional Reading
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R. Springer-Verlag. Suitable for English stream
Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning. Springer-Verlag. Suitable for English stream
Walkowiak, S. (2016) Big Data Analytics with R: Utilize R to uncover hidden patterns in your Big Data. Packt Publishing, Birmingham. Chapters 3 – 7. Suitable for English stream
Torgo, L. (2017) Data Mining with R: Learning with Case Studies. Chapman & Hall/CRC. Suitable for English stream