Form No. M-3 (8)

Study Course Description

Data engineering

Main Study Course Information

Course Code

SZF_174

Branch of Science

Other social sciences

ECTS

6.00

Target Audience

Business Management; Information and Communication Science; Management Science

LQF

Level 7

Study Type And Form

Full-Time

Study Course Implementer

Course Supervisor

Linda Alksne

Structural Unit Manager

Ieva Puzo

Structural Unit

Faculty of Social Sciences

Contacts

Dzirciema street 16, Rīga, szf@rsu.lv

About Study Course

Objective

This course aims to give business and project managers an understanding of the fundamentals of data engineering and its importance in modern business. Participants will learn how data flows and data processing work, which will help them plan and manage data-driven projects more successfully and understand the requirements and challenges of building and maintaining data infrastructure.

Preliminary Knowledge

To participate successfully in this course, participants should have a basic understanding of computer science and IT infrastructure, as well as basic knowledge of databases and data analysis. An understanding of business processes and of how data is used to make decisions would also be helpful. Knowledge of project management, which helps in overseeing and coordinating data projects from a business perspective, is an advantage.

Learning Outcomes

Knowledge

1. Describe the role and responsibilities of the data engineer and analyse aspects of cooperation with IT specialists and business units.

2. Explain the structure of data flows and compare ETL and ELT processes by assessing their benefits and constraints in different contexts.

3. Analyse the structures of data storage systems and compare the suitability of SQL and NoSQL databases for different processing scenarios.

4. Explain the basic principles of batch and streaming data processing and assess their applicability to IoT data processing and telemetry analysis situations.

Individual work and tests

Presentation on the topic studied

5. Demonstrate understanding of the operation of distributed computing systems (Spark, Hadoop) and analyse their use in processing large amounts of data.

6. Compare the functionality of key cloud services (AWS, GCP, Azure) and evaluate their usability in different data engineering contexts.

Individual work and tests

Presentation on the topic studied

7. Describe data integration processes and identify best practices in data quality assurance to maintain accuracy and consistency.

8. Identify key tools and technologies in the data processing ecosystem and explain their role in different environments (local, cloud, etc.).

Individual work and tests

Presentation on the topic studied

9. Analyse data warehouse architecture, describe dimensional modeling, and explain the role of OLAP processes in data analysis.

10. Explain the architecture of data lakes and assess best practices in data storage and access in data lakes.

11. Demonstrate knowledge of real-time data processing technologies (Apache Kafka, Flink) and explain their suitability for telemetry data analysis.

12. Explain the planning, monitoring and implementation stages of data engineering projects and analyse the role of communication in their successful execution.

Skills

1. Work with data flows, data processing and integration tools (Apache Spark, Hadoop, Apache Kafka, Airflow, etc.) and databases (MySQL, PostgreSQL, MongoDB).

2. Work with cloud service platforms and use cloud infrastructure solutions to store, process, and analyse data.

3. Develop and implement data quality assurance plans, including data validation and cleansing processes.

4. Optimize data flows by improving performance and efficiency.

Competences

1. Ability to identify problems in data integration, storage and processing, and to offer effective solutions using appropriate technologies.

2. Ability to work effectively with other data engineers, analysts, developers, and project leaders to achieve common goals.

3. Ability to manage the data infrastructure by ensuring its efficient operation, compliance and security.

4. Ability to use up-to-date technologies and techniques such as artificial intelligence and machine learning to improve data processing.

Assessment

Individual work

Title	% from total grade	Grade
1. Presentation on the topic studied	-	Test
Each student is assigned a topic to study independently and present.

Examination

Title	% from total grade	Grade
1. Exam	-	10 points

Study Course Theme Plan

FULL-TIME

Part 1

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Real-time data processing

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data pipelines

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Design and architecture of data warehouses

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Big data processing, distributed computing (Spark, Hadoop)

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data Processing Ecosystem

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data storage systems and databases

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data engineering project management

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Batch vs. streaming data processing, telemetry and IoT data

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Big data processing, distributed computing (Spark, Hadoop)

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data lake structures and best practices

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data storage systems and databases

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Big data processing, distributed computing (Spark, Hadoop)

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Batch vs. streaming data processing, telemetry and IoT data

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data lake structures and best practices

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Design and architecture of data warehouses

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data engineering project management

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Cloud computing (AWS, Google Cloud, Azure)

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Batch vs. streaming data processing, telemetry and IoT data

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Cloud computing (AWS, Google Cloud, Azure)

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data Engineer Role and Responsibilities

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data pipelines

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data storage systems and databases

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data integration and data quality assurance

Lecture

Modality	Location	Contact hours
On site	Auditorium	2

Topics

Data integration and data quality assurance

Total ECTS (Credit points):

6.00

Contact hours:

48 Academic Hours

Final Examination:

Exam

Bibliography

Required Reading

Kleppmann M. 2017. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. Suitable for English stream.

Akidau T., Chernyak S., Lax R. 2018. Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing. Suitable for English stream.

Dutt D.G. 2019. Cloud Native Data Center Networking. Suitable for English stream.

Akerkar R. 2014. Big Data: Principles and Paradigms (acceptable edition). Suitable for English stream.

Krishnan K. 2013. Data Warehousing in the Age of Big Data (acceptable edition). Suitable for English stream.

Additional Reading

Glass R., Callahan S. 2014. The Big Data-Driven Business: How to Use Big Data to Win Customers, Beat Competitors, and Boost Profits. Suitable for English stream.

Shalender K., Singla B., Singh N., Singla R. 2025. Integrating AI with Data Science: Realising Full Potential of Data-driven Decision Making. Navigating Data Science in the Age of AI: Exploring Possibilities of Generative Intelligence, pp. 1-11. Suitable for English stream.
