Form No. M-3 (8)
Study Course Description

Data engineering

Main Study Course Information

Course Code
SZF_174
Branch of Science
Other social sciences
ECTS
6.00
Target Audience
Business Management; Information and Communication Science; Management Science
LQF
Level 7
Study Type And Form
Full-Time

Study Course Implementer

Course Supervisor
Structural Unit Manager
Structural Unit
Faculty of Social Sciences
Contacts

Dzirciema street 16, Rīga, szf@rsu.lv

About Study Course

Objective

This course aims to give business and project managers an understanding of the fundamentals of data engineering and its importance in modern business. Participants will learn how data flows and data processing work, which will help them plan and manage data-driven projects more successfully and understand the requirements and challenges of building and maintaining data infrastructure.

Preliminary Knowledge

To participate successfully in this course, participants should have a basic understanding of computer science and IT infrastructure, as well as basic knowledge of databases and data analysis. An understanding of business processes and of how data is used to make decisions would also be helpful. Knowledge of project management, which helps in overseeing and coordinating data projects from a business perspective, will be an advantage.

Learning Outcomes

Knowledge

1. Describe the role and responsibilities of the data engineer and analyse aspects of cooperation with IT specialists and business units.

2. Explain the structure of data flows and compare ETL and ELT processes by assessing their benefits and constraints in different contexts.

3. Analyse the structures of data storage systems and compare the suitability of SQL and NoSQL databases for different processing scenarios.

4. Explain the basic principles of batch and streaming data processing and assess their applicability to IoT data processing and telemetry analysis.

Individual work and tests

Presentation on the topic studied

5. Demonstrate understanding of the operation of distributed computing systems (Spark, Hadoop) and analyse their use in processing large amounts of data.

6. Compare the functionality of key cloud services (AWS, GCP, Azure) and evaluate their usability in different data engineering contexts.

Individual work and tests

Presentation on the topic studied

7. Describe data integration processes and identify best practices in data quality assurance to maintain accuracy and consistency.

8. Identify key tools and technologies in the data processing ecosystem and explain their role in different environments (local, cloud, etc.).

Individual work and tests

Presentation on the topic studied

9. Analyse data warehouse architecture, describe dimensional modelling, and explain the role of OLAP processes in data analysis.

10. Explain the architecture of data lakes and assess best practices in data storage and access in data lakes.

11. Demonstrate knowledge of real-time data processing technologies (Apache Kafka, Flink) and explain their suitability for telemetry data analysis.

12. Explain the planning, monitoring and implementation stages of data engineering projects and analyse the role of communication in their successful execution.

Skills

1. Skills to work with data flows, data processing and integration tools (Apache Spark, Hadoop, Apache Kafka, Airflow, etc.) and databases (MySQL, PostgreSQL, MongoDB).

2. Skills to work with cloud service platforms and use cloud infrastructure solutions to store, process, and analyse data.

3. Skills to develop and implement data quality assurance plans, such as validation and cleansing processes.

4. Skills to optimise data flows by improving performance and efficiency.

Competences

1. Ability to identify problems in data integration, storage and processing, and to offer effective solutions using appropriate technologies.

2. Ability to work effectively with other data engineers, analysts, developers, and project leaders to achieve common goals.

3. Competence to manage the data infrastructure by ensuring its efficient operation, compliance and security.

4. Ability to use up-to-date technologies and techniques, such as artificial intelligence and machine learning, to improve data processing.

Assessment

Individual work

Title
% of total grade
Grade
1.

Presentation on the topic studied

-
Test

Each student will be assigned a topic to study independently and present.

Examination

Title
% of total grade
Grade
1.

Exam

-
10 points

Study Course Theme Plan

FULL-TIME
Part 1
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Real-time data processing
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data pipelines
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Design and architecture of data warehouses
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Big data processing, distributed computing (Spark, Hadoop)
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data Processing Ecosystem
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data storage systems and databases.
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data engineering project management
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Batch vs. streaming data processing, telemetry and IoT data
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Big data processing, distributed computing (Spark, Hadoop)
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data lake structures and best practices
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data storage systems and databases.
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Big data processing, distributed computing (Spark, Hadoop)
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Batch vs. streaming data processing, telemetry and IoT data
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data lake structures and best practices
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Design and architecture of data warehouses
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data engineering project management
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Cloud computing (AWS, Google Cloud, Azure)
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Batch vs. streaming data processing, telemetry and IoT data
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Cloud computing (AWS, Google Cloud, Azure)
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data Engineer Role and Responsibilities
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data pipelines
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data storage systems and databases.
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data integration and data quality assurance
  1. Lecture

Modality
Location
Contact hours
On site
Auditorium
2

Topics

Data integration and data quality assurance
Total ECTS (Credit points):
6.00
Contact hours:
48 Academic Hours
Final Examination:
Exam

Bibliography

Required Reading

1.

Kleppmann M. 2017. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. Suitable for English stream.

2.

Akidau T., Chernyak S., Lax R. 2018. Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing. Suitable for English stream.

3.

Dutt D.G. 2019. Cloud Native Data Center Networking. Suitable for English stream.

4.

Akerkar R. 2014. Big Data: Principles and Paradigms (acceptable edition). Suitable for English stream.

5.

Krishnan K. 2013. Data Warehousing in the Age of Big Data (acceptable edition). Suitable for English stream.

Additional Reading

1.

Glass R., Callahan S. 2014. The Big Data-Driven Business: How to Use Big Data to Win Customers, Beat Competitors, and Boost Profits. Suitable for English stream.

2.

Shalender K., Singla B., Singh N., Singla R. 2025. Integrating AI with Data Science: Realising Full Potential of Data-driven Decision Making. Navigating Data Science in the Age of AI: Exploring Possibilities of Generative Intelligence, pp. 1-11. Suitable for English stream.