Programming 1, Programming 2, Software Engineering Lab, Mathematics for Computer Scientists 1, as well as Fundamentals of Algorithms and Data Structures (all recommended)
Successful participation in the exercises/project entitles the student to take part in the final exam.
The grade is determined from performance in exams, exercises, and (optionally) practical tasks. The exact modalities will be announced at the beginning of the module.
2 h lectures + 2 h tutorial = 4 h (weekly)
60 h of classes + 120 h private study = 180 h (= 6 ECTS)
The lecture covers the fundamental concepts of data management and data analysis in Big Data Engineering.
As part of the exercises, students can carry out a project during the semester. This can be, for example, a social network in the style of Facebook, or any other project in which data management techniques can be practiced (e.g., natural science data, image data, or other web applications). The project is first modeled as an E/R diagram and then implemented as a database schema. It is then extended to manage and analyze unstructured data as well. In this way, all fundamental techniques for managing and analyzing data are demonstrated on a single project.
1. Introduction and Classification
   - Classification and delimitation: "Big Data"
   - Value of data: the gold of the 21st century
   - Importance of database systems
   - What is data?
   - Modeling vs. reality
   - Costs of inadequate modeling
   - Using a database system vs. developing one yourself
   - Positive examples of apps
   - Requirements
   - References
   - Lecture mode
2. Data Modeling
   - Motivation
   - E/R model
   - Relational model
   - Domains, attributes
   - Entity type vs. entity
   - Relationship type vs. relationship
   - Hierarchical data
   - Keys, foreign keys
   - Inheritance
   - Redundancy, normalization, denormalization
3. Query Languages
   - Relational algebra
   - Graph-oriented query languages
4. SQL
   - Basics
   - Relationship to relational algebra
   - CRUD-style vs. analytical SQL
   - SQL standards
   - Joins, grouping, aggregation, HAVING
   - PostgreSQL
   - Integrity constraints
   - Transaction concept
   - ACID
   - Views
5. Basics of Query Optimization
   - Overview: from WHAT to HOW
   - Costs of different operations
   - EXPLAIN
   - Physical design
   - Indexes, tuning
   - Database tuning
   - Rule-based query optimization
   - Cost-based query optimization
6. Automatic Concurrency Control
   - Serializability theory
   - Isolation levels
   - Pessimistic concurrency control
   - Lock-based approaches, 2PL variants
7. Graph Data
   - Recursion in SQL, WITH RECURSIVE
   - Graph-oriented query languages, e.g., Cypher (Neo4j)
8. Database Security
   - SQL injection
   - Passwords
   - Salt and pepper
9. Ethical Aspects of Big Data
   - Mass surveillance
   - NSA
   - The "big data arithmetic"
   - Countermeasures
Will be announced on the course website before the start of the course.
This module was formerly known as "Informationssysteme" (Information Systems). It is identical in content to the German-language module Big Data Engineering.
This module is part of the following study programmes: