Programming 1, Programming 2, Software Engineering Lab, Mathematics for Computer Scientists 1, as well as Fundamentals of Algorithms and Data Structures (all recommended)
Successful participation in the exercises/project entitles the student to take part in the final exam.
Will be determined from performance in exams, exercises, and (optionally) practical tasks. The exact modalities will be announced at the beginning of the module.
2 h lectures
+ 2 h tutorial
= 4 h (weekly)
60 h of classes
+ 120 h private study
= 180 h (= 6 ECTS)
The lecture provides basic knowledge of fundamental concepts of data management and data analysis in Big Data Engineering.
As part of the exercises, a project can be carried out during the semester. This can be, for example, a social network (Facebook-style) or any other project in which data management techniques can be practiced (e.g., natural science data, image data, or other web applications). The project is first modeled in E/R and then implemented as a database schema. It is then extended to manage and analyze unstructured data as well. In this way, all fundamental techniques for managing and analyzing data are demonstrated on a single project.
1 Introduction and classification
Classification and delimitation: "Big Data"
Value of Data: The gold of the 21st century
Importance of database systems
What is data?
Modeling vs Reality
Costs of inadequate modeling
Using a database system vs developing it yourself
Positive examples for apps
Requirements
References
Lecture mode
2 Data modeling
Motivation
E/R
Relational Model
domains, attributes
entity type vs entity
relation type vs relation
Hierarchical Data
keys, foreign keys
inheritance
Redundancy, normalization, denormalization
3 Query languages
Relational Algebra
Graph-oriented query languages
4 SQL
Basics
Relationship to relational algebra
CRUD-style vs analytical SQL
SQL standards
joins, grouping, aggregation, having
PostgreSQL
Integrity constraints
Transaction concept
ACID
Views
5 Basic query optimization
Overview
from WHAT to HOW
Costs of different operations
EXPLAIN
Physical Design
Indexes, Tuning
Database tuning
Rule-based query optimization
Cost-based query optimization
6 Automatic concurrency control
Serializability theory
Isolation levels
Pessimistic concurrency control
lock-based approaches, 2PL-variants
7 Graphical Data
recursion in SQL, WITH RECURSIVE
graph-oriented query languages, e.g. Cypher (Neo4j)
8 Database Security
SQL injection
passwords
salt and pepper
9 Ethical Aspects of Big Data
mass surveillance
NSA
the "big data arithmetic"
countermeasures
Will be announced on the course web page before the start of the course.
This module was formerly also known as Informationssysteme. It is identical in content to the German-language module Big Data Engineering.
This module is part of the following study programmes: