Master of Science in Data Science La Salle Campus Barcelona URL

Master of Science in Data Science

Become an expert in analysing, structuring, filtering, visualizing and valuing the production of generated data

Data structures and storage

Description
We will focus on information systems as the main source of data for our developments, from two main points of view. The first complements the physical infrastructures covered in the subject “MD002 - Computing infrastructures”, but focuses on organizing and transforming information according to its required end use. The second follows the life cycle of data, from its origin to its transformation into information and, lastly and most importantly, into value. The aim is to give you an end-to-end view of the different data architectures found in the business market for understanding and exploiting the informational data cycle. This cycle runs from the origins of the information (the operational / transactional world) to the data environments prepared for exploiting the data (DataLake, DWH or sandboxes). When we talk about data architecture environments, we refer not only to physical architectures (HW) or to their logical and technological definition (tools), but also to the data structures that allow the optimization and governance of data already transformed into information.
Type Subject
First - Compulsory
Semester
First
Course
1
Credits
5.00
Previous Knowledge
Objectives

The goals will focus on:
• Know the key concepts and bases of the information models necessary to understand the role of the existing profiles around an information system and its exploitation.
• Have the broadest possible view of the different informational architectures, both those already existing and established in the business market, and the new disruptive visions based on the new available technologies.
• Learn to collect, transform and process information based on its origin, volume, format and periodicity of analysis.
• Understand the different forms of Data Governance and how they contribute to the day-to-day work of a Data Scientist.

Contents

1. Key concepts and bases of information models
1.1. History of Databases
1.2. Key figures around information systems (Data Scientist, Data Analyst, Product Owner, etc.)
1.3. Roles of a Data Scientist and interaction with the rest
1.4. DBMS vs RDBMS (information exploitation concept)
1.5. Theory and practical examples of Relational Models
1.6. Definition and concept of ETL

2. What is information and how do we extract value from it?
2.1. Concept of data and information (from the origin of the data to exploiting information and achieving value)
2.2. Business Intelligence and how it relates to Data Science
2.3. Big data
2.3.1. ELT vs ETL
2.3.2. Architecture concept
2.4. Types of data architectures
2.4.1. Logic / Technological / Physical
2.4.2. Information environments (DEV, PRO, SandBox, ...)
2.4.3. Concepts DataLake, DWH, etc.
2.5. AI Data Model Architecture
2.5.1. Development
2.5.2. Validation and promotion to PRO (batch and online)
2.5.3. Monitoring

3. Exploit the different types of information
3.1. Structured DB (review and expansion of what has already been seen)
3.2. Semi-structured DB
3.3. Unstructured DB

4. Web Scraping as a data source
4.1. What is Web Scraping?
4.2. Legal aspects
4.3. Tools

5. Cloud Computing for Data Scientists

6. The importance of data traceability and reliability (Data Governance)

Note: Topics can be adjusted and/or modified at the discretion of the master's coordination.

Methodology

The methodology combines lectures, student participation, exercises, and practical work. For the student, this involves both individual and group work, as well as conceptual exercises, written exercises, and oral presentations.

Evaluation

This subject will be assessed continuously through exercises, assignments, practical work, and in-class presentations.

Evaluation Criteria

Continuous assessment
This subject will be assessed continuously through exercises, assignments, practical work, and in-class presentations. The final grade will be the weighted sum of:
- Practice on a relational information system: 30%
- Practice on unstructured data systems: 30%
- Final work and presentation: 40%
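
As an illustration of the weighting above, here is a minimal sketch of how a final grade could be computed. The weights come from the list above; the dictionary keys, the `final_grade` function, the 0-10 scale, and the example marks are all hypothetical and are not part of the official regulations.

```python
# Weights taken from the continuous assessment list above.
WEIGHTS = {
    "relational_practice": 0.30,    # Practice on a relational information system
    "unstructured_practice": 0.30,  # Practice on unstructured data systems
    "final_work": 0.40,             # Final work and presentation
}


def final_grade(marks: dict[str, float]) -> float:
    """Return the weighted sum of the three component marks."""
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical marks on an assumed 0-10 scale (illustrative only).
example_marks = {
    "relational_practice": 7.0,
    "unstructured_practice": 8.0,
    "final_work": 6.5,
}

print(round(final_grade(example_marks), 2))
```

With these assumed marks, the two practices contribute 2.1 and 2.4 points and the final work contributes 2.6 points.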

Extraordinary call
The exam and/or assignments for the extraordinary call will be determined by the coordination of the subject.

Copies regulations
The subject is governed by the general copies regulations of La Salle Campus BCN:
https://www.salleurl.edu/en/copies-regulation
The training activities are classified as follows:
• Exercises: moderately significant
• Project: highly significant
• Final Evaluation: highly significant

Basic Bibliography

The bibliography will be detailed throughout the course:

• Class/Lecture notes
• Documentation and papers uploaded to Intranet (eStudy)

Additional Material

All class material (presentations, exercises, articles, documents, etc.) will be shared in the subject folder of the La Salle Intranet: eStudy.