There is a direct link between how well an organization manages its data resources and its financial performance. The goal of “Advanced data processing and analysis” is to help you make good decisions with data.
The course has two general blocks:
(a) Data governance, management and security: principles, definitions and models for handling data in organisations, covering both the theoretical framework and its practical implementation.
(b) Algorithms and tools to analyse structured and unstructured data, covering practical implementation as well as theory: statistical interpretation of ML models, foundation models (FM), and the evaluation and risks of ML/FM.
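The evaluation part of block (b) can be made concrete with a small sketch. It uses plain Python to compute accuracy, precision, recall and F1 for a binary classifier. The function names are illustrative and do not come from course material. In practice, `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score` and `f1_score`. The usage lines show one classic evaluation risk: on imbalanced data, a model that never predicts the positive class still scores high accuracy.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1      # predicted negative, actually positive
        else:
            tn += 1      # predicted negative, actually negative
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Imbalanced example: 1 positive case out of 10, and a model that always predicts 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(accuracy(y_true, y_pred))             # 0.9
print(precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0)
```

A 90% accuracy here hides the fact that the single positive case is never found. This is why the sketch reports precision and recall alongside accuracy.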
Lead Professors
Professors
Databases. Data analysis tools. Programming. Algorithms and data structures.
The course serves as a practical introduction to:
- Data management
- Data governance
- Data warehousing
- Data pipelines
- Data security
- Data regulation
- Practical machine learning in the context of data processing (management and governance)
- Practical use of foundation models in the context of data processing
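Several of these topics (pipelines, warehousing, security, regulation) meet in even the smallest pipeline. The sketch below is an illustration, not course material. It uses Python's built-in `sqlite3` as a stand-in for a warehouse such as DuckDB or PostgreSQL and works through these steps:

1. Extract rows from a hypothetical CSV export.
2. Apply an assumed data-quality rule.
3. Pseudonymise the e-mail address with a keyed hash (HMAC-SHA256).
4. Load the result into a staging table.
5. Transform the staged data with SQL into a small aggregate.

The CSV contents, table name and key are all hypothetical.

```python
import csv
import hashlib
import hmac
import io
import sqlite3

# Hypothetical raw export from a source system (the "extract" step).
RAW_CSV = """order_id,customer_email,country,amount
1,ana@example.com,ES,120.50
2,bob@example.com,FR,80.00
3,ana@example.com,ES,35.25
4,,DE,15.00
"""

# Hypothetical key; a real pipeline would read it from a secrets manager, never from code.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"


def pseudonymise(value: str) -> str:
    """Replace a personal identifier with a truncated keyed hash (HMAC-SHA256).

    The value is normalised first so case/whitespace variants map to the same key.
    """
    digest = hmac.new(PSEUDONYM_KEY, value.strip().lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def run_pipeline(raw_csv: str):
    """Extract, validate, pseudonymise, load into staging, then transform with SQL."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE stg_orders ("
        "order_id INTEGER PRIMARY KEY, customer_key TEXT NOT NULL, country TEXT, amount REAL)"
    )
    rows, rejected = [], 0
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        # Assumed data-quality rule: rows without a customer identifier are rejected.
        if not rec["customer_email"]:
            rejected += 1
            continue
        rows.append((int(rec["order_id"]), pseudonymise(rec["customer_email"]),
                     rec["country"], float(rec["amount"])))
    con.executemany("INSERT INTO stg_orders VALUES (?, ?, ?, ?)", rows)

    # Transform: a small warehouse-style aggregate per country.
    summary = con.execute(
        "SELECT country, COUNT(DISTINCT customer_key), ROUND(SUM(amount), 2) "
        "FROM stg_orders GROUP BY country ORDER BY country"
    ).fetchall()
    con.close()
    return summary, rejected


summary, rejected = run_pipeline(RAW_CSV)
print(summary)   # [('ES', 1, 155.75), ('FR', 1, 80.0)]
print(rejected)  # 1
```

A keyed hash is pseudonymisation, not anonymisation. Whoever holds the key can recompute the hash for a known e-mail address and re-link records, so the key itself needs governance: storage, rotation and access control.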
Advanced data processing and analysis involves actions and methods performed on data that help describe facts, detect patterns, develop explanations and test hypotheses. This includes:
- Data governance
- Statistical data analysis
- Modeling
- Interpretation of results
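Two of the activities listed above, describing facts and testing hypotheses, can be illustrated with the Python standard library alone. The sales figures in this sketch are invented. It uses a permutation test as one simple, distribution-free way to check whether two group means differ. This is only one option: a t-test is the classic parametric alternative.

```python
import random
import statistics

# Hypothetical weekly sales (units) for stores with and without a promotion.
with_promo = [23, 27, 25, 30, 28, 26]
without_promo = [20, 22, 24, 21, 19, 23]


def describe(values):
    """Describe facts: sample size, mean and sample standard deviation."""
    return {"n": len(values), "mean": statistics.fmean(values), "stdev": statistics.stdev(values)}


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means (H0: no difference).

    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # under H0 the group labels are exchangeable
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (extreme + 1) / (n_resamples + 1)


print(describe(with_promo))
print(describe(without_promo))
diff, p_value = permutation_test(with_promo, without_promo)
print(f"difference in means = {diff:.2f}, p ~ {p_value:.4f}")
```

With these made-up numbers, the difference in means is 5 units. Enumerating all 924 ways to split the 12 values into two groups of six gives an exact p-value of 6/924, roughly 0.0065, and the resampled estimate should land close to that. Interpreting the result (is a promotion effect plausible, and what else could explain it?) is the final step in the list above.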
Each session starts with a lecture followed by practical work, either individual or in groups. Most sessions include an in-class assignment. Some sessions have assigned readings, which are assessed through in-class assignments. The course ends with a capstone project submitted in groups, and most of the mandatory homework consists of partial submissions that build toward it.
The midterm and final exams are individual.
| What | Weight | Importance | Note |
|---|---|---|---|
| Attendance and participation | 20% | Medium | Total of the next three rows |
| Number of classes attended minus 4 | 5% | Low | |
| Classroom assignments (best of 6) | 8% | Low | |
| Attitude and contribution | 7% | Medium | |
| Group project | 20% | Medium | |
| Midterm presentation | 20% | High | Must be >4 to pass |
| Capstone project | 20% | High | Must be >4 to pass; total of the next two rows |
| Presentation | 10% | Medium | |
| Report | 10% | High | |
| Final exam | 20% | High | Must be >4 to pass |
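The weights in the grading table can be checked and applied with a short calculation. This sketch is hypothetical and is not the official grading procedure. It rests on these assumptions:

- Every component is scored on a 0 to 10 scale, and ">4 to pass" is a strict threshold.
- Presentation and Report are the two halves of the capstone project, since their 10% weights add up to its 20%.
- The 5%, 8% and 7% rows make up the 20% attendance and participation block.
- "Best of 6" means the mean of the six highest in-class assignment scores, with missing assignments counting as zero.

The table does not say how the attendance count becomes a score, so the sketch takes that score as a given input.

```python
# Leaf weights (in %) from the grading table; the block totals are derived from them.
WEIGHTS = {
    "attendance": 5,             # classes attended minus 4, assumed already converted to 0-10
    "classroom_assignments": 8,  # best of 6
    "attitude": 7,
    "group_project": 20,
    "midterm_presentation": 20,
    "capstone_presentation": 10,
    "capstone_report": 10,
    "final_exam": 20,
}
PASS_THRESHOLD = 4.0  # assumed: a gated component must score strictly above 4 out of 10


def best_of(scores, k=6):
    """Mean of the k highest scores; missing assignments count as 0 (assumption)."""
    return sum(sorted(scores, reverse=True)[:k]) / k


def final_grade(scores):
    """Return (weighted grade on a 0-10 scale, list of failed pass/fail gates)."""
    parts = dict(scores)
    parts["classroom_assignments"] = best_of(scores["classroom_scores"])
    grade = sum(weight * parts[name] for name, weight in WEIGHTS.items()) / 100
    gates = {
        "midterm_presentation": parts["midterm_presentation"],
        "capstone_project": (parts["capstone_presentation"] + parts["capstone_report"]) / 2,
        "final_exam": parts["final_exam"],
    }
    failed = [name for name, value in gates.items() if value <= PASS_THRESHOLD]
    return round(grade, 2), failed


assert sum(WEIGHTS.values()) == 100  # the table's leaf weights cover the whole grade

student = {  # hypothetical scores
    "attendance": 10, "classroom_scores": [7, 8, 6, 9, 5, 8, 4, 10], "attitude": 8,
    "group_project": 7, "midterm_presentation": 6, "capstone_presentation": 7,
    "capstone_report": 6, "final_exam": 5,
}
print(final_grade(student))  # (6.6, [])
```

The function returns the failed gates separately from the weighted grade. Under these assumptions, a high average does not compensate for a score of 4 or below in a gated component.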
- Standards & governance/security
  - ISO/IEC 38500:2024 (governance of IT; principles, model and framework). https://www.iso.org/standard/81684.html, https://webstore.iec.ch/en/publication/92580
  - Ladley, Data Governance (2nd ed., Academic Press, 2019). https://shop.elsevier.com/books/data-governance/ladley/978-0-12-815831-9...
  - Eryurek et al., Data Governance: The Definitive Guide (O’Reilly, 2021). https://www.amazon.com/Data-Governance-Definitive-Operationalize-Trustwo...
  - Fitzgerald, CISO Compass (Auerbach/CRC Press). https://www.taylorfrancis.com/books/mono/10.1201/9780429399015/ciso-comp...
  - Hyppönen, If It’s Smart, It’s Vulnerable (Wiley). https://www.wiley.com/en-us/If+It%27s+Smart%2C+It%27s+Vulnerable-p-97811...
- Data management & platforms (free/trial friendly)
  - Microsoft Fabric 60-day trial; Lakehouse labs. https://learn.microsoft.com/en-us/fabric/fundamentals/fabric-trial, https://learn.microsoft.com/en-us/fabric/data-engineering/tutorial-lakeh...
  - Databricks Free Edition (cloud workspace for Spark/Delta/MLflow). https://community.cloud.databricks.com/
  - Snowflake 30-day trial with credits. https://signup.snowflake.com/
  - Google BigQuery Sandbox (no card required; 10 GB storage + 1 TB queries/month free). https://docs.cloud.google.com/bigquery/docs/sandbox
  - Amazon Redshift Serverless free trial credit. https://aws.amazon.com/redshift/free-trial/
  - dbt Core (open-source transformations), Airbyte (open-source EL/ELT), DuckDB, PostgreSQL. https://docs.getdbt.com/docs/core/installation-overview, https://docs.airbyte.com/, https://duckdb.org/, https://www.postgresql.org/
  - Delta Lake (open lakehouse table format). https://docs.delta.io/
- Machine learning & foundation models
  - scikit-learn user guide for classical ML. https://scikit-learn.org/stable/user_guide.html
  - Hugging Face Transformers: pipeline API for quick FM inference. https://huggingface.co/docs/transformers/v5.0.0/en/pipeline_tutorial
  - MLflow docs for experiment tracking and GenAI tracing/evaluation. https://mlflow.org/docs/latest/