Degree in Business Intelligence and Data Analytics

Lead the transformation of companies through the use and analysis of data.

Data mining

Description
This course introduces data mining: the extraction of useful information and knowledge from large volumes of data to improve business decision-making. It covers the main techniques and methods of the field, including data preprocessing, data exploration and visualization, and data modelling and prediction, along with real-world applications and industry case studies. The goal is to give students a solid understanding of data mining techniques and their applications using Python, so that they can analyse data and extract insights from it in various fields.
Type Subject
Third - Compulsory
Semester
Second
Course
2
Credits
6.00

Titular Professors

Previous Knowledge
Objectives

The learning outcomes of this subject are:
R1. Understanding of data mining concepts and techniques.
R2. Ability to analyze and interpret large datasets to extract meaningful insights and patterns.
R3. Knowledge of the tools and technologies used for data mining in Python, including NumPy, pandas, Matplotlib, seaborn and scikit-learn.
R4. Ability to critically evaluate data mining results and determine their reliability and validity.
R5. Ability to communicate and present findings from data mining analysis effectively.

Contents

First part of the semester:
- Introduction to Data Mining
- Data Preprocessing
- Regression Models
- Classification Models

Second part of the semester:
- Cross-Validation
- Feature Selection
- Tree Based Models
- Text Mining
Project: Predicting Startup Success using Twitter

Methodology

R1 - Understanding of data mining concepts and techniques: Introduction to Data Mining
R2 - Ability to analyze and interpret large datasets to extract meaningful insights and patterns:
- Data Preprocessing
- Feature Selection
- Cross-Validation
R3 - Knowledge of the tools and technologies used for data mining in Python, including NumPy, pandas, Matplotlib, seaborn and scikit-learn:
- Regression Models
- Classification Models
- Tree Based Models
R4 - Ability to critically evaluate data mining results and determine their reliability and validity:
- Cross-Validation
- Feature Selection
R5 - Ability to communicate and present findings from data mining analysis effectively: Project: Predicting Startup Success using Twitter

Evaluation

Evaluation is continuous and combines several activities to help students assimilate the material.
The following table shows the weight of each activity in the final grade:

Learning outcomes - Activity - Weight
R1, R2 - Homework - 20%
R2, R3 - Midterm Exam - 20%
R4, R5 - Project - 30%
R2, R3 - Final Exam - 30%
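
As a rough illustration of how these weights combine, the following minimal Python sketch computes a final grade as a weighted average. The weights come from the table above; the function name `final_grade`, the 0-10 grading scale and the example scores are assumptions made only for this illustration, and any additional rules (such as minimum marks per activity) are not stated here and are not modelled.

```python
# Weights taken from the evaluation table (fractions of the final grade).
WEIGHTS = {
    "homework": 0.20,
    "midterm_exam": 0.20,
    "project": 0.30,
    "final_exam": 0.30,
}


def final_grade(scores: dict[str, float]) -> float:
    """Return the weighted average of the activity scores.

    Hypothetical helper: assumes every activity is graded on the same scale.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 0-10 scale.
example_scores = {
    "homework": 8.0,
    "midterm_exam": 6.5,
    "project": 9.0,
    "final_exam": 7.0,
}
print(round(final_grade(example_scores), 2))  # 7.7
```

With these example scores, the project and the final exam together account for 4.8 of the 7.7 points, which reflects their combined 60% weight.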

The objectives of the continuous evaluation are the following:
- Progressive learning of the subject and evaluation of the activity
- Evaluation of the knowledge acquired in exams
- Practice of the subject through a real-world project

Evaluation Criteria
Basic Bibliography

- Mueller, A., Guido, S. (2016). Introduction to Machine Learning with Python, O'Reilly
- James, G. et al. (2021). An Introduction to Statistical Learning, Springer
- Provost, F., Fawcett, T. (2013). Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, O'Reilly
- Matthes, E. (2015). Python Crash Course: A Hands-On, Project-Based Introduction to Programming, No Starch Press

Additional Material