This course introduces data mining: the extraction of useful information and knowledge from large volumes of data in order to improve business decision-making. It covers the main techniques and methods of the field, including data preprocessing, data exploration and visualization, and data modelling and prediction, together with real-world applications and industry case studies. The goal is to give students a solid understanding of data mining techniques and their applications in Python, so that they can analyse data and extract insights from it in a range of fields.
Titular Professors
Professors
--
The "Data Mining" (IN015) course focuses on extracting valuable knowledge from large volumes of data to enhance business decision-making. Throughout the semester, you will explore the complete data mining process, including data preprocessing, exploration, and predictive modeling using regression, classification, and tree-based methods. Ultimately, the course equips you with the practical skills to apply these techniques using Python and its core libraries (such as pandas and scikit-learn), enabling you to analyze real-world data, critically evaluate the reliability of your results, and effectively communicate your findings.
First part of the semester:
- Introduction to Data Mining
- Data Preprocessing
- Regression Models
- Classification Models
Second part of the semester:
- Cross-Validation
- Feature Selection
- Tree-Based Models
- Text Mining
Project
- Predicting Startup Success using Twitter
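As a rough illustration of how several of the topics above fit together (preprocessing, feature selection, classification, tree-based models, and cross-validation), the following is a minimal sketch using scikit-learn. The dataset (scikit-learn's bundled breast cancer data), the helper name `evaluate`, and all parameter values (`k=10`, `max_depth=4`, five folds) are assumptions chosen for illustration; they are not taken from the course material.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Bundled example dataset, used here only for illustration (assumption).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)


def evaluate(model, X, y, folds=5):
    """Return the mean k-fold cross-validated accuracy of a model."""
    scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    return scores.mean()


# Scaling and feature selection live inside the pipeline, so they are
# re-fitted on the training part of every fold and never see the held-out part.
logistic = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=10),      # keep the 10 highest-scoring features
    LogisticRegression(max_iter=1000),
)

# A shallow decision tree as a simple tree-based model.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)

results = {
    "logistic regression": evaluate(logistic, X, y),
    "decision tree": evaluate(tree, X, y),
}

for name, accuracy in results.items():
    print(f"{name}: mean CV accuracy = {accuracy:.3f}")
```

Placing the preprocessing steps inside the pipeline, rather than applying them to the whole dataset first, is one common way to keep the cross-validated score an honest estimate of performance on unseen data.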
The following table relates the learning outcomes to the content taught to achieve them:
| RA | Learning outcome | Contents |
| R1 | Understanding of data mining concepts and techniques | Introduction to Data Mining |
| R2 | Ability to analyze and interpret large datasets to extract meaningful insights and patterns | Data Preprocessing, Feature Selection, Cross-Validation |
| R3 | Knowledge of the tools and technologies used for data mining in Python, including NumPy, pandas, Matplotlib, seaborn and scikit-learn | Regression Models, Classification Models, Tree-Based Models |
| R4 | Ability to critically evaluate data mining results and judge their reliability and validity | Cross-Validation, Feature Selection |
| R5 | Ability to communicate and present findings from data mining analysis effectively | Project: Predicting Startup Success using Twitter |
Evaluation is continuous and combines several activities to help students assimilate the material.
The following table shows the weight of each activity in the final grade:
| RA | Activity | Weight |
| R1, R2 | Homework | 20% |
| R2, R3 | Mid-Term Exam | 30% |
| R4, R5 | Project | 20% |
| R2, R3 | Final Exam | 30% |
The aims of the continuous evaluation are the following:
- Progressive learning of the subject, with each activity evaluated along the way
- Evaluation of the knowledge acquired, through exams
- Practice of the subject through a real-world project
--
- Provost, F., Fawcett, T. (2013). Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O’Reilly Media.
- Müller, A. C., Guido, S. (2016). Introduction to Machine Learning with Python. O’Reilly Media.
--