Bachelor in Business Intelligence and Data Analytics

Become an expert in data analysis and business decision-making within a technological ecosystem that offers strong networking opportunities

Unstructured Data Analysis

Description: 

This module has been designed to equip students with the necessary resources to learn how to extract business value from unstructured data. The technologies covered are organized into two main sections: an initial part focused on machine learning and associated statistical tools, followed by a more computational segment centered on natural language and image processing—the two most common types of unstructured data.

 

The approach is fundamentally practical, while providing sufficient theoretical grounding to ensure students can assimilate and consolidate their understanding of both core techniques and state-of-the-art methods.

Subject Type
Third - Compulsory
Semester
Second
Course
4
Credits
3.00
Previous Knowledge: 

To successfully engage with this course, students are expected to have a solid foundation in linear algebra and advanced statistics, as well as a clear understanding of the fundamentals of artificial intelligence and core data analysis techniques. Familiarity with the architecture and management of Big Data pipelines is also required, along with advanced proficiency in Python and hands-on experience with commonly used development tools and runtime environments—such as Jupyter Notebooks, VS Code, virtual environments, and package managers.

Objectives: 

The objective of this module is for students to acquire fundamental knowledge and develop sufficient proficiency in the various techniques for processing unstructured data, as well as to build the criteria needed to identify the most appropriate methods for analyzing, processing, and extracting business value from this type of data.

Contents: 

  1. Introduction to Unstructured Data. This session provides a course overview and introduces the challenges associated with handling unstructured data—such as text and images—their prevalence in real-world scenarios, and the importance of analytical techniques for extracting meaningful insights. Foundational concepts from applied neuroscience, standard preprocessing steps, and essential tools for managing unstructured data are also introduced.
  2. Co-occurrence Analysis and High-Dimensional Data Visualization with PCA. This session examines the frequency and patterns of paired elements (e.g., keywords or codes within a dataset) to uncover associations and structural relationships among data components. It also covers projecting multi-feature datasets into lower-dimensional spaces using Principal Component Analysis (PCA), enabling clearer interpretation of data structure and variance in two or three dimensions.
  3. PCA (Continued) and Manifold Learning. This session explores a family of non-linear algorithms designed to discover low-dimensional structures embedded within high-dimensional data. By preserving intrinsic geometric relationships, these methods reveal complex patterns that linear techniques like PCA cannot capture.
  4. Clustering: k-means and Other Models. This session focuses on unsupervised clustering algorithms that partition data into a predefined number of groups. It then extends to probabilistic approaches, modeling data as a combination of multiple distributions to capture more flexible structures and assignment probabilities.
  5. Clustering (Continued): Interpretation and Selecting the Number of Clusters. This session addresses strategies for determining the optimal number of clusters using likelihood-based criteria that balance model fit against complexity, helping to avoid overfitting and ensure robust, interpretable results.
  6. Unstructured Data Review. This session revisits the specific challenges of working with unstructured data—particularly natural language and images—their ubiquity in real-world contexts, and the critical role of analytical techniques in deriving actionable insights. Foundational neuroscience concepts, common preprocessing workflows, and key tools for managing unstructured data are also reviewed.
  7. Rule-Based NLP. This session focuses on rule-based natural language processing methods, which rely on handcrafted patterns and linguistic rules to analyze and manipulate text. Essential techniques such as tokenization, part-of-speech (POS) tagging, named entity recognition (NER), and syntactic parsing are covered. Through practical examples, participants explore the strengths, limitations, and appropriate use cases for rule-based approaches—whether in niche applications or as complements to data-driven models.
  8. Neural Networks. This session aims to refresh students' understanding of neural networks and prepare them for advanced topics in NLP and deep learning. Key concepts—including perceptrons, activation functions, backpropagation, and core architectures such as feedforward, convolutional (CNN), and recurrent neural networks (RNN)—are reviewed.
  9. NLP with Deep Learning. This session introduces deep learning approaches to natural language processing, demonstrating how neural networks can tackle complex language tasks such as sentiment analysis, machine translation, and question answering. Techniques including RNNs, Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs) are explained, highlighting their advantages over rule-based methods, with practical demonstrations included.
  10. Embeddings and Vectorization. This session covers the representation of textual data in numerical form for machine learning applications. Word embedding techniques such as Word2Vec and GloVe are introduced, alongside contextual embeddings from models like BERT. Students learn how vectorization methods capture semantic relationships and contextual information, enabling sophisticated language modeling and deeper linguistic understanding.
  11. Transformers and Generative AI. This session explores the evolution of the transformer architecture. Core concepts—including self-attention and multi-head attention mechanisms—are explained, along with the standard transformer structure and its operational principles. Models such as BERT and GPT are introduced, accompanied by practical examples of their application across diverse domains.
  12. Generative AI and Business Applications. This session focuses on real-world applications of generative AI in business contexts. Case studies illustrate how AI can enhance customer experiences, streamline workflows, and enable innovative solutions. Ethical considerations, implementation challenges, and best practices for deploying generative AI systems are also discussed. Additionally, emerging architectures such as RAG (Retrieval-Augmented Generation) and Agentic RAG are covered.
  13. Image Processing: From CNNs to Transformers. This closing session covers the fundamentals of image processing, beginning with Convolutional Neural Networks (CNNs) and their role in tasks such as image classification and object detection. It then addresses the transition to transformer-based architectures in computer vision, demonstrating how these models have surpassed traditional CNNs in tasks requiring contextual understanding and global relationship modeling within images.
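To make the session topics above concrete, a few minimal sketches follow. The PCA material of sessions 2–3 can be illustrated with the closed-form 2-D case: project data onto the leading eigenvector of the covariance matrix. This is a toy illustration with invented data and no library dependencies; in practice one would use a tool such as scikit-learn's `PCA`.

```python
import math

def pca_2d(points):
    """First principal axis of 2-D data, via the closed-form
    eigen-decomposition of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    lam1 = tr / 2 + math.sqrt(tr ** 2 / 4 - det)
    # Matching eigenvector (fall back to an axis if the data are uncorrelated)
    v = (lam1 - syy, sxy) if abs(sxy) > 1e-12 else ((1.0, 0.0) if sxx >= syy else (0.0, 1.0))
    norm = math.hypot(v[0], v[1])
    direction = (v[0] / norm, v[1] / norm)
    explained = lam1 / tr  # share of total variance captured by the first axis
    return direction, explained

# Strongly correlated toy data: one component captures almost all the variance
direction, explained = pca_2d([(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1)])
```

Here `explained` plays the role of the "explained variance ratio" students will later read off library output.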
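For the clustering sessions (4–5), the sketch below is a bare-bones Lloyd's k-means with a deterministic initialization (the first k points), plus the inertia-versus-k comparison behind elbow-style selection of the number of clusters. All data are invented; a production workflow would use a library implementation and likelihood-based criteria as covered in session 5.

```python
def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; returns centroids and inertia
    (the within-cluster sum of squared errors)."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic init for reproducibility
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: move each non-empty centroid to its cluster mean
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    inertia = sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
                  for p in points)
    return centroids, inertia

# Two well-separated blobs, interleaved so the first two points seed both groups
blobs = [(0.0, 0.1), (5.0, 5.1), (0.2, -0.1), (5.2, 4.9), (-0.1, 0.0), (4.9, 5.0)]
inertias = {k: kmeans(blobs, k)[1] for k in (1, 2, 3)}  # the "elbow" appears at k=2
```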
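Session 7's rule-based techniques can be illustrated with a regex tokenizer and a deliberately naive capitalization-based NER rule. This is a toy stand-in for real rule-based toolkits such as NLTK or spaCy, and its brittleness (e.g. on sentence-initial words) is exactly the kind of limitation the session discusses.

```python
import re

# One regex handles words (with optional apostrophes) and punctuation tokens
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split text into word and punctuation tokens with a single regex."""
    return TOKEN_RE.findall(text)

def find_capitalized_entities(tokens):
    """Toy NER rule: maximal runs of capitalized word tokens."""
    entities, run = [], []
    for tok in tokens:
        if re.fullmatch(r"[A-Z][a-z]+", tok):
            run.append(tok)
        else:
            if run:
                entities.append(" ".join(run))
            run = []
    if run:
        entities.append(" ".join(run))
    return entities

tokens = tokenize("Alan Turing proposed the imitation game in London.")
entities = find_capitalized_entities(tokens)  # ["Alan Turing", "London"]
```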
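Session 8 reviews neural-network basics. A single neuron (perceptron with a sigmoid activation) is just a weighted sum passed through a nonlinearity; the weights below are hand-picked, hypothetical values chosen to approximate an AND gate, not learned by backpropagation.

```python
import math

def neuron(x, w, b):
    """Single neuron: weighted sum plus bias, then a sigmoid activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked weights roughly implementing a soft AND gate
w, b = [4.0, 4.0], -6.0
outputs = {inp: neuron(inp, w, b) for inp in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

Only the (1, 1) input pushes the weighted sum above zero, so only that output lands near 1.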
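The core idea of session 10 — semantic similarity as vector geometry — can be shown in miniature with cosine similarity. The 3-d "embeddings" below are invented for illustration (real Word2Vec or BERT vectors have hundreds of learned dimensions), but the geometry is the same: related words point in similar directions.

```python
import math

def cosine(u, v):
    """Cosine similarity: closeness of direction between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Invented 3-d "embeddings"; dimensions loosely read as (royalty, animacy, fruitiness)
emb = {
    "king":  [0.9, 0.7, 0.0],
    "queen": [0.8, 0.6, 0.1],
    "apple": [0.0, 0.1, 0.9],
}
sim_kq = cosine(emb["king"], emb["queen"])  # high: related words
sim_ka = cosine(emb["king"], emb["apple"])  # low: unrelated words
```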
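Session 11's central mechanism, scaled dot-product self-attention, can be written out with plain lists for a handful of token vectors: each output row is a similarity-weighted mixture of the value vectors. This sketch omits the learned query/key/value projections and multi-head structure of a real transformer.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # one weight per token, summing to 1
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Three toy token vectors; with Q = K = V, each output row mixes the inputs
# according to their similarity to that row's query
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = attention(X, X, X)
```

Because the weights sum to 1, each output coordinate is a convex combination of the input coordinates.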
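Session 12 introduces RAG (Retrieval-Augmented Generation). The two-step shape — retrieve relevant passages, then augment the prompt with them — can be sketched with word overlap standing in for embedding-based vector search, and with the assembled prompt returned instead of being sent to an LLM. All documents, names, and the prompt template are invented for illustration.

```python
import re

def retrieve(query, docs, k=1):
    """Retrieval step: rank documents by word overlap with the query
    (a toy stand-in for embedding similarity search)."""
    q = set(re.findall(r"\w+", query.lower()))
    return sorted(docs,
                  key=lambda d: len(q & set(re.findall(r"\w+", d.lower()))),
                  reverse=True)[:k]

def build_prompt(query, docs):
    """Augmentation step: stuff retrieved passages into a prompt template.
    A real RAG system would send this prompt to an LLM for generation."""
    context = " ".join(retrieve(query, docs))
    return f"Answer using only this context: {context}\nQuestion: {query}"

docs = [
    "The module carries 3.00 credits.",
    "The final exam accounts for 40 percent of the grade.",
]
prompt = build_prompt("How many credits does the module carry?", docs)
```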
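Session 13 starts from CNNs, whose basic building block is the convolution. In 1-D the operation is easy to see whole (most deep learning libraries actually compute the cross-correlation shown here): sliding a small kernel over a signal, so that an edge-detector kernel lights up exactly where the signal jumps.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D convolution (cross-correlation, as in most DL libraries):
    slide the kernel over the signal and take dot products."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A [-1, 1] kernel responds only where consecutive samples differ
signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
edges = conv1d(signal, [-1.0, 1.0])  # nonzero only at the step
```

In 2-D the same idea applied to pixel neighborhoods yields the edge and texture detectors that early CNN layers learn.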

Methodology: 

This course is delivered through a weekly session divided into two parts. The first part focuses on introducing the content descriptively and providing theoretical or conceptual explanations for aspects that require mathematical or computational justification. The second part is practical in nature and is dedicated to exploring the concepts covered through demonstrations or exercises that help students assimilate the material, understand its utility, and identify relevant application scenarios.

Evaluation: 

Continuous assessment for this course follows the structure below: 


| Evaluation Type              | Weight | Content                     | Importance           |
| ---------------------------- | ------ | --------------------------- | -------------------- |
| Attendance and Participation | 20%    | All course content          | Moderately important |
| Individual Assignments       | 40%    | Approximately 8 submissions | Highly important     |
| Final Exam                   | 40%    | Full module content         | Highly important     |


Evaluation criteria apply to all students; those enrolled in the retake session are also required to attend class. Any exceptional circumstances must be communicated to the teaching staff in advance and validated by the academic tutor.

The course will be considered passed when the final grade is equal to or higher than 5 out of 10.

RETAKE POLICY

 

The retake assessment will consist of a comprehensive exam covering all course content.

The maximum grade that can be awarded in the retake session is 6.0 out of 10.

Evaluation Criteria: 

---

Basic Bibliography: 

  • Jurafsky, D., & Martin, J. H. (2022). Speech and Language Processing.
  • Bird, S., Klein, E., & Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit.
  • Vaswani, A., et al. (2017). Attention Is All You Need.
  • Tunstall, L., von Werra, L., & Wolf, T. (2022). Natural Language Processing with Transformers: Building Language Applications with Hugging Face.
  • Behrouz, A., Razaviyayn, M., Zhong, P., & Mirrokni, V. (2025). Nested Learning: The Illusion of Deep Learning Architectures.

Additional Material: 

---