Bachelor in Business Intelligence and Data Analytics

Become an expert in data analysis and business decision making in a technological ecosystem and with great networking opportunities

Big data analysis

Description: 

This module provides a comprehensive, practice-oriented foundation in modern data engineering and analytics, equipping students with the technical skills to process, manage, and extract value from diverse data sources. Participants will learn to use Python and its core libraries (Numpy, Pandas, Matplotlib, SciPy) for data transformation and analysis, while gaining a solid understanding of Big Data system architectures, distributed processing on clusters, and cloud-based data services. The course also covers essential topics such as data conversion and standardization, preparation techniques for statistical modeling and machine learning, and effective data visualization. Additional exposure to emerging technologies like Blockchain and its security principles ensures students are prepared to navigate both established and evolving landscapes in data-driven environments. By the end of the module, students will be capable of designing and implementing end-to-end data workflows—from ingestion and cleaning to analysis and communication of insights—using industry-standard tools and best practices.

Type Subject
Third year - Compulsory
Semester
First
Course
3
Credits
6.00
Previous Knowledge: 

To successfully engage with this module, students are recommended to possess foundational skills in general data processing, including familiarity with databases, common data formats, and both traditional and advanced analytics techniques. Proficiency in the Python programming language is essential, as it serves as the primary tool for implementing the concepts and exercises covered throughout the course. Additionally, basic notions of cloud architectures are advisable, given that many of the discussed technologies and workflows are designed to operate within distributed or cloud-based environments.

Objectives: 

The goal of Big Data Analysis is to teach you how to use tools that can control the avalanche of data generated in the modern era. This will be achieved through a combination of a data management architecture (the Big Data pipeline), the use of specific Big Data processing technologies, and some Python programming. By the end of this course, you should be able to process large data files and manipulate data to generate statistics, metrics and visualizations.
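As a first taste of this objective, the sketch below processes a file too large to load comfortably in one go by streaming it through Pandas in chunks and accumulating a global statistic. It is a minimal illustration only; the in-memory CSV and the column name `value` are invented stand-ins for a real file on disk.

```python
import io
import pandas as pd

# Illustrative in-memory CSV standing in for a large file on disk.
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(1, 101)))

# Stream the file in chunks instead of loading it all at once,
# accumulating only what is needed for a global mean.
total, count = 0.0, 0
for chunk in pd.read_csv(csv_data, chunksize=25):
    total += chunk["value"].sum()
    count += len(chunk)

mean = total / count
print(mean)  # → 50.5
```

The same chunked pattern scales to files of many gigabytes, because only one chunk is held in memory at a time.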

Contents: 

  1. Use Python to read and transform data in different formats. Students will develop independent solutions by leveraging standard libraries such as Numpy, Pandas, Matplotlib, and SciPy to handle data manipulation, transformation, and analysis tasks efficiently across multiple file formats and structures.
  2. Understand the component architecture of a Big Data Management system. Participants will be able to identify the key elements of a Big Data processing pipeline and understand their specific role within the data value chain, including familiarity with the main variants of each component and the distinguishing technical and functional aspects between them.
  3. Gain familiarity with leading Cloud Data Services offerings for Big Data. Students will learn about cloud infrastructure fundamentals, the structure and types of data services available, and the specific solutions provided by major cloud providers in the market, enabling informed decisions when selecting tools for scalable data workflows.
  4. Understand how Blockchain works and the foundations of its security. Participants will explore the core functions of Blockchain technology, its primary use cases, and the cryptographic and consensus mechanisms that enable its high levels of security and resilience, with a brief review of essential cryptographic concepts to support deeper comprehension.
  5. Generate basic statistics and metrics using data stored on disk. Students will retrieve data from storage systems, load it into an appropriate format, and perform necessary cleaning and pre-processing steps; they will then calculate fundamental statistics—such as mean, median, and standard deviation—and relevant metrics like averages or percentages, presenting the results in a clear and actionable manner.
  6. Work with distributed processing tasks on a cluster. Participants will learn to configure a computing cluster environment, including selecting suitable hardware, setting up software frameworks, and establishing network communication between nodes, while developing and executing tasks that incorporate data parallelism, task distribution, fault tolerance, and dynamic resource management.
  7. Convert data from diverse sources into standardized storage or query formats. Students will identify various data sources—such as flat files, APIs, and databases—and implement robust extraction procedures that address format-specific challenges; they will also develop conversion workflows to transform and harmonize data from heterogeneous sources into a common structure, ensuring quality, consistency, and compatibility for downstream analysis or storage.
  8. Prepare data for statistical analysis, visualization, and machine learning. The course covers techniques for identifying and addressing missing values, outliers, and inconsistencies through methods like imputation, scaling, and categorical encoding; students will also learn to engineer new relevant features and select informative variables to optimize datasets for modeling while preserving data integrity and analytical validity.
  9. Present data through effective visualizations. Participants will learn to select the most appropriate chart types—such as bar charts, scatter plots, or heat maps—based on the data's nature and the insights to be conveyed; they will design clear, accurate, and aesthetically engaging visuals using suitable colors, labels, and titles, and integrate them into reports or presentations to communicate data-driven insights effectively to technical and non-technical stakeholders.
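Item 1 above can be sketched with a short Pandas example that reads a dataset in one format, applies a vectorized transformation, and writes it out in another. The city names and columns are purely illustrative.

```python
import io
import pandas as pd

# A small dataset in CSV form (in a real workflow this would come from disk).
csv_text = "city,population\nBarcelona,1620000\nMadrid,3220000\n"
df = pd.read_csv(io.StringIO(csv_text))

# Transform: add a derived column using vectorized (NumPy-backed) arithmetic.
df["population_m"] = df["population"] / 1_000_000

# Convert to another format: JSON records, ready for an API or document store.
json_text = df.to_json(orient="records")
print(json_text)
```

Pandas provides matching readers and writers for many other formats (Excel, SQL, Parquet), so the same read-transform-write shape applies across them.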
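Item 5 (basic statistics and metrics) can be illustrated as follows; the sales figures are hypothetical and would normally be loaded from storage first.

```python
import pandas as pd

# Hypothetical daily sales figures, standing in for data retrieved from disk.
sales = pd.Series([120, 150, 130, 170, 160], name="daily_sales")

stats = {
    "mean": sales.mean(),
    "median": sales.median(),
    "std": sales.std(),  # sample standard deviation (ddof=1)
    "pct_above_140": (sales > 140).mean() * 100,  # a simple derived metric
}
print(stats)
```

Presenting the results as a small dictionary (or a one-row DataFrame) keeps them easy to log, tabulate, or feed into a report.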
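Item 8 (preparing data for analysis and machine learning) can be sketched with the three techniques it names: imputation, scaling, and categorical encoding. The column names and values below are invented for the example.

```python
import pandas as pd

# A small dataset with the typical problems item 8 describes:
# a missing value and a categorical column.
df = pd.DataFrame({
    "age": [25, None, 40, 35],
    "city": ["BCN", "MAD", "BCN", "VAL"],
})

# Imputation: fill the missing age with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Scaling: min-max scale age into the [0, 1] range.
df["age_scaled"] = (df["age"] - df["age"].min()) / (df["age"].max() - df["age"].min())

# Categorical encoding: one-hot encode the city column.
df = pd.get_dummies(df, columns=["city"])
print(df)
```

After these steps every column is numeric and on a comparable scale, which is the form most statistical and machine-learning models expect.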

Methodology: 

This subject has two teaching sessions per week. Each session is divided into two parts: the first part is primarily instructor-led, during which the teacher presents new content and theory; the second part involves students working on exercises to reinforce the knowledge they have acquired. Assessments are conducted periodically through individual or group activities, collection of homework exercises, and similar tasks.

Evaluation: 

To evaluate whether students have adequately achieved the objectives of the subject, different evaluation activities are used, with an approximately weekly frequency.

The following table shows the percentage of each evaluation activity in the final grade:

CONTINUOUS EVALUATION SYSTEM:


Evaluation type                Weight   Content                                        Activity type
Attendance and participation   30%      All topics                                     Moderately important
Activities                     30%      Around 5 or 6 individual or group activities   Highly important
Mid-term exam                  10%      Topics covered to date                         Moderately important
Final exam                     30%      All topics                                     Highly important


Students who do not pass the regular call will have an extraordinary call in July. Students who do not sit any of the retake exams will receive a final grade of NP (Not Presented) in the extraordinary call.

A minimum grade of 3 in the final and retake exam will be needed to pass the course.

Objectives of the continuous evaluation:

-  The main objective is to help students keep up to date with the subject and develop a good working method, so that they assimilate the material, which is taught progressively, and obtain good academic results.

-  It also makes it possible to assess the work students do day by day, so that their grade does not depend solely on the examinations taken during the semesters of the academic year.

-  For the teacher, it provides more information about the work done by students and a better knowledge of them, both academically and personally.

Evaluation Criteria: 

---

Basic Bibliography: 

Marin, I., Shukla, A., & VK, S. (2019). Big Data Analysis with Python. Packt Publishing.

Kleppmann, M. (2017). Designing Data-Intensive Applications. O'Reilly Media.

Additional Material: 

---