University of Manchester

COMP33111
Data Integration and Analysis

Course Information (2011/12)

Lecturer: Goran Nenadic

Demonstrators: Azad Dehghan, Rosyzie Anna Apong, James Naish

Introduction

All application areas are witnessing the "data deluge", i.e. the ever-growing amount of digital data generated by day-to-day activities in business, science, education, entertainment, etc. Making sense of this data through integration and analysis is key to the success of any organisation. In addition to handling huge volumes of data, current applications are also challenged by multi-modal data, including un- and semi-structured data, text, image and video data, spatial and temporal data, etc.

Module Aims

The aim of the course is to give students an awareness of the problems and opportunities associated with data integration, including the analysis of the data once integrated. Previous database courses focused on the infrastructure for managing and querying data, database design and database programming. This course unit focuses principally on making the most of data within an organisation through:
  • Data integration: getting the data into a form that supports and facilitates aggregation, exploration and mining.
  • Data analysis: techniques for learning new lessons from the data.

Official module description and syllabus

Schedule of lectures, tutorials and labs

All lectures are on Mondays 13:00-15:00 in Kilburn LF 15.
Tutorials, open surgeries, labs: Mondays 12:00-13:00 (3rd year lab) and/or 15:00-16:00 (G23)

Week  Date   Lecture/Tutorial/Lab/Test
1     26/09  Introduction to Data Integration and Analysis
2     03/10  Data Warehousing and Profiling
3     10/10  Data Analytics and Online Analytical Processing
4     17/10  Association Rule Mining
5     24/10  Lab test 1 (DW, OLAP) and Text Mining
6            Reading week
7     07/11  Data Classification
8     14/11  Data Clustering
9     21/11  Multimedia Data Analysis and Integration
10    28/11  Lab test 2: Data Mining
11    05/12  Enterprise Resource Planning
12    12/12  IBM guest lecture: "Smart Analytics for Policing - Data Warehousing for real applications"

Assessment

  • Examination: 85% (3 questions from 5).
  • Laboratory: 15% (2 test sessions, practising the tools and methods discussed during the lectures).

Past exam papers

Past exam papers are available here and here. Feedback on past papers is available here.
General information about the examination is available here.

Lab-test sessions

There are 2 assessed lab sessions:
  • Lab-test 1 on Data warehousing and OLAP (worth 5% of the total mark) - October 24th, 14:00-15:00
  • Lab-test 2 on Data Mining (worth 10% of the total mark) - November 28th, 13:00-15:00.

Tutorials, labs and open surgeries

There are weekly sessions with tutorials, labs and open surgeries. Tasks and questions will be distributed in advance and discussed during the tutorial sessions on Mondays 12:00-13:00 and 15:00-16:00 (every Monday from week 2, except in weeks when the lab tests take place).

Supplementary textbooks

  1. Thomas Connolly and Carolyn Begg: Database systems: a practical approach to design, implementation, and management, Addison-Wesley, 4th edition, 2005, ISBN: 0321210255
  2. Ramez Elmasri and Shamkant B. Navathe: Fundamentals of database systems, Pearson, 5th edition, 2007, ISBN: 032141506X
  3. Robert Nisbet, John Elder and Gary Miner: Handbook of Statistical Analysis and Data Mining Applications, Elsevier, ISBN: 978-0-12-374765-5 (e-book)
  4. Oded Maimon and Lior Rokach (Eds.): Data Mining and Knowledge Discovery Handbook, Springer, 2nd ed., 2010, ISBN: 978-0-387-09822-7 (e-book)


Additional Study Materials

Additional Oracle documentation