COMP33111 Data Integration and Analysis
Course Information (2011/12)
Lecturer: Goran Nenadic
Demonstrators: Azad Dehghan, Rosyzie Anna Apong, James Naish
Introduction
All application areas are witnessing the "data deluge", i.e. the ever-growing amount of digital data generated by day-to-day activities in business, science, education, entertainment, etc. Making sense of this data through integration and analysis is key to the success of any organisation. In addition to handling huge volumes of data, current applications are also challenged by multi-modal data, including un- and semi-structured data, text, image and video data, spatial and temporal data, etc.
Module Aims
The aim of the course is to give students an awareness of the problems and opportunities associated with data integration, including the analysis of the data once integrated. Previous database courses focused on the infrastructure for managing and querying data, database design and database programming. This course unit focuses principally on making the most of data within an organisation through:
- Data integration: getting the data into a form that supports and facilitates aggregation, exploration and mining.
- Data analysis: techniques for learning new lessons from the data.
Schedule of lectures, tutorials and labs
All lectures are on Mondays 13:00-15:00 in Kilburn LF 15
Tutorials, open surgeries, labs: Mondays 12:00-13:00 (3rd year lab) and/or 15:00-16:00 (G23)
| Week | Date | Lecture/Tutorial/Lab/Test |
| 1 | 26/09 | Introduction to Data Integration and Analysis |
| 2 | 03/10 | Data Warehousing and Profiling |
| 3 | 10/10 | Data Analytics and Online Analytical Processing |
| 4 | 17/10 | Association Rule Mining |
| 5 | 24/10 | Lab test 1 (DW, OLAP) and Text Mining |
| 6 | | Reading week |
| 7 | 07/11 | Data Classification |
| 8 | 14/11 | Data Clustering |
| 9 | 21/11 | Multimedia Data Analysis and Integration |
| 10 | 28/11 | Lab test 2: Data Mining |
| 11 | 05/12 | Enterprise Resource Planning |
| 12 | 12/12 | Guest lecture (IBM): "Smart Analytics for Policing - Data Warehousing for real applications" |
Assessment
- Examination: 85% (3 questions from 5).
- Laboratory: 15% (2 test sessions, practicing tools and methods discussed during the lectures).
Past exam papers
Past exam papers are available here and here. Feedback on past papers is available here.
General information about Examination is here.
Lab-test sessions
There are 2 assessed lab sessions:
- Lab-test 1 on Data warehousing and OLAP (worth 5% of the total mark) - October 24th, 14:00-15:00
- Lab-test 2 on Data Mining (worth 10% of the total mark) - November 28th, 13:00-15:00.
Tutorials, labs and open surgeries
There are weekly sessions with tutorials, labs and open surgeries. Tasks and questions will be distributed in advance and discussed during the tutorial sessions on Mondays 12:00-13:00 and 15:00-16:00 (every Monday from week 2, apart from the weeks when the lab tests take place).
Supplementary textbooks
- Thomas Connolly and Carolyn Begg: Database systems: a practical approach to design, implementation, and management, Addison-Wesley, 4th edition, 2005, ISBN: 0321210255
- Ramez Elmasri and Shamkant B. Navathe: Fundamentals of database systems, Pearson, 5th edition, 2007, ISBN: 032141506X
- Robert Nisbet, John Elder and Gary Miner: Handbook of Statistical Analysis and Data Mining Applications, Elsevier, ISBN: 978-0-12-374765-5 (e-book)
- Oded Maimon and Lior Rokach (Eds.): Data Mining and Knowledge Discovery Handbook, Springer, 2nd ed., 2010, ISBN: 978-0-387-09822-7 (e-book)
Additional Study Materials
- Week 1 - Introduction: data integration, analysis and mining
- Week 2 - Data Warehousing, Profiling and Cleansing
- Week 3 - On-line analytical processing (OLAP)
- Week 4 - Association rule mining
- Week 5 - Text mining
- Week 7 - Classification
- Week 8 - Clustering
- Week 9 - Mining and integration of unstructured data
- Week 11 - Data Integration and Analysis: Enterprise-wide Information Systems
- Enterprise Resource Planning (slides from eprfans.com)
- SAP (Business Software Solutions, Applications and Services)
- SAP/Business Objects Intelligence Platform
- SAP case study: Boulanger
- Information Builders (Business Intelligence and Enterprise Reporting)
- WebFOCUS from Information Builders
- IBM Cognos (Business Intelligence, Budgeting, Forecasting, Performance management and Scorecarding)
- SAS Institute (Business analytics)
- ORACLE's Siebel (Business analytics solutions, CRM, ERP)
- ORACLE's Hyperion (Financial Performance Management, etc.)
- QlikView (Business intelligence, etc.)
- MUSING project - next generation semantic-based business intelligence solutions
- CRISP-DM tutorial
- S. Anand, M. Grobelnik, D. Wettschereck: CRISP-DM: A Standard Process Model for Data Mining
Additional Oracle documentation