Querying Data on the Web

Unit code: COMP62421
Credit Rating: 15
Unit level: Level 6
Teaching period(s): Semester 1
Offered by School of Computer Science
Available as a free choice unit?: Y



Additional Requirements


Comparable knowledge to that provided by: COMP23111 Fundamentals of Databases


This course unit aims to give students knowledge and understanding of query processing technology, particularly in relation to data on the Web.

Given the shift in computing towards data-centric approaches in both scientific and industrial contexts, query processing is set to become an increasingly important activity on which organisations will compete. This course unit therefore aims to build upon the knowledge that students will have acquired of modelling data on the Web and focuses on how queries over data on the Web are expressed, optimised and evaluated. For practical reasons of infrastructure, the course unit does not linger on issues that arise when dealing with data on a very large scale. Nevertheless, the challenges that query processing faces on the Web are particularly interesting and so will be discussed in the course unit.

Significant use is made of reading assignments and other activities in order to stimulate students to engage in independent information acquisition.

Note that this course unit is about query processing from a systems viewpoint (as opposed, for example, to a theoretical one, or to an application-oriented one). Therefore, it concerns itself much more with how query processing systems are built on well-accepted principles, and not as much with how they are used to support applications. It should appeal particularly well to students who enjoy understanding how well-founded systems can be made to deliver advanced, challenging functionality. It is possibly less appealing to students who are more interested in how advanced technologies can be deployed, say, in businesses, in response to specific business needs, although the course unit does attempt to explain the motivations behind the technological advances it covers.


In undergraduate courses on databases, we often focus on how database applications are developed. Here, we look into how database management systems are built internally and seek to understand where the advances lie that make the database applications we develop so effective and efficient. We start with a deep look into query processing, focussing on optimisation. We then survey the most advanced data management tools available: parallel, distributed and federated systems for use in cluster, cloud and big data settings. We look into big data appliances as well as into sensor networks.

Teaching and learning methods

One closed-book, 5-question, 50-mark, 2-hour written exam

Five weekly exercises including problem-solving lab work

Learning outcomes

Programme outcome      G.1

Unit learning outcome

  • Have acquired knowledge and understanding of query processing techniques in the classical context of relational databases, including parallelization approaches
  • Have acquired knowledge and understanding of query processing techniques over XML/JSON data using XQuery
  • Have acquired knowledge and understanding of query processing techniques over RDF data using SPARQL
  • Have acquired knowledge and understanding of query processing techniques over map-reduce engines as well as NoSQL ones in the context of data-centric Web systems and applications

Employability skills

  • Analytical skills
  • Problem solving
  • Research
  • Written communication

Assessment methods

  • Written exam - 50%
  • Written assignment (inc essay) - 50%


Part 1

[Day 1]

Introduction to the Course Unit [1]

Relational Query Processing (1 of 2)

     The Architectural Paradigm for Query Processing Systems [1]

     The Relational Model of Data [1]

     The Relational Calculi and Algebra [1]

     The SQL Language [1]


[Day 2]

Relational Query Processing (2 of 2)

     Logical Optimization [2]

     Physical Optimization [1]

     Classical Query Execution [1]

     Parallel Query Execution [1]


Part 2

[Day 3]

Query Processing Using XQuery

     Motivation for the Language [1]

     Example Capabilities [1]

     Compilation, Optimization, Evaluation [2]

     Applications [1]


[Day 4]

Query Processing Using SPARQL

     Motivation for the Language [1]

     Example Capabilities [1]

     Compilation, Optimization, Evaluation [2]

     Applications [1]


Part 3

[Day 5]

Map-Reduce for Query Processing

     The Models and Platforms [1]

     Using the Platform for Query Processing [1]

     Applications [1]


NoSQL Query Processing

     The Models and Platforms [1]

     Applications [1]

Recommended reading

Specific research papers will be referenced for self-learning, and some will be assigned for mandatory reading.

There is no single textbook, or set thereof, that can be said to cover the course content. Books that are moderately useful for the course include:

     - Database systems: the complete book (2nd edition).

       Hector Garcia-Molina, Jeffrey D. Ullman and Jennifer Widom.

       ISBN 9780131354289.

       Pearson, 2008.


     - Database system concepts (6th edition).

       Abraham Silberschatz, Henry F. Korth and S. Sudarshan.

       ISBN 9780071289597.

       McGraw-Hill, 2010.


     - Database management systems (3rd edition).

       Raghu Ramakrishnan and Johannes Gehrke.

       ISBN 9780071231510.

       McGraw-Hill, 2002.


The three above are largely alternatives to one another, with slightly different emphases.


     - Understanding Relational Database Query Languages.

       Suzanne W. Dietrich.

       ISBN 0130286524.

       Prentice-Hall, 2001.


This is a more focussed look at the various languages that have been proposed for relational data.


     - Web Data Management.

       Serge Abiteboul, Ioana Manolescu et al.

       ISBN 9781107012431.

       Cambridge UP, 2012.


This is a good survey, which, in an ever-shifting technical landscape, is beginning to fall out of date. There is, too, much of more lasting value.


     - A Semantic Web Primer.

       Grigoris Antoniou, Paul Groth et al.

       ISBN 9780262018289.

       MIT Press, 2012.


This is the 3rd edition of an overview of the Semantic Web (previous editions came out in 2008 and 2004). Again, the pace of change means that the three years since publication have brought some changes to the topics treated.

Feedback methods

Coursework is assigned and lab sessions provide an opportunity for interaction. Coursework is marked offline with feedback given in writing. Lab sessions allow students to discuss the written feedback in more depth with the marker. The course unit will use the standard tools available in virtual learning environments for hints, tips, discussions, etc.

Study hours

  • Lectures - 25 hours
  • Practical classes & workshops - 10 hours
  • Independent study hours - 115 hours

Teaching staff

Alvaro Fernandes - Unit coordinator
