Our seminar series is free and available for anyone to attend. Unless otherwise stated, seminars take place on Wednesday afternoons at 2pm in the Kilburn Building during teaching season.

If you wish to propose a seminar speaker, please contact Antoniu Pop.


Query Planning in Distributed Stream Processing Systems

  • Speaker: Dr Eva Kalyvianaki (Imperial College London)
  • Host: Rizos Sakellariou
  • 28th April 2010 at 14:15 in Lecture Theatre 1.4, Kilburn Building
Distributed Stream Processing Systems (DSPSs) collect and analyse continuously generated data updates from distributed sources. Consider, for example, the real-time data coming from stock markets around the world. A stock trader could use a DSPS to issue the following query: "Which are the top 10 stocks from European stock markets with the highest increase in their selling price over the last 5 minutes?" Besides financial data processing, applications that run on a DSPS include network monitoring, environmental sensing and many others. A DSPS is a system of distributed nodes, shared by different queries, that collects and processes large volumes of source data in near real-time.

In this talk, we explore query planning in DSPSs. Query planning involves the allocation of resources from physical nodes to queries. We introduce algorithms for query planning in two different settings. (a) First, we focus on DSPSs that deliver strict performance guarantees. Here, allocation decisions must satisfy the resource demands of queries, while the system as a whole should scale to a large number of satisfied queries. We formalise query planning as a constrained optimisation problem and solve an approximate version based on query reuse to achieve scalability. Experimental results, both in simulation and from a prototype implementation of a DSPS, show that our approach makes efficient allocations. (b) Second, we study query planning in DSPSs that aim to accommodate every incoming request. As new queries arrive, the system can easily reach saturation. In this case, it is important to allocate each query a fair share of processing resources so that all queries receive the same quality of service. We give an overview of our approach, which balances resources across running and newly arriving queries in a distributed way through controlled performance degradation.