Our seminar series is free and available for anyone to attend. Unless otherwise stated, seminars take place on Wednesday afternoons at 2pm in the Kilburn Building during teaching season.

If you wish to propose a seminar speaker, please contact Antoniu Pop.


Title: Crowd sourcing and active learning to scale out supervised learning approaches: a case study for web data extraction

  • Speaker: Dr Paolo Merialdo (Università Roma Tre)
  • Host: Norman Paton
  • 11th December 2013 at 14:00 in Lecture Theatre 1.4
The recent advent of crowdsourcing platforms (such as Amazon Mechanical Turk) is opening new opportunities to address several problems with supervised learning approaches. These platforms provide support for managing and assigning mini-tasks to human workers, and can thus be used to produce large training datasets. Because they make it easy to involve a large number of people in producing training data, we may say that they offer a way to 'scale out' supervised learning approaches. However, to obtain an efficient and effective process, two main issues need to be addressed. First, since mini-tasks are performed by non-expert, usually unskilled people, they should be extremely simple. Second, since the cost is proportional to the effort required of the crowd, the number of mini-tasks should be minimized. To address the latter issue, active learning is a natural solution. As a proof of concept, we present a system for inferring web wrappers that relies on workers recruited through a crowdsourcing platform. Our system adopts a supervised approach to infer wrappers, with training data generated by submitting simple queries to the crowdsourcing platform. To address the cost issue, our system selects the queries that lead most quickly to an accurate solution, thus minimizing the number of mini-tasks assigned to the crowd.
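The query-selection idea in the abstract can be illustrated with a toy example. The sketch below is not the speaker's system: it is a minimal pool-based active learner for a one-dimensional threshold rule, and it assumes the crowd's answers are noise-free. The names `learn_threshold` and `ask_crowd` are hypothetical, with `ask_crowd` standing in for one paid mini-task. Each round asks about the item that splits the remaining candidate hypotheses in half, so the number of mini-tasks grows logarithmically with the pool size instead of linearly.

```python
from typing import Callable, List, Tuple


def learn_threshold(
    pool: List[float], ask_crowd: Callable[[float], bool]
) -> Tuple[float, int]:
    """Find the smallest pool item labeled positive, using few queries.

    Assumes labels are monotone: an item is positive iff its value is at
    least some hidden threshold. `ask_crowd` is a hypothetical stand-in
    for one crowd mini-task and is assumed to answer correctly.
    Returns (boundary, number_of_queries); boundary is inf when no item
    in the pool is positive.
    """
    items = sorted(pool)
    lo, hi = 0, len(items)  # index of the first positive item lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2  # most informative item: halves the hypotheses
        queries += 1
        if ask_crowd(items[mid]):
            hi = mid        # first positive is at mid or earlier
        else:
            lo = mid + 1    # first positive is after mid
    boundary = items[lo] if lo < len(items) else float("inf")
    return boundary, queries


if __name__ == "__main__":
    pool = [float(i) for i in range(1000)]
    # Simulated worker with a hidden threshold of 637 (illustrative value).
    boundary, queries = learn_threshold(pool, lambda x: x >= 637)
    print(boundary, queries)
```

With a pool of 1000 items this needs at most 10 queries, whereas labeling every item would need 1000. Real crowd answers are noisy, so a practical system would have to ask redundant questions or model worker reliability, which this sketch leaves out.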
Paolo Merialdo has been an Associate Professor at Università degli Studi Roma Tre since 2006. He received his Computer Engineering degree in 1990 from Università degli Studi di Genova, and his PhD in 1998 from Università degli Studi di Roma 'La Sapienza'. His research interests include information extraction and data management techniques for Web data. He has published his research results in important journals of the field and in the refereed proceedings of the major conferences (ACM SIGMOD, VLDB, WWW, EDBT). He has been a program committee member for many international conferences and has served as Associate Director for ACM SIGMOD Record. He is co-founder of InnovAction Lab, an entrepreneurship program for master's students. He serves as Advisor at LuissEnlabs, an accelerator for startups, in Rome.