PODS Invited Talks
Incomplete Data: What Went Wrong, and How to Fix It
Leonid Libkin (University of Edinburgh)
Incomplete data is ubiquitous, and it poses more problems now than ever before. The more data we accumulate, and the more widespread tools for integrating and exchanging data become, the more instances of incompleteness we have. And yet the subject is poorly handled by both practice and theory. Many queries for which students get full marks in their undergraduate courses will not work correctly in the presence of incomplete data, yet these ways of evaluating queries are cast in stone in the SQL standard. We have many theoretical results on handling incomplete data, but they are, by and large, about showing high complexity bounds, and thus are often dismissed by practitioners. Even worse, we have a basic theoretical notion of what it means to answer queries over incomplete data, and yet this is not at all what practical systems do.
Is there a way out of this predicament? Can we have a theory of incompleteness that will appeal to theoreticians and make practitioners realize that commercial DBMSs often produce paradoxical answers? Can we make such a theory applicable, i.e., implementable on top of existing DBMSs that are very good at fast query evaluation? And can we make it useful for applications such as data integration and handling inconsistency? The talk is about raising these issues, providing some answers, and outlining problems that still need to be solved.
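To make the abstract's claim concrete, here is a minimal sketch (not taken from the talk) of one well-known paradox of SQL's NULL semantics, shown via Python's standard sqlite3 module; the tables `orders` and `shipped` are hypothetical names chosen for illustration:

```python
import sqlite3

# Two toy tables: orders we placed, and orders known to have shipped.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER)")
cur.execute("CREATE TABLE shipped (id INTEGER)")
cur.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])
# shipped contains one known id and one NULL, i.e. an unknown shipment.
cur.executemany("INSERT INTO shipped VALUES (?)", [(1,), (None,)])

# Intuitively, order 2 is not known to have shipped and should be
# returned. But under SQL's three-valued logic, 2 NOT IN (1, NULL)
# evaluates to UNKNOWN (because 2 <> NULL is UNKNOWN), so the query
# returns no rows at all.
cur.execute("SELECT id FROM orders WHERE id NOT IN (SELECT id FROM shipped)")
print(cur.fetchall())  # -> []
```

This is precisely the kind of behavior the talk characterizes as paradoxical: a textbook-correct query silently returns an answer that contradicts intuition once a single NULL appears.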
Leonid Libkin is Professor of Foundations of Data Management in the School of Informatics at the University of Edinburgh. He was previously a professor at the University of Toronto and a member of research staff at Bell Laboratories. He received his PhD from the University of Pennsylvania in 1994. His main research interests are in the areas of data management and applications of logic in computer science. He has written five books and over 180 technical papers. He was the recipient of a Marie Curie Chair Award from the EU in 2006 and four best paper awards. He has chaired several program committees, including PODS and ICDT, and was the conference chair of the 2010 Federated Logic Conference. He is an ACM Fellow and a Fellow of the Royal Society of Edinburgh.
Model-Data Ecosystems: Challenges, Tools, and Trends
Peter J. Haas (IBM Almaden Research Center)
In the past few years, research around (big) data management has begun to intertwine with research around deep predictive modeling and simulation. There is an increasing recognition that observed data must be combined with simulated data to support the deep what-if analysis that is needed for robust decision making under uncertainty. Simulation models of large, complex systems (traffic, biology, population health and safety) both consume and produce massive amounts of data, compounding the challenges of traditional information management. This talk will survey some interesting new problems, mathematical tools, and future directions in this emerging research area. Tentative topics include (i) pushing stochastic simulation into the database, (ii) simulation as a tool for data integration, (iii) new methods for massive-scale time series transformations between models, (iv) moving from query optimization to simulation-run optimization, and (v) exploiting user control of simulated data.
Peter J. Haas has been a Research Staff Member at the IBM Almaden Research Center since 1987, where he has pursued research at the interface of information management, applied probability, statistics, and computer simulation. He has contributed to IBM products such as DB2 UDB and Netezza, as well as to the ISO SQL standards for database sampling and analytics. He is also a Consulting Professor in the Department of Management Science and Engineering at Stanford University, teaching and pursuing research in stochastic modeling and simulation. He is an IBM Master Inventor, an ACM Fellow, and a past president of the INFORMS Simulation Society (I-Sim). He has received a number of awards, including an ACM SIGMOD 10-year Best Paper award, an I-Sim Outstanding Simulation Publication Award, and an IBM Research Outstanding Technical Achievement Award. He has served on the editorial boards of the VLDB Journal, Operations Research, and ACM Transactions on Modeling and Computer Simulation.
Database Principles in Information Extraction
Benny Kimelfeld (LogicBlox)
Populating a relational schema from textual content, a problem commonly known as Information Extraction, is pervasive in contemporary computational challenges associated with Big Data. In this tutorial, I will give an overview of the algorithmic concepts and techniques used for solving Information Extraction tasks. I will also describe some of the declarative frameworks that provide abstractions and infrastructure for programming extractors. Finally, I will highlight opportunities for impact through principles of data management, illustrate these opportunities through recent work, and propose directions for future research.
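As a minimal illustration (not drawn from the tutorial) of what "populating a relational schema from textual content" means, the following sketch fills a hypothetical relation Works(person, company) from free text using a single hand-written pattern; real extractors use far richer rules and statistical models:

```python
import re

# Toy rule-based extractor for a hypothetical relation Works(person, company).
# The pattern and relation are illustrative assumptions, not from the tutorial.
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) works at (?P<company>[A-Z][A-Za-z ]+)"
)

def extract_works(text):
    """Return the Works tuples matched in the text."""
    return [(m.group("person"), m.group("company")) for m in PATTERN.finditer(text)]

text = "Ada Lovelace works at Analytical Engines. Alan Turing works at Bletchley."
print(extract_works(text))
# -> [('Ada Lovelace', 'Analytical Engines'), ('Alan Turing', 'Bletchley')]
```

Each match becomes a tuple in the target relation; declarative IE frameworks of the kind the tutorial surveys let programmers compose many such extraction rules and reason about them with database-style semantics.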
After receiving his Ph.D. in Computer Science from The Hebrew University of Jerusalem, Benny spent five years at IBM Research – Almaden, first as a postdoctoral scholar in the Computer Science Principles and Methodologies (Theory) Department, and then as a research staff member in the Search and Analytics Department. Since 2014, Benny has been a Computer Scientist at LogicBlox. Benny’s research spans a spectrum of both foundational and systems aspects of data management, such as uncertain (probabilistic) databases, information retrieval over data with structure, view updates, semistructured data, graph mining, and infrastructure for text analytics.