Keynote 1 (Tuesday April 1st)
Abstract: The amount of data collected in the last two years exceeds the amount collected in all of history before them. Businesses are drowning in data, and need several months of ETL processing merely to prepare it for querying. Domain scientists collect data much faster than it can be transformed into valuable information, and are often forced into hasty decisions on which parts to discard, potentially throwing away valuable data before it has been fully exploited. The reason is that query processing, the mechanism for squeezing information out of data, becomes slower as datasets grow larger. At the same time, the continuously increasing number of hardware contexts slows processing down further, as keeping all cores busy with useful computation is difficult. Today's query engines can harness but a fraction of the potential of new hardware platforms.
Is it possible to decouple query processing efficiency from the data growth curve? As data grows exponentially, which new techniques can we invent to process today's data as efficiently as yesterday's (even though the latter was half the size)? How can we remain hardware-aware without creating systems that are too specialized to today's microarchitectures (and useless tomorrow)?
This talk advocates a departure from the traditional "create a database, then run queries" paradigm. Instead, data analysts should run queries on raw data, while a database is built on the side. In fact, the database should become an implementation detail, imperceptible to the user. To achieve this paradigm shift, query processing should be decoupled from specific data storage formats. Ad-hoc primitives and dynamically synthesized operators are key to just-in-time query optimization and processing. Finally, exploitation of compute and memory resources should be seamless and based on hardware hints; extreme vertical integration is an enemy of forward compatibility.
Bio: Anastasia Ailamaki is a Professor of Computer and Communication Sciences at the Ecole Polytechnique Federale de Lausanne (EPFL) in Switzerland. Her research interests are in database systems and applications, and in particular (a) in strengthening the interaction between the database software and emerging hardware and I/O devices, and (b) in automating data management to support computationally-demanding and data-intensive scientific applications. She has received an ERC Consolidator Award (2013), a Finmeccanica endowed chair from the Computer Science Department at Carnegie Mellon (2007), a European Young Investigator Award from the European Science Foundation (2007), an Alfred P. Sloan Research Fellowship (2005), eight best-paper awards in database, storage, and computer architecture conferences (2001-2012), and an NSF CAREER award (2002). She earned her Ph.D. in Computer Science from the University of Wisconsin-Madison in 2000. She is a senior member of the IEEE and a member of the ACM, serves as the ACM SIGMOD vice chair, and has served as a CRA-W mentor.
Keynote 2 (Wednesday April 2nd)
Abstract: Big Data has captured a lot of interest in industry, with anticipation of better decisions, efficient organizations, and many new jobs. Much of the emphasis is on the challenges of the four Vs of Big Data -- Volume, Variety, Velocity, and Veracity -- and on technologies that handle volume, including storage and computational techniques to support analysis (Hadoop, NoSQL, MapReduce, etc). However, the most important feature of Big Data, its raison d'etre, is none of these four Vs -- but Value. In this talk, I will put forward the concept of Smart Data, realized by extracting value from a variety of data, and show how Smart Data applied to the growing variety of Big Data (e.g., social, sensor/IoT, health care) enables a much larger class of applications that can benefit not just large companies but each individual. This requires organized ways to harness and overcome the four V-challenges. In particular, we will need to utilize metadata, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP.
For harnessing Volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreements, represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. Lastly, for Velocity, I will discuss more recent work on Continuous Semantics, which seeks to dynamically create models of new objects, concepts, and relationships, and to use them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart cities. I will present examples from a few of these domains.
Bio: Amit P. Sheth (http://knoesis.org/amit) is an educator, researcher, and entrepreneur. He is the LexisNexis Eminent Scholar and founder/executive director of the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis). Kno.e.sis conducts research in social/sensor/semantic data and Web 3.0, with real-world applications and multidisciplinary solutions for translational research, healthcare and life sciences, cognitive science, material sciences, etc. Kno.e.sis' activities have resulted in Wright State University being recognized as a top organization in the world in World Wide Web research impact. Prof. Sheth is one of the top authors in Computer Science, World Wide Web, and databases (cf. Microsoft Academic Search). His research has led to several commercial products, many real-world applications, and two earlier companies, with two more in early stages. One of these was Taalee/Voquette/Semagix, which was likely the first company (founded in 1999) to develop Semantic Web-enabled search and analysis and semantic application development platforms.