Ibrahim Sabek (University of Minnesota, Twin Cities); Mohamed Mokbel (Qatar Computing Research Institute)*
Abstract: The proliferation of generated data has propelled the rise of scalable machine learning solutions that efficiently analyze and extract useful insights from such data. Meanwhile, spatial data, e.g., GPS data, has become ubiquitous, with its volume growing sharply in recent years. The applications of big spatial data span a wide spectrum of interests, including tracking infectious diseases, simulating climate change, and combating drug addiction, among others. Consequently, major research efforts are being exerted to support efficient analysis and intelligence inside these applications, either by providing spatial extensions to existing machine learning solutions or by building new solutions from scratch. In this 90-minute tutorial, we comprehensively review the state-of-the-art work at the intersection of machine learning and big spatial data. We cover existing research efforts and challenges in three major areas of machine learning, namely, data analysis, deep learning, and statistical inference. We also discuss existing end-to-end systems, and highlight open problems and challenges for future research in this area.
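To make the scalability theme concrete, the sketch below (our own toy illustration, not material from the tutorial) shows a uniform grid index, a common building block for scaling spatial analysis: each GPS point is mapped to a cell so per-cell work can be distributed.

```python
# Toy sketch: uniform grid partitioning of (lon, lat) points.
# Cell keys and sizes here are illustrative placeholders.
from collections import defaultdict

def grid_partition(points, cell_size):
    """Group (lon, lat) points into square cells of side `cell_size` degrees."""
    cells = defaultdict(list)
    for lon, lat in points:
        key = (int(lon // cell_size), int(lat // cell_size))
        cells[key].append((lon, lat))
    return cells

points = [(-93.27, 44.98), (-93.26, 44.97), (51.53, 25.28)]  # Minneapolis x2, Doha
cells = grid_partition(points, cell_size=1.0)
# The two Minneapolis points share a cell; the Doha point lands in its own cell.
```

Once points are bucketed this way, per-cell statistics or model training can proceed independently, which is one reason grid-style partitioning appears throughout big spatial data systems.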
Ibrahim Sabek Ibrahim Sabek is a PhD candidate at the Department of Computer Science and Engineering, University of Minnesota. He received his M.Sc. degree at the same department in 2017. His research interests lie in the intersection of big spatial data management, spatial computing, and scalable machine learning systems. Ibrahim was awarded the University of Minnesota Doctoral Dissertation Fellowship in 2019 for his dissertation's focus on scalable machine learning for big spatial data and applications. His research work has been nominated for the Best Paper Award of ACM SIGSPATIAL 2018, and qualified for the final stage of the ACM SIGMOD Student Research Competition (SRC) 2017. During his PhD, he has collaborated with NEC Labs America and Microsoft Research (MSR) in Redmond. For more information, please visit: http://www.cs.umn.edu/~sabek.
Mohamed F. Mokbel Mohamed F. Mokbel is the Chief Scientist of Qatar Computing Research Institute and a Professor at the University of Minnesota. His current research interests focus on systems and machine learning techniques for big spatial data and applications. His research work has been recognized by the VLDB 10-Years Best Paper Award, four conference Best Paper Awards, and the NSF CAREER Award. Mohamed has delivered seven tutorials at VLDB/SIGMOD/ICDE/EDBT conferences, in addition to tutorials in other communities' first-tier venues, including IEEE ICDM and ACM CCS. Mohamed is the past elected Chair of ACM SIGSPATIAL, current Editor-in-Chief of the Distributed and Parallel Databases Journal, and on the editorial boards of ACM Books, ACM TODS, the VLDB Journal, ACM TSAS, and the GeoInformatica journal. He has also served as PC Vice Chair of ACM SIGMOD and PC Co-Chair of ACM SIGSPATIAL and IEEE MDM. For more information, please visit: www.cs.umn.edu/~mokbel.
Evaggelia Pitoura (Univ. of Ioannina, Greece); Georgia Koutrika (Athena Research Center, Greece); Kostas Stefanidis (Tampere University, Finland)
Abstract: In this tutorial, we pay special attention to the concept of fairness in rankings and recommender systems. By fairness, we typically mean lack of bias. It is not correct to assume that insights achieved via computations on data are unbiased simply because the data was collected automatically or the processing was performed algorithmically. Bias may come from the algorithm, reflecting, for example, commercial or other preferences of its designers, or even from the actual data, for example, if a survey contains biased questions. In this tutorial, we review a number of definitions of fairness that aim at addressing discrimination and bias amplification, and at ensuring transparency. We organize these definitions around the notions of individual and group fairness. We also present methods for achieving fairness in rankings and recommendations, taking a cross-type view and distinguishing between pre-processing, in-processing and post-processing approaches. We conclude with a discussion of new research directions that arise.
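As a concrete instance of group fairness, the sketch below (our own illustration, not the tutorial's material; the proportional-representation criterion and tolerance are assumptions) checks whether a protected group's share of the top-k ranking positions roughly matches its share of the candidate pool.

```python
# Toy group-fairness check for a top-k ranking (illustrative, not a definitive metric).

def group_share_in_topk(ranking, groups, group, k):
    """Fraction of the top-k positions occupied by members of `group`.

    ranking: list of item ids, best first
    groups:  dict mapping item id -> group label
    """
    topk = ranking[:k]
    return sum(1 for item in topk if groups[item] == group) / k

def is_proportionally_fair(ranking, groups, group, k, tolerance=0.1):
    """Fair (under this toy criterion) if the group's top-k share is within
    `tolerance` of its share in the full candidate pool."""
    pool_share = sum(1 for g in groups.values() if g == group) / len(groups)
    return abs(group_share_in_topk(ranking, groups, group, k) - pool_share) <= tolerance
```

Individual fairness, by contrast, would compare how similarly the ranker treats similar individuals rather than aggregate group shares; the tutorial surveys both families of definitions.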
Georgia Koutrika Georgia Koutrika is Research Director at Athena Research Center in Greece. She has more than 15 years of experience in multiple roles at HP Labs, IBM Almaden, and Stanford, building innovative solutions for recommendations, data analytics and exploration. Her work has been incorporated in commercial products, described in 9 granted patents and 18 patent applications in the US and worldwide, and published in more than 80 papers in top-tier conferences and journals. She is an ACM Distinguished Speaker and associate editor for TKDE and PVLDB. She has served or serves as PC member or co-chair of many conferences.
Kostas Stefanidis Kostas Stefanidis is an Associate Professor of Data Science at Tampere University, Finland. He received his PhD in personalized data management from the Univ. of Ioannina, Greece. His research interests lie in the intersection of databases, information retrieval and the Web, and include personalization and recommender systems, and large-scale entity resolution and information integration. His publications include more than 80 papers in peer-reviewed conferences and journals, including SIGMOD, ICDE, and ACM TODS, and a book on entity resolution in the Web of data.
Maria K Krommyda (National Technical University of Athens)*; Verena Kantere (National Technical University of Athens)
Abstract: The wide adoption of the RDF data model, as well as the Linked Open Data initiative, have made available large linked datasets that have the potential to offer invaluable knowledge. Accessing, evaluating and understanding these datasets as published, though, requires extensive training and experience in the field of the Semantic Web, making these valuable sources of information inaccessible to a wider audience. In recent years, there have been many efforts to create systems that allow the visualization and exploration of this information. Some of these systems rely on techniques that limit the volume of the displayed information by providing aggregated, filtered or summarized access to the datasets, while others initialize the exploration of the dataset based on actions performed by the users, such as keyword searches and queries. The underlying technique determines the sustainability of the system, the requirements that the input must comply with, the datasets that can be visualized, and the visualization types provided. We present here a survey of these techniques, their strengths and weaknesses, as well as the datasets that they can support. The survey will provide the reader with a deep understanding of the challenges regarding the visualization of large linked datasets, a categorization of the techniques developed to resolve them, and an overview of the available systems and their functionalities.
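One simple flavor of the "summarized access" mentioned above can be sketched as follows (our own toy example, not drawn from the survey): aggregating an RDF dataset by predicate gives a user a sense of the dataset's shape before drilling into individual triples.

```python
# Toy predicate-level summary of an RDF dataset.
# The example triples and prefixes (ex:, foaf:) are illustrative.
from collections import Counter

def predicate_summary(triples):
    """Count how often each predicate occurs in (subject, predicate, object) triples."""
    return Counter(p for _, p, _ in triples)

triples = [
    ("ex:Alice", "foaf:knows", "ex:Bob"),
    ("ex:Alice", "foaf:name", '"Alice"'),
    ("ex:Bob", "foaf:knows", "ex:Carol"),
]
summary = predicate_summary(triples)
# summary -> Counter({"foaf:knows": 2, "foaf:name": 1})
```

A visualization system could render such a summary as a schema-level graph, limiting what is displayed to predicate counts rather than the full, possibly enormous, set of triples.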
Maria Krommyda Maria Krommyda is a PhD candidate at the School of Electrical and Computer Engineering of the National Technical University of Athens (NTUA). She is also working as a researcher at the Institute of Communication and Computer Systems, where she works on the development of methods, algorithms and systems for the management, storage, indexing and provision of large datasets of geo-spatial information.
Verena Kantere Verena Kantere is an Assistant Professor at the School of ECE of the NTUA. Before that, she was a Maître d'Enseignement et de Recherche at the Centre Universitaire d'Informatique of the University of Geneva. Previously, she was a tenure-track junior assistant professor at the Department of Electrical Engineering and Information Technology at the Cyprus University of Technology. She has received a Diploma and a Ph.D. from the NTUA and an M.Sc. from the Department of Computer Science at the University of Toronto.
Mohammad Javad Amiri (University of California, Santa Barbara)* & Divy Agrawal (University of California, Santa Barbara) & Amr El Abbadi (University of California, Santa Barbara)
Abstract: Large scale data management systems utilize consensus protocols to provide fault tolerance. Consensus protocols are extensively used in the distributed database infrastructure of large enterprises such as Google, Amazon, and Facebook, as well as in permissioned blockchain systems like IBM's Hyperledger Fabric. In the last four decades, numerous consensus protocols have been proposed to cover a broad spectrum of distributed systems. On one hand, distributed networks might be synchronous, partially synchronous, or asynchronous, and on the other hand, a distributed system might include crash-only nodes, Byzantine nodes, or both. In addition, a consensus protocol might follow a pessimistic or optimistic strategy to process transactions. Furthermore, while traditional consensus protocols assume an a priori known set of nodes, in permissionless blockchains nodes are assumed to be unknown. Finally, consensus protocols have explored a variety of performance trade-offs between the number of phases/messages (latency), the number of required processors, message complexity, and the activity level of participants (replicas and clients). In this tutorial, we discuss existing consensus protocols, classify them into different categories based on their assumptions about network synchrony, the failure model of nodes, etc., and elaborate on their main advantages and limitations.
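As a concrete instance of the "number of required processors" trade-off mentioned above, the classic lower bounds can be sketched in a few lines (standard textbook results, not material specific to this tutorial): tolerating f crash-only faults requires at least 2f+1 replicas (e.g., Paxos, Raft), while tolerating f Byzantine faults requires at least 3f+1 (e.g., PBFT).

```python
# Minimum replica counts to tolerate f faulty nodes under each failure model.
# (Standard results for partially synchronous protocols; illustrative only.)

def min_replicas(f, byzantine=False):
    """Crash-only protocols (e.g., Paxos/Raft) need 2f+1 replicas;
    Byzantine protocols (e.g., PBFT) need 3f+1."""
    return 3 * f + 1 if byzantine else 2 * f + 1

# Tolerating one faulty node:
# min_replicas(1)                 -> 3
# min_replicas(1, byzantine=True) -> 4
```

The gap between 2f+1 and 3f+1 is one reason permissioned blockchains, which must assume Byzantine participants, pay a noticeably higher replication cost than crash-fault-tolerant database infrastructure.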
Mohammad Javad Amiri Mohammad Javad Amiri is a PhD student at the University of California at Santa Barbara. His current work spans research topics such as distributed systems, large-scale data management, and business process management.
Divyakant Agrawal Divyakant Agrawal is a Professor of Computer Science at the University of California at Santa Barbara. His current interests are in the area of scalable data management and data analysis in cloud computing environments, security and privacy of data in the cloud, scalable analytics over big data, and Blockchain. Prof. Agrawal is an ACM Distinguished Scientist (2010), an ACM Fellow (2012), an IEEE Fellow (2012), and an AAAS Fellow (2016).
Amr El Abbadi Amr El Abbadi is a Professor of Computer Science at the University of California, Santa Barbara. Prof. El Abbadi is an ACM Fellow, AAAS Fellow, and IEEE Fellow. He was Chair of the Computer Science Department at UCSB from 2007 to 2011. He has served as a journal editor for several database journals and has been Program Chair for multiple database and distributed systems conferences. Prof. El Abbadi was also the co-recipient of the Test of Time Award at EDBT/ICDT 2015. He has published over 400 articles in databases and distributed systems and has supervised over 35 PhD students.
Shantanu Sharma (University of California Irvine)* & Anton Burtsev (University of California Irvine) & Sharad Mehrotra (University of California Irvine)
Abstract: Despite extensive research, secure outsourcing remains an open challenge. This tutorial focuses on recent advances in secure cloud-based data outsourcing based on cryptographic approaches (encryption, secret-sharing, and multi-party computation (MPC)) and hardware-based approaches. We highlight the strengths and weaknesses of state-of-the-art techniques, and conclude that no single approach is likely to emerge as a silver bullet. Thus, the key is to combine different hardware and software techniques to work in conjunction, using partitioned computing in which a computation is carefully split across different cryptographic techniques so as not to compromise security. We highlight some recent work in that direction.
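The partitioned-computing idea can be sketched with a toy query split (entirely our own illustration; the field names are made up and the XOR "cipher" is a placeholder, NOT a real encryption scheme): non-sensitive values are filtered in cleartext at the server, while sensitive values stay encrypted and are filtered by the client after decryption.

```python
# Toy partitioned filtering: one predicate evaluated partly server-side
# (cleartext) and partly client-side (after decryption). Illustrative only.

def toy_encrypt(value, key=42):
    return bytes(b ^ key for b in value.encode())  # placeholder, NOT secure

def toy_decrypt(blob, key=42):
    return bytes(b ^ key for b in blob).decode()

def partitioned_filter(rows, sensitive, predicate):
    """rows: dict field -> value; sensitive: set of fields kept encrypted."""
    # Server side: filter cleartext (non-sensitive) values directly.
    server_side = [v for k, v in rows.items() if k not in sensitive and predicate(v)]
    # Client side: decrypt the sensitive values the server returned, then filter.
    client_side = [toy_decrypt(toy_encrypt(v)) for k, v in rows.items() if k in sensitive]
    client_side = [v for v in client_side if predicate(v)]
    return server_side + client_side

rows = {"city": "Irvine", "diagnosis": "flu", "zipcode": "92617"}
result = partitioned_filter(rows, sensitive={"diagnosis"}, predicate=lambda v: len(v) > 3)
# -> ["Irvine", "92617"]  ("flu" is decrypted and filtered out client-side)
```

The security-relevant design point is the split itself: the server never sees sensitive values in the clear, at the cost of shipping encrypted data and extra client-side work, which is the trade-off the tutorial's partitioned-computing discussion examines.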
Shantanu Sharma Shantanu Sharma received his Ph.D. degree in Computer Science in 2016 from Ben-Gurion University, Israel. He obtained his Master of Technology (M.Tech.) degree in Computer Science from the National Institute of Technology, Kurukshetra, India, in 2011. He was awarded a gold medal for the first position in his M.Tech. degree. Currently, he is a postdoctoral researcher at the University of California, Irvine, USA, working with Prof. Sharad Mehrotra.
Anton Burtsev Anton Burtsev received the PhD degree in computer science from the University of Utah, in 2013. He is currently an Assistant Adjunct Professor in the Department of Computer Science, University of California, Irvine. Previously, he was a Research Assistant Professor at the University of Utah.
Sharad Mehrotra Sharad Mehrotra received the PhD degree in computer science from the University of Texas, Austin, in 1993. He is currently a professor in the Department of Computer Science, University of California, Irvine. Previously, he was a professor at the University of Illinois at Urbana-Champaign. He has received numerous awards and honors, including the 2011 SIGMOD Best Paper Award, the 2007 DASFAA Best Paper Award, the 2012 SIGMOD Test of Time Award, the DASFAA ten-year best paper awards for 2013 and 2014, the 1998 CAREER Award from the US National Science Foundation (NSF), and the ACM ICMR Best Paper Award for 2013.
Cuneyt G Akcora (University of Manitoba)* & Yulia Gel (The University of Texas at Dallas) & Murat Kantarcioglu (The University of Texas at Dallas)
Abstract: Over the last couple of years, the Bitcoin cryptocurrency and the Blockchain technology that forms the basis of Bitcoin have received unprecedented attention. Designed to facilitate a secure distributed platform without central regulation, Blockchain is heralded as a novel paradigm that will be as powerful as Big Data, Cloud Computing, and Machine Learning. The Blockchain technology garners an ever-increasing interest of researchers in various domains that benefit from scalable cooperation among trustless parties. As Blockchain applications proliferate, so do the complexity and volume of data stored by Blockchains. Analyzing this data has emerged as an important research topic, already leading to methodological advancements in the information sciences. In this tutorial, we offer a holistic view of applied Data Science on Blockchains. Starting with the core components of Blockchain, we will detail the state of the art in Blockchain data analytics for the graph, security and finance domains. Our examples will answer questions such as: how to parse, extract and clean the data stored in Blockchains; how to store and query Blockchain data; and what features can be computed from Blockchains.
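To illustrate the graph-analytics angle, the sketch below (our own simplified example, not the tutorial's code; the block and transaction layout is a made-up toy, not Bitcoin's real serialization) extracts an address graph from parsed blocks, where an edge records coin flow from an input address to an output address.

```python
# Toy extraction of an address graph from parsed blockchain data.
# Block/transaction structure here is a simplified placeholder.

def build_address_graph(blocks):
    """Return a set of (from_address, to_address) edges across all transactions."""
    edges = set()
    for block in blocks:
        for tx in block["transactions"]:
            for src in tx["inputs"]:
                for dst in tx["outputs"]:
                    edges.add((src, dst))
    return edges

blocks = [{"transactions": [
    {"inputs": ["addr1"], "outputs": ["addr2", "addr3"]},
    {"inputs": ["addr2"], "outputs": ["addr4"]},
]}]
graph = build_address_graph(blocks)
# -> {("addr1", "addr2"), ("addr1", "addr3"), ("addr2", "addr4")}
```

Graph features computed over such edge sets (degrees, motifs, flow patterns) are among the Blockchain-derived features the tutorial discusses for security and finance analytics.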
Cuneyt Gurcan Akcora Cuneyt Gurcan Akcora (http://cakcora.github.io) is an Assistant Professor of Computer Science and Statistics at the University of Manitoba, Canada. Before that, he was a fellow in the Departments of Statistics and Computer Science at the University of Texas at Dallas. He received his Ph.D. from the University of Insubria, Italy, and his M.S. from the State University of New York at Buffalo, USA. His primary research interests are Data Science on complex networks and large scale graph analysis, with applications in social, biological, IoT and Blockchain networks. He is a Fulbright Scholarship recipient, and his research works have been published in leading conferences and journals including TKDE, VLDB, ICDM and ICDE.
Yulia R. Gel Yulia R. Gel (https://personal.utdallas.edu/~yxg142030/) is a Professor in the Department of Mathematical Sciences at the University of Texas at Dallas. Her research interests include the statistical foundations of Data Science, inference for random graphs and complex networks, time series analysis, and predictive analytics. She holds a Ph.D. in Mathematics, followed by a postdoctoral position in Statistics at the University of Washington. Prior to joining UT Dallas, she was a tenured faculty member at the University of Waterloo, Canada. She also held visiting positions at Johns Hopkins University, the University of California, Berkeley, and the Isaac Newton Institute for Mathematical Sciences, Cambridge University, UK. She served as a Vice President of the International Society for Business and Industrial Statistics (ISBIS), and is a Fellow of the American Statistical Association.
Murat Kantarcioglu Murat Kantarcioglu, Ph.D., is a Professor in the Computer Science Department and Director of the Data Security and Privacy Lab at the University of Texas at Dallas. He is also a visiting scholar at the Data Privacy Lab at Harvard University. Dr. Kantarcioglu's research focuses on creating technologies that can efficiently extract useful information from any data without sacrificing privacy or security. He has published over 100 papers in peer-reviewed journals and conferences. His research has been supported by grants from NSF, AFOSR, ONR, NSA, and NIH, and has received two best paper awards. He is a recipient of the NSF CAREER award, and his research has been reported on in the media, including the Boston Globe and ABC News. He holds a B.S. in Computer Engineering from Middle East Technical University, and M.S. and Ph.D. degrees in Computer Science from Purdue University. He is a senior member of IEEE and ACM.