Divide & Recombine (D&R) with Tessera: High-Performance Computing for Deep Analysis of Big Data and Small
Host
Applied Mathematics & Data Science
Speaker
William S. Cleveland
Shanti S. Gupta Distinguished Professor of Statistics, Purdue University
http://www.stat.purdue.edu/~wsc/
Description
Abstract: The widely used term "big data" carries with it a notion of computational performance for the analysis of big datasets. But for data analysis, computational performance depends very heavily not just on size but also on the computational complexity of the analytic routines used in the analysis. Datasets that pose big computational challenges come in a very wide range of sizes. Furthermore, the hardware power available to the data analyst is also an important factor. High-performance computing for data analysis can be provided for wide ranges of dataset size, computational complexity, and hardware power by the Divide & Recombine (D&R) statistical approach and by Tessera, the D&R software implementation that makes programming D&R easy (www.tessera.io).
About the speaker: William S. Cleveland is the Shanti S. Gupta Distinguished Professor of Statistics and Courtesy Professor of Computer Science at Purdue University. His areas of methodological research are statistics, machine learning, and data visualization. He has analyzed data in his research in cyber security, computer networking, visual perception, environmental science, healthcare engineering, public opinion polling, and disease surveillance. In the course of this work, Cleveland has developed many new methods and models for data that are widely used throughout the worldwide technical community. He has led teams developing software systems that implement his methods, and these have become core programs in many commercial and open-source systems. In 1996 Cleveland was chosen national Statistician of the Year by the Chicago Chapter of the American Statistical Association. In 2002 he was selected as a Highly Cited Researcher by the American Society for Information Science & Technology in the newly formed mathematics category. He is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics, the American Association for the Advancement of Science, and the International Statistical Institute. Today, Cleveland and colleagues develop the Divide & Recombine (D&R) approach to data analysis and the Tessera software system that implements D&R. Together these provide high-performance computing across dataset sizes, computational complexities, and cluster hardware power that each range from very small to very big.
Event Topic
Data Science