Support Points – A New Way to Compact Data, with Applications to Optimal MCMC Reduction and Experimental Design

Locations

Rettaliata Engineering Center, Room 104

Host

Department of Applied Mathematics

Speaker

Simon Mak
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
https://sites.google.com/view/simonmak/



Description

The trichotomy between energy, matter and data has intrigued mathematicians, statisticians and physicists alike over the past century: the duality between energy and matter is given by Einstein’s famous \(E=mc^2\) relation, while the link between matter (entropy) and data (information) is explored in seminal works by Schrödinger and Shannon. In this talk, we look to exploit the third connection, between energy and data, with the goal of compacting large datasets (or, in the infinite setting, distributions) into smaller, minimum-energy point sets called support points. Particularly in an era of big data, these representative point sets can be used to solve many practical engineering, statistical and machine-learning problems. Support points are obtained by minimizing the energy distance, a distance-based potential measure proposed by Székely and Rizzo (2004) for testing goodness-of-fit. These point sets enjoy several desirable theoretical properties regarding distributional convergence and integration performance, and allow for efficient and parallelizable data reduction using difference-of-convex optimization methods. We highlight two important uses of support points: (a) optimally compacting Markov chain Monte Carlo (MCMC) samples in Bayesian computation, and (b) uncertainty propagation in expensive engineering simulations. The talk concludes with several interesting developments on adapting support points as robust experimental designs for deterministic computer simulations.
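
To make the criterion concrete, here is a minimal sketch of the sample energy distance between a candidate point set and a dataset, assuming the standard form 2E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖ with Euclidean norms. The function names (`mean_distance`, `energy_distance`) and the toy data are illustrative assumptions, not taken from the talk, and the sketch only evaluates the criterion; it does not carry out the difference-of-convex optimization used to find support points.

```python
import math
import random


def mean_distance(a, b):
    """Average Euclidean distance over all pairs (p, q) with p in a, q in b."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


def energy_distance(points, data):
    """Sample energy distance between two point sets (hypothetical helper).

    Smaller values mean `points` represents `data` better; support points
    are the point set of a given size that minimizes this quantity.
    """
    return (2.0 * mean_distance(points, data)
            - mean_distance(points, points)
            - mean_distance(data, data))


# Toy data (an assumption for illustration): 200 draws from a 2-D standard normal.
random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]

subset = data[:20]                                  # a naive 20-point summary
shifted = [(x + 5.0, y + 5.0) for x, y in subset]   # a deliberately poor summary

print(energy_distance(subset, data))
print(energy_distance(shifted, data))
```

The shifted set should score much worse than the plain subset, which is the sense in which minimizing the energy distance favors representative point sets.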
