Simulating Extreme-scale Distributed Systems with CODES
Host
Department of Computer Science
Description
Simulating future extreme-scale parallel/distributed systems can be an important component in understanding these systems at scales that prototyping cannot feasibly reach. For HPC, big-data/cloud, or other computing/analysis platforms, the design decisions involved in building systems that scale beyond the current generation are multi-dimensional. They include distributed storage software/hardware solutions, network topologies within and between computing centers, and algorithms for data analysis and compute services in heterogeneous software/hardware environments, among others.
In this talk, I will first give a high-level motivation for, and overview of, the CODES and underlying ROSS parallel discrete event simulation frameworks, including previous and ongoing areas of research involving them.
Second, I will give an in-depth look at the components making up CODES, including model configuration, utilities for locating and communicating with simulation entities, models for networking, storage, and other miscellany, and interfaces for introducing application workloads into the simulator. I will end with a discussion of the Time Warp parallel computation model and the concept of reverse computation, the method by which ROSS, and subsequently CODES, achieve best-in-class scalability.
John Jenkins is a Postdoctoral Appointee at Argonne National Laboratory. He received his Ph.D. in Computer Science from North Carolina State University in 2013. His research interests include parallel I/O, parallel/distributed storage and analysis systems, and parallel discrete event simulation.