Computer Science Seminar: Muhammad Ali Gulzar
This event is open to all Illinois Tech faculty, students, and staff.
Abstract
Data-intensive scalable computing (DISC) systems such as MapReduce, Google FlumeJava, and Apache Spark are commonly used today to process terabytes of data. At this scale, rare, buggy corner cases frequently show up in production, causing a job to crash after running for days or, worse, to silently produce corrupted output. Unfortunately, in this domain, “testing on a random sample” rarely guarantees reliability, and “printf” debugging methods are expensive. Compared to traditional software, data-centric software poses new challenges in automatic debugging and testing because of its scale, distributed nature, and new programming paradigms.
In this talk, Gulzar will first emphasize the key differences between traditional and data-centric software and how they pose unique engineering challenges. Next, Gulzar will tackle those challenges on two fronts: debugging and testing. First, Gulzar will present BigDebug and BigSift, which redesign interactive and automated debugging primitives for data-centric software. Gulzar will show how this work leverages ideas from systems and database research to reduce debugging time by half and perform precise root-cause analysis in a fraction of the job-execution time. Second, Gulzar will discuss BigTest, which systematically explores dataflow program paths and automatically generates test data that is orders of magnitude smaller yet several times more effective in revealing critical bugs. Finally, the talk will conclude with a broader vision of designing productivity toolkits to support the growing needs of data-centric software in ML, AI, and data science.
Bio
Muhammad Ali Gulzar is a Ph.D. candidate in the Department of Computer Science at the University of California, Los Angeles. His research designs and builds systems that improve developer productivity through automated debugging and testing of data-centric software. These systems bring together a unique combination of ideas from software engineering, distributed systems, and databases to accelerate the development of reliable big data applications. His research tools have inspired commercial data processing tools and have been recognized with the 2017 Google Ph.D. Fellowship, the 2018 ACM SRC gold medal, and the 2016 “Best of VLDB” award.