Speaker: Minghung Shih, ECpE Graduate Student
Advisor: Srikanta Tirthapura
Title: Optimization-Driven Sampling for Analyzing Large-Scale Data
Abstract: Sampling is a popular approach for obtaining quick approximate answers over large-scale data. By avoiding analysis of the entire dataset, results can be obtained in much less processing time. A major challenge, however, is how to build a sample under a limited sample-size budget so that the approximate results are as accurate as possible. In this research, we propose an optimization-driven sampling (ODS) framework that formulates sampling as an optimization problem that takes into account how the sample will be analyzed. We study ODS under different workloads over both static and streaming data, and present experimental results comparing the quality of samples generated by ODS with that of samples from traditional sampling methods.