III: Small: Real-Time Detection of Structures from a Massive Graph Stream
There is an urgent need to quickly derive actionable intelligence from increasingly large volumes of data. In many domains, including social media analytics and cybersecurity, data contains relationships between entities that can be modeled using a graph, and a question on detecting patterns in data can be transformed into questions on detecting emerging structures in appropriately derived graphs. The goal of this project is to develop algorithms and software for finding significant structures from dynamic graphs in real-time. The project will develop algorithms that are efficient in their use of CPU and memory, and whose performance can be quantified in a mathematically rigorous way. The project will also develop implementations that are expected to process data at a high throughput and can identify emerging structures much faster than current methods. The availability of these methods and implementations will impact the domains of cybersecurity and social network analytics. Software developed will be released as a toolkit of operators that can be used with current stream processing systems. The project will lead to new instructional material in existing courses as well as new courses in data analytics, involve individuals from underrepresented groups, and forge research collaborations with industrial research labs.
The project will consider data that contains an evolving dynamic graph, and develop methods for detecting and enumerating change in the set of (1) dense combinatorial structures such as maximal cliques, quasi-cliques, maximal bicliques and quasi-bicliques in a graph, and (2) temporal structures such as temporal paths and temporal cliques in a time-stamped graph. While there has been significant progress in methods for detecting structures in a massive static graph, the same is not true for a dynamic graph, and often, the state-of-the-art for a dynamic graph is to repeatedly execute a method designed for a static graph. For enumerating the change in the set of structures, the project will take a novel approach of developing change-sensitive algorithms whose processing cost is proportional to the magnitude of change in the set of structures. It will use techniques from the area of approximation algorithms in designing (space and time) efficient methods for enumerating temporal structures from a graph stream.