GRAPE: Parallel Graph Query Engine
The need for graph computations is evident in a multitude of use cases. To support computations on large-scale graphs, several parallel systems have been developed. However, existing graph systems require users to recast algorithms into new models, which makes parallel graph computations as a privilege to experienced users only. Moreover, real world applications often require much more complex graph processing workflows than previously evaluated. In response to these challenges, the thesis presents GRAPE, a distributed graph computation system, shipped with various applications for social network analysis, social media marketing and functional dependencies on graphs. Firstly, the thesis presents the foundation of GRAPE. The principled approach of GRAPE is based on partial evaluation and incremental computation. Sequential graph algorithms can be plugged into GRAPE with minor changes, and get parallelized as a whole. The termination and correctness are guaranteed under a monotonic condition. Secondly, as an application on GRAPE, the thesis proposes graph-pattern association rules (GPARs) for social media marketing. GPARs help users discover regularities between entities in social graphs and identify potential customers by exploring social influence. The thesis studies the problem of discovering top-k diversified GPARs and the problem of identifying potential customers with GPARs. Although both are NP- hard, parallel scalable algorithms on GRAPE are developed, which guarantee a polynomial speedup over sequential algorithms with the increase of processors. Thirdly, the thesis proposes quantified graph patterns (QGPs), an extension of graph patterns by supporting simple counting quantifiers on edges. QGPs naturally express universal and existential quantification, numeric and ratio aggregates, as well as negation. The thesis proves that the matching problem of QGPs remains NP-complete in the absence of negation, and is DP-complete for general QGPs. In addition, the thesis introduces quantified graph association rules defined with QGPs, to identify potential customers in social media marketing. Finally, to address the issue of data consistency, the thesis proposes a class of functional dependencies for graphs, referred to as GFDs. GFDs capture both attribute-value dependencies and topological structures of entities. The satisfiability and implication problems for GFDs are studied and proved to be coNP-complete and NP-complete, respectively. The thesis also proves that the validation problem for GFDs is coNP- complete. The parallel algorithms developed on GRAPE verify that GFDs provide an effective approach to detecting inconsistencies in knowledge and social graphs.