# HG changeset patch # User Madarasi Peter # Date 1480543421 -3600 # Node ID 42fbe17f0e3b414d568c074e66ff51a335e6c1a9 # Parent bdf97dafabfb447184dd93b2a6ed7ce4689b097d fine tune diff -r bdf97dafabfb -r 42fbe17f0e3b damecco.tex --- a/damecco.tex Wed Nov 30 22:45:35 2016 +0100 +++ b/damecco.tex Wed Nov 30 23:03:41 2016 +0100 @@ -103,46 +103,60 @@ %% \address{Address\fnref{label3}} %% \fntext[label3]{} -\title{VF2++ --- An Improved Subgraph Isomorphism Algorithm} +\title{Improved Algorithms for Matching Biological Graphs} %% use optional labels to link authors explicitly to addresses: %% \author[label1,label2]{} %% \address[label1]{} %% \address[label2]{} -\author[egres,elte]{Alp{\'a}r J{\"u}ttner} -\ead{alpar@cs.elte.hu} -\author[elte]{P{\'e}ter Madarasi} -\ead{madarasip@caesar.elte.hu} -\address[egres]{MTA-ELTE Egerv{\'a}ry Research Group, Budapest, Hungary.} -\address[elte]{Department of Operations Research, E{\"o}tv{\"o}s Lor{\'a}nd University, Budapest, Hungary.} +\author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi} + +\address{Dept of Operations Research, ELTE} \begin{abstract} +Subgraph isomorphism is a well-known NP-Complete problem, while its +special case, the graph isomorphism problem is one of the few problems +in NP neither known to be in P nor NP-Complete. Their appearance in +many fields of application such as pattern analysis, computer vision +questions and the analysis of chemical and biological systems has +fostered the design of various algorithms for handling special graph +structures. - This paper presents a largely improved version of the VF2 algorithm - for the \emph{Subgraph Isomorphism Problem}. The improvements are - twofold. Firstly, it is based on a new approach for determining the - matching order of the nodes, and secondly, more efficient - - nevertheless easier to compute - cutting rules significantly - reducing the search space are applied. +This paper presents VF2++, a new algorithm based on the original VF2, +which runs significantly faster on most test cases and performs +especially well on special graph classes stemming from biological +questions. VF2++ handles graphs of thousands of nodes in practically +near linear time including preprocessing. Not only is it an improved +version of VF2, but in fact, it is by far the fastest existing +algorithm especially on biological graphs. - In addition to the usual subgraph isomorphism, the paper also - presents specialized versions for the \emph{Induced Subgraph - Isomorphism} and for the \emph{Graph Isomorphism Problems}. +The reason for VF2++' superiority over VF2 is twofold. Firstly, taking +into account the structure and the node labeling of the graph, VF2++ +determines a state order in which most of the unfruitful branches of +the search space can be pruned immediately. Secondly, introducing more +efficient - nevertheless still easier to compute - cutting rules +reduces the chance of going astray even further. - Finally, an extensive experimental evaluation is provided using a - wide range of inputs, including both real life biological and - chemical datasets and standard randomly generated graph series. The - results show major and consistent running time improvements over the - other known methods. - - The C++ implementations of the algorithms are available open source as - the part of the LEMON graph and network optimization library. - +In addition to the usual subgraph isomorphism, specialized versions +for induced subgraph isomorphism and for graph isomorphism are +presented. VF2++ has gained a runtime improvement of one order of +magnitude respecting induced subgraph isomorphism and a better +asymptotical behaviour in the case of graph isomorphism problem. + +After having provided the description of VF2++, in order to evaluate +its effectiveness, an extensive comparison to the contemporary other +algorithms is shown, using a wide range of inputs, including both real +life biological and chemical datasets and standard randomly generated +graph series. + +The work was motivated and sponsored by QuantumBio Inc., and all the +developed algorithms are available as the part of the open source +LEMON graph and network optimization library +(http://lemon.cs.elte.hu). \end{abstract} \begin{keyword} - Computational Biology, Subgraph Isomorphism Problem %% keywords here, in the form: keyword \sep keyword %% PACS codes here, in the form: \PACS code \sep code @@ -180,7 +194,7 @@ edges to chemical bonds. The similarity and dissimilarity of objects corresponding to nodes are incorporated to the model by \emph{node labels}. Understanding such networks basically -requires finding specific subgraphs, thus calls for efficient +requires finding specific subgraphs, thus it calls for efficient graph matching algorithms. Other real-world fields related to some @@ -255,12 +269,13 @@ efficient cutting rules and determines a node order in which VF2 runs significantly faster on practical inputs. +graph and network optimization library + This project was initiated and sponsored by QuantumBio -Inc.\cite{QUANTUMBIO} and the implementation --- along with a source -code --- has been published as a part of LEMON\cite{LEMON} open source -graph library. +Inc.\cite{QUANTUMBIO} and the implementation is available open source as the +part of the LEMON\cite{LEMON} graph and network optimization library. -Outline: Section~\ref{sec:ProbStat} defines the problems to be solved, Section~\ref{sec:VF2Alg} provides a description of VF2, Section~\ref{sec:VF2ppAlg} introduces VF2++, a new graph matching algorithm, Section~\ref{sec:VF2ppImpl} presents the details of an efficient implementation of VF2++, and Section~\ref{sec:ExpRes} compares VF2++ to a state of the art algorithm. +Outline: Section~\ref{sec:ProbStat} defines the problems to be solved, Section~\ref{sec:VF2Alg} provides a description of VF2, Section~\ref{sec:VF2ppAlg} introduces VF2++, a new graph matching algorithm, Section~\ref{sec:VF2ppImpl} presents the details of an efficient implementation of VF2++, and Section~\ref{sec:ExpRes} compares VF2++ to the state of the art algorithm. \section{Problem Statement}\label{sec:ProbStat} This section provides a formal description of the problems to be