damecco.tex
changeset 26 42fbe17f0e3b
parent 24 bdf97dafabfb
child 27 497868c58d36
equal deleted inserted replaced
21:b2015807e1bf 23:54760161a072
   101 %% \fntext[label2]{}
   101 %% \fntext[label2]{}
   102 %% \cortext[cor1]{}
   102 %% \cortext[cor1]{}
   103 %% \address{Address\fnref{label3}}
   103 %% \address{Address\fnref{label3}}
   104 %% \fntext[label3]{}
   104 %% \fntext[label3]{}
   105 
   105 
   106 \title{VF2++ --- An Improved Subgraph Isomorphism Algorithm}
   106 \title{Improved Algorithms for Matching Biological Graphs}
   107 
   107 
   108 %% use optional labels to link authors explicitly to addresses:
   108 %% use optional labels to link authors explicitly to addresses:
   109 %% \author[label1,label2]{}
   109 %% \author[label1,label2]{}
   110 %% \address[label1]{}
   110 %% \address[label1]{}
   111 %% \address[label2]{}
   111 %% \address[label2]{}
   112 
   112 
   113 \author[egres,elte]{Alp{\'a}r J{\"u}ttner}
   113 \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi}
   114 \ead{alpar@cs.elte.hu}
   114 
   115 \author[elte]{P{\'e}ter Madarasi}
   115 \address{Dept of Operations Research, ELTE}
   116 \ead{madarasip@caesar.elte.hu}
       
   117 \address[egres]{MTA-ELTE Egerv{\'a}ry Research Group, Budapest, Hungary.}
       
   118 \address[elte]{Department of Operations Research, E{\"o}tv{\"o}s Lor{\'a}nd University, Budapest, Hungary.}
       
   119 
   116 
   120 \begin{abstract}
   117 \begin{abstract}
   121 
   118 Subgraph isomorphism is a well-known NP-Complete problem, while its
   122   This paper presents a largely improved version of the VF2 algorithm
   119 special case, the graph isomorphism problem is one of the few problems
   123   for the \emph{Subgraph Isomorphism Problem}. The improvements are
   120 in NP neither known to be in P nor NP-Complete. Their appearance in
   124   twofold. Firstly, it is based on a new approach for determining the
   121 many fields of application such as pattern analysis, computer vision
   125   matching order of the nodes, and secondly, more efficient -
   122 questions and the analysis of chemical and biological systems has
   126   nevertheless easier to compute - cutting rules significantly
   123 fostered the design of various algorithms for handling special graph
   127   reducing the search space are applied.
   124 structures.
   128 
   125 
   129   In addition to the usual subgraph isomorphism, the paper also
   126 This paper presents VF2++, a new algorithm based on the original VF2,
   130   presents specialized versions for the \emph{Induced Subgraph
   127 which runs significantly faster on most test cases and performs
   131     Isomorphism} and for the \emph{Graph Isomorphism Problems}.
   128 especially well on special graph classes stemming from biological
   132 
   129 questions. VF2++ handles graphs of thousands of nodes in practically
   133   Finally, an extensive experimental evaluation is provided using a
   130 near linear time including preprocessing. Not only is it an improved
   134   wide range of inputs, including both real life biological and
   131 version of VF2, but in fact, it is by far the fastest existing
   135   chemical datasets and standard randomly generated graph series. The
   132 algorithm especially on biological graphs.
   136   results show major and consistent running time improvements over the
   133 
   137   other known methods.
   134 The reason for VF2++' superiority over VF2 is twofold. Firstly, taking
   138  
   135 into account the structure and the node labeling of the graph, VF2++
   139   The C++ implementations of the algorithms are available open source as
   136 determines a state order in which most of the unfruitful branches of
   140   the part of the LEMON graph and network optimization library.
   137 the search space can be pruned immediately. Secondly, introducing more
   141   
   138 efficient - nevertheless still easier to compute - cutting rules
       
   139 reduces the chance of going astray even further.
       
   140 
       
   141 In addition to the usual subgraph isomorphism, specialized versions
       
   142 for induced subgraph isomorphism and for graph isomorphism are
       
   143 presented. VF2++ has gained a runtime improvement of one order of
       
   144 magnitude respecting induced subgraph isomorphism and a better
       
   145 asymptotical behaviour in the case of graph isomorphism problem.
       
   146 
       
   147 After having provided the description of VF2++, in order to evaluate
       
   148 its effectiveness, an extensive comparison to the contemporary other
       
   149 algorithms is shown, using a wide range of inputs, including both real
       
   150 life biological and chemical datasets and standard randomly generated
       
   151 graph series.
       
   152 
       
   153 The work was motivated and sponsored by QuantumBio Inc., and all the
       
   154 developed algorithms are available as the part of the open source
       
   155 LEMON graph and network optimization library
       
   156 (http://lemon.cs.elte.hu).
   142 \end{abstract}
   157 \end{abstract}
   143 
   158 
   144 \begin{keyword}
   159 \begin{keyword}
   145   Computational Biology, Subgraph Isomorphism Problem
       
   146 %% keywords here, in the form: keyword \sep keyword
   160 %% keywords here, in the form: keyword \sep keyword
   147 
   161 
   148 %% PACS codes here, in the form: \PACS code \sep code
   162 %% PACS codes here, in the form: \PACS code \sep code
   149 
   163 
   150 %% MSC codes here, in the form: \MSC code \sep code
   164 %% MSC codes here, in the form: \MSC code \sep code
   178 as graphs, for instance, a molecular structure can be
   192 as graphs, for instance, a molecular structure can be
   179 considered as a graph, whose nodes correspond to atoms and whose
   193 considered as a graph, whose nodes correspond to atoms and whose
   180 edges to chemical bonds. The similarity and dissimilarity of
   194 edges to chemical bonds. The similarity and dissimilarity of
   181 objects corresponding to nodes are incorporated to the model
   195 objects corresponding to nodes are incorporated to the model
   182 by \emph{node labels}. Understanding such networks basically
   196 by \emph{node labels}. Understanding such networks basically
   183 requires finding specific subgraphs, thus calls for efficient
   197 requires finding specific subgraphs, thus it calls for efficient
   184 graph matching algorithms.
   198 graph matching algorithms.
   185 
   199 
   186 Other real-world fields related to some
   200 Other real-world fields related to some
   187 variants of graph matching include pattern recognition
   201 variants of graph matching include pattern recognition
   188 and machine vision \cite{HorstBunkeApplications}, symbol recognition
   202 and machine vision \cite{HorstBunkeApplications}, symbol recognition
   253 This paper introduces \emph{VF2++}, a new further improved algorithm
   267 This paper introduces \emph{VF2++}, a new further improved algorithm
   254 for the graph and (induced)subgraph isomorphism problem, which uses
   268 for the graph and (induced)subgraph isomorphism problem, which uses
   255 efficient cutting rules and determines a node order in which VF2 runs
   269 efficient cutting rules and determines a node order in which VF2 runs
   256 significantly faster on practical inputs.
   270 significantly faster on practical inputs.
   257 
   271 
       
   272 graph and network optimization library
       
   273 
   258 This project was initiated and sponsored by QuantumBio
   274 This project was initiated and sponsored by QuantumBio
   259 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
   275 Inc.\cite{QUANTUMBIO} and the implementation is available open source as the 
   260 code --- has been published as a part of LEMON\cite{LEMON} open source
   276 part of the LEMON\cite{LEMON} graph and network optimization library.
   261 graph library.
   277 
   262 
   278 Outline: Section~\ref{sec:ProbStat} defines the problems to be solved, Section~\ref{sec:VF2Alg} provides a description of VF2, Section~\ref{sec:VF2ppAlg} introduces VF2++, a new graph matching algorithm, Section~\ref{sec:VF2ppImpl} presents the details of an efficient implementation of VF2++, and Section~\ref{sec:ExpRes} compares VF2++ to the state of the art algorithm.
   263 Outline: Section~\ref{sec:ProbStat} defines the problems to be solved, Section~\ref{sec:VF2Alg} provides a description of VF2, Section~\ref{sec:VF2ppAlg} introduces VF2++, a new graph matching algorithm, Section~\ref{sec:VF2ppImpl} presents the details of an efficient implementation of VF2++, and Section~\ref{sec:ExpRes} compares VF2++ to a state of the art algorithm.
       
   264 
   279 
   265 \section{Problem Statement}\label{sec:ProbStat}
   280 \section{Problem Statement}\label{sec:ProbStat}
   266 This section provides a formal description of the problems to be
   281 This section provides a formal description of the problems to be
   267 solved.
   282 solved.
   268 \subsection{Definitions}
   283 \subsection{Definitions}