101 %% \fntext[label2]{} |
101 %% \fntext[label2]{} |
102 %% \cortext[cor1]{} |
102 %% \cortext[cor1]{} |
103 %% \address{Address\fnref{label3}} |
103 %% \address{Address\fnref{label3}} |
104 %% \fntext[label3]{} |
104 %% \fntext[label3]{} |
105 |
105 |
106 \title{VF2++ --- An Improved Subgraph Isomorphism Algorithm} |
106 \title{Improved Algorithms for Matching Biological Graphs} |
107 |
107 |
108 %% use optional labels to link authors explicitly to addresses: |
108 %% use optional labels to link authors explicitly to addresses: |
109 %% \author[label1,label2]{} |
109 %% \author[label1,label2]{} |
110 %% \address[label1]{} |
110 %% \address[label1]{} |
111 %% \address[label2]{} |
111 %% \address[label2]{} |
112 |
112 |
113 \author[egres,elte]{Alp{\'a}r J{\"u}ttner} |
113 \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi} |
114 \ead{alpar@cs.elte.hu} |
114 |
115 \author[elte]{P{\'e}ter Madarasi} |
115 \address{Dept of Operations Research, ELTE} |
116 \ead{madarasip@caesar.elte.hu} |
|
117 \address[egres]{MTA-ELTE Egerv{\'a}ry Research Group, Budapest, Hungary.} |
|
118 \address[elte]{Department of Operations Research, E{\"o}tv{\"o}s Lor{\'a}nd University, Budapest, Hungary.} |
|
119 |
116 |
120 \begin{abstract} |
117 \begin{abstract} |
121 |
118 Subgraph isomorphism is a well-known NP-Complete problem, while its |
122 This paper presents a largely improved version of the VF2 algorithm |
119 special case, the graph isomorphism problem is one of the few problems |
123 for the \emph{Subgraph Isomorphism Problem}. The improvements are |
120 in NP neither known to be in P nor NP-Complete. Their appearance in |
124 twofold. Firstly, it is based on a new approach for determining the |
121 many fields of application such as pattern analysis, computer vision |
125 matching order of the nodes, and secondly, more efficient - |
122 questions and the analysis of chemical and biological systems has |
126 nevertheless easier to compute - cutting rules significantly |
123 fostered the design of various algorithms for handling special graph |
127 reducing the search space are applied. |
124 structures. |
128 |
125 |
129 In addition to the usual subgraph isomorphism, the paper also |
126 This paper presents VF2++, a new algorithm based on the original VF2, |
130 presents specialized versions for the \emph{Induced Subgraph |
127 which runs significantly faster on most test cases and performs |
131 Isomorphism} and for the \emph{Graph Isomorphism Problems}. |
128 especially well on special graph classes stemming from biological |
132 |
129 questions. VF2++ handles graphs of thousands of nodes in practically |
133 Finally, an extensive experimental evaluation is provided using a |
130 near linear time including preprocessing. Not only is it an improved |
134 wide range of inputs, including both real life biological and |
131 version of VF2, but in fact, it is by far the fastest existing |
135 chemical datasets and standard randomly generated graph series. The |
132 algorithm especially on biological graphs. |
136 results show major and consistent running time improvements over the |
133 |
137 other known methods. |
134 The reason for VF2++' superiority over VF2 is twofold. Firstly, taking |
138 |
135 into account the structure and the node labeling of the graph, VF2++ |
139 The C++ implementations of the algorithms are available open source as |
136 determines a state order in which most of the unfruitful branches of |
140 the part of the LEMON graph and network optimization library. |
137 the search space can be pruned immediately. Secondly, introducing more |
141 |
138 efficient - nevertheless still easier to compute - cutting rules |
|
139 reduces the chance of going astray even further. |
|
140 |
|
141 In addition to the usual subgraph isomorphism, specialized versions |
|
142 for induced subgraph isomorphism and for graph isomorphism are |
|
143 presented. VF2++ has gained a runtime improvement of one order of |
|
144 magnitude respecting induced subgraph isomorphism and a better |
|
145 asymptotical behaviour in the case of graph isomorphism problem. |
|
146 |
|
147 After having provided the description of VF2++, in order to evaluate |
|
148 its effectiveness, an extensive comparison to the contemporary other |
|
149 algorithms is shown, using a wide range of inputs, including both real |
|
150 life biological and chemical datasets and standard randomly generated |
|
151 graph series. |
|
152 |
|
153 The work was motivated and sponsored by QuantumBio Inc., and all the |
|
154 developed algorithms are available as the part of the open source |
|
155 LEMON graph and network optimization library |
|
156 (http://lemon.cs.elte.hu). |
142 \end{abstract} |
157 \end{abstract} |
143 |
158 |
144 \begin{keyword} |
159 \begin{keyword} |
145 Computational Biology, Subgraph Isomorphism Problem |
|
146 %% keywords here, in the form: keyword \sep keyword |
160 %% keywords here, in the form: keyword \sep keyword |
147 |
161 |
148 %% PACS codes here, in the form: \PACS code \sep code |
162 %% PACS codes here, in the form: \PACS code \sep code |
149 |
163 |
150 %% MSC codes here, in the form: \MSC code \sep code |
164 %% MSC codes here, in the form: \MSC code \sep code |
178 as graphs, for instance, a molecular structure can be |
192 as graphs, for instance, a molecular structure can be |
179 considered as a graph, whose nodes correspond to atoms and whose |
193 considered as a graph, whose nodes correspond to atoms and whose |
180 edges to chemical bonds. The similarity and dissimilarity of |
194 edges to chemical bonds. The similarity and dissimilarity of |
181 objects corresponding to nodes are incorporated to the model |
195 objects corresponding to nodes are incorporated to the model |
182 by \emph{node labels}. Understanding such networks basically |
196 by \emph{node labels}. Understanding such networks basically |
183 requires finding specific subgraphs, thus calls for efficient |
197 requires finding specific subgraphs, thus it calls for efficient |
184 graph matching algorithms. |
198 graph matching algorithms. |
185 |
199 |
186 Other real-world fields related to some |
200 Other real-world fields related to some |
187 variants of graph matching include pattern recognition |
201 variants of graph matching include pattern recognition |
188 and machine vision \cite{HorstBunkeApplications}, symbol recognition |
202 and machine vision \cite{HorstBunkeApplications}, symbol recognition |
253 This paper introduces \emph{VF2++}, a new further improved algorithm |
267 This paper introduces \emph{VF2++}, a new further improved algorithm |
254 for the graph and (induced)subgraph isomorphism problem, which uses |
268 for the graph and (induced)subgraph isomorphism problem, which uses |
255 efficient cutting rules and determines a node order in which VF2 runs |
269 efficient cutting rules and determines a node order in which VF2 runs |
256 significantly faster on practical inputs. |
270 significantly faster on practical inputs. |
257 |
271 |
|
272 graph and network optimization library |
|
273 |
258 This project was initiated and sponsored by QuantumBio |
274 This project was initiated and sponsored by QuantumBio |
259 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source |
275 Inc.\cite{QUANTUMBIO} and the implementation is available open source as the |
260 code --- has been published as a part of LEMON\cite{LEMON} open source |
276 part of the LEMON\cite{LEMON} graph and network optimization library. |
261 graph library. |
277 |
262 |
278 Outline: Section~\ref{sec:ProbStat} defines the problems to be solved, Section~\ref{sec:VF2Alg} provides a description of VF2, Section~\ref{sec:VF2ppAlg} introduces VF2++, a new graph matching algorithm, Section~\ref{sec:VF2ppImpl} presents the details of an efficient implementation of VF2++, and Section~\ref{sec:ExpRes} compares VF2++ to the state of the art algorithm. |
263 Outline: Section~\ref{sec:ProbStat} defines the problems to be solved, Section~\ref{sec:VF2Alg} provides a description of VF2, Section~\ref{sec:VF2ppAlg} introduces VF2++, a new graph matching algorithm, Section~\ref{sec:VF2ppImpl} presents the details of an efficient implementation of VF2++, and Section~\ref{sec:ExpRes} compares VF2++ to a state of the art algorithm. |
|
264 |
279 |
265 \section{Problem Statement}\label{sec:ProbStat} |
280 \section{Problem Statement}\label{sec:ProbStat} |
266 This section provides a formal description of the problems to be |
281 This section provides a formal description of the problems to be |
267 solved. |
282 solved. |
268 \subsection{Definitions} |
283 \subsection{Definitions} |