2 %% Copyright 2007, 2008, 2009 Elsevier Ltd
4 %% This file is part of the 'Elsarticle Bundle'.
5 %% ---------------------------------------------
7 %% It may be distributed under the conditions of the LaTeX Project Public
8 %% License, either version 1.2 of this license or (at your option) any
9 %% later version. The latest version of this license is in
10 %% http://www.latex-project.org/lppl.txt
11 %% and version 1.2 or later is part of all distributions of LaTeX
12 %% version 1999/12/01 or later.
14 %% The list of all files belonging to the 'Elsarticle Bundle' is
15 %% given in the file `manifest.txt'.
18 %% Template article for Elsevier's document class `elsarticle'
19 %% with numbered style bibliographic references
22 \documentclass[preprint,12pt]{elsarticle}
24 %% Use the option review to obtain double line spacing
25 %% \documentclass[authoryear,preprint,review,12pt]{elsarticle}
27 %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
28 %% for a journal layout:
29 %% \documentclass[final,1p,times]{elsarticle}
30 %% \documentclass[final,1p,times,twocolumn]{elsarticle}
31 %% \documentclass[final,3p,times]{elsarticle}
32 %% \documentclass[final,3p,times,twocolumn]{elsarticle}
33 %% \documentclass[final,5p,times]{elsarticle}
34 %% \documentclass[final,5p,times,twocolumn]{elsarticle}
36 %% For including figures, graphicx.sty has been loaded in
37 %% elsarticle.cls. If you prefer to use the old commands
38 %% please give \usepackage{epsfig}
40 %% The amssymb package provides various useful mathematical symbols
42 %% The amsthm package provides extended theorem environments
43 %% \usepackage{amsthm}
45 %% The lineno packages adds line numbers. Start line numbering with
46 %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
47 %% for the whole article with \linenumbers.
48 %% \usepackage{lineno}
51 %% \usepackage[pdftex]{graphicx}
54 \pgfplotsset{width=9cm}
55 \pgfplotsset{compat=1.8}
58 \usepackage{subcaption}
60 \usepackage{algorithm}
61 \usepackage{algpseudocode}
64 \usepackage{amsthm,amssymb}
65 \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}
67 \newtheorem{theorem}{Theorem}[subsection]
68 \newtheorem{corollary}{Corollary}[theorem]
69 \newtheorem{claim}[theorem]{Claim}
71 \newtheorem{definition}{Definition}[subsection]
72 \newtheorem{notation}{Notation}[subsection]
73 \newtheorem{example}{Example}[subsection]
74 \usetikzlibrary{decorations.markings}
75 \let\oldproofname=\proofname
76 %% \renewcommand{\proofname}{\rm\bf{Proof:}}
78 \journal{Discrete Applied Mathematics}
84 %% Title, authors and addresses
86 %% use the tnoteref command within \title for footnotes;
87 %% use the tnotetext command for theassociated footnote;
88 %% use the fnref command within \author or \address for footnotes;
89 %% use the fntext command for theassociated footnote;
90 %% use the corref command within \author for corresponding author footnotes;
91 %% use the cortext command for theassociated footnote;
92 %% use the ead command for the email address,
93 %% and the form \ead[url] for the home page:
94 %% \title{Title\tnoteref{label1}}
95 %% \tnotetext[label1]{}
96 %% \author{Name\corref{cor1}\fnref{label2}}
97 %% \ead{email address}
98 %% \ead[url]{home page}
101 %% \address{Address\fnref{label3}}
104 \title{Improved Algorithms for Matching Biological Graphs}
106 %% use optional labels to link authors explicitly to addresses:
107 %% \author[label1,label2]{}
108 %% \address[label1]{}
109 %% \address[label2]{}
111 \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi}
113 \address{Dept of Operations Research, ELTE}
116 Subgraph isomorphism is a well-known NP-Complete problem, while its
117 special case, the graph isomorphism problem is one of the few problems
118 in NP neither known to be in P nor NP-Complete. Their appearance in
119 many fields of application such as pattern analysis, computer vision
120 questions and the analysis of chemical and biological systems has
121 fostered the design of various algorithms for handling special graph
124 The idea of using state space representation and checking some
125 conditions in each state to prune the search tree has made the VF2
126 algorithm one of the state of the art graph matching algorithms for
127 more than a decade. Recently, biological questions of ever increasing
128 importance have required more efficient, specialized algorithms.
130 This paper presents VF2++, a new algorithm based on the original VF2,
131 which runs significantly faster on most test cases and performs
132 especially well on special graph classes stemming from biological
133 questions. VF2++ handles graphs of thousands of nodes in practically
134 near linear time including preprocessing. Not only is it an improved
135 version of VF2, but in fact, it is by far the fastest existing
136 algorithm regarding biological graphs.
138 The reason for VF2++' superiority over VF2 is twofold. Firstly, taking
139 into account the structure and the node labeling of the graph, VF2++
140 determines a state order in which most of the unfruitful branches of
141 the search space can be pruned immediately. Secondly, introducing more
142 efficient - nevertheless still easier to compute - cutting rules
143 reduces the chance of going astray even further.
145 In addition to the usual subgraph isomorphism, specialized versions
146 for induced subgraph isomorphism and for graph isomorphism are
147 presented. VF2++ has gained a runtime improvement of one order of
148 magnitude respecting induced subgraph isomorphism and a better
149 asymptotical behaviour in the case of graph isomorphism problem.
151 After having provided the description of VF2++, in order to evaluate
152 its effectiveness, an extensive comparison to the contemporary other
153 algorithms is shown, using a wide range of inputs, including both real
154 life biological and chemical datasets and standard randomly generated
157 The work was motivated and sponsored by QuantumBio Inc., and all the
158 developed algorithms are available as the part of the open source
159 LEMON graph and network optimization library
160 (http://lemon.cs.elte.hu).
164 %% keywords here, in the form: keyword \sep keyword
166 %% PACS codes here, in the form: \PACS code \sep code
168 %% MSC codes here, in the form: \MSC code \sep code
169 %% or \MSC[2008] code \sep code (2000 is the default)
178 \section{Introduction}
181 In the last decades, combinatorial structures, and especially graphs
182 have been considered with ever increasing interest, and applied to the
183 solution of several new and revised questions. The expressiveness,
184 the simplicity and the studiedness of graphs make them practical for
185 modelling and appear constantly in several seemingly independent
186 fields. Bioinformatics and chemistry are amongst the most relevant
187 and most important fields.
189 Complex biological systems arise from the interaction and cooperation
190 of plenty of molecular components. Getting acquainted with such
191 systems at the molecular level has primary importance, since
192 protein-protein interaction, DNA-protein interaction, metabolic
193 interaction, transcription factor binding, neuronal networks, and
194 hormone signaling networks can be understood only this way.
196 For instance, a molecular structure can be considered as a graph,
197 whose nodes correspond to atoms and whose edges to chemical bonds. The
198 secondary structure of a protein can also be represented as a graph,
199 where nodes are associated with aminoacids and the edges with hydrogen
200 bonds. The nodes are often whole molecular components and the edges
201 represent some relationships among them. The similarity and
202 dissimilarity of objects corresponding to nodes are incorporated to
203 the model by \emph{node labels}. Many other chemical and biological
204 structures can easily be modeled in a similar way. Understanding such
205 networks basically requires finding specific subgraphs, which can not
206 avoid the application of graph matching algorithms.
208 Finally, let some of the other real-world fields related to some
209 variants of graph matching be briefly mentioned: pattern recognition
210 and machine vision \cite{HorstBunkeApplications}, symbol recognition
211 \cite{CordellaVentoSymbolRecognition}, face identification
212 \cite{JianzhuangYongFaceIdentification}. \\
214 Subgraph and induced subgraph matching problems are known to be
215 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is
216 one of the few problems in NP neither known to be in P nor
217 NP-Complete. Although polynomial time isomorphism algorithms are known
218 for various graph classes, like trees and planar
219 graphs\cite{PlanarGraphIso}, bounded valence
220 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
221 or permutation graphs\cite{PermGraphIso}.
223 In the following, some algorithms based on other approaches are
224 summarized, which do not need any restrictions on the graphs. However,
225 an overall polynomial behaviour is not expectable from such an
226 alternative, it may often have good performance, even on a graph class
227 for which polynomial algorithm is known. Note that this summary
228 containing only exact matching algorithms is far not complete, neither
229 does it cover all the recent algorithms.
231 The first practically usable approach was due to
232 Ullmann\cite{Ullmann} which is a commonly used depth-first
233 search based algorithm with a complex heuristic for reducing the
234 number of visited states. A major problem is its $\Theta(n^3)$ space
235 complexity, which makes it impractical in the case of big sparse
238 In a recent paper, Ullmann\cite{UllmannBit} presents an
239 improved version of this algorithm based on a bit-vector solution for
240 the binary Constraint Satisfaction Problem.
242 The Nauty algorithm\cite{Nauty} transforms the two graphs to
243 a canonical form before starting to check for the isomorphism. It has
244 been considered as one of the fastest graph isomorphism algorithms,
245 although graph categories were shown in which it takes exponentially
246 many steps. This algorithm handles only the graph isomorphism problem.
248 The \emph{LAD} algorithm\cite{Lad} uses a depth-first search
249 strategy and formulates the matching as a Constraint Satisfaction
250 Problem to prune the search tree. The constraints are that the mapping
251 has to be injective and edge-preserving, hence it is possible to
252 handle new matching types as well.
254 The \textbf{RI} algorithm\cite{RI} and its variations are based on a
255 state space representation. After reordering the nodes of the graphs,
256 it uses some fast executable heuristic checks without using any
257 complex pruning rules. It seems to run really efficiently on graphs
258 coming from biology, and won the International Contest on Pattern
259 Search in Biological Databases\cite{Content}.
261 The currently most commonly used algorithm is the
262 \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was
263 designed for solving pattern matching and computer vision problems,
264 and has been one of the best overall algorithms for more than a
265 decade. Although, it can't be up to new specialized algorithms, it is
266 still widely used due to its simplicity and space efficiency. VF2 uses
267 a state space representation and checks some conditions in each state
268 to prune the search tree.
270 Our first graph matching algorithm was the first version of VF2 which
271 recognizes the significance of the node ordering, more opportunities
272 to increase the cutting efficiency and reduce its computational
273 complexity. This project was initiated and sponsored by QuantumBio
274 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
275 code --- has been published as a part of LEMON\cite{LEMON} open source
278 This paper introduces \textbf{VF2++}, a new further improved algorithm
279 for the graph and (induced)subgraph isomorphism problem, which uses
280 efficient cutting rules and determines a node order in which VF2 runs
281 significantly faster on practical inputs.
283 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has
284 been published. It is considered to be as efficient as the RI
285 algorithm and has a strictly better behavior on large graphs. The
286 main idea of VF2 Plus is to precompute a heuristic node order of the
287 small graph, in which the VF2 works more efficiently.
289 \section{Problem Statement}
290 This section provides a detailed description of the problems to be
292 \subsection{Definitions}
294 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and
295 $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.
296 \begin{definition}\label{sec:ismorphic}
297 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M:
298 V_{small} \longrightarrow V_{large}$ bijection, for which the
301 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
302 (M(u),M(v))\in{E_{large}}$
305 For the sake of simplicity in this paper subgraphs and induced
306 subgraphs are defined in a more general way than usual:
308 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I:
309 V_{small}\longrightarrow V_{large}$ injection, for which the
312 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$
317 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists
318 I: V_{small}\longrightarrow V_{large}$ injection, for which the
321 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
322 (I(u),I(v))\in E_{large}$
327 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node
328 label function}, where K is an arbitrary set. The elements in K
329 are the \textbf{node labels}. Two nodes, u and v are said to be
330 \textbf{equivalent}, if $lab(u)=lab(v)$.
333 When node labels are also given, the matched nodes must have the same
334 labels. For example, the node labeled isomorphism is phrased by
336 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label
337 function lab} if $\exists M: V_{small} \longrightarrow V_{large}$
338 bijection, for which the following is true:
340 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
341 (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} :
346 The other two definitions can be extended in the same way.
348 Note that edge label function can be defined similarly to node label
349 function, and all the definitions can be extended with additional
350 conditions, but it is out of the scope of this work.
352 The equivalence of two nodes is usually defined by another relation,
353 $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the
354 definition given above if R is an equivalence relation, which does not
355 mean restriction in biological and chemical applications.
357 \subsection{Common problems}\label{sec:CommProb}
359 The focus of this paper is on two extensively studied topics, the
360 subgraph isomorphism and its variations. However, the following
361 problems also appear in many applications.
363 The \textbf{subgraph matching problem} is the following: is
364 $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node
367 The \textbf{induced subgraph matching problem} asks the same about the
368 existence of an induced subgraph.
370 The \textbf{graph isomorphism problem} can be defined as induced
371 subgraph matching problem where the sizes of the two graphs are equal.
373 In addition to existence, it may be needed to show such a subgraph, or
374 it may be necessary to list all of them.
376 It should be noted that some authors misleadingly refer to the term
377 \emph{subgraph isomorphism problem} as an \emph{induced subgraph
378 isomorphism problem}.
380 The following sections give the descriptions of VF2, VF2++, VF2 Plus
381 and a particular comparison.
383 \section{The VF2 Algorithm}
384 This algorithm is the basis of both the VF2++ and the VF2 Plus. VF2
385 is able to handle all the variations mentioned in Section
386 \ref{sec:CommProb}. Although it can also handle directed graphs,
387 for the sake of simplicity, only the undirected case will be
391 \subsection{Common notations}
392 \indent Assume $G_{small}$ is searched in $G_{large}$. The following
393 definitions and notations will be used throughout the whole paper.
395 A set $M\subseteq V_{small}\times V_{large}$ is called
396 \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears
397 in more than one pair in M. That is, M uniquely associates some of
398 the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice
403 Mapping M \textbf{covers} a node v, if there exists a pair in M, which
408 A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the
409 nodes in $V_{small}$.
413 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in
414 V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in
415 V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.
419 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node
420 exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$
421 and $v\in V_{small}\cup V_{large}$.
424 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique
426 The definitions of the isomorphism types can be rephrased on the
427 existence of a special whole mapping $M$, since it represents a
428 bijection. For example
430 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph
431 isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v
432 \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
433 (Pair(M,u),Pair(M,v))\in E_{large}$.
437 A set of whole mappings is called \textbf{problem type}.
439 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
440 which can be substituted by any problem type.
442 A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using
443 this notations, VF2 searches a whole mapping $W$ of type $PT$.
445 For example the problem type of graph isomorphism problem is the
446 following. A whole mapping $W$ is in $\mathbf{ISO}$, iff the
447 bijection represented by $W$ satisfies Definition~\ref{sec:ismorphic}.
448 The subgraph- and induced subgraph matching problems can be formalized
449 in a similar way. Let their problem types be denoted as $\mathbf{SUB}$
454 $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in
455 PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow
456 (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of
457 $G_{small}$ has to be mapped to an edge of $G_{large}$ for each
461 Note that $ISO$, $SUB$ and $IND$ are expanding problem types.
463 This paper deals with the three problem types mentioned above only,
464 but the following generic definitions make it possible to handle other
465 types as well. Although it may be challenging to find a proper
466 consistency function and an efficient cutting function.
469 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
470 \textbf{consistency function by } $\mathbf{PT}$, if the following
471 holds. If there exists whole mapping $W$ of type PT for which
472 $M\subseteq W$, then $Cons_{PT}(M)$ is true.
476 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
477 \textbf{cutting function by } $\mathbf{PT}$, if the following
478 holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a
479 whole mapping W of type PT.
483 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if
484 $Cons_{PT}(M)$ is true.
487 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
489 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and
490 $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where
491 $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping.
494 $Cons_{PT}$ will be used to check the consistency of the already
495 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
496 no whole consistent mapping can contain the current mapping.
498 \subsection{Overview of the algorithm}
499 VF2 uses a state space representation of mappings, $Cons_{PT}$ for
500 excluding inconsistency with the problem type and $Cut_{PT}$ for
501 pruning the search tree. Each state $s$ of the matching process can
502 be associated with a mapping $M(s)$.
504 Algorithm~\ref{alg:VF2Pseu} is a high level description of
505 the VF2 matching algorithm.
509 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
510 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson ..
511 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
512 \begin{algorithmic}[1]
514 \Procedure{VF2}{State $s$, ProblemType $PT$} \If{$M(s$) covers
515 $V_{small}$} \State Output($M(s)$) \Else
517 \State Compute the set $P(s)$ of the pairs candidate for inclusion
518 in $M(s)$ \ForAll{$p\in{P(s)}$} \If{Cons$_{PT}$($p, M(s)$) $\wedge$
519 $\neg$Cut$_{PT}$($p, M(s)$)} \State Compute the nascent state
520 $\tilde{s}$ by adding $p$ to $M(s)$ \State \textbf{call}
521 VF2($\tilde{s}$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
526 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it
527 starts with an empty mapping.
529 For each state $s$, the algorithm computes $P(s)$, the set of
530 candidate node pairs for adding to the current state $s$.
532 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and
533 $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and
534 $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup
535 \{p\}$ is computed, and the whole process is recursively applied to
536 $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it
537 can be proved that $s$ can not be extended to a whole mapping.
539 In order to make sure of the correctness see
541 Through consistent mappings, only consistent whole mappings can be
542 reached, and all of the whole mappings are reachable through
546 Note that a state may be reached in many different ways, since the
547 order of insertions into M does not influence the nascent mapping. In
548 fact, the number of different ways which lead to the same state can be
549 exponentially large. If $G_{small}$ and $G_{large}$ are circles with n
550 nodes and n different node labels, there exists exactly one graph
551 isomorphism between them, but it will be reached in $n!$ different
554 However, one may observe
557 \label{claim:claimTotOrd}
558 Let $\prec$ an arbitrary total ordering relation on $V_{small}$. If
559 the algorithm ignores each $p=(u,v) \in P(s)$, for which
561 $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$,
563 then no state can be reached more than ones and each state associated
564 with a whole mapping remains reachable.
567 Note that the cornerstone of the improvements to VF2 is a proper
568 choice of a total ordering.
570 \subsection{The candidate set P(s)}
571 \label{candidateComputingVF2}
572 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.
573 Suppose that $PT$ is an expanding problem type, see
574 Definition~\ref{expPT}.
577 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by
578 $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}}
579 \wedge \tilde{u}$ is covered by $M(s)\}$, and
580 \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not
582 $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}}
583 \wedge \tilde{v}$ is covered by $M(s)\}$
586 The set $P(s)$ includes the pairs of uncovered neighbours of covered
587 nodes and if there is not such a node pair, all the pairs containing
588 two uncovered nodes are added. Formally, let
592 T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if }
593 T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq
594 \emptyset,\\ (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s))
595 &\hspace{-0.15cm}otherwise.
599 \subsection{Consistency}
600 This section defines the consistency functions for the different
601 problem types mentioned in Section~\ref{sec:CommProb}.
603 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} :
604 (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large}
605 (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$
607 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is
608 a state of the matching procedure, $M(s)$ is consistent mapping by
609 $PT$ and $lab(u)=lab(v)$. $Cons_{PT}(p,M(s))$ checks whether
610 including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.
612 \subsubsection{Induced subgraph isomorphism}
613 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow
614 (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small}
615 \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline The
616 following formulation gives an efficient way of calculating
619 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v)
620 \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge
621 (\forall \tilde{u}\in \Gamma_{small}(u)
622 \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
623 consistency function in the case of $IND$.
626 \subsubsection{Graph isomorphism}
627 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$
628 $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by
631 $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and
632 only if it is a consistency function by $IND$.
634 \subsubsection{Subgraph isomorphism}
635 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow
636 (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small}
637 \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
639 The following formulation gives an efficient way of calculating
642 $Cons_{SUB}((u,v),M(s)):= (\forall \tilde{u}\in \Gamma_{small}(u)
643 \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
644 consistency function by $SUB$.
647 \subsection{Cutting rules}
648 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently
649 verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can
650 be true only if it is impossible to extended $M(s)\cup \{p\}$ to a
654 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash
655 M_{small}(s))\backslash T_{small}(s)$, and
656 \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash
657 M_{large}(s))\backslash T_{large}(s)$.
659 \subsubsection{Induced subgraph isomorphism}
661 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
662 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
663 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap
664 \tilde{T}_{small}(s)|$ is a cutting function by $IND$.
666 \subsubsection{Graph isomorphism}
667 Note that the cutting function of induced subgraph isomorphism defined
668 above is a cutting function by $ISO$, too, however it is less
669 efficient than the following while their computational complexity is
672 $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq
673 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
674 \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap
675 \tilde{T}_{small}(s)|$ is a cutting function by $ISO$.
678 \subsubsection{Subgraph isomorphism}
680 $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
681 |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by
684 Note that there is a significant difference between induced and
685 non-induced subgraph isomorphism:
689 $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
690 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
691 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$
692 is \textbf{not} a cutting function by $SUB$.
700 [scale=.8,auto=left,every node/.style={circle,fill=black!15}]
701 \node[rectangle,fill=black!15] at (4,6) {$G_{small}$}; \node (u4) at
702 (2.5,10) {$u_4$}; \node (u3) at (5.5,10) {$u_3$}; \node (u1) at
703 (2.5,7) {$u_1$}; \node (u2) at (5.5,7) {$u_2$};
705 \node[rectangle,fill=black!30] at (13.5,6) {$G_{large}$};
706 \node[fill=black!30] (v4) at (12,10) {$v_4$}; \node[fill=black!30]
707 (v3) at (15,10) {$v_3$}; \node[fill=black!30] (v1) at (12,7)
708 {$v_1$}; \node[fill=black!30] (v2) at (15,7) {$v_2$};
711 \foreach \from/\to in {u1/u2,u2/u3,u3/u4,u4/u1} \draw (\from) --
712 (\to); \foreach \from/\to in {v1/v2,v2/v3,v3/v4,v4/v1,v1/v3} \draw
714 % \draw[dashed] (\from) -- (\to);
716 \caption{Graphs for the proof of Claim~\ref{claimSUB}}
720 Let the two graphs of Figure~\ref{fig:proofSUB} be the input
721 graphs. Suppose the total ordering relation is $u_1 \prec u_2 \prec
722 u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add
723 $(u_2,v_2)\in P(s)$.\newline $Cons_{SUB}((u_2,v_2),M(s))=true$, so
724 $M\cup \{(u_2,v_2)\}$ is consistent by $SUB$. The cutting function
725 $Cut_{SUB}((u_2,v_2),M(s))$ is false, so it does not let cut the
726 tree.\newline On the other hand $Cut_{SUB}'((u_2,v_2),M(s))$ is true,
727 since\\$0=|\Gamma_{large}(v_2)\cap
728 \tilde{T}_{large}(s)|<|\Gamma_{small}(u_2)\cap
729 \tilde{T}_{small}(s)|=1$ is true, but still the tree can not be
730 pruned, because otherwise the
731 $\{(u_1,v_1)(u_2,v_2)(u_3,v_3)(u_4,v_4)\}$ mapping can not be found.
734 \section{The VF2++ Algorithm}
735 Although any total ordering relation makes the search space of VF2 a
736 tree, its choice turns out to dramatically influence the number of
737 visited states. The goal is to determine an efficient one as quickly
740 The main reason for VF2++' superiority over VF2 is twofold. Firstly,
741 taking into account the structure and the node labeling of the graph,
742 VF2++ determines a state order in which most of the unfruitful
743 branches of the search space can be pruned immediately. Secondly,
744 introducing more efficient --- nevertheless still easier to compute
745 --- cutting rules reduces the chance of going astray even further.
747 In addition to the usual subgraph isomorphism, specialized versions
748 for induced subgraph isomorphism and for graph isomorphism have been
749 designed. VF2++ has gained a runtime improvement of one order of
750 magnitude respecting induced subgraph isomorphism and a better
751 asymptotical behaviour in the case of graph isomorphism problem.
753 Note that a weaker version of the cutting rules and the more efficient
754 candidate set calculating were described in \cite{VF2Plus}, too.
756 It should be noted that all the methods described in this section are
757 extendable to handle directed graphs and edge labels as well.
759 The basic ideas and the detailed description of VF2++ are provided in
762 \subsection{Preparations}
764 \label{claim:claimCoverFromLeft}
765 The total ordering relation uniquely determines a node order, in which
766 the nodes of $V_{small}$ will be covered by VF2. From the point of
767 view of the matching procedure, this means, that always the same node
768 of $G_{small}$ will be covered on the d-th level.
771 In order to make the search space a tree, the pairs in $\{(u,v)\in
772 P(s) : \exists \hat{u} : \hat{u}\prec u\}$ are excluded from $P(s)$.
774 Let $\tilde{P}(s):=P(s)\backslash \{(u,v)\in P(s) : \exists \hat{u} :
777 The relation $\prec$ is a total ordering, so $\exists!\ \tilde{u} :
778 \forall\ (u,v)\in \tilde{P}(s): u=\tilde u$. Since a pair form
779 $\tilde{P}(s)$ is chosen for including into $M(s)$, it is obvious,
780 that only $\tilde{u}$ can be covered in $V_{small}$. Actually,
781 $\tilde{u}$ is the smallest element in $T_{small}(s)$ (or in
782 $V_{small}\backslash M_{small}(s)$, if $T_{small}(s)$ were empty), and
783 $T_{small}(s)$ depends only on the covered nodes of $G_{small}$.
785 Simple induction on $d$ shows that the set of covered nodes of
786 $G_{small}$ is unique, if $d$ is given, so $\tilde{u}$ is unique if
791 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
792 $V_{small}$ is \textbf{matching order}, if exists $\prec$ total
793 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
794 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$.
797 \begin{claim}\label{claim:MOclaim}
798 A total ordering is matching order, iff the nodes of every component
799 form an interval in the node sequence, and every node connects to a
800 previous node in its component except the first node of the
801 component. The order of the components is arbitrary. \\Formally
803 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
804 $V_{small}$ is matching order $\Leftrightarrow$ $\forall
805 G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall
806 i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
807 V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq
808 l\leq i \Rightarrow u_{l}\in V'_{small}) \wedge
809 (u_{\sigma{(k)}},u_{\sigma{(i)}})\in E'_{small}$, where $i,j,k,l\in
810 \{1,..,|V_{small}|\}$\newline
813 Suppose a matching order is given. It has to be shown that the node
814 sequence has a structure described above.\\ Let
815 $G'_{small}=(V'_{small},E'_{small})$ be an arbitrary component and $i$
818 $(\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
819 V'_{small})\Rightarrow u_{\sigma(i)}$ is not the first covered node in
820 $G_{small}\ \Rightarrow u_{\sigma(i)}$ is connected to a covered node
821 $u_{\sigma(k)}$ where $k<i$, since $u_{\sigma(i)}\in T_{small}(s)$ for
822 some $s$ and $T_{small}(s)\subseteq{V'_{small}}$ contains only nodes
823 connected to at least one covered node. It's easy to see, that
824 $\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}$, since
825 $T_{small}(s)$ contains only nodes connected to some covered ones,
826 while it is not empty, but if it were empty, then $u_{\sigma(k)}$ and
827 $u_{\sigma(i)}$ were not in the same component.
829 Now, let us show that if a node sequence has the special structure
830 described above, it has to be matching
831 order.\\ $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ is
832 a total ordering, which determines the matching order
833 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$.
836 To summing up, a total ordering always uniquely determines a matching
837 order, and every matching order can be determined by a total ordering,
838 however, more than one different total orderings may determine the
840 \subsection{Idea behind the algorithm}
841 The goal is to find a matching order in which the algorithm is able to
842 recognize inconsistency or prune the infeasible branches on the
843 highest levels and goes deep only if it is needed.
846 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the
847 number of neighbours of u which are in H, where $u\in V_{small} $ and
848 $H\subseteq V_{small}$.
851 The principal question is the following. Suppose a state $s$ is
852 given. For which node of $T_{small}(s)$ is the hardest to find a
853 consistent pair in $G_{large}$? The more covered neighbours a node in
854 $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has
855 ---, the more rarely satisfiable consistency constraints for its pair
858 In biology, most of the graphs are sparse, thus several nodes in
859 $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes
860 reasonable to define a secondary and a tertiary order between them.
861 The observation above proves itself to be as determining, that the
862 secondary ordering prefers nodes with the most uncovered neighbours
863 among which have the same $Conn_{M_{small}(s)}$ to increase
864 $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible. The
865 tertiary ordering prefers nodes having the rarest uncovered labels.
867 Note that the secondary ordering is the same as the ordering by $deg$,
868 which is a static data in front of the above used.
870 These rules can easily result in a matching order which contains the
871 nodes of a long path successively, whose nodes may have low $Conn$ and
872 is easily matchable into $G_{large}$. To avoid that, a BFS order is
873 used, which provides the shortest possible paths.
876 In the following, some examples on which the VF2 may be slow are
877 described, although they are easily solvable by using a proper
881 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways
882 without node labels. Let $u\in V_{small}$ and $v\in V_{large}$.
888 $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash
891 $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash
895 Now, any mapping by the node label $lab$ must contain $(u,v)$, since
896 $u$ is black and no node in $V_{large}$ has a black label except
897 $v$. If unfortunately $u$ were the last node which will get covered,
898 VF2 would check only in the last steps, whether $u$ can be matched to
901 However, had $u$ been the first matched node, u would have been
902 matched immediately to v, so all the mappings would have been
903 precluded in which node labels can not correspond.
907 Suppose there is no node label given, $G_{small}$ is a small graph and
908 can not be mapped into $G_{large}$ and $u\in V_{small}$.
910 Let $G'_{small}:=(V_{small}\cup
911 \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup
912 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
913 $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint
914 from $G_{small}$ and one of its starting points is connected to $u\in
917 Is there a subgraph of $G_{large}$, which is isomorph with
920 If unfortunately the nodes of the path were the first $k$ nodes in the
921 matching order, the algorithm would iterate through all the possible k
922 long paths in $G_{large}$, and it would recognize that no path can be
923 extended to $G'_{small}$.
925 However, had it started by the matching of $G_{small}$, it would not
926 have matched any nodes of the path.
929 These examples may look artificial, but the same problems also appear
930 in real-world examples, even though in a less obvious way.
932 \subsection{Total ordering}
933 Instead of the total ordering relation, the matching order will be
936 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} :
937 l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ ,
938 where $l$ is a label and $\mathcal{M}\subseteq V_{small}$.
941 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
945 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
946 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson ..
948 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
949 \begin{algorithmic}[1]
950 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$
951 \Comment{matching order} \While{$V_{small}\backslash \mathcal{M}
952 \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
953 min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash
954 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
955 root node $r$. \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
956 $d$-th level \State Process $V_d$ \Comment{See Algorithm
957 \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2}} \EndFor
958 \EndWhile \EndProcedure
963 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
964 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson ..
966 \caption{\hspace{.5cm}$A\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
967 \begin{algorithmic}[1]
968 \Procedure{VF2++ProcessLevel1}{$V_{d}$} \While{$V_d\neq\emptyset$}
969 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg
970 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
971 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
972 $F_\mathcal{M}$ \EndWhile \EndProcedure
977 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
978 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson ..
980 \caption{\hspace{0.5cm}$Another\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess2}
981 \begin{algorithmic}[1]
982 \Procedure{VF2++ProcessLevel2}{$V_{d}$} \State Sort $V_d$ in
983 descending lex. order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$
984 \State Append the sorted $V_d$ to the end of $\mathcal{M}$ \State
985 Refresh $F_\mathcal{M}$ \EndProcedure
988 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
989 matching order procedure of VF2++. It computes a BFS tree for each
990 component in ascending order of their rarest $lab$ and largest $deg$,
991 whose root vertex is the component's minimal
992 node. Algorithms~\ref{alg:VF2PPProcess1} and \ref{alg:VF2PPProcess2}
993 are two different methods to process a level of the BFS tree.
995 After sorting the nodes of the current level in descending
996 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$,
997 Algorithm~\ref{alg:VF2PPProcess2} appends them simultaneously to the
998 matching order $\mathcal{M}$ and refreshes $F_\mathcal{M}$ just once,
999 whereas Algorithm~\ref{alg:VF2PPProcess1} appends the nodes separately
1000 to $\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it
1001 provides up-to-date label information and may result in a more
1002 efficient matching order.
1004 Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}
1005 provides a matching order.
1008 \subsection{Cutting rules}
1009 \label{VF2PPCuttingRules}
1010 This section presents the cutting rules of VF2++, which are improved
1011 by using extra information coming from the node labels.
1013 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l
1014 \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and
1015 $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge
1016 \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in
1017 V_{large}$ and $l$ is a label.
1020 \subsubsection{Induced subgraph isomorphism}
1022 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND.
1025 It has to be shown, that $LabCut_{IND}((u,v),M(s))=true\Rightarrow$
1026 the mapping can not be extended to a whole
1027 mapping.\\ $LabCut_{IND}((u,v),M(s))=true,$ iff the following
1028 holds. $\\\exists l: |\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| <
1029 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)| \vee
1030 |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| <
1031 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$.
1033 Suppose that $|\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| <
1034 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)|$. Each node of
1035 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$ has to be matched to a
1036 node in $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$, so
1037 $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$ can not be smaller than
1038 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$. That is why $M(s)$ can
1039 not be extended to a whole mapping.
1041 Otherwise $|\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| <
1042 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$ has to be
1043 true. Similarly, each node of $\Gamma_{large}^{l}(v)\cap
1044 \tilde{T}_{large}(s)$ has to be matched to a node in
1045 $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$,
1046 i.e. $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ can not be
1047 smaller than $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, if
1048 $M(s)$ is extendible.
1050 The following claims can be proven similarly.
1051 \subsubsection{Graph isomorphism}
1053 \[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by ISO.
1056 \subsubsection{Subgraph isomorphism}
1058 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\] is a cutting function by SUB.
1063 \subsection{Implementation details}
1064 This section provides a detailed summary of an efficient
1065 implementation of VF2++.
1066 \subsubsection{Storing a mapping}
1067 After fixing an arbitrary node order ($u_0, u_1, ..,
1068 u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store
1069 the current mapping in the following way.
1073 v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INVALID &
1074 if\ no\ node\ has\ been\ mapped\ to\ u_i.
1077 Where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$
1079 \subsubsection{Avoiding the recurrence}
1080 Exploring the state space was described in a recursive fashion using
1081 sets (see Algorithm~\ref{alg:VF2Pseu}), which makes the
1082 algorithm easy to understand but does not show directly an efficient
1083 way of implementation. The following approach avoiding recursion and
1084 using lookup tables instead of sets is not only fast but has linear
1085 space complexity as well.
1087 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
1088 as a while loop, which has a loop counter $depth$ denoting the
1089 all-time depth of the recursion. Fixing a matching order, let $M$
1090 denote the array storing the all-time mapping. The initial state is
1091 associated with the empty mapping, which means that $\forall i:
1092 M[i]=INVALID$ and $depth=0$. In case of a recursive call, $depth$ has
1093 to be incremented, while in case of a return, it has to be
1094 decremented. Based on Claim~\ref{claim:claimCoverFromLeft},
1095 $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before
1096 $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and
1097 $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes
1098 while the state is being processed, but the property is held before
1099 both stepping back to a predecessor state and exploring a successor
1102 The necessary part of the candidate set is easily maintainable or
1103 computable by following
1104 Section~\ref{candidateComputingVF2}. A much faster method
1105 has been designed for biological- and sparse graphs, see the next
1106 section for details.
1108 \subsubsection{Calculating the candidates for a node}
1109 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
1110 task is not to maintain the candidate set, but to generate the
1111 candidate nodes in $G_{large}$ for a given node $u\in V_{small}$. In
1112 case of an expanding problem type and $M$ mapping, if a node $v\in
1113 V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall
1114 u'\in V_{small} : (u,u')\in
1115 E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in
1116 E_{large}$. That is, each covered neighbour of $u$ has to be mapped to
1117 a covered neighbour of $v$.
1119 Having said that, an algorithm running in $\Theta(deg)$ time is
1120 describable if there exists a covered node in the component containing
1121 $u$. In this case choose a covered neighbour $u'$ of $u$ arbitrarily
1122 --- such a node exists based on
1123 Claim~\ref{claim:MOclaim}. With all the candidates of $u$
1124 being among the uncovered neighbours of $Pair(M,u')$, there are solely
1125 $deg(Pair(M,u'))$ nodes to check.
1127 An easy trick is to choose an $u'$, for which
1128 $|\{uncovered\ neighbours\ $ $of\ Pair(M,u')\}|$ is the smallest
1131 Note that if $u$ is the first node of its component, then all the
1132 uncovered nodes of $G_{large}$ are candidates, so giving a sublinear
1133 method is impossible.
1136 \subsubsection{Determining the node order}
1137 This section describes how the node order preprocessing method of
1138 VF2++ can efficiently be implemented.
1140 For using lookup tables, the node labels are associated with the
1141 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
1142 enables $F_\mathcal{M}$ to be stored in an array, for which
1143 $F_\mathcal{M}[i]=F_\mathcal{M}(i)$ where $i=0,1,..,|K|-1$. At first,
1144 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
1145 in $V_{small}$ having label i, which is easy to compute in
1146 $\Theta(|V_{small}|)$ steps.
1148 $\mathcal{M}\subseteq V_{small}$ can be represented as an array of
1151 The BFS tree is computed by using a FIFO data structure which is
1152 usually implemented as a linked list, but one can avoid it by using
1153 the array $\mathcal{M}$ itself. $\mathcal{M}$ contains all the nodes
1154 seen before, a pointer shows where the first node of the FIFO is, and
1155 another one shows where the next explored node has to be inserted. So
1156 the nodes of each level of the BFS tree can be processed by
1157 Algorithms~\ref{alg:VF2PPProcess1} and
1158 \ref{alg:VF2PPProcess2} in place by swapping nodes.
1160 After a node $u$ gets to the next place of the node order,
1161 $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is
1162 one less covered node in $V_{large}$ with label $lab(u)$, that is why
1163 min selection sort is preferred which gives the elements from left to
1164 right in descending order, see Algorithm~\ref{alg:VF2PPProcess1}.
1166 Note that using a $\Theta(n^2)$ sort absolutely does not slow down the
1167 procedure on biological (and on sparse) graphs, since they have few
1168 nodes on a level. If a level had a large number of nodes,
1169 Algorithm~\ref{alg:VF2PPProcess2} would seem to be a better
1170 choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the
1171 efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not
1172 be immediately refreshed, so it is unable to provide up-to-date label
1175 Note that the \textit{while loop} of Algorithm~\ref{alg:VF2PPPseu}
1176 takes one iteration per graph component and the graphs in biology are
1179 \subsubsection{Cutting rules}
1180 In Section~\ref{VF2PPCuttingRules}, the cutting rules were
1181 described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$
1182 and $\tilde T_{large}$, which are dependent on the all-time mapping
1183 (i.e. on the all-time state). The aim is to check the labeled cutting
1184 rules of VF2++ in $\Theta(deg)$ time.
1186 Firstly, suppose that these four sets are given in such a way, that
1187 checking whether a node is in a certain set takes constant time,
1188 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
1189 initially zero integer lookup table of size $|K|$. After incrementing
1190 $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and
1191 decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap
1192 T_{large}(s)$, the first part of the cutting rules is checkable in
1193 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
1194 to zero takes $\Theta(deg)$ time again, which makes it possible to use
1195 the same table through the whole algorithm. The second part of the
1196 cutting rules can be verified using the same method with $\tilde
1197 T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and
1198 $T_{large}$. Thus, the overall complexity is $\Theta(deg)$.
1200 An other integer lookup table storing the number of covered neighbours
1201 of each node in $G_{large}$ gives all the information about the sets
1202 $T_{large}$ and $\tilde T_{large}$, which is maintainable in
1203 $\Theta(deg)$ time when a pair is added or substracted by incrementing
1204 or decrementing the proper indices. A further improvement is that the
1205 values of $L[lab(u')]$ in case of checking $u$ is dependent only on
1206 $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an
1207 array of pairs (label, number of such labels) can be stored to skip
1208 the maintaining operations. Note that these arrays are at most of size
1209 $deg$. Skipping this trick, the number of covered neighbours has to be
1210 stored for each node of $G_{small}$ as well to get the sets
1211 $T_{small}$ and $\tilde T_{small}$.
1213 Using similar tricks, the consistency function can be evaluated in
1214 $\Theta(deg)$ steps, as well.
1216 \section{The VF2 Plus Algorithm}
1217 The VF2 Plus algorithm is a recently improved version of VF2. It was
1218 compared with the state of the art algorithms in \cite{VF2Plus} and
1219 has proven itself to be competitive with RI, the best algorithm on
1220 biological graphs. \\ A short summary of VF2 Plus follows, which uses
1221 the notation and the conventions of the original paper.
1223 \subsection{Ordering procedure}
1224 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$
1225 with the lowest probability to find a pair in $V_{small}$ and the
1226 highest number of connections with the nodes already sorted by the
1230 $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and
1231 $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$.
1233 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in
1236 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in
1239 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ $M$ is the set of
1240 already sorted nodes, $T$ is the set of nodes candidate to be
1241 selected, and $degreeM$ of a node is the number of its neighbours in
1244 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
1245 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson ..
1247 \caption{}\label{alg:VF2PlusPseu}
1248 \begin{algorithmic}[1]
1249 \Procedure{VF2 Plus order}{} \State Select the node with the lowest
1250 $P$. \If {more nodes share the same $P$} \State select the one with
1251 maximum degree \EndIf \If {more nodes share the same $P$ and have the
1252 max degree} \State select the first \EndIf \State Put the selected
1253 node in the set $M$. \label{alg:putIn} \State Put all its unsorted
1254 neighbours in the set $T$. \If {$M\neq V_{small}$} \State From set
1255 $T$ select the node with maximum $degreeM$. \If {more nodes have
1256 maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf
1257 \If {more nodes have maximum $degreeM$ and $P$} \State Select the
1258 first. \EndIf \State \textbf{goto \ref{alg:putIn}.} \EndIf
1263 Using these notations, Algorithm~\ref{alg:VF2PlusPseu}
1264 provides the description of the sorting procedure.
1266 Note that $P(u)$ is not the exact probability of finding a consistent
1267 pair for $u$ by choosing a node of $V_{large}$ randomly, since
1268 $P_{lab}$ and $P_{deg}$ are not independent, though calculating the
1269 real probability would take quadratic time, which may be reduced by
1270 using fittingly lookup tables.
1272 \section{Experimental results}
1273 This section compares the performance of VF2++ and VF2 Plus. Both
1274 algorithms have run faster with orders of magnitude than VF2, thus its
1275 inclusion was not reasonable.
1276 \subsection{Biological graphs}
1277 The tests have been executed on a recent biological dataset created
1278 for the International Contest on Pattern Search in Biological
1279 Databases\cite{Content}, which has been constructed of Molecule,
1280 Protein and Contact Map graphs extracted from the Protein Data
1281 Bank\cite{ProteinDataBank}.
1283 The molecule dataset contains small graphs with less than 100 nodes
1284 and an average degree of less than 3. The protein dataset contains
1285 graphs having 500-10 000 nodes and an average degree of 4, while the
1286 contact map dataset contains graphs with 150-800 nodes and an average
1289 In the following, the induced subgraph isomorphism and the graph
1290 isomorphism will be examined.
1292 \subsubsection{Induced subgraph isomorphism}
1293 This dataset contains a set of graph pairs, and \textbf{all} the
1294 induced subgraph ismorphisms have to be found between
1295 them. Figure~\ref{fig:INDProt}, \ref{fig:INDContact}, and
1296 \ref{fig:INDMolecule} show the solution time of the
1297 problems in the problem set.
1302 \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1303 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1304 west},scaled x ticks = false,x tick label style={/pgf/number
1305 format/1000 sep = \thinspace}] %\addplot+[only marks] table
1306 {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]
1307 table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark
1308 size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
1313 \caption{Both the algorithms have linear behaviour on protein
1314 graphs. VF2++ is more than 10 times faster than VF2
1315 Plus.} \label{fig:INDProt}
1321 \begin{axis}[title=Contact Maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1322 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1323 west},scaled x ticks = false,x tick label style={/pgf/number
1324 format/1000 sep = \thinspace}]
1325 %\addplot+[only marks] table {proteinsOrig.txt};
1326 \addplot table {Orig/ContactMaps.128.txt};
1327 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1328 {VF2PPLabel/ContactMaps.128.txt};
1333 \caption{On Contact Maps, VF2++ runs in near constant time, while VF2
1334 Plus has a near linear behaviour.} \label{fig:INDContact}
1340 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1341 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1342 west},scaled x ticks = false,x tick label style={/pgf/number
1343 format/1000 sep = \thinspace}]
1344 %\addplot+[only marks] table {proteinsOrig.txt};
1345 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark
1346 size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
1351 \caption{In the case of Molecules, the algorithms seem to have a
1352 similar behaviour, but VF2++ is almost two times faster even on such
1353 small graphs.} \label{fig:INDMolecule}
1357 \subsubsection{Graph ismorphism}
1358 In this experiment, the nodes of each graph in the database have been
1359 shuffled and an isomorphism between the shuffled and the original
1360 graph has been searched. For runtime results, see
1361 Figure~\ref{fig:ISOProt}, \ref{fig:ISOContact}, and
1362 \ref{fig:ISOMolecule}.
1367 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1368 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1369 west},scaled x ticks = false,x tick label style={/pgf/number
1370 format/1000 sep = \thinspace}]
1371 %\addplot+[only marks] table {proteinsOrig.txt};
1372 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
1373 size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
1378 \caption{On protein graphs, VF2 Plus has a super linear time
1379 complexity, while VF2++ runs in near constant time. The difference
1380 is about two order of magnitude on large graphs.}\label{fig:ISOProt}
1386 \begin{axis}[title=Contact Maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1387 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1388 west},scaled x ticks = false,x tick label style={/pgf/number
1389 format/1000 sep = \thinspace}]
1390 %\addplot+[only marks] table {proteinsOrig.txt};
1391 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark
1392 size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
1397 \caption{The results are closer to each other on Contact Maps, but
1398 VF2++ still performs consistently better.}\label{fig:ISOContact}
1404 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1405 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1406 west},scaled x ticks = false,x tick label style={/pgf/number
1407 format/1000 sep = \thinspace}]
1408 %\addplot+[only marks] table {proteinsOrig.txt};
1409 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark
1410 size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
1415 \caption{In the case of Molecules, there is not such a significant
1416 difference, but VF2++ seems to be faster as the number of nodes
1417 increases.}\label{fig:ISOMolecule}
1421 \subsection{Random graphs}
1422 This section compares VF2++ with VF2 Plus on random graphs of a large
1423 size. The node labels are uniformly distributed. Let $\delta$ denote
1424 the average degree. For the parameters of problems solved in the
1425 experiments, please see the top of each chart.
1426 \subsubsection{Graph isomorphism}
1427 To evaluate the efficiency of the algorithms in the case of graph
1428 isomorphism, connected graphs of less than 20 000 nodes have been
1429 considered. Generating a random graph and shuffling its nodes, an
1430 isomorphism had to be found. Figure \ref{fig:randISO5},
1431 \ref{fig:randISO10}, \ref{fig:randISO15}, \ref{fig:randISO35},
1432 \ref{fig:randISO45}, and \ref{fig:randISO100} show the runtime results
1433 on graph sets of various density.
1438 \begin{axis}[title={Random ISO, $\delta = 5$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1439 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1440 west},scaled x ticks = false,x tick label style={/pgf/number
1441 format/1000 sep = \thinspace}]
1442 %\addplot+[only marks] table {proteinsOrig.txt};
1443 \addplot table {randGraph/iso/vf2pIso5_1.txt};
1444 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1445 {randGraph/iso/vf2ppIso5_1.txt};
1450 \caption{}\label{fig:randISO5}
1456 \begin{axis}[title={Random ISO, $\delta = 10$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1457 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1458 west},scaled x ticks = false,x tick label style={/pgf/number
1459 format/1000 sep = \thinspace}]
1460 %\addplot+[only marks] table {proteinsOrig.txt};
1461 \addplot table {randGraph/iso/vf2pIso10_1.txt};
1462 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1463 {randGraph/iso/vf2ppIso10_1.txt};
1468 \caption{}\label{fig:randISO10}
1474 \begin{axis}[title={Random ISO, $\delta = 15$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1475 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1476 west},scaled x ticks = false,x tick label style={/pgf/number
1477 format/1000 sep = \thinspace}]
1478 %\addplot+[only marks] table {proteinsOrig.txt};
1479 \addplot table {randGraph/iso/vf2pIso15_1.txt};
1480 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1481 {randGraph/iso/vf2ppIso15_1.txt};
1486 \caption{}\label{fig:randISO15}
1492 \begin{axis}[title={Random ISO, $\delta = 35$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1493 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1494 west},scaled x ticks = false,x tick label style={/pgf/number
1495 format/1000 sep = \thinspace}]
1496 %\addplot+[only marks] table {proteinsOrig.txt};
1497 \addplot table {randGraph/iso/vf2pIso35_1.txt};
1498 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1499 {randGraph/iso/vf2ppIso35_1.txt};
1504 \caption{}\label{fig:randISO35}
1510 \begin{axis}[title={Random ISO, $\delta = 45$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1511 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1512 west},scaled x ticks = false,x tick label style={/pgf/number
1513 format/1000 sep = \thinspace}]
1514 %\addplot+[only marks] table {proteinsOrig.txt};
1515 \addplot table {randGraph/iso/vf2pIso45_1.txt};
1516 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1517 {randGraph/iso/vf2ppIso45_1.txt};
1522 \caption{}\label{fig:randISO45}
1528 \begin{axis}[title={Random ISO, $\delta = 100$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1529 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1530 west},scaled x ticks = false,x tick label style={/pgf/number
1531 format/1000 sep = \thinspace}]
1532 %\addplot+[only marks] table {proteinsOrig.txt};
1533 \addplot table {randGraph/iso/vf2pIso100_1.txt};
1534 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1535 {randGraph/iso/vf2ppIso100_1.txt};
1540 \caption{}\label{fig:randISO100}
1544 Considering the graph isomorphism problem, VF2++ consistently
1545 outperforms its rival especially on sparse graphs. The reason for the
1546 slightly super linear behaviour of VF2++ on denser graphs is the
1547 larger number of nodes in the BFS tree constructed in
1548 Algorithm~\ref{alg:VF2PPPseu}.
1550 \subsubsection{Induced subgraph isomorphism}
1551 This section provides a comparison of VF2++ and VF2 Plus in the case
1552 of induced subgraph isomorphism. In addition to the size of the large
1553 graph, that of the small graph dramatically influences the hardness of
1554 a given problem too, so the overall picture is provided by examining
1555 small graphs of various size.
1557 For each chart, a number $0<\rho< 1$ has been fixed and the following
1558 has been executed 150 times. Generating a large graph $G_{large}$,
1559 choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes,
1560 and for all the 10 subgraphs find a mapping by using both the graph
1561 matching algorithms. The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
1562 0.3, 0.6, 0.8, 0.95$ cases have been examined (see
1563 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
1564 \ref{fig:randIND35}, and for each $\delta$, a cumulative chart is
1565 given as well, which excludes $\rho = 0.05$ and $0.1$ for the sake of
1566 perspicuity (see Figure~\ref{fig:randIND5Sum}, \ref{fig:randIND10Sum}
1567 and \ref{fig:randIND35Sum}).
1575 \begin{subfigure}[b]{0.55\textwidth}
1578 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1579 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1580 west},scaled x ticks = false,x tick label style={/pgf/number
1581 format/1000 sep = \space}]
1582 %\addplot+[only marks] table {proteinsOrig.txt};
1583 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
1584 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1585 {randGraph/ind/vf2ppInd5_0.05.txt};
1590 \begin{subfigure}[b]{0.55\textwidth}
1593 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1594 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1595 west},scaled x ticks = false,x tick label style={/pgf/number
1596 format/1000 sep = \space}]
1597 %\addplot+[only marks] table {proteinsOrig.txt};
1598 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
1599 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1600 {randGraph/ind/vf2ppInd5_0.1.txt};
1606 \begin{subfigure}[b]{0.55\textwidth}
1609 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1610 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1611 west},scaled x ticks = false,x tick label style={/pgf/number
1612 format/1000 sep = \space}]
1613 %\addplot+[only marks] table {proteinsOrig.txt};
1614 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
1615 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1616 {randGraph/ind/vf2ppInd5_0.3.txt};
1621 \begin{subfigure}[b]{0.55\textwidth}
1624 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1625 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1626 west},scaled x ticks = false,x tick label style={/pgf/number
1627 format/1000 sep = \space}]
1628 %\addplot+[only marks] table {proteinsOrig.txt};
1629 \addplot table {randGraph/ind/vf2pInd5_0.6.txt};
1630 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1631 {randGraph/ind/vf2ppInd5_0.6.txt};
1636 \begin{subfigure}[b]{0.55\textwidth}
1639 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1640 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1641 west},scaled x ticks = false,x tick label style={/pgf/number
1642 format/1000 sep = \space}]
1643 %\addplot+[only marks] table {proteinsOrig.txt};
1644 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
1645 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1646 {randGraph/ind/vf2ppInd5_0.8.txt};
1650 \begin{subfigure}[b]{0.55\textwidth}
1652 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1653 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1654 west},scaled x ticks = false,x tick label style={/pgf/number
1655 format/1000 sep = \thinspace}]
1656 %\addplot+[only marks] table {proteinsOrig.txt};
1657 \addplot table {randGraph/ind/vf2pInd5_0.95.txt};
1658 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1659 {randGraph/ind/vf2ppInd5_0.95.txt};
1664 \caption{IND on graphs having an average degree of
1665 5.}\label{fig:randIND5}
1671 \begin{axis}[title={Rand IND Summary, $\delta = 5$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
1672 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
1673 west},scaled x ticks = false,x tick label style={/pgf/number
1674 format/1000 sep = \thinspace}]
1675 %\addplot+[only marks] table {proteinsOrig.txt};
1676 \addplot[mark=*,mark size=1.5pt,color=blue] table
1677 {randGraph/ind/vf2pInd5_0.3.txt}; \addplot[mark=triangle*,mark
1678 size=1.8pt,color=red] table
1679 {randGraph/ind/vf2ppInd5_0.3.txt}; \addplot[mark=*,mark
1680 size=1.5pt,color=blue] table
1681 {randGraph/ind/vf2pInd5_0.6.txt}; \addplot[mark=triangle*,mark
1682 size=1.8pt,color=red] table
1683 {randGraph/ind/vf2ppInd5_0.6.txt}; \addplot[mark=*,mark
1684 size=1.5pt,color=blue] table
1685 {randGraph/ind/vf2pInd5_0.8.txt}; \addplot[mark=triangle*,mark
1686 size=1.8pt,color=red] table
1687 {randGraph/ind/vf2ppInd5_0.8.txt}; \addplot[mark=*,mark
1688 size=1.5pt,color=blue] table
1689 {randGraph/ind/vf2pInd5_0.95.txt};
1690 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1691 {randGraph/ind/vf2ppInd5_0.95.txt};
1696 \caption{Cummulative chart for $\delta=5$.}\label{fig:randIND5Sum}
1703 \begin{subfigure}[b]{0.55\textwidth}
1706 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1707 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1708 west},scaled x ticks = false,x tick label style={/pgf/number
1709 format/1000 sep = \space}]
1710 %\addplot+[only marks] table {proteinsOrig.txt};
1711 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
1712 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1713 {randGraph/ind/vf2ppInd10_0.05.txt};
1718 \begin{subfigure}[b]{0.55\textwidth}
1721 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1722 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1723 west},scaled x ticks = false,x tick label style={/pgf/number
1724 format/1000 sep = \space}]
1725 %\addplot+[only marks] table {proteinsOrig.txt};
1726 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
1727 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1728 {randGraph/ind/vf2ppInd10_0.1.txt};
1734 \begin{subfigure}[b]{0.55\textwidth}
1737 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1738 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1739 west},scaled x ticks = false,x tick label style={/pgf/number
1740 format/1000 sep = \space}]
1741 %\addplot+[only marks] table {proteinsOrig.txt};
1742 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
1743 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1744 {randGraph/ind/vf2ppInd10_0.3.txt};
1749 \begin{subfigure}[b]{0.55\textwidth}
1752 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1753 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1754 west},scaled x ticks = false,x tick label style={/pgf/number
1755 format/1000 sep = \space}]
1756 %\addplot+[only marks] table {proteinsOrig.txt};
1757 \addplot table {randGraph/ind/vf2pInd10_0.6.txt};
1758 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1759 {randGraph/ind/vf2ppInd10_0.6.txt};
1764 \begin{subfigure}[b]{0.55\textwidth}
1767 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1768 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1769 west},scaled x ticks = false,x tick label style={/pgf/number
1770 format/1000 sep = \space}]
1771 %\addplot+[only marks] table {proteinsOrig.txt};
1772 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
1773 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1774 {randGraph/ind/vf2ppInd10_0.8.txt};
1778 \begin{subfigure}[b]{0.55\textwidth}
1780 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1781 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1782 west},scaled x ticks = false,x tick label style={/pgf/number
1783 format/1000 sep = \thinspace}]
1784 %\addplot+[only marks] table {proteinsOrig.txt};
1785 \addplot table {randGraph/ind/vf2pInd10_0.95.txt};
1786 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1787 {randGraph/ind/vf2ppInd10_0.95.txt};
1792 \caption{IND on graphs having an average degree of
1793 10.}\label{fig:randIND10}
1799 \begin{axis}[title={Rand IND Summary, $\delta = 10$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
1800 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
1801 west},scaled x ticks = false,x tick label style={/pgf/number
1802 format/1000 sep = \thinspace}]
1803 %\addplot+[only marks] table {proteinsOrig.txt};
1804 \addplot[mark=*,mark size=1.5pt,color=blue] table
1805 {randGraph/ind/vf2pInd10_0.3.txt};
1806 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1807 {randGraph/ind/vf2ppInd10_0.3.txt};
1808 \addplot[mark=*,mark size=1.5pt,color=blue] table
1809 {randGraph/ind/vf2pInd10_0.6.txt};
1810 \addplot[mark=triangle*,mark
1811 size=1.8pt,color=red] table
1812 {randGraph/ind/vf2ppInd10_0.6.txt};
1813 \addplot[mark=*,mark
1814 size=1.5pt,color=blue] table
1815 {randGraph/ind/vf2pInd10_0.8.txt};
1816 \addplot[mark=triangle*,mark
1817 size=1.8pt,color=red] table
1818 {randGraph/ind/vf2ppInd10_0.8.txt};
1819 \addplot[mark=*,mark
1820 size=1.5pt,color=blue]
1822 {randGraph/ind/vf2pInd10_0.95.txt};
1823 \addplot[mark=triangle*,mark
1824 size=1.8pt,color=red]
1826 {randGraph/ind/vf2ppInd10_0.95.txt};
1831 \caption{Cummulative chart for $\delta=10$.}\label{fig:randIND10Sum}
1838 \begin{subfigure}[b]{0.55\textwidth}
1841 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1842 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1843 west},scaled x ticks = false,x tick label style={/pgf/number
1844 format/1000 sep = \space}]
1845 %\addplot+[only marks] table {proteinsOrig.txt};
1846 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
1847 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1848 {randGraph/ind/vf2ppInd35_0.05.txt};
1853 \begin{subfigure}[b]{0.55\textwidth}
1856 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1857 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1858 west},scaled x ticks = false,x tick label style={/pgf/number
1859 format/1000 sep = \space}]
1860 %\addplot+[only marks] table {proteinsOrig.txt};
1861 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
1862 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1863 {randGraph/ind/vf2ppInd35_0.1.txt};
1869 \begin{subfigure}[b]{0.55\textwidth}
1872 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1873 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1874 west},scaled x ticks = false,x tick label style={/pgf/number
1875 format/1000 sep = \space}]
1876 %\addplot+[only marks] table {proteinsOrig.txt};
1877 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
1878 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1879 {randGraph/ind/vf2ppInd35_0.3.txt};
1884 \begin{subfigure}[b]{0.55\textwidth}
1887 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1888 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1889 west},scaled x ticks = false,x tick label style={/pgf/number
1890 format/1000 sep = \space}]
1891 %\addplot+[only marks] table {proteinsOrig.txt};
1892 \addplot table {randGraph/ind/vf2pInd35_0.6.txt};
1893 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1894 {randGraph/ind/vf2ppInd35_0.6.txt};
1899 \begin{subfigure}[b]{0.55\textwidth}
1902 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1903 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1904 west},scaled x ticks = false,x tick label style={/pgf/number
1905 format/1000 sep = \space}]
1906 %\addplot+[only marks] table {proteinsOrig.txt};
1907 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
1908 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1909 {randGraph/ind/vf2ppInd35_0.8.txt};
1913 \begin{subfigure}[b]{0.55\textwidth}
1915 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1916 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1917 west},scaled x ticks = false,x tick label style={/pgf/number
1918 format/1000 sep = \thinspace}]
1919 %\addplot+[only marks] table {proteinsOrig.txt};
1920 \addplot table {randGraph/ind/vf2pInd35_0.95.txt};
1921 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1922 {randGraph/ind/vf2ppInd35_0.95.txt};
1927 \caption{IND on graphs having an average degree of
1928 35.}\label{fig:randIND35}
1934 \begin{axis}[title={Rand IND Summary, $\delta = 35$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
1935 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
1936 west},scaled x ticks = false,x tick label style={/pgf/number
1937 format/1000 sep = \thinspace}]
1938 %\addplot+[only marks] table {proteinsOrig.txt};
1939 \addplot[mark=*,mark size=1.5pt,color=blue] table
1940 {randGraph/ind/vf2pInd35_0.3.txt};
1941 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1942 {randGraph/ind/vf2ppInd35_0.3.txt};
1943 \addplot[mark=*,mark size=1.5pt,color=blue] table
1944 {randGraph/ind/vf2pInd35_0.6.txt};
1945 \addplot[mark=triangle*,mark
1946 size=1.8pt,color=red] table
1947 {randGraph/ind/vf2ppInd35_0.6.txt};
1948 \addplot[mark=*,mark
1949 size=1.5pt,color=blue] table
1950 {randGraph/ind/vf2pInd35_0.8.txt};
1951 \addplot[mark=triangle*,mark
1952 size=1.8pt,color=red] table
1953 {randGraph/ind/vf2ppInd35_0.8.txt};
1954 \addplot[mark=*,mark
1955 size=1.5pt,color=blue]
1957 {randGraph/ind/vf2pInd35_0.95.txt};
1958 \addplot[mark=triangle*,mark
1959 size=1.8pt,color=red]
1961 {randGraph/ind/vf2ppInd35_0.95.txt};
1966 \caption{Cummulative chart for $\delta=35$.}\label{fig:randIND35Sum}
1969 Based on these experiments, VF2++ is faster than VF2 Plus and able to
1970 handle really large graphs in milliseconds. Note that when $IND$ was
1971 considered and the small graphs had proportionally few nodes ($\rho =
1972 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
1973 orders (e.g. see the $\delta=10$ case on
1974 Figure~\ref{fig:randIND10}). If these examples had been excluded, the
1975 charts would have seemed to be similar to the other ones.
1976 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
1977 Plus slow slightly down, but remain practically usable even on graphs
1978 having 10 000 nodes.
1984 \section{Conclusion}
1985 In this paper, after providing a short summary of the recent
1986 algorithms, a new graph matching algorithm based on VF2, called VF2++,
1987 has been presented and analyzed from a practical viewpoint.
1989 Recognizing the importance of the node order and determining an
1990 efficient one, VF2++ is able to match graphs of thousands of nodes in
1991 near practically linear time including preprocessing. In addition to
1992 the proper order, VF2++ uses more efficient consistency and cutting
1993 rules which are easy to compute and make the algorithm able to prune
1994 most of the unfruitful branches without going astray.
1996 In order to show the efficiency of the new method, it has been
1997 compared to VF2 Plus, which is the best concurrent algorithm based on
2000 The experiments show that VF2++ consistently outperforms VF2 Plus on
2001 biological graphs. It seems to be asymptotically faster on protein and
2002 on contact map graphs in the case of induced subgraph isomorphism,
2003 while in the case of graph isomorphism, it has definitely better
2004 asymptotic behaviour on protein graphs.
2006 Regarding random sparse graphs, not only has VF2++ proved itself to be
2007 faster than VF2 Plus, but it has a practically linear behaviour both
2008 in the case of induced subgraph- and graph isomorphism, as well.
2012 %% The Appendices part is started with the command \appendix;
2013 %% appendix sections are then done as normal sections
2019 %% If you have bibdatabase file and want bibtex to generate the
2020 %% bibitems, please use
2022 \bibliographystyle{elsarticle-num} \bibliography{bibliography}
2024 %% else use the following coding to input the bibitems directly in the
2027 %% \begin{thebibliography}{00}
2029 %% %% \bibitem{label}
2030 %% %% Text of bibliographic item
2034 %% \end{thebibliography}
2039 %% End of file `elsarticle-template-num.tex'.