2 %% Copyright 2007, 2008, 2009 Elsevier Ltd
4 %% This file is part of the 'Elsarticle Bundle'.
5 %% ---------------------------------------------
7 %% It may be distributed under the conditions of the LaTeX Project Public
8 %% License, either version 1.2 of this license or (at your option) any
9 %% later version. The latest version of this license is in
10 %% http://www.latex-project.org/lppl.txt
11 %% and version 1.2 or later is part of all distributions of LaTeX
12 %% version 1999/12/01 or later.
14 %% The list of all files belonging to the 'Elsarticle Bundle' is
15 %% given in the file `manifest.txt'.
18 %% Template article for Elsevier's document class `elsarticle'
19 %% with numbered style bibliographic references
22 \documentclass[preprint,12pt]{elsarticle}
24 %% Use the option review to obtain double line spacing
25 %% \documentclass[authoryear,preprint,review,12pt]{elsarticle}
27 %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
28 %% for a journal layout:
29 %% \documentclass[final,1p,times]{elsarticle}
30 %% \documentclass[final,1p,times,twocolumn]{elsarticle}
31 %% \documentclass[final,3p,times]{elsarticle}
32 %% \documentclass[final,3p,times,twocolumn]{elsarticle}
33 %% \documentclass[final,5p,times]{elsarticle}
34 %% \documentclass[final,5p,times,twocolumn]{elsarticle}
36 %% For including figures, graphicx.sty has been loaded in
37 %% elsarticle.cls. If you prefer to use the old commands
38 %% please give \usepackage{epsfig}
40 %% The amssymb package provides various useful mathematical symbols
42 %% The amsthm package provides extended theorem environments
43 %% \usepackage{amsthm}
45 %% The lineno packages adds line numbers. Start line numbering with
46 %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
47 %% for the whole article with \linenumbers.
48 %% \usepackage{lineno}
51 %% \usepackage[pdftex]{graphicx}
54 \pgfplotsset{width=9cm}
55 \pgfplotsset{compat=1.8}
58 \usepackage{subcaption}
60 \usepackage{algorithm}
61 \usepackage{algpseudocode}
64 \usepackage{amsthm,amssymb}
65 \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}
67 \newtheorem{theorem}{Theorem}[subsection]
68 \newtheorem{corollary}{Corollary}[theorem]
69 \newtheorem{claim}[theorem]{Claim}
71 \newtheorem{definition}{Definition}[subsection]
72 \newtheorem{notation}{Notation}[subsection]
73 \newtheorem{example}{Example}[subsection]
74 \usetikzlibrary{decorations.markings}
75 \let\oldproofname=\proofname
76 %% \renewcommand{\proofname}{\rm\bf{Proof:}}
78 \captionsetup{font=normalsize}
80 \journal{Discrete Applied Mathematics}
86 %% Title, authors and addresses
88 %% use the tnoteref command within \title for footnotes;
89 %% use the tnotetext command for theassociated footnote;
90 %% use the fnref command within \author or \address for footnotes;
91 %% use the fntext command for theassociated footnote;
92 %% use the corref command within \author for corresponding author footnotes;
93 %% use the cortext command for theassociated footnote;
94 %% use the ead command for the email address,
95 %% and the form \ead[url] for the home page:
96 %% \title{Title\tnoteref{label1}}
97 %% \tnotetext[label1]{}
98 %% \author{Name\corref{cor1}\fnref{label2}}
99 %% \ead{email address}
100 %% \ead[url]{home page}
103 %% \address{Address\fnref{label3}}
106 \title{Improved Algorithms for Matching Biological Graphs}
108 %% use optional labels to link authors explicitly to addresses:
109 %% \author[label1,label2]{}
110 %% \address[label1]{}
111 %% \address[label2]{}
113 \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi}
115 \address{Dept of Operations Research, ELTE}
118 Subgraph isomorphism is a well-known NP-Complete problem, while its
119 special case, the graph isomorphism problem is one of the few problems
120 in NP neither known to be in P nor NP-Complete. Their appearance in
121 many fields of application such as pattern analysis, computer vision
122 questions and the analysis of chemical and biological systems has
123 fostered the design of various algorithms for handling special graph
126 The idea of using state space representation and checking some
127 conditions in each state to prune the search tree has made the VF2
128 algorithm one of the state of the art graph matching algorithms for
129 more than a decade. Recently, biological questions of ever increasing
130 importance have required more efficient, specialized algorithms.
132 This paper presents VF2++, a new algorithm based on the original VF2,
133 which runs significantly faster on most test cases and performs
134 especially well on special graph classes stemming from biological
135 questions. VF2++ handles graphs of thousands of nodes in practically
136 near linear time including preprocessing. Not only is it an improved
137 version of VF2, but in fact, it is by far the fastest existing
138 algorithm regarding biological graphs.
140 The reason for VF2++' superiority over VF2 is twofold. Firstly, taking
141 into account the structure and the node labeling of the graph, VF2++
142 determines a state order in which most of the unfruitful branches of
143 the search space can be pruned immediately. Secondly, introducing more
144 efficient - nevertheless still easier to compute - cutting rules
145 reduces the chance of going astray even further.
147 In addition to the usual subgraph isomorphism, specialized versions
148 for induced subgraph isomorphism and for graph isomorphism are
149 presented. VF2++ has gained a runtime improvement of one order of
150 magnitude respecting induced subgraph isomorphism and a better
151 asymptotical behaviour in the case of graph isomorphism problem.
153 After having provided the description of VF2++, in order to evaluate
154 its effectiveness, an extensive comparison to the contemporary other
155 algorithms is shown, using a wide range of inputs, including both real
156 life biological and chemical datasets and standard randomly generated
159 The work was motivated and sponsored by QuantumBio Inc., and all the
160 developed algorithms are available as the part of the open source
161 LEMON graph and network optimization library
162 (http://lemon.cs.elte.hu).
166 %% keywords here, in the form: keyword \sep keyword
168 %% PACS codes here, in the form: \PACS code \sep code
170 %% MSC codes here, in the form: \MSC code \sep code
171 %% or \MSC[2008] code \sep code (2000 is the default)
180 \section{Introduction}
183 In the last decades, combinatorial structures, and especially graphs
184 have been considered with ever increasing interest, and applied to the
185 solution of several new and revised questions. The expressiveness,
186 the simplicity and the studiedness of graphs make them practical for
187 modelling and appear constantly in several seemingly independent
188 fields. Bioinformatics and chemistry are amongst the most relevant
189 and most important fields.
191 Complex biological systems arise from the interaction and cooperation
192 of plenty of molecular components. Getting acquainted with such
193 systems at the molecular level has primary importance, since
194 protein-protein interaction, DNA-protein interaction, metabolic
195 interaction, transcription factor binding, neuronal networks, and
196 hormone signaling networks can be understood only this way.
198 For instance, a molecular structure can be considered as a graph,
199 whose nodes correspond to atoms and whose edges to chemical bonds. The
200 secondary structure of a protein can also be represented as a graph,
201 where nodes are associated with aminoacids and the edges with hydrogen
202 bonds. The nodes are often whole molecular components and the edges
203 represent some relationships among them. The similarity and
204 dissimilarity of objects corresponding to nodes are incorporated to
205 the model by \emph{node labels}. Many other chemical and biological
206 structures can easily be modeled in a similar way. Understanding such
207 networks basically requires finding specific subgraphs, which can not
208 avoid the application of graph matching algorithms.
210 Finally, let some of the other real-world fields related to some
211 variants of graph matching be briefly mentioned: pattern recognition
212 and machine vision \cite{HorstBunkeApplications}, symbol recognition
213 \cite{CordellaVentoSymbolRecognition}, face identification
214 \cite{JianzhuangYongFaceIdentification}. \\
216 Subgraph and induced subgraph matching problems are known to be
217 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is
218 one of the few problems in NP neither known to be in P nor
219 NP-Complete. Although polynomial time isomorphism algorithms are known
220 for various graph classes, like trees and planar
221 graphs\cite{PlanarGraphIso}, bounded valence
222 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
223 or permutation graphs\cite{PermGraphIso}.
225 In the following, some algorithms based on other approaches are
226 summarized, which do not need any restrictions on the graphs. However,
227 an overall polynomial behaviour is not expectable from such an
228 alternative, it may often have good performance, even on a graph class
229 for which polynomial algorithm is known. Note that this summary
230 containing only exact matching algorithms is far not complete, neither
231 does it cover all the recent algorithms.
233 The first practically usable approach was due to
234 Ullmann\cite{Ullmann} which is a commonly used depth-first
235 search based algorithm with a complex heuristic for reducing the
236 number of visited states. A major problem is its $\Theta(n^3)$ space
237 complexity, which makes it impractical in the case of big sparse
240 In a recent paper, Ullmann\cite{UllmannBit} presents an
241 improved version of this algorithm based on a bit-vector solution for
242 the binary Constraint Satisfaction Problem.
244 The Nauty algorithm\cite{Nauty} transforms the two graphs to
245 a canonical form before starting to check for the isomorphism. It has
246 been considered as one of the fastest graph isomorphism algorithms,
247 although graph categories were shown in which it takes exponentially
248 many steps. This algorithm handles only the graph isomorphism problem.
250 The \emph{LAD} algorithm\cite{Lad} uses a depth-first search
251 strategy and formulates the matching as a Constraint Satisfaction
252 Problem to prune the search tree. The constraints are that the mapping
253 has to be injective and edge-preserving, hence it is possible to
254 handle new matching types as well.
256 The \textbf{RI} algorithm\cite{RI} and its variations are based on a
257 state space representation. After reordering the nodes of the graphs,
258 it uses some fast executable heuristic checks without using any
259 complex pruning rules. It seems to run really efficiently on graphs
260 coming from biology, and won the International Contest on Pattern
261 Search in Biological Databases\cite{Content}.
263 The currently most commonly used algorithm is the
264 \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was
265 designed for solving pattern matching and computer vision problems,
266 and has been one of the best overall algorithms for more than a
267 decade. Although, it can't be up to new specialized algorithms, it is
268 still widely used due to its simplicity and space efficiency. VF2 uses
269 a state space representation and checks some conditions in each state
270 to prune the search tree.
272 Our first graph matching algorithm was the first version of VF2 which
273 recognizes the significance of the node ordering, more opportunities
274 to increase the cutting efficiency and reduce its computational
275 complexity. This project was initiated and sponsored by QuantumBio
276 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
277 code --- has been published as a part of LEMON\cite{LEMON} open source
280 This paper introduces \textbf{VF2++}, a new further improved algorithm
281 for the graph and (induced)subgraph isomorphism problem, which uses
282 efficient cutting rules and determines a node order in which VF2 runs
283 significantly faster on practical inputs.
285 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has
286 been published. It is considered to be as efficient as the RI
287 algorithm and has a strictly better behavior on large graphs. The
288 main idea of VF2 Plus is to precompute a heuristic node order of the
289 small graph, in which the VF2 works more efficiently.
291 \section{Problem Statement}
292 This section provides a detailed description of the problems to be
294 \subsection{Definitions}
296 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and
297 $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.
298 \begin{definition}\label{sec:ismorphic}
299 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M:
300 V_{small} \longrightarrow V_{large}$ bijection, for which the
303 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
304 (M(u),M(v))\in{E_{large}}$
307 For the sake of simplicity in this paper subgraphs and induced
308 subgraphs are defined in a more general way than usual:
310 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I:
311 V_{small}\longrightarrow V_{large}$ injection, for which the
314 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$
319 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists
320 I: V_{small}\longrightarrow V_{large}$ injection, for which the
323 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
324 (I(u),I(v))\in E_{large}$
329 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node
330 label function}, where K is an arbitrary set. The elements in K
331 are the \textbf{node labels}. Two nodes, u and v are said to be
332 \textbf{equivalent}, if $lab(u)=lab(v)$.
335 When node labels are also given, the matched nodes must have the same
336 labels. For example, the node labeled isomorphism is phrased by
338 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label
339 function lab} if $\exists M: V_{small} \longrightarrow V_{large}$
340 bijection, for which the following is true:
342 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
343 (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} :
348 The other two definitions can be extended in the same way.
350 Note that edge label function can be defined similarly to node label
351 function, and all the definitions can be extended with additional
352 conditions, but it is out of the scope of this work.
354 The equivalence of two nodes is usually defined by another relation,
355 $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the
356 definition given above if R is an equivalence relation, which does not
357 mean restriction in biological and chemical applications.
359 \subsection{Common problems}\label{sec:CommProb}
361 The focus of this paper is on two extensively studied topics, the
362 subgraph isomorphism and its variations. However, the following
363 problems also appear in many applications.
365 The \textbf{subgraph matching problem} is the following: is
366 $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node
369 The \textbf{induced subgraph matching problem} asks the same about the
370 existence of an induced subgraph.
372 The \textbf{graph isomorphism problem} can be defined as induced
373 subgraph matching problem where the sizes of the two graphs are equal.
375 In addition to existence, it may be needed to show such a subgraph, or
376 it may be necessary to list all of them.
378 It should be noted that some authors misleadingly refer to the term
379 \emph{subgraph isomorphism problem} as an \emph{induced subgraph
380 isomorphism problem}.
382 The following sections give the descriptions of VF2, VF2++, VF2 Plus
383 and a particular comparison.
385 \section{The VF2 Algorithm}
386 This algorithm is the basis of both the VF2++ and the VF2 Plus. VF2
387 is able to handle all the variations mentioned in Section
388 \ref{sec:CommProb}. Although it can also handle directed graphs,
389 for the sake of simplicity, only the undirected case will be
393 \subsection{Common notations}
394 \indent Assume $G_{small}$ is searched in $G_{large}$. The following
395 definitions and notations will be used throughout the whole paper.
397 A set $M\subseteq V_{small}\times V_{large}$ is called
398 \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears
399 in more than one pair in M. That is, M uniquely associates some of
400 the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice
405 Mapping M \textbf{covers} a node v, if there exists a pair in M, which
410 A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the
411 nodes in $V_{small}$.
415 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in
416 V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in
417 V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.
421 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node
422 exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$
423 and $v\in V_{small}\cup V_{large}$.
426 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique
428 The definitions of the isomorphism types can be rephrased on the
429 existence of a special whole mapping $M$, since it represents a
430 bijection. For example
432 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph
433 isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v
434 \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
435 (Pair(M,u),Pair(M,v))\in E_{large}$.
439 A set of whole mappings is called \textbf{problem type}.
441 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
442 which can be substituted by any problem type.
444 A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using
445 this notations, VF2 searches a whole mapping $W$ of type $PT$.
447 For example the problem type of graph isomorphism problem is the
448 following. A whole mapping $W$ is in $\mathbf{ISO}$, iff the
449 bijection represented by $W$ satisfies Definition~\ref{sec:ismorphic}.
450 The subgraph- and induced subgraph matching problems can be formalized
451 in a similar way. Let their problem types be denoted as $\mathbf{SUB}$
456 $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in
457 PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow
458 (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of
459 $G_{small}$ has to be mapped to an edge of $G_{large}$ for each
463 Note that $ISO$, $SUB$ and $IND$ are expanding problem types.
465 This paper deals with the three problem types mentioned above only,
466 but the following generic definitions make it possible to handle other
467 types as well. Although it may be challenging to find a proper
468 consistency function and an efficient cutting function.
471 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
472 \textbf{consistency function by } $\mathbf{PT}$, if the following
473 holds. If there exists whole mapping $W$ of type PT for which
474 $M\subseteq W$, then $Cons_{PT}(M)$ is true.
478 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
479 \textbf{cutting function by } $\mathbf{PT}$, if the following
480 holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a
481 whole mapping W of type PT.
485 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if
486 $Cons_{PT}(M)$ is true.
489 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
491 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and
492 $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where
493 $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping.
496 $Cons_{PT}$ will be used to check the consistency of the already
497 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
498 no whole consistent mapping can contain the current mapping.
500 \subsection{Overview of the algorithm}
501 VF2 uses a state space representation of mappings, $Cons_{PT}$ for
502 excluding inconsistency with the problem type and $Cut_{PT}$ for
503 pruning the search tree. Each state $s$ of the matching process can
504 be associated with a mapping $M(s)$.
506 Algorithm~\ref{alg:VF2Pseu} is a high level description of
507 the VF2 matching algorithm.
511 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
512 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson ..
513 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
514 \begin{algorithmic}[1]
516 \Procedure{VF2}{State $s$, ProblemType $PT$} \If{$M(s$) covers
517 $V_{small}$} \State Output($M(s)$) \Else
519 \State Compute the set $P(s)$ of the pairs candidate for inclusion
520 in $M(s)$ \ForAll{$p\in{P(s)}$} \If{Cons$_{PT}$($p, M(s)$) $\wedge$
521 $\neg$Cut$_{PT}$($p, M(s)$)} \State Compute the nascent state
522 $\tilde{s}$ by adding $p$ to $M(s)$ \State \textbf{call}
523 VF2($\tilde{s}$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
528 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it
529 starts with an empty mapping.
531 For each state $s$, the algorithm computes $P(s)$, the set of
532 candidate node pairs for adding to the current state $s$.
534 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and
535 $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and
536 $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup
537 \{p\}$ is computed, and the whole process is recursively applied to
538 $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it
539 can be proved that $s$ can not be extended to a whole mapping.
541 In order to make sure of the correctness see
543 Through consistent mappings, only consistent whole mappings can be
544 reached, and all of the whole mappings are reachable through
548 Note that a state may be reached in many different ways, since the
549 order of insertions into M does not influence the nascent mapping. In
550 fact, the number of different ways which lead to the same state can be
551 exponentially large. If $G_{small}$ and $G_{large}$ are circles with n
552 nodes and n different node labels, there exists exactly one graph
553 isomorphism between them, but it will be reached in $n!$ different
556 However, one may observe
559 \label{claim:claimTotOrd}
560 Let $\prec$ an arbitrary total ordering relation on $V_{small}$. If
561 the algorithm ignores each $p=(u,v) \in P(s)$, for which
563 $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$,
565 then no state can be reached more than ones and each state associated
566 with a whole mapping remains reachable.
569 Note that the cornerstone of the improvements to VF2 is a proper
570 choice of a total ordering.
572 \subsection{The candidate set P(s)}
573 \label{candidateComputingVF2}
574 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.
575 Suppose that $PT$ is an expanding problem type, see
576 Definition~\ref{expPT}.
579 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by
580 $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}}
581 \wedge \tilde{u}$ is covered by $M(s)\}$, and
582 \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not
584 $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}}
585 \wedge \tilde{v}$ is covered by $M(s)\}$
588 The set $P(s)$ includes the pairs of uncovered neighbours of covered
589 nodes and if there is not such a node pair, all the pairs containing
590 two uncovered nodes are added. Formally, let
594 T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if }
595 T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq
596 \emptyset,\\ (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s))
597 &\hspace{-0.15cm}otherwise.
601 \subsection{Consistency}
602 This section defines the consistency functions for the different
603 problem types mentioned in Section~\ref{sec:CommProb}.
605 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} :
606 (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large}
607 (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$
609 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is
610 a state of the matching procedure, $M(s)$ is consistent mapping by
611 $PT$ and $lab(u)=lab(v)$. $Cons_{PT}(p,M(s))$ checks whether
612 including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.
614 \subsubsection{Induced subgraph isomorphism}
615 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow
616 (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small}
617 \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline The
618 following formulation gives an efficient way of calculating
621 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v)
622 \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge
623 (\forall \tilde{u}\in \Gamma_{small}(u)
624 \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
625 consistency function in the case of $IND$.
628 \subsubsection{Graph isomorphism}
629 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$
630 $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by
633 $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and
634 only if it is a consistency function by $IND$.
636 \subsubsection{Subgraph isomorphism}
637 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow
638 (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small}
639 \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
641 The following formulation gives an efficient way of calculating
644 $Cons_{SUB}((u,v),M(s)):= (\forall \tilde{u}\in \Gamma_{small}(u)
645 \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
646 consistency function by $SUB$.
649 \subsection{Cutting rules}
650 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently
651 verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can
652 be true only if it is impossible to extended $M(s)\cup \{p\}$ to a
656 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash
657 M_{small}(s))\backslash T_{small}(s)$, and
658 \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash
659 M_{large}(s))\backslash T_{large}(s)$.
661 \subsubsection{Induced subgraph isomorphism}
663 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
664 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
665 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap
666 \tilde{T}_{small}(s)|$ is a cutting function by $IND$.
668 \subsubsection{Graph isomorphism}
669 Note that the cutting function of induced subgraph isomorphism defined
670 above is a cutting function by $ISO$, too, however it is less
671 efficient than the following while their computational complexity is
674 $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq
675 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
676 \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap
677 \tilde{T}_{small}(s)|$ is a cutting function by $ISO$.
680 \subsubsection{Subgraph isomorphism}
682 $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
683 |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by
686 Note that there is a significant difference between induced and
687 non-induced subgraph isomorphism:
691 $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
692 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
693 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$
694 is \textbf{not} a cutting function by $SUB$.
702 [scale=.8,auto=left,every node/.style={circle,fill=black!15}]
703 \node[rectangle,fill=black!15] at (4,6) {$G_{small}$}; \node (u4) at
704 (2.5,10) {$u_4$}; \node (u3) at (5.5,10) {$u_3$}; \node (u1) at
705 (2.5,7) {$u_1$}; \node (u2) at (5.5,7) {$u_2$};
707 \node[rectangle,fill=black!30] at (13.5,6) {$G_{large}$};
708 \node[fill=black!30] (v4) at (12,10) {$v_4$}; \node[fill=black!30]
709 (v3) at (15,10) {$v_3$}; \node[fill=black!30] (v1) at (12,7)
710 {$v_1$}; \node[fill=black!30] (v2) at (15,7) {$v_2$};
713 \foreach \from/\to in {u1/u2,u2/u3,u3/u4,u4/u1} \draw (\from) --
714 (\to); \foreach \from/\to in {v1/v2,v2/v3,v3/v4,v4/v1,v1/v3} \draw
716 % \draw[dashed] (\from) -- (\to);
718 \caption{Graphs for the proof of Claim~\ref{claimSUB}}
722 Let the two graphs of Figure~\ref{fig:proofSUB} be the input
723 graphs. Suppose the total ordering relation is $u_1 \prec u_2 \prec
724 u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add
725 $(u_2,v_2)\in P(s)$.\newline $Cons_{SUB}((u_2,v_2),M(s))=true$, so
726 $M\cup \{(u_2,v_2)\}$ is consistent by $SUB$. The cutting function
727 $Cut_{SUB}((u_2,v_2),M(s))$ is false, so it does not let cut the
728 tree.\newline On the other hand $Cut_{SUB}'((u_2,v_2),M(s))$ is true,
729 since\\$0=|\Gamma_{large}(v_2)\cap
730 \tilde{T}_{large}(s)|<|\Gamma_{small}(u_2)\cap
731 \tilde{T}_{small}(s)|=1$ is true, but still the tree can not be
732 pruned, because otherwise the
733 $\{(u_1,v_1)(u_2,v_2)(u_3,v_3)(u_4,v_4)\}$ mapping can not be found.
736 \section{The VF2++ Algorithm}
737 Although any total ordering relation makes the search space of VF2 a
738 tree, its choice turns out to dramatically influence the number of
739 visited states. The goal is to determine an efficient one as quickly
742 The main reason for VF2++' superiority over VF2 is twofold. Firstly,
743 taking into account the structure and the node labeling of the graph,
744 VF2++ determines a state order in which most of the unfruitful
745 branches of the search space can be pruned immediately. Secondly,
746 introducing more efficient --- nevertheless still easier to compute
747 --- cutting rules reduces the chance of going astray even further.
749 In addition to the usual subgraph isomorphism, specialized versions
750 for induced subgraph isomorphism and for graph isomorphism have been
751 designed. VF2++ has gained a runtime improvement of one order of
752 magnitude respecting induced subgraph isomorphism and a better
753 asymptotical behaviour in the case of graph isomorphism problem.
755 Note that a weaker version of the cutting rules and the more efficient
756 candidate set calculating were described in \cite{VF2Plus}, too.
758 It should be noted that all the methods described in this section are
759 extendable to handle directed graphs and edge labels as well.
761 The basic ideas and the detailed description of VF2++ are provided in
764 \subsection{Preparations}
766 \label{claim:claimCoverFromLeft}
767 The total ordering relation uniquely determines a node order, in which
768 the nodes of $V_{small}$ will be covered by VF2. From the point of
769 view of the matching procedure, this means, that always the same node
770 of $G_{small}$ will be covered on the d-th level.
773 In order to make the search space a tree, the pairs in $\{(u,v)\in
774 P(s) : \exists \hat{u} : \hat{u}\prec u\}$ are excluded from $P(s)$.
776 Let $\tilde{P}(s):=P(s)\backslash \{(u,v)\in P(s) : \exists \hat{u} :
779 The relation $\prec$ is a total ordering, so $\exists!\ \tilde{u} :
780 \forall\ (u,v)\in \tilde{P}(s): u=\tilde u$. Since a pair form
781 $\tilde{P}(s)$ is chosen for including into $M(s)$, it is obvious,
782 that only $\tilde{u}$ can be covered in $V_{small}$. Actually,
783 $\tilde{u}$ is the smallest element in $T_{small}(s)$ (or in
784 $V_{small}\backslash M_{small}(s)$, if $T_{small}(s)$ were empty), and
785 $T_{small}(s)$ depends only on the covered nodes of $G_{small}$.
787 Simple induction on $d$ shows that the set of covered nodes of
788 $G_{small}$ is unique, if $d$ is given, so $\tilde{u}$ is unique if
793 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
794 $V_{small}$ is \textbf{matching order}, if exists $\prec$ total
795 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
796 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$.
799 \begin{claim}\label{claim:MOclaim}
800 A total ordering is matching order, iff the nodes of every component
801 form an interval in the node sequence, and every node connects to a
802 previous node in its component except the first node of the
803 component. The order of the components is arbitrary. \\Formally
805 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
806 $V_{small}$ is matching order $\Leftrightarrow$ $\forall
807 G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall
808 i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
809 V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq
810 l\leq i \Rightarrow u_{l}\in V'_{small}) \wedge
811 (u_{\sigma{(k)}},u_{\sigma{(i)}})\in E'_{small}$, where $i,j,k,l\in
812 \{1,..,|V_{small}|\}$\newline
815 Suppose a matching order is given. It has to be shown that the node
816 sequence has a structure described above.\\ Let
817 $G'_{small}=(V'_{small},E'_{small})$ be an arbitrary component and $i$
820 $(\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
821 V'_{small})\Rightarrow u_{\sigma(i)}$ is not the first covered node in
822 $G_{small}\ \Rightarrow u_{\sigma(i)}$ is connected to a covered node
823 $u_{\sigma(k)}$ where $k<i$, since $u_{\sigma(i)}\in T_{small}(s)$ for
824 some $s$ and $T_{small}(s)\subseteq{V'_{small}}$ contains only nodes
825 connected to at least one covered node. It's easy to see, that
826 $\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}$, since
827 $T_{small}(s)$ contains only nodes connected to some covered ones,
828 while it is not empty, but if it were empty, then $u_{\sigma(k)}$ and
829 $u_{\sigma(i)}$ were not in the same component.
831 Now, let us show that if a node sequence has the special structure
832 described above, it has to be matching
833 order.\\ $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ is
834 a total ordering, which determines the matching order
835 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$.
838 To summing up, a total ordering always uniquely determines a matching
839 order, and every matching order can be determined by a total ordering,
840 however, more than one different total orderings may determine the
842 \subsection{Idea behind the algorithm}
843 The goal is to find a matching order in which the algorithm is able to
844 recognize inconsistency or prune the infeasible branches on the
845 highest levels and goes deep only if it is needed.
848 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the
849 number of neighbours of u which are in H, where $u\in V_{small} $ and
850 $H\subseteq V_{small}$.
853 The principal question is the following. Suppose a state $s$ is
854 given. For which node of $T_{small}(s)$ is the hardest to find a
855 consistent pair in $G_{large}$? The more covered neighbours a node in
856 $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has
857 ---, the more rarely satisfiable consistency constraints for its pair
860 In biology, most of the graphs are sparse, thus several nodes in
861 $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes
862 reasonable to define a secondary and a tertiary order between them.
863 The observation above proves itself to be as determining, that the
864 secondary ordering prefers nodes with the most uncovered neighbours
865 among which have the same $Conn_{M_{small}(s)}$ to increase
866 $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible. The
867 tertiary ordering prefers nodes having the rarest uncovered labels.
869 Note that the secondary ordering is the same as the ordering by $deg$,
870 which is a static data in front of the above used.
872 These rules can easily result in a matching order which contains the
873 nodes of a long path successively, whose nodes may have low $Conn$ and
874 is easily matchable into $G_{large}$. To avoid that, a BFS order is
875 used, which provides the shortest possible paths.
878 In the following, some examples on which the VF2 may be slow are
879 described, although they are easily solvable by using a proper
883 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways
884 without node labels. Let $u\in V_{small}$ and $v\in V_{large}$.
890 $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash
893 $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash
897 Now, any mapping by the node label $lab$ must contain $(u,v)$, since
898 $u$ is black and no node in $V_{large}$ has a black label except
899 $v$. If unfortunately $u$ were the last node which will get covered,
900 VF2 would check only in the last steps, whether $u$ can be matched to
903 However, had $u$ been the first matched node, u would have been
904 matched immediately to v, so all the mappings would have been
905 precluded in which node labels can not correspond.
909 Suppose there is no node label given, $G_{small}$ is a small graph and
910 can not be mapped into $G_{large}$ and $u\in V_{small}$.
912 Let $G'_{small}:=(V_{small}\cup
913 \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup
914 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
915 $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint
916 from $G_{small}$ and one of its starting points is connected to $u\in
919 Is there a subgraph of $G_{large}$, which is isomorph with
922 If unfortunately the nodes of the path were the first $k$ nodes in the
923 matching order, the algorithm would iterate through all the possible k
924 long paths in $G_{large}$, and it would recognize that no path can be
925 extended to $G'_{small}$.
927 However, had it started by the matching of $G_{small}$, it would not
928 have matched any nodes of the path.
931 These examples may look artificial, but the same problems also appear
932 in real-world instances, even though in a less obvious way.
934 \subsection{Total ordering}
935 Instead of the total ordering relation, the matching order will be
938 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} :
939 l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ ,
940 where $l$ is a label and $\mathcal{M}\subseteq V_{small}$.
943 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
948 \algtext*{EndProcedure}
950 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
951 \begin{algorithmic}[1]
952 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$
953 \Comment{matching order} \While{$V_{small}\backslash \mathcal{M}
954 \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
955 min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash
956 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
957 root node $r$. \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
958 $d$-th level \State Process $V_d$ \Comment{See Algorithm
959 \ref{alg:VF2PPProcess1}} \EndFor
960 \EndWhile \EndProcedure
966 \algtext*{EndProcedure}%ne nyomtasson ..
968 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
969 \begin{algorithmic}[1]
970 \Procedure{VF2++ProcessLevel1}{$V_{d}$} \While{$V_d\neq\emptyset$}
971 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg
972 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
973 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
974 $F_\mathcal{M}$ \EndWhile \EndProcedure
978 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
979 matching order procedure of VF2++. It computes a BFS tree for each
980 component in ascending order of their rarest $lab$ and largest $deg$,
981 whose root vertex is the component's minimal
982 node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending
983 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately
984 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately.
986 Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}
987 provides a matching order.
990 \subsection{Cutting rules}
991 \label{VF2PPCuttingRules}
992 This section presents the cutting rules of VF2++, which are improved
993 by using extra information coming from the node labels.
995 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l
996 \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and
997 $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge
998 \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in
999 V_{large}$ and $l$ is a label.
1002 \subsubsection{Induced subgraph isomorphism}
1004 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND.
1007 It has to be shown, that $LabCut_{IND}((u,v),M(s))=true\Rightarrow$
1008 the mapping can not be extended to a whole
1009 mapping.\\ $LabCut_{IND}((u,v),M(s))=true,$ iff the following
1010 holds. $\\\exists l: |\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| <
1011 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)| \vee
1012 |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| <
1013 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$.
1015 Suppose that $|\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| <
1016 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)|$. Each node of
1017 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$ has to be matched to a
1018 node in $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$, so
1019 $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$ can not be smaller than
1020 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$. That is why $M(s)$ can
1021 not be extended to a whole mapping.
1023 Otherwise $|\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| <
1024 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$ has to be
1025 true. Similarly, each node of $\Gamma_{large}^{l}(v)\cap
1026 \tilde{T}_{large}(s)$ has to be matched to a node in
1027 $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$,
1028 i.e. $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ can not be
1029 smaller than $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, if
1030 $M(s)$ is extendible.
1032 The following claims can be proven similarly.
1033 \subsubsection{Graph isomorphism}
1035 \[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by ISO.
1038 \subsubsection{Subgraph isomorphism}
1040 \[LabCut_{SUB}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\] is a cutting function by SUB.
1045 \subsection{Implementation details}
1046 This section provides a detailed summary of an efficient
1047 implementation of VF2++.
1048 \subsubsection{Storing a mapping}
1049 After fixing an arbitrary node order ($u_0, u_1, ..,
1050 u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store
1051 the current mapping in the following way.
1055 v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INVALID &
1056 if\ no\ node\ has\ been\ mapped\ to\ u_i.
1059 Where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$
1061 \subsubsection{Avoiding the recurrence}
1062 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
1063 as a \textit{while loop}, which has a loop counter $depth$ denoting the
1064 all-time depth of the recursion. Fixing a matching order, let $M$
1065 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft},
1066 $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before
1067 $depth$. $M[depth]$ changes
1068 while the state is being processed, but the property is held before
1069 both stepping back to a predecessor state and exploring a successor
1072 The necessary part of the candidate set is easily maintainable or
1073 computable by following
1074 Section~\ref{candidateComputingVF2}. A much faster method
1075 has been designed for biological- and sparse graphs, see the next
1076 section for details.
1078 \subsubsection{Calculating the candidates for a node}
1079 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
1080 task is not to maintain the candidate set, but to generate the
1081 candidate nodes in $G_{large}$ for a given node $u\in V_{small}$. In
1082 case of an expanding problem type and $M$ mapping, if a node $v\in
1083 V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall
1084 u'\in V_{small} : (u,u')\in
1085 E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in
1086 E_{large}$. That is, each covered neighbour of $u$ has to be mapped to
1087 a covered neighbour of $v$.
1089 Having said that, an algorithm running in $\Theta(deg)$ time is
1090 describable if there exists a covered node in the component containing
1091 $u$, and a linear one other wise.
1094 \subsubsection{Determining the node order}
1095 This section describes how the node order preprocessing method of
1096 VF2++ can efficiently be implemented.
1098 For using lookup tables, the node labels are associated with the
1099 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
1100 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
1101 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
1102 in $V_{small}$ having label i, which is easy to compute in
1103 $\Theta(|V_{small}|)$ steps.
1105 Representing $\mathcal{M}\subseteq V_{small}$ as an array of
1106 size $|V_{small}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes.
1108 After a node $u$ gets to the next place of the node order,
1109 $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is
1110 one less covered node in $V_{large}$ with label $lab(u)$, that is why
1111 min selection sort is preferred which gives the elements from left to
1112 right in descending order, see Algorithm~\ref{alg:VF2PPProcess1}.
1114 \subsubsection{Cutting rules}
1115 In Section~\ref{VF2PPCuttingRules}, the cutting rules were
1116 described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$
1117 and $\tilde T_{large}$, which are dependent on the all-time mapping
1118 (i.e. on the all-time state). The aim is to check the labeled cutting
1119 rules of VF2++ in $\Theta(deg)$ time.
1121 Firstly, suppose that these four sets are given in such a way, that
1122 checking whether a node is in a certain set takes constant time,
1123 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
1124 initially zero integer lookup table of size $|K|$. After incrementing
1125 $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and
1126 decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap
1127 T_{large}(s)$, the first part of the cutting rules is checkable in
1128 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
1129 to zero takes $\Theta(deg)$ time again, which makes it possible to use
1130 the same table through the whole algorithm. The second part of the
1131 cutting rules can be verified using the same method with $\tilde
1132 T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and
1133 $T_{large}$. Thus, the overall complexity is $\Theta(deg)$.
1135 An other integer lookup table storing the number of covered neighbours
1136 of each node in $G_{large}$ gives all the information about the sets
1137 $T_{large}$ and $\tilde T_{large}$, which is maintainable in
1138 $\Theta(deg)$ time when a pair is added or substracted by incrementing
1139 or decrementing the proper indices. A further improvement is that the
1140 values of $L[lab(u')]$ in case of checking $u$ is dependent only on
1141 $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an
1142 array of pairs (label, number of such labels) can be stored to skip
1143 the maintaining operations. Note that these arrays are at most of size
1144 $deg$. Skipping this trick, the number of covered neighbours has to be
1145 stored for each node of $G_{small}$ as well to get the sets
1146 $T_{small}$ and $\tilde T_{small}$.
1148 Using similar tricks, the consistency function can be evaluated in
1149 $\Theta(deg)$ steps, as well.
1151 \section{The VF2 Plus Algorithm}
1152 The VF2 Plus algorithm is a recently improved version of VF2. It was
1153 compared with the state of the art algorithms in \cite{VF2Plus} and
1154 has proven itself to be competitive with RI, the best algorithm on
1155 biological graphs. \\ A short summary of VF2 Plus follows, which uses
1156 the notation and the conventions of the original paper.
1158 \subsection{Ordering procedure}
1159 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$
1160 with the lowest probability to find a pair in $V_{small}$ and the
1161 highest number of connections with the nodes already sorted by the
1165 $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and
1166 $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$.
1168 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in
1171 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in
1174 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ $M$ is the set of
1175 already sorted nodes, $T$ is the set of nodes candidate to be
1176 selected, and $degreeM$ of a node is the number of its neighbours in
1179 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
1180 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson ..
1182 \caption{}\label{alg:VF2PlusPseu}
1183 \begin{algorithmic}[1]
1184 \Procedure{VF2 Plus order}{} \State Select the node with the lowest
1185 $P$. \If {more nodes share the same $P$} \State select the one with
1186 maximum degree \EndIf \If {more nodes share the same $P$ and have the
1187 max degree} \State select the first \EndIf \State Put the selected
1188 node in the set $M$. \label{alg:putIn} \State Put all its unsorted
1189 neighbours in the set $T$. \If {$M\neq V_{small}$} \State From set
1190 $T$ select the node with maximum $degreeM$. \If {more nodes have
1191 maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf
1192 \If {more nodes have maximum $degreeM$ and $P$} \State Select the
1193 first. \EndIf \State \textbf{goto \ref{alg:putIn}.} \EndIf
1198 Using these notations, Algorithm~\ref{alg:VF2PlusPseu}
1199 provides the description of the sorting procedure.
1201 Note that $P(u)$ is not the exact probability of finding a consistent
1202 pair for $u$ by choosing a node of $V_{large}$ randomly, since
1203 $P_{lab}$ and $P_{deg}$ are not independent, though calculating the
1204 real probability would take quadratic time, which may be reduced by
1205 using fittingly lookup tables.
1207 \section{Experimental results}
1208 This section compares the performance of VF2++ and VF2 Plus. Both
1209 algorithms have run faster with orders of magnitude than VF2, thus its
1210 inclusion was not reasonable.
1211 \subsection{Biological graphs}
1212 The tests have been executed on a recent biological dataset created
1213 for the International Contest on Pattern Search in Biological
1214 Databases\cite{Content}, which has been constructed of molecule,
1215 protein and contact map graphs extracted from the Protein Data
1216 Bank\cite{ProteinDataBank}.
1218 The molecule dataset contains small graphs with less than 100 nodes
1219 and an average degree of less than 3. The protein dataset contains
1220 graphs having 500-10 000 nodes and an average degree of 4, while the
1221 contact map dataset contains graphs with 150-800 nodes and an average
1224 In the following, the induced subgraph isomorphism and the graph
1225 isomorphism will be examined.
1227 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For run time results, please see Figure~\ref{fig:bioIND}.
1229 In an other experiment, the nodes of each graph in the database had been
1230 shuffled, and an isomorphism between the shuffled and the original
1231 graph was searched. The solution times are shown on Figure~\ref{fig:bioISO}.
1238 \begin{subfigure}[b]{0.55\textwidth}
1240 \begin{tikzpicture}[trim axis left, trim axis right]
1241 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1242 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1243 west},scaled x ticks = false,x tick label style={/pgf/number
1244 format/1000 sep = \thinspace}]
1245 %\addplot+[only marks] table {proteinsOrig.txt};
1246 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark
1247 size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
1250 \caption{In the case of molecules, there is not such a significant
1251 difference, but VF2++ seems to be faster as the number of nodes
1252 increases.}\label{fig:ISOMolecule}
1256 \begin{subfigure}[b]{0.55\textwidth}
1258 \begin{tikzpicture}[trim axis left, trim axis right]
1259 \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1260 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1261 west},scaled x ticks = false,x tick label style={/pgf/number
1262 format/1000 sep = \thinspace}]
1263 %\addplot+[only marks] table {proteinsOrig.txt};
1264 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark
1265 size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
1268 \caption{The results are closer to each other on contact maps, but
1269 VF2++ still performs consistently better.}\label{fig:ISOContact}
1275 \begin{subfigure}[b]{0.55\textwidth}
1277 \begin{tikzpicture}[trim axis left, trim axis right]
1278 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1279 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1280 west},scaled x ticks = false,x tick label style={/pgf/number
1281 format/1000 sep = \thinspace}]
1282 %\addplot+[only marks] table {proteinsOrig.txt};
1283 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
1284 size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
1287 \caption{On protein graphs, VF2 Plus has a super linear time
1288 complexity, while VF2++ runs in near constant time. The difference
1289 is about two order of magnitude on large graphs.}\label{fig:ISOProt}
1294 \caption{\normalsize{Graph isomomorphism on biological graphs}}\label{fig:bioISO}
1301 \begin{subfigure}[b]{0.55\textwidth}
1303 \begin{tikzpicture}[trim axis left, trim axis right]
1304 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1305 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1306 west},scaled x ticks = false,x tick label style={/pgf/number
1307 format/1000 sep = \thinspace}]
1308 %\addplot+[only marks] table {proteinsOrig.txt};
1309 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark
1310 size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
1313 \caption{In the case of molecules, the algorithms have
1314 similar behaviour, but VF2++ is almost two times faster even on such
1315 small graphs.} \label{fig:INDMolecule}
1319 \begin{subfigure}[b]{0.55\textwidth}
1321 \begin{tikzpicture}[trim axis left, trim axis right]
1322 \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1323 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1324 west},scaled x ticks = false,x tick label style={/pgf/number
1325 format/1000 sep = \thinspace}]
1326 %\addplot+[only marks] table {proteinsOrig.txt};
1327 \addplot table {Orig/ContactMaps.128.txt};
1328 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1329 {VF2PPLabel/ContactMaps.128.txt};
1332 \caption{On contact maps, VF2++ runs in near constant time, while VF2
1333 Plus has a near linear behaviour.} \label{fig:INDContact}
1339 \begin{subfigure}[b]{0.55\textwidth}
1341 \begin{tikzpicture}[trim axis left, trim axis right]
1342 \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
1343 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1344 west},scaled x ticks = false,x tick label style={/pgf/number
1345 format/1000 sep = \thinspace}] %\addplot+[only marks] table
1346 {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]
1347 table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark
1348 size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
1351 \caption{Both the algorithms have linear behaviour on protein
1352 graphs. VF2++ is more than 10 times faster than VF2
1353 Plus.} \label{fig:INDProt}
1358 \caption{\normalsize{Graph isomomorphism on biological graphs}}\label{fig:bioIND}
1365 \subsection{Random graphs}
1366 This section compares VF2++ with VF2 Plus on random graphs of a large
1367 size. The node labels are uniformly distributed. Let $\delta$ denote
1368 the average degree. For the parameters of problems solved in the
1369 experiments, please see the top of each chart.
1370 \subsubsection{Graph isomorphism}
1371 To evaluate the efficiency of the algorithms in the case of graph
1372 isomorphism, connected graphs of less than 20 000 nodes have been
1373 considered. Generating a random graph and shuffling its nodes, an
1374 isomorphism had to be found. Figure \ref{fig:randISO} shows the runtime results
1375 on graph sets of various density.
1383 \begin{subfigure}[b]{0.55\textwidth}
1386 \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1387 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1388 west},scaled x ticks = false,x tick label style={/pgf/number
1389 format/1000 sep = \space}]
1390 %\addplot+[only marks] table {proteinsOrig.txt};
1391 \addplot table {randGraph/iso/vf2pIso5_1.txt};
1392 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1393 {randGraph/iso/vf2ppIso5_1.txt};
1399 \begin{subfigure}[b]{0.55\textwidth}
1402 \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1403 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1404 west},scaled x ticks = false,x tick label style={/pgf/number
1405 format/1000 sep = \space}]
1406 %\addplot+[only marks] table {proteinsOrig.txt};
1407 \addplot table {randGraph/iso/vf2pIso10_1.txt};
1408 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1409 {randGraph/iso/vf2ppIso10_1.txt};
1416 \begin{subfigure}[b]{0.55\textwidth}
1419 \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1420 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1421 west},scaled x ticks = false,x tick label style={/pgf/number
1422 format/1000 sep = \space}]
1423 %\addplot+[only marks] table {proteinsOrig.txt};
1424 \addplot table {randGraph/iso/vf2pIso15_1.txt};
1425 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1426 {randGraph/iso/vf2ppIso15_1.txt};
1431 \begin{subfigure}[b]{0.55\textwidth}
1434 \begin{axis}[title={Random ISO, $\delta = 35$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1435 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1436 west},scaled x ticks = false,x tick label style={/pgf/number
1437 format/1000 sep = \space}]
1438 %\addplot+[only marks] table {proteinsOrig.txt};
1439 \addplot table {randGraph/iso/vf2pIso35_1.txt};
1440 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1441 {randGraph/iso/vf2ppIso35_1.txt};
1446 \begin{subfigure}[b]{0.55\textwidth}
1449 \begin{axis}[title={Random ISO, $\delta = 45$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1450 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1451 west},scaled x ticks = false,x tick label style={/pgf/number
1452 format/1000 sep = \space}]
1453 %\addplot+[only marks] table {proteinsOrig.txt};
1454 \addplot table {randGraph/iso/vf2pIso45_1.txt};
1455 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1456 {randGraph/iso/vf2ppIso45_1.txt};
1461 \begin{subfigure}[b]{0.55\textwidth}
1463 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1464 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1465 west},scaled x ticks = false,x tick label style={/pgf/number
1466 format/1000 sep = \thinspace}]
1467 %\addplot+[only marks] table {proteinsOrig.txt};
1468 \addplot table {randGraph/iso/vf2pIso100_1.txt};
1469 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1470 {randGraph/iso/vf2ppIso100_1.txt};
1475 \caption{IND on graphs having an average degree of
1476 5.}\label{fig:randISO}
1488 Considering the graph isomorphism problem, VF2++ consistently
1489 outperforms its rival especially on sparse graphs. The reason for the
1490 slightly super linear behaviour of VF2++ on denser graphs is the
1491 larger number of nodes in the BFS tree constructed in
1492 Algorithm~\ref{alg:VF2PPPseu}.
1494 \subsubsection{Induced subgraph isomorphism}
1495 This section provides a comparison of VF2++ and VF2 Plus in the case
1496 of induced subgraph isomorphism. In addition to the size of the large
1497 graph, that of the small graph dramatically influences the hardness of
1498 a given problem too, so the overall picture is provided by examining
1499 small graphs of various size.
1501 For each chart, a number $0<\rho< 1$ has been fixed and the following
1502 has been executed 150 times. Generating a large graph $G_{large}$,
1503 choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes,
1504 and for all the 10 subgraphs find a mapping by using both the graph
1505 matching algorithms. The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
1506 0.3, 0.6, 0.8, 0.95$ cases have been examined (see
1507 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
1508 \ref{fig:randIND35}, and for each $\delta$, a cumulative chart is
1509 given as well, which excludes $\rho = 0.05$ and $0.1$ for the sake of
1510 perspicuity (see Figure~\ref{fig:randIND5Sum}, \ref{fig:randIND10Sum}
1511 and \ref{fig:randIND35Sum}).
1520 \begin{subfigure}[b]{0.55\textwidth}
1523 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1524 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1525 west},scaled x ticks = false,x tick label style={/pgf/number
1526 format/1000 sep = \space}]
1527 %\addplot+[only marks] table {proteinsOrig.txt};
1528 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
1529 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1530 {randGraph/ind/vf2ppInd5_0.05.txt};
1535 \begin{subfigure}[b]{0.55\textwidth}
1538 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1539 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1540 west},scaled x ticks = false,x tick label style={/pgf/number
1541 format/1000 sep = \space}]
1542 %\addplot+[only marks] table {proteinsOrig.txt};
1543 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
1544 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1545 {randGraph/ind/vf2ppInd5_0.1.txt};
1551 \begin{subfigure}[b]{0.55\textwidth}
1554 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1555 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1556 west},scaled x ticks = false,x tick label style={/pgf/number
1557 format/1000 sep = \space}]
1558 %\addplot+[only marks] table {proteinsOrig.txt};
1559 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
1560 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1561 {randGraph/ind/vf2ppInd5_0.3.txt};
1566 \begin{subfigure}[b]{0.55\textwidth}
1569 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1570 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1571 west},scaled x ticks = false,x tick label style={/pgf/number
1572 format/1000 sep = \space}]
1573 %\addplot+[only marks] table {proteinsOrig.txt};
1574 \addplot table {randGraph/ind/vf2pInd5_0.6.txt};
1575 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1576 {randGraph/ind/vf2ppInd5_0.6.txt};
1581 \begin{subfigure}[b]{0.55\textwidth}
1584 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1585 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1586 west},scaled x ticks = false,x tick label style={/pgf/number
1587 format/1000 sep = \space}]
1588 %\addplot+[only marks] table {proteinsOrig.txt};
1589 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
1590 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1591 {randGraph/ind/vf2ppInd5_0.8.txt};
1596 \begin{subfigure}[b]{0.55\textwidth}
1598 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1599 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1600 west},scaled x ticks = false,x tick label style={/pgf/number
1601 format/1000 sep = \thinspace}]
1602 %\addplot+[only marks] table {proteinsOrig.txt};
1603 \addplot table {randGraph/ind/vf2pInd5_0.95.txt};
1604 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1605 {randGraph/ind/vf2ppInd5_0.95.txt};
1610 \caption{IND on graphs having an average degree of
1611 5.}\label{fig:randIND5}
1618 \begin{axis}[title={Rand IND Summary, $\delta = 5$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
1619 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
1620 west},scaled x ticks = false,x tick label style={/pgf/number
1621 format/1000 sep = \thinspace}]
1622 %\addplot+[only marks] table {proteinsOrig.txt};
1623 \addplot[mark=*,mark size=1.5pt,color=blue] table
1624 {randGraph/ind/vf2pInd5_0.3.txt}; \addplot[mark=triangle*,mark
1625 size=1.8pt,color=red] table
1626 {randGraph/ind/vf2ppInd5_0.3.txt}; \addplot[mark=*,mark
1627 size=1.5pt,color=blue] table
1628 {randGraph/ind/vf2pInd5_0.6.txt}; \addplot[mark=triangle*,mark
1629 size=1.8pt,color=red] table
1630 {randGraph/ind/vf2ppInd5_0.6.txt}; \addplot[mark=*,mark
1631 size=1.5pt,color=blue] table
1632 {randGraph/ind/vf2pInd5_0.8.txt}; \addplot[mark=triangle*,mark
1633 size=1.8pt,color=red] table
1634 {randGraph/ind/vf2ppInd5_0.8.txt}; \addplot[mark=*,mark
1635 size=1.5pt,color=blue] table
1636 {randGraph/ind/vf2pInd5_0.95.txt};
1637 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1638 {randGraph/ind/vf2ppInd5_0.95.txt};
1643 \caption{Cummulative chart for $\delta=5$.}\label{fig:randIND5Sum}
1651 \begin{subfigure}[b]{0.55\textwidth}
1655 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1656 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1657 west},scaled x ticks = false,x tick label style={/pgf/number
1658 format/1000 sep = \space}]
1659 %\addplot+[only marks] table {proteinsOrig.txt};
1660 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
1661 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1662 {randGraph/ind/vf2ppInd10_0.05.txt};
1667 \begin{subfigure}[b]{0.55\textwidth}
1671 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1672 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1673 west},scaled x ticks = false,x tick label style={/pgf/number
1674 format/1000 sep = \space}]
1675 %\addplot+[only marks] table {proteinsOrig.txt};
1676 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
1677 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1678 {randGraph/ind/vf2ppInd10_0.1.txt};
1684 \begin{subfigure}[b]{0.55\textwidth}
1687 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1688 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1689 west},scaled x ticks = false,x tick label style={/pgf/number
1690 format/1000 sep = \space}]
1691 %\addplot+[only marks] table {proteinsOrig.txt};
1692 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
1693 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1694 {randGraph/ind/vf2ppInd10_0.3.txt};
1699 \begin{subfigure}[b]{0.55\textwidth}
1702 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1703 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1704 west},scaled x ticks = false,x tick label style={/pgf/number
1705 format/1000 sep = \space}]
1706 %\addplot+[only marks] table {proteinsOrig.txt};
1707 \addplot table {randGraph/ind/vf2pInd10_0.6.txt};
1708 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1709 {randGraph/ind/vf2ppInd10_0.6.txt};
1715 \begin{subfigure}[b]{0.55\textwidth}
1717 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1718 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1719 west},scaled x ticks = false,x tick label style={/pgf/number
1720 format/1000 sep = \space}]
1721 %\addplot+[only marks] table {proteinsOrig.txt};
1722 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
1723 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1724 {randGraph/ind/vf2ppInd10_0.8.txt};
1728 \begin{subfigure}[b]{0.55\textwidth}
1730 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1731 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1732 west},scaled x ticks = false,x tick label style={/pgf/number
1733 format/1000 sep = \thinspace}]
1734 %\addplot+[only marks] table {proteinsOrig.txt};
1735 \addplot table {randGraph/ind/vf2pInd10_0.95.txt};
1736 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1737 {randGraph/ind/vf2ppInd10_0.95.txt};
1742 \caption{IND on graphs having an average degree of
1743 10.}\label{fig:randIND10}
1750 \begin{axis}[title={Rand IND Summary, $\delta = 10$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
1751 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
1752 west},scaled x ticks = false,x tick label style={/pgf/number
1753 format/1000 sep = \thinspace}]
1754 %\addplot+[only marks] table {proteinsOrig.txt};
1755 \addplot[mark=*,mark size=1.5pt,color=blue] table
1756 {randGraph/ind/vf2pInd10_0.3.txt};
1757 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1758 {randGraph/ind/vf2ppInd10_0.3.txt};
1759 \addplot[mark=*,mark size=1.5pt,color=blue] table
1760 {randGraph/ind/vf2pInd10_0.6.txt};
1761 \addplot[mark=triangle*,mark
1762 size=1.8pt,color=red] table
1763 {randGraph/ind/vf2ppInd10_0.6.txt};
1764 \addplot[mark=*,mark
1765 size=1.5pt,color=blue] table
1766 {randGraph/ind/vf2pInd10_0.8.txt};
1767 \addplot[mark=triangle*,mark
1768 size=1.8pt,color=red] table
1769 {randGraph/ind/vf2ppInd10_0.8.txt};
1770 \addplot[mark=*,mark
1771 size=1.5pt,color=blue]
1773 {randGraph/ind/vf2pInd10_0.95.txt};
1774 \addplot[mark=triangle*,mark
1775 size=1.8pt,color=red]
1777 {randGraph/ind/vf2ppInd10_0.95.txt};
1782 \caption{Cummulative chart for $\delta=10$.}\label{fig:randIND10Sum}
1790 \begin{subfigure}[b]{0.55\textwidth}
1793 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1794 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1795 west},scaled x ticks = false,x tick label style={/pgf/number
1796 format/1000 sep = \space}]
1797 %\addplot+[only marks] table {proteinsOrig.txt};
1798 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
1799 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1800 {randGraph/ind/vf2ppInd35_0.05.txt};
1805 \begin{subfigure}[b]{0.55\textwidth}
1808 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1809 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1810 west},scaled x ticks = false,x tick label style={/pgf/number
1811 format/1000 sep = \space}]
1812 %\addplot+[only marks] table {proteinsOrig.txt};
1813 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
1814 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1815 {randGraph/ind/vf2ppInd35_0.1.txt};
1821 \begin{subfigure}[b]{0.55\textwidth}
1824 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1825 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1826 west},scaled x ticks = false,x tick label style={/pgf/number
1827 format/1000 sep = \space}]
1828 %\addplot+[only marks] table {proteinsOrig.txt};
1829 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
1830 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1831 {randGraph/ind/vf2ppInd35_0.3.txt};
1836 \begin{subfigure}[b]{0.55\textwidth}
1839 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1840 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1841 west},scaled x ticks = false,x tick label style={/pgf/number
1842 format/1000 sep = \space}]
1843 %\addplot+[only marks] table {proteinsOrig.txt};
1844 \addplot table {randGraph/ind/vf2pInd35_0.6.txt};
1845 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1846 {randGraph/ind/vf2ppInd35_0.6.txt};
1852 \begin{subfigure}[b]{0.55\textwidth}
1854 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1855 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1856 west},scaled x ticks = false,x tick label style={/pgf/number
1857 format/1000 sep = \space}]
1858 %\addplot+[only marks] table {proteinsOrig.txt};
1859 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
1860 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1861 {randGraph/ind/vf2ppInd35_0.8.txt};
1865 \begin{subfigure}[b]{0.55\textwidth}
1867 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
1868 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
1869 west},scaled x ticks = false,x tick label style={/pgf/number
1870 format/1000 sep = \thinspace}]
1871 %\addplot+[only marks] table {proteinsOrig.txt};
1872 \addplot table {randGraph/ind/vf2pInd35_0.95.txt};
1873 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1874 {randGraph/ind/vf2ppInd35_0.95.txt};
1879 \caption{IND on graphs having an average degree of
1880 35.}\label{fig:randIND35}
1887 \begin{axis}[title={Rand IND Summary, $\delta = 35$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
1888 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
1889 west},scaled x ticks = false,x tick label style={/pgf/number
1890 format/1000 sep = \thinspace}]
1891 %\addplot+[only marks] table {proteinsOrig.txt};
1892 \addplot[mark=*,mark size=1.5pt,color=blue] table
1893 {randGraph/ind/vf2pInd35_0.3.txt};
1894 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
1895 {randGraph/ind/vf2ppInd35_0.3.txt};
1896 \addplot[mark=*,mark size=1.5pt,color=blue] table
1897 {randGraph/ind/vf2pInd35_0.6.txt};
1898 \addplot[mark=triangle*,mark
1899 size=1.8pt,color=red] table
1900 {randGraph/ind/vf2ppInd35_0.6.txt};
1901 \addplot[mark=*,mark
1902 size=1.5pt,color=blue] table
1903 {randGraph/ind/vf2pInd35_0.8.txt};
1904 \addplot[mark=triangle*,mark
1905 size=1.8pt,color=red] table
1906 {randGraph/ind/vf2ppInd35_0.8.txt};
1907 \addplot[mark=*,mark
1908 size=1.5pt,color=blue]
1910 {randGraph/ind/vf2pInd35_0.95.txt};
1911 \addplot[mark=triangle*,mark
1912 size=1.8pt,color=red]
1914 {randGraph/ind/vf2ppInd35_0.95.txt};
1919 \caption{Cummulative chart for $\delta=35$.}\label{fig:randIND35Sum}
1922 Based on these experiments, VF2++ is faster than VF2 Plus and able to
1923 handle really large graphs in milliseconds. Note that when $IND$ was
1924 considered and the small graphs had proportionally few nodes ($\rho =
1925 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
1926 orders (e.g. see the $\delta=10$ case on
1927 Figure~\ref{fig:randIND10}). If these examples had been excluded, the
1928 charts would have seemed to be similar to the other ones.
1929 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
1930 Plus slow slightly down, but remain practically usable even on graphs
1931 having 10 000 nodes.
1937 \section{Conclusion}
1938 In this paper, after providing a short summary of the recent
1939 algorithms, a new graph matching algorithm based on VF2, called VF2++,
1940 has been presented and analyzed from a practical viewpoint.
1942 Recognizing the importance of the node order and determining an
1943 efficient one, VF2++ is able to match graphs of thousands of nodes in
1944 near practically linear time including preprocessing. In addition to
1945 the proper order, VF2++ uses more efficient consistency and cutting
1946 rules which are easy to compute and make the algorithm able to prune
1947 most of the unfruitful branches without going astray.
1949 In order to show the efficiency of the new method, it has been
1950 compared to VF2 Plus, which is the best concurrent algorithm based on
1953 The experiments show that VF2++ consistently outperforms VF2 Plus on
1954 biological graphs. It seems to be asymptotically faster on protein and
1955 on contact map graphs in the case of induced subgraph isomorphism,
1956 while in the case of graph isomorphism, it has definitely better
1957 asymptotic behaviour on protein graphs.
1959 Regarding random sparse graphs, not only has VF2++ proved itself to be
1960 faster than VF2 Plus, but it has a practically linear behaviour both
1961 in the case of induced subgraph- and graph isomorphism, as well.
1965 %% The Appendices part is started with the command \appendix;
1966 %% appendix sections are then done as normal sections
1972 %% If you have bibdatabase file and want bibtex to generate the
1973 %% bibitems, please use
1975 \bibliographystyle{elsarticle-num} \bibliography{bibliography}
1977 %% else use the following coding to input the bibitems directly in the
1980 %% \begin{thebibliography}{00}
1982 %% %% \bibitem{label}
1983 %% %% Text of bibliographic item
1987 %% \end{thebibliography}
1992 %% End of file `elsarticle-template-num.tex'.