damecco.tex
changeset 19 b9a8744c5efc
parent 17 909bcb203f25
child 20 80d56dee41d9
equal deleted inserted replaced
16:f9608f3370b0 17:ba075f024c84
   121 many fields of application such as pattern analysis, computer vision
   121 many fields of application such as pattern analysis, computer vision
   122 questions and the analysis of chemical and biological systems has
   122 questions and the analysis of chemical and biological systems has
   123 fostered the design of various algorithms for handling special graph
   123 fostered the design of various algorithms for handling special graph
   124 structures.
   124 structures.
   125 
   125 
   126 The idea of using state space representation and checking some
       
   127 conditions in each state to prune the search tree has made the VF2
       
   128 algorithm one of the state of the art graph matching algorithms for
       
   129 more than a decade. Recently, biological questions of ever increasing
       
   130 importance have required more efficient, specialized algorithms.
       
   131 
       
   132 This paper presents VF2++, a new algorithm based on the original VF2,
   126 This paper presents VF2++, a new algorithm based on the original VF2,
   133 which runs significantly faster on most test cases and performs
   127 which runs significantly faster on most test cases and performs
   134 especially well on special graph classes stemming from biological
   128 especially well on special graph classes stemming from biological
   135 questions. VF2++ handles graphs of thousands of nodes in practically
   129 questions. VF2++ handles graphs of thousands of nodes in practically
   136 near linear time including preprocessing. Not only is it an improved
   130 near linear time including preprocessing. Not only is it an improved
   137 version of VF2, but in fact, it is by far the fastest existing
   131 version of VF2, but in fact, it is by far the fastest existing
   138 algorithm regarding biological graphs.
   132 algorithm especially on biological graphs.
   139 
   133 
   140 The reason for VF2++' superiority over VF2 is twofold. Firstly, taking
   134 The reason for VF2++' superiority over VF2 is twofold. Firstly, taking
   141 into account the structure and the node labeling of the graph, VF2++
   135 into account the structure and the node labeling of the graph, VF2++
   142 determines a state order in which most of the unfruitful branches of
   136 determines a state order in which most of the unfruitful branches of
   143 the search space can be pruned immediately. Secondly, introducing more
   137 the search space can be pruned immediately. Secondly, introducing more
   183 In the last decades, combinatorial structures, and especially graphs
   177 In the last decades, combinatorial structures, and especially graphs
   184 have been considered with ever increasing interest, and applied to the
   178 have been considered with ever increasing interest, and applied to the
   185 solution of several new and revised questions.  The expressiveness,
   179 solution of several new and revised questions.  The expressiveness,
   186 the simplicity and the studiedness of graphs make them practical for
   180 the simplicity and the studiedness of graphs make them practical for
   187 modelling and appear constantly in several seemingly independent
   181 modelling and appear constantly in several seemingly independent
   188 fields.  Bioinformatics and chemistry are amongst the most relevant
   182 fields, such as bioinformatics and chemistry.
   189 and most important fields.
       
   190 
   183 
   191 Complex biological systems arise from the interaction and cooperation
   184 Complex biological systems arise from the interaction and cooperation
   192 of plenty of molecular components. Getting acquainted with such
   185 of plenty of molecular components. Getting acquainted with such
   193 systems at the molecular level has primary importance, since
   186 systems at the molecular level is of primary importance, since
   194 protein-protein interaction, DNA-protein interaction, metabolic
   187 protein-protein interaction, DNA-protein interaction, metabolic
   195 interaction, transcription factor binding, neuronal networks, and
   188 interaction, transcription factor binding, neuronal networks, and
   196 hormone signaling networks can be understood only this way.
   189 hormone signaling networks can be understood this way.
   197 
   190 
   198 For instance, a molecular structure can be considered as a graph,
   191 Many chemical and biological structures can easily be modeled
   199 whose nodes correspond to atoms and whose edges to chemical bonds. The
   192 as graphs, for instance, a molecular structure can be
   200 secondary structure of a protein can also be represented as a graph,
   193 considered as a graph, whose nodes correspond to atoms and whose
   201 where nodes are associated with aminoacids and the edges with hydrogen
   194 edges to chemical bonds. The similarity and dissimilarity of
   202 bonds. The nodes are often whole molecular components and the edges
   195 objects corresponding to nodes are incorporated to the model
   203 represent some relationships among them.  The similarity and
   196 by \emph{node labels}. Understanding such networks basically
   204 dissimilarity of objects corresponding to nodes are incorporated to
   197 requires finding specific subgraphs, thus calls for efficient
   205 the model by \emph{node labels}.  Many other chemical and biological
   198 graph matching algorithms.
   206 structures can easily be modeled in a similar way. Understanding such
   199 
   207 networks basically requires finding specific subgraphs, which can not
   200 Other real-world fields related to some
   208 avoid the application of graph matching algorithms.
   201 variants of graph matching include pattern recognition
   209 
       
   210 Finally, let some of the other real-world fields related to some
       
   211 variants of graph matching be briefly mentioned: pattern recognition
       
   212 and machine vision \cite{HorstBunkeApplications}, symbol recognition
   202 and machine vision \cite{HorstBunkeApplications}, symbol recognition
   213 \cite{CordellaVentoSymbolRecognition}, face identification
   203 \cite{CordellaVentoSymbolRecognition}, face identification
   214 \cite{JianzhuangYongFaceIdentification}.  \\
   204 \cite{JianzhuangYongFaceIdentification}.  \\
   215 
   205 
   216 Subgraph and induced subgraph matching problems are known to be
   206 Subgraph and induced subgraph matching problems are known to be
   218 one of the few problems in NP neither known to be in P nor
   208 one of the few problems in NP neither known to be in P nor
   219 NP-Complete. Although polynomial time isomorphism algorithms are known
   209 NP-Complete. Although polynomial time isomorphism algorithms are known
   220 for various graph classes, like trees and planar
   210 for various graph classes, like trees and planar
   221 graphs\cite{PlanarGraphIso}, bounded valence
   211 graphs\cite{PlanarGraphIso}, bounded valence
   222 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
   212 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
   223 or permutation graphs\cite{PermGraphIso}.
   213 or permutation graphs\cite{PermGraphIso}, and recently, an FPT algorithm has been presented for the coloured hypergraph isomorphism problem in \cite{ColoredHiperGraphIso}.
   224 
   214 
   225 In the following, some algorithms based on other approaches are
   215 In the following, some algorithms based on other approaches are
   226 summarized, which do not need any restrictions on the graphs. However,
   216 summarized, which do not need any restrictions on the graphs. Even though,
   227 an overall polynomial behaviour is not expectable from such an
   217 an overall polynomial behaviour is not expectable from such an
   228 alternative, it may often have good performance, even on a graph class
   218 alternative, it may often have good practical performance, in fact,
   229 for which polynomial algorithm is known. Note that this summary
   219 it might be the best choice even on a graph class for which polynomial
   230 containing only exact matching algorithms is far not complete, neither
   220 algorithm is known.
   231 does it cover all the recent algorithms.
       
   232 
   221 
   233 The first practically usable approach was due to
   222 The first practically usable approach was due to
   234 Ullmann\cite{Ullmann} which is a commonly used depth-first
   223 \emph{Ullmann}\cite{Ullmann} which is a commonly used depth-first
   235 search based algorithm with a complex heuristic for reducing the
   224 search based algorithm with a complex heuristic for reducing the
   236 number of visited states. A major problem is its $\Theta(n^3)$ space
   225 number of visited states. A major problem is its $\Theta(n^3)$ space
   237 complexity, which makes it impractical in the case of big sparse
   226 complexity, which makes it impractical in the case of big sparse
   238 graphs.
   227 graphs.
   239 
   228 
   240 In a recent paper, Ullmann\cite{UllmannBit} presents an
   229 In a recent paper, Ullmann\cite{UllmannBit} presents an
   241 improved version of this algorithm based on a bit-vector solution for
   230 improved version of this algorithm based on a bit-vector solution for
   242 the binary Constraint Satisfaction Problem.
   231 the binary Constraint Satisfaction Problem.
   243 
   232 
   244 The Nauty algorithm\cite{Nauty} transforms the two graphs to
   233 The \emph{Nauty} algorithm\cite{Nauty} transforms the two graphs to
   245 a canonical form before starting to check for the isomorphism. It has
   234 a canonical form before starting to check for the isomorphism. It has
   246 been considered as one of the fastest graph isomorphism algorithms,
   235 been considered as one of the fastest graph isomorphism algorithms,
   247 although graph categories were shown in which it takes exponentially
   236 although graph categories were shown in which it takes exponentially
   248 many steps. This algorithm handles only the graph isomorphism problem.
   237 many steps. This algorithm handles only the graph isomorphism problem.
   249 
   238 
   251 strategy and formulates the matching as a Constraint Satisfaction
   240 strategy and formulates the matching as a Constraint Satisfaction
   252 Problem to prune the search tree. The constraints are that the mapping
   241 Problem to prune the search tree. The constraints are that the mapping
   253 has to be injective and edge-preserving, hence it is possible to
   242 has to be injective and edge-preserving, hence it is possible to
   254 handle new matching types as well.
   243 handle new matching types as well.
   255 
   244 
   256 The \textbf{RI} algorithm\cite{RI} and its variations are based on a
   245 The \emph{RI} algorithm\cite{RI} and its variations are based on a
   257 state space representation. After reordering the nodes of the graphs,
   246 state space representation. After reordering the nodes of the graphs,
   258 it uses some fast executable heuristic checks without using any
   247 it uses some fast executable heuristic checks without using any
   259 complex pruning rules. It seems to run really efficiently on graphs
   248 complex pruning rules. It seems to run really efficiently on graphs
   260 coming from biology, and won the International Contest on Pattern
   249 coming from biology, and won the International Contest on Pattern
   261 Search in Biological Databases\cite{Content}.
   250 Search in Biological Databases\cite{Content}.
   262 
   251 
   263 The currently most commonly used algorithm is the
   252 The currently most commonly used algorithm is the
   264 \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was
   253 \emph{VF2}\cite{VF2}, the improved version of \emph{VF}\cite{VF}, which was
   265 designed for solving pattern matching and computer vision problems,
   254 designed for solving pattern matching and computer vision problems,
   266 and has been one of the best overall algorithms for more than a
   255 and has been one of the best overall algorithms for more than a
   267 decade. Although, it can't be up to new specialized algorithms, it is
   256 decade. Although, it can't be up to new specialized algorithms, it is
   268 still widely used due to its simplicity and space efficiency. VF2 uses
   257 still widely used due to its simplicity and space efficiency. VF2 uses
   269 a state space representation and checks some conditions in each state
   258 a state space representation and checks some conditions in each state
   270 to prune the search tree.
   259 to prune the search tree.
   271 
   260 
   272 Our first graph matching algorithm was the first version of VF2 which
   261 Meanwhile, another variant called \emph{VF2 Plus}\cite{VF2Plus} has
   273 recognizes the significance of the node ordering, more opportunities
       
   274 to increase the cutting efficiency and reduce its computational
       
   275 complexity. This project was initiated and sponsored by QuantumBio
       
   276 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
       
   277 code --- has been published as a part of LEMON\cite{LEMON} open source
       
   278 graph library.
       
   279 
       
   280 This paper introduces \textbf{VF2++}, a new further improved algorithm
       
   281 for the graph and (induced)subgraph isomorphism problem, which uses
       
   282 efficient cutting rules and determines a node order in which VF2 runs
       
   283 significantly faster on practical inputs.
       
   284 
       
   285 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has
       
   286 been published. It is considered to be as efficient as the RI
   262 been published. It is considered to be as efficient as the RI
   287 algorithm and has a strictly better behavior on large graphs.  The
   263 algorithm and has a strictly better behavior on large graphs.  The
   288 main idea of VF2 Plus is to precompute a heuristic node order of the
   264 main idea of VF2 Plus is to precompute a heuristic node order of the
   289 small graph, in which the VF2 works more efficiently.
   265 small graph, in which the VF2 works more efficiently.
   290 
   266 
       
   267 This paper introduces \emph{VF2++}, a new further improved algorithm
       
   268 for the graph and (induced)subgraph isomorphism problem, which uses
       
   269 efficient cutting rules and determines a node order in which VF2 runs
       
   270 significantly faster on practical inputs.
       
   271 
       
   272 This project was initiated and sponsored by QuantumBio
       
   273 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
       
   274 code --- has been published as a part of LEMON\cite{LEMON} open source
       
   275 graph library.
       
   276 
   291 \section{Problem Statement}
   277 \section{Problem Statement}
   292 This section provides a detailed description of the problems to be
   278 This section provides a formal description of the problems to be
   293 solved.
   279 solved.
   294 \subsection{Definitions}
   280 \subsection{Definitions}
   295 
   281 
   296 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and
   282 Throughout the paper $G_{1}=(V_{1}, E_{1})$ and
   297 $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.
   283 $G_{2}=(V_{2}, E_{2})$ denote two undirected graphs.
   298 \begin{definition}\label{sec:ismorphic}
   284 
   299 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M:
       
   300   V_{small} \longrightarrow V_{large}$ bijection, for which the
       
   301   following is true:
       
   302 \begin{center}
       
   303 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
       
   304   (M(u),M(v))\in{E_{large}}$
       
   305 \end{center}
       
   306 \end{definition}
       
   307 For the sake of simplicity in this paper subgraphs and induced
       
   308 subgraphs are defined in a more general way than usual:
       
   309 \begin{definition}
   285 \begin{definition}
   310 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I:
   286 $\mathcal{L}: (V_{1}\cup V_{2}) \longrightarrow K$ is a \textbf{node
   311   V_{small}\longrightarrow V_{large}$ injection, for which the
       
   312   following is true:
       
   313 \begin{center}
       
   314 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$
       
   315 \end{center}
       
   316 \end{definition}
       
   317 
       
   318 \begin{definition} 
       
   319 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists
       
   320   I: V_{small}\longrightarrow V_{large}$ injection, for which the
       
   321   following is true:
       
   322 \begin{center}
       
   323 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
       
   324   (I(u),I(v))\in E_{large}$
       
   325 \end{center}
       
   326 \end{definition}
       
   327 
       
   328 \begin{definition}
       
   329 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node
       
   330     label function}, where K is an arbitrary set. The elements in K
   287     label function}, where K is an arbitrary set. The elements in K
   331   are the \textbf{node labels}. Two nodes, u and v are said to be
   288   are the \textbf{node labels}. Two nodes, u and v are said to be
   332   \textbf{equivalent} if $lab(u)=lab(v)$.
   289   \textbf{equivalent} if $\mathcal{L}(u)=\mathcal{L}(v)$.
   333 \end{definition}
   290 \end{definition}
   334 
   291 
   335 When node labels are given, the matched nodes must have the same
   292 For the sake of simplicity, in this paper the graph, subgraph and induced subgraph isomorphisms are defined in a more general way.
   336 labels.  For example, the node labeled isomorphism is phrased by
   293 
       
   294 \begin{definition}\label{sec:ismorphic}
       
   295 $G_{1}$ and $G_{2}$ are \textbf{isomorphic} (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}:
       
   296   V_{1} \longrightarrow V_{2}$ bijection, for which the
       
   297   following is true:
       
   298 \begin{center}
       
   299 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and
       
   300 $\forall u,v\in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in{E_{2}}$
       
   301 \end{center}
       
   302 \end{definition}
       
   303 
   337 \begin{definition}
   304 \begin{definition}
   338 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label
   305 $G_{1}$ is a \textbf{subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}:
   339     function lab} if $\exists M: V_{small} \longrightarrow V_{large}$
   306   V_{1}\longrightarrow V_{2}$ injection, for which the
   340   bijection, for which the following is true:
   307   following is true:
   341 \begin{center}
   308 \begin{center}
   342 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
   309 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and
   343   (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} :
   310 $\mathbb{M}$
   344   lab(u)=lab(M(u)))$
   311 $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Rightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$
   345 \end{center}
   312 \end{center}
   346 \end{definition}
   313 \end{definition}
   347 
   314 
   348 Note that he other two definitions can be extended in the same way.
   315 \begin{definition} 
       
   316 $G_{1}$ is an \textbf{induced subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists
       
   317   \mathfrak{m}: V_{1}\longrightarrow V_{2}$ injection, for which the
       
   318   following is true:
       
   319 \begin{center}
       
   320 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and
       
   321 
       
   322 $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow
       
   323   (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$
       
   324 \end{center}
       
   325 \end{definition}
       
   326 
   349 
   327 
   350 \subsection{Common problems}\label{sec:CommProb}
   328 \subsection{Common problems}\label{sec:CommProb}
   351 
   329 
   352 The focus of this paper is on two extensively studied topics, the
   330 The focus of this paper is on two extensively studied topics, the
   353 subgraph isomorphism and its variations. However, the following
   331 subgraph isomorphism and its variations. However, the following
   354 problems also appear in many applications.
   332 problems also appear in many applications.
   355 
   333 
   356 The \textbf{subgraph matching problem} is the following: is
   334 The \textbf{subgraph matching problem} is the following: is
   357 $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node
   335 $G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node
   358 label?
   336 label?
   359 
   337 
   360 The \textbf{induced subgraph matching problem} asks the same about the
   338 The \textbf{induced subgraph matching problem} asks the same about the
   361 existence of an induced subgraph.
   339 existence of an induced subgraph.
   362 
   340 
   380 for the sake of simplicity, only the undirected case will be
   358 for the sake of simplicity, only the undirected case will be
   381 discussed.
   359 discussed.
   382 
   360 
   383 
   361 
   384 \subsection{Common notations}
   362 \subsection{Common notations}
   385 \indent Assume $G_{small}$ is searched in $G_{large}$.  The following
   363 \indent Assume $G_{1}$ is searched in $G_{2}$.  The following
   386 definitions and notations will be used throughout the whole paper.
   364 definitions and notations will be used throughout the whole paper.
   387 \begin{definition}
   365 \begin{definition}
   388 A set $M\subseteq V_{small}\times V_{large}$ is called
   366 An injection $\mathfrak{m} : D \longrightarrow V_2$ is called (partial) \textbf{mapping}, where $D\subseteq V_1$.
   389 \textbf{mapping} if no node of $V_{small}\cup V_{large}$ appears
       
   390 in more than one pair in M.  That is, M uniquely associates some of
       
   391 the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice
       
   392 versa.
       
   393 \end{definition}
   367 \end{definition}
   394 
   368 
       
   369 \begin{notation}
       
   370 $\mathfrak{D}(f)$ and $\mathfrak{R}(f)$ denote the domain and the range of a function $f$, respectively.
       
   371 \end{notation}
       
   372 
   395 \begin{definition}
   373 \begin{definition}
   396 Mapping $M$ \textbf{covers} a node v if there exists a pair in M, which
   374 Mapping $\mathfrak{m}$ \textbf{covers} a node $u\in V_1\cup V_2$ if $u\in \mathfrak{D}(\mathfrak{m})\cup \mathfrak{R}(\mathfrak{m})$.
   397 contains v.
       
   398 \end{definition}
   375 \end{definition}
   399 
   376 
   400 \begin{definition}
   377 \begin{definition}
   401 A mapping $M$ is $\mathbf{whole\ mapping}$ if $M$ covers all the
   378 A mapping $\mathfrak{m}$ is $\mathbf{whole\ mapping}$ if $\mathfrak{m}$ covers all the
   402 nodes in $V_{small}$.
   379 nodes of $V_{1}$, i.e. $\mathfrak{D}(\mathfrak{m})=V_1$.
   403 \end{definition}
   380 \end{definition}
   404 
   381 
       
   382 \begin{definition}
       
   383 Let \textbf{extend}$(\mathfrak{m},(u,v))$ denote the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$, for which $\forall w\in \mathfrak{D}(\mathfrak{m}) : \mathfrak{m}(w)=f(w)$ and $f(u)=v$ holds. Where $u\notin\mathfrak{D}(\mathfrak{m})$ and $v\notin\mathfrak{R}(\mathfrak{m})$, otherwise $extend(\mathfrak{m},(u,v))$ is undefined.
       
   384 \end{definition}
       
   385 
   405 \begin{notation}
   386 \begin{notation}
   406 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in
   387 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
   407 V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in
   388 which can be substituted by any of the $\mathbf{ISO}$, $\mathbf{SUB}$
   408 V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.
   389 and $\mathbf{IND}$ problems.
   409 \end{notation}
   390 \end{notation}
   410 
   391 
   411 \begin{notation}
       
   412 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$ if such a node
       
   413 exist, otherwise $\mathbf{Pair(M,v)}$ is undefined, where $M$ is a mapping and $v\in V_{small}\cup V_{large}$.
       
   414 \end{notation}
       
   415 
       
   416 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique
       
   417 
       
   418 The definitions of the isomorphism types can be rephrased on the
       
   419 existence of a special whole mapping $M$, since it represents a
       
   420 bijection. For example
       
   421 \begin{center}
       
   422 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph
       
   423   isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v
       
   424   \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
       
   425   (Pair(M,u),Pair(M,v))\in E_{large}$.
       
   426 \end{center}
       
   427 
       
   428 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
       
   429 which can be substituted by any of $\mathbf{ISO}$, $\mathbf{SUB}$
       
   430 and $\mathbf{IND}$.
       
   431 
       
   432 \begin{definition}
   392 \begin{definition}
   433 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
   393 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
   434 \textbf{consistency function by } $\mathbf{PT}$ if the following
   394 \textbf{consistency function by } $\mathbf{PT}$ if the following
   435 holds. If there exists whole mapping $W$ of type $PT$ for which
   395 holds. If there exists a whole mapping $w$ satisfying the requirements of $PT$, for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$.
   436 $M\subseteq W$, then $Cons_{PT}(M)$ is true.
       
   437 \end{definition}
   396 \end{definition}
   438 
   397 
   439 \begin{definition} 
   398 \begin{definition} 
   440 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
   399 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
   441 \textbf{cutting function by } $\mathbf{PT}$ if the following
   400 \textbf{cutting function by } $\mathbf{PT}$ if the following
   442 holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a
   401 holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations, which results in a whole mapping satisfying the requirements of $PT$.
   443 whole mapping $W$ of type $PT$.
       
   444 \end{definition}
   402 \end{definition}
   445 
   403 
   446 \begin{definition}
   404 \begin{definition}
   447 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if
   405 $\mathfrak{m}$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if
   448   $Cons_{PT}(M)$ is true.
   406   $Cons_{PT}(\mathfrak{m})$ is true.
   449 \end{definition}
   407 \end{definition}
   450 
   408 
   451 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
   409 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
   452 \begin{notation}
   410 \begin{notation}
   453 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$, and
   411 Let $\mathbf{Cons_{PT}(p, \mathfrak{m})}:=Cons_{PT}(extend(\mathfrak{m},p))$, and
   454 $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where
   412 $\mathbf{Cut_{PT}(p, \mathfrak{m})}:=Cut_{PT}(extend(\mathfrak{m},p))$, where
   455 $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping.
   413 $p\in{V_{1}\backslash\mathfrak{D}(\mathfrak{m}) \!\times\!V_{2}\backslash\mathfrak{R}(\mathfrak{m})}$.
   456 \end{notation}
   414 \end{notation}
   457 
   415 
   458 $Cons_{PT}$ will be used to check the consistency of the already
   416 $Cons_{PT}$ will be used to check the consistency of the already
   459 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
   417 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
   460 no whole consistent mapping can contain the current mapping.
   418 no whole consistent mapping can contain the current mapping.
   461 
   419 
   462 \subsection{Overview of the algorithm}
   420 \subsection{Overview of the algorithm}
   463 VF2 uses a state space representation of mappings, $Cons_{PT}$ for
   421 VF2 uses a state space representation of mappings, $Cons_{PT}$ for
   464 excluding inconsistency with the problem type and $Cut_{PT}$ for
   422 excluding inconsistency with the problem type and $Cut_{PT}$ for
   465 pruning the search tree.  Each state $s$ of the matching process can
   423 pruning the search tree.
   466 be associated with a mapping $M(s)$.
       
   467 
   424 
   468 Algorithm~\ref{alg:VF2Pseu} is a high level description of
   425 Algorithm~\ref{alg:VF2Pseu} is a high level description of
   469 the VF2 matching algorithm.
   426 the VF2 matching algorithm. Each state of the matching process can
       
   427 be associated with a mapping $\mathfrak{m}$. The initial state
       
   428 is associated with a mapping $\mathfrak{m}$, for which
       
   429 $\mathfrak{D}(\mathfrak{m})=\emptyset$, i.e. it starts with an empty mapping.
   470 
   430 
   471 
   431 
   472 \begin{algorithm}
   432 \begin{algorithm}
   473 \algtext*{EndIf}%ne nyomtasson end if-et
   433 \algtext*{EndIf}%ne nyomtasson end if-et
   474 \algtext*{EndFor}%ne
   434 \algtext*{EndFor}%ne
   475 \algtext*{EndProcedure}%ne nyomtasson ..
   435 \algtext*{EndProcedure}%ne nyomtasson ..
   476 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
   436 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
   477 \begin{algorithmic}[1]
   437 \begin{algorithmic}[1]
   478 
   438 
   479 \Procedure{VF2}{State $s$, ProblemType $PT$} \If{$M(s$) covers
   439 \Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$}
   480   $V_{small}$} \State Output($M(s)$) \Else
   440   \If{$\mathfrak{m}$ covers
   481   
   441     $V_{1}$} \State Output($\mathfrak{m}$)
   482   \State Compute the set $P(s)$ of the pairs candidate for inclusion
   442   \Else
   483   in $M(s)$ \ForAll{$p\in{P(s)}$} \If{Cons$_{PT}$($p, M(s)$) $\wedge$
   443   \State Compute the set $P_\mathfrak{m}$ of the pairs candidate for inclusion
   484     $\neg$Cut$_{PT}$($p, M(s)$)} \State Compute the nascent state
   444   in $\mathfrak{m}$ \ForAll{$p\in{P_\mathfrak{m}}$} \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$
   485   $\tilde{s}$ by adding $p$ to $M(s)$ \State \textbf{call}
   445     $\neg$Cut$_{PT}$($p,\mathfrak{m}$)}
   486   VF2($\tilde{s}$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
   446     \State \textbf{call}
       
   447   VF2($extend(\mathfrak{m},p)$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
   487 \end{algorithmic}
   448 \end{algorithmic}
   488 \end{algorithm}
   449 \end{algorithm}
   489 
   450 
   490 
   451 
   491 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it
   452 For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of
   492 starts with an empty mapping.
   453 candidate node pairs for adding to the current mapping $\mathfrak{m}_s$.
   493 
   454 
   494 For each state $s$, the algorithm computes $P(s)$, the set of
   455 For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and
   495 candidate node pairs for adding to the current state $s$.
   456 $Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and
   496 
   457 the latter is false, the whole process is recursively applied to
   497 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and
   458 $extend(\mathfrak{m},p)$. Otherwise, $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it
   498 $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and
   459 can be proved that $\mathfrak{m}$ can not be extended to a whole mapping.
   499 $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup
       
   500 \{p\}$ is computed, and the whole process is recursively applied to
       
   501 $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$, or it
       
   502 can be proved that $s$ can not be extended to a whole mapping.
       
   503 
   460 
   504 In order to make sure of the correctness, see
   461 In order to make sure of the correctness, see
   505 \begin{claim}
   462 \begin{claim}
   506 Through consistent mappings, only consistent whole mappings can be
   463 Through consistent mappings, only consistent whole mappings can be
   507 reached, and all of the whole mappings are reachable through
   464 reached, and all the consistent whole mappings are reachable through
   508 consistent mappings.
   465 consistent mappings.
   509 \end{claim}
   466 \end{claim}
   510 
   467 
   511 Note that a state may be reached in exponentially many different ways, since the
   468 Note that a mapping may be reached in exponentially many different ways, since the
   512 order of insertions into $M$ does not influence the nascent mapping.
   469 order of extensions does not influence the nascent mapping.
   513 
   470 
   514 However, one may observe
   471 However, one may observe
   515 
   472 
   516 \begin{claim}
   473 \begin{claim}
   517 \label{claim:claimTotOrd}
   474 \label{claim:claimTotOrd}
   518 Let $\prec$ an arbitrary total ordering relation on $V_{small}$.  If
   475 Let $\prec$ be an arbitrary total ordering relation on $V_{1}$.  If
   519 the algorithm ignores each $p=(u,v) \in P(s)$, for which
   476 the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$, for which
   520 \begin{center}
   477 \begin{center}
   521 $\exists (\tilde{u},\tilde{v})\in P(s): \tilde{u} \prec u$,
   478 $\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$,
   522 \end{center}
   479 \end{center}
   523 then no state can be reached more than once, and each state associated
   480 then no mapping can be reached more than once, and each whole mapping remains reachable.
   524 with a whole mapping remains reachable.
       
   525 \end{claim}
   481 \end{claim}
   526 
   482 
   527 Note that the cornerstone of the improvements to VF2 is a proper
   483 Note that the cornerstone of the improvements to VF2 is a proper
   528 choice of a total ordering.
   484 choice of a total ordering.
   529 
   485 
   530 \subsection{The candidate set P(s)}
   486 \subsection{The candidate set}
   531 \label{candidateComputingVF2}
   487 \label{candidateComputingVF2}
   532 Let $P(s)$ be the set of the candidate pairs for inclusion in $M(s)$.
   488 Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$.
   533 
   489 
   534 \begin{notation}
   490 \begin{notation}
   535 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by
   491 Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\backslash\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and
   536 $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}}
   492  $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\backslash\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$.
   537 \wedge \tilde{u}$ is covered by $M(s)\}$, and
       
   538 \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not
       
   539 covered by
       
   540 $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}}
       
   541 \wedge \tilde{v}$ is covered by $M(s)\}$
       
   542 \end{notation}
   493 \end{notation}
   543 
   494 
   544 The set $P(s)$ includes the pairs of uncovered neighbours of covered
   495 The set $P_\mathfrak{m}$ includes the pairs of uncovered neighbours of covered
   545 nodes, and if there is not such a node pair, all the pairs containing
   496 nodes, and if there is not such a node pair, all the pairs containing
   546 two uncovered nodes are added. Formally, let
   497 two uncovered nodes are added. Formally, let
   547 \[
   498 \[
   548  P(s)\!=\!
   499  P_\mathfrak{m}\!=\!
   549   \begin{cases} 
   500   \begin{cases} 
   550    T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if }
   501    T_{1}(\mathfrak{m})\times T_{2}(\mathfrak{m})&\hspace{-0.15cm}\text{if }
   551    T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq
   502    T_{1}(\mathfrak{m})\!\neq\!\emptyset\ \text{and }T_{2}(\mathfrak{m})\!\neq
   552    \emptyset,\\ (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s))
   503    \emptyset,\\ (V_{1}\!\setminus\!\mathfrak{D}(\mathfrak{m}))\!\times\!(V_{2}\!\setminus\!\mathfrak{R}(\mathfrak{m}))
   553    &\hspace{-0.15cm}otherwise.
   504    &\hspace{-0.15cm}\text{otherwise}.
   554   \end{cases}
   505   \end{cases}
   555 \]
   506 \]
   556 
   507 
   557 \subsection{Consistency}
   508 \subsection{Consistency}
   558 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is
   509 Suppose $p=(u,v)$, where $u\in V_{1}$ and $v\in V_{2}$, $\mathfrak{m}$ is a consistent mapping by
   559 a state of the matching procedure, $M(s)$ is consistent mapping by
   510 $PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether
   560 $PT$ and $lab(u)=lab(v)$.  $Cons_{PT}(p,M(s))$ checks whether
   511 including pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$.
   561 including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.
       
   562 
   512 
   563 For example, the consistency function of induced subgraph isomorphism is as follows.
   513 For example, the consistency function of induced subgraph isomorphism is as follows.
   564 \begin{notation}
   514 \begin{notation}
   565 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} :
   515 Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} :
   566 (u,\tilde{u})\in E_{small}\}$, and $\mathbf{\Gamma_{large}
   516 (u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2}
   567   (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$, where $u\in V_{small}$ and $v\in V_{large}$.
   517   (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$.
   568 \end{notation}
   518 \end{notation}
   569 
   519 
   570 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow
   520 $extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ $\Leftrightarrow
   571 (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small}
   521 (\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}
   572 \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$. The
   522 \Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. The
   573 following formulation gives an efficient way of calculating
   523 following formulation gives an efficient way of calculating
   574 $Cons_{IND}$.
   524 $Cons_{IND}$.
   575 \begin{claim}
   525 \begin{claim}
   576 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v)
   526 $Cons_{IND}((u,v),\mathfrak{m}):=\mathcal{L}(u)\!\!=\!\!\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge
   577   \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge
   527   (\forall \tilde{u}\in \Gamma_{1}(u)
   578   (\forall \tilde{u}\in \Gamma_{small}(u)
   528   \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is a
   579   \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
       
   580   consistency function in the case of $IND$.
   529   consistency function in the case of $IND$.
   581 \end{claim}
   530 \end{claim}
   582 
   531 
   583 \subsection{Cutting rules}
   532 \subsection{Cutting rules}
   584 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently
   533 $Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently
   585 verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can
   534 verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can
   586 be true only if it is impossible to extended $M(s)\cup \{p\}$ to a
   535 be true only if it is impossible to extend $extend(\mathfrak{m},p)$ to a
   587 whole mapping.
   536 whole mapping.
   588 
   537 
   589 As an example, the cutting function of induced subgraph isomorphism is presented.
   538 As an example, the cutting function of induced subgraph isomorphism is presented.
   590 \begin{notation}
   539 \begin{notation}
   591 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash
   540 Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\backslash
   592 M_{small}(s))\backslash T_{small}(s)$, and
   541 \mathfrak{D}(\mathfrak{m}))\backslash T_{1}(\mathfrak{m})$, and
   593 \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash
   542 \\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\backslash
   594 M_{large}(s))\backslash T_{large}(s)$.
   543 \mathfrak{R}(\mathfrak{m}))\backslash T_{2}(\mathfrak{m})$.
   595 \end{notation}
   544 \end{notation}
   596 
   545 
   597 \begin{claim}
   546 \begin{claim}
   598 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
   547 $Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| <
   599   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
   548   |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap
   600   \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap
   549   \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap
   601   \tilde{T}_{small}(s)|$ is a cutting function by $IND$.
   550   \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function by $IND$.
   602 \end{claim}
   551 \end{claim}
   603 
       
   604 
   552 
   605 \section{The VF2++ Algorithm}
   553 \section{The VF2++ Algorithm}
   606 Although any total ordering relation makes the search space of VF2 a
   554 Although any total ordering relation makes the search space of VF2 a
   607 tree, its choice turns out to dramatically influence the number of
   555 tree, its choice turns out to dramatically influence the number of
   608 visited states. The goal is to determine an efficient one as quickly
   556 visited states. The goal is to determine an efficient one as quickly
   623 
   571 
   624 Note that a weaker version of the cutting rules and the more efficient
   572 Note that a weaker version of the cutting rules and the more efficient
   625 candidate set calculating were described in \cite{VF2Plus}, too.
   573 candidate set calculating were described in \cite{VF2Plus}, too.
   626 
   574 
   627 It should be noted that all the methods described in this section are
   575 It should be noted that all the methods described in this section are
   628 extendable to handle directed graphs and edge labels as well.
   576 extendable to handle directed graphs and edge labels as well.\newline
   629 
   577 
   630 The basic ideas and the detailed description of VF2++ are provided in
   578 The basic ideas and the detailed description of VF2++ are provided in
   631 the following.
   579 the following.
   632 
   580 
   633 \subsection{Preparations}
       
   634 \begin{claim}
       
   635 \label{claim:claimCoverFromLeft}
       
   636 The total ordering relation uniquely determines a node order, in which
       
   637 the nodes of $V_{small}$ will be covered by VF2. From the point of
       
   638 view of the matching procedure, this means, that always the same node
       
   639 of $G_{small}$ will be covered on the d-th level.
       
   640 \end{claim}
       
   641 
       
   642 \begin{definition}
       
   643 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
       
   644 $V_{small}$ is \textbf{matching order} if exists $\prec$ total
       
   645 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
       
   646 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$.
       
   647 \end{definition}
       
   648 
       
   649 \begin{claim}\label{claim:MOclaim}
       
   650 A total ordering is matching order iff the nodes of every component
       
   651 form an interval in the node sequence, and every node connects to a
       
   652 previous node in its component except the first node of each component.
       
   653 \end{claim}
       
   654 
       
   655 To summing up, a total ordering always uniquely determines a matching
       
   656 order, and every matching order can be determined by a total ordering,
       
   657 however, more than one different total orderings may determine the
       
   658 same matching order.
       
   659 \subsection{Idea behind the algorithm}
       
   660 The goal is to find a matching order in which the algorithm is able to
   581 The goal is to find a matching order in which the algorithm is able to
   661 recognize inconsistency or prune the infeasible branches on the
   582 recognize inconsistency or prune the infeasible branches on the
   662 highest levels and goes deep only if it is needed.
   583 highest levels and goes deep only if it is needed.
   663 
   584 
   664 \begin{notation}
   585 \begin{notation}
   665 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the
   586 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{1}(u)\cap H\}|$, that is the
   666 number of neighbours of u which are in H, where $u\in V_{small} $ and
   587 number of neighbours of u which are in H, where $u\in V_{1} $ and
   667 $H\subseteq V_{small}$.
   588 $H\subseteq V_{1}$.
   668 \end{notation}
   589 \end{notation}
   669 
   590 
   670 The principal question is the following. Suppose a state $s$ is
   591 The principal question is the following. Suppose a mapping $\mathfrak{m}$ is
   671 given. For which node of $T_{small}(s)$ is the hardest to find a
   592 given. For which node of $T_{1}(\mathfrak{m})$ is the hardest to find a
   672 consistent pair in $G_{large}$? The more covered neighbours a node in
   593 consistent pair in $G_{2}$? The more covered neighbours a node in
   673 $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has
   594 $T_{1}(\mathfrak{m})$ has --- i.e. the largest $Conn_{\mathfrak{D}(\mathfrak{m})}$ it has
   674 ---, the more rarely satisfiable consistency constraints for its pair
   595 ---, the more rarely satisfiable consistency constraints for its pair
   675 are given.
   596 are given.
   676 
   597 
   677 In biology, most of the graphs are sparse, thus several nodes in
   598 In biology, most of the graphs are sparse, thus several nodes in
   678 $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes
   599 $T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes
   679 reasonable to define a secondary and a tertiary order between them.
   600 reasonable to define a secondary and a tertiary order between them.
   680 The observation above proves itself to be as determining, that the
   601 The observation above proves itself to be as determining, that the
   681 secondary ordering prefers nodes with the most uncovered neighbours
   602 secondary ordering prefers nodes with the most uncovered neighbours
   682 among which have the same $Conn_{M_{small}(s)}$ to increase
   603 among which have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$ to increase
   683 $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible.  The
   604 $Conn_{\mathfrak{D}(\mathfrak{m})}$ of uncovered nodes so much, as possible.  The
   684 tertiary ordering prefers nodes having the rarest uncovered labels.
   605 tertiary ordering prefers nodes having the rarest uncovered labels.
   685 
   606 
   686 Note that the secondary ordering is the same as the ordering by $deg$,
   607 Note that the secondary ordering is the same as the ordering by $deg$,
   687 which is a static data in front of the above used.
   608 which is a static data in front of the above used.
   688 
   609 
   689 These rules can easily result in a matching order which contains the
   610 These rules can easily result in a matching order which contains the
   690 nodes of a long path successively, whose nodes may have low $Conn$ and
   611 nodes of a long path successively, whose nodes may have low $Conn$ and
   691 is easily matchable into $G_{large}$. To avoid that, a BFS order is
   612 is easily matchable into $G_{2}$. To avoid that, a BFS order is
   692 used, which provides the shortest possible paths.
   613 used, which provides the shortest possible paths.
   693 \newline
   614 \newline
   694 
   615 
   695 In the following, some examples on which the VF2 may be slow are
   616 In the following, some examples on which the VF2 may be slow are
   696 described, although they are easily solvable by using a proper
   617 described, although they are easily solvable by using a proper
   697 matching order.
   618 matching order.
   698 
   619 
   699 \begin{example}
   620 \begin{example}
   700 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways
   621 Suppose $G_{1}$ can be mapped into $G_{2}$ in many ways
   701 without node labels. Let $u\in V_{small}$ and $v\in V_{large}$.
   622 without node labels. Let $u\in V_{1}$ and $v\in V_{2}$.
   702 \newline
   623 \newline
   703 $lab(u):=black$
   624 $\mathcal{L}(u):=black$
   704 \newline
   625 \newline
   705 $lab(v):=black$
   626 $\mathcal{L}(v):=black$
   706 \newline
   627 \newline
   707 $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash
   628 $\mathcal{L}(\tilde{u}):=red \ \forall \tilde{u}\in (V_{1}\backslash
   708 \{u\})$
   629 \{u\})$
   709 \newline
   630 \newline
   710 $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash
   631 $\mathcal{L}(\tilde{v}):=red \ \forall \tilde{v}\in (V_{2}\backslash
   711 \{v\})$
   632 \{v\})$
   712 \newline
   633 \newline
   713 
   634 
   714 Now, any mapping by the node label $lab$ must contain $(u,v)$, since
   635 Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since
   715 $u$ is black and no node in $V_{large}$ has a black label except
   636 $u$ is black and no node in $V_{2}$ has a black label except
   716 $v$. If unfortunately $u$ were the last node which will get covered,
   637 $v$. If unfortunately $u$ were the last node which will get covered,
   717 VF2 would check only in the last steps, whether $u$ can be matched to
   638 VF2 would check only in the last steps, whether $u$ can be matched to
   718 $v$.
   639 $v$.
   719 \newline
   640 \newline
   720 However, had $u$ been the first matched node, u would have been
   641 However, had $u$ been the first matched node, u would have been
   721 matched immediately to v, so all the mappings would have been
   642 matched immediately to v, so all the mappings would have been
   722 precluded in which node labels can not correspond.
   643 precluded in which node labels can not correspond.
   723 \end{example}
   644 \end{example}
   724 
   645 
   725 \begin{example}
   646 \begin{example}
   726 Suppose there is no node label given, $G_{small}$ is a small graph and
   647 Suppose there is no node label given, $G_{1}$ is a small graph and
   727 can not be mapped into $G_{large}$ and $u\in V_{small}$.
   648 can not be mapped into $G_{2}$ and $u\in V_{1}$.
   728 \newline
   649 \newline
   729 Let $G'_{small}:=(V_{small}\cup
   650 Let $G'_{1}:=(V_{1}\cup
   730 \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup
   651 \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup
   731 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
   652 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
   732 $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint
   653 $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint
   733 from $G_{small}$ and one of its starting points is connected to $u\in
   654 from $G_{1}$ and one of its starting points is connected to $u\in
   734 V_{small}\}$.
   655 V_{1}\}$.
   735 \newline
   656 \newline
   736 Is there a subgraph of $G_{large}$, which is isomorph with
   657 Is there a subgraph of $G_{2}$, which is isomorph with
   737 $G'_{small}$?
   658 $G'_{1}$?
   738 \newline
   659 \newline
   739 If unfortunately the nodes of the path were the first $k$ nodes in the
   660 If unfortunately the nodes of the path were the first $k$ nodes in the
   740 matching order, the algorithm would iterate through all the possible k
   661 matching order, the algorithm would iterate through all the possible k
   741 long paths in $G_{large}$, and it would recognize that no path can be
   662 long paths in $G_{2}$, and it would recognize that no path can be
   742 extended to $G'_{small}$.
   663 extended to $G'_{1}$.
   743 \newline
   664 \newline
   744 However, had it started by the matching of $G_{small}$, it would not
   665 However, had it started by the matching of $G_{1}$, it would not
   745 have matched any nodes of the path.
   666 have matched any nodes of the path.
   746 \end{example}
   667 \end{example}
   747 
   668 
   748 These examples may look artificial, but the same problems also appear
   669 These examples may look artificial, but the same problems also appear
   749 in real-world instances, even though in a less obvious way.
   670 in real-world instances, even though in a less obvious way.
   750 
   671 
       
   672 \subsection{Preparations}
       
   673 \begin{claim}
       
   674 \label{claim:claimCoverFromLeft}
       
   675 The total ordering relation uniquely determines a node order, in which
       
   676 the nodes of $V_{1}$ will be covered by VF2. From the point of
       
   677 view of the matching procedure, this means, that always the same node
       
   678 of $G_{1}$ will be covered on the d-th level.
       
   679 \end{claim}
       
   680 
       
   681 \begin{definition}
       
   682 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of
       
   683 $V_{1}$ is \textbf{matching order} if exists $\prec$ total
       
   684 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
       
   685 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{1}|\}$.
       
   686 \end{definition}
       
   687 
       
   688 \begin{claim}\label{claim:MOclaim}
       
   689 A total ordering is matching order iff the nodes of every component
       
   690 form an interval in the node sequence, and every node connects to a
       
   691 previous node in its component except the first node of each component.
       
   692 \end{claim}
       
   693 
       
   694 To summing up, a total ordering always uniquely determines a matching
       
   695 order, and every matching order can be determined by a total ordering,
       
   696 however, more than one different total orderings may determine the
       
   697 same matching order.
       
   698 
   751 \subsection{Total ordering}
   699 \subsection{Total ordering}
   752 Instead of the total ordering relation, the matching order will be
   700 The matching order will be searched directly.
   753 searched directly.
       
   754 \begin{notation}
   701 \begin{notation}
   755 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} :
   702 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} :
   756 l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ ,
   703 l=\mathcal{L}(v)\}|-|\{u\in V_{1}\backslash \mathcal{M} : l=\mathcal{L}(u)\}|$ ,
   757 where $l$ is a label and $\mathcal{M}\subseteq V_{small}$.
   704 where $l$ is a label and $\mathcal{M}\subseteq V_{1}$.
   758 \end{notation}
   705 \end{notation}
   759 
   706 
   760 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
   707 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
   761 \end{definition}
   708 \end{definition}
   762 
   709 
   766 \algtext*{EndWhile}
   713 \algtext*{EndWhile}
   767 \algtext*{EndFor}
   714 \algtext*{EndFor}
   768 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
   715 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
   769 \begin{algorithmic}[1]
   716 \begin{algorithmic}[1]
   770 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$
   717 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$
   771 \Comment{matching order} \While{$V_{small}\backslash \mathcal{M}
   718 \Comment{matching order} \While{$V_{1}\backslash \mathcal{M}
   772   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
   719   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
   773 min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash
   720 min$_{F_\mathcal{M}\circ \mathcal{L}}(V_{1}\backslash
   774 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
   721 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
   775 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
   722 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
   776 $d$-th level \State Process $V_d$ \Comment{See Algorithm
   723 $d$-th level \State Process $V_d$ \Comment{See Algorithm
   777   \ref{alg:VF2PPProcess1}} \EndFor
   724   \ref{alg:VF2PPProcess1}} \EndFor
   778 \EndWhile \EndProcedure
   725 \EndWhile \EndProcedure
   784 \algtext*{EndProcedure}%ne nyomtasson ..
   731 \algtext*{EndProcedure}%ne nyomtasson ..
   785 \algtext*{EndWhile}
   732 \algtext*{EndWhile}
   786 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
   733 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
   787 \begin{algorithmic}[1]
   734 \begin{algorithmic}[1]
   788 \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$}
   735 \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$}
   789 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg
   736 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ \mathcal{L}}($ arg max$_{deg}($arg
   790 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
   737 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
   791 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
   738 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
   792 $F_\mathcal{M}$ \EndWhile \EndProcedure
   739 $F_\mathcal{M}$ \EndWhile \EndProcedure
   793 \end{algorithmic}
   740 \end{algorithmic}
   794 \end{algorithm}
   741 \end{algorithm}
   795 
   742 
   796 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
   743 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
   797 matching order procedure of VF2++. It computes a BFS tree for each
   744 matching order procedure of VF2++. It computes a BFS tree for each
   798 component in ascending order of their rarest $lab$ and largest $deg$,
   745 component in ascending order of their rarest node labels and largest $deg$,
   799 whose root vertex is the component's minimal
   746 whose root vertex is the component's minimal
   800 node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending
   747 node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending
   801 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately
   748 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately
   802 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately.
   749 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately.
   803 
   750 
   805 provides a matching order.
   752 provides a matching order.
   806 
   753 
   807 
   754 
   808 \subsection{Cutting rules}
   755 \subsection{Cutting rules}
   809 \label{VF2PPCuttingRules}
   756 \label{VF2PPCuttingRules}
   810 This section presents the cutting rule of VF2++ in the case of IND, which is improved
   757 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels.
   811 by using extra information coming from the node labels. For other problem types, the rules can be formulated similarly.
       
   812 \begin{notation}
   758 \begin{notation}
   813 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l
   759 Let $\mathbf{\Gamma_{1}^{l}(u)}:=\{\tilde{u} : \mathcal{L}(\tilde{u})=l
   814 \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and
   760 \wedge \tilde{u}\in \Gamma_{1} (u)\}$ and
   815 $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge
   761 $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge
   816 \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in
   762 \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in
   817 V_{large}$ and $l$ is a label.
   763 V_{2}$ and $l$ is a label.
   818 \end{notation}
   764 \end{notation}
   819 
   765 
       
   766 \subsubsection{Induced subgraph isomorphism}
   820 \begin{claim}
   767 \begin{claim}
   821 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND.
   768 \[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by IND.
   822 \end{claim}
   769 \end{claim}
   823 
   770 \subsubsection{Graph isomorphism}
   824 
   771 \begin{claim}
   825 \subsection{Implementation details}
   772 \[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by ISO.
       
   773 \end{claim}
       
   774 
       
   775 \subsubsection{Subgraph isomorphism}
       
   776 \begin{claim}
       
   777 \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function by SUB.
       
   778 \end{claim}
       
   779 
       
   780 
       
   781 
       
   782 \section{Implementation details}
   826 This section provides a detailed summary of an efficient
   783 This section provides a detailed summary of an efficient
   827 implementation of VF2++.
   784 implementation of VF2++.
   828 \subsubsection{Storing a mapping}
   785 \subsubsection{Storing a mapping}
   829 After fixing an arbitrary node order ($u_0, u_1, ..,
   786 After fixing an arbitrary node order ($u_0, u_1, ..,
   830 u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store
   787 u_{|G_{1}|-1}$) of $G_{1}$, an array $M$ is usable to store
   831 the current mapping in the following way.
   788 the current mapping in the following way.
   832 \[
   789 \[
   833  M[i] =
   790  M[i] =
   834   \begin{cases} 
   791   \begin{cases} 
   835    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INVALID &
   792    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INV\!ALI\!D &
   836    if\ no\ node\ has\ been\ mapped\ to\ u_i,
   793    if\ no\ node\ has\ been\ mapped\ to\ u_i,
   837   \end{cases}
   794   \end{cases}
   838 \]
   795 \]
   839 where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$
   796 where $i\in\{0,1, ..,|G_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$
   840 means "no node".
   797 means "no node".
   841 \subsubsection{Avoiding the recurrence}
   798 \subsubsection{Avoiding the recurrence}
   842 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
   799 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
   843 as a \textit{while loop}, which has a loop counter $depth$ denoting the
   800 as a \textit{while loop}, which has a loop counter $depth$ denoting the
   844 all-time depth of the recursion. Fixing a matching order, let $M$
   801 all-time depth of the recursion. Fixing a matching order, let $M$
   845 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft},
   802 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft},
   846 $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before
   803 $M$ is $INV\!ALI\!D$ from index $depth$+1 and not $INV\!ALI\!D$ before
   847 $depth$. $M[depth]$ changes
   804 $depth$. $M[depth]$ changes
   848 while the state is being processed, but the property is held before
   805 while the state is being processed, but the property is held before
   849 both stepping back to a predecessor state and exploring a successor
   806 both stepping back to a predecessor state and exploring a successor
   850 state.
   807 state.
   851 
   808 
   856 section for details.
   813 section for details.
   857 
   814 
   858 \subsubsection{Calculating the candidates for a node}
   815 \subsubsection{Calculating the candidates for a node}
   859 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
   816 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
   860 task is not to maintain the candidate set, but to generate the
   817 task is not to maintain the candidate set, but to generate the
   861 candidate nodes in $G_{large}$ for a given node $u\in V_{small}$.  In
   818 candidate nodes in $G_{2}$ for a given node $u\in V_{1}$.  In
   862 case of any of the three problem types and a mapping $M$ if a node $v\in
   819 case of any of the three problem types and a mapping $M$ if a node $v\in
   863 V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall
   820 V_{2}$ is a potential pair of $u\in V_{1}$, then $\forall
   864 u'\in V_{small} : (u,u')\in
   821 u'\in V_{1} : (u,u')\in
   865 E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in
   822 E_{1}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in
   866 E_{large}$. That is, each covered neighbour of $u$ has to be mapped to
   823 E_{2}$. That is, each covered neighbour of $u$ has to be mapped to
   867 a covered neighbour of $v$.
   824 a covered neighbour of $v$.
   868 
   825 
   869 Having said that, an algorithm running in $\Theta(deg)$ time is
   826 Having said that, an algorithm running in $\Theta(deg)$ time is
   870 describable if there exists a covered node in the component containing
   827 describable if there exists a covered node in the component containing
   871 $u$, and a linear one otherwise.
   828 $u$, and a linear one otherwise.
   877 
   834 
   878 For using lookup tables, the node labels are associated with the
   835 For using lookup tables, the node labels are associated with the
   879 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
   836 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
   880 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
   837 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
   881 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
   838 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
   882 in $V_{small}$ having label i, which is easy to compute in
   839 in $V_{1}$ having label $i$, which is easy to compute in
   883 $\Theta(|V_{small}|)$ steps.
   840 $\Theta(|V_{1}|)$ steps.
   884 
   841 
   885 Representing $\mathcal{M}\subseteq V_{small}$ as an array of
   842 Representing $\mathcal{M}\subseteq V_{1}$ as an array of
   886 size $|V_{small}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes.
   843 size $|V_{1}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes.
   887 
   844 
   888 \subsubsection{Cutting rules}
   845 \subsubsection{Cutting rules}
   889 In Section~\ref{VF2PPCuttingRules}, the cutting rules were
   846 In Section~\ref{VF2PPCuttingRules}, the cutting rules were
   890 described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$
   847 described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$
   891 and $\tilde T_{large}$, which are dependent on the all-time mapping
   848 and $\tilde T_{2}$, which are dependent on the all-time mapping
   892 (i.e. on the all-time state). The aim is to check the labeled cutting
   849 (i.e. on the all-time state). The aim is to check the labeled cutting
   893 rules of VF2++ in $\Theta(deg)$ time.
   850 rules of VF2++ in $\Theta(deg)$ time.
   894 
   851 
   895 Firstly, suppose that these four sets are given in such a way, that
   852 Firstly, suppose that these four sets are given in such a way, that
   896 checking whether a node is in a certain set takes constant time,
   853 checking whether a node is in a certain set takes constant time,
   897 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
   854 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
   898 initially zero integer lookup table of size $|K|$. After incrementing
   855 initially zero integer lookup table of size $|K|$. After incrementing
   899 $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and
   856 $L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and
   900 decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap
   857 decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap
   901 T_{large}(s)$, the first part of the cutting rules is checkable in
   858 T_{2}(s)$, the first part of the cutting rules is checkable in
   902 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
   859 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
   903 to zero takes $\Theta(deg)$ time again, which makes it possible to use
   860 to zero takes $\Theta(deg)$ time again, which makes it possible to use
   904 the same table through the whole algorithm. The second part of the
   861 the same table through the whole algorithm. The second part of the
   905 cutting rules can be verified using the same method with $\tilde
   862 cutting rules can be verified using the same method with $\tilde
   906 T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and
   863 T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and
   907 $T_{large}$. Thus, the overall complexity is $\Theta(deg)$.
   864 $T_{2}$. Thus, the overall complexity is $\Theta(deg)$.
   908 
   865 
   909 An other integer lookup table storing the number of covered neighbours
   866 Another integer lookup table storing the number of covered neighbours
   910 of each node in $G_{large}$ gives all the information about the sets
   867 of each node in $G_{2}$ gives all the information about the sets
   911 $T_{large}$ and $\tilde T_{large}$, which is maintainable in
   868 $T_{2}$ and $\tilde T_{2}$, which is maintainable in
   912 $\Theta(deg)$ time when a pair is added or substracted by incrementing
   869 $\Theta(deg)$ time when a pair is added or substracted by incrementing
   913 or decrementing the proper indices. A further improvement is that the
   870 or decrementing the proper indices. A further improvement is that the
   914 values of $L[lab(u')]$ in case of checking $u$ is dependent only on
   871 values of $L[\mathcal{L}(u')]$ in case of checking $u$ are dependent only on
   915 $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an
   872 $u$, i.e. on the size of the mapping, so for each $u\in V_{1}$ an
   916 array of pairs (label, number of such labels) can be stored to skip
   873 array of pairs (label, number of such labels) can be stored to skip
   917 the maintaining operations. Note that these arrays are at most of size
   874 the maintaining operations. Note that these arrays are at most of size
   918 $deg$. Skipping this trick, the number of covered neighbours has to be
   875 $deg$.
   919 stored for each node of $G_{small}$ as well to get the sets
   876 
   920 $T_{small}$ and $\tilde T_{small}$.
   877 Using similar techniques, the consistency function can be evaluated in
   921 
       
   922 Using similar tricks, the consistency function can be evaluated in
       
   923 $\Theta(deg)$ steps, as well.
   878 $\Theta(deg)$ steps, as well.
   924 
   879 
   925 \section{The VF2 Plus Algorithm}
       
   926 The VF2 Plus algorithm is a recently improved version of VF2. It was
       
   927 compared with the state of the art algorithms in \cite{VF2Plus} and
       
   928 has proved itself to be competitive with RI, the best algorithm on
       
   929 biological graphs.  \\ A short summary of VF2 Plus follows, which uses
       
   930 the notation and the conventions of the original paper.
       
   931 
       
   932 \subsection{Ordering procedure}
       
   933 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$
       
   934 with the lowest probability to find a pair in $V_{small}$ and the
       
   935 highest number of connections with the nodes already sorted by the
       
   936 algorithm.
       
   937 
       
   938 \begin{definition}
       
   939 $(u,v)$ is a \textbf{feasible pair} if $lab(u)=lab(v)$ and
       
   940   $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$.
       
   941 \end{definition}
       
   942 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in
       
   943 $V_{large}$
       
   944 \newline
       
   945 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in
       
   946 $V_{large}$
       
   947 \newline
       
   948 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ $M$ is the set of
       
   949 already sorted nodes, $T$ is the set of nodes candidate to be
       
   950 selected, and $degreeM$ of a node is the number of its neighbours in
       
   951 $M$.
       
   952 \begin{algorithm}
       
   953 \algtext*{EndIf}%ne nyomtasson end if-et
       
   954 \algtext*{EndFor}%nenyomtasson ..  
       
   955 \algtext*{EndProcedure}%ne nyomtasson ..
       
   956 \algtext*{EndWhile}
       
   957 \caption{}\label{alg:VF2PlusPseu}
       
   958 \begin{algorithmic}[1]
       
   959 \Procedure{VF2 Plus order}{} \State Select the node with the lowest
       
   960 $P$.  \If {more nodes share the same $P$} \State select the one with
       
   961 maximum degree \EndIf \If {more nodes share the same $P$ and have the
       
   962   max degree} \State select the first \EndIf \State Put the selected
       
   963 node in the set $M$. \label{alg:putIn} \State Put all its unsorted
       
   964 neighbours in the set $T$.  \If {$M\neq V_{small}$} \State From set
       
   965 $T$ select the node with maximum $degreeM$.  \If {more nodes have
       
   966   maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf
       
   967 \If {more nodes have maximum $degreeM$ and $P$} \State Select the
       
   968 first.  \EndIf \State \textbf{goto \ref{alg:putIn}.}  \EndIf
       
   969 \EndProcedure
       
   970 \end{algorithmic}
       
   971 \end{algorithm}
       
   972 
       
   973 Using these notations, Algorithm~\ref{alg:VF2PlusPseu}
       
   974 provides the description of the sorting procedure.
       
   975 
       
   976 Note that $P(u)$ is not the exact probability of finding a consistent
       
   977 pair for $u$ by choosing a node of $V_{large}$ randomly, since
       
   978 $P_{lab}$ and $P_{deg}$ are not independent, though calculating the
       
   979 real probability would take quadratic time, which may be reduced by
       
   980 using fittingly lookup tables.
       
   981 
       
   982 \section{Experimental results}
   880 \section{Experimental results}
   983 This section compares the performance of VF2++ and VF2 Plus. Both
   881 This section compares the performance of VF2++ and VF2 Plus. According to
   984 algorithms have run faster with orders of magnitude than VF2, thus its
   882 our experience, both algorithms run faster than VF2 with orders of
   985 inclusion was not reasonable.
   883 magnitude, thus its inclusion was not reasonable.
       
   884 
       
   885 The algorithms were implemented in C++ using the open source
       
   886 LEMON graph and network optimization library\cite{LEMON}. The test were carried out on a linux based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM.
   986 \subsection{Biological graphs}
   887 \subsection{Biological graphs}
   987 The tests have been executed on a recent biological dataset created
   888 The tests have been executed on a recent biological dataset created
   988 for the International Contest on Pattern Search in Biological
   889 for the International Contest on Pattern Search in Biological
   989 Databases\cite{Content}, which has been constructed of molecule,
   890 Databases\cite{Content}, which has been constructed of molecule,
   990 protein and contact map graphs extracted from the Protein Data
   891 protein and contact map graphs extracted from the Protein Data
   994 and an average degree of less than 3. The protein dataset contains
   895 and an average degree of less than 3. The protein dataset contains
   995 graphs having 500-10 000 nodes and an average degree of 4, while the
   896 graphs having 500-10 000 nodes and an average degree of 4, while the
   996 contact map dataset contains graphs with 150-800 nodes and an average
   897 contact map dataset contains graphs with 150-800 nodes and an average
   997 degree of 20.  \\
   898 degree of 20.  \\
   998 
   899 
   999 In the following, the induced subgraph isomorphism and the graph
   900 In the following, both the induced subgraph isomorphism and the graph
  1000 isomorphism will be examined.
   901 isomorphism will be examined.
  1001 
   902 
  1002 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For runtime results, please see Figure~\ref{fig:bioIND}.
   903 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For runtime results, please see Figure~\ref{fig:bioIND}.
  1003 
   904 
  1004 In an other experiment, the nodes of each graph in the database had been
   905 In an other experiment, the nodes of each graph in the database had been
  1256 graph, that of the small graph dramatically influences the hardness of
  1157 graph, that of the small graph dramatically influences the hardness of
  1257 a given problem too, so the overall picture is provided by examining
  1158 a given problem too, so the overall picture is provided by examining
  1258 small graphs of various size.
  1159 small graphs of various size.
  1259 
  1160 
  1260 For each chart, a number $0<\rho< 1$ has been fixed, and the following
  1161 For each chart, a number $0<\rho< 1$ has been fixed, and the following
  1261 has been executed 150 times. Generating a large graph $G_{large}$ of an average degree of $\delta$,
  1162 has been executed 150 times. Generating a large graph $G_{2}$ of an average degree of $\delta$,
  1262 choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes,
  1163 choose 10 of its induced subgraphs having $\rho\ |V_{2}|$ nodes,
  1263 and for all the 10 subgraphs find a mapping by using both the graph
  1164 and for all the 10 subgraphs find a mapping by using both the graph
  1264 matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
  1165 matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
  1265 0.3, 0.6, 0.8, 0.95$ cases have been examined, see
  1166 0.3, 0.6, 0.8, 0.95$ cases have been examined, see
  1266 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
  1167 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
  1267 \ref{fig:randIND35}.
  1168 \ref{fig:randIND35}.
  1579 
  1480 
  1580 
  1481 
  1581 
  1482 
  1582 
  1483 
  1583 \section{Conclusion}
  1484 \section{Conclusion}
  1584 In this paper, after providing a short summary of the recent
  1485 This paper presented VF2++, a new graph matching algorithm based on VF2, called VF2++, and analyzed it from a practical viewpoint.
  1585 algorithms, a new graph matching algorithm based on VF2, called VF2++,
       
  1586 has been presented and analyzed from a practical viewpoint.
       
  1587 
  1486 
  1588 Recognizing the importance of the node order and determining an
  1487 Recognizing the importance of the node order and determining an
  1589 efficient one, VF2++ is able to match graphs of thousands of nodes in
  1488 efficient one, VF2++ is able to match graphs of thousands of nodes in
  1590 near practically linear time including preprocessing. In addition to
  1489 near practically linear time including preprocessing. In addition to
  1591 the proper order, VF2++ uses more efficient consistency and cutting
  1490 the proper order, VF2++ uses more efficient consistency and cutting
  1592 rules which are easy to compute and make the algorithm able to prune
  1491 rules which are easy to compute and make the algorithm able to prune
  1593 most of the unfruitful branches without going astray.
  1492 most of the unfruitful branches without going astray.
  1594 
  1493 
  1595 In order to show the efficiency of the new method, it has been
  1494 In order to show the efficiency of the new method, it has been
  1596 compared to VF2 Plus, which is the best concurrent algorithm based on
  1495 compared to VF2 Plus\cite{VF2Plus}, which is the best contemporary algorithm.
  1597 \cite{VF2Plus}.
  1496 .
  1598 
  1497 
  1599 The experiments show that VF2++ consistently outperforms VF2 Plus on
  1498 The experiments show that VF2++ consistently outperforms VF2 Plus on
  1600 biological graphs. It seems to be asymptotically faster on protein and
  1499 biological graphs. It seems to be asymptotically faster on protein and
  1601 on contact map graphs in the case of induced subgraph isomorphism,
  1500 on contact map graphs in the case of induced subgraph isomorphism,
  1602 while in the case of graph isomorphism, it has definitely better
  1501 while in the case of graph isomorphism, it has definitely better
  1603 asymptotic behaviour on protein graphs.
  1502 asymptotic behaviour on protein graphs.
  1604 
  1503 
  1605 Regarding random sparse graphs, not only has VF2++ proved itself to be
  1504 Regarding random sparse graphs, not only has VF2++ proved itself to be
  1606 faster than VF2 Plus, but it has a practically linear behaviour both
  1505 faster than VF2 Plus, but it also has a practically linear behaviour both
  1607 in the case of induced subgraph- and graph isomorphism, as well.
  1506 in the case of induced subgraph- and graph isomorphism.
  1608 
  1507 
  1609 
  1508 
  1610 
  1509 
  1611 %% The Appendices part is started with the command \appendix;
  1510 %% The Appendices part is started with the command \appendix;
  1612 %% appendix sections are then done as normal sections
  1511 %% appendix sections are then done as normal sections