damecco.tex
changeset 3 550f81b2f81c
parent 2 20d1b0e5838f
child 4 c682d3237141
equal deleted inserted replaced
2:3b7777a4db4b 3:2aabc34b45c7
   176 
   176 
   177 %% main text
   177 %% main text
   178 \section{Introduction}
   178 \section{Introduction}
   179 \label{sec:intro}
   179 \label{sec:intro}
   180 
   180 
   181 In the last decades, combinatorial structures, and especially graphs have been considered with ever increasing interest, and applied to the solution of several new and revised questions.
   181 In the last decades, combinatorial structures, and especially graphs
   182 The expressiveness, the simplicity and the studiedness of graphs make them practical for modelling and appear constantly in several seemingly independent fields.
   182 have been considered with ever increasing interest, and applied to the
   183 Bioinformatics and chemistry are amongst the most relevant and most important fields.
   183 solution of several new and revised questions.  The expressiveness,
   184 
   184 the simplicity and the studiedness of graphs make them practical for
   185 Complex biological systems arise from the interaction and cooperation of plenty of molecular components. Getting acquainted with such systems at the molecular level has primary importance, since protein-protein interaction, DNA-protein interaction, metabolic interaction, transcription factor binding, neuronal networks, and hormone signaling networks can be understood only this way.
   185 modelling and appear constantly in several seemingly independent
   186 
   186 fields.  Bioinformatics and chemistry are amongst the most relevant
   187 For instance, a molecular structure can be considered as a graph, whose nodes correspond to atoms and whose edges to chemical bonds. The secondary structure of a protein can also be represented as a graph, where nodes are associated with aminoacids and the edges with hydrogen bonds. The nodes are often whole molecular components and the edges represent some relationships among them.
   187 and most important fields.
   188 The similarity and dissimilarity of objects corresponding to nodes are incorporated to the model by \emph{node labels}.
   188 
   189 Many other chemical and biological structures can easily be modeled in a similar way. Understanding such networks basically requires finding specific subgraphs, which can not avoid the application of graph matching algorithms.
   189 Complex biological systems arise from the interaction and cooperation
   190 
   190 of plenty of molecular components. Getting acquainted with such
   191 Finally, let some of the other real-world fields related to some variants of graph matching be briefly mentioned: pattern recognition and machine vision \cite{HorstBunkeApplications}, symbol recognition \cite{CordellaVentoSymbolRecognition}, face identification \cite{JianzhuangYongFaceIdentification}.
   191 systems at the molecular level has primary importance, since
   192 \\
   192 protein-protein interaction, DNA-protein interaction, metabolic
   193 
   193 interaction, transcription factor binding, neuronal networks, and
   194 Subgraph and induced subgraph matching problems are known to be NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is one of the few problems in NP neither known to be in P nor NP-Complete. Although polynomial time isomorphism algorithms are known for various graph classes, like trees and planar graphs\cite{PlanarGraphIso}, bounded valence graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso} or permutation graphs\cite{PermGraphIso}.
   194 hormone signaling networks can be understood only this way.
   195 
   195 
   196 In the following, some algorithms based on other approaches are summarized, which do not need any restrictions on the graphs. However, an overall polynomial behaviour is not expectable from such an alternative, it may often have good performance, even on a graph class for which polynomial algorithm is known. Note that this summary containing only exact matching algorithms is far not complete, neither does it cover all the recent algorithms.
   196 For instance, a molecular structure can be considered as a graph,
   197 
   197 whose nodes correspond to atoms and whose edges to chemical bonds. The
   198 The first practically usable approach was due to \textbf{Ullmann}\cite{Ullmann} which is a commonly used depth-first search based algorithm with a complex heuristic for reducing the number of visited states. A major problem is its $\Theta(n^3)$ space complexity, which makes it impractical in the case of big sparse graphs.
   198 secondary structure of a protein can also be represented as a graph,
   199 
   199 where nodes are associated with aminoacids and the edges with hydrogen
   200 In a recent paper, \textbf{Ullmann}\cite{UllmannBit} presents an improved version of this algorithm based on a bit-vector solution for the binary Constraint Satisfaction Problem.
   200 bonds. The nodes are often whole molecular components and the edges
   201 
   201 represent some relationships among them.  The similarity and
   202 The \textbf{Nauty} algorithm\cite{Nauty} transforms the two graphs to a canonical form before starting to check for the isomorphism. It has been considered as one of the fastest graph isomorphism algorithms, although graph categories were shown in which it takes exponentially many steps. This algorithm handles only the graph isomorphism problem.
   202 dissimilarity of objects corresponding to nodes are incorporated to
   203 
   203 the model by \emph{node labels}.  Many other chemical and biological
   204 The \textbf{LAD} algorithm\cite{Lad} uses a depth-first search strategy and formulates the matching as a Constraint Satisfaction Problem to prune the search tree. The constraints are that the mapping has to be injective and edge-preserving, hence it is possible to handle new matching types as well.
   204 structures can easily be modeled in a similar way. Understanding such
   205 
   205 networks basically requires finding specific subgraphs, which can not
   206 The \textbf{RI} algorithm\cite{RI} and its variations are based on a state space representation. After reordering the nodes of the graphs, it uses some fast executable heuristic checks without using any complex pruning rules. It seems to run really efficiently on graphs coming from biology, and won the International Contest on Pattern Search in Biological Databases\cite{Content}. 
   206 avoid the application of graph matching algorithms.
   207 
   207 
   208 The currently most commonly used algorithm is the \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was designed for solving pattern matching and computer vision problems, and has been one of the best overall algorithms for more than a decade. Although, it can't be up to new specialized algorithms, it is still widely used due to its simplicity and space efficiency. VF2 uses a state space representation and checks some conditions in each state to prune the search tree.
   208 Finally, let some of the other real-world fields related to some
   209 
   209 variants of graph matching be briefly mentioned: pattern recognition
   210 Our first graph matching algorithm was the first version of VF2 which recognizes the significance of the node ordering, more opportunities to increase the cutting efficiency and reduce its computational complexity. This project was initiated and sponsored by QuantumBio Inc.\cite{QUANTUMBIO} and the implementation --- along with a source code --- has been published as a part of LEMON\cite{LEMON} open source graph library.
   210 and machine vision \cite{HorstBunkeApplications}, symbol recognition
   211 
   211 \cite{CordellaVentoSymbolRecognition}, face identification
   212 This thesis introduces \textbf{VF2++}, a new further improved algorithm for the graph and (induced)subgraph isomorphism problem, which uses efficient cutting rules and determines a node order in which VF2 runs significantly faster on practical inputs.
   212 \cite{JianzhuangYongFaceIdentification}.  \\
   213 
   213 
   214 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has been published. It is considered to be as efficient as the RI algorithm and has a strictly better behavior on large graphs.  The main idea of VF2 Plus is to precompute a heuristic node order of the small graph, in which the VF2 works more efficiently.\newline
   214 Subgraph and induced subgraph matching problems are known to be
   215 \newline
   215 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is
       
   216 one of the few problems in NP neither known to be in P nor
       
   217 NP-Complete. Although polynomial time isomorphism algorithms are known
       
   218 for various graph classes, like trees and planar
       
   219 graphs\cite{PlanarGraphIso}, bounded valence
       
   220 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
       
   221 or permutation graphs\cite{PermGraphIso}.
       
   222 
       
   223 In the following, some algorithms based on other approaches are
       
   224 summarized, which do not need any restrictions on the graphs. However,
       
   225 an overall polynomial behaviour is not expectable from such an
       
   226 alternative, it may often have good performance, even on a graph class
       
   227 for which polynomial algorithm is known. Note that this summary
       
   228 containing only exact matching algorithms is far not complete, neither
       
   229 does it cover all the recent algorithms.
       
   230 
       
   231 The first practically usable approach was due to
       
   232 \textbf{Ullmann}\cite{Ullmann} which is a commonly used depth-first
       
   233 search based algorithm with a complex heuristic for reducing the
       
   234 number of visited states. A major problem is its $\Theta(n^3)$ space
       
   235 complexity, which makes it impractical in the case of big sparse
       
   236 graphs.
       
   237 
       
   238 In a recent paper, \textbf{Ullmann}\cite{UllmannBit} presents an
       
   239 improved version of this algorithm based on a bit-vector solution for
       
   240 the binary Constraint Satisfaction Problem.
       
   241 
       
   242 The \textbf{Nauty} algorithm\cite{Nauty} transforms the two graphs to
       
   243 a canonical form before starting to check for the isomorphism. It has
       
   244 been considered as one of the fastest graph isomorphism algorithms,
       
   245 although graph categories were shown in which it takes exponentially
       
   246 many steps. This algorithm handles only the graph isomorphism problem.
       
   247 
       
   248 The \textbf{LAD} algorithm\cite{Lad} uses a depth-first search
       
   249 strategy and formulates the matching as a Constraint Satisfaction
       
   250 Problem to prune the search tree. The constraints are that the mapping
       
   251 has to be injective and edge-preserving, hence it is possible to
       
   252 handle new matching types as well.
       
   253 
       
   254 The \textbf{RI} algorithm\cite{RI} and its variations are based on a
       
   255 state space representation. After reordering the nodes of the graphs,
       
   256 it uses some fast executable heuristic checks without using any
       
   257 complex pruning rules. It seems to run really efficiently on graphs
       
   258 coming from biology, and won the International Contest on Pattern
       
   259 Search in Biological Databases\cite{Content}.
       
   260 
       
   261 The currently most commonly used algorithm is the
       
   262 \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was
       
   263 designed for solving pattern matching and computer vision problems,
       
   264 and has been one of the best overall algorithms for more than a
       
   265 decade. Although, it can't be up to new specialized algorithms, it is
       
   266 still widely used due to its simplicity and space efficiency. VF2 uses
       
   267 a state space representation and checks some conditions in each state
       
   268 to prune the search tree.
       
   269 
       
   270 Our first graph matching algorithm was the first version of VF2 which
       
   271 recognizes the significance of the node ordering, more opportunities
       
   272 to increase the cutting efficiency and reduce its computational
       
   273 complexity. This project was initiated and sponsored by QuantumBio
       
   274 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
       
   275 code --- has been published as a part of LEMON\cite{LEMON} open source
       
   276 graph library.
       
   277 
       
   278 This paper introduces \textbf{VF2++}, a new further improved algorithm
       
   279 for the graph and (induced)subgraph isomorphism problem, which uses
       
   280 efficient cutting rules and determines a node order in which VF2 runs
       
   281 significantly faster on practical inputs.
       
   282 
       
   283 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has
       
   284 been published. It is considered to be as efficient as the RI
       
   285 algorithm and has a strictly better behavior on large graphs.  The
       
   286 main idea of VF2 Plus is to precompute a heuristic node order of the
       
   287 small graph, in which the VF2 works more efficiently.
   216 
   288 
   217 \section{Problem Statement}
   289 \section{Problem Statement}
   218 This section provides a detailed description of the problems to be solved.
   290 This section provides a detailed description of the problems to be
       
   291 solved.
   219 \subsection{Definitions}
   292 \subsection{Definitions}
   220 
   293 
   221 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.
   294 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and
       
   295 $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.
   222 \begin{definition}\label{sec:ismorphic}
   296 \begin{definition}\label{sec:ismorphic}
   223 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M: V_{small} \longrightarrow V_{large}$ bijection, for which the following is true:
   297 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M:
   224 \begin{center}
   298   V_{small} \longrightarrow V_{large}$ bijection, for which the
   225 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (M(u),M(v))\in{E_{large}}$
   299   following is true:
       
   300 \begin{center}
       
   301 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
       
   302   (M(u),M(v))\in{E_{large}}$
   226 \end{center}
   303 \end{center}
   227 \end{definition}
   304 \end{definition}
   228 For the sake of simplicity in this paper subgraphs and induced subgraphs are defined in a more general way than usual:
   305 For the sake of simplicity in this paper subgraphs and induced
       
   306 subgraphs are defined in a more general way than usual:
   229 \begin{definition}
   307 \begin{definition}
   230 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I: V_{small}\longrightarrow V_{large}$ injection, for which the following is true:
   308 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I:
       
   309   V_{small}\longrightarrow V_{large}$ injection, for which the
       
   310   following is true:
   231 \begin{center}
   311 \begin{center}
   232 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$
   312 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$
   233 \end{center}
   313 \end{center}
   234 \end{definition}
   314 \end{definition}
   235 
   315 
   236 \begin{definition} 
   316 \begin{definition} 
   237 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists I: V_{small}\longrightarrow V_{large}$ injection, for which the following is true:
   317 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists
   238 \begin{center}
   318   I: V_{small}\longrightarrow V_{large}$ injection, for which the
   239 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (I(u),I(v))\in E_{large}$
   319   following is true:
       
   320 \begin{center}
       
   321 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
       
   322   (I(u),I(v))\in E_{large}$
   240 \end{center}
   323 \end{center}
   241 \end{definition}
   324 \end{definition}
   242 
   325 
   243 \begin{definition}
   326 \begin{definition}
   244 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node label function}, where K is an arbitrary set. The elements in K are the \textbf{node labels}. Two nodes, u and v are said to be \textbf{equivalent}, if $lab(u)=lab(v)$.
   327 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node
       
   328     label function}, where K is an arbitrary set. The elements in K
       
   329   are the \textbf{node labels}. Two nodes, u and v are said to be
       
   330   \textbf{equivalent}, if $lab(u)=lab(v)$.
   245 \end{definition}
   331 \end{definition}
   246 
   332 
   247 When node labels are also given, the matched nodes must have the same labels.
   333 When node labels are also given, the matched nodes must have the same
   248 For example, the node labeled isomorphism is phrased by
   334 labels.  For example, the node labeled isomorphism is phrased by
   249 \begin{definition}
   335 \begin{definition}
   250 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label function lab} if $\exists M: V_{small} \longrightarrow V_{large}$ bijection, for which the following is true:
   336 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label
   251 \begin{center}
   337     function lab} if $\exists M: V_{small} \longrightarrow V_{large}$
   252 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (M(u),M(v))\in{E_{large}})$
   338   bijection, for which the following is true:
   253  and $(\forall u\in{V_{small}} : lab(u)=lab(M(u)))$
   339 \begin{center}
       
   340 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
       
   341   (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} :
       
   342   lab(u)=lab(M(u)))$
   254 \end{center}
   343 \end{center}
   255 \end{definition}
   344 \end{definition}
   256 
   345 
   257 The other two definitions can be extended in the same way.
   346 The other two definitions can be extended in the same way.
   258 
   347 
   259 Note that edge label function can be defined similarly to node label function, and all the definitions can be extended with additional conditions, but it is out of the scope of this work.
   348 Note that edge label function can be defined similarly to node label
   260 
   349 function, and all the definitions can be extended with additional
   261 The equivalence of two nodes is usually defined by another relation, $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the definition given above if R is an equivalence relation, which does not mean restriction in biological and chemical applications.
   350 conditions, but it is out of the scope of this work.
       
   351 
       
   352 The equivalence of two nodes is usually defined by another relation,
       
   353 $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the
       
   354 definition given above if R is an equivalence relation, which does not
       
   355 mean restriction in biological and chemical applications.
   262 
   356 
   263 \subsection{Common problems}\label{sec:CommProb}
   357 \subsection{Common problems}\label{sec:CommProb}
   264 
   358 
   265 The focus of this paper is on two extensively studied topics, the subgraph isomorphism and its variations. However, the following problems also appear in many applications.
   359 The focus of this paper is on two extensively studied topics, the
   266 
   360 subgraph isomorphism and its variations. However, the following
   267 The \textbf{subgraph matching problem} is the following: is $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node label?
   361 problems also appear in many applications.
   268 
   362 
   269 The \textbf{induced subgraph matching problem} asks the same about the existence of an induced subgraph.
   363 The \textbf{subgraph matching problem} is the following: is
   270 
   364 $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node
   271 The \textbf{graph isomorphism problem} can be defined as induced subgraph matching problem where the sizes of the two graphs are equal.
   365 label?
   272 
   366 
   273 In addition to existence, it may be needed to show such a subgraph, or it may be necessary to list all of them.
   367 The \textbf{induced subgraph matching problem} asks the same about the
   274 
   368 existence of an induced subgraph.
   275 It should be noted that some authors misleadingly refer to the term \emph{subgraph isomorphism problem} as an \emph{induced subgraph isomorphism problem}.
   369 
   276 
   370 The \textbf{graph isomorphism problem} can be defined as induced
   277 The following sections give the descriptions of VF2, VF2++, VF2 Plus and a particular comparison.
   371 subgraph matching problem where the sizes of the two graphs are equal.
       
   372 
       
   373 In addition to existence, it may be needed to show such a subgraph, or
       
   374 it may be necessary to list all of them.
       
   375 
       
   376 It should be noted that some authors misleadingly refer to the term
       
   377 \emph{subgraph isomorphism problem} as an \emph{induced subgraph
       
   378   isomorphism problem}.
       
   379 
       
   380 The following sections give the descriptions of VF2, VF2++, VF2 Plus
       
   381 and a particular comparison.
   278 
   382 
   279 \section{The VF2 Algorithm}
   383 \section{The VF2 Algorithm}
   280 This algorithm is the basis of both the VF2++ and the VF2 Plus.
   384 This algorithm is the basis of both the VF2++ and the VF2 Plus.  VF2
   281 VF2 is able to handle all the variations mentioned in \textbf{Section \ref{sec:CommProb})}.
   385 is able to handle all the variations mentioned in \textbf{Section
   282 Although it can also handle directed graphs, for the sake of simplicity, only the undirected case will be discussed.
   386   \ref{sec:CommProb})}.  Although it can also handle directed graphs,
       
   387 for the sake of simplicity, only the undirected case will be
       
   388 discussed.
   283 
   389 
   284 
   390 
   285 \subsection{Common notations}
   391 \subsection{Common notations}
   286 \indent
   392 \indent Assume $G_{small}$ is searched in $G_{large}$.  The following
   287 Assume $G_{small}$ is searched in $G_{large}$.
   393 definitions and notations will be used throughout the whole paper.
   288 The following definitions and notations will be used throughout the whole paper.
       
   289 \begin{definition}
   394 \begin{definition}
   290 A set $M\subseteq V_{small}\times V_{large}$ is called \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears in more than one pair in M.
   395 A set $M\subseteq V_{small}\times V_{large}$ is called
   291 That is, M uniquely associates some of the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice versa.
   396 \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears
       
   397 in more than one pair in M.  That is, M uniquely associates some of
       
   398 the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice
       
   399 versa.
   292 \end{definition}
   400 \end{definition}
   293 
   401 
   294 \begin{definition}
   402 \begin{definition}
   295 Mapping M \textbf{covers} a node v, if there exists a pair in M, which contains v.
   403 Mapping M \textbf{covers} a node v, if there exists a pair in M, which
       
   404 contains v.
   296 \end{definition}
   405 \end{definition}
   297 
   406 
   298 \begin{definition}
   407 \begin{definition}
   299 A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the nodes in $V_{small}$.
   408 A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the
       
   409 nodes in $V_{small}$.
   300 \end{definition}
   410 \end{definition}
   301 
   411 
   302 \begin{notation}
   412 \begin{notation}
   303 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.
   413 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in
       
   414 V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in
       
   415 V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.
   304 \end{notation}
   416 \end{notation}
   305 
   417 
   306 \begin{notation}
   418 \begin{notation}
   307 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$ and $v\in V_{small}\cup V_{large}$.
   419 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node
       
   420 exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$
       
   421 and $v\in V_{small}\cup V_{large}$.
   308 \end{notation}
   422 \end{notation}
   309 
   423 
   310 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique
   424 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique
   311 
   425 
   312 The definitions of the isomorphism types can be rephrased on the existence of a special whole mapping $M$, since it represents a bijection. For example 
   426 The definitions of the isomorphism types can be rephrased on the
   313 \begin{center}
   427 existence of a special whole mapping $M$, since it represents a
   314 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (Pair(M,u),Pair(M,v))\in E_{large}$.
   428 bijection. For example
       
   429 \begin{center}
       
   430 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph
       
   431   isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v
       
   432   \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
       
   433   (Pair(M,u),Pair(M,v))\in E_{large}$.
   315 \end{center}
   434 \end{center}
   316 
   435 
   317 \begin{definition}
   436 \begin{definition}
   318 A set of whole mappings is called \textbf{problem type}.
   437 A set of whole mappings is called \textbf{problem type}.
   319 \end{definition}
   438 \end{definition}
   320 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type which can be substituted by any problem type.
   439 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
   321 
   440 which can be substituted by any problem type.
   322 A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using this notations, VF2 searches a whole mapping $W$ of type $PT$.
   441 
   323 
   442 A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using
   324 For example the problem type of graph isomorphism problem is the following.
   443 this notations, VF2 searches a whole mapping $W$ of type $PT$.
   325 A whole mapping $W$ is in $\mathbf{ISO}$, iff the bijection represented by $W$ satisfies \textbf{Definition \ref{sec:ismorphic})}.
   444 
   326 The subgraph- and induced subgraph matching problems can be formalized in a similar way. Let their problem types be denoted as $\mathbf{SUB}$ and $\mathbf{IND}$.
   445 For example the problem type of graph isomorphism problem is the
       
   446 following.  A whole mapping $W$ is in $\mathbf{ISO}$, iff the
       
   447 bijection represented by $W$ satisfies \textbf{Definition
       
   448   \ref{sec:ismorphic})}.  The subgraph- and induced subgraph matching
       
   449 problems can be formalized in a similar way. Let their problem types
       
   450 be denoted as $\mathbf{SUB}$ and $\mathbf{IND}$.
   327 
   451 
   328 \begin{definition}
   452 \begin{definition}
   329 \label{expPT}
   453 \label{expPT}
   330 $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of $G_{small}$ has to be mapped to an edge of $G_{large}$ for each mapping in $PT$.
   454 $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in
       
   455 PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow
       
   456 (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of
       
   457 $G_{small}$ has to be mapped to an edge of $G_{large}$ for each
       
   458 mapping in $PT$.
   331 \end{definition}
   459 \end{definition}
   332 
   460 
   333 Note that $ISO$, $SUB$ and $IND$ are expanding problem types.
   461 Note that $ISO$, $SUB$ and $IND$ are expanding problem types.
   334 
   462 
   335 This paper deals with the three problem types mentioned above only, but
   463 This paper deals with the three problem types mentioned above only,
   336 the following generic definitions make it possible to handle other types as well.
   464 but the following generic definitions make it possible to handle other
   337 Although it may be challenging to find a proper consistency function and an efficient
   465 types as well.  Although it may be challenging to find a proper
   338 cutting function.
   466 consistency function and an efficient cutting function.
   339 
   467 
   340 \begin{definition}
   468 \begin{definition}
   341 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a \textbf{consistency function by } $\mathbf{PT}$, if the following holds. If there exists whole mapping $W$ of type PT for which $M\subseteq W$, then $Cons_{PT}(M)$ is true.
   469 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
       
   470 \textbf{consistency function by } $\mathbf{PT}$, if the following
       
   471 holds. If there exists whole mapping $W$ of type PT for which
       
   472 $M\subseteq W$, then $Cons_{PT}(M)$ is true.
   342 \end{definition}
   473 \end{definition}
   343 
   474 
   344 \begin{definition} 
   475 \begin{definition} 
   345 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a \textbf{cutting function by } $\mathbf{PT}$, if the following holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a whole mapping W of type PT.
   476 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
       
   477 \textbf{cutting function by } $\mathbf{PT}$, if the following
       
   478 holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a
       
   479 whole mapping W of type PT.
   346 \end{definition}
   480 \end{definition}
   347 
   481 
   348 \begin{definition}
   482 \begin{definition}
   349 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if $Cons_{PT}(M)$ is true. 
   483 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if
       
   484   $Cons_{PT}(M)$ is true.
   350 \end{definition}
   485 \end{definition}
   351 
   486 
   352 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
   487 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
   353 \begin{notation}
   488 \begin{notation}
   354 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping.
   489 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and
       
   490 $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where
       
   491 $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping.
   355 \end{notation}
   492 \end{notation}
   356 
   493 
   357 $Cons_{PT}$ will be used to check the consistency of the already covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if no whole consistent mapping can contain the current mapping.
   494 $Cons_{PT}$ will be used to check the consistency of the already
       
   495 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
       
   496 no whole consistent mapping can contain the current mapping.
   358 
   497 
   359 \subsection{Overview of the algorithm}
   498 \subsection{Overview of the algorithm}
   360 VF2 uses a state space representation of mappings, $Cons_{PT}$ for excluding inconsistency with the problem type and $Cut_{PT}$ for pruning the search tree.
   499 VF2 uses a state space representation of mappings, $Cons_{PT}$ for
   361 Each state $s$ of the matching process can be associated with a mapping $M(s)$.
   500 excluding inconsistency with the problem type and $Cut_{PT}$ for
   362 
   501 pruning the search tree.  Each state $s$ of the matching process can
   363 \textbf{Algorithm~\ref{alg:VF2Pseu})} is a high level description of the VF2 matching algorithm.
   502 be associated with a mapping $M(s)$.
       
   503 
       
   504 \textbf{Algorithm~\ref{alg:VF2Pseu})} is a high level description of
       
   505 the VF2 matching algorithm.
   364 
   506 
   365 
   507 
   366 \begin{algorithm}
   508 \begin{algorithm}
   367 \algtext*{EndIf}%ne nyomtasson end if-et
   509 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
   368 \algtext*{EndFor}%ne nyomtasson ..
   510 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
   369 \algtext*{EndProcedure}%ne nyomtasson ..
       
   370 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
   511 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
   371 \begin{algorithmic}[1]
   512 \begin{algorithmic}[1]
   372 
   513 
   373 \Procedure{VF2}{State $s$, ProblemType $PT$}
   514 \Procedure{VF2}{State $s$, ProblemType $PT$} \If{$M(s$) covers
   374   \If{$M(s$) covers $V_{small}$}
   515   $V_{small}$} \State Output($M(s)$) \Else
   375     \State Output($M(s)$)
       
   376   \Else
       
   377   
   516   
   378   \State Compute the set $P(s)$ of the pairs candidate for inclusion in $M(s)$
   517   \State Compute the set $P(s)$ of the pairs candidate for inclusion
   379   \ForAll{$p\in{P(s)}$}
   518   in $M(s)$ \ForAll{$p\in{P(s)}$} \If{Cons$_{PT}$($p, M(s)$) $\wedge$
   380     \If{Cons$_{PT}$($p, M(s)$) $\wedge$ $\neg$Cut$_{PT}$($p, M(s)$)}
   519     $\neg$Cut$_{PT}$($p, M(s)$)} \State Compute the nascent state
   381       \State Compute the nascent state $\tilde{s}$ by adding $p$ to $M(s)$
   520   $\tilde{s}$ by adding $p$ to $M(s)$ \State \textbf{call}
   382       \State \textbf{call} VF2($\tilde{s}$, $PT$)
   521   VF2($\tilde{s}$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
   383     \EndIf
       
   384   \EndFor
       
   385   \EndIf
       
   386 \EndProcedure
       
   387 \end{algorithmic}
   522 \end{algorithmic}
   388 \end{algorithm}
   523 \end{algorithm}
   389 
   524 
   390 
   525 
   391 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it starts with an empty mapping.
   526 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it
   392 
   527 starts with an empty mapping.
   393 For each state $s$, the algorithm computes $P(s)$, the set of candidate node pairs for adding to the current state $s$.
   528 
   394 
   529 For each state $s$, the algorithm computes $P(s)$, the set of
   395 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup \{p\}$ is computed, and the whole process is recursively applied to $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it can be proved that $s$ can not be extended to a whole mapping.
   530 candidate node pairs for adding to the current state $s$.
       
   531 
       
   532 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and
       
   533 $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and
       
   534 $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup
       
   535 \{p\}$ is computed, and the whole process is recursively applied to
       
   536 $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it
       
   537 can be proved that $s$ can not be extended to a whole mapping.
   396 
   538 
   397 In order to make sure of the correctness see
   539 In order to make sure of the correctness see
   398 \begin{claim}
   540 \begin{claim}
   399 Through consistent mappings, only consistent whole mappings can be reached, and all of the whole mappings are reachable through consistent mappings.
   541 Through consistent mappings, only consistent whole mappings can be
       
   542 reached, and all of the whole mappings are reachable through
       
   543 consistent mappings.
   400 \end{claim}
   544 \end{claim}
   401 
   545 
   402 Note that a state may be reached in many different ways, since the order of insertions into M does not influence the nascent mapping. In fact, the number of different ways which lead to the same state can be exponentially large. If $G_{small}$ and $G_{large}$ are circles with n nodes and n different node labels, there exists exactly one graph isomorphism between them, but it will be reached in $n!$ different ways.
   546 Note that a state may be reached in many different ways, since the
       
   547 order of insertions into M does not influence the nascent mapping. In
       
   548 fact, the number of different ways which lead to the same state can be
       
   549 exponentially large. If $G_{small}$ and $G_{large}$ are circles with n
       
   550 nodes and n different node labels, there exists exactly one graph
       
   551 isomorphism between them, but it will be reached in $n!$ different
       
   552 ways.
   403 
   553 
   404 However, one may observe
   554 However, one may observe
   405 
   555 
   406 \begin{claim}
   556 \begin{claim}
   407 \label{claim:claimTotOrd}
   557 \label{claim:claimTotOrd}
   408 Let $\prec$ an arbitrary total ordering relation on $V_{small}$.
   558 Let $\prec$ an arbitrary total ordering relation on $V_{small}$.  If
   409 If the algorithm ignores each $p=(u,v) \in P(s)$, for which
   559 the algorithm ignores each $p=(u,v) \in P(s)$, for which
   410 \begin{center}
   560 \begin{center}
   411 $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$,
   561 $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$,
   412 \end{center}
   562 \end{center}
   413 then no state can be reached more than ones and each state associated with a whole mapping remains reachable. 
   563 then no state can be reached more than ones and each state associated
       
   564 with a whole mapping remains reachable.
   414 \end{claim}
   565 \end{claim}
   415 
   566 
   416 Note that the cornerstone of the improvements to VF2 is a proper choice of a total ordering.
   567 Note that the cornerstone of the improvements to VF2 is a proper
       
   568 choice of a total ordering.
   417 
   569 
   418 \subsection{The candidate set P(s)}
   570 \subsection{The candidate set P(s)}
   419 \label{candidateComputingVF2}
   571 \label{candidateComputingVF2}
   420 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.
   572 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.
   421 Suppose that $PT$ is an expanding problem type, see \textbf{Definition~\ref{expPT})}.
   573 Suppose that $PT$ is an expanding problem type, see
       
   574 \textbf{Definition~\ref{expPT})}.
   422 
   575 
   423 \begin{notation}
   576 \begin{notation}
   424 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}} \wedge \tilde{u}$ is covered by $M(s)\}$, and \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not covered by $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}} \wedge \tilde{v}$ is covered by $M(s)\}$
   577 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by
       
   578 $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}}
       
   579 \wedge \tilde{u}$ is covered by $M(s)\}$, and
       
   580 \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not
       
   581 covered by
       
   582 $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}}
       
   583 \wedge \tilde{v}$ is covered by $M(s)\}$
   425 \end{notation}
   584 \end{notation}
   426 
   585 
   427 The set $P(s)$ includes the pairs of uncovered neighbours of covered nodes and if there is not such a node pair, all the pairs containing two uncovered nodes are added. Formally, let
   586 The set $P(s)$ includes the pairs of uncovered neighbours of covered
       
   587 nodes and if there is not such a node pair, all the pairs containing
       
   588 two uncovered nodes are added. Formally, let
   428 \[
   589 \[
   429  P(s)\!=\!
   590  P(s)\!=\!
   430   \begin{cases} 
   591   \begin{cases} 
   431    T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if } T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq \emptyset,\\
   592    T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if }
   432    (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s)) &\hspace{-0.15cm}otherwise.
   593    T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq
       
   594    \emptyset,\\ (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s))
       
   595    &\hspace{-0.15cm}otherwise.
   433   \end{cases}
   596   \end{cases}
   434 \]
   597 \]
   435 
   598 
   436 \subsection{Consistency}
   599 \subsection{Consistency}
   437 This section defines the consistency functions for the different problem types mentioned in \textbf{Section \ref{sec:CommProb})}.
   600 This section defines the consistency functions for the different
       
   601 problem types mentioned in \textbf{Section \ref{sec:CommProb})}.
   438 \begin{notation}
   602 \begin{notation}
   439 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} : (u,\tilde{u})\in E_{small}\}$\\
   603 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} :
   440 Let $\mathbf{\Gamma_{large} (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$
   604 (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large}
       
   605   (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$
   441 \end{notation}
   606 \end{notation}
   442 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$,
   607 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is
   443 $s$ is a state of the matching procedure,
   608 a state of the matching procedure, $M(s)$ is consistent mapping by
   444 $M(s)$ is consistent mapping by $PT$ and $lab(u)=lab(v)$.
   609 $PT$ and $lab(u)=lab(v)$.  $Cons_{PT}(p,M(s))$ checks whether
   445 $Cons_{PT}(p,M(s))$ checks whether including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.
   610 including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.
   446 
   611 
   447 \subsubsection{Induced subgraph isomorphism}
   612 \subsubsection{Induced subgraph isomorphism}
   448 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small} \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline
   613 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow
   449 The following formulation gives an efficient way of calculating $Cons_{IND}$.
   614 (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small}
       
   615 \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline The
       
   616 following formulation gives an efficient way of calculating
       
   617 $Cons_{IND}$.
   450 \begin{claim}
   618 \begin{claim}
   451 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v) \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge 
   619 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v)
   452 (\forall \tilde{u}\in \Gamma_{small}(u) \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a consistency function in the case of $IND$.
   620   \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge
       
   621   (\forall \tilde{u}\in \Gamma_{small}(u)
       
   622   \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
       
   623   consistency function in the case of $IND$.
   453 \end{claim}
   624 \end{claim}
   454 
   625 
   455 \subsubsection{Graph isomorphism}
   626 \subsubsection{Graph isomorphism}
   456 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$ $\Leftrightarrow$  $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$.
   627 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$
       
   628 $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by
       
   629 $IND$.
   457 \begin{claim}
   630 \begin{claim}
   458 $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and only if it is a consistency function by $IND$.
   631 $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and
       
   632   only if it is a consistency function by $IND$.
   459 \end{claim}
   633 \end{claim}
   460 \subsubsection{Subgraph isomorphism}
   634 \subsubsection{Subgraph isomorphism}
   461 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small} \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
   635 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow
       
   636 (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small}
       
   637 \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
   462 \newline
   638 \newline
   463 The following formulation gives an efficient way of calculating $Cons_{SUB}$.
   639 The following formulation gives an efficient way of calculating
       
   640 $Cons_{SUB}$.
   464 \begin{claim}
   641 \begin{claim}
   465 $Cons_{SUB}((u,v),M(s)):=
   642 $Cons_{SUB}((u,v),M(s)):= (\forall \tilde{u}\in \Gamma_{small}(u)
   466 (\forall \tilde{u}\in \Gamma_{small}(u) \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a consistency function by $SUB$.
   643   \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
       
   644   consistency function by $SUB$.
   467 \end{claim}
   645 \end{claim}
   468 
   646 
   469 \subsection{Cutting rules}
   647 \subsection{Cutting rules}
   470 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can be true only if it is impossible to extended $M(s)\cup \{p\}$ to a whole mapping.
   648 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently
       
   649 verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can
       
   650 be true only if it is impossible to extended $M(s)\cup \{p\}$ to a
       
   651 whole mapping.
   471 \begin{notation}
   652 \begin{notation}
   472 
   653 
   473 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash M_{small}(s))\backslash T_{small}(s)$, and \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash M_{large}(s))\backslash T_{large}(s)$.
   654 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash
       
   655 M_{small}(s))\backslash T_{small}(s)$, and
       
   656 \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash
       
   657 M_{large}(s))\backslash T_{large}(s)$.
   474 \end{notation}
   658 \end{notation}
   475 \subsubsection{Induced subgraph isomorphism}
   659 \subsubsection{Induced subgraph isomorphism}
   476 \begin{claim}
   660 \begin{claim}
   477 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is a cutting function by $IND$.
   661 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
       
   662   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
       
   663   \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap
       
   664   \tilde{T}_{small}(s)|$ is a cutting function by $IND$.
   478 \end{claim}
   665 \end{claim}
   479 \subsubsection{Graph isomorphism}
   666 \subsubsection{Graph isomorphism}
   480 Note that the cutting function of induced subgraph isomorphism defined above is a cutting function by $ISO$, too, however it is less efficient than the following while their computational complexity is the same.
   667 Note that the cutting function of induced subgraph isomorphism defined
       
   668 above is a cutting function by $ISO$, too, however it is less
       
   669 efficient than the following while their computational complexity is
       
   670 the same.
   481 \begin{claim}
   671 \begin{claim}
   482 $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap  \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is a cutting function by $ISO$.
   672 $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq
       
   673   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
       
   674   \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap
       
   675   \tilde{T}_{small}(s)|$ is a cutting function by $ISO$.
   483 \end{claim}
   676 \end{claim}
   484 
   677 
   485 \subsubsection{Subgraph isomorphism}
   678 \subsubsection{Subgraph isomorphism}
   486 \begin{claim}
   679 \begin{claim}
   487 $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by $SUB$.
   680 $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
       
   681   |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by
       
   682   $SUB$.
   488 \end{claim}
   683 \end{claim}
   489 Note that there is a significant difference between induced and non-induced subgraph isomorphism:
   684 Note that there is a significant difference between induced and
       
   685 non-induced subgraph isomorphism:
   490 
   686 
   491 \begin{claim}
   687 \begin{claim}
   492 \label{claimSUB}
   688 \label{claimSUB}
   493 $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is \textbf{not} a cutting function by $SUB$.
   689 $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
       
   690 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
       
   691 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$
       
   692 is \textbf{not} a cutting function by $SUB$.
   494 \end{claim}
   693 \end{claim}
   495 \begin{proof}$ $\\
   694 \begin{proof}$ $\\
   496 \vspace*{-0.5cm}
   695 \vspace*{-0.5cm}
   497 
   696 
   498 \begin{figure}
   697 \begin{figure}
   499 \begin{center}
   698 \begin{center}
   500 \begin{tikzpicture}
   699 \begin{tikzpicture}
   501   [scale=.8,auto=left,every node/.style={circle,fill=black!15}]
   700   [scale=.8,auto=left,every node/.style={circle,fill=black!15}]
   502   \node[rectangle,fill=black!15] at (4,6) {$G_{small}$};
   701   \node[rectangle,fill=black!15] at (4,6) {$G_{small}$}; \node (u4) at
   503   \node (u4) at (2.5,10)  {$u_4$};
   702   (2.5,10) {$u_4$}; \node (u3) at (5.5,10) {$u_3$}; \node (u1) at
   504   \node (u3) at (5.5,10) {$u_3$};
   703   (2.5,7) {$u_1$}; \node (u2) at (5.5,7) {$u_2$};
   505   \node (u1) at (2.5,7)  {$u_1$};
       
   506   \node (u2) at (5.5,7) {$u_2$};
       
   507   
   704   
   508   \node[rectangle,fill=black!30] at (13.5,6) {$G_{large}$};
   705   \node[rectangle,fill=black!30] at (13.5,6) {$G_{large}$};
   509   \node[fill=black!30] (v4) at (12,10)  {$v_4$};
   706   \node[fill=black!30] (v4) at (12,10) {$v_4$}; \node[fill=black!30]
   510   \node[fill=black!30] (v3) at (15,10) {$v_3$};
   707   (v3) at (15,10) {$v_3$}; \node[fill=black!30] (v1) at (12,7)
   511   \node[fill=black!30] (v1) at (12,7)  {$v_1$};
   708        {$v_1$}; \node[fill=black!30] (v2) at (15,7) {$v_2$};
   512   \node[fill=black!30] (v2) at (15,7) {$v_2$};
       
   513   
   709   
   514 
   710 
   515   \foreach \from/\to in {u1/u2,u2/u3,u3/u4,u4/u1}
   711   \foreach \from/\to in {u1/u2,u2/u3,u3/u4,u4/u1} \draw (\from) --
   516     \draw (\from) -- (\to);
   712   (\to); \foreach \from/\to in {v1/v2,v2/v3,v3/v4,v4/v1,v1/v3} \draw
   517   \foreach \from/\to in {v1/v2,v2/v3,v3/v4,v4/v1,v1/v3}
   713   (\from) -- (\to);
   518     \draw (\from) -- (\to);
       
   519 %    \draw[dashed] (\from) -- (\to);
   714 %    \draw[dashed] (\from) -- (\to);
   520 \end{tikzpicture}
   715 \end{tikzpicture}
   521 \caption{Graphs for the proof of \textbf{Claim \ref{claimSUB}}} \label{fig:proofSUB}
   716 \caption{Graphs for the proof of \textbf{Claim
       
   717     \ref{claimSUB}}} \label{fig:proofSUB}
   522 \end{center}
   718 \end{center}
   523 \end{figure}
   719 \end{figure}
   524 Let the two graphs of \textbf{Figure \ref{fig:proofSUB})} be the input graphs.
   720 Let the two graphs of \textbf{Figure \ref{fig:proofSUB})} be the input
   525 Suppose the total ordering relation is $u_1 \prec u_2 \prec u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add $(u_2,v_2)\in P(s)$.\newline
   721 graphs.  Suppose the total ordering relation is $u_1 \prec u_2 \prec
   526 $Cons_{SUB}((u_2,v_2),M(s))=true$, so $M\cup \{(u_2,v_2)\}$ is consistent by $SUB$. The cutting function $Cut_{SUB}((u_2,v_2),M(s))$ is false, so it does not let cut the tree.\newline
   722 u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add
   527 On the other hand $Cut_{SUB}'((u_2,v_2),M(s))$ is true, since\\$0=|\Gamma_{large}(v_2)\cap \tilde{T}_{large}(s)|<|\Gamma_{small}(u_2)\cap \tilde{T}_{small}(s)|=1$ is true, but still the tree can not be pruned, because otherwise the $\{(u_1,v_1)(u_2,v_2)(u_3,v_3)(u_4,v_4)\}$ mapping can not be found.
   723 $(u_2,v_2)\in P(s)$.\newline $Cons_{SUB}((u_2,v_2),M(s))=true$, so
       
   724 $M\cup \{(u_2,v_2)\}$ is consistent by $SUB$. The cutting function
       
   725 $Cut_{SUB}((u_2,v_2),M(s))$ is false, so it does not let cut the
       
   726 tree.\newline On the other hand $Cut_{SUB}'((u_2,v_2),M(s))$ is true,
       
   727 since\\$0=|\Gamma_{large}(v_2)\cap
       
   728 \tilde{T}_{large}(s)|<|\Gamma_{small}(u_2)\cap
       
   729 \tilde{T}_{small}(s)|=1$ is true, but still the tree can not be
       
   730 pruned, because otherwise the
       
   731 $\{(u_1,v_1)(u_2,v_2)(u_3,v_3)(u_4,v_4)\}$ mapping can not be found.
   528 \end{proof}
   732 \end{proof}
   529 
   733 
   530 \newpage
       
   531 \section{The VF2++ Algorithm}
   734 \section{The VF2++ Algorithm}
   532 Although any total ordering relation makes the search space of VF2 a tree, its
   735 Although any total ordering relation makes the search space of VF2 a
   533 choice turns out to dramatically influence the number of visited states. The goal is to determine an efficient one as quickly as possible.
   736 tree, its choice turns out to dramatically influence the number of
   534 
   737 visited states. The goal is to determine an efficient one as quickly
   535 The main reason for VF2++' superiority over VF2 is twofold. Firstly, taking into account the structure and the node labeling of the graph, VF2++ determines a state order in which most of the unfruitful branches of the search space can be pruned immediately. Secondly, introducing more efficient --- nevertheless still easier to compute --- cutting rules reduces the chance of going astray even further.
   738 as possible.
   536 
   739 
   537 In addition to the usual subgraph isomorphism, specialized versions for induced subgraph isomorphism and for graph isomorphism have been designed. VF2++ has gained a runtime improvement of one order of magnitude respecting induced subgraph isomorphism and a better asymptotical behaviour in the case of graph isomorphism problem.
   740 The main reason for VF2++' superiority over VF2 is twofold. Firstly,
   538 
   741 taking into account the structure and the node labeling of the graph,
   539 Note that a weaker version of the cutting rules and the more efficient candidate
   742 VF2++ determines a state order in which most of the unfruitful
   540 set calculating were described in \cite{VF2Plus}, too.
   743 branches of the search space can be pruned immediately. Secondly,
   541 
   744 introducing more efficient --- nevertheless still easier to compute
   542 It should be noted that all the methods described in this section are extendable to handle directed graphs and edge labels as well.
   745 --- cutting rules reduces the chance of going astray even further.
   543 
   746 
   544 The basic ideas and the detailed description of VF2++ are provided in the following.
   747 In addition to the usual subgraph isomorphism, specialized versions
       
   748 for induced subgraph isomorphism and for graph isomorphism have been
       
   749 designed. VF2++ has gained a runtime improvement of one order of
       
   750 magnitude respecting induced subgraph isomorphism and a better
       
   751 asymptotical behaviour in the case of graph isomorphism problem.
       
   752 
       
   753 Note that a weaker version of the cutting rules and the more efficient
       
   754 candidate set calculating were described in \cite{VF2Plus}, too.
       
   755 
       
   756 It should be noted that all the methods described in this section are
       
   757 extendable to handle directed graphs and edge labels as well.
       
   758 
       
   759 The basic ideas and the detailed description of VF2++ are provided in
       
   760 the following.
   545 
   761 
   546 \subsection{Preparations}
   762 \subsection{Preparations}
   547 \begin{claim}
   763 \begin{claim}
   548 \label{claim:claimCoverFromLeft}
   764 \label{claim:claimCoverFromLeft}
   549 The total ordering relation uniquely determines a node order, in which the nodes of $V_{small}$ will be covered by VF2. From the point of view of the matching procedure, this means, that always the same node of $G_{small}$ will be covered on the d-th level.
   765 The total ordering relation uniquely determines a node order, in which
       
   766 the nodes of $V_{small}$ will be covered by VF2. From the point of
       
   767 view of the matching procedure, this means, that always the same node
       
   768 of $G_{small}$ will be covered on the d-th level.
   550 \end{claim}
   769 \end{claim}
   551 \begin{proof}
   770 \begin{proof}
   552 In order to make the search space a tree, the pairs in $\{(u,v)\in P(s) : \exists \hat{u} : \hat{u}\prec u\}$ are excluded from $P(s)$.
   771 In order to make the search space a tree, the pairs in $\{(u,v)\in
       
   772 P(s) : \exists \hat{u} : \hat{u}\prec u\}$ are excluded from $P(s)$.
   553 \newline
   773 \newline
   554 Let $\tilde{P}(s):=P(s)\backslash \{(u,v)\in P(s) : \exists \hat{u} : \hat{u}\prec u\}$
   774 Let $\tilde{P}(s):=P(s)\backslash \{(u,v)\in P(s) : \exists \hat{u} :
       
   775 \hat{u}\prec u\}$
   555 \newline
   776 \newline
   556 The relation $\prec$ is a total ordering, so $\exists!\ \tilde{u} : \forall\ (u,v)\in \tilde{P}(s): u=\tilde u$. Since a pair form $\tilde{P}(s)$ is chosen for including into $M(s)$, it is obvious, that only $\tilde{u}$ can be covered in $V_{small}$. Actually, $\tilde{u}$ is the smallest element in $T_{small}(s)$ (or in $V_{small}\backslash M_{small}(s)$, if $T_{small}(s)$ were empty), and $T_{small}(s)$ depends only on the covered nodes of $G_{small}$. 
   777 The relation $\prec$ is a total ordering, so $\exists!\ \tilde{u} :
       
   778 \forall\ (u,v)\in \tilde{P}(s): u=\tilde u$. Since a pair form
       
   779 $\tilde{P}(s)$ is chosen for including into $M(s)$, it is obvious,
       
   780 that only $\tilde{u}$ can be covered in $V_{small}$. Actually,
       
   781 $\tilde{u}$ is the smallest element in $T_{small}(s)$ (or in
       
   782 $V_{small}\backslash M_{small}(s)$, if $T_{small}(s)$ were empty), and
       
   783 $T_{small}(s)$ depends only on the covered nodes of $G_{small}$.
   557 \newline
   784 \newline
   558 Simple induction on $d$ shows that the set of covered nodes of $G_{small}$ is unique, if $d$ is given, so $\tilde{u}$ is unique if $d$ is given.
   785 Simple induction on $d$ shows that the set of covered nodes of
       
   786 $G_{small}$ is unique, if $d$ is given, so $\tilde{u}$ is unique if
       
   787 $d$ is given.
   559 \end{proof}
   788 \end{proof}
   560 
   789 
   561 \begin{definition}
   790 \begin{definition}
   562 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of $V_{small}$ is \textbf{matching order}, if exists $\prec$ total ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$.
   791 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
       
   792 $V_{small}$ is \textbf{matching order}, if exists $\prec$ total
       
   793 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
       
   794 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$.
   563 \end{definition}
   795 \end{definition}
   564 
   796 
   565 \begin{claim}\label{claim:MOclaim}
   797 \begin{claim}\label{claim:MOclaim}
   566 A total ordering is matching order, iff the nodes of every component form an interval in the node sequence, and every node connects to a previous node in its component except the first node of the component. The order of the components is arbitrary.
   798 A total ordering is matching order, iff the nodes of every component
   567 \\Formally spoken, an order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of $V_{small}$ is matching order $\Leftrightarrow$ $\forall G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}) \wedge (u_{\sigma{(k)}},u_{\sigma{(i)}})\in E'_{small}$, where $i,j,k,l\in \{1,..,|V_{small}|\}$\newline
   799 form an interval in the node sequence, and every node connects to a
       
   800 previous node in its component except the first node of the
       
   801 component. The order of the components is arbitrary.  \\Formally
       
   802 spoken, an order
       
   803 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
       
   804 $V_{small}$ is matching order $\Leftrightarrow$ $\forall
       
   805 G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall
       
   806 i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
       
   807 V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq
       
   808 l\leq i \Rightarrow u_{l}\in V'_{small}) \wedge
       
   809 (u_{\sigma{(k)}},u_{\sigma{(i)}})\in E'_{small}$, where $i,j,k,l\in
       
   810 \{1,..,|V_{small}|\}$\newline
   568 \end{claim}
   811 \end{claim}
   569 \begin{proof}
   812 \begin{proof}
   570 Suppose a matching order is given. It has to be shown that the node sequence has a structure described above.\\
   813 Suppose a matching order is given. It has to be shown that the node
   571 Let $G'_{small}=(V'_{small},E'_{small})$ be an arbitrary component and $i$ an arbitrary index. 
   814 sequence has a structure described above.\\ Let
       
   815 $G'_{small}=(V'_{small},E'_{small})$ be an arbitrary component and $i$
       
   816 an arbitrary index.
   572 \newline
   817 \newline
   573 $(\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in V'_{small})\Rightarrow u_{\sigma(i)}$ is not the first covered node in $G_{small}\ \Rightarrow u_{\sigma(i)}$ is connected to a covered node $u_{\sigma(k)}$ where $k<i$, since $u_{\sigma(i)}\in T_{small}(s)$ for some $s$ and $T_{small}(s)\subseteq{V'_{small}}$ contains only nodes connected to at least one covered node. It's easy to see, that $\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}$, since $T_{small}(s)$ contains only nodes connected to some covered ones, while it is not empty, but if it were empty, then $u_{\sigma(k)}$ and $u_{\sigma(i)}$ were not in the same component.
   818 $(\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
   574 
   819 V'_{small})\Rightarrow u_{\sigma(i)}$ is not the first covered node in
   575 Now, let us show that if a node sequence has the special structure described above, it has to be matching order.\\ $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ is a total ordering, which determines the matching order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$.
   820 $G_{small}\ \Rightarrow u_{\sigma(i)}$ is connected to a covered node
       
   821 $u_{\sigma(k)}$ where $k<i$, since $u_{\sigma(i)}\in T_{small}(s)$ for
       
   822 some $s$ and $T_{small}(s)\subseteq{V'_{small}}$ contains only nodes
       
   823 connected to at least one covered node. It's easy to see, that
       
   824 $\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}$, since
       
   825 $T_{small}(s)$ contains only nodes connected to some covered ones,
       
   826 while it is not empty, but if it were empty, then $u_{\sigma(k)}$ and
       
   827 $u_{\sigma(i)}$ were not in the same component.
       
   828 
       
   829 Now, let us show that if a node sequence has the special structure
       
   830 described above, it has to be matching
       
   831 order.\\ $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ is
       
   832 a total ordering, which determines the matching order
       
   833 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$.
   576 \end{proof}
   834 \end{proof}
   577 
   835 
   578 To summing up, a total ordering always uniquely determines a matching order, and every matching order can be determined by a total ordering, however, more than one different total orderings may determine the same matching order.
   836 To summing up, a total ordering always uniquely determines a matching
       
   837 order, and every matching order can be determined by a total ordering,
       
   838 however, more than one different total orderings may determine the
       
   839 same matching order.
   579 \subsection{Idea behind the algorithm}
   840 \subsection{Idea behind the algorithm}
   580 The goal is to find a matching order in which the algorithm is able to recognize inconsistency or prune the infeasible branches on the highest levels and goes deep only if it is needed.
   841 The goal is to find a matching order in which the algorithm is able to
       
   842 recognize inconsistency or prune the infeasible branches on the
       
   843 highest levels and goes deep only if it is needed.
   581 
   844 
   582 \begin{notation}
   845 \begin{notation}
   583 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the number of neighbours of u which are in H, where $u\in V_{small} $ and $H\subseteq V_{small}$.
   846 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the
       
   847 number of neighbours of u which are in H, where $u\in V_{small} $ and
       
   848 $H\subseteq V_{small}$.
   584 \end{notation}
   849 \end{notation}
   585 
   850 
   586 The principal question is the following. Suppose a state $s$ is given. For which node of $T_{small}(s)$ is the hardest to find a consistent pair in $G_{large}$? The more covered neighbours a node in $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has ---, the more rarely satisfiable consistency constraints for its pair are given.
   851 The principal question is the following. Suppose a state $s$ is
   587 
   852 given. For which node of $T_{small}(s)$ is the hardest to find a
   588 In biology, most of the graphs are sparse, thus several nodes in $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes reasonable to define a secondary and a tertiary order between them.
   853 consistent pair in $G_{large}$? The more covered neighbours a node in
   589 The observation above proves itself to be as determining, that the secondary ordering prefers nodes with the most uncovered neighbours among which have the same $Conn_{M_{small}(s)}$ to increase $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible.
   854 $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has
   590 The tertiary ordering prefers nodes having the rarest uncovered labels.
   855 ---, the more rarely satisfiable consistency constraints for its pair
   591 
   856 are given.
   592 Note that the secondary ordering is the same as the ordering by $deg$, which is a static data in front of the above used.
   857 
   593 
   858 In biology, most of the graphs are sparse, thus several nodes in
   594 These rules can easily result in a matching order which contains the nodes of a long path successively, whose nodes may have low $Conn$ and is easily matchable into $G_{large}$. To avoid that, a BFS order is used, which provides the shortest possible paths.
   859 $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes
       
   860 reasonable to define a secondary and a tertiary order between them.
       
   861 The observation above proves itself to be as determining, that the
       
   862 secondary ordering prefers nodes with the most uncovered neighbours
       
   863 among which have the same $Conn_{M_{small}(s)}$ to increase
       
   864 $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible.  The
       
   865 tertiary ordering prefers nodes having the rarest uncovered labels.
       
   866 
       
   867 Note that the secondary ordering is the same as the ordering by $deg$,
       
   868 which is a static data in front of the above used.
       
   869 
       
   870 These rules can easily result in a matching order which contains the
       
   871 nodes of a long path successively, whose nodes may have low $Conn$ and
       
   872 is easily matchable into $G_{large}$. To avoid that, a BFS order is
       
   873 used, which provides the shortest possible paths.
   595 \newline
   874 \newline
   596 
   875 
   597 In the following, some examples on which the VF2 may be slow are described, although they are easily solvable by using a proper matching order.
   876 In the following, some examples on which the VF2 may be slow are
       
   877 described, although they are easily solvable by using a proper
       
   878 matching order.
   598 
   879 
   599 \begin{example}
   880 \begin{example}
   600 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways without node labels. Let $u\in V_{small}$ and $v\in V_{large}$.
   881 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways
       
   882 without node labels. Let $u\in V_{small}$ and $v\in V_{large}$.
   601 \newline
   883 \newline
   602 $lab(u):=black$
   884 $lab(u):=black$
   603 \newline
   885 \newline
   604 $lab(v):=black$
   886 $lab(v):=black$
   605 \newline
   887 \newline
   606 $lab(\tilde{u}):=red \  \forall \tilde{u}\in (V_{small}\backslash \{u\})$
   888 $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash
       
   889 \{u\})$
   607 \newline
   890 \newline
   608 $lab(\tilde{v}):=red \  \forall \tilde{v}\in (V_{large}\backslash \{v\})$
   891 $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash
       
   892 \{v\})$
   609 \newline
   893 \newline
   610 
   894 
   611 Now, any mapping by the node label $lab$ must contain $(u,v)$, since $u$ is black and no node in $V_{large}$ has a black label except $v$. If unfortunately $u$ were the last node which will get covered, VF2 would check only in the last steps, whether $u$ can be matched to $v$.
   895 Now, any mapping by the node label $lab$ must contain $(u,v)$, since
       
   896 $u$ is black and no node in $V_{large}$ has a black label except
       
   897 $v$. If unfortunately $u$ were the last node which will get covered,
       
   898 VF2 would check only in the last steps, whether $u$ can be matched to
       
   899 $v$.
   612 \newline
   900 \newline
   613 However, had $u$ been the first matched node, u would have been matched immediately to v, so all the mappings would have been precluded in which node labels can not correspond.
   901 However, had $u$ been the first matched node, u would have been
       
   902 matched immediately to v, so all the mappings would have been
       
   903 precluded in which node labels can not correspond.
   614 \end{example}
   904 \end{example}
   615 
   905 
   616 \begin{example}
   906 \begin{example}
   617 Suppose there is no node label given, $G_{small}$ is a small graph and can not be mapped into $G_{large}$ and $u\in V_{small}$.
   907 Suppose there is no node label given, $G_{small}$ is a small graph and
       
   908 can not be mapped into $G_{large}$ and $u\in V_{small}$.
   618 \newline
   909 \newline
   619 Let $G'_{small}:=(V_{small}\cup \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is, $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint from $G_{small}$ and one of its starting points is connected to $u\in V_{small}\}$.
   910 Let $G'_{small}:=(V_{small}\cup
       
   911 \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup
       
   912 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
       
   913 $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint
       
   914 from $G_{small}$ and one of its starting points is connected to $u\in
       
   915 V_{small}\}$.
   620 \newline
   916 \newline
   621 Is there a subgraph of $G_{large}$, which is isomorph with $G'_{small}$?
   917 Is there a subgraph of $G_{large}$, which is isomorph with
       
   918 $G'_{small}$?
   622 \newline
   919 \newline
   623 If unfortunately the nodes of the path were the first $k$ nodes in the matching order, the algorithm would iterate through all the possible k long paths in $G_{large}$, and it would recognize that no path can be extended to $G'_{small}$.
   920 If unfortunately the nodes of the path were the first $k$ nodes in the
       
   921 matching order, the algorithm would iterate through all the possible k
       
   922 long paths in $G_{large}$, and it would recognize that no path can be
       
   923 extended to $G'_{small}$.
   624 \newline
   924 \newline
   625 However, had it started by the matching of $G_{small}$, it would not have matched any nodes of the path.
   925 However, had it started by the matching of $G_{small}$, it would not
       
   926 have matched any nodes of the path.
   626 \end{example}
   927 \end{example}
   627 
   928 
   628 These examples may look artificial, but the same problems also appear in real-world examples, even though in a less obvious way.
   929 These examples may look artificial, but the same problems also appear
       
   930 in real-world examples, even though in a less obvious way.
   629 
   931 
   630 \subsection{Total ordering}
   932 \subsection{Total ordering}
   631 Instead of the total ordering relation, the matching order will be searched directly.
   933 Instead of the total ordering relation, the matching order will be
       
   934 searched directly.
   632 \begin{notation}
   935 \begin{notation}
   633 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} : l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ , where $l$ is a label and $\mathcal{M}\subseteq V_{small}$.
   936 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} :
       
   937 l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ ,
       
   938 where $l$ is a label and $\mathcal{M}\subseteq V_{small}$.
   634 \end{notation}
   939 \end{notation}
   635 
   940 
   636 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
   941 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
   637 \end{definition}
   942 \end{definition}
   638 
   943 
   639 \begin{algorithm}
   944 \begin{algorithm}
   640 \algtext*{EndIf}%ne nyomtasson end if-et
   945 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
   641 \algtext*{EndFor}%ne nyomtasson ..
   946 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
   642 \algtext*{EndProcedure}%ne nyomtasson ..
       
   643 \algtext*{EndWhile}
   947 \algtext*{EndWhile}
   644 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
   948 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
   645 \begin{algorithmic}[1]
   949 \begin{algorithmic}[1]
   646 \Procedure{VF2++order}{}
   950 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$
   647   \State $\mathcal{M}$ := $\emptyset$ \Comment{matching order}
   951 \Comment{matching order} \While{$V_{small}\backslash \mathcal{M}
   648   \While{$V_{small}\backslash \mathcal{M} \neq\emptyset$}
   952   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
   649   \State $r\in$ arg max$_{deg}$ (arg min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash \mathcal{M})$)\label{alg:findMin}
   953 min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash
   650   \State Compute $T$, a BFS tree with root node $r$.
   954 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
   651   \For{$d=0,1,...,depth(T)$}
   955 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
   652   \State $V_d$:=nodes of the $d$-th level
   956 $d$-th level \State Process $V_d$ \Comment{See Algorithm
   653   \State Process $V_d$ \Comment{See Algorithm \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2})}
   957   \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2})} \EndFor
   654   \EndFor
   958 \EndWhile \EndProcedure
   655   \EndWhile
       
   656 \EndProcedure
       
   657 \end{algorithmic}
   959 \end{algorithmic}
   658 \end{algorithm}
   960 \end{algorithm}
   659 
   961 
   660 \begin{algorithm}
   962 \begin{algorithm}
   661 \algtext*{EndIf}%ne nyomtasson end if-et
   963 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
   662 \algtext*{EndFor}%ne nyomtasson ..
   964 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
   663 \algtext*{EndProcedure}%ne nyomtasson ..
       
   664 \algtext*{EndWhile}
   965 \algtext*{EndWhile}
   665 \caption{\hspace{.5cm}$A\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
   966 \caption{\hspace{.5cm}$A\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
   666 \begin{algorithmic}[1]
   967 \begin{algorithmic}[1]
   667 \Procedure{VF2++ProcessLevel1}{$V_{d}$}
   968 \Procedure{VF2++ProcessLevel1}{$V_{d}$} \While{$V_d\neq\emptyset$}
   668   \While{$V_d\neq\emptyset$}
   969 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg
   669   \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg max$_{Conn_{\mathcal{M}}}(V_{d})))$
   970 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
   670   \State $V_d:=V_d\backslash m$
   971 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
   671   \State Append node $m$ to the end of $\mathcal{M}$
   972 $F_\mathcal{M}$ \EndWhile \EndProcedure
   672   \State Refresh $F_\mathcal{M}$
       
   673   \EndWhile
       
   674 \EndProcedure
       
   675 \end{algorithmic}
   973 \end{algorithmic}
   676 \end{algorithm}
   974 \end{algorithm}
   677 
   975 
   678 \begin{algorithm}
   976 \begin{algorithm}
   679 \algtext*{EndIf}%ne nyomtasson end if-et
   977 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
   680 \algtext*{EndFor}%ne nyomtasson ..
   978 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
   681 \algtext*{EndProcedure}%ne nyomtasson ..
       
   682 \algtext*{EndWhile}
   979 \algtext*{EndWhile}
   683 \caption{\hspace{0.5cm}$Another\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess2}
   980 \caption{\hspace{0.5cm}$Another\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess2}
   684 \begin{algorithmic}[1]
   981 \begin{algorithmic}[1]
   685 \Procedure{VF2++ProcessLevel2}{$V_{d}$}
   982 \Procedure{VF2++ProcessLevel2}{$V_{d}$} \State Sort $V_d$ in
   686   \State Sort $V_d$ in descending lex. order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$
   983 descending lex. order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$
   687   \State Append the sorted $V_d$ to the end of $\mathcal{M}$
   984 \State Append the sorted $V_d$ to the end of $\mathcal{M}$ \State
   688   \State Refresh $F_\mathcal{M}$
   985 Refresh $F_\mathcal{M}$ \EndProcedure
   689 \EndProcedure
       
   690 \end{algorithmic}
   986 \end{algorithmic}
   691 \end{algorithm}
   987 \end{algorithm}
   692 \textbf{Algorithm~\ref{alg:VF2PPPseu})} is a high level description of the matching order procedure of VF2++. It computes a BFS tree for each component in ascending order of their rarest $lab$ and largest $deg$, whose root vertex is the component's minimal node. \textbf{Algorithm \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} are two different methods to process a level of the BFS tree.
   988 \textbf{Algorithm~\ref{alg:VF2PPPseu})} is a high level description of
   693 
   989 the matching order procedure of VF2++. It computes a BFS tree for each
   694 After sorting the nodes of the current level in descending lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$, \textbf{Algorithm \ref{alg:VF2PPProcess2})} appends them simultaneously to the matching order $\mathcal{M}$ and refreshes $F_\mathcal{M}$ just once, whereas \textbf{Algorithm \ref{alg:VF2PPProcess1})} appends the nodes separately to $\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it provides up-to-date label information and may result in a more efficient matching order.
   990 component in ascending order of their rarest $lab$ and largest $deg$,
   695 
   991 whose root vertex is the component's minimal node. \textbf{Algorithm
   696 \textbf{Claim~\ref{claim:MOclaim})} shows that \textbf{Algorithm \ref{alg:VF2PPPseu})} provides a matching order.
   992   \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} are
       
   993 two different methods to process a level of the BFS tree.
       
   994 
       
   995 After sorting the nodes of the current level in descending
       
   996 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$,
       
   997 \textbf{Algorithm \ref{alg:VF2PPProcess2})} appends them
       
   998 simultaneously to the matching order $\mathcal{M}$ and refreshes
       
   999 $F_\mathcal{M}$ just once, whereas \textbf{Algorithm
       
  1000   \ref{alg:VF2PPProcess1})} appends the nodes separately to
       
  1001 $\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it
       
  1002 provides up-to-date label information and may result in a more
       
  1003 efficient matching order.
       
  1004 
       
  1005 \textbf{Claim~\ref{claim:MOclaim})} shows that \textbf{Algorithm
       
  1006   \ref{alg:VF2PPPseu})} provides a matching order.
   697 
  1007 
   698 
  1008 
   699 \subsection{Cutting rules}
  1009 \subsection{Cutting rules}
   700 \label{VF2PPCuttingRules}
  1010 \label{VF2PPCuttingRules}
   701 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels.
  1011 This section presents the cutting rules of VF2++, which are improved
       
  1012 by using extra information coming from the node labels.
   702 \begin{notation}
  1013 \begin{notation}
   703 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in V_{large}$ and $l$ is a label.
  1014 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l
       
  1015 \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and
       
  1016 $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge
       
  1017 \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in
       
  1018 V_{large}$ and $l$ is a label.
   704 \end{notation}
  1019 \end{notation}
   705 
  1020 
   706 \subsubsection{Induced subgraph isomorphism}
  1021 \subsubsection{Induced subgraph isomorphism}
   707 \begin{claim}
  1022 \begin{claim}
   708 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND.
  1023 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND.
   709 \end{claim}
  1024 \end{claim}
   710 \begin{proof}
  1025 \begin{proof}
   711 It has to be shown, that $LabCut_{IND}((u,v),M(s))=true\Rightarrow$ the mapping can not be extended to a whole mapping.\\
  1026 It has to be shown, that $LabCut_{IND}((u,v),M(s))=true\Rightarrow$
   712 $LabCut_{IND}((u,v),M(s))=true,$ iff the following holds. $\\\exists l: |\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$.
  1027 the mapping can not be extended to a whole
   713 
  1028 mapping.\\ $LabCut_{IND}((u,v),M(s))=true,$ iff the following
   714 Suppose that $|\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)|$. Each node of $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$ has to be matched to a node in $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$, so $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$ can not be smaller than $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$. That is why $M(s)$ can not be extended to a whole mapping.
  1029 holds. $\\\exists l: |\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| <
   715 
  1030 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)| \vee
   716 Otherwise $|\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$ has to be true. Similarly, each node of $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ has to be matched to a node in $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, i.e. $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ can not be smaller than $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, if $M(s)$ is extendible.
  1031 |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| <
       
  1032 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$.
       
  1033 
       
  1034 Suppose that $|\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| <
       
  1035 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)|$. Each node of
       
  1036 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$ has to be matched to a
       
  1037 node in $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$, so
       
  1038 $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$ can not be smaller than
       
  1039 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$. That is why $M(s)$ can
       
  1040 not be extended to a whole mapping.
       
  1041 
       
  1042 Otherwise $|\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| <
       
  1043 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$ has to be
       
  1044 true. Similarly, each node of $\Gamma_{large}^{l}(v)\cap
       
  1045 \tilde{T}_{large}(s)$ has to be matched to a node in
       
  1046 $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$,
       
  1047 i.e. $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ can not be
       
  1048 smaller than $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, if
       
  1049 $M(s)$ is extendible.
   717 \end{proof}
  1050 \end{proof}
   718 The following claims can be proven similarly.
  1051 The following claims can be proven similarly.
   719 \subsubsection{Graph isomorphism}
  1052 \subsubsection{Graph isomorphism}
   720 \begin{claim}
  1053 \begin{claim}
   721 \[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by ISO.
  1054 \[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by ISO.
   727 \end{claim}
  1060 \end{claim}
   728 
  1061 
   729 
  1062 
   730 
  1063 
   731 \subsection{Implementation details}
  1064 \subsection{Implementation details}
   732 This section provides a detailed summary of an efficient implementation of VF2++.
  1065 This section provides a detailed summary of an efficient
       
  1066 implementation of VF2++.
   733 \subsubsection{Storing a mapping}
  1067 \subsubsection{Storing a mapping}
   734 After fixing an arbitrary node order ($u_0, u_1, .., u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store the current mapping in the following way.
  1068 After fixing an arbitrary node order ($u_0, u_1, ..,
       
  1069 u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store
       
  1070 the current mapping in the following way.
   735 \[
  1071 \[
   736  M[i] = 
  1072  M[i] =
   737   \begin{cases} 
  1073   \begin{cases} 
   738    v & if\ (u_i,v)\ is\ in\ the\ mapping\\
  1074    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INVALID &
   739    INVALID & if\ no\ node\ has\ been\ mapped\ to\ u_i.
  1075    if\ no\ node\ has\ been\ mapped\ to\ u_i.
   740   \end{cases}
  1076   \end{cases}
   741 \]
  1077 \]
   742 Where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$ means "no node".
  1078 Where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$
       
  1079 means "no node".
   743 \subsubsection{Avoiding the recurrence}
  1080 \subsubsection{Avoiding the recurrence}
   744 Exploring the state space was described in a recursive fashion using sets (see \textbf{Algorithm~\ref{alg:VF2Pseu})}), which makes the algorithm easy to understand but does not show directly an efficient way of implementation. The following approach avoiding recursion and using lookup tables instead of sets is not only fast but has linear space complexity as well.
  1081 Exploring the state space was described in a recursive fashion using
   745 
  1082 sets (see \textbf{Algorithm~\ref{alg:VF2Pseu})}), which makes the
   746 The recursion of \textbf{Algorithm~\ref{alg:VF2Pseu})} can be realized as a while loop, which has a loop counter $depth$ denoting the all-time depth of the recursion.
  1083 algorithm easy to understand but does not show directly an efficient
   747 Fixing a matching order, let $M$ denote the array storing the all-time mapping.
  1084 way of implementation. The following approach avoiding recursion and
   748 The initial state is associated with the empty mapping, which means that $\forall i: M[i]=INVALID$ and $depth=0$.
  1085 using lookup tables instead of sets is not only fast but has linear
   749 In case of a recursive call, $depth$ has to be incremented, while in case of a return, it has to be decremented.
  1086 space complexity as well.
   750 Based on \textbf{Claim~\ref{claim:claimCoverFromLeft})}, $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes while the state is being processed, but the property is held before both stepping back to a predecessor state and exploring a successor state.
  1087 
   751 
  1088 The recursion of \textbf{Algorithm~\ref{alg:VF2Pseu})} can be realized
   752 The necessary part of the candidate set is easily maintainable or computable by following \textbf{Section~\ref{candidateComputingVF2})}. A much faster method has been designed for biological- and sparse graphs, see the next section for details.
  1089 as a while loop, which has a loop counter $depth$ denoting the
       
  1090 all-time depth of the recursion.  Fixing a matching order, let $M$
       
  1091 denote the array storing the all-time mapping.  The initial state is
       
  1092 associated with the empty mapping, which means that $\forall i:
       
  1093 M[i]=INVALID$ and $depth=0$.  In case of a recursive call, $depth$ has
       
  1094 to be incremented, while in case of a return, it has to be
       
  1095 decremented.  Based on \textbf{Claim~\ref{claim:claimCoverFromLeft})},
       
  1096 $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before
       
  1097 $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and
       
  1098 $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes
       
  1099 while the state is being processed, but the property is held before
       
  1100 both stepping back to a predecessor state and exploring a successor
       
  1101 state.
       
  1102 
       
  1103 The necessary part of the candidate set is easily maintainable or
       
  1104 computable by following
       
  1105 \textbf{Section~\ref{candidateComputingVF2})}. A much faster method
       
  1106 has been designed for biological- and sparse graphs, see the next
       
  1107 section for details.
   753 
  1108 
   754 \subsubsection{Calculating the candidates for a node}
  1109 \subsubsection{Calculating the candidates for a node}
   755 Being aware of \textbf{Claim~\ref{claim:claimCoverFromLeft})}, the task is not to maintain the candidate set, but to generate the candidate nodes in $G_{large}$ for a given node $u\in V_{small}$.
  1110 Being aware of \textbf{Claim~\ref{claim:claimCoverFromLeft})}, the
   756 In case of an expanding problem type and $M$ mapping, if a node $v\in V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall u'\in V_{small} : (u,u')\in E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in E_{large}$. That is, each covered neighbour of $u$ has to be mapped to a covered neighbour of $v$.
  1111 task is not to maintain the candidate set, but to generate the
   757 
  1112 candidate nodes in $G_{large}$ for a given node $u\in V_{small}$.  In
   758 Having said that, an algorithm running in $\Theta(deg)$ time is describable if there exists a covered node in the component containing $u$. In this case choose a covered neighbour $u'$ of $u$ arbitrarily --- such a node exists based on \textbf{Claim~\ref{claim:MOclaim})}. With all the candidates of $u$ being among the uncovered neighbours of $Pair(M,u')$, there are solely $deg(Pair(M,u'))$ nodes to check.
  1113 case of an expanding problem type and $M$ mapping, if a node $v\in
   759 
  1114 V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall
   760 An easy trick is to choose an $u'$, for which $|\{uncovered\ neighbours\ $ $of\ Pair(M,u')\}|$ is the smallest possible.
  1115 u'\in V_{small} : (u,u')\in
   761 
  1116 E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in
   762 Note that if $u$ is the first node of its component, then all the uncovered nodes of $G_{large}$ are candidates, so giving a sublinear method is impossible.
  1117 E_{large}$. That is, each covered neighbour of $u$ has to be mapped to
       
  1118 a covered neighbour of $v$.
       
  1119 
       
  1120 Having said that, an algorithm running in $\Theta(deg)$ time is
       
  1121 describable if there exists a covered node in the component containing
       
  1122 $u$. In this case choose a covered neighbour $u'$ of $u$ arbitrarily
       
  1123 --- such a node exists based on
       
  1124 \textbf{Claim~\ref{claim:MOclaim})}. With all the candidates of $u$
       
  1125 being among the uncovered neighbours of $Pair(M,u')$, there are solely
       
  1126 $deg(Pair(M,u'))$ nodes to check.
       
  1127 
       
  1128 An easy trick is to choose an $u'$, for which
       
  1129 $|\{uncovered\ neighbours\ $ $of\ Pair(M,u')\}|$ is the smallest
       
  1130 possible.
       
  1131 
       
  1132 Note that if $u$ is the first node of its component, then all the
       
  1133 uncovered nodes of $G_{large}$ are candidates, so giving a sublinear
       
  1134 method is impossible.
   763 
  1135 
   764 
  1136 
   765 \subsubsection{Determining the node order}
  1137 \subsubsection{Determining the node order}
   766 This section describes how the node order preprocessing method of VF2++ can efficiently be implemented.
  1138 This section describes how the node order preprocessing method of
   767 
  1139 VF2++ can efficiently be implemented.
   768 For using lookup tables, the node labels are associated with the numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It enables $F_\mathcal{M}$ to be stored in an array, for which $F_\mathcal{M}[i]=F_\mathcal{M}(i)$ where $i=0,1,..,|K|-1$. At first, $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes in $V_{small}$ having label i, which is easy to compute in $\Theta(|V_{small}|)$ steps.
  1140 
   769 
  1141 For using lookup tables, the node labels are associated with the
   770 $\mathcal{M}\subseteq V_{small}$ can be represented as an array of size $|V_{small}|$.
  1142 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
   771 
  1143 enables $F_\mathcal{M}$ to be stored in an array, for which
   772 The BFS tree is computed by using a FIFO data structure which is usually implemented as a linked list, but one can avoid it by using the array $\mathcal{M}$ itself. $\mathcal{M}$ contains all the nodes seen before, a pointer shows where the first node of the FIFO is, and another one shows where the next explored node has to be inserted. So the nodes of each level of the BFS tree can be processed by \textbf{Algorithm \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} in place by swapping nodes.
  1144 $F_\mathcal{M}[i]=F_\mathcal{M}(i)$ where $i=0,1,..,|K|-1$. At first,
   773 
  1145 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
   774 After a node $u$ gets to the next place of the node order, $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is one less covered node in $V_{large}$ with label $lab(u)$, that is why min selection sort is preferred which gives the elements from left to right in descending order, see \textbf{Algorithm \ref{alg:VF2PPProcess1})}.
  1146 in $V_{small}$ having label i, which is easy to compute in
   775 
  1147 $\Theta(|V_{small}|)$ steps.
   776 Note that using a $\Theta(n^2)$ sort absolutely does not slow down the procedure on biological (and on sparse) graphs, since they have few nodes on a level. If a level had a large number of nodes, \textbf{Algorithm \ref{alg:VF2PPProcess2})} would seem to be a better choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not be immediately refreshed, so it is unable to provide up-to-date label information. 
  1148 
   777 
  1149 $\mathcal{M}\subseteq V_{small}$ can be represented as an array of
   778 Note that the \textit{while loop} of \textbf{Algorithm \ref{alg:VF2PPPseu})} takes one iteration per graph component and the graphs in biology are mostly connected.
  1150 size $|V_{small}|$.
       
  1151 
       
  1152 The BFS tree is computed by using a FIFO data structure which is
       
  1153 usually implemented as a linked list, but one can avoid it by using
       
  1154 the array $\mathcal{M}$ itself. $\mathcal{M}$ contains all the nodes
       
  1155 seen before, a pointer shows where the first node of the FIFO is, and
       
  1156 another one shows where the next explored node has to be inserted. So
       
  1157 the nodes of each level of the BFS tree can be processed by
       
  1158 \textbf{Algorithm \ref{alg:VF2PPProcess1})} and
       
  1159 \textbf{\ref{alg:VF2PPProcess2})} in place by swapping nodes.
       
  1160 
       
  1161 After a node $u$ gets to the next place of the node order,
       
  1162 $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is
       
  1163 one less covered node in $V_{large}$ with label $lab(u)$, that is why
       
  1164 min selection sort is preferred which gives the elements from left to
       
  1165 right in descending order, see \textbf{Algorithm
       
  1166   \ref{alg:VF2PPProcess1})}.
       
  1167 
       
  1168 Note that using a $\Theta(n^2)$ sort absolutely does not slow down the
       
  1169 procedure on biological (and on sparse) graphs, since they have few
       
  1170 nodes on a level. If a level had a large number of nodes,
       
  1171 \textbf{Algorithm \ref{alg:VF2PPProcess2})} would seem to be a better
       
  1172 choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the
       
  1173 efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not
       
  1174 be immediately refreshed, so it is unable to provide up-to-date label
       
  1175 information.
       
  1176 
       
  1177 Note that the \textit{while loop} of \textbf{Algorithm
       
  1178   \ref{alg:VF2PPPseu})} takes one iteration per graph component and
       
  1179 the graphs in biology are mostly connected.
   779 \subsubsection{Cutting rules}
  1180 \subsubsection{Cutting rules}
   780 In \textbf{Section \ref{VF2PPCuttingRules})}, the cutting rules were described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$ and $\tilde T_{large}$, which are dependent on the all-time mapping (i.e. on the all-time state). The aim is to check the labeled cutting rules of VF2++ in $\Theta(deg)$ time.
  1181 In \textbf{Section \ref{VF2PPCuttingRules})}, the cutting rules were
   781 
  1182 described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$
   782 Firstly, suppose that these four sets are given in such a way, that checking whether a node is in a certain set takes constant time, e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an initially zero integer lookup table of size $|K|$. After incrementing $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap T_{large}(s)$, the first part of the cutting rules is checkable in $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$ to zero takes $\Theta(deg)$ time again, which makes it possible to use the same table through the whole algorithm.
  1183 and $\tilde T_{large}$, which are dependent on the all-time mapping
   783 The second part of the cutting rules can be verified using the same method with $\tilde T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and $T_{large}$. Thus, the overall complexity is $\Theta(deg)$.
  1184 (i.e. on the all-time state). The aim is to check the labeled cutting
   784 
  1185 rules of VF2++ in $\Theta(deg)$ time.
   785 An other integer lookup table storing the number of covered neighbours of each node in $G_{large}$ gives all the information about the sets $T_{large}$ and $\tilde T_{large}$, which is maintainable in $\Theta(deg)$ time when a pair is added or substracted by incrementing or decrementing the proper indices. A further improvement is that the values of $L[lab(u')]$ in case of checking $u$ is dependent only on $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an array of pairs (label, number of such labels) can be stored to skip the maintaining operations. Note that these arrays are at most of size $deg$. Skipping this trick, the number of covered neighbours has to be stored for each node of $G_{small}$ as well to get the sets $T_{small}$ and $\tilde T_{small}$.
  1186 
   786 
  1187 Firstly, suppose that these four sets are given in such a way, that
   787 Using similar tricks, the consistency function can be evaluated in $\Theta(deg)$ steps, as well.
  1188 checking whether a node is in a certain set takes constant time,
       
  1189 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
       
  1190 initially zero integer lookup table of size $|K|$. After incrementing
       
  1191 $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and
       
  1192 decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap
       
  1193 T_{large}(s)$, the first part of the cutting rules is checkable in
       
  1194 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
       
  1195 to zero takes $\Theta(deg)$ time again, which makes it possible to use
       
  1196 the same table through the whole algorithm.  The second part of the
       
  1197 cutting rules can be verified using the same method with $\tilde
       
  1198 T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and
       
  1199 $T_{large}$. Thus, the overall complexity is $\Theta(deg)$.
       
  1200 
       
  1201 An other integer lookup table storing the number of covered neighbours
       
  1202 of each node in $G_{large}$ gives all the information about the sets
       
  1203 $T_{large}$ and $\tilde T_{large}$, which is maintainable in
       
  1204 $\Theta(deg)$ time when a pair is added or substracted by incrementing
       
  1205 or decrementing the proper indices. A further improvement is that the
       
  1206 values of $L[lab(u')]$ in case of checking $u$ is dependent only on
       
  1207 $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an
       
  1208 array of pairs (label, number of such labels) can be stored to skip
       
  1209 the maintaining operations. Note that these arrays are at most of size
       
  1210 $deg$. Skipping this trick, the number of covered neighbours has to be
       
  1211 stored for each node of $G_{small}$ as well to get the sets
       
  1212 $T_{small}$ and $\tilde T_{small}$.
       
  1213 
       
  1214 Using similar tricks, the consistency function can be evaluated in
       
  1215 $\Theta(deg)$ steps, as well.
   788 
  1216 
   789 \section{The VF2 Plus Algorithm}
  1217 \section{The VF2 Plus Algorithm}
   790 The VF2 Plus algorithm is a recently improved version of VF2. It was compared with the state of the art algorithms in \cite{VF2Plus} and has proven itself to be competitive with RI, the best algorithm on biological graphs.
  1218 The VF2 Plus algorithm is a recently improved version of VF2. It was
   791 \\
  1219 compared with the state of the art algorithms in \cite{VF2Plus} and
   792 A short summary of VF2 Plus follows, which uses the notation and the conventions of the original paper.
  1220 has proven itself to be competitive with RI, the best algorithm on
       
  1221 biological graphs.  \\ A short summary of VF2 Plus follows, which uses
       
  1222 the notation and the conventions of the original paper.
   793 
  1223 
   794 \subsection{Ordering procedure}
  1224 \subsection{Ordering procedure}
   795 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$ with the lowest probability to find a pair in $V_{small}$ and the highest number of connections with the nodes already sorted by the algorithm.
  1225 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$
       
  1226 with the lowest probability to find a pair in $V_{small}$ and the
       
  1227 highest number of connections with the nodes already sorted by the
       
  1228 algorithm.
   796 
  1229 
   797 \begin{definition}
  1230 \begin{definition}
   798 $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$. 
  1231 $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and
       
  1232   $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$.
   799 \end{definition}
  1233 \end{definition}
   800 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in $V_{large}$
  1234 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in
       
  1235 $V_{large}$
   801 \newline
  1236 \newline
   802 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in $V_{large}$
  1237 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in
       
  1238 $V_{large}$
   803 \newline
  1239 \newline
   804 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\
  1240 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ $M$ is the set of
   805 $M$ is the set of already sorted nodes, $T$ is the set of nodes candidate to be selected, and $degreeM$ of a node is the number of its neighbours in $M$.
  1241 already sorted nodes, $T$ is the set of nodes candidate to be
       
  1242 selected, and $degreeM$ of a node is the number of its neighbours in
       
  1243 $M$.
   806 \begin{algorithm}
  1244 \begin{algorithm}
   807 \algtext*{EndIf}%ne nyomtasson end if-et
  1245 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
   808 \algtext*{EndFor}%ne nyomtasson ..
  1246 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
   809 \algtext*{EndProcedure}%ne nyomtasson ..
       
   810 \algtext*{EndWhile}
  1247 \algtext*{EndWhile}
   811 \caption{}\label{alg:VF2PlusPseu}
  1248 \caption{}\label{alg:VF2PlusPseu}
   812 \begin{algorithmic}[1]
  1249 \begin{algorithmic}[1]
   813 \Procedure{VF2 Plus order}{}
  1250 \Procedure{VF2 Plus order}{} \State Select the node with the lowest
   814   \State Select the node with the lowest $P$.
  1251 $P$.  \If {more nodes share the same $P$} \State select the one with
   815     \If {more nodes share the same $P$}
  1252 maximum degree \EndIf \If {more nodes share the same $P$ and have the
   816     \State select the one with maximum degree
  1253   max degree} \State select the first \EndIf \State Put the selected
   817     \EndIf
  1254 node in the set $M$. \label{alg:putIn} \State Put all its unsorted
   818     \If {more nodes share the same $P$ and have the max degree}
  1255 neighbours in the set $T$.  \If {$M\neq V_{small}$} \State From set
   819     \State select the first
  1256 $T$ select the node with maximum $degreeM$.  \If {more nodes have
   820     \EndIf
  1257   maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf
   821   \State Put the selected node in the set $M$. \label{alg:putIn}
  1258 \If {more nodes have maximum $degreeM$ and $P$} \State Select the
   822   \State Put all its unsorted neighbours in the set $T$.
  1259 first.  \EndIf \State \textbf{goto \ref{alg:putIn}.}  \EndIf
   823   \If {$M\neq V_{small}$}
       
   824   \State From set $T$ select the node with maximum $degreeM$.
       
   825   \If {more nodes have maximum $degreeM$}
       
   826     \State Select the one with the lowest $P$
       
   827   \EndIf
       
   828   \If {more nodes have maximum $degreeM$ and $P$}
       
   829     \State Select the first.
       
   830   \EndIf
       
   831   \State \textbf{goto \ref{alg:putIn}.}
       
   832   \EndIf
       
   833 \EndProcedure
  1260 \EndProcedure
   834 \end{algorithmic}
  1261 \end{algorithmic}
   835 \end{algorithm}
  1262 \end{algorithm}
   836 
  1263 
   837 Using these notations, \textbf{Algorithm~\ref{alg:VF2PlusPseu})} provides the description of the sorting procedure.
  1264 Using these notations, \textbf{Algorithm~\ref{alg:VF2PlusPseu})}
   838 
  1265 provides the description of the sorting procedure.
   839 Note that $P(u)$ is not the exact probability of finding a consistent pair for $u$ by choosing a node of $V_{large}$ randomly, since $P_{lab}$ and $P_{deg}$ are not independent, though calculating the real probability would take quadratic time, which may be reduced by using fittingly lookup tables.
  1266 
   840 
  1267 Note that $P(u)$ is not the exact probability of finding a consistent
   841 \newpage
  1268 pair for $u$ by choosing a node of $V_{large}$ randomly, since
       
  1269 $P_{lab}$ and $P_{deg}$ are not independent, though calculating the
       
  1270 real probability would take quadratic time, which may be reduced by
       
  1271 using fittingly lookup tables.
       
  1272 
   842 \section{Experimental results}
  1273 \section{Experimental results}
   843 This section compares the performance of VF2++ and VF2 Plus. Both algorithms have run faster with orders of magnitude than VF2, thus its inclusion was not reasonable.
  1274 This section compares the performance of VF2++ and VF2 Plus. Both
       
  1275 algorithms have run faster with orders of magnitude than VF2, thus its
       
  1276 inclusion was not reasonable.
   844 \subsection{Biological graphs}
  1277 \subsection{Biological graphs}
   845 The tests have been executed on a recent biological dataset created for the International Contest on Pattern Search in Biological Databases\cite{Content}, which has been constructed of Molecule, Protein and Contact Map graphs extracted from the Protein Data Bank\cite{ProteinDataBank}.
  1278 The tests have been executed on a recent biological dataset created
   846 
  1279 for the International Contest on Pattern Search in Biological
   847 The molecule dataset contains small graphs with less than 100 nodes and an average degree of less than 3. The protein dataset contains graphs having 500-10 000 nodes and an average degree of 4, while the contact map dataset contains graphs with 150-800 nodes and an average degree of 20.
  1280 Databases\cite{Content}, which has been constructed of Molecule,
   848 \\
  1281 Protein and Contact Map graphs extracted from the Protein Data
   849 
  1282 Bank\cite{ProteinDataBank}.
   850 In the following, the induced subgraph isomorphism and the graph isomorphism will be examined.
  1283 
       
  1284 The molecule dataset contains small graphs with less than 100 nodes
       
  1285 and an average degree of less than 3. The protein dataset contains
       
  1286 graphs having 500-10 000 nodes and an average degree of 4, while the
       
  1287 contact map dataset contains graphs with 150-800 nodes and an average
       
  1288 degree of 20.  \\
       
  1289 
       
  1290 In the following, the induced subgraph isomorphism and the graph
       
  1291 isomorphism will be examined.
   851 
  1292 
   852 \subsubsection{Induced subgraph isomorphism}
  1293 \subsubsection{Induced subgraph isomorphism}
   853 This dataset contains a set of graph pairs, and \textbf{all} the induced subgraph ismorphisms have to be found between them. \textbf{Figure \ref{fig:INDProt}), \ref{fig:INDContact}),} and \textbf{ \ref{fig:INDMolecule})} show the solution time of the problems in the problem set.
  1294 This dataset contains a set of graph pairs, and \textbf{all} the
       
  1295 induced subgraph ismorphisms have to be found between
       
  1296 them. \textbf{Figure \ref{fig:INDProt}), \ref{fig:INDContact}),} and
       
  1297 \textbf{ \ref{fig:INDMolecule})} show the solution time of the
       
  1298 problems in the problem set.
   854 
  1299 
   855 \begin{figure}[H]
  1300 \begin{figure}[H]
   856 \begin{center}
  1301 \begin{center}
   857 \begin{tikzpicture}
  1302 \begin{tikzpicture}
   858   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1303   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   859   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label        style={/pgf/number format/1000 sep = \thinspace}]
  1304   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   860   %\addplot+[only marks] table {proteinsOrig.txt};
  1305     west},scaled x ticks = false,x tick label style={/pgf/number
   861   \addplot[mark=*,mark size=1.2pt,color=blue] table {Orig/Proteins.256.txt};
  1306     format/1000 sep = \thinspace}] %\addplot+[only marks] table
   862   \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
  1307     {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]
       
  1308     table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark
       
  1309       size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
   863   \end{axis}
  1310   \end{axis}
   864   \end{tikzpicture}
  1311   \end{tikzpicture}
   865 \end{center}
  1312 \end{center}
   866 \vspace*{-0.8cm}
  1313 \vspace*{-0.8cm}
   867 \caption{Both the algorithms have linear behaviour on protein graphs. VF2++ is more than 10 times faster than VF2 Plus.} \label{fig:INDProt}
  1314 \caption{Both the algorithms have linear behaviour on protein
       
  1315   graphs. VF2++ is more than 10 times faster than VF2
       
  1316   Plus.} \label{fig:INDProt}
   868 \end{figure}
  1317 \end{figure}
   869 
  1318 
   870 \begin{figure}[H]
  1319 \begin{figure}[H]
   871 \begin{center}
  1320 \begin{center}
   872 \begin{tikzpicture}
  1321 \begin{tikzpicture}
   873 \begin{axis}[title=Contact Maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1322 \begin{axis}[title=Contact Maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   874 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1323 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1324   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1325   format/1000 sep = \thinspace}]
   875 %\addplot+[only marks] table {proteinsOrig.txt};
  1326 %\addplot+[only marks] table {proteinsOrig.txt};
   876 \addplot table {Orig/ContactMaps.128.txt};
  1327 \addplot table {Orig/ContactMaps.128.txt};
   877 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/ContactMaps.128.txt};
  1328 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1329         {VF2PPLabel/ContactMaps.128.txt};
   878 \end{axis}
  1330 \end{axis}
   879 \end{tikzpicture}
  1331 \end{tikzpicture}
   880 \end{center}
  1332 \end{center}
   881 \vspace*{-0.8cm}
  1333 \vspace*{-0.8cm}
   882 \caption{On Contact Maps, VF2++ runs in near constant time, while VF2 Plus has a near linear behaviour.} \label{fig:INDContact}
  1334 \caption{On Contact Maps, VF2++ runs in near constant time, while VF2
       
  1335   Plus has a near linear behaviour.} \label{fig:INDContact}
   883 \end{figure}
  1336 \end{figure}
   884 
  1337 
   885 \begin{figure}[H]
  1338 \begin{figure}[H]
   886 \begin{center}
  1339 \begin{center}
   887 \begin{tikzpicture}
  1340 \begin{tikzpicture}
   888 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1341 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   889 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1342 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   890 %\addplot+[only marks] table {proteinsOrig.txt};
  1343   west},scaled x ticks = false,x tick label style={/pgf/number
   891 \addplot table {Orig/Molecules.32.txt};
  1344   format/1000 sep = \thinspace}]
   892 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
  1345 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1346 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark
       
  1347   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
   893 \end{axis}
  1348 \end{axis}
   894 \end{tikzpicture}
  1349 \end{tikzpicture}
   895 \end{center}
  1350 \end{center}
   896 \vspace*{-0.8cm}
  1351 \vspace*{-0.8cm}
   897 \caption{In the case of Molecules, the algorithms seem to have a similar behaviour, but VF2++ is almost two times faster even on such small graphs.} \label{fig:INDMolecule}
  1352 \caption{In the case of Molecules, the algorithms seem to have a
       
  1353   similar behaviour, but VF2++ is almost two times faster even on such
       
  1354   small graphs.} \label{fig:INDMolecule}
   898 \end{figure}
  1355 \end{figure}
   899 
  1356 
   900 
  1357 
   901 \subsubsection{Graph ismorphism}
  1358 \subsubsection{Graph ismorphism}
   902 In this experiment, the nodes of each graph in the database have been shuffled and an isomorphism between the shuffled and the original graph has been searched. For runtime results, see \textbf{Figure \ref{fig:ISOProt}), \ref{fig:ISOContact}),} and \textbf{\ref{fig:ISOMolecule})}.
  1359 In this experiment, the nodes of each graph in the database have been
       
  1360 shuffled and an isomorphism between the shuffled and the original
       
  1361 graph has been searched. For runtime results, see \textbf{Figure
       
  1362   \ref{fig:ISOProt}), \ref{fig:ISOContact}),} and
       
  1363 \textbf{\ref{fig:ISOMolecule})}.
   903 \begin{figure}[H]
  1364 \begin{figure}[H]
   904 \begin{center}
  1365 \begin{center}
   905 \begin{tikzpicture}
  1366 \begin{tikzpicture}
   906 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1367 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   907 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1368 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   908 %\addplot+[only marks] table {proteinsOrig.txt};
  1369   west},scaled x ticks = false,x tick label style={/pgf/number
   909 \addplot table {Orig/proteinsIso.txt};
  1370   format/1000 sep = \thinspace}]
   910 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
  1371 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1372 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
       
  1373   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
   911 \end{axis}
  1374 \end{axis}
   912 \end{tikzpicture}
  1375 \end{tikzpicture}
   913 \end{center}
  1376 \end{center}
   914 \vspace*{-0.8cm}
  1377 \vspace*{-0.8cm}
   915 \caption{On protein graphs, VF2 Plus has a super linear time complexity, while VF2++ runs in near constant time. The difference is about two order of magnitude on large graphs.}\label{fig:ISOProt}
  1378 \caption{On protein graphs, VF2 Plus has a super linear time
       
  1379   complexity, while VF2++ runs in near constant time. The difference
       
  1380   is about two order of magnitude on large graphs.}\label{fig:ISOProt}
   916 \end{figure}
  1381 \end{figure}
   917 
  1382 
   918 \begin{figure}[H]
  1383 \begin{figure}[H]
   919 \begin{center}
  1384 \begin{center}
   920 \begin{tikzpicture}
  1385 \begin{tikzpicture}
   921 \begin{axis}[title=Contact Maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1386 \begin{axis}[title=Contact Maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   922 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1387 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   923 %\addplot+[only marks] table {proteinsOrig.txt};
  1388   west},scaled x ticks = false,x tick label style={/pgf/number
   924 \addplot table {Orig/contactMapsIso.txt};
  1389   format/1000 sep = \thinspace}]
   925 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
  1390 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1391 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark
       
  1392   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
   926 \end{axis}
  1393 \end{axis}
   927 \end{tikzpicture}
  1394 \end{tikzpicture}
   928 \end{center}
  1395 \end{center}
   929 \vspace*{-0.8cm}
  1396 \vspace*{-0.8cm}
   930 \caption{The results are closer to each other on Contact Maps, but VF2++ still performs consistently better.}\label{fig:ISOContact}
  1397 \caption{The results are closer to each other on Contact Maps, but
       
  1398   VF2++ still performs consistently better.}\label{fig:ISOContact}
   931 \end{figure}
  1399 \end{figure}
   932 
  1400 
   933 \begin{figure}[H]
  1401 \begin{figure}[H]
   934 \begin{center}
  1402 \begin{center}
   935 \begin{tikzpicture}
  1403 \begin{tikzpicture}
   936 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1404 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   937 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1405 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   938 %\addplot+[only marks] table {proteinsOrig.txt};
  1406   west},scaled x ticks = false,x tick label style={/pgf/number
   939 \addplot table {Orig/moleculesIso.txt};
  1407   format/1000 sep = \thinspace}]
   940 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
  1408 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1409 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark
       
  1410   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
   941 \end{axis}
  1411 \end{axis}
   942 \end{tikzpicture}
  1412 \end{tikzpicture}
   943 \end{center}
  1413 \end{center}
   944 \vspace*{-0.8cm}
  1414 \vspace*{-0.8cm}
   945 \caption{In the case of Molecules, there is not such a significant difference, but VF2++ seems to be faster as the number of nodes increases.}\label{fig:ISOMolecule}
  1415 \caption{In the case of Molecules, there is not such a significant
       
  1416   difference, but VF2++ seems to be faster as the number of nodes
       
  1417   increases.}\label{fig:ISOMolecule}
   946 \end{figure}
  1418 \end{figure}
   947 
  1419 
   948 
  1420 
   949 \subsection{Random graphs}
  1421 \subsection{Random graphs}
   950 This section compares VF2++ with VF2 Plus on random graphs of a large size. The node labels are uniformly distributed.
  1422 This section compares VF2++ with VF2 Plus on random graphs of a large
   951 Let $\delta$ denote the average degree.
  1423 size. The node labels are uniformly distributed.  Let $\delta$ denote
   952 For the parameters of problems solved in the experiments, please see the top of each chart.
  1424 the average degree.  For the parameters of problems solved in the
       
  1425 experiments, please see the top of each chart.
   953 \subsubsection{Graph isomorphism}
  1426 \subsubsection{Graph isomorphism}
   954 To evaluate the efficiency of the algorithms in the case of graph isomorphism, connected graphs of less than 20 000 nodes have been considered. Generating a random graph and shuffling its nodes, an isomorphism had to be found. \textbf{Figure \ref{fig:randISO5}), \ref{fig:randISO10}), \ref{fig:randISO15}), \ref{fig:randISO35}), \ref{fig:randISO45}),} and \textbf{\ref{fig:randISO100}) } show the runtime results on graph sets of various density.
  1427 To evaluate the efficiency of the algorithms in the case of graph
       
  1428 isomorphism, connected graphs of less than 20 000 nodes have been
       
  1429 considered. Generating a random graph and shuffling its nodes, an
       
  1430 isomorphism had to be found. \textbf{Figure \ref{fig:randISO5}),
       
  1431   \ref{fig:randISO10}), \ref{fig:randISO15}), \ref{fig:randISO35}),
       
  1432   \ref{fig:randISO45}),} and \textbf{\ref{fig:randISO100}) } show the
       
  1433 runtime results on graph sets of various density.
   955 
  1434 
   956 \begin{figure}[H]
  1435 \begin{figure}[H]
   957 \begin{center}
  1436 \begin{center}
   958 \begin{tikzpicture}
  1437 \begin{tikzpicture}
   959 \begin{axis}[title={Random ISO, $\delta = 5$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1438 \begin{axis}[title={Random ISO, $\delta = 5$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   960 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1439 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1440   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1441   format/1000 sep = \thinspace}]
   961 %\addplot+[only marks] table {proteinsOrig.txt};
  1442 %\addplot+[only marks] table {proteinsOrig.txt};
   962 \addplot table {randGraph/iso/vf2pIso5_1.txt};
  1443 \addplot table {randGraph/iso/vf2pIso5_1.txt};
   963 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso5_1.txt};
  1444 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1445         {randGraph/iso/vf2ppIso5_1.txt};
   964 \end{axis}
  1446 \end{axis}
   965 \end{tikzpicture}
  1447 \end{tikzpicture}
   966 \end{center}
  1448 \end{center}
   967 \vspace*{-0.8cm}
  1449 \vspace*{-0.8cm}
   968 \caption{}\label{fig:randISO5}
  1450 \caption{}\label{fig:randISO5}
   970 
  1452 
   971 \begin{figure}[H]
  1453 \begin{figure}[H]
   972 \begin{center}
  1454 \begin{center}
   973 \begin{tikzpicture}
  1455 \begin{tikzpicture}
   974 \begin{axis}[title={Random ISO, $\delta = 10$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1456 \begin{axis}[title={Random ISO, $\delta = 10$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   975 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1457 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1458   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1459   format/1000 sep = \thinspace}]
   976 %\addplot+[only marks] table {proteinsOrig.txt};
  1460 %\addplot+[only marks] table {proteinsOrig.txt};
   977 \addplot table {randGraph/iso/vf2pIso10_1.txt};
  1461 \addplot table {randGraph/iso/vf2pIso10_1.txt};
   978 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso10_1.txt};
  1462 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1463         {randGraph/iso/vf2ppIso10_1.txt};
   979 \end{axis}
  1464 \end{axis}
   980 \end{tikzpicture}
  1465 \end{tikzpicture}
   981 \end{center}
  1466 \end{center}
   982 \vspace*{-0.8cm}
  1467 \vspace*{-0.8cm}
   983 \caption{}\label{fig:randISO10}
  1468 \caption{}\label{fig:randISO10}
   985 
  1470 
   986 \begin{figure}[H]
  1471 \begin{figure}[H]
   987 \begin{center}
  1472 \begin{center}
   988 \begin{tikzpicture}
  1473 \begin{tikzpicture}
   989 \begin{axis}[title={Random ISO, $\delta = 15$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1474 \begin{axis}[title={Random ISO, $\delta = 15$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   990 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1475 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1476   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1477   format/1000 sep = \thinspace}]
   991 %\addplot+[only marks] table {proteinsOrig.txt};
  1478 %\addplot+[only marks] table {proteinsOrig.txt};
   992 \addplot table {randGraph/iso/vf2pIso15_1.txt};
  1479 \addplot table {randGraph/iso/vf2pIso15_1.txt};
   993 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso15_1.txt};
  1480 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1481         {randGraph/iso/vf2ppIso15_1.txt};
   994 \end{axis}
  1482 \end{axis}
   995 \end{tikzpicture}
  1483 \end{tikzpicture}
   996 \end{center}
  1484 \end{center}
   997 \vspace*{-0.8cm}
  1485 \vspace*{-0.8cm}
   998 \caption{}\label{fig:randISO15}
  1486 \caption{}\label{fig:randISO15}
  1000 
  1488 
  1001 \begin{figure}[H]
  1489 \begin{figure}[H]
  1002 \begin{center}
  1490 \begin{center}
  1003 \begin{tikzpicture}
  1491 \begin{tikzpicture}
  1004 \begin{axis}[title={Random ISO, $\delta = 35$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1492 \begin{axis}[title={Random ISO, $\delta = 35$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1005 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1493 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1494   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1495   format/1000 sep = \thinspace}]
  1006 %\addplot+[only marks] table {proteinsOrig.txt};
  1496 %\addplot+[only marks] table {proteinsOrig.txt};
  1007 \addplot table {randGraph/iso/vf2pIso35_1.txt};
  1497 \addplot table {randGraph/iso/vf2pIso35_1.txt};
  1008 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso35_1.txt};
  1498 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1499         {randGraph/iso/vf2ppIso35_1.txt};
  1009 \end{axis}
  1500 \end{axis}
  1010 \end{tikzpicture}
  1501 \end{tikzpicture}
  1011 \end{center}
  1502 \end{center}
  1012 \vspace*{-0.8cm}
  1503 \vspace*{-0.8cm}
  1013 \caption{}\label{fig:randISO35}
  1504 \caption{}\label{fig:randISO35}
  1015 
  1506 
  1016 \begin{figure}[H]
  1507 \begin{figure}[H]
  1017 \begin{center}
  1508 \begin{center}
  1018 \begin{tikzpicture}
  1509 \begin{tikzpicture}
  1019 \begin{axis}[title={Random ISO, $\delta = 45$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1510 \begin{axis}[title={Random ISO, $\delta = 45$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1020 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1511 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1512   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1513   format/1000 sep = \thinspace}]
  1021 %\addplot+[only marks] table {proteinsOrig.txt};
  1514 %\addplot+[only marks] table {proteinsOrig.txt};
  1022 \addplot table {randGraph/iso/vf2pIso45_1.txt};
  1515 \addplot table {randGraph/iso/vf2pIso45_1.txt};
  1023 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso45_1.txt};
  1516 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1517         {randGraph/iso/vf2ppIso45_1.txt};
  1024 \end{axis}
  1518 \end{axis}
  1025 \end{tikzpicture}
  1519 \end{tikzpicture}
  1026 \end{center}
  1520 \end{center}
  1027 \vspace*{-0.8cm}
  1521 \vspace*{-0.8cm}
  1028 \caption{}\label{fig:randISO45}
  1522 \caption{}\label{fig:randISO45}
  1030 
  1524 
  1031 \begin{figure}[H]
  1525 \begin{figure}[H]
  1032 \begin{center}
  1526 \begin{center}
  1033 \begin{tikzpicture}
  1527 \begin{tikzpicture}
  1034 \begin{axis}[title={Random ISO, $\delta = 100$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1528 \begin{axis}[title={Random ISO, $\delta = 100$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1035 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1529 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1530   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1531   format/1000 sep = \thinspace}]
  1036 %\addplot+[only marks] table {proteinsOrig.txt};
  1532 %\addplot+[only marks] table {proteinsOrig.txt};
  1037 \addplot table {randGraph/iso/vf2pIso100_1.txt};
  1533 \addplot table {randGraph/iso/vf2pIso100_1.txt};
  1038 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso100_1.txt};
  1534 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1535         {randGraph/iso/vf2ppIso100_1.txt};
  1039 \end{axis}
  1536 \end{axis}
  1040 \end{tikzpicture}
  1537 \end{tikzpicture}
  1041 \end{center}
  1538 \end{center}
  1042 \vspace*{-0.8cm}
  1539 \vspace*{-0.8cm}
  1043 \caption{}\label{fig:randISO100}
  1540 \caption{}\label{fig:randISO100}
  1044 \end{figure}
  1541 \end{figure}
  1045 
  1542 
  1046 
  1543 
  1047 Considering the graph isomorphism problem, VF2++ consistently outperforms its rival especially on sparse graphs. The reason for the slightly super linear behaviour of VF2++ on denser graphs is the larger number of nodes in the BFS tree constructed in \textbf{Algorithm \ref{alg:VF2PPPseu})}.
  1544 Considering the graph isomorphism problem, VF2++ consistently
       
  1545 outperforms its rival especially on sparse graphs. The reason for the
       
  1546 slightly super linear behaviour of VF2++ on denser graphs is the
       
  1547 larger number of nodes in the BFS tree constructed in
       
  1548 \textbf{Algorithm \ref{alg:VF2PPPseu})}.
  1048 
  1549 
  1049 \subsubsection{Induced subgraph isomorphism}
  1550 \subsubsection{Induced subgraph isomorphism}
  1050 This section provides a comparison of VF2++ and VF2 Plus in the case of induced subgraph isomorphism. In addition to the size of the large graph, that of the small graph dramatically influences the hardness of a given problem too, so the overall picture is provided by examining small graphs of various size.
  1551 This section provides a comparison of VF2++ and VF2 Plus in the case
  1051 
  1552 of induced subgraph isomorphism. In addition to the size of the large
  1052 For each chart, a number $0<\rho< 1$ has been fixed and the following has been executed 150 times. Generating a large graph $G_{large}$, choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes, and for all the 10 subgraphs find a mapping by using both the graph matching algorithms.
  1553 graph, that of the small graph dramatically influences the hardness of
  1053 The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1, 0.3, 0.6, 0.8, 0.95$ cases have been examined (see \textbf{Figure \ref{fig:randIND5}), \ref{fig:randIND10})} and \textbf{\ref{fig:randIND35})}), and for each $\delta$, a cumulative chart is given as well, which excludes $\rho = 0.05$ and $0.1$ for the sake of perspicuity (see \textbf{Figure \ref{fig:randIND5Sum}), \ref{fig:randIND10Sum})} and \textbf{\ref{fig:randIND35Sum})}).
  1554 a given problem too, so the overall picture is provided by examining
       
  1555 small graphs of various size.
       
  1556 
       
  1557 For each chart, a number $0<\rho< 1$ has been fixed and the following
       
  1558 has been executed 150 times. Generating a large graph $G_{large}$,
       
  1559 choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes,
       
  1560 and for all the 10 subgraphs find a mapping by using both the graph
       
  1561 matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
       
  1562 0.3, 0.6, 0.8, 0.95$ cases have been examined (see \textbf{Figure
       
  1563   \ref{fig:randIND5}), \ref{fig:randIND10})} and
       
  1564 \textbf{\ref{fig:randIND35})}), and for each $\delta$, a cumulative
       
  1565 chart is given as well, which excludes $\rho = 0.05$ and $0.1$ for the
       
  1566 sake of perspicuity (see \textbf{Figure \ref{fig:randIND5Sum}),
       
  1567   \ref{fig:randIND10Sum})} and \textbf{\ref{fig:randIND35Sum})}).
  1054 
  1568 
  1055 
  1569 
  1056 
  1570 
  1057 
  1571 
  1058 
  1572 
  1060 \vspace*{-0.8cm}
  1574 \vspace*{-0.8cm}
  1061 \begin{subfigure}[b]{0.55\textwidth}
  1575 \begin{subfigure}[b]{0.55\textwidth}
  1062 \begin{center}
  1576 \begin{center}
  1063 \begin{tikzpicture}
  1577 \begin{tikzpicture}
  1064 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1578 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1065 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1579 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1580   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1581   format/1000 sep = \space}]
  1066 %\addplot+[only marks] table {proteinsOrig.txt};
  1582 %\addplot+[only marks] table {proteinsOrig.txt};
  1067 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
  1583 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
  1068 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.05.txt};
  1584 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1585         {randGraph/ind/vf2ppInd5_0.05.txt};
  1069 \end{axis}
  1586 \end{axis}
  1070 \end{tikzpicture}
  1587 \end{tikzpicture}
  1071 \end{center}
  1588 \end{center}
  1072      \end{subfigure}
  1589      \end{subfigure}
  1073      \begin{subfigure}[b]{0.55\textwidth}
  1590      \begin{subfigure}[b]{0.55\textwidth}
  1074 \begin{center}
  1591 \begin{center}
  1075 \begin{tikzpicture}
  1592 \begin{tikzpicture}
  1076 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1593 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1077 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1594 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1595   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1596   format/1000 sep = \space}]
  1078 %\addplot+[only marks] table {proteinsOrig.txt};
  1597 %\addplot+[only marks] table {proteinsOrig.txt};
  1079 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
  1598 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
  1080 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.1.txt};
  1599 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1600         {randGraph/ind/vf2ppInd5_0.1.txt};
  1081 \end{axis}
  1601 \end{axis}
  1082 \end{tikzpicture}
  1602 \end{tikzpicture}
  1083 \end{center}
  1603 \end{center}
  1084 \end{subfigure}
  1604 \end{subfigure}
  1085 \hspace{1cm}
  1605 \hspace{1cm}
  1086 \begin{subfigure}[b]{0.55\textwidth}
  1606 \begin{subfigure}[b]{0.55\textwidth}
  1087 \begin{center}
  1607 \begin{center}
  1088 \begin{tikzpicture}
  1608 \begin{tikzpicture}
  1089 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1609 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1090 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1610 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1611   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1612   format/1000 sep = \space}]
  1091 %\addplot+[only marks] table {proteinsOrig.txt};
  1613 %\addplot+[only marks] table {proteinsOrig.txt};
  1092 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
  1614 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
  1093 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.3.txt};
  1615 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1616         {randGraph/ind/vf2ppInd5_0.3.txt};
  1094 \end{axis}
  1617 \end{axis}
  1095 \end{tikzpicture}
  1618 \end{tikzpicture}
  1096 \end{center}
  1619 \end{center}
  1097      \end{subfigure}
  1620      \end{subfigure}
  1098      \begin{subfigure}[b]{0.55\textwidth}
  1621      \begin{subfigure}[b]{0.55\textwidth}
  1099 \begin{center}
  1622 \begin{center}
  1100 \begin{tikzpicture}
  1623 \begin{tikzpicture}
  1101 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1624 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1102 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1625 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1626   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1627   format/1000 sep = \space}]
  1103 %\addplot+[only marks] table {proteinsOrig.txt};
  1628 %\addplot+[only marks] table {proteinsOrig.txt};
  1104 \addplot table {randGraph/ind/vf2pInd5_0.6.txt};
  1629 \addplot table {randGraph/ind/vf2pInd5_0.6.txt};
  1105 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.6.txt};
  1630 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1631         {randGraph/ind/vf2ppInd5_0.6.txt};
  1106 \end{axis}
  1632 \end{axis}
  1107 \end{tikzpicture}
  1633 \end{tikzpicture}
  1108 \end{center}
  1634 \end{center}
  1109 \end{subfigure}
  1635 \end{subfigure}
  1110 \begin{subfigure}[b]{0.55\textwidth}
  1636 \begin{subfigure}[b]{0.55\textwidth}
  1111           
  1637           
  1112 \begin{tikzpicture}
  1638 \begin{tikzpicture}
  1113 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1639 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1114 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1640 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1641   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1642   format/1000 sep = \space}]
  1115 %\addplot+[only marks] table {proteinsOrig.txt};
  1643 %\addplot+[only marks] table {proteinsOrig.txt};
  1116 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
  1644 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
  1117 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.8.txt};
  1645 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1646         {randGraph/ind/vf2ppInd5_0.8.txt};
  1118 \end{axis}
  1647 \end{axis}
  1119 \end{tikzpicture}
  1648 \end{tikzpicture}
  1120      \end{subfigure}
  1649      \end{subfigure}
  1121      \begin{subfigure}[b]{0.55\textwidth}
  1650      \begin{subfigure}[b]{0.55\textwidth}
  1122 \begin{tikzpicture}
  1651 \begin{tikzpicture}
  1123 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1652 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1124 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1653 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1654   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1655   format/1000 sep = \thinspace}]
  1125 %\addplot+[only marks] table {proteinsOrig.txt};
  1656 %\addplot+[only marks] table {proteinsOrig.txt};
  1126 \addplot table {randGraph/ind/vf2pInd5_0.95.txt};
  1657 \addplot table {randGraph/ind/vf2pInd5_0.95.txt};
  1127 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.95.txt};
  1658 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1659         {randGraph/ind/vf2ppInd5_0.95.txt};
  1128 \end{axis}
  1660 \end{axis}
  1129 \end{tikzpicture}
  1661 \end{tikzpicture}
  1130 \end{subfigure}
  1662 \end{subfigure}
  1131 \vspace*{-0.8cm}
  1663 \vspace*{-0.8cm}
  1132 \caption{IND on graphs having an average degree of 5.}\label{fig:randIND5}
  1664 \caption{IND on graphs having an average degree of
       
  1665   5.}\label{fig:randIND5}
  1133 \end{figure}
  1666 \end{figure}
  1134 
  1667 
  1135 \begin{figure}[H]
  1668 \begin{figure}[H]
  1136 \begin{center}
  1669 \begin{center}
  1137 \begin{tikzpicture}
  1670 \begin{tikzpicture}
  1138 \begin{axis}[title={Rand IND Summary, $\delta = 5$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1671 \begin{axis}[title={Rand IND Summary, $\delta = 5$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1139 =major,mark size=1pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1672 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
  1140 %\addplot+[only marks] table {proteinsOrig.txt};
  1673   west},scaled x ticks = false,x tick label style={/pgf/number
  1141 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.3.txt};
  1674   format/1000 sep = \thinspace}]
  1142 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.3.txt};
  1675 %\addplot+[only marks] table {proteinsOrig.txt};
  1143 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.6.txt};
  1676 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1144 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.6.txt};
  1677         {randGraph/ind/vf2pInd5_0.3.txt}; \addplot[mark=triangle*,mark
  1145 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.8.txt};
  1678           size=1.8pt,color=red] table
  1146 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.8.txt};
  1679         {randGraph/ind/vf2ppInd5_0.3.txt}; \addplot[mark=*,mark
  1147 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.95.txt};
  1680           size=1.5pt,color=blue] table
  1148 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.95.txt};
  1681         {randGraph/ind/vf2pInd5_0.6.txt}; \addplot[mark=triangle*,mark
       
  1682           size=1.8pt,color=red] table
       
  1683         {randGraph/ind/vf2ppInd5_0.6.txt}; \addplot[mark=*,mark
       
  1684           size=1.5pt,color=blue] table
       
  1685         {randGraph/ind/vf2pInd5_0.8.txt}; \addplot[mark=triangle*,mark
       
  1686           size=1.8pt,color=red] table
       
  1687         {randGraph/ind/vf2ppInd5_0.8.txt}; \addplot[mark=*,mark
       
  1688           size=1.5pt,color=blue] table
       
  1689         {randGraph/ind/vf2pInd5_0.95.txt};
       
  1690         \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1691                 {randGraph/ind/vf2ppInd5_0.95.txt};
  1149 \end{axis}
  1692 \end{axis}
  1150 \end{tikzpicture}
  1693 \end{tikzpicture}
  1151 \end{center}
  1694 \end{center}
  1152 \vspace*{-0.8cm}
  1695 \vspace*{-0.8cm}
  1153 \caption{Cummulative chart for $\delta=5$.}\label{fig:randIND5Sum}
  1696 \caption{Cummulative chart for $\delta=5$.}\label{fig:randIND5Sum}
  1159 \vspace*{-0.8cm}
  1702 \vspace*{-0.8cm}
  1160 \begin{subfigure}[b]{0.55\textwidth}
  1703 \begin{subfigure}[b]{0.55\textwidth}
  1161 \begin{center}
  1704 \begin{center}
  1162 \begin{tikzpicture}
  1705 \begin{tikzpicture}
  1163 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1706 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1164 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1707 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1708   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1709   format/1000 sep = \space}]
  1165 %\addplot+[only marks] table {proteinsOrig.txt};
  1710 %\addplot+[only marks] table {proteinsOrig.txt};
  1166 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
  1711 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
  1167 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.05.txt};
  1712 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1713         {randGraph/ind/vf2ppInd10_0.05.txt};
  1168 \end{axis}
  1714 \end{axis}
  1169 \end{tikzpicture}
  1715 \end{tikzpicture}
  1170 \end{center}
  1716 \end{center}
  1171      \end{subfigure}
  1717      \end{subfigure}
  1172      \begin{subfigure}[b]{0.55\textwidth}
  1718      \begin{subfigure}[b]{0.55\textwidth}
  1173 \begin{center}
  1719 \begin{center}
  1174 \begin{tikzpicture}
  1720 \begin{tikzpicture}
  1175 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1721 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1176 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1722 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1723   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1724   format/1000 sep = \space}]
  1177 %\addplot+[only marks] table {proteinsOrig.txt};
  1725 %\addplot+[only marks] table {proteinsOrig.txt};
  1178 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
  1726 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
  1179 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.1.txt};
  1727 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1728         {randGraph/ind/vf2ppInd10_0.1.txt};
  1180 \end{axis}
  1729 \end{axis}
  1181 \end{tikzpicture}
  1730 \end{tikzpicture}
  1182 \end{center}
  1731 \end{center}
  1183 \end{subfigure}
  1732 \end{subfigure}
  1184 \hspace{1cm}
  1733 \hspace{1cm}
  1185 \begin{subfigure}[b]{0.55\textwidth}
  1734 \begin{subfigure}[b]{0.55\textwidth}
  1186 \begin{center}
  1735 \begin{center}
  1187 \begin{tikzpicture}
  1736 \begin{tikzpicture}
  1188 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1737 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1189 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1738 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1739   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1740   format/1000 sep = \space}]
  1190 %\addplot+[only marks] table {proteinsOrig.txt};
  1741 %\addplot+[only marks] table {proteinsOrig.txt};
  1191 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
  1742 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
  1192 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.3.txt};
  1743 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1744         {randGraph/ind/vf2ppInd10_0.3.txt};
  1193 \end{axis}
  1745 \end{axis}
  1194 \end{tikzpicture}
  1746 \end{tikzpicture}
  1195 \end{center}
  1747 \end{center}
  1196      \end{subfigure}
  1748      \end{subfigure}
  1197      \begin{subfigure}[b]{0.55\textwidth}
  1749      \begin{subfigure}[b]{0.55\textwidth}
  1198 \begin{center}
  1750 \begin{center}
  1199 \begin{tikzpicture}
  1751 \begin{tikzpicture}
  1200 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1752 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1201 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1753 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1754   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1755   format/1000 sep = \space}]
  1202 %\addplot+[only marks] table {proteinsOrig.txt};
  1756 %\addplot+[only marks] table {proteinsOrig.txt};
  1203 \addplot table {randGraph/ind/vf2pInd10_0.6.txt};
  1757 \addplot table {randGraph/ind/vf2pInd10_0.6.txt};
  1204 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.6.txt};
  1758 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1759         {randGraph/ind/vf2ppInd10_0.6.txt};
  1205 \end{axis}
  1760 \end{axis}
  1206 \end{tikzpicture}
  1761 \end{tikzpicture}
  1207 \end{center}
  1762 \end{center}
  1208 \end{subfigure}
  1763 \end{subfigure}
  1209 \begin{subfigure}[b]{0.55\textwidth}
  1764 \begin{subfigure}[b]{0.55\textwidth}
  1210           
  1765           
  1211 \begin{tikzpicture}
  1766 \begin{tikzpicture}
  1212 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1767 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1213 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1768 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1769   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1770   format/1000 sep = \space}]
  1214 %\addplot+[only marks] table {proteinsOrig.txt};
  1771 %\addplot+[only marks] table {proteinsOrig.txt};
  1215 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
  1772 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
  1216 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.8.txt};
  1773 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1774         {randGraph/ind/vf2ppInd10_0.8.txt};
  1217 \end{axis}
  1775 \end{axis}
  1218 \end{tikzpicture}
  1776 \end{tikzpicture}
  1219      \end{subfigure}
  1777      \end{subfigure}
  1220      \begin{subfigure}[b]{0.55\textwidth}
  1778      \begin{subfigure}[b]{0.55\textwidth}
  1221 \begin{tikzpicture}
  1779 \begin{tikzpicture}
  1222 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1780 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1223 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1781 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1782   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1783   format/1000 sep = \thinspace}]
  1224 %\addplot+[only marks] table {proteinsOrig.txt};
  1784 %\addplot+[only marks] table {proteinsOrig.txt};
  1225 \addplot table {randGraph/ind/vf2pInd10_0.95.txt};
  1785 \addplot table {randGraph/ind/vf2pInd10_0.95.txt};
  1226 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.95.txt};
  1786 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1787         {randGraph/ind/vf2ppInd10_0.95.txt};
  1227 \end{axis}
  1788 \end{axis}
  1228 \end{tikzpicture}
  1789 \end{tikzpicture}
  1229 \end{subfigure}
  1790 \end{subfigure}
  1230 \vspace*{-0.8cm}
  1791 \vspace*{-0.8cm}
  1231 \caption{IND on graphs having an average degree of 10.}\label{fig:randIND10}
  1792 \caption{IND on graphs having an average degree of
       
  1793   10.}\label{fig:randIND10}
  1232 \end{figure}
  1794 \end{figure}
  1233 
  1795 
  1234 \begin{figure}[H]
  1796 \begin{figure}[H]
  1235 \begin{center}
  1797 \begin{center}
  1236 \begin{tikzpicture}
  1798 \begin{tikzpicture}
  1237 \begin{axis}[title={Rand IND Summary, $\delta = 10$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1799 \begin{axis}[title={Rand IND Summary, $\delta = 10$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1238 =major,mark size=1pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1800 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
  1239 %\addplot+[only marks] table {proteinsOrig.txt};
  1801   west},scaled x ticks = false,x tick label style={/pgf/number
  1240 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.3.txt};
  1802   format/1000 sep = \thinspace}]
  1241 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.3.txt};
  1803 %\addplot+[only marks] table {proteinsOrig.txt};
  1242 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.6.txt};
  1804 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1243 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.6.txt};
  1805         {randGraph/ind/vf2pInd10_0.3.txt};
  1244 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.8.txt};
  1806         \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1245 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.8.txt};
  1807                 {randGraph/ind/vf2ppInd10_0.3.txt};
  1246 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.95.txt};
  1808                 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1247 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.95.txt};
  1809                         {randGraph/ind/vf2pInd10_0.6.txt};
       
  1810                         \addplot[mark=triangle*,mark
       
  1811                           size=1.8pt,color=red] table
       
  1812                                 {randGraph/ind/vf2ppInd10_0.6.txt};
       
  1813                                 \addplot[mark=*,mark
       
  1814                                   size=1.5pt,color=blue] table
       
  1815                                         {randGraph/ind/vf2pInd10_0.8.txt};
       
  1816                                         \addplot[mark=triangle*,mark
       
  1817                                           size=1.8pt,color=red] table
       
  1818                                                 {randGraph/ind/vf2ppInd10_0.8.txt};
       
  1819                                                 \addplot[mark=*,mark
       
  1820                                                   size=1.5pt,color=blue]
       
  1821                                                 table
       
  1822                                                 {randGraph/ind/vf2pInd10_0.95.txt};
       
  1823                                                 \addplot[mark=triangle*,mark
       
  1824                                                   size=1.8pt,color=red]
       
  1825                                                 table
       
  1826                                                 {randGraph/ind/vf2ppInd10_0.95.txt};
  1248 \end{axis}
  1827 \end{axis}
  1249 \end{tikzpicture}
  1828 \end{tikzpicture}
  1250 \end{center}
  1829 \end{center}
  1251 \vspace*{-0.8cm}
  1830 \vspace*{-0.8cm}
  1252 \caption{Cummulative chart for $\delta=10$.}\label{fig:randIND10Sum}
  1831 \caption{Cummulative chart for $\delta=10$.}\label{fig:randIND10Sum}
  1258 \vspace*{-0.8cm}
  1837 \vspace*{-0.8cm}
  1259 \begin{subfigure}[b]{0.55\textwidth}
  1838 \begin{subfigure}[b]{0.55\textwidth}
  1260 \begin{center}
  1839 \begin{center}
  1261 \begin{tikzpicture}
  1840 \begin{tikzpicture}
  1262 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1841 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1263 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1842 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1843   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1844   format/1000 sep = \space}]
  1264 %\addplot+[only marks] table {proteinsOrig.txt};
  1845 %\addplot+[only marks] table {proteinsOrig.txt};
  1265 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
  1846 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
  1266 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.05.txt};
  1847 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1848         {randGraph/ind/vf2ppInd35_0.05.txt};
  1267 \end{axis}
  1849 \end{axis}
  1268 \end{tikzpicture}
  1850 \end{tikzpicture}
  1269 \end{center}
  1851 \end{center}
  1270      \end{subfigure}
  1852      \end{subfigure}
  1271      \begin{subfigure}[b]{0.55\textwidth}
  1853      \begin{subfigure}[b]{0.55\textwidth}
  1272 \begin{center}
  1854 \begin{center}
  1273 \begin{tikzpicture}
  1855 \begin{tikzpicture}
  1274 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1856 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1275 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1857 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1858   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1859   format/1000 sep = \space}]
  1276 %\addplot+[only marks] table {proteinsOrig.txt};
  1860 %\addplot+[only marks] table {proteinsOrig.txt};
  1277 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
  1861 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
  1278 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.1.txt};
  1862 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1863         {randGraph/ind/vf2ppInd35_0.1.txt};
  1279 \end{axis}
  1864 \end{axis}
  1280 \end{tikzpicture}
  1865 \end{tikzpicture}
  1281 \end{center}
  1866 \end{center}
  1282 \end{subfigure}
  1867 \end{subfigure}
  1283 \hspace{1cm}
  1868 \hspace{1cm}
  1284 \begin{subfigure}[b]{0.55\textwidth}
  1869 \begin{subfigure}[b]{0.55\textwidth}
  1285 \begin{center}
  1870 \begin{center}
  1286 \begin{tikzpicture}
  1871 \begin{tikzpicture}
  1287 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1872 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1288 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1873 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1874   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1875   format/1000 sep = \space}]
  1289 %\addplot+[only marks] table {proteinsOrig.txt};
  1876 %\addplot+[only marks] table {proteinsOrig.txt};
  1290 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
  1877 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
  1291 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.3.txt};
  1878 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1879         {randGraph/ind/vf2ppInd35_0.3.txt};
  1292 \end{axis}
  1880 \end{axis}
  1293 \end{tikzpicture}
  1881 \end{tikzpicture}
  1294 \end{center}
  1882 \end{center}
  1295      \end{subfigure}
  1883      \end{subfigure}
  1296      \begin{subfigure}[b]{0.55\textwidth}
  1884      \begin{subfigure}[b]{0.55\textwidth}
  1297 \begin{center}
  1885 \begin{center}
  1298 \begin{tikzpicture}
  1886 \begin{tikzpicture}
  1299 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1887 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1300 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1888 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1889   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1890   format/1000 sep = \space}]
  1301 %\addplot+[only marks] table {proteinsOrig.txt};
  1891 %\addplot+[only marks] table {proteinsOrig.txt};
  1302 \addplot table {randGraph/ind/vf2pInd35_0.6.txt};
  1892 \addplot table {randGraph/ind/vf2pInd35_0.6.txt};
  1303 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.6.txt};
  1893 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1894         {randGraph/ind/vf2ppInd35_0.6.txt};
  1304 \end{axis}
  1895 \end{axis}
  1305 \end{tikzpicture}
  1896 \end{tikzpicture}
  1306 \end{center}
  1897 \end{center}
  1307 \end{subfigure}
  1898 \end{subfigure}
  1308 \begin{subfigure}[b]{0.55\textwidth}
  1899 \begin{subfigure}[b]{0.55\textwidth}
  1309           
  1900           
  1310 \begin{tikzpicture}
  1901 \begin{tikzpicture}
  1311 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1902 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1312 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
  1903 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1904   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1905   format/1000 sep = \space}]
  1313 %\addplot+[only marks] table {proteinsOrig.txt};
  1906 %\addplot+[only marks] table {proteinsOrig.txt};
  1314 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
  1907 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
  1315 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.8.txt};
  1908 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1909         {randGraph/ind/vf2ppInd35_0.8.txt};
  1316 \end{axis}
  1910 \end{axis}
  1317 \end{tikzpicture}
  1911 \end{tikzpicture}
  1318      \end{subfigure}
  1912      \end{subfigure}
  1319      \begin{subfigure}[b]{0.55\textwidth}
  1913      \begin{subfigure}[b]{0.55\textwidth}
  1320 \begin{tikzpicture}
  1914 \begin{tikzpicture}
  1321 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1915 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1322 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1916 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
       
  1917   west},scaled x ticks = false,x tick label style={/pgf/number
       
  1918   format/1000 sep = \thinspace}]
  1323 %\addplot+[only marks] table {proteinsOrig.txt};
  1919 %\addplot+[only marks] table {proteinsOrig.txt};
  1324 \addplot table {randGraph/ind/vf2pInd35_0.95.txt};
  1920 \addplot table {randGraph/ind/vf2pInd35_0.95.txt};
  1325 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.95.txt};
  1921 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
       
  1922         {randGraph/ind/vf2ppInd35_0.95.txt};
  1326 \end{axis}
  1923 \end{axis}
  1327 \end{tikzpicture}
  1924 \end{tikzpicture}
  1328 \end{subfigure}
  1925 \end{subfigure}
  1329 \vspace*{-0.8cm}
  1926 \vspace*{-0.8cm}
  1330 \caption{IND on graphs having an average degree of 35.}\label{fig:randIND35}
  1927 \caption{IND on graphs having an average degree of
       
  1928   35.}\label{fig:randIND35}
  1331 \end{figure}
  1929 \end{figure}
  1332 
  1930 
  1333 \begin{figure}[H]
  1931 \begin{figure}[H]
  1334 \begin{center}
  1932 \begin{center}
  1335 \begin{tikzpicture}
  1933 \begin{tikzpicture}
  1336 \begin{axis}[title={Rand IND Summary, $\delta = 35$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1934 \begin{axis}[title={Rand IND Summary, $\delta = 35$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1337 =major,mark size=1pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
  1935 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
  1338 %\addplot+[only marks] table {proteinsOrig.txt};
  1936   west},scaled x ticks = false,x tick label style={/pgf/number
  1339 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.3.txt};
  1937   format/1000 sep = \thinspace}]
  1340 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.3.txt};
  1938 %\addplot+[only marks] table {proteinsOrig.txt};
  1341 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.6.txt};
  1939 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1342 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.6.txt};
  1940         {randGraph/ind/vf2pInd35_0.3.txt};
  1343 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.8.txt};
  1941         \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1344 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.8.txt};
  1942                 {randGraph/ind/vf2ppInd35_0.3.txt};
  1345 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.95.txt};
  1943                 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1346 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.95.txt};
  1944                         {randGraph/ind/vf2pInd35_0.6.txt};
       
  1945                         \addplot[mark=triangle*,mark
       
  1946                           size=1.8pt,color=red] table
       
  1947                                 {randGraph/ind/vf2ppInd35_0.6.txt};
       
  1948                                 \addplot[mark=*,mark
       
  1949                                   size=1.5pt,color=blue] table
       
  1950                                         {randGraph/ind/vf2pInd35_0.8.txt};
       
  1951                                         \addplot[mark=triangle*,mark
       
  1952                                           size=1.8pt,color=red] table
       
  1953                                                 {randGraph/ind/vf2ppInd35_0.8.txt};
       
  1954                                                 \addplot[mark=*,mark
       
  1955                                                   size=1.5pt,color=blue]
       
  1956                                                 table
       
  1957                                                 {randGraph/ind/vf2pInd35_0.95.txt};
       
  1958                                                 \addplot[mark=triangle*,mark
       
  1959                                                   size=1.8pt,color=red]
       
  1960                                                 table
       
  1961                                                 {randGraph/ind/vf2ppInd35_0.95.txt};
  1347 \end{axis}
  1962 \end{axis}
  1348 \end{tikzpicture}
  1963 \end{tikzpicture}
  1349 \end{center}
  1964 \end{center}
  1350 \vspace*{-0.8cm}
  1965 \vspace*{-0.8cm}
  1351 \caption{Cummulative chart for $\delta=35$.}\label{fig:randIND35Sum}
  1966 \caption{Cummulative chart for $\delta=35$.}\label{fig:randIND35Sum}
  1352 \end{figure}
  1967 \end{figure}
  1353 
  1968 
  1354 Based on these experiments, VF2++ is faster than VF2 Plus and able to handle really large graphs in milliseconds. Note that when $IND$ was considered and the small graphs had proportionally few nodes ($\rho = 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node orders(e.g. see the $\delta=10$ case on \textbf{Figure \ref{fig:randIND10})}). If these examples had been excluded, the charts would have seemed to be similar to the other ones.
  1969 Based on these experiments, VF2++ is faster than VF2 Plus and able to
  1355 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2 Plus slow slightly down, but remain practically usable even on graphs having 10 000 nodes.
  1970 handle really large graphs in milliseconds. Note that when $IND$ was
  1356 
  1971 considered and the small graphs had proportionally few nodes ($\rho =
  1357 
  1972 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
  1358 
  1973 orders(e.g. see the $\delta=10$ case on \textbf{Figure
  1359 
  1974   \ref{fig:randIND10})}). If these examples had been excluded, the
  1360 \newpage
  1975 charts would have seemed to be similar to the other ones.
       
  1976 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
       
  1977 Plus slow slightly down, but remain practically usable even on graphs
       
  1978 having 10 000 nodes.
       
  1979 
       
  1980 
       
  1981 
       
  1982 
       
  1983 
  1361 \section{Conclusion}
  1984 \section{Conclusion}
  1362 In this thesis, after providing a short summary of the recent algorithms, a new graph matching algorithm based on VF2, called VF2++, has been presented and analyzed from a practical viewpoint.
  1985 In this paper, after providing a short summary of the recent
  1363 
  1986 algorithms, a new graph matching algorithm based on VF2, called VF2++,
  1364 Recognizing the importance of the node order and determining an efficient one, VF2++ is able to match graphs of thousands of nodes in near practically linear time including preprocessing. In addition to the proper order, VF2++ uses more efficient consistency and cutting rules which are easy to compute and make the algorithm able to prune most of the unfruitful branches without going astray.
  1987 has been presented and analyzed from a practical viewpoint.
  1365 
  1988 
  1366 In order to show the efficiency of the new method, it has been compared to VF2 Plus, which is the best concurrent algorithm based on \cite{VF2Plus}.
  1989 Recognizing the importance of the node order and determining an
  1367 
  1990 efficient one, VF2++ is able to match graphs of thousands of nodes in
  1368 The experiments show that VF2++ consistently outperforms VF2 Plus on biological graphs. It seems to be asymptotically faster on protein and on contact map graphs in the case of induced subgraph isomorphism, while in the case of graph isomorphism, it has definitely better asymptotic behaviour on protein graphs.
  1991 near practically linear time including preprocessing. In addition to
  1369 
  1992 the proper order, VF2++ uses more efficient consistency and cutting
  1370 Regarding random sparse graphs, not only has VF2++ proved itself to be faster than VF2 Plus, but it has a practically linear behaviour both in the case of induced subgraph- and graph isomorphism, as well.
  1993 rules which are easy to compute and make the algorithm able to prune
       
  1994 most of the unfruitful branches without going astray.
       
  1995 
       
  1996 In order to show the efficiency of the new method, it has been
       
  1997 compared to VF2 Plus, which is the best concurrent algorithm based on
       
  1998 \cite{VF2Plus}.
       
  1999 
       
  2000 The experiments show that VF2++ consistently outperforms VF2 Plus on
       
  2001 biological graphs. It seems to be asymptotically faster on protein and
       
  2002 on contact map graphs in the case of induced subgraph isomorphism,
       
  2003 while in the case of graph isomorphism, it has definitely better
       
  2004 asymptotic behaviour on protein graphs.
       
  2005 
       
  2006 Regarding random sparse graphs, not only has VF2++ proved itself to be
       
  2007 faster than VF2 Plus, but it has a practically linear behaviour both
       
  2008 in the case of induced subgraph- and graph isomorphism, as well.
  1371 
  2009 
  1372 
  2010 
  1373 
  2011 
  1374 %% The Appendices part is started with the command \appendix;
  2012 %% The Appendices part is started with the command \appendix;
  1375 %% appendix sections are then done as normal sections
  2013 %% appendix sections are then done as normal sections
  1379 %% \label{}
  2017 %% \label{}
  1380 
  2018 
  1381 %% If you have bibdatabase file and want bibtex to generate the
  2019 %% If you have bibdatabase file and want bibtex to generate the
  1382 %% bibitems, please use
  2020 %% bibitems, please use
  1383 %%
  2021 %%
  1384 \bibliographystyle{elsarticle-num} 
  2022 \bibliographystyle{elsarticle-num} \bibliography{bibliography}
  1385 \bibliography{bibliography}
       
  1386 
  2023 
  1387 %% else use the following coding to input the bibitems directly in the
  2024 %% else use the following coding to input the bibitems directly in the
  1388 %% TeX file.
  2025 %% TeX file.
  1389 
  2026 
  1390 %% \begin{thebibliography}{00}
  2027 %% \begin{thebibliography}{00}