damecco.tex
changeset 28 523fddfd7a01
parent 27 497868c58d36
child 29 0ff72a828b16
equal deleted inserted replaced
24:0050d7927c3a 25:b0be2b06ea29
   124   twofold. Firstly, it is based on a new approach for determining the
   124   twofold. Firstly, it is based on a new approach for determining the
   125   matching order of the nodes, and secondly, more efficient -
   125   matching order of the nodes, and secondly, more efficient -
   126   nevertheless easier to compute - cutting rules significantly
   126   nevertheless easier to compute - cutting rules significantly
   127   reducing the search space are applied.
   127   reducing the search space are applied.
   128 
   128 
   129   In addition to the usual subgraph isomorphism, the paper also
   129   In addition to the usual \emph{Subgraph Isomorphism Problem}, the paper also
   130   presents specialized versions for the \emph{Induced Subgraph
   130   presents specialized algorithms for the \emph{Induced Subgraph
   131     Isomorphism} and for the \emph{Graph Isomorphism Problems}.
   131     Isomorphism} and for the \emph{Graph Isomorphism Problems}.
   132 
   132 
   133   Finally, an extensive experimental evaluation is provided using a
   133   Finally, an extensive experimental evaluation is provided using a
   134   wide range of inputs, including both real life biological and
   134   wide range of inputs, including both real-life biological and
   135   chemical datasets and standard randomly generated graph series. The
   135   chemical datasets and standard randomly generated graph series. The
   136   results show major and consistent running time improvements over the
   136   results show major and consistent running time improvements over the
   137   other known methods.
   137   other known methods.
   138  
   138  
   139   The C++ implementations of the algorithms are available open source as
   139   The C++ implementations of the algorithms are available open-source as 
   140   the part of the LEMON graph and network optimization library.
   140   part of the LEMON graph and network optimization library.
   141   
   141   
   142 \end{abstract}
   142 \end{abstract}
   143 
   143 
   144 \begin{keyword}
   144 \begin{keyword}
   145   Computational Biology, Subgraph Isomorphism Problem
   145   Computational Biology, Subgraph Isomorphism Problem
   170 Complex biological systems arise from the interaction and cooperation
   170 Complex biological systems arise from the interaction and cooperation
   171 of plenty of molecular components. Getting acquainted with such
   171 of plenty of molecular components. Getting acquainted with such
   172 systems at the molecular level is of primary importance, since
   172 systems at the molecular level is of primary importance, since
   173 protein-protein interaction, DNA-protein interaction, metabolic
   173 protein-protein interaction, DNA-protein interaction, metabolic
   174 interaction, transcription factor binding, neuronal networks, and
   174 interaction, transcription factor binding, neuronal networks, and
   175 hormone signaling networks can be understood this way.
   175 hormone signalling networks can be understood this way.
   176 
   176 
   177 Many chemical and biological structures can easily be modeled
   177 Many chemical and biological structures can easily be modelled
   178 as graphs, for instance, a molecular structure can be
   178 as graphs, for instance, a molecular structure can be
   179 considered as a graph, whose nodes correspond to atoms and whose
   179 considered as a graph, whose nodes correspond to atoms and whose
   180 edges to chemical bonds. The similarity and dissimilarity of
   180 edges to chemical bonds. The similarity and dissimilarity of
   181 objects corresponding to nodes are incorporated to the model
   181 objects corresponding to nodes are incorporated to the model
   182 by \emph{node labels}. Understanding such networks basically
   182 by \emph{node labels}. Understanding such networks basically
   183 requires finding specific subgraphs, thus it calls for efficient
   183 requires finding specific subgraphs, thus it calls for efficient
   184 graph matching algorithms.
   184 graph matching algorithms.
   185 
   185 
   186 Other real-world fields related to some
   186 Other real-world fields related to some
   187 variants of graph matching include pattern recognition
   187 variants of graph matching include pattern recognition
   188 and machine vision \cite{HorstBunkeApplications}, symbol recognition
   188 and machine vision~\cite{HorstBunkeApplications}, symbol recognition~\cite{CordellaVentoSymbolRecognition}, and face identification~\cite{JianzhuangYongFaceIdentification}.  \\
   189 \cite{CordellaVentoSymbolRecognition}, face identification
       
   190 \cite{JianzhuangYongFaceIdentification}.  \\
       
   191 
   189 
   192 Subgraph and induced subgraph matching problems are known to be
   190 Subgraph and induced subgraph matching problems are known to be
   193 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is
   191 NP-Complete~\cite{SubgraphNPC}, while the graph isomorphism problem is
   194 one of the few problems in NP neither known to be in P nor
   192 one of the few problems in NP neither known to be in P nor
   195 NP-Complete. Although polynomial time isomorphism algorithms are known
   193 NP-Complete. Although polynomial-time isomorphism algorithms are known
   196 for various graph classes, like trees and planar
   194 for various graph classes, like trees and planar
   197 graphs\cite{PlanarGraphIso}, bounded valence
   195 graphs~\cite{PlanarGraphIso}, bounded valence
   198 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
   196 graphs~\cite{BondedDegGraphIso}, interval graphs~\cite{IntervalGraphIso}
   199 or permutation graphs\cite{PermGraphIso}, and recently, an FPT algorithm has been presented for the coloured hypergraph isomorphism problem in \cite{ColoredHiperGraphIso}.
   197 or permutation graphs~\cite{PermGraphIso}. Furthermore, an FPT algorithm has also been presented for the coloured hypergraph isomorphism problem in~\cite{ColoredHiperGraphIso}.
   200 
   198 
   201 In the following, some algorithms based on other approaches are
   199 In the following, some algorithms based on other approaches are
   202 summarized, which do not need any restrictions on the graphs. Even though,
   200 summarized, which do not need any restrictions on the graphs. Even though,
   203 an overall polynomial behaviour is not expectable from such an
   201 an overall polynomial behaviour is not expectable from such an
   204 alternative, it may often have good practical performance, in fact,
   202 alternative, it may often have good practical performance, in fact,
   205 it might be the best choice even on a graph class for which polynomial
   203 it might be the best choice even on a graph class for which polynomial
   206 algorithm is known.
   204 algorithm is known.
   207 
   205 
   208 The first practically usable approach was due to
   206 The first practically usable approach was due to
   209 \emph{Ullmann}\cite{Ullmann} which is a commonly used depth-first
   207 \emph{Ullmann}~\cite{Ullmann}, which is a commonly used algorithm based on depth-first
   210 search based algorithm with a complex heuristic for reducing the
   208 search with a complex heuristic for reducing the
   211 number of visited states. A major problem is its $\Theta(n^3)$ space
   209 number of visited states. A major problem is its $\Theta(n^3)$ space
   212 complexity, which makes it impractical in the case of big sparse
   210 complexity, which makes it impractical in the case of big sparse
   213 graphs.
   211 graphs.
   214 
   212 
   215 In a recent paper, Ullmann\cite{UllmannBit} presents an
   213 In a recent paper, Ullmann~\cite{UllmannBit} presents an
   216 improved version of this algorithm based on a bit-vector solution for
   214 improved version of this algorithm based on a bit-vector solution for
   217 the binary Constraint Satisfaction Problem.
   215 the binary Constraint Satisfaction Problem.
   218 
   216 
   219 The \emph{Nauty} algorithm\cite{Nauty} transforms the two graphs to
   217 The \emph{Nauty} algorithm~\cite{Nauty} transforms the two graphs to
   220 a canonical form before starting to check for the isomorphism. It has
   218 a canonical form before starting to check for isomorphism. It has
   221 been considered as one of the fastest graph isomorphism algorithms,
   219 been considered as one of the fastest graph isomorphism algorithms,
   222 although graph categories were shown in which it takes exponentially
   220 although graph categories were shown in which it takes exponentially
   223 many steps. This algorithm handles only the graph isomorphism problem.
   221 many steps. This algorithm handles only the graph isomorphism problem.
   224 
   222 
   225 The \emph{LAD} algorithm\cite{Lad} uses a depth-first search
   223 The \emph{LAD} algorithm~\cite{Lad} uses a depth-first search
   226 strategy and formulates the matching as a Constraint Satisfaction
   224 strategy and formulates the matching as a Constraint Satisfaction
   227 Problem to prune the search tree. The constraints are that the mapping
   225 Problem to prune the search tree. The constraints are that the mapping
   228 has to be injective and edge-preserving, hence it is possible to
   226 has to be injective and edge-preserving, hence it is possible to
   229 handle new matching types as well.
   227 handle new matching types as well.
   230 
   228 
   231 The \emph{RI} algorithm\cite{RI} and its variations are based on a
   229 The \emph{RI} algorithm~\cite{RI} and its variations are based on a
   232 state space representation. After reordering the nodes of the graphs,
   230 state space representation. After reordering the nodes of the graphs,
   233 it uses some fast executable heuristic checks without using any
   231 it uses some fast executable heuristic checks without using any
   234 complex pruning rules. It seems to run really efficiently on graphs
   232 complex pruning rules. It seems to run really efficiently on graphs
   235 coming from biology, and won the International Contest on Pattern
   233 coming from biology, and won the International Contest on Pattern
   236 Search in Biological Databases\cite{Content}.
   234 Search in Biological Databases~\cite{Content}.
   237 
   235 
   238 The currently most commonly used algorithm is the
   236 Currently, the most commonly used algorithm is the
   239 \emph{VF2}\cite{VF2}, the improved version of \emph{VF}\cite{VF}, which was
   237 \emph{VF2}~\cite{VF2}, the improved version of \emph{VF}~\cite{VF}, which was
   240 designed for solving pattern matching and computer vision problems,
   238 designed for solving pattern matching and computer vision problems,
   241 and has been one of the best overall algorithms for more than a
   239 and has been one of the best overall algorithms for more than a
   242 decade. Although, it can't be up to new specialized algorithms, it is
   240 decade. Although, it is not as fast as some of the new specialized algorithms, it is still widely used due to its simplicity and space efficiency. VF2 uses
   243 still widely used due to its simplicity and space efficiency. VF2 uses
       
   244 a state space representation and checks some conditions in each state
   241 a state space representation and checks some conditions in each state
   245 to prune the search tree.
   242 to prune the search tree.
   246 
   243 
   247 Meanwhile, another variant called \emph{VF2 Plus}\cite{VF2Plus} has
   244 Meanwhile, another variant called \emph{VF2~Plus}~\cite{VF2Plus} has
   248 been published. It is considered to be as efficient as the RI
   245 been published. It is considered to be as efficient as the RI
   249 algorithm and has a strictly better behavior on large graphs.  The
   246 algorithm and has a strictly better behaviour on large graphs.  The
   250 main idea of VF2 Plus is to precompute a heuristic node order of the
   247 main idea of VF2~Plus is to precompute a heuristic node order of the graph to be embedded, in which VF2 works more efficiently.
   251 small graph, in which the VF2 works more efficiently.
       
   252 
   248 
   253 This paper introduces \emph{VF2++}, a new further improved algorithm
   249 This paper introduces \emph{VF2++}, a new further improved algorithm
   254 for the graph and (induced)subgraph isomorphism problem, which uses
   250 for the graph and (induced) subgraph isomorphism problems, which uses
   255 efficient cutting rules and determines a node order in which VF2 runs
   251 efficient cutting rules and determines a node order in which VF2 runs
   256 significantly faster on practical inputs.
   252 significantly faster on practical inputs.
   257 
   253 
   258 The rest of the paper is structured as
   254 The rest of the paper is structured as
   259 follows. Section~\ref{sec:ProbStat} defines the exact problems to be
   255 follows. Section~\ref{sec:ProbStat} defines the exact problems to be
   260 solved, Section~\ref{sec:VF2Alg} provides a description of VF2. Based
   256 solved, Section~\ref{sec:VF2Alg} provides a description of VF2. Based
   261 on that, Section~\ref{sec:VF2ppAlg} introduces VF2++. Some technical
   257 on that, Section~\ref{sec:VF2ppAlg} introduces VF2++. Some technical
   262 details necessary for an efficient implementation are discussed in
   258 details necessary for an efficient implementation are discussed in
   263 Section~\ref{sec:VF2ppImpl}.  Finally, Section~\ref{sec:ExpRes}
   259 Section~\ref{sec:VF2ppImpl}.  Finally, Section~\ref{sec:ExpRes}
   264 provide a detailed experimental evaluation of VF2++ and its comparison
   260 provides a detailed experimental evaluation of VF2++ and its comparison
   265 to the state-of-the-art algorithm.
   261 to the state-of-the-art algorithm.
   266 
   262 
   267 It must also be mentioned that the C++ implementations of the
   263 It must also be mentioned that the C++ implementations of the
   268 algorithms have been made available for evaluation and use under an
   264 algorithms have been made available for evaluation and use under an
   269 open source license as a part of LEMON\cite{LEMON} open source graph
   265 open-source license as a part of LEMON~\cite{LEMON} graph
   270 library.
   266 library.
   271 
   267 
   272 \section{Problem Statement}\label{sec:ProbStat}
   268 \section{Problem Statement}\label{sec:ProbStat}
   273 This section provides a formal description of the problems to be
   269 This section provides a formal description of the problems to be
   274 solved.
   270 solved.
   319 \end{definition}
   315 \end{definition}
   320 
   316 
   321 
   317 
   322 \subsection{Common problems}\label{sec:CommProb}
   318 \subsection{Common problems}\label{sec:CommProb}
   323 
   319 
   324 The focus of this paper is on two extensively studied topics, the
   320 The focus of this paper is on the following problems appearing in many applications.
   325 subgraph isomorphism and its variations. However, the following
   321 
   326 problems also appear in many applications.
   322 The \textbf{subgraph isomorphism problem} is the following: is
   327 
       
   328 The \textbf{subgraph matching problem} is the following: is
       
   329 $G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node
   323 $G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node
   330 label?
   324 label?
   331 
   325 
   332 The \textbf{induced subgraph matching problem} asks the same about the
   326 The \textbf{induced subgraph isomorphism problem} asks the same about the
   333 existence of an induced subgraph.
   327 existence of an induced subgraph.
   334 
   328 
   335 The \textbf{graph isomorphism problem} can be defined as induced
   329 The \textbf{graph isomorphism problem} can be defined as induced
   336 subgraph matching problem where the sizes of the two graphs are equal.
   330 subgraph matching problem where the sizes of the two graphs are equal.
   337 
   331 
   338 In addition, one may want to find a \textbf{single} mapping or \textbf{enumerate} all of them.
   332 In addition, one may want to find a \textbf{single} embedding or \textbf{enumerate} all of them.
   339 
       
   340 Note that some authors refer to the term
       
   341 \emph{subgraph isomorphism problem} as an \emph{induced subgraph
       
   342   isomorphism problem}.
       
   343 
   333 
   344 \section{The VF2 Algorithm}\label{sec:VF2Alg}
   334 \section{The VF2 Algorithm}\label{sec:VF2Alg}
   345 This algorithm is the basis of both the VF2++ and the VF2 Plus.  VF2
   335 This algorithm is the basis of both the VF2++ and the VF2~Plus.  VF2
   346 is able to handle all the variations mentioned in Section
   336 is able to handle all the variations mentioned in Section~\ref{sec:CommProb}.  Although it can also handle directed graphs,
   347   \ref{sec:CommProb}.  Although it can also handle directed graphs,
       
   348 for the sake of simplicity, only the undirected case will be
   337 for the sake of simplicity, only the undirected case will be
   349 discussed.
   338 discussed.
   350 
   339 
   351 
   340 
   352 \subsection{Common notations}
   341 \subsection{Common notations}
   353 \indent Assume $G_{1}$ is searched in $G_{2}$.  The following
   342 \indent Assume $G_{1}$ is searched in $G_{2}$.  The following
   354 definitions and notations will be used throughout the whole paper.
   343 definitions and notations will be used throughout this paper.
   355 \begin{definition}
   344 \begin{definition}
   356 An injection $\mathfrak{m} : D \longrightarrow V_2$ is called (partial) \textbf{mapping}, where $D\subseteq V_1$.
   345 An injection $\mathfrak{m} : D \longrightarrow V_2$ is called (partial) \textbf{mapping}, where $D\subseteq V_1$.
   357 \end{definition}
   346 \end{definition}
   358 
   347 
   359 \begin{notation}
   348 \begin{notation}
   368 A mapping $\mathfrak{m}$ is $\mathbf{whole\ mapping}$ if $\mathfrak{m}$ covers all the
   357 A mapping $\mathfrak{m}$ is $\mathbf{whole\ mapping}$ if $\mathfrak{m}$ covers all the
   369 nodes of $V_{1}$, i.e. $\mathfrak{D}(\mathfrak{m})=V_1$.
   358 nodes of $V_{1}$, i.e. $\mathfrak{D}(\mathfrak{m})=V_1$.
   370 \end{definition}
   359 \end{definition}
   371 
   360 
   372 \begin{definition}
   361 \begin{definition}
   373 Let \textbf{extend}$(\mathfrak{m},(u,v))$ denote the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$, for which $\forall w\in \mathfrak{D}(\mathfrak{m}) : \mathfrak{m}(w)=f(w)$ and $f(u)=v$ holds. Where $u\in V_1\setminus\mathfrak{D}(\mathfrak{m})$ and $v\in V_2\setminus\mathfrak{R}(\mathfrak{m})$, otherwise $extend(\mathfrak{m},(u,v))$ is undefined.
   362 Let \textbf{extend}$(\mathfrak{m},(u,v))$ denote the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$, for which $\forall w\in \mathfrak{D}(\mathfrak{m}) : f(w)=\mathfrak{m}(w)$ and $f(u)=v$ holds, where $u\in V_1\setminus\mathfrak{D}(\mathfrak{m})$ and $v\in V_2\setminus\mathfrak{R}(\mathfrak{m})$; otherwise $extend(\mathfrak{m},(u,v))$ is undefined.
   374 \end{definition}
   363 \end{definition}
   375 
   364 
   376 \begin{notation}
   365 \begin{notation}
   377 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
   366 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
   378 which can be substituted by any of the $\mathbf{ISO}$, $\mathbf{SUB}$
   367 which can be substituted by any of the $\mathbf{SUB}$, $\mathbf{IND}$
   379 and $\mathbf{IND}$ problems.
   368 and $\mathbf{ISO}$ problems, which stand for the the problems mentioned in Section~\ref{sec:CommProb} respectively.
   380 \end{notation}
   369 \end{notation}
   381 
   370 
   382 \begin{definition}
   371 \begin{definition}
   383 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
   372 Let $\mathfrak{m}$ be a mapping. The \textbf{consistency function for } $\mathbf{PT}$ is a logical function, for which $\mathbf{Cons_{PT}}(\mathfrak{m})$ is true if and only if $\mathfrak{m}$ satisfies the requirements of $\mathbf{PT}$ considering the subgraphs of $G_{1}$ and $G_{2}$ induced by $\mathfrak{D}(\mathfrak{m})$ and $\mathfrak{R}(\mathfrak{m})$, respectively.
   384 \textbf{consistency function by } $\mathbf{PT}$ if the following
   373 
   385 holds. If there exists a whole mapping $w$ satisfying the requirements of $PT$, for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$.
   374 %$\mathbf{Cons_{PT}}(\mathfrak{m})$ is true if there exists a whole mapping $w$ satisfying the requirements of $PT$, for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$.
   386 \end{definition}
   375 \end{definition}
   387 
   376 
   388 \begin{definition} 
   377 \begin{definition} 
   389 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
   378 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
   390 \textbf{cutting function by } $\mathbf{PT}$ if the following
   379 \textbf{cutting function for } $\mathbf{PT}$ if the following
   391 holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations, which results in a whole mapping satisfying the requirements of $PT$.
   380 holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations, which results in a whole mapping satisfying the requirements of $PT$.
   392 \end{definition}
   381 \end{definition}
   393 
   382 
   394 \begin{definition}
   383 \begin{definition}
   395 $\mathfrak{m}$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if
   384 A mapping $\mathfrak{m}$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if
   396   $Cons_{PT}(\mathfrak{m})$ is true.
   385   $Cons_{PT}(\mathfrak{m})$ is true.
   397 \end{definition}
   386 \end{definition}
   398 
   387 
   399 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
   388 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
   400 \begin{notation}
   389 \begin{notation}
   406 $Cons_{PT}$ will be used to check the consistency of the already
   395 $Cons_{PT}$ will be used to check the consistency of the already
   407 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
   396 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
   408 no whole consistent mapping can contain the current mapping.
   397 no whole consistent mapping can contain the current mapping.
   409 
   398 
   410 \subsection{Overview of the algorithm}
   399 \subsection{Overview of the algorithm}
   411 VF2 uses a state space representation of mappings, $Cons_{PT}$ for
   400 
   412 excluding inconsistency with the problem type and $Cut_{PT}$ for
   401 VF2 begins with an empty mapping and gradually extends it with respect to the consistency and cutting functions until a whole mapping is reached.
   413 pruning the search tree.
   402 
   414 
   403 Algorithm~\ref{alg:VF2Pseu} is a high-level description of
   415 Algorithm~\ref{alg:VF2Pseu} is a high level description of
   404 the VF2 algorithm. Each state of the matching process can
   416 the VF2 matching algorithm. Each state of the matching process can
       
   417 be associated with a mapping $\mathfrak{m}$. The initial state
   405 be associated with a mapping $\mathfrak{m}$. The initial state
   418 is associated with a mapping $\mathfrak{m}$, for which
   406 is associated with a mapping $\mathfrak{m}$, for which
   419 $\mathfrak{D}(\mathfrak{m})=\emptyset$, i.e. it starts with an empty mapping.
   407 $\mathfrak{D}(\mathfrak{m})=\emptyset$, i.e. it starts with an empty mapping.
   420 
   408 
   421 
   409 
   428 
   416 
   429 \Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$}
   417 \Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$}
   430   \If{$\mathfrak{m}$ covers
   418   \If{$\mathfrak{m}$ covers
   431     $V_{1}$} \State Output($\mathfrak{m}$)
   419     $V_{1}$} \State Output($\mathfrak{m}$)
   432   \Else
   420   \Else
   433   \State Compute the set $P_\mathfrak{m}$ of the pairs candidate for inclusion
   421   \State Compute the set $P_\mathfrak{m}$ of the candidate pairs for extending $\mathfrak{m}$ \ForAll{$p\in{P_\mathfrak{m}}$} \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$
   434   in $\mathfrak{m}$ \ForAll{$p\in{P_\mathfrak{m}}$} \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$
       
   435     $\neg$Cut$_{PT}$($p,\mathfrak{m}$)}
   422     $\neg$Cut$_{PT}$($p,\mathfrak{m}$)}
   436     \State \textbf{call}
   423     \State \textbf{call}
   437   VF2($extend(\mathfrak{m},p)$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
   424   VF2($extend(\mathfrak{m},p)$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
   438 \end{algorithmic}
   425 \end{algorithmic}
   439 \end{algorithm}
   426 \end{algorithm}
   440 
   427 
   441 
   428 
   442 For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of
   429 For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of
   443 candidate node pairs for adding to the current mapping $\mathfrak{m}_s$.
   430 candidate node pairs for extending the current mapping $\mathfrak{m}$.
   444 
   431 
   445 For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and
   432 For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and
   446 $Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and
   433 $Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and
   447 the latter is false, the whole process is recursively applied to
   434 the latter is false, the whole process is recursively applied to
   448 $extend(\mathfrak{m},p)$. Otherwise, $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it
   435 $extend(\mathfrak{m},p)$. Otherwise, $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it
   449 can be proved that $\mathfrak{m}$ can not be extended to a whole mapping.
   436 can be proved that $\mathfrak{m}$ can not be extended to a whole mapping.
   450 
   437 
   451 In order to make sure of the correctness, see
   438 In order to make sure of the correctness, see Claim~\ref{claim:consMapps}.
   452 \begin{claim}
   439 \begin{claim}\label{claim:consMapps}
   453 Through consistent mappings, only consistent whole mappings can be
   440 Through consistent mappings, only consistent whole mappings can be
   454 reached, and all the consistent whole mappings are reachable through
   441 reached, and all the consistent whole mappings are reachable through
   455 consistent mappings.
   442 consistent mappings.
   456 \end{claim}
   443 \end{claim}
   457 
   444 
   458 Note that a mapping may be reached in exponentially many different ways, since the
   445 Note that a mapping may be reached in exponentially many different ways, since the
   459 order of extensions does not influence the nascent mapping.
   446 order of extensions does not influence the nascent mapping.
   460 
   447 
   461 However, one may observe
   448 However, one may make the following observations.
       
   449 
       
   450 %\begin{claim}
       
   451 %\label{claim:claimTotOrd}
       
   452 %Let $\prec$ be an arbitrary total ordering relation on $V_{1}$.  If
       
   453 %the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$, for which
       
   454 %\begin{center}
       
   455 %$\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$,
       
   456 %\end{center}
       
   457 %then no mapping can be reached more than once, and each whole mapping %remains reachable.
       
   458 %\end{claim}
       
   459 
       
   460 \begin{definition}
       
   461 A total order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of
       
   462 $V_{1}$ is \textbf{matching order} if VF2 can cover $u_{\sigma(d)}$ on the $d$-th level for all $d\in\{1,..,|V_{1}|\}$.
       
   463 \end{definition}
   462 
   464 
   463 \begin{claim}
   465 \begin{claim}
   464 \label{claim:claimTotOrd}
   466 \label{claim:claimTotOrd}
   465 Let $\prec$ be an arbitrary total ordering relation on $V_{1}$.  If
   467 If VF2 is prescribed to cover the nodes of $G_1$ according to a matching order, then no mapping can be reached more than once and each whole mapping remains reachable.
   466 the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$, for which
       
   467 \begin{center}
       
   468 $\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$,
       
   469 \end{center}
       
   470 then no mapping can be reached more than once, and each whole mapping remains reachable.
       
   471 \end{claim}
   468 \end{claim}
   472 
   469 
   473 Note that the cornerstone of the improvements to VF2 is a proper
   470 Note that the cornerstone of the improvements to VF2 is to choose a proper
   474 choice of a total ordering.
   471 matching order.
   475 
   472 
   476 \subsection{The candidate set}
   473 \subsection{The candidate set}
   477 \label{candidateComputingVF2}
   474 \label{candidateComputingVF2}
   478 Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$.
   475 Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$.
   479 
   476 
   480 \begin{notation}
   477 \begin{notation}
   481 Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\backslash\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and
   478 Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\backslash\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and
   482  $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\backslash\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$.
   479  $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\backslash\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$.
   483 \end{notation}
   480 \end{notation}
   484 
   481 
   485 The set $P_\mathfrak{m}$ includes the pairs of uncovered neighbours of covered
   482 The set $P_\mathfrak{m}$ contains the pairs of uncovered neighbours of covered
   486 nodes, and if there is not such a node pair, all the pairs containing
   483 nodes, and if there is not such a node pair, all the pairs containing
   487 two uncovered nodes are added. Formally, let
   484 two uncovered nodes are added. Formally, let
   488 \[
   485 \[
   489  P_\mathfrak{m}\!=\!
   486  P_\mathfrak{m}\!=\!
   490   \begin{cases} 
   487   \begin{cases} 
   494    &\hspace{-0.15cm}\text{otherwise}.
   491    &\hspace{-0.15cm}\text{otherwise}.
   495   \end{cases}
   492   \end{cases}
   496 \]
   493 \]
   497 
   494 
   498 \subsection{Consistency}
   495 \subsection{Consistency}
   499 Suppose $p=(u,v)$, where $u\in V_{1}$ and $v\in V_{2}$, $\mathfrak{m}$ is a consistent mapping by
   496 Let $p=(u,v)\in V_{1}\times V_{2}$, and suppose $\mathfrak{m}$ is a consistent mapping by
   500 $PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether
   497 $PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether
   501 including pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$.
   498 adding pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$.
   502 
   499 
   503 For example, the consistency function of induced subgraph isomorphism is as follows.
   500 For example, the consistency function of the induced subgraph isomorphism problem is the following.
   504 \begin{notation}
   501 \begin{notation}
   505 Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} :
   502 Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} :
   506 (u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2}
   503 (u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2}
   507   (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$.
   504   (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$. That is, $\mathbf{\Gamma_{i} (w)}$ denotes the set of neighbours of node $w$ in $G_i$ $(i=1,2)$.
   508 \end{notation}
   505 \end{notation}
   509 
   506 
   510 $extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ $\Leftrightarrow
       
   511 (\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}
       
   512 \Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. The
       
   513 following formulation gives an efficient way of calculating
       
   514 $Cons_{IND}$.
       
   515 \begin{claim}
   507 \begin{claim}
   516 $Cons_{IND}((u,v),\mathfrak{m}):=\mathcal{L}(u)\!\!=\!\!\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge
   508 $extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ if and only if $\mathfrak{m}$ is consistent and $(\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}
       
   509 \Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. 
       
   510 \end{claim}
       
   511 
       
   512 The following formulation gives an efficient way of calculating $Cons_{IND}$.
       
   513 
       
   514 \begin{claim}
       
   515 $Cons_{IND}((u,v),\mathfrak{m}):=Cons_{IND}(\mathfrak{m})\wedge\mathcal{L}(u)\!\!=\!\!\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge
   517   (\forall \tilde{u}\in \Gamma_{1}(u)
   516   (\forall \tilde{u}\in \Gamma_{1}(u)
   518   \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is a
   517   \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is the consistency function for $IND$.
   519   consistency function in the case of $IND$.
       
   520 \end{claim}
   518 \end{claim}
   521 
   519 
   522 \subsection{Cutting rules}
   520 \subsection{Cutting rules}
   523 $Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently
   521 $Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently
   524 verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can
   522 verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can
   525 be true only if it is impossible to extend $extend(\mathfrak{m},p)$ to a
   523 be true only if it is impossible to extend $extend(\mathfrak{m},p)$ to a
   526 whole mapping.
   524 whole mapping.
   527 
   525 
   528 As an example, the cutting function of induced subgraph isomorphism is presented.
   526 As an example, a cutting function of induced subgraph isomorphism problem is presented.
   529 \begin{notation}
   527 \begin{notation}
   530 Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\backslash
   528 Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\backslash
   531 \mathfrak{D}(\mathfrak{m}))\backslash T_{1}(\mathfrak{m})$, and
   529 \mathfrak{D}(\mathfrak{m}))\backslash T_{1}(\mathfrak{m})$, and
   532 \\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\backslash
   530 \\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\backslash
   533 \mathfrak{R}(\mathfrak{m}))\backslash T_{2}(\mathfrak{m})$.
   531 \mathfrak{R}(\mathfrak{m}))\backslash T_{2}(\mathfrak{m})$.
   535 
   533 
   536 \begin{claim}
   534 \begin{claim}
   537 $Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| <
   535 $Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| <
   538   |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap
   536   |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap
   539   \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap
   537   \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap
   540   \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function by $IND$.
   538   \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function for $IND$.
   541 \end{claim}
   539 \end{claim}
   542 
   540 
   543 \section{The VF2++ Algorithm}\label{sec:VF2ppAlg}
   541 \section{The VF2++ Algorithm}\label{sec:VF2ppAlg}
   544 Although any total ordering relation makes the search space of VF2 a
   542 Although any matching order makes the search space of VF2 a
   545 tree, its choice turns out to dramatically influence the number of
   543 tree, its choice turns out to dramatically influence the number of
   546 visited states. The goal is to determine an efficient one as quickly
   544 visited states. The goal is to determine an efficient one as quickly
   547 as possible.
   545 as possible.
   548 
   546 
   549 The main reason for VF2++' superiority over VF2 is twofold. Firstly,
   547 The main reason for the superiority of VF2++ over VF2 is twofold. Firstly,
   550 taking into account the structure and the node labeling of the graph,
   548 taking into account the structure and the node labelling of the graph,
   551 VF2++ determines a state order in which most of the unfruitful
   549 VF2++ determines a matching order in which most of the unfruitful
   552 branches of the search space can be pruned immediately. Secondly,
   550 branches of the search space can be pruned immediately. Secondly,
   553 introducing more efficient --- nevertheless still easier to compute
   551 introducing more efficient --- nevertheless still easier to compute
   554 --- cutting rules reduces the chance of going astray even further.
   552 --- cutting rules reduces the chance of going astray even further.
   555 
   553 
   556 In addition to the usual subgraph isomorphism, specialized versions
   554 In addition to the usual subgraph isomorphism problem, specialized versions
   557 for induced subgraph isomorphism and for graph isomorphism have been
   555 for induced subgraph and graph isomorphism problems have been
   558 designed.
   556 designed.
   559 
   557 
   560 Note that a weaker version of the cutting rules and an efficient
   558 Note that a weaker version of the cutting rules of VF2++ and an efficient
   561 candidate set calculating were described in \cite{VF2Plus}.
   559 candidate set calculation method were described in~\cite{VF2Plus}.
   562 
   560 
   563 It should be noted that all the methods described in this section are
   561 It should be noted that all the methods described in this section are
   564 extendable to handle directed graphs and edge labels as well.
   562 extendable to handle directed graphs and edge labels as well.
   565 The basic ideas and the detailed description of VF2++ are provided in
   563 The basic ideas and the detailed description of VF2++ are provided in
   566 the following.\newline
   564 the following.\newline
   568 The goal is to find a matching order in which the algorithm is able to
   566 The goal is to find a matching order in which the algorithm is able to
   569 recognize inconsistency or prune the infeasible branches on the
   567 recognize inconsistency or prune the infeasible branches on the
   570 highest levels and goes deep only if it is needed.
   568 highest levels and goes deep only if it is needed.
   571 
   569 
   572 \begin{notation}
   570 \begin{notation}
   573 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{1}(u)\cap H\}|$, that is the
   571 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{1}(u)\cap H|$, that is the
   574 number of neighbours of u which are in H, where $u\in V_{1} $ and
   572 number of neighbours of u which are in H, where $u\in V_{1} $ and
   575 $H\subseteq V_{1}$.
   573 $H\subseteq V_{1}$.
   576 \end{notation}
   574 \end{notation}
   577 
   575 
   578 The principal question is the following. Suppose a mapping $\mathfrak{m}$ is
   576 The principal question is the following. Suppose a mapping $\mathfrak{m}$ is
   580 consistent pair in $G_{2}$? The more covered neighbours a node in
   578 consistent pair in $G_{2}$? The more covered neighbours a node in
   581 $T_{1}(\mathfrak{m})$ has --- i.e. the largest $Conn_{\mathfrak{D}(\mathfrak{m})}$ it has
   579 $T_{1}(\mathfrak{m})$ has --- i.e. the largest $Conn_{\mathfrak{D}(\mathfrak{m})}$ it has
   582 ---, the more rarely satisfiable consistency constraints for its pair
   580 ---, the more rarely satisfiable consistency constraints for its pair
   583 are given.
   581 are given.
   584 
   582 
   585 In biology, most of the graphs are sparse, thus several nodes in
   583 Most of the graphs of biological and chemical structures are sparse, thus several nodes in
   586 $T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes
   584 $T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes
   587 reasonable to define a secondary and a tertiary order between them.
   585 reasonable to define a secondary and a tertiary order between them.
   588 The observation above proves itself to be as determining, that the
   586 The observation above proves itself to be as determining, that the
   589 secondary ordering prefers nodes with the most uncovered neighbours
   587 secondary ordering prefers nodes with the most uncovered neighbours
   590 among which have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$ to increase
   588 among which have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$ to increase
   591 $Conn_{\mathfrak{D}(\mathfrak{m})}$ of uncovered nodes so much, as possible.  The
   589 $Conn_{\mathfrak{D}(\mathfrak{m})}$ of uncovered nodes as much, as possible.  The tertiary ordering prefers nodes having the rarest uncovered labels in $G_2$.
   592 tertiary ordering prefers nodes having the rarest uncovered labels.
   590 
   593 
   591 Note that the secondary ordering is the same as ordering by degrees,
   594 Note that the secondary ordering is the same as the ordering by $deg$,
       
   595 which is a static data in front of the above used.
   592 which is a static data in front of the above used.
   596 
   593 
   597 These rules can easily result in a matching order which contains the
   594 These rules can easily result in a matching order which contains the
   598 nodes of a long path successively, whose nodes may have low $Conn$ and
   595 nodes of a long path successively, whose nodes may have low $Conn$ and
   599 is easily matchable into $G_{2}$. To avoid that, a BFS order is
   596 is easily matchable into $G_{2}$. To try to avoid that, a Breadth-first-search order is used, and on each of its levels, the ordering procedure described above is applied.
   600 used, which provides the shortest possible paths.
       
   601 \newline
   597 \newline
   602 
   598 
   603 In the following, some examples on which the VF2 may be slow are
   599 In the following, some examples on which the VF2 may be slow are
   604 described, although they are easily solvable by using a proper
   600 described, although they are easily solvable by using a proper
   605 matching order.
   601 matching order.
   615 $\mathcal{L}(\tilde{u}):=red \ \forall \tilde{u}\in V_{1}\backslash
   611 $\mathcal{L}(\tilde{u}):=red \ \forall \tilde{u}\in V_{1}\backslash
   616 \{u\}$
   612 \{u\}$
   617 \newline
   613 \newline
   618 $\mathcal{L}(\tilde{v}):=red \ \forall \tilde{v}\in V_{2}\backslash
   614 $\mathcal{L}(\tilde{v}):=red \ \forall \tilde{v}\in V_{2}\backslash
   619 \{v\}$
   615 \{v\}$
   620 \newline
       
   621 
   616 
   622 Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since
   617 Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since
   623 $u$ is black and no node in $V_{2}$ has a black label except
   618 $u$ is black and no node in $V_{2}$ has a black label except
   624 $v$. If unfortunately $u$ were the last node which will get covered,
   619 $v$. If unfortunately $u$ were the last node which will get covered,
   625 VF2 would check only in the last steps, whether $u$ can be matched to
   620 VF2 would check only in the last steps, whether $u$ can be matched to
   626 $v$.
   621 $v$.
   627 \newline
   622 
   628 However, had $u$ been the first matched node, u would have been
   623 However, had $u$ been the first matched node, u would have been
   629 matched immediately to v, so all the mappings would have been
   624 matched immediately to v, so all the mappings would have been
   630 precluded in which node labels can not correspond.
   625 precluded in which node labels can not correspond.
   631 \end{example}
   626 \end{example}
   632 
   627 
   633 \begin{example}
   628 \begin{example}
   634 Suppose there is no node label given, $G_{1}$ is a small graph and
   629 Suppose there is no node label given, and $G_{1}$ is a small graph that can not be mapped into $G_{2}$ and $u\in V_{1}$.
   635 can not be mapped into $G_{2}$ and $u\in V_{1}$.
       
   636 \newline
   630 \newline
   637 Let $G'_{1}:=(V_{1}\cup
   631 Let $G'_{1}:=(V_{1}\cup
   638 \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup
   632 \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup
   639 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
   633 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
   640 $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint
   634 $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint
   641 from $G_{1}$ and one of its starting points is connected to $u\in
   635 from $G_{1}$ and one of its starting points is connected to $u\in
   642 V_{1}\}$.
   636 V_{1}\}$.
   643 \newline
   637 
   644 Is there a subgraph of $G_{2}$, which is isomorph with
       
   645 $G'_{1}$?
       
   646 \newline
       
   647 If unfortunately the nodes of the path were the first $k$ nodes in the
   638 If unfortunately the nodes of the path were the first $k$ nodes in the
   648 matching order, the algorithm would iterate through all the possible k
   639 matching order, the algorithm would iterate through all the possible k
   649 long paths in $G_{2}$, and it would recognize that no path can be
   640 long paths in $G_{2}$, and it would recognize that no path can be
   650 extended to $G'_{1}$.
   641 extended to $G'_{1}$.
   651 \newline
   642 \newline
   654 \end{example}
   645 \end{example}
   655 
   646 
   656 These examples may look artificial, but the same problems also appear
   647 These examples may look artificial, but the same problems also appear
   657 in real-world instances, even though in a less obvious way.
   648 in real-world instances, even though in a less obvious way.
   658 
   649 
   659 \subsection{Preparations}
   650 %\subsection{Preparations}
   660 \begin{claim}
   651 %\begin{claim}
   661 \label{claim:claimCoverFromLeft}
   652 %\label{claim:claimCoverFromLeft}
   662 The total ordering relation uniquely determines a node order, in which
   653 %The total ordering relation uniquely determines a node order, in which
   663 the nodes of $V_{1}$ will be covered by VF2. From the point of
   654 %the nodes of $V_{1}$ will be covered by VF2. From the point of
   664 view of the matching procedure, this means, that always the same node
   655 %view of the matching procedure, this means, that always the same node
   665 of $G_{1}$ will be covered on the d-th level.
   656 %of $G_{1}$ will be covered on the $d$-th level.
   666 \end{claim}
   657 %\end{claim}
   667 
   658 
   668 \begin{definition}
   659 %\begin{definition}
   669 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of
   660 %An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of
   670 $V_{1}$ is \textbf{matching order} if exists $\prec$ total
   661 %$V_{1}$ is \textbf{matching order} if there exists $\prec$ total
   671 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
   662 %ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
   672 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{1}|\}$.
   663 %pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{1}|\}$.
   673 \end{definition}
   664 %\end{definition}
   674 
   665 
   675 \begin{claim}\label{claim:MOclaim}
   666 %\begin{claim}\label{claim:MOclaim}
   676 A total ordering is matching order iff the nodes of every component
   667 %A total ordering is matching order iff the nodes of every component
   677 form an interval in the node sequence, and every node connects to a
   668 %form an interval in the node sequence, and every node connects to a
   678 previous node in its component except the first node of each component.
   669 %previous node in its component except the first node of each component.
   679 \end{claim}
   670 %\end{claim}
   680 
   671 
   681 To summing up, a total ordering always uniquely determines a matching
   672 %In summary, a total ordering always uniquely determines a matching
   682 order, and every matching order can be determined by a total ordering,
   673 %order, and every matching order can be determined by a total ordering,
   683 however, more than one different total orderings may determine the
   674 %however, more than one different total orderings may determine the
   684 same matching order.
   675 %same matching order.
   685 
   676 
   686 \subsection{Total ordering}
   677 \subsection{Matching order}
   687 The matching order will be searched directly.
       
   688 \begin{notation}
   678 \begin{notation}
   689 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} :
   679 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} :
   690 l=\mathcal{L}(v)\}|-|\{u\in V_{1}\backslash \mathcal{M} : l=\mathcal{L}(u)\}|$ ,
   680 l=\mathcal{L}(v)\}|-|\{u\in \mathcal{M} : l=\mathcal{L}(u)\}|$,
   691 where $l$ is a label and $\mathcal{M}\subseteq V_{1}$.
   681 where $l$ is a label and $\mathcal{M}\subseteq V_{1}$.
   692 \end{notation}
   682 \end{notation}
   693 
   683 
   694 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
   684 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{(-f)}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
   695 \end{definition}
   685 \end{definition}
   696 
   686 
   697 \begin{algorithm}
   687 \begin{notation}
       
   688 Let $deg(v)$ denote the degree of node $v$.
       
   689 \end{notation}
       
   690 
       
   691 \begin{algorithm}[H]
   698 \algtext*{EndIf}
   692 \algtext*{EndIf}
   699 \algtext*{EndProcedure}
   693 \algtext*{EndProcedure}
   700 \algtext*{EndWhile}
   694 \algtext*{EndWhile}
   701 \algtext*{EndFor}
   695 \algtext*{EndFor}
   702 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
   696 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
   705 \Comment{matching order} \While{$V_{1}\backslash \mathcal{M}
   699 \Comment{matching order} \While{$V_{1}\backslash \mathcal{M}
   706   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
   700   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
   707 min$_{F_\mathcal{M}\circ \mathcal{L}}(V_{1}\backslash
   701 min$_{F_\mathcal{M}\circ \mathcal{L}}(V_{1}\backslash
   708 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
   702 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
   709 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
   703 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
   710 $d$-th level \State Process $V_d$ \Comment{See Algorithm
   704 $d$-th level \State Process $V_d$ \Comment{See Algorithm~\ref{alg:VF2PPProcess1}} \EndFor
   711   \ref{alg:VF2PPProcess1}} \EndFor
       
   712 \EndWhile \EndProcedure
   705 \EndWhile \EndProcedure
   713 \end{algorithmic}
   706 \end{algorithmic}
   714 \end{algorithm}
   707 \end{algorithm}
   715 
   708 
   716 \begin{algorithm}
   709 \begin{algorithm}
   718 \algtext*{EndProcedure}%ne nyomtasson ..
   711 \algtext*{EndProcedure}%ne nyomtasson ..
   719 \algtext*{EndWhile}
   712 \algtext*{EndWhile}
   720 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
   713 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
   721 \begin{algorithmic}[1]
   714 \begin{algorithmic}[1]
   722 \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$}
   715 \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$}
   723 \State $m\in$ arg min$_{F_{\mathcal{M}\circ\ \mathcal{L}}}($ arg max$_{deg}($arg
   716 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ \mathcal{L}}($ arg max$_{deg}($arg
   724 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
   717 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
   725 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
   718 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
   726 $F_\mathcal{M}$ \EndWhile \EndProcedure
   719 $F_\mathcal{M}$ \EndWhile \EndProcedure
   727 \end{algorithmic}
   720 \end{algorithmic}
   728 \end{algorithm}
   721 \end{algorithm}
   729 
   722 
   730 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
   723 Algorithm~\ref{alg:VF2PPPseu} is a high-level description of the
   731 matching order procedure of VF2++. It computes a BFS tree for each
   724 matching order procedure of VF2++. It computes a BFS tree for each
   732 component in ascending order of their rarest node labels and largest $deg$,
   725 component in ascending order of their rarest node labels and largest $deg$,
   733 whose root vertex is the component's minimal
   726 whose root vertex is the minimal node of its component. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending
   734 node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending
       
   735 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately
   727 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately
   736 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately.
   728 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately.
   737 
   729 
   738 Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}
   730 \begin{claim}
   739 provides a matching order.
   731 Algorithm~\ref{alg:VF2PPPseu} provides a matching order.
       
   732 \end{claim}
   740 
   733 
   741 
   734 
   742 \subsection{Cutting rules}
   735 \subsection{Cutting rules}
   743 \label{VF2PPCuttingRules}
   736 \label{VF2PPCuttingRules}
   744 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels.
   737 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels.
   748 $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge
   741 $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge
   749 \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in
   742 \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in
   750 V_{2}$ and $l$ is a label.
   743 V_{2}$ and $l$ is a label.
   751 \end{notation}
   744 \end{notation}
   752 
   745 
   753 \subsubsection{Induced subgraph isomorphism}
   746 \begin{claim}[Cutting function for ISO]
   754 \begin{claim}
   747 \[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function for ISO.
   755 \[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by IND.
       
   756 \end{claim}
   748 \end{claim}
   757 \subsubsection{Graph isomorphism}
   749 
   758 \begin{claim}
   750 \begin{claim}[Cutting function for IND]
   759 \[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by ISO.
   751 \[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function for IND.
   760 \end{claim}
   752 \end{claim}
   761 
   753 
   762 \subsubsection{Subgraph isomorphism}
   754 \begin{claim}[Cutting function for SUB]
   763 \begin{claim}
   755 \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function for SUB.
   764 \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function by SUB.
       
   765 \end{claim}
   756 \end{claim}
   766 
   757 
   767 
   758 
   768 
   759 
   769 \section{Implementation details}\label{sec:VF2ppImpl}
   760 \section{Implementation details}\label{sec:VF2ppImpl}
   770 This section provides a detailed summary of an efficient
   761 This section provides a detailed summary of an efficient
   771 implementation of VF2++.
   762 implementation of VF2++.
       
   763 \begin{notation}
       
   764 Let $\Delta_1$ and $\Delta_2$ denote the largest degree in $G_1$ and $G_2$, respectively, and let $\Delta=\max\{\Delta_1,\Delta_2\}$.
       
   765 \end{notation}
   772 \subsection{Storing a mapping}
   766 \subsection{Storing a mapping}
   773 After fixing an arbitrary node order ($u_0, u_1, ..,
   767 After fixing an arbitrary node order ($u_0, u_1, ..,
   774 u_{|G_{1}|-1}$) of $G_{1}$, an array $M$ is usable to store
   768 u_{|V_{1}|-1}$) of $G_{1}$, an array $M$ can be used to store
   775 the current mapping in the following way.
   769 the current mapping in the following way.
   776 \[
   770 \[
   777  M[i] =
   771  M[i] =
   778   \begin{cases} 
   772   \begin{cases} 
   779    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INV\!ALI\!D &
   773    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INV\!ALI\!D &
   780    if\ no\ node\ has\ been\ mapped\ to\ u_i,
   774    if\ no\ node\ has\ been\ mapped\ to\ u_i,
   781   \end{cases}
   775   \end{cases}
   782 \]
   776 \]
   783 where $i\in\{0,1, ..,|G_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$
   777 where $i\in\{0,1, ..,|V_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$
   784 means "no node".
   778 means "no node".
   785 \subsection{Avoiding the recurrence}
   779 \subsection{Avoiding the recurrence}
   786 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
   780 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
   787 as a \textit{while loop}, which has a loop counter $depth$ denoting the
   781 as a \textit{while loop}, which has a loop counter $depth$ denoting the
   788 all-time depth of the recursion. Fixing a matching order, let $M$
   782 current depth of the recursion. Fixing a matching order, let $M$
   789 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft},
   783 denote the array storing the current mapping. Observe that
   790 $M$ is $INV\!ALI\!D$ from index $depth$+1 and not $INV\!ALI\!D$ before
   784 $M$ is $INV\!ALI\!D$ from index $depth$+1 and not $INV\!ALI\!D$ before
   791 $depth$. $M[depth]$ changes
   785 $depth$. $M[depth]$ changes
   792 while the state is being processed, but the property is held before
   786 while the state is being processed, but the property is held before
   793 both stepping back to a predecessor state and exploring a successor
   787 both stepping back to a predecessor state and exploring a successor
   794 state.
   788 state.
   795 
   789 
   796 The necessary part of the candidate set is easily maintainable or
   790 The necessary part of the candidate set is easily maintainable or
   797 computable by following
   791 computable by following
   798 Section~\ref{candidateComputingVF2}. A much faster method
   792 the steps described in Section~\ref{candidateComputingVF2}. A much faster method
   799 has been designed for biological- and sparse graphs, see the next
   793 has been designed for biological and sparse graphs, see the next
   800 section for details.
   794 section for details.
   801 
   795 
   802 \subsection{Calculating the candidates for a node}
   796 \subsection{Calculating the candidates for a node}
   803 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
   797 The task is not to maintain the candidate set, but to generate the
   804 task is not to maintain the candidate set, but to generate the
       
   805 candidate nodes in $G_{2}$ for a given node $u\in V_{1}$.  In
   798 candidate nodes in $G_{2}$ for a given node $u\in V_{1}$.  In
   806 case of any of the three problem types and a mapping $\mathfrak{m}$, if a node $v\in
   799 case of any of the three problem types and a mapping $\mathfrak{m}$, if a node $v\in
   807 V_{2}$ is a potential pair of $u\in V_{1}$, then $\forall
   800 V_{2}$ is a potential pair of $u\in V_{1}$, then $\forall
   808 u'\in \mathfrak{D}(\mathfrak{m}) : (u,u')\in
   801 u'\in \mathfrak{D}(\mathfrak{m}) : (u,u')\in
   809 E_{1}\Rightarrow (v,\mathfrak{m}(u'))\in
   802 E_{1}\Rightarrow (v,\mathfrak{m}(u'))\in
   810 E_{2}$. That is, each covered neighbour of $u$ has to be mapped to
   803 E_{2}$. That is, each covered neighbour of $u$ has to be mapped to
   811 a covered neighbour of $v$.
   804 a covered neighbour of $v$, i.e. selecting arbitrarily a covered neighbour $u'$ of $u$, all of the admissible candidates for $u$ are among the neighbours of $\mathfrak{m}(u')$.
   812 
   805 
   813 Having said that, an algorithm running in $\Theta(deg)$ time is
   806 Having said that, an algorithm running in $\Theta(\Delta_2)$ time is
   814 describable if there exists a covered node in the component containing
   807 describable if there exists a covered node in the component containing
   815 $u$, and a linear one otherwise.
   808 $u$, and a linear one otherwise.
   816 
   809 
   817 
   810 
   818 \subsection{Determining the node order}
   811 \subsection{Determining the node order}
   821 
   814 
   822 For using lookup tables, the node labels are associated with the
   815 For using lookup tables, the node labels are associated with the
   823 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
   816 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
   824 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
   817 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
   825 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
   818 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
   826 in $V_{1}$ having label $i$, which is easy to compute in
   819 in $V_{2}$ having label $i$, which is easy to compute in
   827 $\Theta(|V_{1}|)$ steps.
   820 $\Theta(|V_{2}|)$ steps.
   828 
   821 
   829 Representing $\mathcal{M}\subseteq V_{1}$ as an array of
   822 Representing $\mathcal{M}\subseteq V_{1}$ as an array of
   830 size $|V_{1}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes.
   823 size $|V_{1}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done in-place by swapping nodes.
   831 
   824 
   832 \subsection{Cutting rules}
   825 \subsection{Cutting rules}
   833 In Section~\ref{VF2PPCuttingRules}, the cutting rules were
   826 In Section~\ref{VF2PPCuttingRules}, the cutting rules were
   834 described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$
   827 described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$
   835 and $\tilde T_{2}$, which are dependent on the all-time mapping
   828 and $\tilde T_{2}$, which are dependent on the current mapping. The aim is to check the labelled cutting
   836 (i.e. on the all-time state). The aim is to check the labeled cutting
   829 rules of VF2++ in $\Theta(\Delta)$ time.
   837 rules of VF2++ in $\Theta(deg)$ time.
       
   838 
   830 
   839 Firstly, suppose that these four sets are given in such a way, that
   831 Firstly, suppose that these four sets are given in such a way, that
   840 checking whether a node is in a certain set takes constant time,
   832 checking whether a node is in a certain set takes constant time,
   841 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
   833 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
   842 initially zero integer lookup table of size $|K|$. After incrementing
   834 initially zero integer lookup table of size $|K|$. After incrementing
   843 $L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and
   835 $L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and
   844 decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap
   836 decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap
   845 T_{2}(s)$, the first part of the cutting rules is checkable in
   837 T_{2}(\mathfrak{m})$, the first part of the cutting rules is checkable in
   846 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
   838 $\Theta(\Delta)$ time by considering the proper signs of $L$. Setting $L$
   847 to zero takes $\Theta(deg)$ time again, which makes it possible to use
   839 to zero takes $\Theta(\Delta)$ time again, which makes it possible to use
   848 the same table through the whole algorithm. The second part of the
   840 the same table through the whole algorithm. The second part of the
   849 cutting rules can be verified using the same method with $\tilde
   841 cutting rules can be verified using the same method with $\tilde
   850 T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and
   842 T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and
   851 $T_{2}$. Thus, the overall complexity is $\Theta(deg)$.
   843 $T_{2}$. Thus, the overall time complexity is $\Theta(\Delta)$.
   852 
   844 
   853 Another integer lookup table storing the number of covered neighbours
   845 To maintain the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$
   854 of each node in $G_{2}$ gives all the information about the sets
   846 and $\tilde T_{2}$, two other integer lookup tables storing the number of covered neighbours of the nodes of the two graphs can be used. This representation allows constant-time membership checking, furthermore it is maintainable in $\Theta(\Delta)$ time whenever a node pair is added or subtracted by incrementing
   855 $T_{2}$ and $\tilde T_{2}$, which is maintainable in
       
   856 $\Theta(deg)$ time when a pair is added or substracted by incrementing
       
   857 or decrementing the proper indices. A further improvement is that the
   847 or decrementing the proper indices. A further improvement is that the
   858 values of $L[\mathcal{L}(u')]$ in case of checking $u$ are dependent only on
   848 values of $L[\mathcal{L}(u')]$ in case of checking $u$ are dependent only on
   859 $u$, i.e. on the size of the mapping, so for each $u\in V_{1}$ an
   849 $u$, i.e. on the current depth of the recursion, so for each $u\in V_{1}$, an array of pairs \textit{(label, number of such labels)} can store $L$. Note that these arrays are at most of size
   860 array of pairs (label, number of such labels) can be stored to skip
   850 $\Delta_1$ if pairs with non-appearing node labels are discarded.
   861 the maintaining operations. Note that these arrays are at most of size
       
   862 $deg$.
       
   863 
   851 
   864 Using similar techniques, the consistency function can be evaluated in
   852 Using similar techniques, the consistency function can be evaluated in
   865 $\Theta(deg)$ steps, as well.
   853 $\Theta(\Delta)$ steps, as well.
   866 
   854 
   867 \section{Experimental results}\label{sec:ExpRes}
   855 \section{Experimental results}\label{sec:ExpRes}
   868 This section compares the performance of VF2++ and VF2 Plus. According to
   856 This section compares the performance of VF2++ and VF2~Plus. According to
   869 our experience, both algorithms run faster than VF2 with orders of
   857 our experience, both algorithms run faster than VF2 with orders of
   870 magnitude, thus its inclusion was not reasonable.
   858 magnitude, thus its inclusion was not reasonable.
   871 
   859 
   872 The algorithms were implemented in C++ using the open source
   860 The algorithms were implemented in C++ using the open-source
   873 LEMON graph and network optimization library\cite{LEMON}. The test were carried out on a linux based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM.
   861 LEMON graph and network optimization library~\cite{LEMON}. The tests were carried out on a Linux-based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM.
   874 \subsection{Biological graphs}
   862 \subsection{Biological graphs}
   875 The tests have been executed on a recent biological dataset created
   863 The tests have been executed on a recent biological dataset created
   876 for the International Contest on Pattern Search in Biological
   864 for the International Contest on Pattern Search in Biological
   877 Databases\cite{Content}, which has been constructed of molecule,
   865 Databases~\cite{Content}, which has been constructed of molecule,
   878 protein and contact map graphs extracted from the Protein Data
   866 protein and contact map graphs extracted from the Protein Data
   879 Bank\cite{ProteinDataBank}.
   867 Bank~\cite{ProteinDataBank}.
   880 
   868 
   881 The molecule dataset contains small graphs with less than 100 nodes
   869 The molecule dataset contains small graphs with less than 100 nodes
   882 and an average degree of less than 3. The protein dataset contains
   870 and an average degree of less than 3. The protein dataset contains
   883 graphs having 500-10 000 nodes and an average degree of 4, while the
   871 graphs having 500-10 000 nodes and an average degree of 4, while the
   884 contact map dataset contains graphs with 150-800 nodes and an average
   872 contact map dataset contains graphs with 150-800 nodes and an average
   885 degree of 20.  \\
   873 degree of 20.  \\
   886 
   874 
   887 In the following, both the induced subgraph isomorphism and the graph
   875 In the following, both the induced subgraph and the graph
   888 isomorphism will be examined.
   876 isomorphism problems will be examined.
   889 
       
   890 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For runtime results, please see Figure~\ref{fig:bioIND}.
   877 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For runtime results, please see Figure~\ref{fig:bioIND}.
   891 
   878 
   892 In an other experiment, the nodes of each graph in the database had been
   879 In an other experiment, the nodes of each graph in the database had been
   893 shuffled, and an isomorphism between the shuffled and the original
   880 shuffled, and an isomorphism between the shuffled and the original
   894 graph was searched. The solution times are shown on Figure~\ref{fig:bioISO}.
   881 graph was searched. The running times are shown on Figure~\ref{fig:bioISO}.
   895 
   882 
   896 
   883 
   897 \begin{figure}[H]
   884 \begin{figure}[H]
   898 \vspace*{-2cm}
   885 \vspace*{-2cm}
   899 \hspace*{-1.5cm}
   886 \hspace*{-1.5cm}
   900 \begin{subfigure}[b]{0.55\textwidth}
   887 \begin{subfigure}[b]{0.55\textwidth}
   901 \begin{figure}[H]
   888 \begin{figure}[H]
   902 \begin{tikzpicture}[trim axis left, trim axis right]
   889 \begin{tikzpicture}[trim axis left, trim axis right]
   903 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   890 \begin{axis}[title=Molecules IND,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   904 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   891 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   905   west},scaled x ticks = false,x tick label style={/pgf/number
   892   west},scaled x ticks = false,x tick label style={/pgf/number
   906   format/1000 sep = \thinspace}]
   893   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
   894   format/1000 sep = \kern 0.08em}]
   907 %\addplot+[only marks] table {proteinsOrig.txt};
   895 %\addplot+[only marks] table {proteinsOrig.txt};
   908 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark
   896 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark
   909   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
   897   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
   910 \end{axis}
   898 \end{axis}
   911 \end{tikzpicture}
   899 \end{tikzpicture}
   916 \end{subfigure}
   904 \end{subfigure}
   917 \hspace*{1.5cm}
   905 \hspace*{1.5cm}
   918 \begin{subfigure}[b]{0.55\textwidth}
   906 \begin{subfigure}[b]{0.55\textwidth}
   919 \begin{figure}[H]
   907 \begin{figure}[H]
   920 \begin{tikzpicture}[trim axis left, trim axis right]
   908 \begin{tikzpicture}[trim axis left, trim axis right]
   921 \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   909 \begin{axis}[title=Contact maps IND,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   922 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   910 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   923   west},scaled x ticks = false,x tick label style={/pgf/number
   911   west},scaled x ticks = false,x tick label style={/pgf/number
   924   format/1000 sep = \thinspace}]
   912   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
   913   format/1000 sep = \kern 0.08em}]
   925 %\addplot+[only marks] table {proteinsOrig.txt};
   914 %\addplot+[only marks] table {proteinsOrig.txt};
   926 \addplot table {Orig/ContactMaps.128.txt};
   915 \addplot table {Orig/ContactMaps.128.txt};
   927 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
   916 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
   928         {VF2PPLabel/ContactMaps.128.txt};
   917         {VF2PPLabel/ContactMaps.128.txt};
   929 \end{axis}
   918 \end{axis}
   930 \end{tikzpicture}
   919 \end{tikzpicture}
   931 \caption{On contact maps, VF2++ runs almost in constant time, while VF2
   920 \caption{On contact maps, VF2++ runs almost in constant time, while VF2
   932   Plus has a near linear behaviour.} \label{fig:INDContact}
   921   Plus has a near-linear behaviour.} \label{fig:INDContact}
   933 \end{figure}
   922 \end{figure}
   934 \end{subfigure}
   923 \end{subfigure}
   935 
   924 
   936 \begin{center}
   925 \begin{center}
   937 \vspace*{-0.5cm}
   926 \vspace*{-0.5cm}
   938 \begin{subfigure}[b]{0.55\textwidth}
   927 \begin{subfigure}[b]{0.55\textwidth}
   939 \begin{figure}[H]
   928 \begin{figure}[H]
   940 \begin{tikzpicture}[trim axis left, trim axis right]
   929 \begin{tikzpicture}[trim axis left, trim axis right]
   941   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   930   \begin{axis}[title=Proteins IND,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   942   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   931   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   943     west},scaled x ticks = false,x tick label style={/pgf/number
   932     west},scaled x ticks = false,x tick label style={/pgf/number
   944     format/1000 sep = \thinspace}] %\addplot+[only marks] table
   933   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
   934   format/1000 sep = \kern 0.08em}] %\addplot+[only marks] table
   945     {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]
   935     {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]
   946     table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark
   936     table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark
   947       size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
   937       size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
   948   \end{axis}
   938   \end{axis}
   949   \end{tikzpicture}
   939   \end{tikzpicture}
   950 \caption{Both the algorithms have linear behaviour on protein
   940 \caption{Both of the algorithms have linear behaviour on protein
   951   graphs. VF2++ is more than 10 times faster than VF2
   941   graphs. VF2++ is more than 10 times faster than VF2
   952   Plus.} \label{fig:INDProt}
   942   Plus.} \label{fig:INDProt}
   953 \end{figure}
   943 \end{figure}
   954 \end{subfigure}
   944 \end{subfigure}
   955 \end{center}
   945 \end{center}
   956 \vspace*{-0.5cm}
   946 \vspace*{-0.5cm}
   957 \caption{\normalsize{Induced subgraph isomorphism on biological graphs}}\label{fig:bioIND}
   947 \caption{\normalsize{Induced subgraph isomorphism problem on biological graphs}}\label{fig:bioIND}
   958 \end{figure}
   948 \end{figure}
   959 
   949 
   960 
   950 
   961 \begin{figure}[H]
   951 \begin{figure}[H]
   962 \vspace*{-2cm}
   952 \vspace*{-2cm}
   963 \hspace*{-1.5cm}
   953 \hspace*{-1.5cm}
   964 \begin{subfigure}[b]{0.55\textwidth}
   954 \begin{subfigure}[b]{0.55\textwidth}
   965 \begin{figure}[H]
   955 \begin{figure}[H]
   966 \begin{tikzpicture}[trim axis left, trim axis right]
   956 \begin{tikzpicture}[trim axis left, trim axis right]
   967 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   957 \begin{axis}[title=Molecules ISO,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   968 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   958 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   969   west},scaled x ticks = false,x tick label style={/pgf/number
   959   west},scaled x ticks = false,x tick label style={/pgf/number
   970   format/1000 sep = \thinspace}]
   960   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
   961   format/1000 sep = \kern 0.08em}]
   971 %\addplot+[only marks] table {proteinsOrig.txt};
   962 %\addplot+[only marks] table {proteinsOrig.txt};
   972 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark
   963 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark
   973   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
   964   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
   974 \end{axis}
   965 \end{axis}
   975 \end{tikzpicture}
   966 \end{tikzpicture}
   976 \caption{In the case of molecules, there is not such a significant
   967 \caption{The results are close to each other on contact maps, but VF2++ seems to be slightly faster as the number of nodes increases.
   977   difference, but VF2++ seems to be faster as the number of nodes
   968   }\label{fig:ISOMolecule}
   978   increases.}\label{fig:ISOMolecule}
       
   979 \end{figure}
   969 \end{figure}
   980 \end{subfigure}
   970 \end{subfigure}
   981 \hspace*{1.5cm}
   971 \hspace*{1.5cm}
   982 \begin{subfigure}[b]{0.55\textwidth}
   972 \begin{subfigure}[b]{0.55\textwidth}
   983 \begin{figure}[H]
   973 \begin{figure}[H]
   984 \begin{tikzpicture}[trim axis left, trim axis right]
   974 \begin{tikzpicture}[trim axis left, trim axis right]
   985 \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   975 \begin{axis}[title=Contact maps ISO,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   986 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   976 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   987   west},scaled x ticks = false,x tick label style={/pgf/number
   977   west},scaled x ticks = false,x tick label style={/pgf/number
   988   format/1000 sep = \thinspace}]
   978   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
   979   format/1000 sep = \kern 0.08em}]
   989 %\addplot+[only marks] table {proteinsOrig.txt};
   980 %\addplot+[only marks] table {proteinsOrig.txt};
   990 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark
   981 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark
   991   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
   982   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
   992 \end{axis}
   983 \end{axis}
   993 \end{tikzpicture}
   984 \end{tikzpicture}
   994 \caption{The results are closer to each other on contact maps, but
   985 \caption{In the case of molecules, there is no significant
   995   VF2++ still performs consistently better.}\label{fig:ISOContact}
   986   difference, but VF2++ performs consistently better.}\label{fig:ISOContact}
   996 \end{figure}
   987 \end{figure}
   997 \end{subfigure}
   988 \end{subfigure}
   998 
   989 
   999 \begin{center}
   990 \begin{center}
  1000 \vspace*{-0.5cm}
   991 \vspace*{-0.5cm}
  1001 \begin{subfigure}[b]{0.55\textwidth}
   992 \begin{subfigure}[b]{0.55\textwidth}
  1002 \begin{figure}[H]
   993 \begin{figure}[H]
  1003 \begin{tikzpicture}[trim axis left, trim axis right]
   994 \begin{tikzpicture}[trim axis left, trim axis right]
  1004 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   995 \begin{axis}[title=Proteins ISO,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1005 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   996 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1006   west},scaled x ticks = false,x tick label style={/pgf/number
   997   west},scaled x ticks = false,x tick label style={/pgf/number
  1007   format/1000 sep = \thinspace}]
   998   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
   999   format/1000 sep = \kern 0.08em}]
  1008 %\addplot+[only marks] table {proteinsOrig.txt};
  1000 %\addplot+[only marks] table {proteinsOrig.txt};
  1009 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
  1001 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
  1010   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
  1002   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
  1011 \end{axis}
  1003 \end{axis}
  1012 \end{tikzpicture}
  1004 \end{tikzpicture}
  1013 \caption{On protein graphs, VF2 Plus has a super linear time
  1005 \caption{On protein graphs, VF2~Plus has a super linear time
  1014   complexity, while VF2++ runs in near constant time. The difference
  1006   complexity, while VF2++ runs in near constant time. The difference
  1015   is about two order of magnitude on large graphs.}\label{fig:ISOProt}
  1007   is about two orders of magnitude on large graphs.}\label{fig:ISOProt}
  1016 \end{figure}
  1008 \end{figure}
  1017 \end{subfigure}
  1009 \end{subfigure}
  1018 \end{center}
  1010 \end{center}
  1019 \vspace*{-0.6cm}
  1011 \vspace*{-0.6cm}
  1020 \caption{\normalsize{Graph isomorphism on biological graphs}}\label{fig:bioISO}
  1012 \caption{\normalsize{Graph isomorphism problem on biological graphs}}\label{fig:bioISO}
  1021 \end{figure}
  1013 \end{figure}
  1022 
  1014 
  1023 
  1015 
  1024 
  1016 
  1025 
  1017 
  1026 \subsection{Random graphs}
  1018 \subsection{Random graphs}
  1027 This section compares VF2++ with VF2 Plus on random graphs of a large
  1019 This section compares VF2++ with VF2~Plus on random graphs of large
  1028 size. The node labels are uniformly distributed.  Let $\delta$ denote
  1020 size. The node labels are uniformly distributed.  Let $\delta$ denote
  1029 the average degree.  For the parameters of problems solved in the
  1021 the average degree.  For the parameters of problems solved in the
  1030 experiments, please see the top of each chart.
  1022 experiments, please see the top of each chart.
  1031 \subsubsection{Graph isomorphism}
  1023 \subsubsection{Graph isomorphism problem}
  1032 To evaluate the efficiency of the algorithms in the case of graph
  1024 To evaluate the efficiency of the algorithms in the case of graph
  1033 isomorphism, random connected graphs of less than 20 000 nodes have been
  1025 isomorphism problem, random connected graphs of less than 20 000 nodes have been
  1034 considered. Generating a random graph and shuffling its nodes, an
  1026 considered. Generating a random graph and shuffling its nodes, an
  1035 isomorphism had to be found. Figure \ref{fig:randISO} shows the runtime results
  1027 isomorphism had to be found. Figure~\ref{fig:randISO} shows the runtime results
  1036 on graph sets of various density.
  1028 on graph sets of various density.
  1037 
  1029 
  1038 
  1030 
  1039 
  1031 
  1040 
  1032 
  1042 \vspace*{-1.5cm}
  1034 \vspace*{-1.5cm}
  1043 \hspace*{-1.5cm}
  1035 \hspace*{-1.5cm}
  1044 \begin{subfigure}[b]{0.55\textwidth}
  1036 \begin{subfigure}[b]{0.55\textwidth}
  1045 \begin{center}
  1037 \begin{center}
  1046 \begin{tikzpicture}
  1038 \begin{tikzpicture}
  1047 \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1039 \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1048 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1040 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1049   west},scaled x ticks = false,x tick label style={/pgf/number
  1041   west},scaled x ticks = false,x tick label style={/pgf/number
  1050   format/1000 sep = \space}]
  1042   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1043   format/1000 sep = \kern 0.08em}]
  1051 %\addplot+[only marks] table {proteinsOrig.txt};
  1044 %\addplot+[only marks] table {proteinsOrig.txt};
  1052 \addplot table {randGraph/iso/vf2pIso5_1.txt};
  1045 \addplot table {randGraph/iso/vf2pIso5_1.txt};
  1053 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1046 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1054         {randGraph/iso/vf2ppIso5_1.txt};
  1047         {randGraph/iso/vf2ppIso5_1.txt};
  1055 \end{axis}
  1048 \end{axis}
  1058 \end{subfigure}
  1051 \end{subfigure}
  1059 %\hspace{1cm}
  1052 %\hspace{1cm}
  1060 \begin{subfigure}[b]{0.55\textwidth}
  1053 \begin{subfigure}[b]{0.55\textwidth}
  1061 \begin{center}
  1054 \begin{center}
  1062 \begin{tikzpicture}
  1055 \begin{tikzpicture}
  1063 \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1056 \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1064 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1057 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1065   west},scaled x ticks = false,x tick label style={/pgf/number
  1058   west},scaled x ticks = false,x tick label style={/pgf/number
  1066   format/1000 sep = \space}]
  1059   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1060   format/1000 sep = \kern 0.08em}]
  1067 %\addplot+[only marks] table {proteinsOrig.txt};
  1061 %\addplot+[only marks] table {proteinsOrig.txt};
  1068 \addplot table {randGraph/iso/vf2pIso10_1.txt};
  1062 \addplot table {randGraph/iso/vf2pIso10_1.txt};
  1069 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1063 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1070         {randGraph/iso/vf2ppIso10_1.txt};
  1064         {randGraph/iso/vf2ppIso10_1.txt};
  1071 \end{axis}
  1065 \end{axis}
  1075 %%\hspace{1cm}
  1069 %%\hspace{1cm}
  1076 \hspace*{-1.5cm}
  1070 \hspace*{-1.5cm}
  1077 \begin{subfigure}[b]{0.55\textwidth}
  1071 \begin{subfigure}[b]{0.55\textwidth}
  1078 \begin{center}
  1072 \begin{center}
  1079 \begin{tikzpicture}
  1073 \begin{tikzpicture}
  1080 \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1074 \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1081 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1075 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1082   west},scaled x ticks = false,x tick label style={/pgf/number
  1076   west},scaled x ticks = false,x tick label style={/pgf/number
  1083   format/1000 sep = \space}]
  1077   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1078   format/1000 sep = \kern 0.08em}]
  1084 %\addplot+[only marks] table {proteinsOrig.txt};
  1079 %\addplot+[only marks] table {proteinsOrig.txt};
  1085 \addplot table {randGraph/iso/vf2pIso15_1.txt};
  1080 \addplot table {randGraph/iso/vf2pIso15_1.txt};
  1086 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1081 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1087         {randGraph/iso/vf2ppIso15_1.txt};
  1082         {randGraph/iso/vf2ppIso15_1.txt};
  1088 \end{axis}
  1083 \end{axis}
  1090 \end{center}
  1085 \end{center}
  1091      \end{subfigure}
  1086      \end{subfigure}
  1092      \begin{subfigure}[b]{0.55\textwidth}
  1087      \begin{subfigure}[b]{0.55\textwidth}
  1093 \begin{center}
  1088 \begin{center}
  1094 \begin{tikzpicture}
  1089 \begin{tikzpicture}
  1095 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1090 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1096 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1091 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1097   west},scaled x ticks = false,x tick label style={/pgf/number
  1092   west},scaled x ticks = false,x tick label style={/pgf/number
  1098   format/1000 sep = \thinspace}]
  1093   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1094   format/1000 sep = \kern 0.08em}]
  1099 %\addplot+[only marks] table {proteinsOrig.txt};
  1095 %\addplot+[only marks] table {proteinsOrig.txt};
  1100 \addplot table {randGraph/iso/vf2pIso100_1.txt};
  1096 \addplot table {randGraph/iso/vf2pIso100_1.txt};
  1101 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1097 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1102         {randGraph/iso/vf2ppIso100_1.txt};
  1098         {randGraph/iso/vf2ppIso100_1.txt};
  1103 \end{axis}
  1099 \end{axis}
  1104 \end{tikzpicture}
  1100 \end{tikzpicture}
  1105 \end{center}
  1101 \end{center}
  1106 \end{subfigure}
  1102 \end{subfigure}
  1107 \vspace*{-0.8cm}
  1103 \vspace*{-0.8cm}
  1108 \caption{ISO on random graphs.
  1104 \caption{Graph isomorphism problem on random graphs
  1109 }\label{fig:randISO}
  1105 }\label{fig:randISO}
  1110 \end{figure}
  1106 \end{figure}
  1111 
  1107 
  1112 
  1108 
  1113 \subsubsection{Induced subgraph isomorphism}
  1109 \subsubsection{Induced subgraph isomorphism problem}
  1114 This section presents a comparison of VF2++ and VF2 Plus in the case
  1110 This section presents a comparison of VF2++ and VF2~Plus in the case
  1115 of induced subgraph isomorphism. In addition to the size of the large
  1111 of induced subgraph isomorphism problem. In addition to the size of graph $G_2$, that of $G_1$ dramatically influences the hardness of
  1116 graph, that of the small graph dramatically influences the hardness of
  1112 a given problem too, so the overall picture is provided by examining graphs to be embedded of various size.
  1117 a given problem too, so the overall picture is provided by examining
       
  1118 small graphs of various size.
       
  1119 
  1113 
  1120 For each chart, a number $0<\rho< 1$ has been fixed, and the following
  1114 For each chart, a number $0<\rho< 1$ has been fixed, and the following
  1121 has been executed 150 times. Generating a large graph $G_{2}$ of an average degree of $\delta$,
  1115 has been executed 150 times. Generating a large graph $G_{2}$ of an average degree of $\delta$,
  1122 choose 10 of its induced subgraphs having $\rho\ |V_{2}|$ nodes,
  1116 choose 10 of its induced subgraphs having $\rho|V_{2}|$ nodes,
  1123 and for all the 10 subgraphs find a mapping by using both the graph
  1117 and for all the 10 subgraphs find a mapping by using both graph
  1124 matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
  1118 matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
  1125 0.3, 0.8$ cases have been examined, see
  1119 0.3, 0.8$ cases have been examined, see
  1126 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
  1120 Figure~\ref{fig:randIND5},~\ref{fig:randIND10}~and~\ref{fig:randIND35}.
  1127 \ref{fig:randIND35}.
       
  1128 
  1121 
  1129 
  1122 
  1130 
  1123 
  1131 
  1124 
  1132 
  1125 
  1134 \vspace*{-1.5cm}
  1127 \vspace*{-1.5cm}
  1135 \hspace*{-1.5cm}
  1128 \hspace*{-1.5cm}
  1136 \begin{subfigure}[b]{0.55\textwidth}
  1129 \begin{subfigure}[b]{0.55\textwidth}
  1137 \begin{center}
  1130 \begin{center}
  1138 \begin{tikzpicture}
  1131 \begin{tikzpicture}
  1139 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1132 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1140 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1133 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1141   west},scaled x ticks = false,x tick label style={/pgf/number
  1134   west},scaled x ticks = false,x tick label style={/pgf/number
  1142   format/1000 sep = \space}]
  1135   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1136   format/1000 sep = \kern 0.08em}]
  1143 %\addplot+[only marks] table {proteinsOrig.txt};
  1137 %\addplot+[only marks] table {proteinsOrig.txt};
  1144 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
  1138 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
  1145 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1139 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1146         {randGraph/ind/vf2ppInd5_0.05.txt};
  1140         {randGraph/ind/vf2ppInd5_0.05.txt};
  1147 \end{axis}
  1141 \end{axis}
  1149 \end{center}
  1143 \end{center}
  1150      \end{subfigure}
  1144      \end{subfigure}
  1151      \begin{subfigure}[b]{0.55\textwidth}
  1145      \begin{subfigure}[b]{0.55\textwidth}
  1152 \begin{center}
  1146 \begin{center}
  1153 \begin{tikzpicture}
  1147 \begin{tikzpicture}
  1154 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1148 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1155 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1149 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1156   west},scaled x ticks = false,x tick label style={/pgf/number
  1150   west},scaled x ticks = false,x tick label style={/pgf/number
  1157   format/1000 sep = \space}]
  1151   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1152   format/1000 sep = \kern 0.08em}]
  1158 %\addplot+[only marks] table {proteinsOrig.txt};
  1153 %\addplot+[only marks] table {proteinsOrig.txt};
  1159 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
  1154 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
  1160 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1155 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1161         {randGraph/ind/vf2ppInd5_0.1.txt};
  1156         {randGraph/ind/vf2ppInd5_0.1.txt};
  1162 \end{axis}
  1157 \end{axis}
  1165 \end{subfigure}
  1160 \end{subfigure}
  1166 \hspace*{-1.5cm}
  1161 \hspace*{-1.5cm}
  1167 \begin{subfigure}[b]{0.55\textwidth}
  1162 \begin{subfigure}[b]{0.55\textwidth}
  1168 \begin{center}
  1163 \begin{center}
  1169 \begin{tikzpicture}
  1164 \begin{tikzpicture}
  1170 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1165 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1171 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1166 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1172   west},scaled x ticks = false,x tick label style={/pgf/number
  1167   west},scaled x ticks = false,x tick label style={/pgf/number
  1173   format/1000 sep = \space}]
  1168   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1169   format/1000 sep = \kern 0.08em}]
  1174 %\addplot+[only marks] table {proteinsOrig.txt};
  1170 %\addplot+[only marks] table {proteinsOrig.txt};
  1175 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
  1171 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
  1176 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1172 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1177         {randGraph/ind/vf2ppInd5_0.3.txt};
  1173         {randGraph/ind/vf2ppInd5_0.3.txt};
  1178 \end{axis}
  1174 \end{axis}
  1180 \end{center}
  1176 \end{center}
  1181      \end{subfigure}
  1177      \end{subfigure}
  1182      \begin{subfigure}[b]{0.55\textwidth}
  1178      \begin{subfigure}[b]{0.55\textwidth}
  1183 \begin{center}
  1179 \begin{center}
  1184 \begin{tikzpicture}
  1180 \begin{tikzpicture}
  1185 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1181 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1186 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1182 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1187   west},scaled x ticks = false,x tick label style={/pgf/number
  1183   west},scaled x ticks = false,x tick label style={/pgf/number
  1188   format/1000 sep = \space}]
  1184   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1185   format/1000 sep = \kern 0.08em}]
  1189 %\addplot+[only marks] table {proteinsOrig.txt};
  1186 %\addplot+[only marks] table {proteinsOrig.txt};
  1190 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
  1187 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
  1191 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1188 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1192         {randGraph/ind/vf2ppInd5_0.8.txt};
  1189         {randGraph/ind/vf2ppInd5_0.8.txt};
  1193 \end{axis}
  1190 \end{axis}
  1194 \end{tikzpicture}
  1191 \end{tikzpicture}
  1195 \end{center}
  1192 \end{center}
  1196 \end{subfigure}
  1193 \end{subfigure}
  1197 \vspace*{-0.8cm}
  1194 \vspace*{-0.8cm}
  1198 \caption{IND on graphs having an average degree of
  1195 \caption{Induced subgraph isomorphism problem on random graphs having an average degree of
  1199   5.}\label{fig:randIND5}
  1196   5}\label{fig:randIND5}
  1200 \end{figure}
  1197 \end{figure}
  1201 
  1198 
  1202 
  1199 
  1203 \begin{figure}
  1200 \begin{figure}
  1204 \vspace*{-1.5cm}
  1201 \vspace*{-1.5cm}
  1205 \hspace*{-1.5cm}
  1202 \hspace*{-1.5cm}
  1206 \begin{subfigure}[b]{0.55\textwidth}
  1203 \begin{subfigure}[b]{0.55\textwidth}
  1207 \begin{center}
  1204 \begin{center}
  1208 \hspace*{-0.5cm}
  1205 \hspace*{-0.5cm}
  1209 \begin{tikzpicture}
  1206 \begin{tikzpicture}
  1210 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1207 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1211 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1208 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1212   west},scaled x ticks = false,x tick label style={/pgf/number
  1209   west},scaled x ticks = false,x tick label style={/pgf/number
  1213   format/1000 sep = \space}]
  1210   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1211   format/1000 sep = \kern 0.08em}]
  1214 %\addplot+[only marks] table {proteinsOrig.txt};
  1212 %\addplot+[only marks] table {proteinsOrig.txt};
  1215 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
  1213 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
  1216 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1214 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1217         {randGraph/ind/vf2ppInd10_0.05.txt};
  1215         {randGraph/ind/vf2ppInd10_0.05.txt};
  1218 \end{axis}
  1216 \end{axis}
  1221      \end{subfigure}
  1219      \end{subfigure}
  1222      \begin{subfigure}[b]{0.55\textwidth}
  1220      \begin{subfigure}[b]{0.55\textwidth}
  1223 \begin{center}
  1221 \begin{center}
  1224      \hspace*{-0.5cm}
  1222      \hspace*{-0.5cm}
  1225 \begin{tikzpicture}
  1223 \begin{tikzpicture}
  1226 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1224 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1227 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1225 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1228   west},scaled x ticks = false,x tick label style={/pgf/number
  1226   west},scaled x ticks = false,x tick label style={/pgf/number
  1229   format/1000 sep = \space}]
  1227   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1228   format/1000 sep = \kern 0.08em}]
  1230 %\addplot+[only marks] table {proteinsOrig.txt};
  1229 %\addplot+[only marks] table {proteinsOrig.txt};
  1231 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
  1230 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
  1232 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1231 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1233         {randGraph/ind/vf2ppInd10_0.1.txt};
  1232         {randGraph/ind/vf2ppInd10_0.1.txt};
  1234 \end{axis}
  1233 \end{axis}
  1237 \end{subfigure}
  1236 \end{subfigure}
  1238 \hspace*{-1.5cm}
  1237 \hspace*{-1.5cm}
  1239 \begin{subfigure}[b]{0.55\textwidth}
  1238 \begin{subfigure}[b]{0.55\textwidth}
  1240 \begin{center}
  1239 \begin{center}
  1241 \begin{tikzpicture}
  1240 \begin{tikzpicture}
  1242 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1241 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1243 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1242 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1244   west},scaled x ticks = false,x tick label style={/pgf/number
  1243   west},scaled x ticks = false,x tick label style={/pgf/number
  1245   format/1000 sep = \space}]
  1244   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1245   format/1000 sep = \kern 0.08em}]
  1246 %\addplot+[only marks] table {proteinsOrig.txt};
  1246 %\addplot+[only marks] table {proteinsOrig.txt};
  1247 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
  1247 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
  1248 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1248 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1249         {randGraph/ind/vf2ppInd10_0.3.txt};
  1249         {randGraph/ind/vf2ppInd10_0.3.txt};
  1250 \end{axis}
  1250 \end{axis}
  1252 \end{center}
  1252 \end{center}
  1253      \end{subfigure}
  1253      \end{subfigure}
  1254      \begin{subfigure}[b]{0.55\textwidth}
  1254      \begin{subfigure}[b]{0.55\textwidth}
  1255 \begin{center}
  1255 \begin{center}
  1256 \begin{tikzpicture}
  1256 \begin{tikzpicture}
  1257 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1257 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1258 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1258 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1259   west},scaled x ticks = false,x tick label style={/pgf/number
  1259   west},scaled x ticks = false,x tick label style={/pgf/number
  1260   format/1000 sep = \space}]
  1260   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1261   format/1000 sep = \kern 0.08em}]
  1261 %\addplot+[only marks] table {proteinsOrig.txt};
  1262 %\addplot+[only marks] table {proteinsOrig.txt};
  1262 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
  1263 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
  1263 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1264 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1264         {randGraph/ind/vf2ppInd10_0.8.txt};
  1265         {randGraph/ind/vf2ppInd10_0.8.txt};
  1265 \end{axis}
  1266 \end{axis}
  1266 \end{tikzpicture}
  1267 \end{tikzpicture}
  1267 \end{center}
  1268 \end{center}
  1268 \end{subfigure}
  1269 \end{subfigure}
  1269 \vspace*{-0.8cm}
  1270 \vspace*{-0.8cm}
  1270 \caption{IND on graphs having an average degree of
  1271 \caption{Induced subgraph isomorphism problem on random graphs having an average degree of
  1271   10.}\label{fig:randIND10}
  1272   10}\label{fig:randIND10}
  1272 \end{figure}
  1273 \end{figure}
  1273 
  1274 
  1274 
  1275 
  1275 
  1276 
  1276 \begin{figure}
  1277 \begin{figure}
  1277 \vspace*{-1.5cm}
  1278 \vspace*{-1.5cm}
  1278 \hspace*{-1.5cm}
  1279 \hspace*{-1.5cm}
  1279 \begin{subfigure}[b]{0.55\textwidth}
  1280 \begin{subfigure}[b]{0.55\textwidth}
  1280 \begin{center}
  1281 \begin{center}
  1281 \begin{tikzpicture}
  1282 \begin{tikzpicture}
  1282 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1283 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1283 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1284 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1284   west},scaled x ticks = false,x tick label style={/pgf/number
  1285   west},scaled x ticks = false,x tick label style={/pgf/number
  1285   format/1000 sep = \space}]
  1286   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1287   format/1000 sep = \kern 0.08em}]
  1286 %\addplot+[only marks] table {proteinsOrig.txt};
  1288 %\addplot+[only marks] table {proteinsOrig.txt};
  1287 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
  1289 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
  1288 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1290 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1289         {randGraph/ind/vf2ppInd35_0.05.txt};
  1291         {randGraph/ind/vf2ppInd35_0.05.txt};
  1290 \end{axis}
  1292 \end{axis}
  1292 \end{center}
  1294 \end{center}
  1293      \end{subfigure}
  1295      \end{subfigure}
  1294      \begin{subfigure}[b]{0.55\textwidth}
  1296      \begin{subfigure}[b]{0.55\textwidth}
  1295 \begin{center}
  1297 \begin{center}
  1296 \begin{tikzpicture}
  1298 \begin{tikzpicture}
  1297 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1299 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1298 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1300 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1299   west},scaled x ticks = false,x tick label style={/pgf/number
  1301   west},scaled x ticks = false,x tick label style={/pgf/number
  1300   format/1000 sep = \space}]
  1302   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1303   format/1000 sep = \kern 0.08em}]
  1301 %\addplot+[only marks] table {proteinsOrig.txt};
  1304 %\addplot+[only marks] table {proteinsOrig.txt};
  1302 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
  1305 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
  1303 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1306 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1304         {randGraph/ind/vf2ppInd35_0.1.txt};
  1307         {randGraph/ind/vf2ppInd35_0.1.txt};
  1305 \end{axis}
  1308 \end{axis}
  1308 \end{subfigure}
  1311 \end{subfigure}
  1309 \hspace*{-1.5cm}
  1312 \hspace*{-1.5cm}
  1310 \begin{subfigure}[b]{0.55\textwidth}
  1313 \begin{subfigure}[b]{0.55\textwidth}
  1311 \begin{center}
  1314 \begin{center}
  1312 \begin{tikzpicture}
  1315 \begin{tikzpicture}
  1313 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1316 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1314 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1317 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1315   west},scaled x ticks = false,x tick label style={/pgf/number
  1318   west},scaled x ticks = false,x tick label style={/pgf/number
  1316   format/1000 sep = \space}]
  1319   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1320   format/1000 sep = \kern 0.08em}]
  1317 %\addplot+[only marks] table {proteinsOrig.txt};
  1321 %\addplot+[only marks] table {proteinsOrig.txt};
  1318 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
  1322 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
  1319 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1323 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1320         {randGraph/ind/vf2ppInd35_0.3.txt};
  1324         {randGraph/ind/vf2ppInd35_0.3.txt};
  1321 \end{axis}
  1325 \end{axis}
  1323 \end{center}
  1327 \end{center}
  1324      \end{subfigure}
  1328      \end{subfigure}
  1325      \begin{subfigure}[b]{0.55\textwidth}
  1329      \begin{subfigure}[b]{0.55\textwidth}
  1326 \begin{center}
  1330 \begin{center}
  1327 \begin{tikzpicture}
  1331 \begin{tikzpicture}
  1328 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1332 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1329 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1333 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1330   west},scaled x ticks = false,x tick label style={/pgf/number
  1334   west},scaled x ticks = false,x tick label style={/pgf/number
  1331   format/1000 sep = \space}]
  1335   format/1000 sep = \kern 0.08em},y tick label style={/pgf/number
       
  1336   format/1000 sep = \kern 0.08em}]
  1332 %\addplot+[only marks] table {proteinsOrig.txt};
  1337 %\addplot+[only marks] table {proteinsOrig.txt};
  1333 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
  1338 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
  1334 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1339 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1335         {randGraph/ind/vf2ppInd35_0.8.txt};
  1340         {randGraph/ind/vf2ppInd35_0.8.txt};
  1336 \end{axis}
  1341 \end{axis}
  1337 \end{tikzpicture}
  1342 \end{tikzpicture}
  1338 \end{center}
  1343 \end{center}
  1339 \end{subfigure}
  1344 \end{subfigure}
  1340 \vspace*{-0.8cm}
  1345 \vspace*{-0.8cm}
  1341 \caption{IND on graphs having an average degree of
  1346 \caption{Induced subgraph isomorphism problem on random graphs having an average degree of
  1342   35.}\label{fig:randIND35}
  1347   35}\label{fig:randIND35}
  1343 \end{figure}
  1348 \end{figure}
  1344 
  1349 
  1345 
  1350 
  1346 Based on these experiments, VF2++ is faster than VF2 Plus and able to
  1351 Based on these experiments, VF2++ is faster than VF2~Plus and able to
  1347 handle really large graphs in milliseconds. Note that when $IND$ was
  1352 handle really large graphs in milliseconds. Note that when $IND$ was
  1348 considered and the small graphs had proportionally few nodes ($\rho =
  1353 considered and the graph to be embedded had proportionally few nodes ($\rho =
  1349 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
  1354 0.05$, or $\rho = 0.1$), then VF2~Plus produced some inefficient node
  1350 orders (e.g. see the $\delta=10$ case on
  1355 orders (e.g. see the $\delta=10$ case on
  1351 Figure~\ref{fig:randIND10}). If these instances had been excluded, the
  1356 Figure~\ref{fig:randIND10}). If these instances had been excluded, the
  1352 charts would have seemed to be similar to the other ones.
  1357 charts would have looked similarly to the other ones.
  1353 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
  1358 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
  1354 Plus slow slightly down, but remain practically usable even on graphs
  1359 Plus slow slightly down, but remain practically usable even on graphs
  1355 having 10 000 nodes.
  1360 having 10 000 nodes.
  1356 
  1361 
  1357 
  1362 
  1358 
  1363 
  1359 
  1364 
  1360 
  1365 
  1361 \section{Conclusion}
  1366 \section{Conclusion}
  1362 This paper presented VF2++, a new graph matching algorithm based on VF2, called VF2++, and analyzed it from a practical viewpoint.
  1367 This paper presented VF2++, a new graph matching algorithm based on VF2, and analysed it from a practical viewpoint.
  1363 
  1368 
  1364 Recognizing the importance of the node order and determining an
  1369 Recognizing the importance of the node order and determining an
  1365 efficient one, VF2++ is able to match graphs of thousands of nodes in
  1370 efficient one, VF2++ is able to match graphs of thousands of nodes in
  1366 near practically linear time including preprocessing. In addition to
  1371 near practically linear time including preprocessing. In addition to
  1367 the proper order, VF2++ uses more efficient consistency and cutting
  1372 the proper order, VF2++ uses more efficient cutting
  1368 rules which are easy to compute and make the algorithm able to prune
  1373 rules which are easy to compute and make the algorithm able to prune
  1369 most of the unfruitful branches without going astray.
  1374 most of the unfruitful branches without going astray.
  1370 
  1375 
  1371 In order to show the efficiency of the new method, it has been
  1376 In order to show the efficiency of the new method, it has been
  1372 compared to VF2 Plus\cite{VF2Plus}, which is the best contemporary algorithm.
  1377 compared to VF2~Plus~\cite{VF2Plus}, which is the best contemporary algorithm.
  1373 .
  1378 
  1374 
  1379 The experiments show that VF2++ consistently outperforms VF2~Plus on
  1375 The experiments show that VF2++ consistently outperforms VF2 Plus on
       
  1376 biological graphs. It seems to be asymptotically faster on protein and
  1380 biological graphs. It seems to be asymptotically faster on protein and
  1377 on contact map graphs in the case of induced subgraph isomorphism,
  1381 on contact map graphs in the case of induced subgraph isomorphism problem,
  1378 while in the case of graph isomorphism, it has definitely better
  1382 while in the case of graph isomorphism problem, it has definitely better
  1379 asymptotic behaviour on protein graphs.
  1383 asymptotic behaviour on protein graphs.
  1380 
  1384 
  1381 Regarding random sparse graphs, not only has VF2++ proved itself to be
  1385 Regarding random sparse graphs, not only has VF2++ proved itself to be
  1382 faster than VF2 Plus, but it also has a practically linear behaviour both
  1386 faster than VF2~Plus, but it also has a practically linear behaviour both
  1383 in the case of induced subgraph- and graph isomorphism.
  1387 in the case of induced subgraph and graph isomorphism problems.
  1384 
  1388 
  1385 %%%%%%%%%%%%%%%%
  1389 %%%%%%%%%%%%%%%%
  1386 \section*{Acknowledgement} \label{sec:ack}
  1390 \section*{Acknowledgement} \label{sec:ack}
  1387 %%%%%%%%%%%%%%%%
  1391 %%%%%%%%%%%%%%%%
  1388 This research project was initiated and sponsored by QuantumBio
  1392 This research project was initiated and sponsored by QuantumBio
  1389 Inc.\cite{QUANTUMBIO}.
  1393 Inc.~\cite{QUANTUMBIO}.
  1390 
  1394 
  1391 The authors were supported by the Hungarian Scientific Research Fund -
  1395 The authors were supported by the Hungarian Scientific Research Fund -
  1392 OTKA, K109240 and by the J\'anos Bolyai Research Fellowship program of
  1396 OTKA, K109240 and by the J\'anos Bolyai Research Fellowship program of
  1393 the Hungarian Academy of Sciences.
  1397 the Hungarian Academy of Sciences.
  1394 
  1398