damecco.tex
changeset 2 20d1b0e5838f
parent 1 76dc74f824ba
child 3 550f81b2f81c
equal deleted inserted replaced
1:27ec99fc742d 2:3b7777a4db4b
    44 
    44 
    45 %% The lineno packages adds line numbers. Start line numbering with
    45 %% The lineno packages adds line numbers. Start line numbering with
    46 %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
    46 %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
    47 %% for the whole article with \linenumbers.
    47 %% for the whole article with \linenumbers.
    48 %% \usepackage{lineno}
    48 %% \usepackage{lineno}
       
    49 
       
    50 \usepackage{amsmath}
       
    51 %% \usepackage[pdftex]{graphicx}
       
    52 
       
    53 \usepackage{pgfplots}
       
    54 \pgfplotsset{width=9cm}
       
    55 \pgfplotsset{compat=1.8}
       
    56 
       
    57 \usepackage{caption}
       
    58 \usepackage{subcaption} 
       
    59 
       
    60 \usepackage{algorithm}
       
    61 \usepackage{algpseudocode}
       
    62 \usepackage{tikz}
       
    63 
       
    64 \usepackage{amsthm,amssymb}
       
    65 \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}
       
    66 
       
    67 \newtheorem{theorem}{Theorem}[subsection]
       
    68 \newtheorem{corollary}{Corollary}[theorem]
       
    69 \newtheorem{claim}[theorem]{Claim}
       
    70 
       
    71 \newtheorem{definition}{Definition}[subsection]
       
    72 \newtheorem{notation}{Notation}[subsection]
       
    73 \newtheorem{example}{Example}[subsection]
       
    74 \usetikzlibrary{decorations.markings}
       
    75 \let\oldproofname=\proofname
       
    76 %% \renewcommand{\proofname}{\rm\bf{Proof:}}
    49 
    77 
    50 \journal{Discrete Applied Mathematics}
    78 \journal{Discrete Applied Mathematics}
    51 
    79 
    52 \begin{document}
    80 \begin{document}
    53 
    81 
   145 \end{frontmatter}
   173 \end{frontmatter}
   146 
   174 
   147 %% \linenumbers
   175 %% \linenumbers
   148 
   176 
   149 %% main text
   177 %% main text
   150 \section{}
   178 \section{Introduction}
   151 \label{}
   179 \label{sec:intro}
       
   180 
       
   181 In the last decades, combinatorial structures, and especially graphs have been considered with ever increasing interest, and applied to the solution of several new and revised questions.
       
   182 The expressiveness, the simplicity and the studiedness of graphs make them practical for modelling and appear constantly in several seemingly independent fields.
       
   183 Bioinformatics and chemistry are amongst the most relevant and most important fields.
       
   184 
       
   185 Complex biological systems arise from the interaction and cooperation of plenty of molecular components. Getting acquainted with such systems at the molecular level has primary importance, since protein-protein interaction, DNA-protein interaction, metabolic interaction, transcription factor binding, neuronal networks, and hormone signaling networks can be understood only this way.
       
   186 
       
   187 For instance, a molecular structure can be considered as a graph, whose nodes correspond to atoms and whose edges to chemical bonds. The secondary structure of a protein can also be represented as a graph, where nodes are associated with aminoacids and the edges with hydrogen bonds. The nodes are often whole molecular components and the edges represent some relationships among them.
       
   188 The similarity and dissimilarity of objects corresponding to nodes are incorporated to the model by \emph{node labels}.
       
   189 Many other chemical and biological structures can easily be modeled in a similar way. Understanding such networks basically requires finding specific subgraphs, which can not avoid the application of graph matching algorithms.
       
   190 
       
   191 Finally, let some of the other real-world fields related to some variants of graph matching be briefly mentioned: pattern recognition and machine vision \cite{HorstBunkeApplications}, symbol recognition \cite{CordellaVentoSymbolRecognition}, face identification \cite{JianzhuangYongFaceIdentification}.
       
   192 \\
       
   193 
       
   194 Subgraph and induced subgraph matching problems are known to be NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is one of the few problems in NP neither known to be in P nor NP-Complete. Although polynomial time isomorphism algorithms are known for various graph classes, like trees and planar graphs\cite{PlanarGraphIso}, bounded valence graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso} or permutation graphs\cite{PermGraphIso}.
       
   195 
       
   196 In the following, some algorithms based on other approaches are summarized, which do not need any restrictions on the graphs. However, an overall polynomial behaviour is not expectable from such an alternative, it may often have good performance, even on a graph class for which polynomial algorithm is known. Note that this summary containing only exact matching algorithms is far not complete, neither does it cover all the recent algorithms.
       
   197 
       
   198 The first practically usable approach was due to \textbf{Ullmann}\cite{Ullmann} which is a commonly used depth-first search based algorithm with a complex heuristic for reducing the number of visited states. A major problem is its $\Theta(n^3)$ space complexity, which makes it impractical in the case of big sparse graphs.
       
   199 
       
   200 In a recent paper, \textbf{Ullmann}\cite{UllmannBit} presents an improved version of this algorithm based on a bit-vector solution for the binary Constraint Satisfaction Problem.
       
   201 
       
   202 The \textbf{Nauty} algorithm\cite{Nauty} transforms the two graphs to a canonical form before starting to check for the isomorphism. It has been considered as one of the fastest graph isomorphism algorithms, although graph categories were shown in which it takes exponentially many steps. This algorithm handles only the graph isomorphism problem.
       
   203 
       
   204 The \textbf{LAD} algorithm\cite{Lad} uses a depth-first search strategy and formulates the matching as a Constraint Satisfaction Problem to prune the search tree. The constraints are that the mapping has to be injective and edge-preserving, hence it is possible to handle new matching types as well.
       
   205 
       
   206 The \textbf{RI} algorithm\cite{RI} and its variations are based on a state space representation. After reordering the nodes of the graphs, it uses some fast executable heuristic checks without using any complex pruning rules. It seems to run really efficiently on graphs coming from biology, and won the International Contest on Pattern Search in Biological Databases\cite{Content}. 
       
   207 
       
   208 The currently most commonly used algorithm is the \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was designed for solving pattern matching and computer vision problems, and has been one of the best overall algorithms for more than a decade. Although, it can't be up to new specialized algorithms, it is still widely used due to its simplicity and space efficiency. VF2 uses a state space representation and checks some conditions in each state to prune the search tree.
       
   209 
       
   210 Our first graph matching algorithm was the first version of VF2 which recognizes the significance of the node ordering, more opportunities to increase the cutting efficiency and reduce its computational complexity. This project was initiated and sponsored by QuantumBio Inc.\cite{QUANTUMBIO} and the implementation --- along with a source code --- has been published as a part of LEMON\cite{LEMON} open source graph library.
       
   211 
       
   212 This thesis introduces \textbf{VF2++}, a new further improved algorithm for the graph and (induced)subgraph isomorphism problem, which uses efficient cutting rules and determines a node order in which VF2 runs significantly faster on practical inputs.
       
   213 
       
   214 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has been published. It is considered to be as efficient as the RI algorithm and has a strictly better behavior on large graphs.  The main idea of VF2 Plus is to precompute a heuristic node order of the small graph, in which the VF2 works more efficiently.\newline
       
   215 \newline
       
   216 
       
   217 \section{Problem Statement}
       
   218 This section provides a detailed description of the problems to be solved.
       
   219 \subsection{Definitions}
       
   220 
       
   221 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.
       
   222 \begin{definition}\label{sec:ismorphic}
       
   223 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M: V_{small} \longrightarrow V_{large}$ bijection, for which the following is true:
       
   224 \begin{center}
       
   225 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (M(u),M(v))\in{E_{large}}$
       
   226 \end{center}
       
   227 \end{definition}
       
   228 For the sake of simplicity in this paper subgraphs and induced subgraphs are defined in a more general way than usual:
       
   229 \begin{definition}
       
   230 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I: V_{small}\longrightarrow V_{large}$ injection, for which the following is true:
       
   231 \begin{center}
       
   232 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$
       
   233 \end{center}
       
   234 \end{definition}
       
   235 
       
   236 \begin{definition} 
       
   237 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists I: V_{small}\longrightarrow V_{large}$ injection, for which the following is true:
       
   238 \begin{center}
       
   239 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (I(u),I(v))\in E_{large}$
       
   240 \end{center}
       
   241 \end{definition}
       
   242 
       
   243 \begin{definition}
       
   244 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node label function}, where K is an arbitrary set. The elements in K are the \textbf{node labels}. Two nodes, u and v are said to be \textbf{equivalent}, if $lab(u)=lab(v)$.
       
   245 \end{definition}
       
   246 
       
   247 When node labels are also given, the matched nodes must have the same labels.
       
   248 For example, the node labeled isomorphism is phrased by
       
   249 \begin{definition}
       
   250 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label function lab} if $\exists M: V_{small} \longrightarrow V_{large}$ bijection, for which the following is true:
       
   251 \begin{center}
       
   252 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (M(u),M(v))\in{E_{large}})$
       
   253  and $(\forall u\in{V_{small}} : lab(u)=lab(M(u)))$
       
   254 \end{center}
       
   255 \end{definition}
       
   256 
       
   257 The other two definitions can be extended in the same way.
       
   258 
       
   259 Note that edge label function can be defined similarly to node label function, and all the definitions can be extended with additional conditions, but it is out of the scope of this work.
       
   260 
       
   261 The equivalence of two nodes is usually defined by another relation, $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the definition given above if R is an equivalence relation, which does not mean restriction in biological and chemical applications.
       
   262 
       
   263 \subsection{Common problems}\label{sec:CommProb}
       
   264 
       
   265 The focus of this paper is on two extensively studied topics, the subgraph isomorphism and its variations. However, the following problems also appear in many applications.
       
   266 
       
   267 The \textbf{subgraph matching problem} is the following: is $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node label?
       
   268 
       
   269 The \textbf{induced subgraph matching problem} asks the same about the existence of an induced subgraph.
       
   270 
       
   271 The \textbf{graph isomorphism problem} can be defined as induced subgraph matching problem where the sizes of the two graphs are equal.
       
   272 
       
   273 In addition to existence, it may be needed to show such a subgraph, or it may be necessary to list all of them.
       
   274 
       
   275 It should be noted that some authors misleadingly refer to the term \emph{subgraph isomorphism problem} as an \emph{induced subgraph isomorphism problem}.
       
   276 
       
   277 The following sections give the descriptions of VF2, VF2++, VF2 Plus and a particular comparison.
       
   278 
       
   279 \section{The VF2 Algorithm}
       
   280 This algorithm is the basis of both the VF2++ and the VF2 Plus.
       
   281 VF2 is able to handle all the variations mentioned in \textbf{Section \ref{sec:CommProb})}.
       
   282 Although it can also handle directed graphs, for the sake of simplicity, only the undirected case will be discussed.
       
   283 
       
   284 
       
   285 \subsection{Common notations}
       
   286 \indent
       
   287 Assume $G_{small}$ is searched in $G_{large}$.
       
   288 The following definitions and notations will be used throughout the whole paper.
       
   289 \begin{definition}
       
   290 A set $M\subseteq V_{small}\times V_{large}$ is called \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears in more than one pair in M.
       
   291 That is, M uniquely associates some of the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice versa.
       
   292 \end{definition}
       
   293 
       
   294 \begin{definition}
       
   295 Mapping M \textbf{covers} a node v, if there exists a pair in M, which contains v.
       
   296 \end{definition}
       
   297 
       
   298 \begin{definition}
       
   299 A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the nodes in $V_{small}$.
       
   300 \end{definition}
       
   301 
       
   302 \begin{notation}
       
   303 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.
       
   304 \end{notation}
       
   305 
       
   306 \begin{notation}
       
   307 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$ and $v\in V_{small}\cup V_{large}$.
       
   308 \end{notation}
       
   309 
       
   310 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique
       
   311 
       
   312 The definitions of the isomorphism types can be rephrased on the existence of a special whole mapping $M$, since it represents a bijection. For example 
       
   313 \begin{center}
       
   314 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (Pair(M,u),Pair(M,v))\in E_{large}$.
       
   315 \end{center}
       
   316 
       
   317 \begin{definition}
       
   318 A set of whole mappings is called \textbf{problem type}.
       
   319 \end{definition}
       
   320 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type which can be substituted by any problem type.
       
   321 
       
   322 A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using this notations, VF2 searches a whole mapping $W$ of type $PT$.
       
   323 
       
   324 For example the problem type of graph isomorphism problem is the following.
       
   325 A whole mapping $W$ is in $\mathbf{ISO}$, iff the bijection represented by $W$ satisfies \textbf{Definition \ref{sec:ismorphic})}.
       
   326 The subgraph- and induced subgraph matching problems can be formalized in a similar way. Let their problem types be denoted as $\mathbf{SUB}$ and $\mathbf{IND}$.
       
   327 
       
   328 \begin{definition}
       
   329 \label{expPT}
       
   330 $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of $G_{small}$ has to be mapped to an edge of $G_{large}$ for each mapping in $PT$.
       
   331 \end{definition}
       
   332 
       
   333 Note that $ISO$, $SUB$ and $IND$ are expanding problem types.
       
   334 
       
   335 This paper deals with the three problem types mentioned above only, but
       
   336 the following generic definitions make it possible to handle other types as well.
       
   337 Although it may be challenging to find a proper consistency function and an efficient
       
   338 cutting function.
       
   339 
       
   340 \begin{definition}
       
   341 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a \textbf{consistency function by } $\mathbf{PT}$, if the following holds. If there exists whole mapping $W$ of type PT for which $M\subseteq W$, then $Cons_{PT}(M)$ is true.
       
   342 \end{definition}
       
   343 
       
   344 \begin{definition} 
       
   345 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a \textbf{cutting function by } $\mathbf{PT}$, if the following holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a whole mapping W of type PT.
       
   346 \end{definition}
       
   347 
       
   348 \begin{definition}
       
   349 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if $Cons_{PT}(M)$ is true. 
       
   350 \end{definition}
       
   351 
       
   352 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
       
   353 \begin{notation}
       
   354 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping.
       
   355 \end{notation}
       
   356 
       
   357 $Cons_{PT}$ will be used to check the consistency of the already covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if no whole consistent mapping can contain the current mapping.
       
   358 
       
   359 \subsection{Overview of the algorithm}
       
   360 VF2 uses a state space representation of mappings, $Cons_{PT}$ for excluding inconsistency with the problem type and $Cut_{PT}$ for pruning the search tree.
       
   361 Each state $s$ of the matching process can be associated with a mapping $M(s)$.
       
   362 
       
   363 \textbf{Algorithm~\ref{alg:VF2Pseu})} is a high level description of the VF2 matching algorithm.
       
   364 
       
   365 
       
   366 \begin{algorithm}
       
   367 \algtext*{EndIf}%ne nyomtasson end if-et
       
   368 \algtext*{EndFor}%ne nyomtasson ..
       
   369 \algtext*{EndProcedure}%ne nyomtasson ..
       
   370 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
       
   371 \begin{algorithmic}[1]
       
   372 
       
   373 \Procedure{VF2}{State $s$, ProblemType $PT$}
       
   374   \If{$M(s$) covers $V_{small}$}
       
   375     \State Output($M(s)$)
       
   376   \Else
       
   377   
       
   378   \State Compute the set $P(s)$ of the pairs candidate for inclusion in $M(s)$
       
   379   \ForAll{$p\in{P(s)}$}
       
   380     \If{Cons$_{PT}$($p, M(s)$) $\wedge$ $\neg$Cut$_{PT}$($p, M(s)$)}
       
   381       \State Compute the nascent state $\tilde{s}$ by adding $p$ to $M(s)$
       
   382       \State \textbf{call} VF2($\tilde{s}$, $PT$)
       
   383     \EndIf
       
   384   \EndFor
       
   385   \EndIf
       
   386 \EndProcedure
       
   387 \end{algorithmic}
       
   388 \end{algorithm}
       
   389 
       
   390 
       
   391 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it starts with an empty mapping.
       
   392 
       
   393 For each state $s$, the algorithm computes $P(s)$, the set of candidate node pairs for adding to the current state $s$.
       
   394 
       
   395 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup \{p\}$ is computed, and the whole process is recursively applied to $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it can be proved that $s$ can not be extended to a whole mapping.
       
   396 
       
   397 In order to make sure of the correctness see
       
   398 \begin{claim}
       
   399 Through consistent mappings, only consistent whole mappings can be reached, and all of the whole mappings are reachable through consistent mappings.
       
   400 \end{claim}
       
   401 
       
   402 Note that a state may be reached in many different ways, since the order of insertions into M does not influence the nascent mapping. In fact, the number of different ways which lead to the same state can be exponentially large. If $G_{small}$ and $G_{large}$ are circles with n nodes and n different node labels, there exists exactly one graph isomorphism between them, but it will be reached in $n!$ different ways.
       
   403 
       
   404 However, one may observe
       
   405 
       
   406 \begin{claim}
       
   407 \label{claim:claimTotOrd}
       
   408 Let $\prec$ an arbitrary total ordering relation on $V_{small}$.
       
   409 If the algorithm ignores each $p=(u,v) \in P(s)$, for which
       
   410 \begin{center}
       
   411 $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$,
       
   412 \end{center}
       
   413 then no state can be reached more than ones and each state associated with a whole mapping remains reachable. 
       
   414 \end{claim}
       
   415 
       
   416 Note that the cornerstone of the improvements to VF2 is a proper choice of a total ordering.
       
   417 
       
   418 \subsection{The candidate set P(s)}
       
   419 \label{candidateComputingVF2}
       
   420 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.
       
   421 Suppose that $PT$ is an expanding problem type, see \textbf{Definition~\ref{expPT})}.
       
   422 
       
   423 \begin{notation}
       
   424 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}} \wedge \tilde{u}$ is covered by $M(s)\}$, and \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not covered by $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}} \wedge \tilde{v}$ is covered by $M(s)\}$
       
   425 \end{notation}
       
   426 
       
   427 The set $P(s)$ includes the pairs of uncovered neighbours of covered nodes and if there is not such a node pair, all the pairs containing two uncovered nodes are added. Formally, let
       
   428 \[
       
   429  P(s)\!=\!
       
   430   \begin{cases} 
       
   431    T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if } T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq \emptyset,\\
       
   432    (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s)) &\hspace{-0.15cm}otherwise.
       
   433   \end{cases}
       
   434 \]
       
   435 
       
   436 \subsection{Consistency}
       
   437 This section defines the consistency functions for the different problem types mentioned in \textbf{Section \ref{sec:CommProb})}.
       
   438 \begin{notation}
       
   439 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} : (u,\tilde{u})\in E_{small}\}$\\
       
   440 Let $\mathbf{\Gamma_{large} (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$
       
   441 \end{notation}
       
   442 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$,
       
   443 $s$ is a state of the matching procedure,
       
   444 $M(s)$ is consistent mapping by $PT$ and $lab(u)=lab(v)$.
       
   445 $Cons_{PT}(p,M(s))$ checks whether including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.
       
   446 
       
   447 \subsubsection{Induced subgraph isomorphism}
       
   448 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small} \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline
       
   449 The following formulation gives an efficient way of calculating $Cons_{IND}$.
       
   450 \begin{claim}
       
   451 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v) \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge 
       
   452 (\forall \tilde{u}\in \Gamma_{small}(u) \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a consistency function in the case of $IND$.
       
   453 \end{claim}
       
   454 
       
   455 \subsubsection{Graph isomorphism}
       
   456 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$ $\Leftrightarrow$  $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$.
       
   457 \begin{claim}
       
   458 $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and only if it is a consistency function by $IND$.
       
   459 \end{claim}
       
   460 \subsubsection{Subgraph isomorphism}
       
   461 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small} \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
       
   462 \newline
       
   463 The following formulation gives an efficient way of calculating $Cons_{SUB}$.
       
   464 \begin{claim}
       
   465 $Cons_{SUB}((u,v),M(s)):=
       
   466 (\forall \tilde{u}\in \Gamma_{small}(u) \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a consistency function by $SUB$.
       
   467 \end{claim}
       
   468 
       
   469 \subsection{Cutting rules}
       
   470 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can be true only if it is impossible to extended $M(s)\cup \{p\}$ to a whole mapping.
       
   471 \begin{notation}
       
   472 
       
   473 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash M_{small}(s))\backslash T_{small}(s)$, and \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash M_{large}(s))\backslash T_{large}(s)$.
       
   474 \end{notation}
       
   475 \subsubsection{Induced subgraph isomorphism}
       
   476 \begin{claim}
       
   477 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is a cutting function by $IND$.
       
   478 \end{claim}
       
   479 \subsubsection{Graph isomorphism}
       
   480 Note that the cutting function of induced subgraph isomorphism defined above is a cutting function by $ISO$, too, however it is less efficient than the following while their computational complexity is the same.
       
   481 \begin{claim}
       
   482 $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap  \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is a cutting function by $ISO$.
       
   483 \end{claim}
       
   484 
       
   485 \subsubsection{Subgraph isomorphism}
       
   486 \begin{claim}
       
   487 $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by $SUB$.
       
   488 \end{claim}
       
   489 Note that there is a significant difference between induced and non-induced subgraph isomorphism:
       
   490 
       
   491 \begin{claim}
       
   492 \label{claimSUB}
       
   493 $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is \textbf{not} a cutting function by $SUB$.
       
   494 \end{claim}
       
   495 \begin{proof}$ $\\
       
   496 \vspace*{-0.5cm}
       
   497 
       
   498 \begin{figure}
       
   499 \begin{center}
       
   500 \begin{tikzpicture}
       
   501   [scale=.8,auto=left,every node/.style={circle,fill=black!15}]
       
   502   \node[rectangle,fill=black!15] at (4,6) {$G_{small}$};
       
   503   \node (u4) at (2.5,10)  {$u_4$};
       
   504   \node (u3) at (5.5,10) {$u_3$};
       
   505   \node (u1) at (2.5,7)  {$u_1$};
       
   506   \node (u2) at (5.5,7) {$u_2$};
       
   507   
       
   508   \node[rectangle,fill=black!30] at (13.5,6) {$G_{large}$};
       
   509   \node[fill=black!30] (v4) at (12,10)  {$v_4$};
       
   510   \node[fill=black!30] (v3) at (15,10) {$v_3$};
       
   511   \node[fill=black!30] (v1) at (12,7)  {$v_1$};
       
   512   \node[fill=black!30] (v2) at (15,7) {$v_2$};
       
   513   
       
   514 
       
   515   \foreach \from/\to in {u1/u2,u2/u3,u3/u4,u4/u1}
       
   516     \draw (\from) -- (\to);
       
   517   \foreach \from/\to in {v1/v2,v2/v3,v3/v4,v4/v1,v1/v3}
       
   518     \draw (\from) -- (\to);
       
   519 %    \draw[dashed] (\from) -- (\to);
       
   520 \end{tikzpicture}
       
   521 \caption{Graphs for the proof of \textbf{Claim \ref{claimSUB}}} \label{fig:proofSUB}
       
   522 \end{center}
       
   523 \end{figure}
       
   524 Let the two graphs of \textbf{Figure \ref{fig:proofSUB})} be the input graphs.
       
   525 Suppose the total ordering relation is $u_1 \prec u_2 \prec u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add $(u_2,v_2)\in P(s)$.\newline
       
   526 $Cons_{SUB}((u_2,v_2),M(s))=true$, so $M\cup \{(u_2,v_2)\}$ is consistent by $SUB$. The cutting function $Cut_{SUB}((u_2,v_2),M(s))$ is false, so it does not let cut the tree.\newline
       
   527 On the other hand $Cut_{SUB}'((u_2,v_2),M(s))$ is true, since\\$0=|\Gamma_{large}(v_2)\cap \tilde{T}_{large}(s)|<|\Gamma_{small}(u_2)\cap \tilde{T}_{small}(s)|=1$ is true, but still the tree can not be pruned, because otherwise the $\{(u_1,v_1)(u_2,v_2)(u_3,v_3)(u_4,v_4)\}$ mapping can not be found.
       
   528 \end{proof}
       
   529 
       
   530 \newpage
       
   531 \section{The VF2++ Algorithm}
       
   532 Although any total ordering relation makes the search space of VF2 a tree, its
       
   533 choice turns out to dramatically influence the number of visited states. The goal is to determine an efficient one as quickly as possible.
       
   534 
       
   535 The main reason for VF2++' superiority over VF2 is twofold. Firstly, taking into account the structure and the node labeling of the graph, VF2++ determines a state order in which most of the unfruitful branches of the search space can be pruned immediately. Secondly, introducing more efficient --- nevertheless still easier to compute --- cutting rules reduces the chance of going astray even further.
       
   536 
       
   537 In addition to the usual subgraph isomorphism, specialized versions for induced subgraph isomorphism and for graph isomorphism have been designed. VF2++ has gained a runtime improvement of one order of magnitude respecting induced subgraph isomorphism and a better asymptotical behaviour in the case of graph isomorphism problem.
       
   538 
       
   539 Note that a weaker version of the cutting rules and the more efficient candidate
       
   540 set calculating were described in \cite{VF2Plus}, too.
       
   541 
       
   542 It should be noted that all the methods described in this section are extendable to handle directed graphs and edge labels as well.
       
   543 
       
   544 The basic ideas and the detailed description of VF2++ are provided in the following.
       
   545 
       
   546 \subsection{Preparations}
       
   547 \begin{claim}
       
   548 \label{claim:claimCoverFromLeft}
       
   549 The total ordering relation uniquely determines a node order, in which the nodes of $V_{small}$ will be covered by VF2. From the point of view of the matching procedure, this means, that always the same node of $G_{small}$ will be covered on the d-th level.
       
   550 \end{claim}
       
   551 \begin{proof}
       
   552 In order to make the search space a tree, the pairs in $\{(u,v)\in P(s) : \exists \hat{u} : \hat{u}\prec u\}$ are excluded from $P(s)$.
       
   553 \newline
       
   554 Let $\tilde{P}(s):=P(s)\backslash \{(u,v)\in P(s) : \exists \hat{u} : \hat{u}\prec u\}$
       
   555 \newline
       
   556 The relation $\prec$ is a total ordering, so $\exists!\ \tilde{u} : \forall\ (u,v)\in \tilde{P}(s): u=\tilde u$. Since a pair form $\tilde{P}(s)$ is chosen for including into $M(s)$, it is obvious, that only $\tilde{u}$ can be covered in $V_{small}$. Actually, $\tilde{u}$ is the smallest element in $T_{small}(s)$ (or in $V_{small}\backslash M_{small}(s)$, if $T_{small}(s)$ were empty), and $T_{small}(s)$ depends only on the covered nodes of $G_{small}$. 
       
   557 \newline
       
   558 Simple induction on $d$ shows that the set of covered nodes of $G_{small}$ is unique, if $d$ is given, so $\tilde{u}$ is unique if $d$ is given.
       
   559 \end{proof}
       
   560 
       
   561 \begin{definition}
       
   562 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of $V_{small}$ is \textbf{matching order}, if exists $\prec$ total ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$.
       
   563 \end{definition}
       
   564 
       
   565 \begin{claim}\label{claim:MOclaim}
       
   566 A total ordering is matching order, iff the nodes of every component form an interval in the node sequence, and every node connects to a previous node in its component except the first node of the component. The order of the components is arbitrary.
       
   567 \\Formally spoken, an order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of $V_{small}$ is matching order $\Leftrightarrow$ $\forall G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}) \wedge (u_{\sigma{(k)}},u_{\sigma{(i)}})\in E'_{small}$, where $i,j,k,l\in \{1,..,|V_{small}|\}$\newline
       
   568 \end{claim}
       
   569 \begin{proof}
       
   570 Suppose a matching order is given. It has to be shown that the node sequence has a structure described above.\\
       
   571 Let $G'_{small}=(V'_{small},E'_{small})$ be an arbitrary component and $i$ an arbitrary index. 
       
   572 \newline
       
   573 $(\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in V'_{small})\Rightarrow u_{\sigma(i)}$ is not the first covered node in $G_{small}\ \Rightarrow u_{\sigma(i)}$ is connected to a covered node $u_{\sigma(k)}$ where $k<i$, since $u_{\sigma(i)}\in T_{small}(s)$ for some $s$ and $T_{small}(s)\subseteq{V'_{small}}$ contains only nodes connected to at least one covered node. It's easy to see, that $\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}$, since $T_{small}(s)$ contains only nodes connected to some covered ones, while it is not empty, but if it were empty, then $u_{\sigma(k)}$ and $u_{\sigma(i)}$ were not in the same component.
       
   574 
       
   575 Now, let us show that if a node sequence has the special structure described above, it has to be matching order.\\ $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ is a total ordering, which determines the matching order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$.
       
   576 \end{proof}
       
   577 
       
   578 To summing up, a total ordering always uniquely determines a matching order, and every matching order can be determined by a total ordering, however, more than one different total orderings may determine the same matching order.
       
   579 \subsection{Idea behind the algorithm}
       
   580 The goal is to find a matching order in which the algorithm is able to recognize inconsistency or prune the infeasible branches on the highest levels and goes deep only if it is needed.
       
   581 
       
   582 \begin{notation}
       
   583 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the number of neighbours of u which are in H, where $u\in V_{small} $ and $H\subseteq V_{small}$.
       
   584 \end{notation}
       
   585 
       
   586 The principal question is the following. Suppose a state $s$ is given. For which node of $T_{small}(s)$ is the hardest to find a consistent pair in $G_{large}$? The more covered neighbours a node in $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has ---, the more rarely satisfiable consistency constraints for its pair are given.
       
   587 
       
   588 In biology, most of the graphs are sparse, thus several nodes in $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes reasonable to define a secondary and a tertiary order between them.
       
   589 The observation above proves itself to be as determining, that the secondary ordering prefers nodes with the most uncovered neighbours among which have the same $Conn_{M_{small}(s)}$ to increase $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible.
       
   590 The tertiary ordering prefers nodes having the rarest uncovered labels.
       
   591 
       
   592 Note that the secondary ordering is the same as the ordering by $deg$, which is a static data in front of the above used.
       
   593 
       
   594 These rules can easily result in a matching order which contains the nodes of a long path successively, whose nodes may have low $Conn$ and is easily matchable into $G_{large}$. To avoid that, a BFS order is used, which provides the shortest possible paths.
       
   595 \newline
       
   596 
       
   597 In the following, some examples on which the VF2 may be slow are described, although they are easily solvable by using a proper matching order.
       
   598 
       
   599 \begin{example}
       
   600 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways without node labels. Let $u\in V_{small}$ and $v\in V_{large}$.
       
   601 \newline
       
   602 $lab(u):=black$
       
   603 \newline
       
   604 $lab(v):=black$
       
   605 \newline
       
   606 $lab(\tilde{u}):=red \  \forall \tilde{u}\in (V_{small}\backslash \{u\})$
       
   607 \newline
       
   608 $lab(\tilde{v}):=red \  \forall \tilde{v}\in (V_{large}\backslash \{v\})$
       
   609 \newline
       
   610 
       
   611 Now, any mapping by the node label $lab$ must contain $(u,v)$, since $u$ is black and no node in $V_{large}$ has a black label except $v$. If unfortunately $u$ were the last node which will get covered, VF2 would check only in the last steps, whether $u$ can be matched to $v$.
       
   612 \newline
       
   613 However, had $u$ been the first matched node, u would have been matched immediately to v, so all the mappings would have been precluded in which node labels can not correspond.
       
   614 \end{example}
       
   615 
       
   616 \begin{example}
       
   617 Suppose there is no node label given, $G_{small}$ is a small graph and can not be mapped into $G_{large}$ and $u\in V_{small}$.
       
   618 \newline
       
   619 Let $G'_{small}:=(V_{small}\cup \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is, $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint from $G_{small}$ and one of its starting points is connected to $u\in V_{small}\}$.
       
   620 \newline
       
   621 Is there a subgraph of $G_{large}$, which is isomorph with $G'_{small}$?
       
   622 \newline
       
   623 If unfortunately the nodes of the path were the first $k$ nodes in the matching order, the algorithm would iterate through all the possible k long paths in $G_{large}$, and it would recognize that no path can be extended to $G'_{small}$.
       
   624 \newline
       
   625 However, had it started by the matching of $G_{small}$, it would not have matched any nodes of the path.
       
   626 \end{example}
       
   627 
       
   628 These examples may look artificial, but the same problems also appear in real-world examples, even though in a less obvious way.
       
   629 
       
   630 \subsection{Total ordering}
       
   631 Instead of the total ordering relation, the matching order will be searched directly.
       
   632 \begin{notation}
       
   633 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} : l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ , where $l$ is a label and $\mathcal{M}\subseteq V_{small}$.
       
   634 \end{notation}
       
   635 
       
   636 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
       
   637 \end{definition}
       
   638 
       
   639 \begin{algorithm}
       
   640 \algtext*{EndIf}%ne nyomtasson end if-et
       
   641 \algtext*{EndFor}%ne nyomtasson ..
       
   642 \algtext*{EndProcedure}%ne nyomtasson ..
       
   643 \algtext*{EndWhile}
       
   644 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
       
   645 \begin{algorithmic}[1]
       
   646 \Procedure{VF2++order}{}
       
   647   \State $\mathcal{M}$ := $\emptyset$ \Comment{matching order}
       
   648   \While{$V_{small}\backslash \mathcal{M} \neq\emptyset$}
       
   649   \State $r\in$ arg max$_{deg}$ (arg min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash \mathcal{M})$)\label{alg:findMin}
       
   650   \State Compute $T$, a BFS tree with root node $r$.
       
   651   \For{$d=0,1,...,depth(T)$}
       
   652   \State $V_d$:=nodes of the $d$-th level
       
   653   \State Process $V_d$ \Comment{See Algorithm \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2})}
       
   654   \EndFor
       
   655   \EndWhile
       
   656 \EndProcedure
       
   657 \end{algorithmic}
       
   658 \end{algorithm}
       
   659 
       
   660 \begin{algorithm}
       
   661 \algtext*{EndIf}%ne nyomtasson end if-et
       
   662 \algtext*{EndFor}%ne nyomtasson ..
       
   663 \algtext*{EndProcedure}%ne nyomtasson ..
       
   664 \algtext*{EndWhile}
       
   665 \caption{\hspace{.5cm}$A\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
       
   666 \begin{algorithmic}[1]
       
   667 \Procedure{VF2++ProcessLevel1}{$V_{d}$}
       
   668   \While{$V_d\neq\emptyset$}
       
   669   \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg max$_{Conn_{\mathcal{M}}}(V_{d})))$
       
   670   \State $V_d:=V_d\backslash m$
       
   671   \State Append node $m$ to the end of $\mathcal{M}$
       
   672   \State Refresh $F_\mathcal{M}$
       
   673   \EndWhile
       
   674 \EndProcedure
       
   675 \end{algorithmic}
       
   676 \end{algorithm}
       
   677 
       
   678 \begin{algorithm}
       
   679 \algtext*{EndIf}%ne nyomtasson end if-et
       
   680 \algtext*{EndFor}%ne nyomtasson ..
       
   681 \algtext*{EndProcedure}%ne nyomtasson ..
       
   682 \algtext*{EndWhile}
       
   683 \caption{\hspace{0.5cm}$Another\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess2}
       
   684 \begin{algorithmic}[1]
       
   685 \Procedure{VF2++ProcessLevel2}{$V_{d}$}
       
   686   \State Sort $V_d$ in descending lex. order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$
       
   687   \State Append the sorted $V_d$ to the end of $\mathcal{M}$
       
   688   \State Refresh $F_\mathcal{M}$
       
   689 \EndProcedure
       
   690 \end{algorithmic}
       
   691 \end{algorithm}
       
   692 \textbf{Algorithm~\ref{alg:VF2PPPseu})} is a high level description of the matching order procedure of VF2++. It computes a BFS tree for each component in ascending order of their rarest $lab$ and largest $deg$, whose root vertex is the component's minimal node. \textbf{Algorithm \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} are two different methods to process a level of the BFS tree.
       
   693 
       
   694 After sorting the nodes of the current level in descending lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$, \textbf{Algorithm \ref{alg:VF2PPProcess2})} appends them simultaneously to the matching order $\mathcal{M}$ and refreshes $F_\mathcal{M}$ just once, whereas \textbf{Algorithm \ref{alg:VF2PPProcess1})} appends the nodes separately to $\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it provides up-to-date label information and may result in a more efficient matching order.
       
   695 
       
   696 \textbf{Claim~\ref{claim:MOclaim})} shows that \textbf{Algorithm \ref{alg:VF2PPPseu})} provides a matching order.
       
   697 
       
   698 
       
   699 \subsection{Cutting rules}
       
   700 \label{VF2PPCuttingRules}
       
   701 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels.
       
   702 \begin{notation}
       
   703 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in V_{large}$ and $l$ is a label.
       
   704 \end{notation}
       
   705 
       
   706 \subsubsection{Induced subgraph isomorphism}
       
   707 \begin{claim}
       
   708 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND.
       
   709 \end{claim}
       
   710 \begin{proof}
       
   711 It has to be shown, that $LabCut_{IND}((u,v),M(s))=true\Rightarrow$ the mapping can not be extended to a whole mapping.\\
       
   712 $LabCut_{IND}((u,v),M(s))=true,$ iff the following holds. $\\\exists l: |\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$.
       
   713 
       
   714 Suppose that $|\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)|$. Each node of $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$ has to be matched to a node in $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$, so $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$ can not be smaller than $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$. That is why $M(s)$ can not be extended to a whole mapping.
       
   715 
       
   716 Otherwise $|\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$ has to be true. Similarly, each node of $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ has to be matched to a node in $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, i.e. $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ can not be smaller than $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, if $M(s)$ is extendible.
       
   717 \end{proof}
       
   718 The following claims can be proven similarly.
       
   719 \subsubsection{Graph isomorphism}
       
   720 \begin{claim}
       
   721 \[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by ISO.
       
   722 \end{claim}
       
   723 
       
   724 \subsubsection{Subgraph isomorphism}
       
   725 \begin{claim}
       
   726 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\] is a cutting function by SUB.
       
   727 \end{claim}
       
   728 
       
   729 
       
   730 
       
   731 \subsection{Implementation details}
       
   732 This section provides a detailed summary of an efficient implementation of VF2++.
       
   733 \subsubsection{Storing a mapping}
       
   734 After fixing an arbitrary node order ($u_0, u_1, .., u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store the current mapping in the following way.
       
   735 \[
       
   736  M[i] = 
       
   737   \begin{cases} 
       
   738    v & if\ (u_i,v)\ is\ in\ the\ mapping\\
       
   739    INVALID & if\ no\ node\ has\ been\ mapped\ to\ u_i.
       
   740   \end{cases}
       
   741 \]
       
   742 Where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$ means "no node".
       
   743 \subsubsection{Avoiding the recurrence}
       
   744 Exploring the state space was described in a recursive fashion using sets (see \textbf{Algorithm~\ref{alg:VF2Pseu})}), which makes the algorithm easy to understand but does not show directly an efficient way of implementation. The following approach avoiding recursion and using lookup tables instead of sets is not only fast but has linear space complexity as well.
       
   745 
       
   746 The recursion of \textbf{Algorithm~\ref{alg:VF2Pseu})} can be realized as a while loop, which has a loop counter $depth$ denoting the all-time depth of the recursion.
       
   747 Fixing a matching order, let $M$ denote the array storing the all-time mapping.
       
   748 The initial state is associated with the empty mapping, which means that $\forall i: M[i]=INVALID$ and $depth=0$.
       
   749 In case of a recursive call, $depth$ has to be incremented, while in case of a return, it has to be decremented.
       
   750 Based on \textbf{Claim~\ref{claim:claimCoverFromLeft})}, $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes while the state is being processed, but the property is held before both stepping back to a predecessor state and exploring a successor state.
       
   751 
       
   752 The necessary part of the candidate set is easily maintainable or computable by following \textbf{Section~\ref{candidateComputingVF2})}. A much faster method has been designed for biological- and sparse graphs, see the next section for details.
       
   753 
       
   754 \subsubsection{Calculating the candidates for a node}
       
   755 Being aware of \textbf{Claim~\ref{claim:claimCoverFromLeft})}, the task is not to maintain the candidate set, but to generate the candidate nodes in $G_{large}$ for a given node $u\in V_{small}$.
       
   756 In case of an expanding problem type and $M$ mapping, if a node $v\in V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall u'\in V_{small} : (u,u')\in E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in E_{large}$. That is, each covered neighbour of $u$ has to be mapped to a covered neighbour of $v$.
       
   757 
       
   758 Having said that, an algorithm running in $\Theta(deg)$ time is describable if there exists a covered node in the component containing $u$. In this case choose a covered neighbour $u'$ of $u$ arbitrarily --- such a node exists based on \textbf{Claim~\ref{claim:MOclaim})}. With all the candidates of $u$ being among the uncovered neighbours of $Pair(M,u')$, there are solely $deg(Pair(M,u'))$ nodes to check.
       
   759 
       
   760 An easy trick is to choose an $u'$, for which $|\{uncovered\ neighbours\ $ $of\ Pair(M,u')\}|$ is the smallest possible.
       
   761 
       
   762 Note that if $u$ is the first node of its component, then all the uncovered nodes of $G_{large}$ are candidates, so giving a sublinear method is impossible.
       
   763 
       
   764 
       
   765 \subsubsection{Determining the node order}
       
   766 This section describes how the node order preprocessing method of VF2++ can efficiently be implemented.
       
   767 
       
   768 For using lookup tables, the node labels are associated with the numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It enables $F_\mathcal{M}$ to be stored in an array, for which $F_\mathcal{M}[i]=F_\mathcal{M}(i)$ where $i=0,1,..,|K|-1$. At first, $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes in $V_{small}$ having label i, which is easy to compute in $\Theta(|V_{small}|)$ steps.
       
   769 
       
   770 $\mathcal{M}\subseteq V_{small}$ can be represented as an array of size $|V_{small}|$.
       
   771 
       
   772 The BFS tree is computed by using a FIFO data structure which is usually implemented as a linked list, but one can avoid it by using the array $\mathcal{M}$ itself. $\mathcal{M}$ contains all the nodes seen before, a pointer shows where the first node of the FIFO is, and another one shows where the next explored node has to be inserted. So the nodes of each level of the BFS tree can be processed by \textbf{Algorithm \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} in place by swapping nodes.
       
   773 
       
   774 After a node $u$ gets to the next place of the node order, $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is one less covered node in $V_{large}$ with label $lab(u)$, that is why min selection sort is preferred which gives the elements from left to right in descending order, see \textbf{Algorithm \ref{alg:VF2PPProcess1})}.
       
   775 
       
   776 Note that using a $\Theta(n^2)$ sort absolutely does not slow down the procedure on biological (and on sparse) graphs, since they have few nodes on a level. If a level had a large number of nodes, \textbf{Algorithm \ref{alg:VF2PPProcess2})} would seem to be a better choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not be immediately refreshed, so it is unable to provide up-to-date label information. 
       
   777 
       
   778 Note that the \textit{while loop} of \textbf{Algorithm \ref{alg:VF2PPPseu})} takes one iteration per graph component and the graphs in biology are mostly connected.
       
   779 \subsubsection{Cutting rules}
       
   780 In \textbf{Section \ref{VF2PPCuttingRules})}, the cutting rules were described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$ and $\tilde T_{large}$, which are dependent on the all-time mapping (i.e. on the all-time state). The aim is to check the labeled cutting rules of VF2++ in $\Theta(deg)$ time.
       
   781 
       
   782 Firstly, suppose that these four sets are given in such a way, that checking whether a node is in a certain set takes constant time, e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an initially zero integer lookup table of size $|K|$. After incrementing $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap T_{large}(s)$, the first part of the cutting rules is checkable in $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$ to zero takes $\Theta(deg)$ time again, which makes it possible to use the same table through the whole algorithm.
       
   783 The second part of the cutting rules can be verified using the same method with $\tilde T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and $T_{large}$. Thus, the overall complexity is $\Theta(deg)$.
       
   784 
       
   785 An other integer lookup table storing the number of covered neighbours of each node in $G_{large}$ gives all the information about the sets $T_{large}$ and $\tilde T_{large}$, which is maintainable in $\Theta(deg)$ time when a pair is added or substracted by incrementing or decrementing the proper indices. A further improvement is that the values of $L[lab(u')]$ in case of checking $u$ is dependent only on $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an array of pairs (label, number of such labels) can be stored to skip the maintaining operations. Note that these arrays are at most of size $deg$. Skipping this trick, the number of covered neighbours has to be stored for each node of $G_{small}$ as well to get the sets $T_{small}$ and $\tilde T_{small}$.
       
   786 
       
   787 Using similar tricks, the consistency function can be evaluated in $\Theta(deg)$ steps, as well.
       
   788 
       
   789 \section{The VF2 Plus Algorithm}
       
   790 The VF2 Plus algorithm is a recently improved version of VF2. It was compared with the state of the art algorithms in \cite{VF2Plus} and has proven itself to be competitive with RI, the best algorithm on biological graphs.
       
   791 \\
       
   792 A short summary of VF2 Plus follows, which uses the notation and the conventions of the original paper.
       
   793 
       
   794 \subsection{Ordering procedure}
       
   795 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$ with the lowest probability to find a pair in $V_{small}$ and the highest number of connections with the nodes already sorted by the algorithm.
       
   796 
       
   797 \begin{definition}
       
   798 $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$. 
       
   799 \end{definition}
       
   800 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in $V_{large}$
       
   801 \newline
       
   802 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in $V_{large}$
       
   803 \newline
       
   804 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\
       
   805 $M$ is the set of already sorted nodes, $T$ is the set of nodes candidate to be selected, and $degreeM$ of a node is the number of its neighbours in $M$.
       
   806 \begin{algorithm}
       
   807 \algtext*{EndIf}%ne nyomtasson end if-et
       
   808 \algtext*{EndFor}%ne nyomtasson ..
       
   809 \algtext*{EndProcedure}%ne nyomtasson ..
       
   810 \algtext*{EndWhile}
       
   811 \caption{}\label{alg:VF2PlusPseu}
       
   812 \begin{algorithmic}[1]
       
   813 \Procedure{VF2 Plus order}{}
       
   814   \State Select the node with the lowest $P$.
       
   815     \If {more nodes share the same $P$}
       
   816     \State select the one with maximum degree
       
   817     \EndIf
       
   818     \If {more nodes share the same $P$ and have the max degree}
       
   819     \State select the first
       
   820     \EndIf
       
   821   \State Put the selected node in the set $M$. \label{alg:putIn}
       
   822   \State Put all its unsorted neighbours in the set $T$.
       
   823   \If {$M\neq V_{small}$}
       
   824   \State From set $T$ select the node with maximum $degreeM$.
       
   825   \If {more nodes have maximum $degreeM$}
       
   826     \State Select the one with the lowest $P$
       
   827   \EndIf
       
   828   \If {more nodes have maximum $degreeM$ and $P$}
       
   829     \State Select the first.
       
   830   \EndIf
       
   831   \State \textbf{goto \ref{alg:putIn}.}
       
   832   \EndIf
       
   833 \EndProcedure
       
   834 \end{algorithmic}
       
   835 \end{algorithm}
       
   836 
       
   837 Using these notations, \textbf{Algorithm~\ref{alg:VF2PlusPseu})} provides the description of the sorting procedure.
       
   838 
       
   839 Note that $P(u)$ is not the exact probability of finding a consistent pair for $u$ by choosing a node of $V_{large}$ randomly, since $P_{lab}$ and $P_{deg}$ are not independent, though calculating the real probability would take quadratic time, which may be reduced by using fittingly lookup tables.
       
   840 
       
   841 \newpage
       
   842 \section{Experimental results}
       
   843 This section compares the performance of VF2++ and VF2 Plus. Both algorithms have run faster with orders of magnitude than VF2, thus its inclusion was not reasonable.
       
   844 \subsection{Biological graphs}
       
   845 The tests have been executed on a recent biological dataset created for the International Contest on Pattern Search in Biological Databases\cite{Content}, which has been constructed of Molecule, Protein and Contact Map graphs extracted from the Protein Data Bank\cite{ProteinDataBank}.
       
   846 
       
   847 The molecule dataset contains small graphs with less than 100 nodes and an average degree of less than 3. The protein dataset contains graphs having 500-10 000 nodes and an average degree of 4, while the contact map dataset contains graphs with 150-800 nodes and an average degree of 20.
       
   848 \\
       
   849 
       
   850 In the following, the induced subgraph isomorphism and the graph isomorphism will be examined.
       
   851 
       
   852 \subsubsection{Induced subgraph isomorphism}
       
   853 This dataset contains a set of graph pairs, and \textbf{all} the induced subgraph ismorphisms have to be found between them. \textbf{Figure \ref{fig:INDProt}), \ref{fig:INDContact}),} and \textbf{ \ref{fig:INDMolecule})} show the solution time of the problems in the problem set.
       
   854 
       
   855 \begin{figure}[H]
       
   856 \begin{center}
       
   857 \begin{tikzpicture}
       
   858   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
   859   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label        style={/pgf/number format/1000 sep = \thinspace}]
       
   860   %\addplot+[only marks] table {proteinsOrig.txt};
       
   861   \addplot[mark=*,mark size=1.2pt,color=blue] table {Orig/Proteins.256.txt};
       
   862   \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
       
   863   \end{axis}
       
   864   \end{tikzpicture}
       
   865 \end{center}
       
   866 \vspace*{-0.8cm}
       
   867 \caption{Both the algorithms have linear behaviour on protein graphs. VF2++ is more than 10 times faster than VF2 Plus.} \label{fig:INDProt}
       
   868 \end{figure}
       
   869 
       
   870 \begin{figure}[H]
       
   871 \begin{center}
       
   872 \begin{tikzpicture}
       
   873 \begin{axis}[title=Contact Maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
   874 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
   875 %\addplot+[only marks] table {proteinsOrig.txt};
       
   876 \addplot table {Orig/ContactMaps.128.txt};
       
   877 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/ContactMaps.128.txt};
       
   878 \end{axis}
       
   879 \end{tikzpicture}
       
   880 \end{center}
       
   881 \vspace*{-0.8cm}
       
   882 \caption{On Contact Maps, VF2++ runs in near constant time, while VF2 Plus has a near linear behaviour.} \label{fig:INDContact}
       
   883 \end{figure}
       
   884 
       
   885 \begin{figure}[H]
       
   886 \begin{center}
       
   887 \begin{tikzpicture}
       
   888 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
   889 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
   890 %\addplot+[only marks] table {proteinsOrig.txt};
       
   891 \addplot table {Orig/Molecules.32.txt};
       
   892 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
       
   893 \end{axis}
       
   894 \end{tikzpicture}
       
   895 \end{center}
       
   896 \vspace*{-0.8cm}
       
   897 \caption{In the case of Molecules, the algorithms seem to have a similar behaviour, but VF2++ is almost two times faster even on such small graphs.} \label{fig:INDMolecule}
       
   898 \end{figure}
       
   899 
       
   900 
       
   901 \subsubsection{Graph ismorphism}
       
   902 In this experiment, the nodes of each graph in the database have been shuffled and an isomorphism between the shuffled and the original graph has been searched. For runtime results, see \textbf{Figure \ref{fig:ISOProt}), \ref{fig:ISOContact}),} and \textbf{\ref{fig:ISOMolecule})}.
       
   903 \begin{figure}[H]
       
   904 \begin{center}
       
   905 \begin{tikzpicture}
       
   906 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
   907 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
   908 %\addplot+[only marks] table {proteinsOrig.txt};
       
   909 \addplot table {Orig/proteinsIso.txt};
       
   910 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
       
   911 \end{axis}
       
   912 \end{tikzpicture}
       
   913 \end{center}
       
   914 \vspace*{-0.8cm}
       
   915 \caption{On protein graphs, VF2 Plus has a super linear time complexity, while VF2++ runs in near constant time. The difference is about two order of magnitude on large graphs.}\label{fig:ISOProt}
       
   916 \end{figure}
       
   917 
       
   918 \begin{figure}[H]
       
   919 \begin{center}
       
   920 \begin{tikzpicture}
       
   921 \begin{axis}[title=Contact Maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
   922 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
   923 %\addplot+[only marks] table {proteinsOrig.txt};
       
   924 \addplot table {Orig/contactMapsIso.txt};
       
   925 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
       
   926 \end{axis}
       
   927 \end{tikzpicture}
       
   928 \end{center}
       
   929 \vspace*{-0.8cm}
       
   930 \caption{The results are closer to each other on Contact Maps, but VF2++ still performs consistently better.}\label{fig:ISOContact}
       
   931 \end{figure}
       
   932 
       
   933 \begin{figure}[H]
       
   934 \begin{center}
       
   935 \begin{tikzpicture}
       
   936 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
   937 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
   938 %\addplot+[only marks] table {proteinsOrig.txt};
       
   939 \addplot table {Orig/moleculesIso.txt};
       
   940 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
       
   941 \end{axis}
       
   942 \end{tikzpicture}
       
   943 \end{center}
       
   944 \vspace*{-0.8cm}
       
   945 \caption{In the case of Molecules, there is not such a significant difference, but VF2++ seems to be faster as the number of nodes increases.}\label{fig:ISOMolecule}
       
   946 \end{figure}
       
   947 
       
   948 
       
   949 \subsection{Random graphs}
       
   950 This section compares VF2++ with VF2 Plus on random graphs of a large size. The node labels are uniformly distributed.
       
   951 Let $\delta$ denote the average degree.
       
   952 For the parameters of problems solved in the experiments, please see the top of each chart.
       
   953 \subsubsection{Graph isomorphism}
       
   954 To evaluate the efficiency of the algorithms in the case of graph isomorphism, connected graphs of less than 20 000 nodes have been considered. Generating a random graph and shuffling its nodes, an isomorphism had to be found. \textbf{Figure \ref{fig:randISO5}), \ref{fig:randISO10}), \ref{fig:randISO15}), \ref{fig:randISO35}), \ref{fig:randISO45}),} and \textbf{\ref{fig:randISO100}) } show the runtime results on graph sets of various density.
       
   955 
       
   956 \begin{figure}[H]
       
   957 \begin{center}
       
   958 \begin{tikzpicture}
       
   959 \begin{axis}[title={Random ISO, $\delta = 5$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
   960 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
   961 %\addplot+[only marks] table {proteinsOrig.txt};
       
   962 \addplot table {randGraph/iso/vf2pIso5_1.txt};
       
   963 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso5_1.txt};
       
   964 \end{axis}
       
   965 \end{tikzpicture}
       
   966 \end{center}
       
   967 \vspace*{-0.8cm}
       
   968 \caption{}\label{fig:randISO5}
       
   969 \end{figure}
       
   970 
       
   971 \begin{figure}[H]
       
   972 \begin{center}
       
   973 \begin{tikzpicture}
       
   974 \begin{axis}[title={Random ISO, $\delta = 10$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
   975 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
   976 %\addplot+[only marks] table {proteinsOrig.txt};
       
   977 \addplot table {randGraph/iso/vf2pIso10_1.txt};
       
   978 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso10_1.txt};
       
   979 \end{axis}
       
   980 \end{tikzpicture}
       
   981 \end{center}
       
   982 \vspace*{-0.8cm}
       
   983 \caption{}\label{fig:randISO10}
       
   984 \end{figure}
       
   985 
       
   986 \begin{figure}[H]
       
   987 \begin{center}
       
   988 \begin{tikzpicture}
       
   989 \begin{axis}[title={Random ISO, $\delta = 15$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
   990 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
   991 %\addplot+[only marks] table {proteinsOrig.txt};
       
   992 \addplot table {randGraph/iso/vf2pIso15_1.txt};
       
   993 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso15_1.txt};
       
   994 \end{axis}
       
   995 \end{tikzpicture}
       
   996 \end{center}
       
   997 \vspace*{-0.8cm}
       
   998 \caption{}\label{fig:randISO15}
       
   999 \end{figure}
       
  1000 
       
  1001 \begin{figure}[H]
       
  1002 \begin{center}
       
  1003 \begin{tikzpicture}
       
  1004 \begin{axis}[title={Random ISO, $\delta = 35$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
  1005 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
  1006 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1007 \addplot table {randGraph/iso/vf2pIso35_1.txt};
       
  1008 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso35_1.txt};
       
  1009 \end{axis}
       
  1010 \end{tikzpicture}
       
  1011 \end{center}
       
  1012 \vspace*{-0.8cm}
       
  1013 \caption{}\label{fig:randISO35}
       
  1014 \end{figure}
       
  1015 
       
  1016 \begin{figure}[H]
       
  1017 \begin{center}
       
  1018 \begin{tikzpicture}
       
  1019 \begin{axis}[title={Random ISO, $\delta = 45$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
  1020 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
  1021 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1022 \addplot table {randGraph/iso/vf2pIso45_1.txt};
       
  1023 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso45_1.txt};
       
  1024 \end{axis}
       
  1025 \end{tikzpicture}
       
  1026 \end{center}
       
  1027 \vspace*{-0.8cm}
       
  1028 \caption{}\label{fig:randISO45}
       
  1029 \end{figure}
       
  1030 
       
  1031 \begin{figure}[H]
       
  1032 \begin{center}
       
  1033 \begin{tikzpicture}
       
  1034 \begin{axis}[title={Random ISO, $\delta = 100$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
       
  1035 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
  1036 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1037 \addplot table {randGraph/iso/vf2pIso100_1.txt};
       
  1038 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso100_1.txt};
       
  1039 \end{axis}
       
  1040 \end{tikzpicture}
       
  1041 \end{center}
       
  1042 \vspace*{-0.8cm}
       
  1043 \caption{}\label{fig:randISO100}
       
  1044 \end{figure}
       
  1045 
       
  1046 
       
  1047 Considering the graph isomorphism problem, VF2++ consistently outperforms its rival especially on sparse graphs. The reason for the slightly super linear behaviour of VF2++ on denser graphs is the larger number of nodes in the BFS tree constructed in \textbf{Algorithm \ref{alg:VF2PPPseu})}.
       
  1048 
       
  1049 \subsubsection{Induced subgraph isomorphism}
       
  1050 This section provides a comparison of VF2++ and VF2 Plus in the case of induced subgraph isomorphism. In addition to the size of the large graph, that of the small graph dramatically influences the hardness of a given problem too, so the overall picture is provided by examining small graphs of various size.
       
  1051 
       
  1052 For each chart, a number $0<\rho< 1$ has been fixed and the following has been executed 150 times. Generating a large graph $G_{large}$, choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes, and for all the 10 subgraphs find a mapping by using both the graph matching algorithms.
       
  1053 The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1, 0.3, 0.6, 0.8, 0.95$ cases have been examined (see \textbf{Figure \ref{fig:randIND5}), \ref{fig:randIND10})} and \textbf{\ref{fig:randIND35})}), and for each $\delta$, a cumulative chart is given as well, which excludes $\rho = 0.05$ and $0.1$ for the sake of perspicuity (see \textbf{Figure \ref{fig:randIND5Sum}), \ref{fig:randIND10Sum})} and \textbf{\ref{fig:randIND35Sum})}).
       
  1054 
       
  1055 
       
  1056 
       
  1057 
       
  1058 
       
  1059 \begin{figure}[H]
       
  1060 \vspace*{-0.8cm}
       
  1061 \begin{subfigure}[b]{0.55\textwidth}
       
  1062 \begin{center}
       
  1063 \begin{tikzpicture}
       
  1064 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1065 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1066 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1067 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
       
  1068 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.05.txt};
       
  1069 \end{axis}
       
  1070 \end{tikzpicture}
       
  1071 \end{center}
       
  1072      \end{subfigure}
       
  1073      \begin{subfigure}[b]{0.55\textwidth}
       
  1074 \begin{center}
       
  1075 \begin{tikzpicture}
       
  1076 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1077 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1078 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1079 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
       
  1080 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.1.txt};
       
  1081 \end{axis}
       
  1082 \end{tikzpicture}
       
  1083 \end{center}
       
  1084 \end{subfigure}
       
  1085 \hspace{1cm}
       
  1086 \begin{subfigure}[b]{0.55\textwidth}
       
  1087 \begin{center}
       
  1088 \begin{tikzpicture}
       
  1089 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1090 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1091 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1092 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
       
  1093 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.3.txt};
       
  1094 \end{axis}
       
  1095 \end{tikzpicture}
       
  1096 \end{center}
       
  1097      \end{subfigure}
       
  1098      \begin{subfigure}[b]{0.55\textwidth}
       
  1099 \begin{center}
       
  1100 \begin{tikzpicture}
       
  1101 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1102 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1103 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1104 \addplot table {randGraph/ind/vf2pInd5_0.6.txt};
       
  1105 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.6.txt};
       
  1106 \end{axis}
       
  1107 \end{tikzpicture}
       
  1108 \end{center}
       
  1109 \end{subfigure}
       
  1110 \begin{subfigure}[b]{0.55\textwidth}
       
  1111           
       
  1112 \begin{tikzpicture}
       
  1113 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1114 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1115 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1116 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
       
  1117 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.8.txt};
       
  1118 \end{axis}
       
  1119 \end{tikzpicture}
       
  1120      \end{subfigure}
       
  1121      \begin{subfigure}[b]{0.55\textwidth}
       
  1122 \begin{tikzpicture}
       
  1123 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1124 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
  1125 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1126 \addplot table {randGraph/ind/vf2pInd5_0.95.txt};
       
  1127 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.95.txt};
       
  1128 \end{axis}
       
  1129 \end{tikzpicture}
       
  1130 \end{subfigure}
       
  1131 \vspace*{-0.8cm}
       
  1132 \caption{IND on graphs having an average degree of 5.}\label{fig:randIND5}
       
  1133 \end{figure}
       
  1134 
       
  1135 \begin{figure}[H]
       
  1136 \begin{center}
       
  1137 \begin{tikzpicture}
       
  1138 \begin{axis}[title={Rand IND Summary, $\delta = 5$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
       
  1139 =major,mark size=1pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
  1140 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1141 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.3.txt};
       
  1142 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.3.txt};
       
  1143 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.6.txt};
       
  1144 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.6.txt};
       
  1145 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.8.txt};
       
  1146 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.8.txt};
       
  1147 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.95.txt};
       
  1148 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.95.txt};
       
  1149 \end{axis}
       
  1150 \end{tikzpicture}
       
  1151 \end{center}
       
  1152 \vspace*{-0.8cm}
       
  1153 \caption{Cummulative chart for $\delta=5$.}\label{fig:randIND5Sum}
       
  1154 \end{figure}
       
  1155 
       
  1156 
       
  1157 
       
  1158 \begin{figure}[H]
       
  1159 \vspace*{-0.8cm}
       
  1160 \begin{subfigure}[b]{0.55\textwidth}
       
  1161 \begin{center}
       
  1162 \begin{tikzpicture}
       
  1163 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1164 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1165 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1166 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
       
  1167 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.05.txt};
       
  1168 \end{axis}
       
  1169 \end{tikzpicture}
       
  1170 \end{center}
       
  1171      \end{subfigure}
       
  1172      \begin{subfigure}[b]{0.55\textwidth}
       
  1173 \begin{center}
       
  1174 \begin{tikzpicture}
       
  1175 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1176 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1177 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1178 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
       
  1179 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.1.txt};
       
  1180 \end{axis}
       
  1181 \end{tikzpicture}
       
  1182 \end{center}
       
  1183 \end{subfigure}
       
  1184 \hspace{1cm}
       
  1185 \begin{subfigure}[b]{0.55\textwidth}
       
  1186 \begin{center}
       
  1187 \begin{tikzpicture}
       
  1188 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1189 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1190 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1191 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
       
  1192 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.3.txt};
       
  1193 \end{axis}
       
  1194 \end{tikzpicture}
       
  1195 \end{center}
       
  1196      \end{subfigure}
       
  1197      \begin{subfigure}[b]{0.55\textwidth}
       
  1198 \begin{center}
       
  1199 \begin{tikzpicture}
       
  1200 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1201 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1202 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1203 \addplot table {randGraph/ind/vf2pInd10_0.6.txt};
       
  1204 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.6.txt};
       
  1205 \end{axis}
       
  1206 \end{tikzpicture}
       
  1207 \end{center}
       
  1208 \end{subfigure}
       
  1209 \begin{subfigure}[b]{0.55\textwidth}
       
  1210           
       
  1211 \begin{tikzpicture}
       
  1212 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1213 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1214 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1215 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
       
  1216 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.8.txt};
       
  1217 \end{axis}
       
  1218 \end{tikzpicture}
       
  1219      \end{subfigure}
       
  1220      \begin{subfigure}[b]{0.55\textwidth}
       
  1221 \begin{tikzpicture}
       
  1222 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1223 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
  1224 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1225 \addplot table {randGraph/ind/vf2pInd10_0.95.txt};
       
  1226 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.95.txt};
       
  1227 \end{axis}
       
  1228 \end{tikzpicture}
       
  1229 \end{subfigure}
       
  1230 \vspace*{-0.8cm}
       
  1231 \caption{IND on graphs having an average degree of 10.}\label{fig:randIND10}
       
  1232 \end{figure}
       
  1233 
       
  1234 \begin{figure}[H]
       
  1235 \begin{center}
       
  1236 \begin{tikzpicture}
       
  1237 \begin{axis}[title={Rand IND Summary, $\delta = 10$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
       
  1238 =major,mark size=1pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
  1239 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1240 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.3.txt};
       
  1241 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.3.txt};
       
  1242 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.6.txt};
       
  1243 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.6.txt};
       
  1244 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.8.txt};
       
  1245 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.8.txt};
       
  1246 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.95.txt};
       
  1247 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.95.txt};
       
  1248 \end{axis}
       
  1249 \end{tikzpicture}
       
  1250 \end{center}
       
  1251 \vspace*{-0.8cm}
       
  1252 \caption{Cummulative chart for $\delta=10$.}\label{fig:randIND10Sum}
       
  1253 \end{figure}
       
  1254 
       
  1255 
       
  1256 
       
  1257 \begin{figure}[H]
       
  1258 \vspace*{-0.8cm}
       
  1259 \begin{subfigure}[b]{0.55\textwidth}
       
  1260 \begin{center}
       
  1261 \begin{tikzpicture}
       
  1262 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1263 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1264 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1265 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
       
  1266 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.05.txt};
       
  1267 \end{axis}
       
  1268 \end{tikzpicture}
       
  1269 \end{center}
       
  1270      \end{subfigure}
       
  1271      \begin{subfigure}[b]{0.55\textwidth}
       
  1272 \begin{center}
       
  1273 \begin{tikzpicture}
       
  1274 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1275 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1276 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1277 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
       
  1278 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.1.txt};
       
  1279 \end{axis}
       
  1280 \end{tikzpicture}
       
  1281 \end{center}
       
  1282 \end{subfigure}
       
  1283 \hspace{1cm}
       
  1284 \begin{subfigure}[b]{0.55\textwidth}
       
  1285 \begin{center}
       
  1286 \begin{tikzpicture}
       
  1287 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1288 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1289 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1290 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
       
  1291 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.3.txt};
       
  1292 \end{axis}
       
  1293 \end{tikzpicture}
       
  1294 \end{center}
       
  1295      \end{subfigure}
       
  1296      \begin{subfigure}[b]{0.55\textwidth}
       
  1297 \begin{center}
       
  1298 \begin{tikzpicture}
       
  1299 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1300 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1301 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1302 \addplot table {randGraph/ind/vf2pInd35_0.6.txt};
       
  1303 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.6.txt};
       
  1304 \end{axis}
       
  1305 \end{tikzpicture}
       
  1306 \end{center}
       
  1307 \end{subfigure}
       
  1308 \begin{subfigure}[b]{0.55\textwidth}
       
  1309           
       
  1310 \begin{tikzpicture}
       
  1311 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1312 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}]
       
  1313 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1314 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
       
  1315 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.8.txt};
       
  1316 \end{axis}
       
  1317 \end{tikzpicture}
       
  1318      \end{subfigure}
       
  1319      \begin{subfigure}[b]{0.55\textwidth}
       
  1320 \begin{tikzpicture}
       
  1321 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
       
  1322 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
  1323 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1324 \addplot table {randGraph/ind/vf2pInd35_0.95.txt};
       
  1325 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.95.txt};
       
  1326 \end{axis}
       
  1327 \end{tikzpicture}
       
  1328 \end{subfigure}
       
  1329 \vspace*{-0.8cm}
       
  1330 \caption{IND on graphs having an average degree of 35.}\label{fig:randIND35}
       
  1331 \end{figure}
       
  1332 
       
  1333 \begin{figure}[H]
       
  1334 \begin{center}
       
  1335 \begin{tikzpicture}
       
  1336 \begin{axis}[title={Rand IND Summary, $\delta = 35$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
       
  1337 =major,mark size=1pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
       
  1338 %\addplot+[only marks] table {proteinsOrig.txt};
       
  1339 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.3.txt};
       
  1340 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.3.txt};
       
  1341 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.6.txt};
       
  1342 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.6.txt};
       
  1343 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.8.txt};
       
  1344 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.8.txt};
       
  1345 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.95.txt};
       
  1346 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.95.txt};
       
  1347 \end{axis}
       
  1348 \end{tikzpicture}
       
  1349 \end{center}
       
  1350 \vspace*{-0.8cm}
       
  1351 \caption{Cummulative chart for $\delta=35$.}\label{fig:randIND35Sum}
       
  1352 \end{figure}
       
  1353 
       
  1354 Based on these experiments, VF2++ is faster than VF2 Plus and able to handle really large graphs in milliseconds. Note that when $IND$ was considered and the small graphs had proportionally few nodes ($\rho = 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node orders(e.g. see the $\delta=10$ case on \textbf{Figure \ref{fig:randIND10})}). If these examples had been excluded, the charts would have seemed to be similar to the other ones.
       
  1355 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2 Plus slow slightly down, but remain practically usable even on graphs having 10 000 nodes.
       
  1356 
       
  1357 
       
  1358 
       
  1359 
       
  1360 \newpage
       
  1361 \section{Conclusion}
       
  1362 In this thesis, after providing a short summary of the recent algorithms, a new graph matching algorithm based on VF2, called VF2++, has been presented and analyzed from a practical viewpoint.
       
  1363 
       
  1364 Recognizing the importance of the node order and determining an efficient one, VF2++ is able to match graphs of thousands of nodes in near practically linear time including preprocessing. In addition to the proper order, VF2++ uses more efficient consistency and cutting rules which are easy to compute and make the algorithm able to prune most of the unfruitful branches without going astray.
       
  1365 
       
  1366 In order to show the efficiency of the new method, it has been compared to VF2 Plus, which is the best concurrent algorithm based on \cite{VF2Plus}.
       
  1367 
       
  1368 The experiments show that VF2++ consistently outperforms VF2 Plus on biological graphs. It seems to be asymptotically faster on protein and on contact map graphs in the case of induced subgraph isomorphism, while in the case of graph isomorphism, it has definitely better asymptotic behaviour on protein graphs.
       
  1369 
       
  1370 Regarding random sparse graphs, not only has VF2++ proved itself to be faster than VF2 Plus, but it has a practically linear behaviour both in the case of induced subgraph- and graph isomorphism, as well.
       
  1371 
       
  1372 
   152 
  1373 
   153 %% The Appendices part is started with the command \appendix;
  1374 %% The Appendices part is started with the command \appendix;
   154 %% appendix sections are then done as normal sections
  1375 %% appendix sections are then done as normal sections
   155 %% \appendix
  1376 %% \appendix
   156 
  1377 
   158 %% \label{}
  1379 %% \label{}
   159 
  1380 
   160 %% If you have bibdatabase file and want bibtex to generate the
  1381 %% If you have bibdatabase file and want bibtex to generate the
   161 %% bibitems, please use
  1382 %% bibitems, please use
   162 %%
  1383 %%
   163 %%  \bibliographystyle{elsarticle-num} 
  1384 \bibliographystyle{elsarticle-num} 
   164 %%  \bibliography{<your bibdatabase>}
  1385 \bibliography{bibliography}
   165 
  1386 
   166 %% else use the following coding to input the bibitems directly in the
  1387 %% else use the following coding to input the bibitems directly in the
   167 %% TeX file.
  1388 %% TeX file.
   168 
  1389 
   169 \begin{thebibliography}{00}
  1390 %% \begin{thebibliography}{00}
   170 
  1391 
   171 %% \bibitem{label}
  1392 %% %% \bibitem{label}
   172 %% Text of bibliographic item
  1393 %% %% Text of bibliographic item
   173 
  1394 
   174 \bibitem{}
  1395 %% \bibitem{}
   175 
  1396 
   176 \end{thebibliography}
  1397 %% \end{thebibliography}
       
  1398 
   177 \end{document}
  1399 \end{document}
   178 \endinput
  1400 \endinput
   179 %%
  1401 %%
   180 %% End of file `elsarticle-template-num.tex'.
  1402 %% End of file `elsarticle-template-num.tex'.