damecco.tex
author Madarasi Peter
Wed, 23 Nov 2016 20:45:31 +0100
changeset 9 2ad9aa6d7f63
parent 8 49574b75404b
child 10 1e4a79cd8332
permissions -rw-r--r--
The implementation details part has been shortened
     1 %% 
     2 %% Copyright 2007, 2008, 2009 Elsevier Ltd
     3 %% 
     4 %% This file is part of the 'Elsarticle Bundle'.
     5 %% ---------------------------------------------
     6 %% 
     7 %% It may be distributed under the conditions of the LaTeX Project Public
     8 %% License, either version 1.2 of this license or (at your option) any
     9 %% later version.  The latest version of this license is in
    10 %%    http://www.latex-project.org/lppl.txt
    11 %% and version 1.2 or later is part of all distributions of LaTeX
    12 %% version 1999/12/01 or later.
    13 %% 
    14 %% The list of all files belonging to the 'Elsarticle Bundle' is
    15 %% given in the file `manifest.txt'.
    16 %% 
    17 
    18 %% Template article for Elsevier's document class `elsarticle'
    19 %% with numbered style bibliographic references
    20 %% SP 2008/03/01
    21 
    22 \documentclass[preprint,12pt]{elsarticle}
    23 
    24 %% Use the option review to obtain double line spacing
    25 %% \documentclass[authoryear,preprint,review,12pt]{elsarticle}
    26 
    27 %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
    28 %% for a journal layout:
    29 %% \documentclass[final,1p,times]{elsarticle}
    30 %% \documentclass[final,1p,times,twocolumn]{elsarticle}
    31 %% \documentclass[final,3p,times]{elsarticle}
    32 %% \documentclass[final,3p,times,twocolumn]{elsarticle}
    33 %% \documentclass[final,5p,times]{elsarticle}
    34 %% \documentclass[final,5p,times,twocolumn]{elsarticle}
    35 
    36 %% For including figures, graphicx.sty has been loaded in
    37 %% elsarticle.cls. If you prefer to use the old commands
    38 %% please give \usepackage{epsfig}
    39 
    40 %% The amssymb package provides various useful mathematical symbols
    41 \usepackage{amssymb}
    42 %% The amsthm package provides extended theorem environments
    43 %% \usepackage{amsthm}
    44 
    45 %% The lineno packages adds line numbers. Start line numbering with
    46 %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
    47 %% for the whole article with \linenumbers.
    48 %% \usepackage{lineno}
    49 
    50 \usepackage{amsmath}
    51 %% \usepackage[pdftex]{graphicx}
    52 
    53 \usepackage{pgfplots}
    54 \pgfplotsset{width=9cm}
    55 \pgfplotsset{compat=1.8}
    56 
    57 \usepackage{caption}
    58 \usepackage{subcaption} 
    59 
    60 \usepackage{algorithm}
    61 \usepackage{algpseudocode}
    62 \usepackage{tikz}
    63 
    64 \usepackage{amsthm,amssymb}
    65 \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}
    66 
    67 \newtheorem{theorem}{Theorem}[subsection]
    68 \newtheorem{corollary}{Corollary}[theorem]
    69 \newtheorem{claim}[theorem]{Claim}
    70 
    71 \newtheorem{definition}{Definition}[subsection]
    72 \newtheorem{notation}{Notation}[subsection]
    73 \newtheorem{example}{Example}[subsection]
    74 \usetikzlibrary{decorations.markings}
    75 \let\oldproofname=\proofname
    76 %% \renewcommand{\proofname}{\rm\bf{Proof:}}
    77 
    78 \captionsetup{font=normalsize}
    79 
    80 \journal{Discrete Applied Mathematics}
    81 
    82 \begin{document}
    83 
    84 \begin{frontmatter}
    85 
    86 %% Title, authors and addresses
    87 
    88 %% use the tnoteref command within \title for footnotes;
    89 %% use the tnotetext command for theassociated footnote;
    90 %% use the fnref command within \author or \address for footnotes;
    91 %% use the fntext command for theassociated footnote;
    92 %% use the corref command within \author for corresponding author footnotes;
    93 %% use the cortext command for theassociated footnote;
    94 %% use the ead command for the email address,
    95 %% and the form \ead[url] for the home page:
    96 %% \title{Title\tnoteref{label1}}
    97 %% \tnotetext[label1]{}
    98 %% \author{Name\corref{cor1}\fnref{label2}}
    99 %% \ead{email address}
   100 %% \ead[url]{home page}
   101 %% \fntext[label2]{}
   102 %% \cortext[cor1]{}
   103 %% \address{Address\fnref{label3}}
   104 %% \fntext[label3]{}
   105 
   106 \title{Improved Algorithms for Matching Biological Graphs}
   107 
   108 %% use optional labels to link authors explicitly to addresses:
   109 %% \author[label1,label2]{}
   110 %% \address[label1]{}
   111 %% \address[label2]{}
   112 
   113 \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi}
   114 
   115 \address{Dept of Operations Research, ELTE}
   116 
   117 \begin{abstract}
   118 Subgraph isomorphism is a well-known NP-Complete problem, while its
   119 special case, the graph isomorphism problem is one of the few problems
   120 in NP neither known to be in P nor NP-Complete. Their appearance in
   121 many fields of application such as pattern analysis, computer vision
   122 questions and the analysis of chemical and biological systems has
   123 fostered the design of various algorithms for handling special graph
   124 structures.
   125 
   126 The idea of using state space representation and checking some
   127 conditions in each state to prune the search tree has made the VF2
   128 algorithm one of the state of the art graph matching algorithms for
   129 more than a decade. Recently, biological questions of ever increasing
   130 importance have required more efficient, specialized algorithms.
   131 
   132 This paper presents VF2++, a new algorithm based on the original VF2,
   133 which runs significantly faster on most test cases and performs
   134 especially well on special graph classes stemming from biological
   135 questions. VF2++ handles graphs of thousands of nodes in practically
   136 near linear time including preprocessing. Not only is it an improved
   137 version of VF2, but in fact, it is by far the fastest existing
   138 algorithm regarding biological graphs.
   139 
   140 The reason for VF2++' superiority over VF2 is twofold. Firstly, taking
   141 into account the structure and the node labeling of the graph, VF2++
   142 determines a state order in which most of the unfruitful branches of
   143 the search space can be pruned immediately. Secondly, introducing more
   144 efficient - nevertheless still easier to compute - cutting rules
   145 reduces the chance of going astray even further.
   146 
   147 In addition to the usual subgraph isomorphism, specialized versions
   148 for induced subgraph isomorphism and for graph isomorphism are
   149 presented. VF2++ has gained a runtime improvement of one order of
   150 magnitude respecting induced subgraph isomorphism and a better
   151 asymptotical behaviour in the case of graph isomorphism problem.
   152 
   153 After having provided the description of VF2++, in order to evaluate
   154 its effectiveness, an extensive comparison to the contemporary other
   155 algorithms is shown, using a wide range of inputs, including both real
   156 life biological and chemical datasets and standard randomly generated
   157 graph series.
   158 
   159 The work was motivated and sponsored by QuantumBio Inc., and all the
   160 developed algorithms are available as the part of the open source
   161 LEMON graph and network optimization library
   162 (http://lemon.cs.elte.hu).
   163 \end{abstract}
   164 
   165 \begin{keyword}
   166 %% keywords here, in the form: keyword \sep keyword
   167 
   168 %% PACS codes here, in the form: \PACS code \sep code
   169 
   170 %% MSC codes here, in the form: \MSC code \sep code
   171 %% or \MSC[2008] code \sep code (2000 is the default)
   172 
   173 \end{keyword}
   174 
   175 \end{frontmatter}
   176 
   177 %% \linenumbers
   178 
   179 %% main text
   180 \section{Introduction}
   181 \label{sec:intro}
   182 
   183 In the last decades, combinatorial structures, and especially graphs
   184 have been considered with ever increasing interest, and applied to the
   185 solution of several new and revised questions.  The expressiveness,
   186 the simplicity and the studiedness of graphs make them practical for
   187 modelling and appear constantly in several seemingly independent
   188 fields.  Bioinformatics and chemistry are amongst the most relevant
   189 and most important fields.
   190 
   191 Complex biological systems arise from the interaction and cooperation
   192 of plenty of molecular components. Getting acquainted with such
   193 systems at the molecular level has primary importance, since
   194 protein-protein interaction, DNA-protein interaction, metabolic
   195 interaction, transcription factor binding, neuronal networks, and
   196 hormone signaling networks can be understood only this way.
   197 
   198 For instance, a molecular structure can be considered as a graph,
   199 whose nodes correspond to atoms and whose edges to chemical bonds. The
   200 secondary structure of a protein can also be represented as a graph,
   201 where nodes are associated with aminoacids and the edges with hydrogen
   202 bonds. The nodes are often whole molecular components and the edges
   203 represent some relationships among them.  The similarity and
   204 dissimilarity of objects corresponding to nodes are incorporated to
   205 the model by \emph{node labels}.  Many other chemical and biological
   206 structures can easily be modeled in a similar way. Understanding such
   207 networks basically requires finding specific subgraphs, which can not
   208 avoid the application of graph matching algorithms.
   209 
   210 Finally, let some of the other real-world fields related to some
   211 variants of graph matching be briefly mentioned: pattern recognition
   212 and machine vision \cite{HorstBunkeApplications}, symbol recognition
   213 \cite{CordellaVentoSymbolRecognition}, face identification
   214 \cite{JianzhuangYongFaceIdentification}.  \\
   215 
   216 Subgraph and induced subgraph matching problems are known to be
   217 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is
   218 one of the few problems in NP neither known to be in P nor
   219 NP-Complete. Although polynomial time isomorphism algorithms are known
   220 for various graph classes, like trees and planar
   221 graphs\cite{PlanarGraphIso}, bounded valence
   222 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
   223 or permutation graphs\cite{PermGraphIso}.
   224 
   225 In the following, some algorithms based on other approaches are
   226 summarized, which do not need any restrictions on the graphs. However,
   227 an overall polynomial behaviour is not expectable from such an
   228 alternative, it may often have good performance, even on a graph class
   229 for which polynomial algorithm is known. Note that this summary
   230 containing only exact matching algorithms is far not complete, neither
   231 does it cover all the recent algorithms.
   232 
   233 The first practically usable approach was due to
   234 Ullmann\cite{Ullmann} which is a commonly used depth-first
   235 search based algorithm with a complex heuristic for reducing the
   236 number of visited states. A major problem is its $\Theta(n^3)$ space
   237 complexity, which makes it impractical in the case of big sparse
   238 graphs.
   239 
   240 In a recent paper, Ullmann\cite{UllmannBit} presents an
   241 improved version of this algorithm based on a bit-vector solution for
   242 the binary Constraint Satisfaction Problem.
   243 
   244 The Nauty algorithm\cite{Nauty} transforms the two graphs to
   245 a canonical form before starting to check for the isomorphism. It has
   246 been considered as one of the fastest graph isomorphism algorithms,
   247 although graph categories were shown in which it takes exponentially
   248 many steps. This algorithm handles only the graph isomorphism problem.
   249 
   250 The \emph{LAD} algorithm\cite{Lad} uses a depth-first search
   251 strategy and formulates the matching as a Constraint Satisfaction
   252 Problem to prune the search tree. The constraints are that the mapping
   253 has to be injective and edge-preserving, hence it is possible to
   254 handle new matching types as well.
   255 
   256 The \textbf{RI} algorithm\cite{RI} and its variations are based on a
   257 state space representation. After reordering the nodes of the graphs,
   258 it uses some fast executable heuristic checks without using any
   259 complex pruning rules. It seems to run really efficiently on graphs
   260 coming from biology, and won the International Contest on Pattern
   261 Search in Biological Databases\cite{Content}.
   262 
   263 The currently most commonly used algorithm is the
   264 \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was
   265 designed for solving pattern matching and computer vision problems,
   266 and has been one of the best overall algorithms for more than a
   267 decade. Although, it can't be up to new specialized algorithms, it is
   268 still widely used due to its simplicity and space efficiency. VF2 uses
   269 a state space representation and checks some conditions in each state
   270 to prune the search tree.
   271 
   272 Our first graph matching algorithm was the first version of VF2 which
   273 recognizes the significance of the node ordering, more opportunities
   274 to increase the cutting efficiency and reduce its computational
   275 complexity. This project was initiated and sponsored by QuantumBio
   276 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
   277 code --- has been published as a part of LEMON\cite{LEMON} open source
   278 graph library.
   279 
   280 This paper introduces \textbf{VF2++}, a new further improved algorithm
   281 for the graph and (induced)subgraph isomorphism problem, which uses
   282 efficient cutting rules and determines a node order in which VF2 runs
   283 significantly faster on practical inputs.
   284 
   285 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has
   286 been published. It is considered to be as efficient as the RI
   287 algorithm and has a strictly better behavior on large graphs.  The
   288 main idea of VF2 Plus is to precompute a heuristic node order of the
   289 small graph, in which the VF2 works more efficiently.
   290 
   291 \section{Problem Statement}
   292 This section provides a detailed description of the problems to be
   293 solved.
   294 \subsection{Definitions}
   295 
   296 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and
   297 $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.
   298 \begin{definition}\label{sec:ismorphic}
   299 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M:
   300   V_{small} \longrightarrow V_{large}$ bijection, for which the
   301   following is true:
   302 \begin{center}
   303 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
   304   (M(u),M(v))\in{E_{large}}$
   305 \end{center}
   306 \end{definition}
   307 For the sake of simplicity in this paper subgraphs and induced
   308 subgraphs are defined in a more general way than usual:
   309 \begin{definition}
   310 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I:
   311   V_{small}\longrightarrow V_{large}$ injection, for which the
   312   following is true:
   313 \begin{center}
   314 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$
   315 \end{center}
   316 \end{definition}
   317 
   318 \begin{definition} 
   319 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists
   320   I: V_{small}\longrightarrow V_{large}$ injection, for which the
   321   following is true:
   322 \begin{center}
   323 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
   324   (I(u),I(v))\in E_{large}$
   325 \end{center}
   326 \end{definition}
   327 
   328 \begin{definition}
   329 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node
   330     label function}, where K is an arbitrary set. The elements in K
   331   are the \textbf{node labels}. Two nodes, u and v are said to be
   332   \textbf{equivalent}, if $lab(u)=lab(v)$.
   333 \end{definition}
   334 
   335 When node labels are also given, the matched nodes must have the same
   336 labels.  For example, the node labeled isomorphism is phrased by
   337 \begin{definition}
   338 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label
   339     function lab} if $\exists M: V_{small} \longrightarrow V_{large}$
   340   bijection, for which the following is true:
   341 \begin{center}
   342 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
   343   (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} :
   344   lab(u)=lab(M(u)))$
   345 \end{center}
   346 \end{definition}
   347 
   348 The other two definitions can be extended in the same way.
   349 
   350 Note that edge label function can be defined similarly to node label
   351 function, and all the definitions can be extended with additional
   352 conditions, but it is out of the scope of this work.
   353 
   354 The equivalence of two nodes is usually defined by another relation,
   355 $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the
   356 definition given above if R is an equivalence relation, which does not
   357 mean restriction in biological and chemical applications.
   358 
   359 \subsection{Common problems}\label{sec:CommProb}
   360 
   361 The focus of this paper is on two extensively studied topics, the
   362 subgraph isomorphism and its variations. However, the following
   363 problems also appear in many applications.
   364 
   365 The \textbf{subgraph matching problem} is the following: is
   366 $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node
   367 label?
   368 
   369 The \textbf{induced subgraph matching problem} asks the same about the
   370 existence of an induced subgraph.
   371 
   372 The \textbf{graph isomorphism problem} can be defined as induced
   373 subgraph matching problem where the sizes of the two graphs are equal.
   374 
   375 In addition to existence, it may be needed to show such a subgraph, or
   376 it may be necessary to list all of them.
   377 
   378 It should be noted that some authors misleadingly refer to the term
   379 \emph{subgraph isomorphism problem} as an \emph{induced subgraph
   380   isomorphism problem}.
   381 
   382 The following sections give the descriptions of VF2, VF2++, VF2 Plus
   383 and a particular comparison.
   384 
   385 \section{The VF2 Algorithm}
   386 This algorithm is the basis of both the VF2++ and the VF2 Plus.  VF2
   387 is able to handle all the variations mentioned in Section
   388   \ref{sec:CommProb}.  Although it can also handle directed graphs,
   389 for the sake of simplicity, only the undirected case will be
   390 discussed.
   391 
   392 
   393 \subsection{Common notations}
   394 \indent Assume $G_{small}$ is searched in $G_{large}$.  The following
   395 definitions and notations will be used throughout the whole paper.
   396 \begin{definition}
   397 A set $M\subseteq V_{small}\times V_{large}$ is called
   398 \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears
   399 in more than one pair in M.  That is, M uniquely associates some of
   400 the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice
   401 versa.
   402 \end{definition}
   403 
   404 \begin{definition}
   405 Mapping M \textbf{covers} a node v, if there exists a pair in M, which
   406 contains v.
   407 \end{definition}
   408 
   409 \begin{definition}
   410 A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the
   411 nodes in $V_{small}$.
   412 \end{definition}
   413 
   414 \begin{notation}
   415 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in
   416 V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in
   417 V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.
   418 \end{notation}
   419 
   420 \begin{notation}
   421 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node
   422 exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$
   423 and $v\in V_{small}\cup V_{large}$.
   424 \end{notation}
   425 
   426 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique
   427 
   428 The definitions of the isomorphism types can be rephrased on the
   429 existence of a special whole mapping $M$, since it represents a
   430 bijection. For example
   431 \begin{center}
   432 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph
   433   isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v
   434   \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
   435   (Pair(M,u),Pair(M,v))\in E_{large}$.
   436 \end{center}
   437 
   438 \begin{definition}
   439 A set of whole mappings is called \textbf{problem type}.
   440 \end{definition}
   441 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
   442 which can be substituted by any problem type.
   443 
   444 A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using
   445 this notations, VF2 searches a whole mapping $W$ of type $PT$.
   446 
   447 For example the problem type of graph isomorphism problem is the
   448 following.  A whole mapping $W$ is in $\mathbf{ISO}$, iff the
   449 bijection represented by $W$ satisfies Definition~\ref{sec:ismorphic}.
   450 The subgraph- and induced subgraph matching problems can be formalized
   451 in a similar way. Let their problem types be denoted as $\mathbf{SUB}$
   452 and $\mathbf{IND}$.
   453 
   454 \begin{definition}
   455 \label{expPT}
   456 $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in
   457 PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow
   458 (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of
   459 $G_{small}$ has to be mapped to an edge of $G_{large}$ for each
   460 mapping in $PT$.
   461 \end{definition}
   462 
   463 Note that $ISO$, $SUB$ and $IND$ are expanding problem types.
   464 
   465 This paper deals with the three problem types mentioned above only,
   466 but the following generic definitions make it possible to handle other
   467 types as well.  Although it may be challenging to find a proper
   468 consistency function and an efficient cutting function.
   469 
   470 \begin{definition}
   471 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
   472 \textbf{consistency function by } $\mathbf{PT}$, if the following
   473 holds. If there exists whole mapping $W$ of type PT for which
   474 $M\subseteq W$, then $Cons_{PT}(M)$ is true.
   475 \end{definition}
   476 
   477 \begin{definition} 
   478 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
   479 \textbf{cutting function by } $\mathbf{PT}$, if the following
   480 holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a
   481 whole mapping W of type PT.
   482 \end{definition}
   483 
   484 \begin{definition}
   485 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if
   486   $Cons_{PT}(M)$ is true.
   487 \end{definition}
   488 
   489 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
   490 \begin{notation}
   491 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and
   492 $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where
   493 $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping.
   494 \end{notation}
   495 
   496 $Cons_{PT}$ will be used to check the consistency of the already
   497 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
   498 no whole consistent mapping can contain the current mapping.
   499 
   500 \subsection{Overview of the algorithm}
   501 VF2 uses a state space representation of mappings, $Cons_{PT}$ for
   502 excluding inconsistency with the problem type and $Cut_{PT}$ for
   503 pruning the search tree.  Each state $s$ of the matching process can
   504 be associated with a mapping $M(s)$.
   505 
   506 Algorithm~\ref{alg:VF2Pseu} is a high level description of
   507 the VF2 matching algorithm.
   508 
   509 
   510 \begin{algorithm}
   511 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
   512 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
   513 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
   514 \begin{algorithmic}[1]
   515 
   516 \Procedure{VF2}{State $s$, ProblemType $PT$} \If{$M(s$) covers
   517   $V_{small}$} \State Output($M(s)$) \Else
   518   
   519   \State Compute the set $P(s)$ of the pairs candidate for inclusion
   520   in $M(s)$ \ForAll{$p\in{P(s)}$} \If{Cons$_{PT}$($p, M(s)$) $\wedge$
   521     $\neg$Cut$_{PT}$($p, M(s)$)} \State Compute the nascent state
   522   $\tilde{s}$ by adding $p$ to $M(s)$ \State \textbf{call}
   523   VF2($\tilde{s}$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
   524 \end{algorithmic}
   525 \end{algorithm}
   526 
   527 
   528 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it
   529 starts with an empty mapping.
   530 
   531 For each state $s$, the algorithm computes $P(s)$, the set of
   532 candidate node pairs for adding to the current state $s$.
   533 
   534 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and
   535 $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and
   536 $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup
   537 \{p\}$ is computed, and the whole process is recursively applied to
   538 $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it
   539 can be proved that $s$ can not be extended to a whole mapping.
   540 
   541 In order to make sure of the correctness see
   542 \begin{claim}
   543 Through consistent mappings, only consistent whole mappings can be
   544 reached, and all of the whole mappings are reachable through
   545 consistent mappings.
   546 \end{claim}
   547 
   548 Note that a state may be reached in many different ways, since the
   549 order of insertions into M does not influence the nascent mapping. In
   550 fact, the number of different ways which lead to the same state can be
   551 exponentially large. If $G_{small}$ and $G_{large}$ are circles with n
   552 nodes and n different node labels, there exists exactly one graph
   553 isomorphism between them, but it will be reached in $n!$ different
   554 ways.
   555 
   556 However, one may observe
   557 
   558 \begin{claim}
   559 \label{claim:claimTotOrd}
   560 Let $\prec$ an arbitrary total ordering relation on $V_{small}$.  If
   561 the algorithm ignores each $p=(u,v) \in P(s)$, for which
   562 \begin{center}
   563 $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$,
   564 \end{center}
   565 then no state can be reached more than ones and each state associated
   566 with a whole mapping remains reachable.
   567 \end{claim}
   568 
   569 Note that the cornerstone of the improvements to VF2 is a proper
   570 choice of a total ordering.
   571 
   572 \subsection{The candidate set P(s)}
   573 \label{candidateComputingVF2}
   574 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.
   575 Suppose that $PT$ is an expanding problem type, see
   576 Definition~\ref{expPT}.
   577 
   578 \begin{notation}
   579 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by
   580 $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}}
   581 \wedge \tilde{u}$ is covered by $M(s)\}$, and
   582 \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not
   583 covered by
   584 $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}}
   585 \wedge \tilde{v}$ is covered by $M(s)\}$
   586 \end{notation}
   587 
   588 The set $P(s)$ includes the pairs of uncovered neighbours of covered
   589 nodes and if there is not such a node pair, all the pairs containing
   590 two uncovered nodes are added. Formally, let
   591 \[
   592  P(s)\!=\!
   593   \begin{cases} 
   594    T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if }
   595    T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq
   596    \emptyset,\\ (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s))
   597    &\hspace{-0.15cm}otherwise.
   598   \end{cases}
   599 \]
   600 
   601 \subsection{Consistency}
   602 This section defines the consistency functions for the different
   603 problem types mentioned in Section~\ref{sec:CommProb}.
   604 \begin{notation}
   605 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} :
   606 (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large}
   607   (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$
   608 \end{notation}
   609 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is
   610 a state of the matching procedure, $M(s)$ is consistent mapping by
   611 $PT$ and $lab(u)=lab(v)$.  $Cons_{PT}(p,M(s))$ checks whether
   612 including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.
   613 
   614 \subsubsection{Induced subgraph isomorphism}
   615 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow
   616 (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small}
   617 \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline The
   618 following formulation gives an efficient way of calculating
   619 $Cons_{IND}$.
   620 \begin{claim}
   621 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v)
   622   \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge
   623   (\forall \tilde{u}\in \Gamma_{small}(u)
   624   \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
   625   consistency function in the case of $IND$.
   626 \end{claim}
   627 
   628 \subsubsection{Graph isomorphism}
   629 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$
   630 $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by
   631 $IND$.
   632 \begin{claim}
   633 $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and
   634   only if it is a consistency function by $IND$.
   635 \end{claim}
   636 \subsubsection{Subgraph isomorphism}
   637 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow
   638 (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small}
   639 \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
   640 \newline
   641 The following formulation gives an efficient way of calculating
   642 $Cons_{SUB}$.
   643 \begin{claim}
   644 $Cons_{SUB}((u,v),M(s)):= (\forall \tilde{u}\in \Gamma_{small}(u)
   645   \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
   646   consistency function by $SUB$.
   647 \end{claim}
   648 
   649 \subsection{Cutting rules}
   650 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently
   651 verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can
   652 be true only if it is impossible to extended $M(s)\cup \{p\}$ to a
   653 whole mapping.
   654 \begin{notation}
   655 
   656 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash
   657 M_{small}(s))\backslash T_{small}(s)$, and
   658 \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash
   659 M_{large}(s))\backslash T_{large}(s)$.
   660 \end{notation}
   661 \subsubsection{Induced subgraph isomorphism}
   662 \begin{claim}
   663 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
   664   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
   665   \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap
   666   \tilde{T}_{small}(s)|$ is a cutting function by $IND$.
   667 \end{claim}
   668 \subsubsection{Graph isomorphism}
   669 Note that the cutting function of induced subgraph isomorphism defined
   670 above is a cutting function by $ISO$, too, however it is less
   671 efficient than the following while their computational complexity is
   672 the same.
   673 \begin{claim}
   674 $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq
   675   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
   676   \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap
   677   \tilde{T}_{small}(s)|$ is a cutting function by $ISO$.
   678 \end{claim}
   679 
   680 \subsubsection{Subgraph isomorphism}
   681 \begin{claim}
   682 $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
   683   |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by
   684   $SUB$.
   685 \end{claim}
   686 Note that there is a significant difference between induced and
   687 non-induced subgraph isomorphism:
   688 
   689 \begin{claim}
   690 \label{claimSUB}
   691 $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
   692 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
   693 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$
   694 is \textbf{not} a cutting function by $SUB$.
   695 \end{claim}
   696 \begin{proof}$ $\\
   697 \vspace*{-0.5cm}
   698 
   699 \begin{figure}
   700 \begin{center}
   701 \begin{tikzpicture}
   702   [scale=.8,auto=left,every node/.style={circle,fill=black!15}]
   703   \node[rectangle,fill=black!15] at (4,6) {$G_{small}$}; \node (u4) at
   704   (2.5,10) {$u_4$}; \node (u3) at (5.5,10) {$u_3$}; \node (u1) at
   705   (2.5,7) {$u_1$}; \node (u2) at (5.5,7) {$u_2$};
   706   
   707   \node[rectangle,fill=black!30] at (13.5,6) {$G_{large}$};
   708   \node[fill=black!30] (v4) at (12,10) {$v_4$}; \node[fill=black!30]
   709   (v3) at (15,10) {$v_3$}; \node[fill=black!30] (v1) at (12,7)
   710        {$v_1$}; \node[fill=black!30] (v2) at (15,7) {$v_2$};
   711   
   712 
   713   \foreach \from/\to in {u1/u2,u2/u3,u3/u4,u4/u1} \draw (\from) --
   714   (\to); \foreach \from/\to in {v1/v2,v2/v3,v3/v4,v4/v1,v1/v3} \draw
   715   (\from) -- (\to);
   716 %    \draw[dashed] (\from) -- (\to);
   717 \end{tikzpicture}
   718 \caption{Graphs for the proof of Claim~\ref{claimSUB}}
   719 \label{fig:proofSUB}
   720 \end{center}
   721 \end{figure}
   722 Let the two graphs of Figure~\ref{fig:proofSUB} be the input
   723 graphs.  Suppose the total ordering relation is $u_1 \prec u_2 \prec
   724 u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add
   725 $(u_2,v_2)\in P(s)$.\newline $Cons_{SUB}((u_2,v_2),M(s))=true$, so
   726 $M\cup \{(u_2,v_2)\}$ is consistent by $SUB$. The cutting function
   727 $Cut_{SUB}((u_2,v_2),M(s))$ is false, so it does not let cut the
   728 tree.\newline On the other hand $Cut_{SUB}'((u_2,v_2),M(s))$ is true,
   729 since\\$0=|\Gamma_{large}(v_2)\cap
   730 \tilde{T}_{large}(s)|<|\Gamma_{small}(u_2)\cap
   731 \tilde{T}_{small}(s)|=1$ is true, but still the tree can not be
   732 pruned, because otherwise the
   733 $\{(u_1,v_1)(u_2,v_2)(u_3,v_3)(u_4,v_4)\}$ mapping can not be found.
   734 \end{proof}
   735 
   736 \section{The VF2++ Algorithm}
   737 Although any total ordering relation makes the search space of VF2 a
   738 tree, its choice turns out to dramatically influence the number of
   739 visited states. The goal is to determine an efficient one as quickly
   740 as possible.
   741 
   742 The main reason for VF2++' superiority over VF2 is twofold. Firstly,
   743 taking into account the structure and the node labeling of the graph,
   744 VF2++ determines a state order in which most of the unfruitful
   745 branches of the search space can be pruned immediately. Secondly,
   746 introducing more efficient --- nevertheless still easier to compute
   747 --- cutting rules reduces the chance of going astray even further.
   748 
   749 In addition to the usual subgraph isomorphism, specialized versions
   750 for induced subgraph isomorphism and for graph isomorphism have been
   751 designed. VF2++ has gained a runtime improvement of one order of
   752 magnitude respecting induced subgraph isomorphism and a better
   753 asymptotical behaviour in the case of graph isomorphism problem.
   754 
   755 Note that a weaker version of the cutting rules and the more efficient
   756 candidate set calculating were described in \cite{VF2Plus}, too.
   757 
   758 It should be noted that all the methods described in this section are
   759 extendable to handle directed graphs and edge labels as well.
   760 
   761 The basic ideas and the detailed description of VF2++ are provided in
   762 the following.
   763 
   764 \subsection{Preparations}
   765 \begin{claim}
   766 \label{claim:claimCoverFromLeft}
   767 The total ordering relation uniquely determines a node order, in which
   768 the nodes of $V_{small}$ will be covered by VF2. From the point of
   769 view of the matching procedure, this means, that always the same node
   770 of $G_{small}$ will be covered on the d-th level.
   771 \end{claim}
   772 \begin{proof}
   773 In order to make the search space a tree, the pairs in $\{(u,v)\in
   774 P(s) : \exists \hat{u} : \hat{u}\prec u\}$ are excluded from $P(s)$.
   775 \newline
   776 Let $\tilde{P}(s):=P(s)\backslash \{(u,v)\in P(s) : \exists \hat{u} :
   777 \hat{u}\prec u\}$
   778 \newline
   779 The relation $\prec$ is a total ordering, so $\exists!\ \tilde{u} :
   780 \forall\ (u,v)\in \tilde{P}(s): u=\tilde u$. Since a pair form
   781 $\tilde{P}(s)$ is chosen for including into $M(s)$, it is obvious,
   782 that only $\tilde{u}$ can be covered in $V_{small}$. Actually,
   783 $\tilde{u}$ is the smallest element in $T_{small}(s)$ (or in
   784 $V_{small}\backslash M_{small}(s)$, if $T_{small}(s)$ were empty), and
   785 $T_{small}(s)$ depends only on the covered nodes of $G_{small}$.
   786 \newline
   787 Simple induction on $d$ shows that the set of covered nodes of
   788 $G_{small}$ is unique, if $d$ is given, so $\tilde{u}$ is unique if
   789 $d$ is given.
   790 \end{proof}
   791 
   792 \begin{definition}
   793 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
   794 $V_{small}$ is \textbf{matching order}, if exists $\prec$ total
   795 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
   796 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$.
   797 \end{definition}
   798 
   799 \begin{claim}\label{claim:MOclaim}
   800 A total ordering is matching order, iff the nodes of every component
   801 form an interval in the node sequence, and every node connects to a
   802 previous node in its component except the first node of the
   803 component. The order of the components is arbitrary.  \\Formally
   804 spoken, an order
   805 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
   806 $V_{small}$ is matching order $\Leftrightarrow$ $\forall
   807 G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall
   808 i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
   809 V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq
   810 l\leq i \Rightarrow u_{l}\in V'_{small}) \wedge
   811 (u_{\sigma{(k)}},u_{\sigma{(i)}})\in E'_{small}$, where $i,j,k,l\in
   812 \{1,..,|V_{small}|\}$\newline
   813 \end{claim}
   814 \begin{proof}
   815 Suppose a matching order is given. It has to be shown that the node
   816 sequence has a structure described above.\\ Let
   817 $G'_{small}=(V'_{small},E'_{small})$ be an arbitrary component and $i$
   818 an arbitrary index.
   819 \newline
   820 $(\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
   821 V'_{small})\Rightarrow u_{\sigma(i)}$ is not the first covered node in
   822 $G_{small}\ \Rightarrow u_{\sigma(i)}$ is connected to a covered node
   823 $u_{\sigma(k)}$ where $k<i$, since $u_{\sigma(i)}\in T_{small}(s)$ for
   824 some $s$ and $T_{small}(s)\subseteq{V'_{small}}$ contains only nodes
   825 connected to at least one covered node. It's easy to see, that
   826 $\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}$, since
   827 $T_{small}(s)$ contains only nodes connected to some covered ones,
   828 while it is not empty, but if it were empty, then $u_{\sigma(k)}$ and
   829 $u_{\sigma(i)}$ were not in the same component.
   830 
   831 Now, let us show that if a node sequence has the special structure
   832 described above, it has to be matching
   833 order.\\ $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ is
   834 a total ordering, which determines the matching order
   835 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$.
   836 \end{proof}
   837 
   838 To summing up, a total ordering always uniquely determines a matching
   839 order, and every matching order can be determined by a total ordering,
   840 however, more than one different total orderings may determine the
   841 same matching order.
   842 \subsection{Idea behind the algorithm}
   843 The goal is to find a matching order in which the algorithm is able to
   844 recognize inconsistency or prune the infeasible branches on the
   845 highest levels and goes deep only if it is needed.
   846 
   847 \begin{notation}
   848 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the
   849 number of neighbours of u which are in H, where $u\in V_{small} $ and
   850 $H\subseteq V_{small}$.
   851 \end{notation}
   852 
   853 The principal question is the following. Suppose a state $s$ is
   854 given. For which node of $T_{small}(s)$ is the hardest to find a
   855 consistent pair in $G_{large}$? The more covered neighbours a node in
   856 $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has
   857 ---, the more rarely satisfiable consistency constraints for its pair
   858 are given.
   859 
   860 In biology, most of the graphs are sparse, thus several nodes in
   861 $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes
   862 reasonable to define a secondary and a tertiary order between them.
   863 The observation above proves itself to be as determining, that the
   864 secondary ordering prefers nodes with the most uncovered neighbours
   865 among which have the same $Conn_{M_{small}(s)}$ to increase
   866 $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible.  The
   867 tertiary ordering prefers nodes having the rarest uncovered labels.
   868 
   869 Note that the secondary ordering is the same as the ordering by $deg$,
   870 which is a static data in front of the above used.
   871 
   872 These rules can easily result in a matching order which contains the
   873 nodes of a long path successively, whose nodes may have low $Conn$ and
   874 is easily matchable into $G_{large}$. To avoid that, a BFS order is
   875 used, which provides the shortest possible paths.
   876 \newline
   877 
   878 In the following, some examples on which the VF2 may be slow are
   879 described, although they are easily solvable by using a proper
   880 matching order.
   881 
   882 \begin{example}
   883 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways
   884 without node labels. Let $u\in V_{small}$ and $v\in V_{large}$.
   885 \newline
   886 $lab(u):=black$
   887 \newline
   888 $lab(v):=black$
   889 \newline
   890 $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash
   891 \{u\})$
   892 \newline
   893 $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash
   894 \{v\})$
   895 \newline
   896 
   897 Now, any mapping by the node label $lab$ must contain $(u,v)$, since
   898 $u$ is black and no node in $V_{large}$ has a black label except
   899 $v$. If unfortunately $u$ were the last node which will get covered,
   900 VF2 would check only in the last steps, whether $u$ can be matched to
   901 $v$.
   902 \newline
   903 However, had $u$ been the first matched node, u would have been
   904 matched immediately to v, so all the mappings would have been
   905 precluded in which node labels can not correspond.
   906 \end{example}
   907 
   908 \begin{example}
   909 Suppose there is no node label given, $G_{small}$ is a small graph and
   910 can not be mapped into $G_{large}$ and $u\in V_{small}$.
   911 \newline
   912 Let $G'_{small}:=(V_{small}\cup
   913 \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup
   914 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
   915 $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint
   916 from $G_{small}$ and one of its starting points is connected to $u\in
   917 V_{small}\}$.
   918 \newline
   919 Is there a subgraph of $G_{large}$, which is isomorph with
   920 $G'_{small}$?
   921 \newline
   922 If unfortunately the nodes of the path were the first $k$ nodes in the
   923 matching order, the algorithm would iterate through all the possible k
   924 long paths in $G_{large}$, and it would recognize that no path can be
   925 extended to $G'_{small}$.
   926 \newline
   927 However, had it started by the matching of $G_{small}$, it would not
   928 have matched any nodes of the path.
   929 \end{example}
   930 
   931 These examples may look artificial, but the same problems also appear
   932 in real-world instances, even though in a less obvious way.
   933 
   934 \subsection{Total ordering}
   935 Instead of the total ordering relation, the matching order will be
   936 searched directly.
   937 \begin{notation}
   938 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} :
   939 l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ ,
   940 where $l$ is a label and $\mathcal{M}\subseteq V_{small}$.
   941 \end{notation}
   942 
   943 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
   944 \end{definition}
   945 
   946 \begin{algorithm}
   947 \algtext*{EndIf}
   948 \algtext*{EndProcedure}
   949 \algtext*{EndWhile}
   950 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
   951 \begin{algorithmic}[1]
   952 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$
   953 \Comment{matching order} \While{$V_{small}\backslash \mathcal{M}
   954   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
   955 min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash
   956 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
   957 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
   958 $d$-th level \State Process $V_d$ \Comment{See Algorithm
   959   \ref{alg:VF2PPProcess1}} \EndFor
   960 \EndWhile \EndProcedure
   961 \end{algorithmic}
   962 \end{algorithm}
   963 
   964 \begin{algorithm}
   965 \algtext*{EndIf}
   966 \algtext*{EndProcedure}%ne nyomtasson ..
   967 \algtext*{EndWhile}
   968 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
   969 \begin{algorithmic}[1]
   970 \Procedure{VF2++ProcessLevel1}{$V_{d}$} \While{$V_d\neq\emptyset$}
   971 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg
   972 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
   973 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
   974 $F_\mathcal{M}$ \EndWhile \EndProcedure
   975 \end{algorithmic}
   976 \end{algorithm}
   977 
   978 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
   979 matching order procedure of VF2++. It computes a BFS tree for each
   980 component in ascending order of their rarest $lab$ and largest $deg$,
   981 whose root vertex is the component's minimal
   982 node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending
   983 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately
   984 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately.
   985 
   986 Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}
   987 provides a matching order.
   988 
   989 
   990 \subsection{Cutting rules}
   991 \label{VF2PPCuttingRules}
   992 This section presents the cutting rules of VF2++, which are improved
   993 by using extra information coming from the node labels.
   994 \begin{notation}
   995 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l
   996 \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and
   997 $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge
   998 \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in
   999 V_{large}$ and $l$ is a label.
  1000 \end{notation}
  1001 
  1002 \subsubsection{Induced subgraph isomorphism}
  1003 \begin{claim}
  1004 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND.
  1005 \end{claim}
  1006 \begin{proof}
  1007 It has to be shown, that $LabCut_{IND}((u,v),M(s))=true\Rightarrow$
  1008 the mapping can not be extended to a whole
  1009 mapping.\\ $LabCut_{IND}((u,v),M(s))=true,$ iff the following
  1010 holds. $\\\exists l: |\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| <
  1011 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)| \vee
  1012 |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| <
  1013 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$.
  1014 
  1015 Suppose that $|\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| <
  1016 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)|$. Each node of
  1017 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$ has to be matched to a
  1018 node in $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$, so
  1019 $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$ can not be smaller than
  1020 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$. That is why $M(s)$ can
  1021 not be extended to a whole mapping.
  1022 
  1023 Otherwise $|\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| <
  1024 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$ has to be
  1025 true. Similarly, each node of $\Gamma_{large}^{l}(v)\cap
  1026 \tilde{T}_{large}(s)$ has to be matched to a node in
  1027 $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$,
  1028 i.e. $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ can not be
  1029 smaller than $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, if
  1030 $M(s)$ is extendible.
  1031 \end{proof}
  1032 The following claims can be proven similarly.
  1033 \subsubsection{Graph isomorphism}
  1034 \begin{claim}
  1035 \[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by ISO.
  1036 \end{claim}
  1037 
  1038 \subsubsection{Subgraph isomorphism}
  1039 \begin{claim}
  1040 \[LabCut_{SUB}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\] is a cutting function by SUB.
  1041 \end{claim}
  1042 
  1043 
  1044 
  1045 \subsection{Implementation details}
  1046 This section provides a detailed summary of an efficient
  1047 implementation of VF2++.
  1048 \subsubsection{Storing a mapping}
  1049 After fixing an arbitrary node order ($u_0, u_1, ..,
  1050 u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store
  1051 the current mapping in the following way.
  1052 \[
  1053  M[i] =
  1054   \begin{cases} 
  1055    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INVALID &
  1056    if\ no\ node\ has\ been\ mapped\ to\ u_i.
  1057   \end{cases}
  1058 \]
  1059 Where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$
  1060 means "no node".
  1061 \subsubsection{Avoiding the recurrence}
  1062 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
  1063 as a \textit{while loop}, which has a loop counter $depth$ denoting the
  1064 all-time depth of the recursion. Fixing a matching order, let $M$
  1065 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft},
  1066 $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before
  1067 $depth$. $M[depth]$ changes
  1068 while the state is being processed, but the property is held before
  1069 both stepping back to a predecessor state and exploring a successor
  1070 state.
  1071 
  1072 The necessary part of the candidate set is easily maintainable or
  1073 computable by following
  1074 Section~\ref{candidateComputingVF2}. A much faster method
  1075 has been designed for biological- and sparse graphs, see the next
  1076 section for details.
  1077 
  1078 \subsubsection{Calculating the candidates for a node}
  1079 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
  1080 task is not to maintain the candidate set, but to generate the
  1081 candidate nodes in $G_{large}$ for a given node $u\in V_{small}$.  In
  1082 case of an expanding problem type and $M$ mapping, if a node $v\in
  1083 V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall
  1084 u'\in V_{small} : (u,u')\in
  1085 E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in
  1086 E_{large}$. That is, each covered neighbour of $u$ has to be mapped to
  1087 a covered neighbour of $v$.
  1088 
  1089 Having said that, an algorithm running in $\Theta(deg)$ time is
  1090 describable if there exists a covered node in the component containing
  1091 $u$, and a linear one other wise.
  1092 
  1093 
  1094 \subsubsection{Determining the node order}
  1095 This section describes how the node order preprocessing method of
  1096 VF2++ can efficiently be implemented.
  1097 
  1098 For using lookup tables, the node labels are associated with the
  1099 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
  1100 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
  1101 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
  1102 in $V_{small}$ having label i, which is easy to compute in
  1103 $\Theta(|V_{small}|)$ steps.
  1104 
  1105 Representing $\mathcal{M}\subseteq V_{small}$ as an array of
  1106 size $|V_{small}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes.
  1107 
  1108 After a node $u$ gets to the next place of the node order,
  1109 $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is
  1110 one less covered node in $V_{large}$ with label $lab(u)$, that is why
  1111 min selection sort is preferred which gives the elements from left to
  1112 right in descending order, see Algorithm~\ref{alg:VF2PPProcess1}.
  1113 
  1114 \subsubsection{Cutting rules}
  1115 In Section~\ref{VF2PPCuttingRules}, the cutting rules were
  1116 described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$
  1117 and $\tilde T_{large}$, which are dependent on the all-time mapping
  1118 (i.e. on the all-time state). The aim is to check the labeled cutting
  1119 rules of VF2++ in $\Theta(deg)$ time.
  1120 
  1121 Firstly, suppose that these four sets are given in such a way, that
  1122 checking whether a node is in a certain set takes constant time,
  1123 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
  1124 initially zero integer lookup table of size $|K|$. After incrementing
  1125 $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and
  1126 decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap
  1127 T_{large}(s)$, the first part of the cutting rules is checkable in
  1128 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
  1129 to zero takes $\Theta(deg)$ time again, which makes it possible to use
  1130 the same table through the whole algorithm. The second part of the
  1131 cutting rules can be verified using the same method with $\tilde
  1132 T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and
  1133 $T_{large}$. Thus, the overall complexity is $\Theta(deg)$.
  1134 
  1135 An other integer lookup table storing the number of covered neighbours
  1136 of each node in $G_{large}$ gives all the information about the sets
  1137 $T_{large}$ and $\tilde T_{large}$, which is maintainable in
  1138 $\Theta(deg)$ time when a pair is added or substracted by incrementing
  1139 or decrementing the proper indices. A further improvement is that the
  1140 values of $L[lab(u')]$ in case of checking $u$ is dependent only on
  1141 $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an
  1142 array of pairs (label, number of such labels) can be stored to skip
  1143 the maintaining operations. Note that these arrays are at most of size
  1144 $deg$. Skipping this trick, the number of covered neighbours has to be
  1145 stored for each node of $G_{small}$ as well to get the sets
  1146 $T_{small}$ and $\tilde T_{small}$.
  1147 
  1148 Using similar tricks, the consistency function can be evaluated in
  1149 $\Theta(deg)$ steps, as well.
  1150 
  1151 \section{The VF2 Plus Algorithm}
  1152 The VF2 Plus algorithm is a recently improved version of VF2. It was
  1153 compared with the state of the art algorithms in \cite{VF2Plus} and
  1154 has proven itself to be competitive with RI, the best algorithm on
  1155 biological graphs.  \\ A short summary of VF2 Plus follows, which uses
  1156 the notation and the conventions of the original paper.
  1157 
  1158 \subsection{Ordering procedure}
  1159 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$
  1160 with the lowest probability to find a pair in $V_{small}$ and the
  1161 highest number of connections with the nodes already sorted by the
  1162 algorithm.
  1163 
  1164 \begin{definition}
  1165 $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and
  1166   $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$.
  1167 \end{definition}
  1168 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in
  1169 $V_{large}$
  1170 \newline
  1171 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in
  1172 $V_{large}$
  1173 \newline
  1174 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ $M$ is the set of
  1175 already sorted nodes, $T$ is the set of nodes candidate to be
  1176 selected, and $degreeM$ of a node is the number of its neighbours in
  1177 $M$.
  1178 \begin{algorithm}
  1179 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
  1180 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
  1181 \algtext*{EndWhile}
  1182 \caption{}\label{alg:VF2PlusPseu}
  1183 \begin{algorithmic}[1]
  1184 \Procedure{VF2 Plus order}{} \State Select the node with the lowest
  1185 $P$.  \If {more nodes share the same $P$} \State select the one with
  1186 maximum degree \EndIf \If {more nodes share the same $P$ and have the
  1187   max degree} \State select the first \EndIf \State Put the selected
  1188 node in the set $M$. \label{alg:putIn} \State Put all its unsorted
  1189 neighbours in the set $T$.  \If {$M\neq V_{small}$} \State From set
  1190 $T$ select the node with maximum $degreeM$.  \If {more nodes have
  1191   maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf
  1192 \If {more nodes have maximum $degreeM$ and $P$} \State Select the
  1193 first.  \EndIf \State \textbf{goto \ref{alg:putIn}.}  \EndIf
  1194 \EndProcedure
  1195 \end{algorithmic}
  1196 \end{algorithm}
  1197 
  1198 Using these notations, Algorithm~\ref{alg:VF2PlusPseu}
  1199 provides the description of the sorting procedure.
  1200 
  1201 Note that $P(u)$ is not the exact probability of finding a consistent
  1202 pair for $u$ by choosing a node of $V_{large}$ randomly, since
  1203 $P_{lab}$ and $P_{deg}$ are not independent, though calculating the
  1204 real probability would take quadratic time, which may be reduced by
  1205 using fittingly lookup tables.
  1206 
  1207 \section{Experimental results}
  1208 This section compares the performance of VF2++ and VF2 Plus. Both
  1209 algorithms have run faster with orders of magnitude than VF2, thus its
  1210 inclusion was not reasonable.
  1211 \subsection{Biological graphs}
  1212 The tests have been executed on a recent biological dataset created
  1213 for the International Contest on Pattern Search in Biological
  1214 Databases\cite{Content}, which has been constructed of molecule,
  1215 protein and contact map graphs extracted from the Protein Data
  1216 Bank\cite{ProteinDataBank}.
  1217 
  1218 The molecule dataset contains small graphs with less than 100 nodes
  1219 and an average degree of less than 3. The protein dataset contains
  1220 graphs having 500-10 000 nodes and an average degree of 4, while the
  1221 contact map dataset contains graphs with 150-800 nodes and an average
  1222 degree of 20.  \\
  1223 
  1224 In the following, the induced subgraph isomorphism and the graph
  1225 isomorphism will be examined.
  1226 
  1227 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For run time results, please see Figure~\ref{fig:bioIND}.
  1228 
  1229 In an other experiment, the nodes of each graph in the database had been
  1230 shuffled, and an isomorphism between the shuffled and the original
  1231 graph was searched. The solution times are shown on Figure~\ref{fig:bioISO}.
  1232 
  1233 
  1234 
  1235 \begin{figure}[H]
  1236 \vspace*{-2cm}
  1237 \hspace*{-1.5cm}
  1238 \begin{subfigure}[b]{0.55\textwidth}
  1239 \begin{figure}[H]
  1240 \begin{tikzpicture}[trim axis left, trim axis right]
  1241 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1242 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1243   west},scaled x ticks = false,x tick label style={/pgf/number
  1244   format/1000 sep = \thinspace}]
  1245 %\addplot+[only marks] table {proteinsOrig.txt};
  1246 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark
  1247   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
  1248 \end{axis}
  1249 \end{tikzpicture}
  1250 \caption{In the case of molecules, there is not such a significant
  1251   difference, but VF2++ seems to be faster as the number of nodes
  1252   increases.}\label{fig:ISOMolecule}
  1253 \end{figure}
  1254 \end{subfigure}
  1255 \hspace*{1.5cm}
  1256 \begin{subfigure}[b]{0.55\textwidth}
  1257 \begin{figure}[H]
  1258 \begin{tikzpicture}[trim axis left, trim axis right]
  1259 \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1260 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1261   west},scaled x ticks = false,x tick label style={/pgf/number
  1262   format/1000 sep = \thinspace}]
  1263 %\addplot+[only marks] table {proteinsOrig.txt};
  1264 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark
  1265   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
  1266 \end{axis}
  1267 \end{tikzpicture}
  1268 \caption{The results are closer to each other on contact maps, but
  1269   VF2++ still performs consistently better.}\label{fig:ISOContact}
  1270 \end{figure}
  1271 \end{subfigure}
  1272 
  1273 \begin{center}
  1274 \vspace*{-0.5cm}
  1275 \begin{subfigure}[b]{0.55\textwidth}
  1276 \begin{figure}[H]
  1277 \begin{tikzpicture}[trim axis left, trim axis right]
  1278 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1279 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1280   west},scaled x ticks = false,x tick label style={/pgf/number
  1281   format/1000 sep = \thinspace}]
  1282 %\addplot+[only marks] table {proteinsOrig.txt};
  1283 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
  1284   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
  1285 \end{axis}
  1286 \end{tikzpicture}
  1287 \caption{On protein graphs, VF2 Plus has a super linear time
  1288   complexity, while VF2++ runs in near constant time. The difference
  1289   is about two order of magnitude on large graphs.}\label{fig:ISOProt}
  1290 \end{figure}
  1291 \end{subfigure}
  1292 \end{center}
  1293 \vspace*{-0.6cm}
  1294 \caption{\normalsize{Graph isomomorphism on biological graphs}}\label{fig:bioISO}
  1295 \end{figure}
  1296 
  1297 
  1298 \begin{figure}[H]
  1299 \vspace*{-2cm}
  1300 \hspace*{-1.5cm}
  1301 \begin{subfigure}[b]{0.55\textwidth}
  1302 \begin{figure}[H]
  1303 \begin{tikzpicture}[trim axis left, trim axis right]
  1304 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1305 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1306   west},scaled x ticks = false,x tick label style={/pgf/number
  1307   format/1000 sep = \thinspace}]
  1308 %\addplot+[only marks] table {proteinsOrig.txt};
  1309 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark
  1310   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
  1311 \end{axis}
  1312 \end{tikzpicture}
  1313 \caption{In the case of molecules, the algorithms have
  1314   similar behaviour, but VF2++ is almost two times faster even on such
  1315   small graphs.} \label{fig:INDMolecule}
  1316 \end{figure}
  1317 \end{subfigure}
  1318 \hspace*{1.5cm}
  1319 \begin{subfigure}[b]{0.55\textwidth}
  1320 \begin{figure}[H]
  1321 \begin{tikzpicture}[trim axis left, trim axis right]
  1322 \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1323 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1324   west},scaled x ticks = false,x tick label style={/pgf/number
  1325   format/1000 sep = \thinspace}]
  1326 %\addplot+[only marks] table {proteinsOrig.txt};
  1327 \addplot table {Orig/ContactMaps.128.txt};
  1328 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1329         {VF2PPLabel/ContactMaps.128.txt};
  1330 \end{axis}
  1331 \end{tikzpicture}
  1332 \caption{On contact maps, VF2++ runs in near constant time, while VF2
  1333   Plus has a near linear behaviour.} \label{fig:INDContact}
  1334 \end{figure}
  1335 \end{subfigure}
  1336 
  1337 \begin{center}
  1338 \vspace*{-0.5cm}
  1339 \begin{subfigure}[b]{0.55\textwidth}
  1340 \begin{figure}[H]
  1341 \begin{tikzpicture}[trim axis left, trim axis right]
  1342   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1343   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1344     west},scaled x ticks = false,x tick label style={/pgf/number
  1345     format/1000 sep = \thinspace}] %\addplot+[only marks] table
  1346     {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]
  1347     table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark
  1348       size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
  1349   \end{axis}
  1350   \end{tikzpicture}
  1351 \caption{Both the algorithms have linear behaviour on protein
  1352   graphs. VF2++ is more than 10 times faster than VF2
  1353   Plus.} \label{fig:INDProt}
  1354 \end{figure}
  1355 \end{subfigure}
  1356 \end{center}
  1357 \vspace*{-0.5cm}
  1358 \caption{\normalsize{Graph isomomorphism on biological graphs}}\label{fig:bioIND}
  1359 \end{figure}
  1360 
  1361 
  1362 
  1363 
  1364 
  1365 \subsection{Random graphs}
  1366 This section compares VF2++ with VF2 Plus on random graphs of a large
  1367 size. The node labels are uniformly distributed.  Let $\delta$ denote
  1368 the average degree.  For the parameters of problems solved in the
  1369 experiments, please see the top of each chart.
  1370 \subsubsection{Graph isomorphism}
  1371 To evaluate the efficiency of the algorithms in the case of graph
  1372 isomorphism, connected graphs of less than 20 000 nodes have been
  1373 considered. Generating a random graph and shuffling its nodes, an
  1374 isomorphism had to be found. Figure \ref{fig:randISO} shows the runtime results
  1375 on graph sets of various density.
  1376 
  1377 
  1378 
  1379 
  1380 \begin{figure}[H]
  1381 \vspace*{-1.5cm}
  1382 \hspace*{-1.5cm}
  1383 \begin{subfigure}[b]{0.55\textwidth}
  1384 \begin{center}
  1385 \begin{tikzpicture}
  1386 \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1387 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1388   west},scaled x ticks = false,x tick label style={/pgf/number
  1389   format/1000 sep = \space}]
  1390 %\addplot+[only marks] table {proteinsOrig.txt};
  1391 \addplot table {randGraph/iso/vf2pIso5_1.txt};
  1392 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1393         {randGraph/iso/vf2ppIso5_1.txt};
  1394 \end{axis}
  1395 \end{tikzpicture}
  1396 \end{center}
  1397 \end{subfigure}
  1398 %\hspace{1cm}
  1399 \begin{subfigure}[b]{0.55\textwidth}
  1400 \begin{center}
  1401 \begin{tikzpicture}
  1402 \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1403 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1404   west},scaled x ticks = false,x tick label style={/pgf/number
  1405   format/1000 sep = \space}]
  1406 %\addplot+[only marks] table {proteinsOrig.txt};
  1407 \addplot table {randGraph/iso/vf2pIso10_1.txt};
  1408 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1409         {randGraph/iso/vf2ppIso10_1.txt};
  1410 \end{axis}
  1411 \end{tikzpicture}
  1412 \end{center}
  1413 \end{subfigure}
  1414 %%\hspace{1cm}
  1415 \hspace*{-1.5cm}
  1416 \begin{subfigure}[b]{0.55\textwidth}
  1417 \begin{center}
  1418 \begin{tikzpicture}
  1419 \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1420 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1421   west},scaled x ticks = false,x tick label style={/pgf/number
  1422   format/1000 sep = \space}]
  1423 %\addplot+[only marks] table {proteinsOrig.txt};
  1424 \addplot table {randGraph/iso/vf2pIso15_1.txt};
  1425 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1426         {randGraph/iso/vf2ppIso15_1.txt};
  1427 \end{axis}
  1428 \end{tikzpicture}
  1429 \end{center}
  1430      \end{subfigure}
  1431      \begin{subfigure}[b]{0.55\textwidth}
  1432 \begin{center}
  1433 \begin{tikzpicture}
  1434 \begin{axis}[title={Random ISO, $\delta = 35$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1435 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1436   west},scaled x ticks = false,x tick label style={/pgf/number
  1437   format/1000 sep = \space}]
  1438 %\addplot+[only marks] table {proteinsOrig.txt};
  1439 \addplot table {randGraph/iso/vf2pIso35_1.txt};
  1440 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1441         {randGraph/iso/vf2ppIso35_1.txt};
  1442 \end{axis}
  1443 \end{tikzpicture}
  1444 \end{center}
  1445 \end{subfigure}
  1446 \begin{subfigure}[b]{0.55\textwidth}
  1447 \hspace*{-1.5cm}
  1448 \begin{tikzpicture}
  1449 \begin{axis}[title={Random ISO, $\delta = 45$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1450 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1451   west},scaled x ticks = false,x tick label style={/pgf/number
  1452   format/1000 sep = \space}]
  1453 %\addplot+[only marks] table {proteinsOrig.txt};
  1454 \addplot table {randGraph/iso/vf2pIso45_1.txt};
  1455 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1456         {randGraph/iso/vf2ppIso45_1.txt};
  1457 \end{axis}
  1458 \end{tikzpicture}
  1459 \end{subfigure}
  1460 \hspace*{-1.5cm}
  1461 \begin{subfigure}[b]{0.55\textwidth}
  1462 \begin{tikzpicture}
  1463 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1464 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1465   west},scaled x ticks = false,x tick label style={/pgf/number
  1466   format/1000 sep = \thinspace}]
  1467 %\addplot+[only marks] table {proteinsOrig.txt};
  1468 \addplot table {randGraph/iso/vf2pIso100_1.txt};
  1469 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1470         {randGraph/iso/vf2ppIso100_1.txt};
  1471 \end{axis}
  1472 \end{tikzpicture}
  1473 \end{subfigure}
  1474 \vspace*{-0.8cm}
  1475 \caption{IND on graphs having an average degree of
  1476   5.}\label{fig:randISO}
  1477 \end{figure}
  1478 
  1479 
  1480 
  1481 
  1482 
  1483 
  1484 
  1485 
  1486 
  1487 
  1488 Considering the graph isomorphism problem, VF2++ consistently
  1489 outperforms its rival especially on sparse graphs. The reason for the
  1490 slightly super linear behaviour of VF2++ on denser graphs is the
  1491 larger number of nodes in the BFS tree constructed in
  1492 Algorithm~\ref{alg:VF2PPPseu}.
  1493 
  1494 \subsubsection{Induced subgraph isomorphism}
  1495 This section provides a comparison of VF2++ and VF2 Plus in the case
  1496 of induced subgraph isomorphism. In addition to the size of the large
  1497 graph, that of the small graph dramatically influences the hardness of
  1498 a given problem too, so the overall picture is provided by examining
  1499 small graphs of various size.
  1500 
  1501 For each chart, a number $0<\rho< 1$ has been fixed and the following
  1502 has been executed 150 times. Generating a large graph $G_{large}$,
  1503 choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes,
  1504 and for all the 10 subgraphs find a mapping by using both the graph
  1505 matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
  1506 0.3, 0.6, 0.8, 0.95$ cases have been examined (see
  1507 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
  1508 \ref{fig:randIND35}, and for each $\delta$, a cumulative chart is
  1509 given as well, which excludes $\rho = 0.05$ and $0.1$ for the sake of
  1510 perspicuity (see Figure~\ref{fig:randIND5Sum}, \ref{fig:randIND10Sum}
  1511 and \ref{fig:randIND35Sum}).
  1512 
  1513 
  1514 
  1515 
  1516 
  1517 \begin{figure}[H]
  1518 \vspace*{-1.5cm}
  1519 \hspace*{-1.5cm}
  1520 \begin{subfigure}[b]{0.55\textwidth}
  1521 \begin{center}
  1522 \begin{tikzpicture}
  1523 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1524 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1525   west},scaled x ticks = false,x tick label style={/pgf/number
  1526   format/1000 sep = \space}]
  1527 %\addplot+[only marks] table {proteinsOrig.txt};
  1528 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
  1529 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1530         {randGraph/ind/vf2ppInd5_0.05.txt};
  1531 \end{axis}
  1532 \end{tikzpicture}
  1533 \end{center}
  1534      \end{subfigure}
  1535      \begin{subfigure}[b]{0.55\textwidth}
  1536 \begin{center}
  1537 \begin{tikzpicture}
  1538 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1539 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1540   west},scaled x ticks = false,x tick label style={/pgf/number
  1541   format/1000 sep = \space}]
  1542 %\addplot+[only marks] table {proteinsOrig.txt};
  1543 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
  1544 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1545         {randGraph/ind/vf2ppInd5_0.1.txt};
  1546 \end{axis}
  1547 \end{tikzpicture}
  1548 \end{center}
  1549 \end{subfigure}
  1550 \hspace*{-1.5cm}
  1551 \begin{subfigure}[b]{0.55\textwidth}
  1552 \begin{center}
  1553 \begin{tikzpicture}
  1554 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1555 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1556   west},scaled x ticks = false,x tick label style={/pgf/number
  1557   format/1000 sep = \space}]
  1558 %\addplot+[only marks] table {proteinsOrig.txt};
  1559 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
  1560 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1561         {randGraph/ind/vf2ppInd5_0.3.txt};
  1562 \end{axis}
  1563 \end{tikzpicture}
  1564 \end{center}
  1565      \end{subfigure}
  1566      \begin{subfigure}[b]{0.55\textwidth}
  1567 \begin{center}
  1568 \begin{tikzpicture}
  1569 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1570 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1571   west},scaled x ticks = false,x tick label style={/pgf/number
  1572   format/1000 sep = \space}]
  1573 %\addplot+[only marks] table {proteinsOrig.txt};
  1574 \addplot table {randGraph/ind/vf2pInd5_0.6.txt};
  1575 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1576         {randGraph/ind/vf2ppInd5_0.6.txt};
  1577 \end{axis}
  1578 \end{tikzpicture}
  1579 \end{center}
  1580 \end{subfigure}
  1581 \begin{subfigure}[b]{0.55\textwidth}
  1582 \hspace*{-1.5cm}
  1583 \begin{tikzpicture}
  1584 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1585 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1586   west},scaled x ticks = false,x tick label style={/pgf/number
  1587   format/1000 sep = \space}]
  1588 %\addplot+[only marks] table {proteinsOrig.txt};
  1589 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
  1590 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1591         {randGraph/ind/vf2ppInd5_0.8.txt};
  1592 \end{axis}
  1593 \end{tikzpicture}
  1594      \end{subfigure}
  1595      \hspace*{-1.5cm}
  1596      \begin{subfigure}[b]{0.55\textwidth}
  1597 \begin{tikzpicture}
  1598 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1599 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1600   west},scaled x ticks = false,x tick label style={/pgf/number
  1601   format/1000 sep = \thinspace}]
  1602 %\addplot+[only marks] table {proteinsOrig.txt};
  1603 \addplot table {randGraph/ind/vf2pInd5_0.95.txt};
  1604 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1605         {randGraph/ind/vf2ppInd5_0.95.txt};
  1606 \end{axis}
  1607 \end{tikzpicture}
  1608 \end{subfigure}
  1609 \vspace*{-0.8cm}
  1610 \caption{IND on graphs having an average degree of
  1611   5.}\label{fig:randIND5}
  1612 \end{figure}
  1613 
  1614 \begin{figure}[H]
  1615 \begin{center}
  1616 \hspace*{-2cm}
  1617 \begin{tikzpicture}
  1618 \begin{axis}[title={Rand IND Summary, $\delta = 5$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1619 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
  1620   west},scaled x ticks = false,x tick label style={/pgf/number
  1621   format/1000 sep = \thinspace}]
  1622 %\addplot+[only marks] table {proteinsOrig.txt};
  1623 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1624         {randGraph/ind/vf2pInd5_0.3.txt}; \addplot[mark=triangle*,mark
  1625           size=1.8pt,color=red] table
  1626         {randGraph/ind/vf2ppInd5_0.3.txt}; \addplot[mark=*,mark
  1627           size=1.5pt,color=blue] table
  1628         {randGraph/ind/vf2pInd5_0.6.txt}; \addplot[mark=triangle*,mark
  1629           size=1.8pt,color=red] table
  1630         {randGraph/ind/vf2ppInd5_0.6.txt}; \addplot[mark=*,mark
  1631           size=1.5pt,color=blue] table
  1632         {randGraph/ind/vf2pInd5_0.8.txt}; \addplot[mark=triangle*,mark
  1633           size=1.8pt,color=red] table
  1634         {randGraph/ind/vf2ppInd5_0.8.txt}; \addplot[mark=*,mark
  1635           size=1.5pt,color=blue] table
  1636         {randGraph/ind/vf2pInd5_0.95.txt};
  1637         \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1638                 {randGraph/ind/vf2ppInd5_0.95.txt};
  1639 \end{axis}
  1640 \end{tikzpicture}
  1641 \end{center}
  1642 \vspace*{-0.8cm}
  1643 \caption{Cummulative chart for $\delta=5$.}\label{fig:randIND5Sum}
  1644 \end{figure}
  1645 
  1646 
  1647 
  1648 \begin{figure}[H]
  1649 \vspace*{-1.5cm}
  1650 \hspace*{-1.5cm}
  1651 \begin{subfigure}[b]{0.55\textwidth}
  1652 \begin{center}
  1653 \hspace*{-0.5cm}
  1654 \begin{tikzpicture}
  1655 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1656 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1657   west},scaled x ticks = false,x tick label style={/pgf/number
  1658   format/1000 sep = \space}]
  1659 %\addplot+[only marks] table {proteinsOrig.txt};
  1660 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
  1661 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1662         {randGraph/ind/vf2ppInd10_0.05.txt};
  1663 \end{axis}
  1664 \end{tikzpicture}
  1665 \end{center}
  1666      \end{subfigure}
  1667      \begin{subfigure}[b]{0.55\textwidth}
  1668 \begin{center}
  1669      \hspace*{-0.5cm}
  1670 \begin{tikzpicture}
  1671 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1672 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1673   west},scaled x ticks = false,x tick label style={/pgf/number
  1674   format/1000 sep = \space}]
  1675 %\addplot+[only marks] table {proteinsOrig.txt};
  1676 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
  1677 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1678         {randGraph/ind/vf2ppInd10_0.1.txt};
  1679 \end{axis}
  1680 \end{tikzpicture}
  1681 \end{center}
  1682 \end{subfigure}
  1683 \hspace*{-1.5cm}
  1684 \begin{subfigure}[b]{0.55\textwidth}
  1685 \begin{center}
  1686 \begin{tikzpicture}
  1687 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1688 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1689   west},scaled x ticks = false,x tick label style={/pgf/number
  1690   format/1000 sep = \space}]
  1691 %\addplot+[only marks] table {proteinsOrig.txt};
  1692 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
  1693 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1694         {randGraph/ind/vf2ppInd10_0.3.txt};
  1695 \end{axis}
  1696 \end{tikzpicture}
  1697 \end{center}
  1698      \end{subfigure}
  1699      \begin{subfigure}[b]{0.55\textwidth}
  1700 \begin{center}
  1701 \begin{tikzpicture}
  1702 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1703 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1704   west},scaled x ticks = false,x tick label style={/pgf/number
  1705   format/1000 sep = \space}]
  1706 %\addplot+[only marks] table {proteinsOrig.txt};
  1707 \addplot table {randGraph/ind/vf2pInd10_0.6.txt};
  1708 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1709         {randGraph/ind/vf2ppInd10_0.6.txt};
  1710 \end{axis}
  1711 \end{tikzpicture}
  1712 \end{center}
  1713 \end{subfigure}
  1714 \hspace*{-1.5cm}
  1715 \begin{subfigure}[b]{0.55\textwidth}
  1716 \begin{tikzpicture}
  1717 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1718 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1719   west},scaled x ticks = false,x tick label style={/pgf/number
  1720   format/1000 sep = \space}]
  1721 %\addplot+[only marks] table {proteinsOrig.txt};
  1722 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
  1723 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1724         {randGraph/ind/vf2ppInd10_0.8.txt};
  1725 \end{axis}
  1726 \end{tikzpicture}
  1727      \end{subfigure}
  1728      \begin{subfigure}[b]{0.55\textwidth}
  1729 \begin{tikzpicture}
  1730 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1731 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1732   west},scaled x ticks = false,x tick label style={/pgf/number
  1733   format/1000 sep = \thinspace}]
  1734 %\addplot+[only marks] table {proteinsOrig.txt};
  1735 \addplot table {randGraph/ind/vf2pInd10_0.95.txt};
  1736 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1737         {randGraph/ind/vf2ppInd10_0.95.txt};
  1738 \end{axis}
  1739 \end{tikzpicture}
  1740 \end{subfigure}
  1741 \vspace*{-0.8cm}
  1742 \caption{IND on graphs having an average degree of
  1743   10.}\label{fig:randIND10}
  1744 \end{figure}
  1745 
  1746 \begin{figure}[H]
  1747 \begin{center}
  1748 \hspace*{-2cm}
  1749 \begin{tikzpicture}
  1750 \begin{axis}[title={Rand IND Summary, $\delta = 10$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1751 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
  1752   west},scaled x ticks = false,x tick label style={/pgf/number
  1753   format/1000 sep = \thinspace}]
  1754 %\addplot+[only marks] table {proteinsOrig.txt};
  1755 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1756         {randGraph/ind/vf2pInd10_0.3.txt};
  1757         \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1758                 {randGraph/ind/vf2ppInd10_0.3.txt};
  1759                 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1760                         {randGraph/ind/vf2pInd10_0.6.txt};
  1761                         \addplot[mark=triangle*,mark
  1762                           size=1.8pt,color=red] table
  1763                                 {randGraph/ind/vf2ppInd10_0.6.txt};
  1764                                 \addplot[mark=*,mark
  1765                                   size=1.5pt,color=blue] table
  1766                                         {randGraph/ind/vf2pInd10_0.8.txt};
  1767                                         \addplot[mark=triangle*,mark
  1768                                           size=1.8pt,color=red] table
  1769                                                 {randGraph/ind/vf2ppInd10_0.8.txt};
  1770                                                 \addplot[mark=*,mark
  1771                                                   size=1.5pt,color=blue]
  1772                                                 table
  1773                                                 {randGraph/ind/vf2pInd10_0.95.txt};
  1774                                                 \addplot[mark=triangle*,mark
  1775                                                   size=1.8pt,color=red]
  1776                                                 table
  1777                                                 {randGraph/ind/vf2ppInd10_0.95.txt};
  1778 \end{axis}
  1779 \end{tikzpicture}
  1780 \end{center}
  1781 \vspace*{-0.8cm}
  1782 \caption{Cummulative chart for $\delta=10$.}\label{fig:randIND10Sum}
  1783 \end{figure}
  1784 
  1785 
  1786 
  1787 \begin{figure}[H]
  1788 \vspace*{-1.5cm}
  1789 \hspace*{-1.5cm}
  1790 \begin{subfigure}[b]{0.55\textwidth}
  1791 \begin{center}
  1792 \begin{tikzpicture}
  1793 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1794 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1795   west},scaled x ticks = false,x tick label style={/pgf/number
  1796   format/1000 sep = \space}]
  1797 %\addplot+[only marks] table {proteinsOrig.txt};
  1798 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
  1799 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1800         {randGraph/ind/vf2ppInd35_0.05.txt};
  1801 \end{axis}
  1802 \end{tikzpicture}
  1803 \end{center}
  1804      \end{subfigure}
  1805      \begin{subfigure}[b]{0.55\textwidth}
  1806 \begin{center}
  1807 \begin{tikzpicture}
  1808 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1809 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1810   west},scaled x ticks = false,x tick label style={/pgf/number
  1811   format/1000 sep = \space}]
  1812 %\addplot+[only marks] table {proteinsOrig.txt};
  1813 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
  1814 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1815         {randGraph/ind/vf2ppInd35_0.1.txt};
  1816 \end{axis}
  1817 \end{tikzpicture}
  1818 \end{center}
  1819 \end{subfigure}
  1820 \hspace*{-1.5cm}
  1821 \begin{subfigure}[b]{0.55\textwidth}
  1822 \begin{center}
  1823 \begin{tikzpicture}
  1824 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1825 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1826   west},scaled x ticks = false,x tick label style={/pgf/number
  1827   format/1000 sep = \space}]
  1828 %\addplot+[only marks] table {proteinsOrig.txt};
  1829 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
  1830 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1831         {randGraph/ind/vf2ppInd35_0.3.txt};
  1832 \end{axis}
  1833 \end{tikzpicture}
  1834 \end{center}
  1835      \end{subfigure}
  1836      \begin{subfigure}[b]{0.55\textwidth}
  1837 \begin{center}
  1838 \begin{tikzpicture}
  1839 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1840 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1841   west},scaled x ticks = false,x tick label style={/pgf/number
  1842   format/1000 sep = \space}]
  1843 %\addplot+[only marks] table {proteinsOrig.txt};
  1844 \addplot table {randGraph/ind/vf2pInd35_0.6.txt};
  1845 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1846         {randGraph/ind/vf2ppInd35_0.6.txt};
  1847 \end{axis}
  1848 \end{tikzpicture}
  1849 \end{center}
  1850 \end{subfigure}
  1851 \hspace*{-1.5cm}
  1852 \begin{subfigure}[b]{0.55\textwidth}
  1853 \begin{tikzpicture}
  1854 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1855 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1856   west},scaled x ticks = false,x tick label style={/pgf/number
  1857   format/1000 sep = \space}]
  1858 %\addplot+[only marks] table {proteinsOrig.txt};
  1859 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
  1860 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1861         {randGraph/ind/vf2ppInd35_0.8.txt};
  1862 \end{axis}
  1863 \end{tikzpicture}
  1864      \end{subfigure}
  1865      \begin{subfigure}[b]{0.55\textwidth}
  1866 \begin{tikzpicture}
  1867 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1868 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1869   west},scaled x ticks = false,x tick label style={/pgf/number
  1870   format/1000 sep = \thinspace}]
  1871 %\addplot+[only marks] table {proteinsOrig.txt};
  1872 \addplot table {randGraph/ind/vf2pInd35_0.95.txt};
  1873 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1874         {randGraph/ind/vf2ppInd35_0.95.txt};
  1875 \end{axis}
  1876 \end{tikzpicture}
  1877 \end{subfigure}
  1878 \vspace*{-0.8cm}
  1879 \caption{IND on graphs having an average degree of
  1880   35.}\label{fig:randIND35}
  1881 \end{figure}
  1882 
  1883 \begin{figure}[H]
  1884 \begin{center}
  1885 \hspace*{-2cm}
  1886 \begin{tikzpicture}
  1887 \begin{axis}[title={Rand IND Summary, $\delta = 35$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1888 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
  1889   west},scaled x ticks = false,x tick label style={/pgf/number
  1890   format/1000 sep = \thinspace}]
  1891 %\addplot+[only marks] table {proteinsOrig.txt};
  1892 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1893         {randGraph/ind/vf2pInd35_0.3.txt};
  1894         \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1895                 {randGraph/ind/vf2ppInd35_0.3.txt};
  1896                 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1897                         {randGraph/ind/vf2pInd35_0.6.txt};
  1898                         \addplot[mark=triangle*,mark
  1899                           size=1.8pt,color=red] table
  1900                                 {randGraph/ind/vf2ppInd35_0.6.txt};
  1901                                 \addplot[mark=*,mark
  1902                                   size=1.5pt,color=blue] table
  1903                                         {randGraph/ind/vf2pInd35_0.8.txt};
  1904                                         \addplot[mark=triangle*,mark
  1905                                           size=1.8pt,color=red] table
  1906                                                 {randGraph/ind/vf2ppInd35_0.8.txt};
  1907                                                 \addplot[mark=*,mark
  1908                                                   size=1.5pt,color=blue]
  1909                                                 table
  1910                                                 {randGraph/ind/vf2pInd35_0.95.txt};
  1911                                                 \addplot[mark=triangle*,mark
  1912                                                   size=1.8pt,color=red]
  1913                                                 table
  1914                                                 {randGraph/ind/vf2ppInd35_0.95.txt};
  1915 \end{axis}
  1916 \end{tikzpicture}
  1917 \end{center}
  1918 \vspace*{-0.8cm}
  1919 \caption{Cummulative chart for $\delta=35$.}\label{fig:randIND35Sum}
  1920 \end{figure}
  1921 
  1922 Based on these experiments, VF2++ is faster than VF2 Plus and able to
  1923 handle really large graphs in milliseconds. Note that when $IND$ was
  1924 considered and the small graphs had proportionally few nodes ($\rho =
  1925 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
  1926 orders (e.g. see the $\delta=10$ case on
  1927 Figure~\ref{fig:randIND10}). If these examples had been excluded, the
  1928 charts would have seemed to be similar to the other ones.
  1929 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
  1930 Plus slow slightly down, but remain practically usable even on graphs
  1931 having 10 000 nodes.
  1932 
  1933 
  1934 
  1935 
  1936 
  1937 \section{Conclusion}
  1938 In this paper, after providing a short summary of the recent
  1939 algorithms, a new graph matching algorithm based on VF2, called VF2++,
  1940 has been presented and analyzed from a practical viewpoint.
  1941 
  1942 Recognizing the importance of the node order and determining an
  1943 efficient one, VF2++ is able to match graphs of thousands of nodes in
  1944 near practically linear time including preprocessing. In addition to
  1945 the proper order, VF2++ uses more efficient consistency and cutting
  1946 rules which are easy to compute and make the algorithm able to prune
  1947 most of the unfruitful branches without going astray.
  1948 
  1949 In order to show the efficiency of the new method, it has been
  1950 compared to VF2 Plus, which is the best concurrent algorithm based on
  1951 \cite{VF2Plus}.
  1952 
  1953 The experiments show that VF2++ consistently outperforms VF2 Plus on
  1954 biological graphs. It seems to be asymptotically faster on protein and
  1955 on contact map graphs in the case of induced subgraph isomorphism,
  1956 while in the case of graph isomorphism, it has definitely better
  1957 asymptotic behaviour on protein graphs.
  1958 
  1959 Regarding random sparse graphs, not only has VF2++ proved itself to be
  1960 faster than VF2 Plus, but it has a practically linear behaviour both
  1961 in the case of induced subgraph- and graph isomorphism, as well.
  1962 
  1963 
  1964 
  1965 %% The Appendices part is started with the command \appendix;
  1966 %% appendix sections are then done as normal sections
  1967 %% \appendix
  1968 
  1969 %% \section{}
  1970 %% \label{}
  1971 
  1972 %% If you have bibdatabase file and want bibtex to generate the
  1973 %% bibitems, please use
  1974 %%
  1975 \bibliographystyle{elsarticle-num} \bibliography{bibliography}
  1976 
  1977 %% else use the following coding to input the bibitems directly in the
  1978 %% TeX file.
  1979 
  1980 %% \begin{thebibliography}{00}
  1981 
  1982 %% %% \bibitem{label}
  1983 %% %% Text of bibliographic item
  1984 
  1985 %% \bibitem{}
  1986 
  1987 %% \end{thebibliography}
  1988 
  1989 \end{document}
  1990 \endinput
  1991 %%
  1992 %% End of file `elsarticle-template-num.tex'.