damecco.tex
author Alpar Juttner <alpar@cs.elte.hu>
Tue, 22 Nov 2016 14:11:49 +0100
changeset 4 c682d3237141
parent 3 550f81b2f81c
child 5 c04b029e623a
permissions -rw-r--r--
Reformatting II
     1 %% 
     2 %% Copyright 2007, 2008, 2009 Elsevier Ltd
     3 %% 
     4 %% This file is part of the 'Elsarticle Bundle'.
     5 %% ---------------------------------------------
     6 %% 
     7 %% It may be distributed under the conditions of the LaTeX Project Public
     8 %% License, either version 1.2 of this license or (at your option) any
     9 %% later version.  The latest version of this license is in
    10 %%    http://www.latex-project.org/lppl.txt
    11 %% and version 1.2 or later is part of all distributions of LaTeX
    12 %% version 1999/12/01 or later.
    13 %% 
    14 %% The list of all files belonging to the 'Elsarticle Bundle' is
    15 %% given in the file `manifest.txt'.
    16 %% 
    17 
    18 %% Template article for Elsevier's document class `elsarticle'
    19 %% with numbered style bibliographic references
    20 %% SP 2008/03/01
    21 
    22 \documentclass[preprint,12pt]{elsarticle}
    23 
    24 %% Use the option review to obtain double line spacing
    25 %% \documentclass[authoryear,preprint,review,12pt]{elsarticle}
    26 
    27 %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
    28 %% for a journal layout:
    29 %% \documentclass[final,1p,times]{elsarticle}
    30 %% \documentclass[final,1p,times,twocolumn]{elsarticle}
    31 %% \documentclass[final,3p,times]{elsarticle}
    32 %% \documentclass[final,3p,times,twocolumn]{elsarticle}
    33 %% \documentclass[final,5p,times]{elsarticle}
    34 %% \documentclass[final,5p,times,twocolumn]{elsarticle}
    35 
    36 %% For including figures, graphicx.sty has been loaded in
    37 %% elsarticle.cls. If you prefer to use the old commands
    38 %% please give \usepackage{epsfig}
    39 
    40 %% The amssymb package provides various useful mathematical symbols
    41 \usepackage{amssymb}
    42 %% The amsthm package provides extended theorem environments
    43 %% \usepackage{amsthm}
    44 
    45 %% The lineno packages adds line numbers. Start line numbering with
    46 %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
    47 %% for the whole article with \linenumbers.
    48 %% \usepackage{lineno}
    49 
    50 \usepackage{amsmath}
    51 %% \usepackage[pdftex]{graphicx}
    52 
    53 \usepackage{pgfplots}
    54 \pgfplotsset{width=9cm}
    55 \pgfplotsset{compat=1.8}
    56 
    57 \usepackage{caption}
    58 \usepackage{subcaption} 
    59 
    60 \usepackage{algorithm}
    61 \usepackage{algpseudocode}
    62 \usepackage{tikz}
    63 
    64 \usepackage{amsthm,amssymb}
    65 \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}
    66 
    67 \newtheorem{theorem}{Theorem}[subsection]
    68 \newtheorem{corollary}{Corollary}[theorem]
    69 \newtheorem{claim}[theorem]{Claim}
    70 
    71 \newtheorem{definition}{Definition}[subsection]
    72 \newtheorem{notation}{Notation}[subsection]
    73 \newtheorem{example}{Example}[subsection]
    74 \usetikzlibrary{decorations.markings}
    75 \let\oldproofname=\proofname
    76 %% \renewcommand{\proofname}{\rm\bf{Proof:}}
    77 
    78 \journal{Discrete Applied Mathematics}
    79 
    80 \begin{document}
    81 
    82 \begin{frontmatter}
    83 
    84 %% Title, authors and addresses
    85 
    86 %% use the tnoteref command within \title for footnotes;
    87 %% use the tnotetext command for theassociated footnote;
    88 %% use the fnref command within \author or \address for footnotes;
    89 %% use the fntext command for theassociated footnote;
    90 %% use the corref command within \author for corresponding author footnotes;
    91 %% use the cortext command for theassociated footnote;
    92 %% use the ead command for the email address,
    93 %% and the form \ead[url] for the home page:
    94 %% \title{Title\tnoteref{label1}}
    95 %% \tnotetext[label1]{}
    96 %% \author{Name\corref{cor1}\fnref{label2}}
    97 %% \ead{email address}
    98 %% \ead[url]{home page}
    99 %% \fntext[label2]{}
   100 %% \cortext[cor1]{}
   101 %% \address{Address\fnref{label3}}
   102 %% \fntext[label3]{}
   103 
   104 \title{Improved Algorithms for Matching Biological Graphs}
   105 
   106 %% use optional labels to link authors explicitly to addresses:
   107 %% \author[label1,label2]{}
   108 %% \address[label1]{}
   109 %% \address[label2]{}
   110 
   111 \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi}
   112 
   113 \address{Dept of Operations Research, ELTE}
   114 
   115 \begin{abstract}
   116 Subgraph isomorphism is a well-known NP-Complete problem, while its
   117 special case, the graph isomorphism problem is one of the few problems
   118 in NP neither known to be in P nor NP-Complete. Their appearance in
   119 many fields of application such as pattern analysis, computer vision
   120 questions and the analysis of chemical and biological systems has
   121 fostered the design of various algorithms for handling special graph
   122 structures.
   123 
   124 The idea of using state space representation and checking some
   125 conditions in each state to prune the search tree has made the VF2
   126 algorithm one of the state of the art graph matching algorithms for
   127 more than a decade. Recently, biological questions of ever increasing
   128 importance have required more efficient, specialized algorithms.
   129 
   130 This paper presents VF2++, a new algorithm based on the original VF2,
   131 which runs significantly faster on most test cases and performs
   132 especially well on special graph classes stemming from biological
   133 questions. VF2++ handles graphs of thousands of nodes in practically
   134 near linear time including preprocessing. Not only is it an improved
   135 version of VF2, but in fact, it is by far the fastest existing
   136 algorithm regarding biological graphs.
   137 
   138 The reason for VF2++' superiority over VF2 is twofold. Firstly, taking
   139 into account the structure and the node labeling of the graph, VF2++
   140 determines a state order in which most of the unfruitful branches of
   141 the search space can be pruned immediately. Secondly, introducing more
   142 efficient - nevertheless still easier to compute - cutting rules
   143 reduces the chance of going astray even further.
   144 
   145 In addition to the usual subgraph isomorphism, specialized versions
   146 for induced subgraph isomorphism and for graph isomorphism are
   147 presented. VF2++ has gained a runtime improvement of one order of
   148 magnitude respecting induced subgraph isomorphism and a better
   149 asymptotical behaviour in the case of graph isomorphism problem.
   150 
   151 After having provided the description of VF2++, in order to evaluate
   152 its effectiveness, an extensive comparison to the contemporary other
   153 algorithms is shown, using a wide range of inputs, including both real
   154 life biological and chemical datasets and standard randomly generated
   155 graph series.
   156 
   157 The work was motivated and sponsored by QuantumBio Inc., and all the
   158 developed algorithms are available as the part of the open source
   159 LEMON graph and network optimization library
   160 (http://lemon.cs.elte.hu).
   161 \end{abstract}
   162 
   163 \begin{keyword}
   164 %% keywords here, in the form: keyword \sep keyword
   165 
   166 %% PACS codes here, in the form: \PACS code \sep code
   167 
   168 %% MSC codes here, in the form: \MSC code \sep code
   169 %% or \MSC[2008] code \sep code (2000 is the default)
   170 
   171 \end{keyword}
   172 
   173 \end{frontmatter}
   174 
   175 %% \linenumbers
   176 
   177 %% main text
   178 \section{Introduction}
   179 \label{sec:intro}
   180 
   181 In the last decades, combinatorial structures, and especially graphs
   182 have been considered with ever increasing interest, and applied to the
   183 solution of several new and revised questions.  The expressiveness,
   184 the simplicity and the studiedness of graphs make them practical for
   185 modelling and appear constantly in several seemingly independent
   186 fields.  Bioinformatics and chemistry are amongst the most relevant
   187 and most important fields.
   188 
   189 Complex biological systems arise from the interaction and cooperation
   190 of plenty of molecular components. Getting acquainted with such
   191 systems at the molecular level has primary importance, since
   192 protein-protein interaction, DNA-protein interaction, metabolic
   193 interaction, transcription factor binding, neuronal networks, and
   194 hormone signaling networks can be understood only this way.
   195 
   196 For instance, a molecular structure can be considered as a graph,
   197 whose nodes correspond to atoms and whose edges to chemical bonds. The
   198 secondary structure of a protein can also be represented as a graph,
   199 where nodes are associated with aminoacids and the edges with hydrogen
   200 bonds. The nodes are often whole molecular components and the edges
   201 represent some relationships among them.  The similarity and
   202 dissimilarity of objects corresponding to nodes are incorporated to
   203 the model by \emph{node labels}.  Many other chemical and biological
   204 structures can easily be modeled in a similar way. Understanding such
   205 networks basically requires finding specific subgraphs, which can not
   206 avoid the application of graph matching algorithms.
   207 
   208 Finally, let some of the other real-world fields related to some
   209 variants of graph matching be briefly mentioned: pattern recognition
   210 and machine vision \cite{HorstBunkeApplications}, symbol recognition
   211 \cite{CordellaVentoSymbolRecognition}, face identification
   212 \cite{JianzhuangYongFaceIdentification}.  \\
   213 
   214 Subgraph and induced subgraph matching problems are known to be
   215 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is
   216 one of the few problems in NP neither known to be in P nor
   217 NP-Complete. Although polynomial time isomorphism algorithms are known
   218 for various graph classes, like trees and planar
   219 graphs\cite{PlanarGraphIso}, bounded valence
   220 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
   221 or permutation graphs\cite{PermGraphIso}.
   222 
   223 In the following, some algorithms based on other approaches are
   224 summarized, which do not need any restrictions on the graphs. However,
   225 an overall polynomial behaviour is not expectable from such an
   226 alternative, it may often have good performance, even on a graph class
   227 for which polynomial algorithm is known. Note that this summary
   228 containing only exact matching algorithms is far not complete, neither
   229 does it cover all the recent algorithms.
   230 
   231 The first practically usable approach was due to
   232 Ullmann\cite{Ullmann} which is a commonly used depth-first
   233 search based algorithm with a complex heuristic for reducing the
   234 number of visited states. A major problem is its $\Theta(n^3)$ space
   235 complexity, which makes it impractical in the case of big sparse
   236 graphs.
   237 
   238 In a recent paper, Ullmann\cite{UllmannBit} presents an
   239 improved version of this algorithm based on a bit-vector solution for
   240 the binary Constraint Satisfaction Problem.
   241 
   242 The Nauty algorithm\cite{Nauty} transforms the two graphs to
   243 a canonical form before starting to check for the isomorphism. It has
   244 been considered as one of the fastest graph isomorphism algorithms,
   245 although graph categories were shown in which it takes exponentially
   246 many steps. This algorithm handles only the graph isomorphism problem.
   247 
   248 The \emph{LAD} algorithm\cite{Lad} uses a depth-first search
   249 strategy and formulates the matching as a Constraint Satisfaction
   250 Problem to prune the search tree. The constraints are that the mapping
   251 has to be injective and edge-preserving, hence it is possible to
   252 handle new matching types as well.
   253 
   254 The \textbf{RI} algorithm\cite{RI} and its variations are based on a
   255 state space representation. After reordering the nodes of the graphs,
   256 it uses some fast executable heuristic checks without using any
   257 complex pruning rules. It seems to run really efficiently on graphs
   258 coming from biology, and won the International Contest on Pattern
   259 Search in Biological Databases\cite{Content}.
   260 
   261 The currently most commonly used algorithm is the
   262 \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was
   263 designed for solving pattern matching and computer vision problems,
   264 and has been one of the best overall algorithms for more than a
   265 decade. Although, it can't be up to new specialized algorithms, it is
   266 still widely used due to its simplicity and space efficiency. VF2 uses
   267 a state space representation and checks some conditions in each state
   268 to prune the search tree.
   269 
   270 Our first graph matching algorithm was the first version of VF2 which
   271 recognizes the significance of the node ordering, more opportunities
   272 to increase the cutting efficiency and reduce its computational
   273 complexity. This project was initiated and sponsored by QuantumBio
   274 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
   275 code --- has been published as a part of LEMON\cite{LEMON} open source
   276 graph library.
   277 
   278 This paper introduces \textbf{VF2++}, a new further improved algorithm
   279 for the graph and (induced)subgraph isomorphism problem, which uses
   280 efficient cutting rules and determines a node order in which VF2 runs
   281 significantly faster on practical inputs.
   282 
   283 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has
   284 been published. It is considered to be as efficient as the RI
   285 algorithm and has a strictly better behavior on large graphs.  The
   286 main idea of VF2 Plus is to precompute a heuristic node order of the
   287 small graph, in which the VF2 works more efficiently.
   288 
   289 \section{Problem Statement}
   290 This section provides a detailed description of the problems to be
   291 solved.
   292 \subsection{Definitions}
   293 
   294 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and
   295 $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.
   296 \begin{definition}\label{sec:ismorphic}
   297 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M:
   298   V_{small} \longrightarrow V_{large}$ bijection, for which the
   299   following is true:
   300 \begin{center}
   301 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
   302   (M(u),M(v))\in{E_{large}}$
   303 \end{center}
   304 \end{definition}
   305 For the sake of simplicity in this paper subgraphs and induced
   306 subgraphs are defined in a more general way than usual:
   307 \begin{definition}
   308 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I:
   309   V_{small}\longrightarrow V_{large}$ injection, for which the
   310   following is true:
   311 \begin{center}
   312 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$
   313 \end{center}
   314 \end{definition}
   315 
   316 \begin{definition} 
   317 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists
   318   I: V_{small}\longrightarrow V_{large}$ injection, for which the
   319   following is true:
   320 \begin{center}
   321 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
   322   (I(u),I(v))\in E_{large}$
   323 \end{center}
   324 \end{definition}
   325 
   326 \begin{definition}
   327 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node
   328     label function}, where K is an arbitrary set. The elements in K
   329   are the \textbf{node labels}. Two nodes, u and v are said to be
   330   \textbf{equivalent}, if $lab(u)=lab(v)$.
   331 \end{definition}
   332 
   333 When node labels are also given, the matched nodes must have the same
   334 labels.  For example, the node labeled isomorphism is phrased by
   335 \begin{definition}
   336 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label
   337     function lab} if $\exists M: V_{small} \longrightarrow V_{large}$
   338   bijection, for which the following is true:
   339 \begin{center}
   340 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
   341   (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} :
   342   lab(u)=lab(M(u)))$
   343 \end{center}
   344 \end{definition}
   345 
   346 The other two definitions can be extended in the same way.
   347 
   348 Note that edge label function can be defined similarly to node label
   349 function, and all the definitions can be extended with additional
   350 conditions, but it is out of the scope of this work.
   351 
   352 The equivalence of two nodes is usually defined by another relation,
   353 $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the
   354 definition given above if R is an equivalence relation, which does not
   355 mean restriction in biological and chemical applications.
   356 
   357 \subsection{Common problems}\label{sec:CommProb}
   358 
   359 The focus of this paper is on two extensively studied topics, the
   360 subgraph isomorphism and its variations. However, the following
   361 problems also appear in many applications.
   362 
   363 The \textbf{subgraph matching problem} is the following: is
   364 $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node
   365 label?
   366 
   367 The \textbf{induced subgraph matching problem} asks the same about the
   368 existence of an induced subgraph.
   369 
   370 The \textbf{graph isomorphism problem} can be defined as induced
   371 subgraph matching problem where the sizes of the two graphs are equal.
   372 
   373 In addition to existence, it may be needed to show such a subgraph, or
   374 it may be necessary to list all of them.
   375 
   376 It should be noted that some authors misleadingly refer to the term
   377 \emph{subgraph isomorphism problem} as an \emph{induced subgraph
   378   isomorphism problem}.
   379 
   380 The following sections give the descriptions of VF2, VF2++, VF2 Plus
   381 and a particular comparison.
   382 
   383 \section{The VF2 Algorithm}
   384 This algorithm is the basis of both the VF2++ and the VF2 Plus.  VF2
   385 is able to handle all the variations mentioned in Section
   386   \ref{sec:CommProb}.  Although it can also handle directed graphs,
   387 for the sake of simplicity, only the undirected case will be
   388 discussed.
   389 
   390 
   391 \subsection{Common notations}
   392 \indent Assume $G_{small}$ is searched in $G_{large}$.  The following
   393 definitions and notations will be used throughout the whole paper.
   394 \begin{definition}
   395 A set $M\subseteq V_{small}\times V_{large}$ is called
   396 \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears
   397 in more than one pair in M.  That is, M uniquely associates some of
   398 the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice
   399 versa.
   400 \end{definition}
   401 
   402 \begin{definition}
   403 Mapping M \textbf{covers} a node v, if there exists a pair in M, which
   404 contains v.
   405 \end{definition}
   406 
   407 \begin{definition}
   408 A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the
   409 nodes in $V_{small}$.
   410 \end{definition}
   411 
   412 \begin{notation}
   413 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in
   414 V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in
   415 V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.
   416 \end{notation}
   417 
   418 \begin{notation}
   419 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node
   420 exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$
   421 and $v\in V_{small}\cup V_{large}$.
   422 \end{notation}
   423 
   424 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique
   425 
   426 The definitions of the isomorphism types can be rephrased on the
   427 existence of a special whole mapping $M$, since it represents a
   428 bijection. For example
   429 \begin{center}
   430 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph
   431   isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v
   432   \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
   433   (Pair(M,u),Pair(M,v))\in E_{large}$.
   434 \end{center}
   435 
   436 \begin{definition}
   437 A set of whole mappings is called \textbf{problem type}.
   438 \end{definition}
   439 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
   440 which can be substituted by any problem type.
   441 
   442 A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using
   443 this notations, VF2 searches a whole mapping $W$ of type $PT$.
   444 
   445 For example the problem type of graph isomorphism problem is the
   446 following.  A whole mapping $W$ is in $\mathbf{ISO}$, iff the
   447 bijection represented by $W$ satisfies Definition~\ref{sec:ismorphic}.
   448 The subgraph- and induced subgraph matching problems can be formalized
   449 in a similar way. Let their problem types be denoted as $\mathbf{SUB}$
   450 and $\mathbf{IND}$.
   451 
   452 \begin{definition}
   453 \label{expPT}
   454 $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in
   455 PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow
   456 (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of
   457 $G_{small}$ has to be mapped to an edge of $G_{large}$ for each
   458 mapping in $PT$.
   459 \end{definition}
   460 
   461 Note that $ISO$, $SUB$ and $IND$ are expanding problem types.
   462 
   463 This paper deals with the three problem types mentioned above only,
   464 but the following generic definitions make it possible to handle other
   465 types as well.  Although it may be challenging to find a proper
   466 consistency function and an efficient cutting function.
   467 
   468 \begin{definition}
   469 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
   470 \textbf{consistency function by } $\mathbf{PT}$, if the following
   471 holds. If there exists whole mapping $W$ of type PT for which
   472 $M\subseteq W$, then $Cons_{PT}(M)$ is true.
   473 \end{definition}
   474 
   475 \begin{definition} 
   476 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
   477 \textbf{cutting function by } $\mathbf{PT}$, if the following
   478 holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a
   479 whole mapping W of type PT.
   480 \end{definition}
   481 
   482 \begin{definition}
   483 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if
   484   $Cons_{PT}(M)$ is true.
   485 \end{definition}
   486 
   487 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
   488 \begin{notation}
   489 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and
   490 $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where
   491 $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping.
   492 \end{notation}
   493 
   494 $Cons_{PT}$ will be used to check the consistency of the already
   495 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
   496 no whole consistent mapping can contain the current mapping.
   497 
   498 \subsection{Overview of the algorithm}
   499 VF2 uses a state space representation of mappings, $Cons_{PT}$ for
   500 excluding inconsistency with the problem type and $Cut_{PT}$ for
   501 pruning the search tree.  Each state $s$ of the matching process can
   502 be associated with a mapping $M(s)$.
   503 
   504 Algorithm~\ref{alg:VF2Pseu} is a high level description of
   505 the VF2 matching algorithm.
   506 
   507 
   508 \begin{algorithm}
   509 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
   510 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
   511 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
   512 \begin{algorithmic}[1]
   513 
   514 \Procedure{VF2}{State $s$, ProblemType $PT$} \If{$M(s$) covers
   515   $V_{small}$} \State Output($M(s)$) \Else
   516   
   517   \State Compute the set $P(s)$ of the pairs candidate for inclusion
   518   in $M(s)$ \ForAll{$p\in{P(s)}$} \If{Cons$_{PT}$($p, M(s)$) $\wedge$
   519     $\neg$Cut$_{PT}$($p, M(s)$)} \State Compute the nascent state
   520   $\tilde{s}$ by adding $p$ to $M(s)$ \State \textbf{call}
   521   VF2($\tilde{s}$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
   522 \end{algorithmic}
   523 \end{algorithm}
   524 
   525 
   526 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it
   527 starts with an empty mapping.
   528 
   529 For each state $s$, the algorithm computes $P(s)$, the set of
   530 candidate node pairs for adding to the current state $s$.
   531 
   532 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and
   533 $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and
   534 $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup
   535 \{p\}$ is computed, and the whole process is recursively applied to
   536 $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it
   537 can be proved that $s$ can not be extended to a whole mapping.
   538 
   539 In order to make sure of the correctness see
   540 \begin{claim}
   541 Through consistent mappings, only consistent whole mappings can be
   542 reached, and all of the whole mappings are reachable through
   543 consistent mappings.
   544 \end{claim}
   545 
   546 Note that a state may be reached in many different ways, since the
   547 order of insertions into M does not influence the nascent mapping. In
   548 fact, the number of different ways which lead to the same state can be
   549 exponentially large. If $G_{small}$ and $G_{large}$ are circles with n
   550 nodes and n different node labels, there exists exactly one graph
   551 isomorphism between them, but it will be reached in $n!$ different
   552 ways.
   553 
   554 However, one may observe
   555 
   556 \begin{claim}
   557 \label{claim:claimTotOrd}
   558 Let $\prec$ an arbitrary total ordering relation on $V_{small}$.  If
   559 the algorithm ignores each $p=(u,v) \in P(s)$, for which
   560 \begin{center}
   561 $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$,
   562 \end{center}
   563 then no state can be reached more than ones and each state associated
   564 with a whole mapping remains reachable.
   565 \end{claim}
   566 
   567 Note that the cornerstone of the improvements to VF2 is a proper
   568 choice of a total ordering.
   569 
   570 \subsection{The candidate set P(s)}
   571 \label{candidateComputingVF2}
   572 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.
   573 Suppose that $PT$ is an expanding problem type, see
   574 Definition~\ref{expPT}.
   575 
   576 \begin{notation}
   577 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by
   578 $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}}
   579 \wedge \tilde{u}$ is covered by $M(s)\}$, and
   580 \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not
   581 covered by
   582 $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}}
   583 \wedge \tilde{v}$ is covered by $M(s)\}$
   584 \end{notation}
   585 
   586 The set $P(s)$ includes the pairs of uncovered neighbours of covered
   587 nodes and if there is not such a node pair, all the pairs containing
   588 two uncovered nodes are added. Formally, let
   589 \[
   590  P(s)\!=\!
   591   \begin{cases} 
   592    T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if }
   593    T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq
   594    \emptyset,\\ (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s))
   595    &\hspace{-0.15cm}otherwise.
   596   \end{cases}
   597 \]
   598 
   599 \subsection{Consistency}
   600 This section defines the consistency functions for the different
   601 problem types mentioned in Section~\ref{sec:CommProb}.
   602 \begin{notation}
   603 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} :
   604 (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large}
   605   (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$
   606 \end{notation}
   607 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is
   608 a state of the matching procedure, $M(s)$ is consistent mapping by
   609 $PT$ and $lab(u)=lab(v)$.  $Cons_{PT}(p,M(s))$ checks whether
   610 including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.
   611 
   612 \subsubsection{Induced subgraph isomorphism}
   613 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow
   614 (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small}
   615 \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline The
   616 following formulation gives an efficient way of calculating
   617 $Cons_{IND}$.
   618 \begin{claim}
   619 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v)
   620   \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge
   621   (\forall \tilde{u}\in \Gamma_{small}(u)
   622   \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
   623   consistency function in the case of $IND$.
   624 \end{claim}
   625 
   626 \subsubsection{Graph isomorphism}
   627 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$
   628 $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by
   629 $IND$.
   630 \begin{claim}
   631 $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and
   632   only if it is a consistency function by $IND$.
   633 \end{claim}
   634 \subsubsection{Subgraph isomorphism}
   635 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow
   636 (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small}
   637 \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
   638 \newline
   639 The following formulation gives an efficient way of calculating
   640 $Cons_{SUB}$.
   641 \begin{claim}
   642 $Cons_{SUB}((u,v),M(s)):= (\forall \tilde{u}\in \Gamma_{small}(u)
   643   \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
   644   consistency function by $SUB$.
   645 \end{claim}
   646 
   647 \subsection{Cutting rules}
   648 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently
   649 verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can
   650 be true only if it is impossible to extended $M(s)\cup \{p\}$ to a
   651 whole mapping.
   652 \begin{notation}
   653 
   654 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash
   655 M_{small}(s))\backslash T_{small}(s)$, and
   656 \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash
   657 M_{large}(s))\backslash T_{large}(s)$.
   658 \end{notation}
   659 \subsubsection{Induced subgraph isomorphism}
   660 \begin{claim}
   661 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
   662   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
   663   \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap
   664   \tilde{T}_{small}(s)|$ is a cutting function by $IND$.
   665 \end{claim}
   666 \subsubsection{Graph isomorphism}
   667 Note that the cutting function of induced subgraph isomorphism defined
   668 above is a cutting function by $ISO$, too, however it is less
   669 efficient than the following while their computational complexity is
   670 the same.
   671 \begin{claim}
   672 $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq
   673   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
   674   \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap
   675   \tilde{T}_{small}(s)|$ is a cutting function by $ISO$.
   676 \end{claim}
   677 
   678 \subsubsection{Subgraph isomorphism}
   679 \begin{claim}
   680 $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
   681   |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by
   682   $SUB$.
   683 \end{claim}
   684 Note that there is a significant difference between induced and
   685 non-induced subgraph isomorphism:
   686 
   687 \begin{claim}
   688 \label{claimSUB}
   689 $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
   690 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
   691 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$
   692 is \textbf{not} a cutting function by $SUB$.
   693 \end{claim}
   694 \begin{proof}$ $\\
   695 \vspace*{-0.5cm}
   696 
   697 \begin{figure}
   698 \begin{center}
   699 \begin{tikzpicture}
   700   [scale=.8,auto=left,every node/.style={circle,fill=black!15}]
   701   \node[rectangle,fill=black!15] at (4,6) {$G_{small}$}; \node (u4) at
   702   (2.5,10) {$u_4$}; \node (u3) at (5.5,10) {$u_3$}; \node (u1) at
   703   (2.5,7) {$u_1$}; \node (u2) at (5.5,7) {$u_2$};
   704   
   705   \node[rectangle,fill=black!30] at (13.5,6) {$G_{large}$};
   706   \node[fill=black!30] (v4) at (12,10) {$v_4$}; \node[fill=black!30]
   707   (v3) at (15,10) {$v_3$}; \node[fill=black!30] (v1) at (12,7)
   708        {$v_1$}; \node[fill=black!30] (v2) at (15,7) {$v_2$};
   709   
   710 
   711   \foreach \from/\to in {u1/u2,u2/u3,u3/u4,u4/u1} \draw (\from) --
   712   (\to); \foreach \from/\to in {v1/v2,v2/v3,v3/v4,v4/v1,v1/v3} \draw
   713   (\from) -- (\to);
   714 %    \draw[dashed] (\from) -- (\to);
   715 \end{tikzpicture}
   716 \caption{Graphs for the proof of Claim~\ref{claimSUB}}
   717 \label{fig:proofSUB}
   718 \end{center}
   719 \end{figure}
   720 Let the two graphs of Figure~\ref{fig:proofSUB} be the input
   721 graphs.  Suppose the total ordering relation is $u_1 \prec u_2 \prec
   722 u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add
   723 $(u_2,v_2)\in P(s)$.\newline $Cons_{SUB}((u_2,v_2),M(s))=true$, so
   724 $M\cup \{(u_2,v_2)\}$ is consistent by $SUB$. The cutting function
   725 $Cut_{SUB}((u_2,v_2),M(s))$ is false, so it does not let cut the
   726 tree.\newline On the other hand $Cut_{SUB}'((u_2,v_2),M(s))$ is true,
   727 since\\$0=|\Gamma_{large}(v_2)\cap
   728 \tilde{T}_{large}(s)|<|\Gamma_{small}(u_2)\cap
   729 \tilde{T}_{small}(s)|=1$ is true, but still the tree can not be
   730 pruned, because otherwise the
   731 $\{(u_1,v_1)(u_2,v_2)(u_3,v_3)(u_4,v_4)\}$ mapping can not be found.
   732 \end{proof}
   733 
   734 \section{The VF2++ Algorithm}
   735 Although any total ordering relation makes the search space of VF2 a
   736 tree, its choice turns out to dramatically influence the number of
   737 visited states. The goal is to determine an efficient one as quickly
   738 as possible.
   739 
   740 The main reason for VF2++' superiority over VF2 is twofold. Firstly,
   741 taking into account the structure and the node labeling of the graph,
   742 VF2++ determines a state order in which most of the unfruitful
   743 branches of the search space can be pruned immediately. Secondly,
   744 introducing more efficient --- nevertheless still easier to compute
   745 --- cutting rules reduces the chance of going astray even further.
   746 
   747 In addition to the usual subgraph isomorphism, specialized versions
   748 for induced subgraph isomorphism and for graph isomorphism have been
   749 designed. VF2++ has gained a runtime improvement of one order of
   750 magnitude respecting induced subgraph isomorphism and a better
   751 asymptotical behaviour in the case of graph isomorphism problem.
   752 
   753 Note that a weaker version of the cutting rules and the more efficient
   754 candidate set calculating were described in \cite{VF2Plus}, too.
   755 
   756 It should be noted that all the methods described in this section are
   757 extendable to handle directed graphs and edge labels as well.
   758 
   759 The basic ideas and the detailed description of VF2++ are provided in
   760 the following.
   761 
   762 \subsection{Preparations}
   763 \begin{claim}
   764 \label{claim:claimCoverFromLeft}
   765 The total ordering relation uniquely determines a node order, in which
   766 the nodes of $V_{small}$ will be covered by VF2. From the point of
   767 view of the matching procedure, this means, that always the same node
   768 of $G_{small}$ will be covered on the d-th level.
   769 \end{claim}
   770 \begin{proof}
   771 In order to make the search space a tree, the pairs in $\{(u,v)\in
   772 P(s) : \exists \hat{u} : \hat{u}\prec u\}$ are excluded from $P(s)$.
   773 \newline
   774 Let $\tilde{P}(s):=P(s)\backslash \{(u,v)\in P(s) : \exists \hat{u} :
   775 \hat{u}\prec u\}$
   776 \newline
   777 The relation $\prec$ is a total ordering, so $\exists!\ \tilde{u} :
   778 \forall\ (u,v)\in \tilde{P}(s): u=\tilde u$. Since a pair form
   779 $\tilde{P}(s)$ is chosen for including into $M(s)$, it is obvious,
   780 that only $\tilde{u}$ can be covered in $V_{small}$. Actually,
   781 $\tilde{u}$ is the smallest element in $T_{small}(s)$ (or in
   782 $V_{small}\backslash M_{small}(s)$, if $T_{small}(s)$ were empty), and
   783 $T_{small}(s)$ depends only on the covered nodes of $G_{small}$.
   784 \newline
   785 Simple induction on $d$ shows that the set of covered nodes of
   786 $G_{small}$ is unique, if $d$ is given, so $\tilde{u}$ is unique if
   787 $d$ is given.
   788 \end{proof}
   789 
   790 \begin{definition}
   791 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
   792 $V_{small}$ is \textbf{matching order}, if exists $\prec$ total
   793 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
   794 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$.
   795 \end{definition}
   796 
   797 \begin{claim}\label{claim:MOclaim}
   798 A total ordering is matching order, iff the nodes of every component
   799 form an interval in the node sequence, and every node connects to a
   800 previous node in its component except the first node of the
   801 component. The order of the components is arbitrary.  \\Formally
   802 spoken, an order
   803 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
   804 $V_{small}$ is matching order $\Leftrightarrow$ $\forall
   805 G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall
   806 i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
   807 V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq
   808 l\leq i \Rightarrow u_{l}\in V'_{small}) \wedge
   809 (u_{\sigma{(k)}},u_{\sigma{(i)}})\in E'_{small}$, where $i,j,k,l\in
   810 \{1,..,|V_{small}|\}$\newline
   811 \end{claim}
   812 \begin{proof}
   813 Suppose a matching order is given. It has to be shown that the node
   814 sequence has a structure described above.\\ Let
   815 $G'_{small}=(V'_{small},E'_{small})$ be an arbitrary component and $i$
   816 an arbitrary index.
   817 \newline
   818 $(\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
   819 V'_{small})\Rightarrow u_{\sigma(i)}$ is not the first covered node in
   820 $G_{small}\ \Rightarrow u_{\sigma(i)}$ is connected to a covered node
   821 $u_{\sigma(k)}$ where $k<i$, since $u_{\sigma(i)}\in T_{small}(s)$ for
   822 some $s$ and $T_{small}(s)\subseteq{V'_{small}}$ contains only nodes
   823 connected to at least one covered node. It's easy to see, that
   824 $\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}$, since
   825 $T_{small}(s)$ contains only nodes connected to some covered ones,
   826 while it is not empty, but if it were empty, then $u_{\sigma(k)}$ and
   827 $u_{\sigma(i)}$ were not in the same component.
   828 
   829 Now, let us show that if a node sequence has the special structure
   830 described above, it has to be matching
   831 order.\\ $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ is
   832 a total ordering, which determines the matching order
   833 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$.
   834 \end{proof}
   835 
   836 To summing up, a total ordering always uniquely determines a matching
   837 order, and every matching order can be determined by a total ordering,
   838 however, more than one different total orderings may determine the
   839 same matching order.
   840 \subsection{Idea behind the algorithm}
   841 The goal is to find a matching order in which the algorithm is able to
   842 recognize inconsistency or prune the infeasible branches on the
   843 highest levels and goes deep only if it is needed.
   844 
   845 \begin{notation}
   846 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the
   847 number of neighbours of u which are in H, where $u\in V_{small} $ and
   848 $H\subseteq V_{small}$.
   849 \end{notation}
   850 
   851 The principal question is the following. Suppose a state $s$ is
   852 given. For which node of $T_{small}(s)$ is the hardest to find a
   853 consistent pair in $G_{large}$? The more covered neighbours a node in
   854 $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has
   855 ---, the more rarely satisfiable consistency constraints for its pair
   856 are given.
   857 
   858 In biology, most of the graphs are sparse, thus several nodes in
   859 $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes
   860 reasonable to define a secondary and a tertiary order between them.
   861 The observation above proves itself to be as determining, that the
   862 secondary ordering prefers nodes with the most uncovered neighbours
   863 among which have the same $Conn_{M_{small}(s)}$ to increase
   864 $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible.  The
   865 tertiary ordering prefers nodes having the rarest uncovered labels.
   866 
   867 Note that the secondary ordering is the same as the ordering by $deg$,
   868 which is a static data in front of the above used.
   869 
   870 These rules can easily result in a matching order which contains the
   871 nodes of a long path successively, whose nodes may have low $Conn$ and
   872 is easily matchable into $G_{large}$. To avoid that, a BFS order is
   873 used, which provides the shortest possible paths.
   874 \newline
   875 
   876 In the following, some examples on which the VF2 may be slow are
   877 described, although they are easily solvable by using a proper
   878 matching order.
   879 
   880 \begin{example}
   881 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways
   882 without node labels. Let $u\in V_{small}$ and $v\in V_{large}$.
   883 \newline
   884 $lab(u):=black$
   885 \newline
   886 $lab(v):=black$
   887 \newline
   888 $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash
   889 \{u\})$
   890 \newline
   891 $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash
   892 \{v\})$
   893 \newline
   894 
   895 Now, any mapping by the node label $lab$ must contain $(u,v)$, since
   896 $u$ is black and no node in $V_{large}$ has a black label except
   897 $v$. If unfortunately $u$ were the last node which will get covered,
   898 VF2 would check only in the last steps, whether $u$ can be matched to
   899 $v$.
   900 \newline
   901 However, had $u$ been the first matched node, u would have been
   902 matched immediately to v, so all the mappings would have been
   903 precluded in which node labels can not correspond.
   904 \end{example}
   905 
   906 \begin{example}
   907 Suppose there is no node label given, $G_{small}$ is a small graph and
   908 can not be mapped into $G_{large}$ and $u\in V_{small}$.
   909 \newline
   910 Let $G'_{small}:=(V_{small}\cup
   911 \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup
   912 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
   913 $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint
   914 from $G_{small}$ and one of its starting points is connected to $u\in
   915 V_{small}\}$.
   916 \newline
   917 Is there a subgraph of $G_{large}$, which is isomorph with
   918 $G'_{small}$?
   919 \newline
   920 If unfortunately the nodes of the path were the first $k$ nodes in the
   921 matching order, the algorithm would iterate through all the possible k
   922 long paths in $G_{large}$, and it would recognize that no path can be
   923 extended to $G'_{small}$.
   924 \newline
   925 However, had it started by the matching of $G_{small}$, it would not
   926 have matched any nodes of the path.
   927 \end{example}
   928 
   929 These examples may look artificial, but the same problems also appear
   930 in real-world examples, even though in a less obvious way.
   931 
   932 \subsection{Total ordering}
   933 Instead of the total ordering relation, the matching order will be
   934 searched directly.
   935 \begin{notation}
   936 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} :
   937 l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ ,
   938 where $l$ is a label and $\mathcal{M}\subseteq V_{small}$.
   939 \end{notation}
   940 
   941 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
   942 \end{definition}
   943 
   944 \begin{algorithm}
   945 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
   946 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
   947 \algtext*{EndWhile}
   948 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
   949 \begin{algorithmic}[1]
   950 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$
   951 \Comment{matching order} \While{$V_{small}\backslash \mathcal{M}
   952   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
   953 min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash
   954 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
   955 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
   956 $d$-th level \State Process $V_d$ \Comment{See Algorithm
   957   \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2}} \EndFor
   958 \EndWhile \EndProcedure
   959 \end{algorithmic}
   960 \end{algorithm}
   961 
   962 \begin{algorithm}
   963 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
   964 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
   965 \algtext*{EndWhile}
   966 \caption{\hspace{.5cm}$A\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
   967 \begin{algorithmic}[1]
   968 \Procedure{VF2++ProcessLevel1}{$V_{d}$} \While{$V_d\neq\emptyset$}
   969 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg
   970 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
   971 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
   972 $F_\mathcal{M}$ \EndWhile \EndProcedure
   973 \end{algorithmic}
   974 \end{algorithm}
   975 
   976 \begin{algorithm}
   977 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
   978 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
   979 \algtext*{EndWhile}
   980 \caption{\hspace{0.5cm}$Another\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess2}
   981 \begin{algorithmic}[1]
   982 \Procedure{VF2++ProcessLevel2}{$V_{d}$} \State Sort $V_d$ in
   983 descending lex. order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$
   984 \State Append the sorted $V_d$ to the end of $\mathcal{M}$ \State
   985 Refresh $F_\mathcal{M}$ \EndProcedure
   986 \end{algorithmic}
   987 \end{algorithm}
   988 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
   989 matching order procedure of VF2++. It computes a BFS tree for each
   990 component in ascending order of their rarest $lab$ and largest $deg$,
   991 whose root vertex is the component's minimal
   992 node. Algorithms~\ref{alg:VF2PPProcess1} and \ref{alg:VF2PPProcess2}
   993 are two different methods to process a level of the BFS tree.
   994 
   995 After sorting the nodes of the current level in descending
   996 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$,
   997 Algorithm~\ref{alg:VF2PPProcess2} appends them simultaneously to the
   998 matching order $\mathcal{M}$ and refreshes $F_\mathcal{M}$ just once,
   999 whereas Algorithm~\ref{alg:VF2PPProcess1} appends the nodes separately
  1000 to $\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it
  1001 provides up-to-date label information and may result in a more
  1002 efficient matching order.
  1003 
  1004 Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}
  1005 provides a matching order.
  1006 
  1007 
  1008 \subsection{Cutting rules}
  1009 \label{VF2PPCuttingRules}
  1010 This section presents the cutting rules of VF2++, which are improved
  1011 by using extra information coming from the node labels.
  1012 \begin{notation}
  1013 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l
  1014 \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and
  1015 $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge
  1016 \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in
  1017 V_{large}$ and $l$ is a label.
  1018 \end{notation}
  1019 
  1020 \subsubsection{Induced subgraph isomorphism}
  1021 \begin{claim}
  1022 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND.
  1023 \end{claim}
  1024 \begin{proof}
  1025 It has to be shown, that $LabCut_{IND}((u,v),M(s))=true\Rightarrow$
  1026 the mapping can not be extended to a whole
  1027 mapping.\\ $LabCut_{IND}((u,v),M(s))=true,$ iff the following
  1028 holds. $\\\exists l: |\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| <
  1029 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)| \vee
  1030 |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| <
  1031 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$.
  1032 
  1033 Suppose that $|\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| <
  1034 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)|$. Each node of
  1035 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$ has to be matched to a
  1036 node in $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$, so
  1037 $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$ can not be smaller than
  1038 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$. That is why $M(s)$ can
  1039 not be extended to a whole mapping.
  1040 
  1041 Otherwise $|\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| <
  1042 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$ has to be
  1043 true. Similarly, each node of $\Gamma_{large}^{l}(v)\cap
  1044 \tilde{T}_{large}(s)$ has to be matched to a node in
  1045 $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$,
  1046 i.e. $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ can not be
  1047 smaller than $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, if
  1048 $M(s)$ is extendible.
  1049 \end{proof}
  1050 The following claims can be proven similarly.
  1051 \subsubsection{Graph isomorphism}
  1052 \begin{claim}
  1053 \[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by ISO.
  1054 \end{claim}
  1055 
  1056 \subsubsection{Subgraph isomorphism}
  1057 \begin{claim}
  1058 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\] is a cutting function by SUB.
  1059 \end{claim}
  1060 
  1061 
  1062 
  1063 \subsection{Implementation details}
  1064 This section provides a detailed summary of an efficient
  1065 implementation of VF2++.
  1066 \subsubsection{Storing a mapping}
  1067 After fixing an arbitrary node order ($u_0, u_1, ..,
  1068 u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store
  1069 the current mapping in the following way.
  1070 \[
  1071  M[i] =
  1072   \begin{cases} 
  1073    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INVALID &
  1074    if\ no\ node\ has\ been\ mapped\ to\ u_i.
  1075   \end{cases}
  1076 \]
  1077 Where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$
  1078 means "no node".
  1079 \subsubsection{Avoiding the recurrence}
  1080 Exploring the state space was described in a recursive fashion using
  1081 sets (see Algorithm~\ref{alg:VF2Pseu}), which makes the
  1082 algorithm easy to understand but does not show directly an efficient
  1083 way of implementation. The following approach avoiding recursion and
  1084 using lookup tables instead of sets is not only fast but has linear
  1085 space complexity as well.
  1086 
  1087 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
  1088 as a while loop, which has a loop counter $depth$ denoting the
  1089 all-time depth of the recursion.  Fixing a matching order, let $M$
  1090 denote the array storing the all-time mapping.  The initial state is
  1091 associated with the empty mapping, which means that $\forall i:
  1092 M[i]=INVALID$ and $depth=0$.  In case of a recursive call, $depth$ has
  1093 to be incremented, while in case of a return, it has to be
  1094 decremented.  Based on Claim~\ref{claim:claimCoverFromLeft},
  1095 $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before
  1096 $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and
  1097 $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes
  1098 while the state is being processed, but the property is held before
  1099 both stepping back to a predecessor state and exploring a successor
  1100 state.
  1101 
  1102 The necessary part of the candidate set is easily maintainable or
  1103 computable by following
  1104 Section~\ref{candidateComputingVF2}. A much faster method
  1105 has been designed for biological- and sparse graphs, see the next
  1106 section for details.
  1107 
  1108 \subsubsection{Calculating the candidates for a node}
  1109 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
  1110 task is not to maintain the candidate set, but to generate the
  1111 candidate nodes in $G_{large}$ for a given node $u\in V_{small}$.  In
  1112 case of an expanding problem type and $M$ mapping, if a node $v\in
  1113 V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall
  1114 u'\in V_{small} : (u,u')\in
  1115 E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in
  1116 E_{large}$. That is, each covered neighbour of $u$ has to be mapped to
  1117 a covered neighbour of $v$.
  1118 
  1119 Having said that, an algorithm running in $\Theta(deg)$ time is
  1120 describable if there exists a covered node in the component containing
  1121 $u$. In this case choose a covered neighbour $u'$ of $u$ arbitrarily
  1122 --- such a node exists based on
  1123 Claim~\ref{claim:MOclaim}. With all the candidates of $u$
  1124 being among the uncovered neighbours of $Pair(M,u')$, there are solely
  1125 $deg(Pair(M,u'))$ nodes to check.
  1126 
  1127 An easy trick is to choose an $u'$, for which
  1128 $|\{uncovered\ neighbours\ $ $of\ Pair(M,u')\}|$ is the smallest
  1129 possible.
  1130 
  1131 Note that if $u$ is the first node of its component, then all the
  1132 uncovered nodes of $G_{large}$ are candidates, so giving a sublinear
  1133 method is impossible.
  1134 
  1135 
  1136 \subsubsection{Determining the node order}
  1137 This section describes how the node order preprocessing method of
  1138 VF2++ can efficiently be implemented.
  1139 
  1140 For using lookup tables, the node labels are associated with the
  1141 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
  1142 enables $F_\mathcal{M}$ to be stored in an array, for which
  1143 $F_\mathcal{M}[i]=F_\mathcal{M}(i)$ where $i=0,1,..,|K|-1$. At first,
  1144 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
  1145 in $V_{small}$ having label i, which is easy to compute in
  1146 $\Theta(|V_{small}|)$ steps.
  1147 
  1148 $\mathcal{M}\subseteq V_{small}$ can be represented as an array of
  1149 size $|V_{small}|$.
  1150 
  1151 The BFS tree is computed by using a FIFO data structure which is
  1152 usually implemented as a linked list, but one can avoid it by using
  1153 the array $\mathcal{M}$ itself. $\mathcal{M}$ contains all the nodes
  1154 seen before, a pointer shows where the first node of the FIFO is, and
  1155 another one shows where the next explored node has to be inserted. So
  1156 the nodes of each level of the BFS tree can be processed by
  1157 Algorithms~\ref{alg:VF2PPProcess1} and
  1158 \ref{alg:VF2PPProcess2} in place by swapping nodes.
  1159 
  1160 After a node $u$ gets to the next place of the node order,
  1161 $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is
  1162 one less covered node in $V_{large}$ with label $lab(u)$, that is why
  1163 min selection sort is preferred which gives the elements from left to
  1164 right in descending order, see Algorithm~\ref{alg:VF2PPProcess1}.
  1165 
  1166 Note that using a $\Theta(n^2)$ sort absolutely does not slow down the
  1167 procedure on biological (and on sparse) graphs, since they have few
  1168 nodes on a level. If a level had a large number of nodes,
  1169 Algorithm~\ref{alg:VF2PPProcess2} would seem to be a better
  1170 choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the
  1171 efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not
  1172 be immediately refreshed, so it is unable to provide up-to-date label
  1173 information.
  1174 
  1175 Note that the \textit{while loop} of Algorithm~\ref{alg:VF2PPPseu}
  1176 takes one iteration per graph component and the graphs in biology are
  1177 mostly connected.
  1178 
  1179 \subsubsection{Cutting rules}
  1180 In Section~\ref{VF2PPCuttingRules}, the cutting rules were
  1181 described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$
  1182 and $\tilde T_{large}$, which are dependent on the all-time mapping
  1183 (i.e. on the all-time state). The aim is to check the labeled cutting
  1184 rules of VF2++ in $\Theta(deg)$ time.
  1185 
  1186 Firstly, suppose that these four sets are given in such a way, that
  1187 checking whether a node is in a certain set takes constant time,
  1188 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
  1189 initially zero integer lookup table of size $|K|$. After incrementing
  1190 $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and
  1191 decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap
  1192 T_{large}(s)$, the first part of the cutting rules is checkable in
  1193 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
  1194 to zero takes $\Theta(deg)$ time again, which makes it possible to use
  1195 the same table through the whole algorithm.  The second part of the
  1196 cutting rules can be verified using the same method with $\tilde
  1197 T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and
  1198 $T_{large}$. Thus, the overall complexity is $\Theta(deg)$.
  1199 
  1200 An other integer lookup table storing the number of covered neighbours
  1201 of each node in $G_{large}$ gives all the information about the sets
  1202 $T_{large}$ and $\tilde T_{large}$, which is maintainable in
  1203 $\Theta(deg)$ time when a pair is added or substracted by incrementing
  1204 or decrementing the proper indices. A further improvement is that the
  1205 values of $L[lab(u')]$ in case of checking $u$ is dependent only on
  1206 $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an
  1207 array of pairs (label, number of such labels) can be stored to skip
  1208 the maintaining operations. Note that these arrays are at most of size
  1209 $deg$. Skipping this trick, the number of covered neighbours has to be
  1210 stored for each node of $G_{small}$ as well to get the sets
  1211 $T_{small}$ and $\tilde T_{small}$.
  1212 
  1213 Using similar tricks, the consistency function can be evaluated in
  1214 $\Theta(deg)$ steps, as well.
  1215 
  1216 \section{The VF2 Plus Algorithm}
  1217 The VF2 Plus algorithm is a recently improved version of VF2. It was
  1218 compared with the state of the art algorithms in \cite{VF2Plus} and
  1219 has proven itself to be competitive with RI, the best algorithm on
  1220 biological graphs.  \\ A short summary of VF2 Plus follows, which uses
  1221 the notation and the conventions of the original paper.
  1222 
  1223 \subsection{Ordering procedure}
  1224 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$
  1225 with the lowest probability to find a pair in $V_{small}$ and the
  1226 highest number of connections with the nodes already sorted by the
  1227 algorithm.
  1228 
  1229 \begin{definition}
  1230 $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and
  1231   $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$.
  1232 \end{definition}
  1233 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in
  1234 $V_{large}$
  1235 \newline
  1236 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in
  1237 $V_{large}$
  1238 \newline
  1239 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ $M$ is the set of
  1240 already sorted nodes, $T$ is the set of nodes candidate to be
  1241 selected, and $degreeM$ of a node is the number of its neighbours in
  1242 $M$.
  1243 \begin{algorithm}
  1244 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
  1245 nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
  1246 \algtext*{EndWhile}
  1247 \caption{}\label{alg:VF2PlusPseu}
  1248 \begin{algorithmic}[1]
  1249 \Procedure{VF2 Plus order}{} \State Select the node with the lowest
  1250 $P$.  \If {more nodes share the same $P$} \State select the one with
  1251 maximum degree \EndIf \If {more nodes share the same $P$ and have the
  1252   max degree} \State select the first \EndIf \State Put the selected
  1253 node in the set $M$. \label{alg:putIn} \State Put all its unsorted
  1254 neighbours in the set $T$.  \If {$M\neq V_{small}$} \State From set
  1255 $T$ select the node with maximum $degreeM$.  \If {more nodes have
  1256   maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf
  1257 \If {more nodes have maximum $degreeM$ and $P$} \State Select the
  1258 first.  \EndIf \State \textbf{goto \ref{alg:putIn}.}  \EndIf
  1259 \EndProcedure
  1260 \end{algorithmic}
  1261 \end{algorithm}
  1262 
  1263 Using these notations, Algorithm~\ref{alg:VF2PlusPseu}
  1264 provides the description of the sorting procedure.
  1265 
  1266 Note that $P(u)$ is not the exact probability of finding a consistent
  1267 pair for $u$ by choosing a node of $V_{large}$ randomly, since
  1268 $P_{lab}$ and $P_{deg}$ are not independent, though calculating the
  1269 real probability would take quadratic time, which may be reduced by
  1270 using fittingly lookup tables.
  1271 
  1272 \section{Experimental results}
  1273 This section compares the performance of VF2++ and VF2 Plus. Both
  1274 algorithms have run faster with orders of magnitude than VF2, thus its
  1275 inclusion was not reasonable.
  1276 \subsection{Biological graphs}
  1277 The tests have been executed on a recent biological dataset created
  1278 for the International Contest on Pattern Search in Biological
  1279 Databases\cite{Content}, which has been constructed of Molecule,
  1280 Protein and Contact Map graphs extracted from the Protein Data
  1281 Bank\cite{ProteinDataBank}.
  1282 
  1283 The molecule dataset contains small graphs with less than 100 nodes
  1284 and an average degree of less than 3. The protein dataset contains
  1285 graphs having 500-10 000 nodes and an average degree of 4, while the
  1286 contact map dataset contains graphs with 150-800 nodes and an average
  1287 degree of 20.  \\
  1288 
  1289 In the following, the induced subgraph isomorphism and the graph
  1290 isomorphism will be examined.
  1291 
  1292 \subsubsection{Induced subgraph isomorphism}
  1293 This dataset contains a set of graph pairs, and \textbf{all} the
  1294 induced subgraph ismorphisms have to be found between
  1295 them. Figure~\ref{fig:INDProt}), \ref{fig:INDContact}), and
  1296 \ref{fig:INDMolecule} show the solution time of the
  1297 problems in the problem set.
  1298 
  1299 \begin{figure}[H]
  1300 \begin{center}
  1301 \begin{tikzpicture}
  1302   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1303   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1304     west},scaled x ticks = false,x tick label style={/pgf/number
  1305     format/1000 sep = \thinspace}] %\addplot+[only marks] table
  1306     {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]
  1307     table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark
  1308       size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
  1309   \end{axis}
  1310   \end{tikzpicture}
  1311 \end{center}
  1312 \vspace*{-0.8cm}
  1313 \caption{Both the algorithms have linear behaviour on protein
  1314   graphs. VF2++ is more than 10 times faster than VF2
  1315   Plus.} \label{fig:INDProt}
  1316 \end{figure}
  1317 
  1318 \begin{figure}[H]
  1319 \begin{center}
  1320 \begin{tikzpicture}
  1321 \begin{axis}[title=Contact Maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1322 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1323   west},scaled x ticks = false,x tick label style={/pgf/number
  1324   format/1000 sep = \thinspace}]
  1325 %\addplot+[only marks] table {proteinsOrig.txt};
  1326 \addplot table {Orig/ContactMaps.128.txt};
  1327 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1328         {VF2PPLabel/ContactMaps.128.txt};
  1329 \end{axis}
  1330 \end{tikzpicture}
  1331 \end{center}
  1332 \vspace*{-0.8cm}
  1333 \caption{On Contact Maps, VF2++ runs in near constant time, while VF2
  1334   Plus has a near linear behaviour.} \label{fig:INDContact}
  1335 \end{figure}
  1336 
  1337 \begin{figure}[H]
  1338 \begin{center}
  1339 \begin{tikzpicture}
  1340 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1341 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1342   west},scaled x ticks = false,x tick label style={/pgf/number
  1343   format/1000 sep = \thinspace}]
  1344 %\addplot+[only marks] table {proteinsOrig.txt};
  1345 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark
  1346   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
  1347 \end{axis}
  1348 \end{tikzpicture}
  1349 \end{center}
  1350 \vspace*{-0.8cm}
  1351 \caption{In the case of Molecules, the algorithms seem to have a
  1352   similar behaviour, but VF2++ is almost two times faster even on such
  1353   small graphs.} \label{fig:INDMolecule}
  1354 \end{figure}
  1355 
  1356 
  1357 \subsubsection{Graph ismorphism}
  1358 In this experiment, the nodes of each graph in the database have been
  1359 shuffled and an isomorphism between the shuffled and the original
  1360 graph has been searched. For runtime results, see
  1361 Figure~\ref{fig:ISOProt}, \ref{fig:ISOContact}, and
  1362 \ref{fig:ISOMolecule}.
  1363   
  1364 \begin{figure}[H]
  1365 \begin{center}
  1366 \begin{tikzpicture}
  1367 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1368 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1369   west},scaled x ticks = false,x tick label style={/pgf/number
  1370   format/1000 sep = \thinspace}]
  1371 %\addplot+[only marks] table {proteinsOrig.txt};
  1372 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
  1373   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
  1374 \end{axis}
  1375 \end{tikzpicture}
  1376 \end{center}
  1377 \vspace*{-0.8cm}
  1378 \caption{On protein graphs, VF2 Plus has a super linear time
  1379   complexity, while VF2++ runs in near constant time. The difference
  1380   is about two order of magnitude on large graphs.}\label{fig:ISOProt}
  1381 \end{figure}
  1382 
  1383 \begin{figure}[H]
  1384 \begin{center}
  1385 \begin{tikzpicture}
  1386 \begin{axis}[title=Contact Maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1387 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1388   west},scaled x ticks = false,x tick label style={/pgf/number
  1389   format/1000 sep = \thinspace}]
  1390 %\addplot+[only marks] table {proteinsOrig.txt};
  1391 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark
  1392   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
  1393 \end{axis}
  1394 \end{tikzpicture}
  1395 \end{center}
  1396 \vspace*{-0.8cm}
  1397 \caption{The results are closer to each other on Contact Maps, but
  1398   VF2++ still performs consistently better.}\label{fig:ISOContact}
  1399 \end{figure}
  1400 
  1401 \begin{figure}[H]
  1402 \begin{center}
  1403 \begin{tikzpicture}
  1404 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1405 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1406   west},scaled x ticks = false,x tick label style={/pgf/number
  1407   format/1000 sep = \thinspace}]
  1408 %\addplot+[only marks] table {proteinsOrig.txt};
  1409 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark
  1410   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
  1411 \end{axis}
  1412 \end{tikzpicture}
  1413 \end{center}
  1414 \vspace*{-0.8cm}
  1415 \caption{In the case of Molecules, there is not such a significant
  1416   difference, but VF2++ seems to be faster as the number of nodes
  1417   increases.}\label{fig:ISOMolecule}
  1418 \end{figure}
  1419 
  1420 
  1421 \subsection{Random graphs}
  1422 This section compares VF2++ with VF2 Plus on random graphs of a large
  1423 size. The node labels are uniformly distributed.  Let $\delta$ denote
  1424 the average degree.  For the parameters of problems solved in the
  1425 experiments, please see the top of each chart.
  1426 \subsubsection{Graph isomorphism}
  1427 To evaluate the efficiency of the algorithms in the case of graph
  1428 isomorphism, connected graphs of less than 20 000 nodes have been
  1429 considered. Generating a random graph and shuffling its nodes, an
  1430 isomorphism had to be found. Figure \ref{fig:randISO5},
  1431 \ref{fig:randISO10}, \ref{fig:randISO15}, \ref{fig:randISO35},
  1432 \ref{fig:randISO45}, and \ref{fig:randISO100} show the runtime results
  1433 on graph sets of various density.
  1434 
  1435 \begin{figure}[H]
  1436 \begin{center}
  1437 \begin{tikzpicture}
  1438 \begin{axis}[title={Random ISO, $\delta = 5$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1439 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1440   west},scaled x ticks = false,x tick label style={/pgf/number
  1441   format/1000 sep = \thinspace}]
  1442 %\addplot+[only marks] table {proteinsOrig.txt};
  1443 \addplot table {randGraph/iso/vf2pIso5_1.txt};
  1444 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1445         {randGraph/iso/vf2ppIso5_1.txt};
  1446 \end{axis}
  1447 \end{tikzpicture}
  1448 \end{center}
  1449 \vspace*{-0.8cm}
  1450 \caption{}\label{fig:randISO5}
  1451 \end{figure}
  1452 
  1453 \begin{figure}[H]
  1454 \begin{center}
  1455 \begin{tikzpicture}
  1456 \begin{axis}[title={Random ISO, $\delta = 10$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1457 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1458   west},scaled x ticks = false,x tick label style={/pgf/number
  1459   format/1000 sep = \thinspace}]
  1460 %\addplot+[only marks] table {proteinsOrig.txt};
  1461 \addplot table {randGraph/iso/vf2pIso10_1.txt};
  1462 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1463         {randGraph/iso/vf2ppIso10_1.txt};
  1464 \end{axis}
  1465 \end{tikzpicture}
  1466 \end{center}
  1467 \vspace*{-0.8cm}
  1468 \caption{}\label{fig:randISO10}
  1469 \end{figure}
  1470 
  1471 \begin{figure}[H]
  1472 \begin{center}
  1473 \begin{tikzpicture}
  1474 \begin{axis}[title={Random ISO, $\delta = 15$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1475 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1476   west},scaled x ticks = false,x tick label style={/pgf/number
  1477   format/1000 sep = \thinspace}]
  1478 %\addplot+[only marks] table {proteinsOrig.txt};
  1479 \addplot table {randGraph/iso/vf2pIso15_1.txt};
  1480 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1481         {randGraph/iso/vf2ppIso15_1.txt};
  1482 \end{axis}
  1483 \end{tikzpicture}
  1484 \end{center}
  1485 \vspace*{-0.8cm}
  1486 \caption{}\label{fig:randISO15}
  1487 \end{figure}
  1488 
  1489 \begin{figure}[H]
  1490 \begin{center}
  1491 \begin{tikzpicture}
  1492 \begin{axis}[title={Random ISO, $\delta = 35$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1493 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1494   west},scaled x ticks = false,x tick label style={/pgf/number
  1495   format/1000 sep = \thinspace}]
  1496 %\addplot+[only marks] table {proteinsOrig.txt};
  1497 \addplot table {randGraph/iso/vf2pIso35_1.txt};
  1498 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1499         {randGraph/iso/vf2ppIso35_1.txt};
  1500 \end{axis}
  1501 \end{tikzpicture}
  1502 \end{center}
  1503 \vspace*{-0.8cm}
  1504 \caption{}\label{fig:randISO35}
  1505 \end{figure}
  1506 
  1507 \begin{figure}[H]
  1508 \begin{center}
  1509 \begin{tikzpicture}
  1510 \begin{axis}[title={Random ISO, $\delta = 45$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1511 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1512   west},scaled x ticks = false,x tick label style={/pgf/number
  1513   format/1000 sep = \thinspace}]
  1514 %\addplot+[only marks] table {proteinsOrig.txt};
  1515 \addplot table {randGraph/iso/vf2pIso45_1.txt};
  1516 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1517         {randGraph/iso/vf2ppIso45_1.txt};
  1518 \end{axis}
  1519 \end{tikzpicture}
  1520 \end{center}
  1521 \vspace*{-0.8cm}
  1522 \caption{}\label{fig:randISO45}
  1523 \end{figure}
  1524 
  1525 \begin{figure}[H]
  1526 \begin{center}
  1527 \begin{tikzpicture}
  1528 \begin{axis}[title={Random ISO, $\delta = 100$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1529 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1530   west},scaled x ticks = false,x tick label style={/pgf/number
  1531   format/1000 sep = \thinspace}]
  1532 %\addplot+[only marks] table {proteinsOrig.txt};
  1533 \addplot table {randGraph/iso/vf2pIso100_1.txt};
  1534 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1535         {randGraph/iso/vf2ppIso100_1.txt};
  1536 \end{axis}
  1537 \end{tikzpicture}
  1538 \end{center}
  1539 \vspace*{-0.8cm}
  1540 \caption{}\label{fig:randISO100}
  1541 \end{figure}
  1542 
  1543 
  1544 Considering the graph isomorphism problem, VF2++ consistently
  1545 outperforms its rival especially on sparse graphs. The reason for the
  1546 slightly super linear behaviour of VF2++ on denser graphs is the
  1547 larger number of nodes in the BFS tree constructed in
  1548 Algorithm~\ref{alg:VF2PPPseu}.
  1549 
  1550 \subsubsection{Induced subgraph isomorphism}
  1551 This section provides a comparison of VF2++ and VF2 Plus in the case
  1552 of induced subgraph isomorphism. In addition to the size of the large
  1553 graph, that of the small graph dramatically influences the hardness of
  1554 a given problem too, so the overall picture is provided by examining
  1555 small graphs of various size.
  1556 
  1557 For each chart, a number $0<\rho< 1$ has been fixed and the following
  1558 has been executed 150 times. Generating a large graph $G_{large}$,
  1559 choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes,
  1560 and for all the 10 subgraphs find a mapping by using both the graph
  1561 matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
  1562 0.3, 0.6, 0.8, 0.95$ cases have been examined (see
  1563 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
  1564 \ref{fig:randIND35}, and for each $\delta$, a cumulative chart is
  1565 given as well, which excludes $\rho = 0.05$ and $0.1$ for the sake of
  1566 perspicuity (see Figure~\ref{fig:randIND5Sum}, \ref{fig:randIND10Sum}
  1567 and \ref{fig:randIND35Sum}).
  1568 
  1569 
  1570 
  1571 
  1572 
  1573 \begin{figure}[H]
  1574 \vspace*{-0.8cm}
  1575 \begin{subfigure}[b]{0.55\textwidth}
  1576 \begin{center}
  1577 \begin{tikzpicture}
  1578 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1579 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1580   west},scaled x ticks = false,x tick label style={/pgf/number
  1581   format/1000 sep = \space}]
  1582 %\addplot+[only marks] table {proteinsOrig.txt};
  1583 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
  1584 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1585         {randGraph/ind/vf2ppInd5_0.05.txt};
  1586 \end{axis}
  1587 \end{tikzpicture}
  1588 \end{center}
  1589      \end{subfigure}
  1590      \begin{subfigure}[b]{0.55\textwidth}
  1591 \begin{center}
  1592 \begin{tikzpicture}
  1593 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1594 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1595   west},scaled x ticks = false,x tick label style={/pgf/number
  1596   format/1000 sep = \space}]
  1597 %\addplot+[only marks] table {proteinsOrig.txt};
  1598 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
  1599 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1600         {randGraph/ind/vf2ppInd5_0.1.txt};
  1601 \end{axis}
  1602 \end{tikzpicture}
  1603 \end{center}
  1604 \end{subfigure}
  1605 \hspace{1cm}
  1606 \begin{subfigure}[b]{0.55\textwidth}
  1607 \begin{center}
  1608 \begin{tikzpicture}
  1609 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1610 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1611   west},scaled x ticks = false,x tick label style={/pgf/number
  1612   format/1000 sep = \space}]
  1613 %\addplot+[only marks] table {proteinsOrig.txt};
  1614 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
  1615 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1616         {randGraph/ind/vf2ppInd5_0.3.txt};
  1617 \end{axis}
  1618 \end{tikzpicture}
  1619 \end{center}
  1620      \end{subfigure}
  1621      \begin{subfigure}[b]{0.55\textwidth}
  1622 \begin{center}
  1623 \begin{tikzpicture}
  1624 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1625 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1626   west},scaled x ticks = false,x tick label style={/pgf/number
  1627   format/1000 sep = \space}]
  1628 %\addplot+[only marks] table {proteinsOrig.txt};
  1629 \addplot table {randGraph/ind/vf2pInd5_0.6.txt};
  1630 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1631         {randGraph/ind/vf2ppInd5_0.6.txt};
  1632 \end{axis}
  1633 \end{tikzpicture}
  1634 \end{center}
  1635 \end{subfigure}
  1636 \begin{subfigure}[b]{0.55\textwidth}
  1637           
  1638 \begin{tikzpicture}
  1639 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1640 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1641   west},scaled x ticks = false,x tick label style={/pgf/number
  1642   format/1000 sep = \space}]
  1643 %\addplot+[only marks] table {proteinsOrig.txt};
  1644 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
  1645 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1646         {randGraph/ind/vf2ppInd5_0.8.txt};
  1647 \end{axis}
  1648 \end{tikzpicture}
  1649      \end{subfigure}
  1650      \begin{subfigure}[b]{0.55\textwidth}
  1651 \begin{tikzpicture}
  1652 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1653 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1654   west},scaled x ticks = false,x tick label style={/pgf/number
  1655   format/1000 sep = \thinspace}]
  1656 %\addplot+[only marks] table {proteinsOrig.txt};
  1657 \addplot table {randGraph/ind/vf2pInd5_0.95.txt};
  1658 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1659         {randGraph/ind/vf2ppInd5_0.95.txt};
  1660 \end{axis}
  1661 \end{tikzpicture}
  1662 \end{subfigure}
  1663 \vspace*{-0.8cm}
  1664 \caption{IND on graphs having an average degree of
  1665   5.}\label{fig:randIND5}
  1666 \end{figure}
  1667 
  1668 \begin{figure}[H]
  1669 \begin{center}
  1670 \begin{tikzpicture}
  1671 \begin{axis}[title={Rand IND Summary, $\delta = 5$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1672 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
  1673   west},scaled x ticks = false,x tick label style={/pgf/number
  1674   format/1000 sep = \thinspace}]
  1675 %\addplot+[only marks] table {proteinsOrig.txt};
  1676 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1677         {randGraph/ind/vf2pInd5_0.3.txt}; \addplot[mark=triangle*,mark
  1678           size=1.8pt,color=red] table
  1679         {randGraph/ind/vf2ppInd5_0.3.txt}; \addplot[mark=*,mark
  1680           size=1.5pt,color=blue] table
  1681         {randGraph/ind/vf2pInd5_0.6.txt}; \addplot[mark=triangle*,mark
  1682           size=1.8pt,color=red] table
  1683         {randGraph/ind/vf2ppInd5_0.6.txt}; \addplot[mark=*,mark
  1684           size=1.5pt,color=blue] table
  1685         {randGraph/ind/vf2pInd5_0.8.txt}; \addplot[mark=triangle*,mark
  1686           size=1.8pt,color=red] table
  1687         {randGraph/ind/vf2ppInd5_0.8.txt}; \addplot[mark=*,mark
  1688           size=1.5pt,color=blue] table
  1689         {randGraph/ind/vf2pInd5_0.95.txt};
  1690         \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1691                 {randGraph/ind/vf2ppInd5_0.95.txt};
  1692 \end{axis}
  1693 \end{tikzpicture}
  1694 \end{center}
  1695 \vspace*{-0.8cm}
  1696 \caption{Cummulative chart for $\delta=5$.}\label{fig:randIND5Sum}
  1697 \end{figure}
  1698 
  1699 
  1700 
  1701 \begin{figure}[H]
  1702 \vspace*{-0.8cm}
  1703 \begin{subfigure}[b]{0.55\textwidth}
  1704 \begin{center}
  1705 \begin{tikzpicture}
  1706 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1707 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1708   west},scaled x ticks = false,x tick label style={/pgf/number
  1709   format/1000 sep = \space}]
  1710 %\addplot+[only marks] table {proteinsOrig.txt};
  1711 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
  1712 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1713         {randGraph/ind/vf2ppInd10_0.05.txt};
  1714 \end{axis}
  1715 \end{tikzpicture}
  1716 \end{center}
  1717      \end{subfigure}
  1718      \begin{subfigure}[b]{0.55\textwidth}
  1719 \begin{center}
  1720 \begin{tikzpicture}
  1721 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1722 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1723   west},scaled x ticks = false,x tick label style={/pgf/number
  1724   format/1000 sep = \space}]
  1725 %\addplot+[only marks] table {proteinsOrig.txt};
  1726 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
  1727 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1728         {randGraph/ind/vf2ppInd10_0.1.txt};
  1729 \end{axis}
  1730 \end{tikzpicture}
  1731 \end{center}
  1732 \end{subfigure}
  1733 \hspace{1cm}
  1734 \begin{subfigure}[b]{0.55\textwidth}
  1735 \begin{center}
  1736 \begin{tikzpicture}
  1737 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1738 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1739   west},scaled x ticks = false,x tick label style={/pgf/number
  1740   format/1000 sep = \space}]
  1741 %\addplot+[only marks] table {proteinsOrig.txt};
  1742 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
  1743 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1744         {randGraph/ind/vf2ppInd10_0.3.txt};
  1745 \end{axis}
  1746 \end{tikzpicture}
  1747 \end{center}
  1748      \end{subfigure}
  1749      \begin{subfigure}[b]{0.55\textwidth}
  1750 \begin{center}
  1751 \begin{tikzpicture}
  1752 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1753 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1754   west},scaled x ticks = false,x tick label style={/pgf/number
  1755   format/1000 sep = \space}]
  1756 %\addplot+[only marks] table {proteinsOrig.txt};
  1757 \addplot table {randGraph/ind/vf2pInd10_0.6.txt};
  1758 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1759         {randGraph/ind/vf2ppInd10_0.6.txt};
  1760 \end{axis}
  1761 \end{tikzpicture}
  1762 \end{center}
  1763 \end{subfigure}
  1764 \begin{subfigure}[b]{0.55\textwidth}
  1765           
  1766 \begin{tikzpicture}
  1767 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1768 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1769   west},scaled x ticks = false,x tick label style={/pgf/number
  1770   format/1000 sep = \space}]
  1771 %\addplot+[only marks] table {proteinsOrig.txt};
  1772 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
  1773 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1774         {randGraph/ind/vf2ppInd10_0.8.txt};
  1775 \end{axis}
  1776 \end{tikzpicture}
  1777      \end{subfigure}
  1778      \begin{subfigure}[b]{0.55\textwidth}
  1779 \begin{tikzpicture}
  1780 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1781 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1782   west},scaled x ticks = false,x tick label style={/pgf/number
  1783   format/1000 sep = \thinspace}]
  1784 %\addplot+[only marks] table {proteinsOrig.txt};
  1785 \addplot table {randGraph/ind/vf2pInd10_0.95.txt};
  1786 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1787         {randGraph/ind/vf2ppInd10_0.95.txt};
  1788 \end{axis}
  1789 \end{tikzpicture}
  1790 \end{subfigure}
  1791 \vspace*{-0.8cm}
  1792 \caption{IND on graphs having an average degree of
  1793   10.}\label{fig:randIND10}
  1794 \end{figure}
  1795 
  1796 \begin{figure}[H]
  1797 \begin{center}
  1798 \begin{tikzpicture}
  1799 \begin{axis}[title={Rand IND Summary, $\delta = 10$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1800 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
  1801   west},scaled x ticks = false,x tick label style={/pgf/number
  1802   format/1000 sep = \thinspace}]
  1803 %\addplot+[only marks] table {proteinsOrig.txt};
  1804 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1805         {randGraph/ind/vf2pInd10_0.3.txt};
  1806         \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1807                 {randGraph/ind/vf2ppInd10_0.3.txt};
  1808                 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1809                         {randGraph/ind/vf2pInd10_0.6.txt};
  1810                         \addplot[mark=triangle*,mark
  1811                           size=1.8pt,color=red] table
  1812                                 {randGraph/ind/vf2ppInd10_0.6.txt};
  1813                                 \addplot[mark=*,mark
  1814                                   size=1.5pt,color=blue] table
  1815                                         {randGraph/ind/vf2pInd10_0.8.txt};
  1816                                         \addplot[mark=triangle*,mark
  1817                                           size=1.8pt,color=red] table
  1818                                                 {randGraph/ind/vf2ppInd10_0.8.txt};
  1819                                                 \addplot[mark=*,mark
  1820                                                   size=1.5pt,color=blue]
  1821                                                 table
  1822                                                 {randGraph/ind/vf2pInd10_0.95.txt};
  1823                                                 \addplot[mark=triangle*,mark
  1824                                                   size=1.8pt,color=red]
  1825                                                 table
  1826                                                 {randGraph/ind/vf2ppInd10_0.95.txt};
  1827 \end{axis}
  1828 \end{tikzpicture}
  1829 \end{center}
  1830 \vspace*{-0.8cm}
  1831 \caption{Cummulative chart for $\delta=10$.}\label{fig:randIND10Sum}
  1832 \end{figure}
  1833 
  1834 
  1835 
  1836 \begin{figure}[H]
  1837 \vspace*{-0.8cm}
  1838 \begin{subfigure}[b]{0.55\textwidth}
  1839 \begin{center}
  1840 \begin{tikzpicture}
  1841 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1842 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1843   west},scaled x ticks = false,x tick label style={/pgf/number
  1844   format/1000 sep = \space}]
  1845 %\addplot+[only marks] table {proteinsOrig.txt};
  1846 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
  1847 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1848         {randGraph/ind/vf2ppInd35_0.05.txt};
  1849 \end{axis}
  1850 \end{tikzpicture}
  1851 \end{center}
  1852      \end{subfigure}
  1853      \begin{subfigure}[b]{0.55\textwidth}
  1854 \begin{center}
  1855 \begin{tikzpicture}
  1856 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1857 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1858   west},scaled x ticks = false,x tick label style={/pgf/number
  1859   format/1000 sep = \space}]
  1860 %\addplot+[only marks] table {proteinsOrig.txt};
  1861 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
  1862 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1863         {randGraph/ind/vf2ppInd35_0.1.txt};
  1864 \end{axis}
  1865 \end{tikzpicture}
  1866 \end{center}
  1867 \end{subfigure}
  1868 \hspace{1cm}
  1869 \begin{subfigure}[b]{0.55\textwidth}
  1870 \begin{center}
  1871 \begin{tikzpicture}
  1872 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1873 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1874   west},scaled x ticks = false,x tick label style={/pgf/number
  1875   format/1000 sep = \space}]
  1876 %\addplot+[only marks] table {proteinsOrig.txt};
  1877 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
  1878 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1879         {randGraph/ind/vf2ppInd35_0.3.txt};
  1880 \end{axis}
  1881 \end{tikzpicture}
  1882 \end{center}
  1883      \end{subfigure}
  1884      \begin{subfigure}[b]{0.55\textwidth}
  1885 \begin{center}
  1886 \begin{tikzpicture}
  1887 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1888 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1889   west},scaled x ticks = false,x tick label style={/pgf/number
  1890   format/1000 sep = \space}]
  1891 %\addplot+[only marks] table {proteinsOrig.txt};
  1892 \addplot table {randGraph/ind/vf2pInd35_0.6.txt};
  1893 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1894         {randGraph/ind/vf2ppInd35_0.6.txt};
  1895 \end{axis}
  1896 \end{tikzpicture}
  1897 \end{center}
  1898 \end{subfigure}
  1899 \begin{subfigure}[b]{0.55\textwidth}
  1900           
  1901 \begin{tikzpicture}
  1902 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1903 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1904   west},scaled x ticks = false,x tick label style={/pgf/number
  1905   format/1000 sep = \space}]
  1906 %\addplot+[only marks] table {proteinsOrig.txt};
  1907 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
  1908 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1909         {randGraph/ind/vf2ppInd35_0.8.txt};
  1910 \end{axis}
  1911 \end{tikzpicture}
  1912      \end{subfigure}
  1913      \begin{subfigure}[b]{0.55\textwidth}
  1914 \begin{tikzpicture}
  1915 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1916 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1917   west},scaled x ticks = false,x tick label style={/pgf/number
  1918   format/1000 sep = \thinspace}]
  1919 %\addplot+[only marks] table {proteinsOrig.txt};
  1920 \addplot table {randGraph/ind/vf2pInd35_0.95.txt};
  1921 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1922         {randGraph/ind/vf2ppInd35_0.95.txt};
  1923 \end{axis}
  1924 \end{tikzpicture}
  1925 \end{subfigure}
  1926 \vspace*{-0.8cm}
  1927 \caption{IND on graphs having an average degree of
  1928   35.}\label{fig:randIND35}
  1929 \end{figure}
  1930 
  1931 \begin{figure}[H]
  1932 \begin{center}
  1933 \begin{tikzpicture}
  1934 \begin{axis}[title={Rand IND Summary, $\delta = 35$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid
  1935 =major,mark size=1pt, legend style={at={(0,1)},anchor=north
  1936   west},scaled x ticks = false,x tick label style={/pgf/number
  1937   format/1000 sep = \thinspace}]
  1938 %\addplot+[only marks] table {proteinsOrig.txt};
  1939 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1940         {randGraph/ind/vf2pInd35_0.3.txt};
  1941         \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1942                 {randGraph/ind/vf2ppInd35_0.3.txt};
  1943                 \addplot[mark=*,mark size=1.5pt,color=blue] table
  1944                         {randGraph/ind/vf2pInd35_0.6.txt};
  1945                         \addplot[mark=triangle*,mark
  1946                           size=1.8pt,color=red] table
  1947                                 {randGraph/ind/vf2ppInd35_0.6.txt};
  1948                                 \addplot[mark=*,mark
  1949                                   size=1.5pt,color=blue] table
  1950                                         {randGraph/ind/vf2pInd35_0.8.txt};
  1951                                         \addplot[mark=triangle*,mark
  1952                                           size=1.8pt,color=red] table
  1953                                                 {randGraph/ind/vf2ppInd35_0.8.txt};
  1954                                                 \addplot[mark=*,mark
  1955                                                   size=1.5pt,color=blue]
  1956                                                 table
  1957                                                 {randGraph/ind/vf2pInd35_0.95.txt};
  1958                                                 \addplot[mark=triangle*,mark
  1959                                                   size=1.8pt,color=red]
  1960                                                 table
  1961                                                 {randGraph/ind/vf2ppInd35_0.95.txt};
  1962 \end{axis}
  1963 \end{tikzpicture}
  1964 \end{center}
  1965 \vspace*{-0.8cm}
  1966 \caption{Cummulative chart for $\delta=35$.}\label{fig:randIND35Sum}
  1967 \end{figure}
  1968 
  1969 Based on these experiments, VF2++ is faster than VF2 Plus and able to
  1970 handle really large graphs in milliseconds. Note that when $IND$ was
  1971 considered and the small graphs had proportionally few nodes ($\rho =
  1972 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
  1973 orders (e.g. see the $\delta=10$ case on
  1974 Figure~\ref{fig:randIND10}). If these examples had been excluded, the
  1975 charts would have seemed to be similar to the other ones.
  1976 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
  1977 Plus slow slightly down, but remain practically usable even on graphs
  1978 having 10 000 nodes.
  1979 
  1980 
  1981 
  1982 
  1983 
  1984 \section{Conclusion}
  1985 In this paper, after providing a short summary of the recent
  1986 algorithms, a new graph matching algorithm based on VF2, called VF2++,
  1987 has been presented and analyzed from a practical viewpoint.
  1988 
  1989 Recognizing the importance of the node order and determining an
  1990 efficient one, VF2++ is able to match graphs of thousands of nodes in
  1991 near practically linear time including preprocessing. In addition to
  1992 the proper order, VF2++ uses more efficient consistency and cutting
  1993 rules which are easy to compute and make the algorithm able to prune
  1994 most of the unfruitful branches without going astray.
  1995 
  1996 In order to show the efficiency of the new method, it has been
  1997 compared to VF2 Plus, which is the best concurrent algorithm based on
  1998 \cite{VF2Plus}.
  1999 
  2000 The experiments show that VF2++ consistently outperforms VF2 Plus on
  2001 biological graphs. It seems to be asymptotically faster on protein and
  2002 on contact map graphs in the case of induced subgraph isomorphism,
  2003 while in the case of graph isomorphism, it has definitely better
  2004 asymptotic behaviour on protein graphs.
  2005 
  2006 Regarding random sparse graphs, not only has VF2++ proved itself to be
  2007 faster than VF2 Plus, but it has a practically linear behaviour both
  2008 in the case of induced subgraph- and graph isomorphism, as well.
  2009 
  2010 
  2011 
  2012 %% The Appendices part is started with the command \appendix;
  2013 %% appendix sections are then done as normal sections
  2014 %% \appendix
  2015 
  2016 %% \section{}
  2017 %% \label{}
  2018 
  2019 %% If you have bibdatabase file and want bibtex to generate the
  2020 %% bibitems, please use
  2021 %%
  2022 \bibliographystyle{elsarticle-num} \bibliography{bibliography}
  2023 
  2024 %% else use the following coding to input the bibitems directly in the
  2025 %% TeX file.
  2026 
  2027 %% \begin{thebibliography}{00}
  2028 
  2029 %% %% \bibitem{label}
  2030 %% %% Text of bibliographic item
  2031 
  2032 %% \bibitem{}
  2033 
  2034 %% \end{thebibliography}
  2035 
  2036 \end{document}
  2037 \endinput
  2038 %%
  2039 %% End of file `elsarticle-template-num.tex'.