damecco.tex
author Madarasi Peter
Wed, 30 Nov 2016 16:13:29 +0100
changeset 21 54e95bfead8c
parent 19 b9a8744c5efc
child 22 1a4874982d84
permissions -rw-r--r--
refs
     1 %% 
     2 %% Copyright 2007, 2008, 2009 Elsevier Ltd
     3 %% 
     4 %% This file is part of the 'Elsarticle Bundle'.
     5 %% ---------------------------------------------
     6 %% 
     7 %% It may be distributed under the conditions of the LaTeX Project Public
     8 %% License, either version 1.2 of this license or (at your option) any
     9 %% later version.  The latest version of this license is in
    10 %%    http://www.latex-project.org/lppl.txt
    11 %% and version 1.2 or later is part of all distributions of LaTeX
    12 %% version 1999/12/01 or later.
    13 %% 
    14 %% The list of all files belonging to the 'Elsarticle Bundle' is
    15 %% given in the file `manifest.txt'.
    16 %% 
    17 
    18 %% Template article for Elsevier's document class `elsarticle'
    19 %% with numbered style bibliographic references
    20 %% SP 2008/03/01
    21 
    22 \documentclass[preprint,12pt]{elsarticle}
    23 
    24 %% Use the option review to obtain double line spacing
    25 %% \documentclass[authoryear,preprint,review,12pt]{elsarticle}
    26 
    27 %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
    28 %% for a journal layout:
    29 %% \documentclass[final,1p,times]{elsarticle}
    30 %% \documentclass[final,1p,times,twocolumn]{elsarticle}
    31 %% \documentclass[final,3p,times]{elsarticle}
    32 %% \documentclass[final,3p,times,twocolumn]{elsarticle}
    33 %% \documentclass[final,5p,times]{elsarticle}
    34 %% \documentclass[final,5p,times,twocolumn]{elsarticle}
    35 
    36 %% For including figures, graphicx.sty has been loaded in
    37 %% elsarticle.cls. If you prefer to use the old commands
    38 %% please give \usepackage{epsfig}
    39 
    40 %% The amssymb package provides various useful mathematical symbols
    41 \usepackage{amssymb}
    42 %% The amsthm package provides extended theorem environments
    43 %% \usepackage{amsthm}
    44 
    45 %% The lineno packages adds line numbers. Start line numbering with
    46 %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
    47 %% for the whole article with \linenumbers.
    48 %% \usepackage{lineno}
    49 
    50 \usepackage{amsmath}
    51 %% \usepackage[pdftex]{graphicx}
    52 
    53 \usepackage{pgfplots}
    54 \pgfplotsset{width=9cm}
    55 \pgfplotsset{compat=1.8}
    56 
    57 \usepackage{caption}
    58 \usepackage{subcaption} 
    59 
    60 \usepackage{algorithm}
    61 \usepackage{algpseudocode}
    62 \usepackage{tikz}
    63 
    64 \usepackage{amsthm,amssymb}
    65 \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}
    66 
    67 \newtheorem{theorem}{Theorem}[subsection]
    68 \newtheorem{corollary}{Corollary}[theorem]
    69 \newtheorem{claim}[theorem]{Claim}
    70 
    71 \newtheorem{definition}{Definition}[subsection]
    72 \newtheorem{notation}{Notation}[subsection]
    73 \newtheorem{example}{Example}[subsection]
    74 \usetikzlibrary{decorations.markings}
    75 \let\oldproofname=\proofname
    76 %% \renewcommand{\proofname}{\rm\bf{Proof:}}
    77 
    78 \captionsetup{font=normalsize}
    79 
    80 \journal{Discrete Applied Mathematics}
    81 
    82 \begin{document}
    83 
    84 \begin{frontmatter}
    85 
    86 %% Title, authors and addresses
    87 
    88 %% use the tnoteref command within \title for footnotes;
    89 %% use the tnotetext command for theassociated footnote;
    90 %% use the fnref command within \author or \address for footnotes;
    91 %% use the fntext command for theassociated footnote;
    92 %% use the corref command within \author for corresponding author footnotes;
    93 %% use the cortext command for theassociated footnote;
    94 %% use the ead command for the email address,
    95 %% and the form \ead[url] for the home page:
    96 %% \title{Title\tnoteref{label1}}
    97 %% \tnotetext[label1]{}
    98 %% \author{Name\corref{cor1}\fnref{label2}}
    99 %% \ead{email address}
   100 %% \ead[url]{home page}
   101 %% \fntext[label2]{}
   102 %% \cortext[cor1]{}
   103 %% \address{Address\fnref{label3}}
   104 %% \fntext[label3]{}
   105 
   106 \title{Improved Algorithms for Matching Biological Graphs}
   107 
   108 %% use optional labels to link authors explicitly to addresses:
   109 %% \author[label1,label2]{}
   110 %% \address[label1]{}
   111 %% \address[label2]{}
   112 
   113 \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi}
   114 
   115 \address{Dept of Operations Research, ELTE}
   116 
   117 \begin{abstract}
   118 Subgraph isomorphism is a well-known NP-Complete problem, while its
   119 special case, the graph isomorphism problem is one of the few problems
   120 in NP neither known to be in P nor NP-Complete. Their appearance in
   121 many fields of application such as pattern analysis, computer vision
   122 questions and the analysis of chemical and biological systems has
   123 fostered the design of various algorithms for handling special graph
   124 structures.
   125 
   126 This paper presents VF2++, a new algorithm based on the original VF2,
   127 which runs significantly faster on most test cases and performs
   128 especially well on special graph classes stemming from biological
   129 questions. VF2++ handles graphs of thousands of nodes in practically
   130 near linear time including preprocessing. Not only is it an improved
   131 version of VF2, but in fact, it is by far the fastest existing
   132 algorithm especially on biological graphs.
   133 
   134 The reason for VF2++' superiority over VF2 is twofold. Firstly, taking
   135 into account the structure and the node labeling of the graph, VF2++
   136 determines a state order in which most of the unfruitful branches of
   137 the search space can be pruned immediately. Secondly, introducing more
   138 efficient - nevertheless still easier to compute - cutting rules
   139 reduces the chance of going astray even further.
   140 
   141 In addition to the usual subgraph isomorphism, specialized versions
   142 for induced subgraph isomorphism and for graph isomorphism are
   143 presented. VF2++ has gained a runtime improvement of one order of
   144 magnitude respecting induced subgraph isomorphism and a better
   145 asymptotical behaviour in the case of graph isomorphism problem.
   146 
   147 After having provided the description of VF2++, in order to evaluate
   148 its effectiveness, an extensive comparison to the contemporary other
   149 algorithms is shown, using a wide range of inputs, including both real
   150 life biological and chemical datasets and standard randomly generated
   151 graph series.
   152 
   153 The work was motivated and sponsored by QuantumBio Inc., and all the
   154 developed algorithms are available as the part of the open source
   155 LEMON graph and network optimization library
   156 (http://lemon.cs.elte.hu).
   157 \end{abstract}
   158 
   159 \begin{keyword}
   160 %% keywords here, in the form: keyword \sep keyword
   161 
   162 %% PACS codes here, in the form: \PACS code \sep code
   163 
   164 %% MSC codes here, in the form: \MSC code \sep code
   165 %% or \MSC[2008] code \sep code (2000 is the default)
   166 
   167 \end{keyword}
   168 
   169 \end{frontmatter}
   170 
   171 %% \linenumbers
   172 
   173 %% main text
   174 \section{Introduction}
   175 \label{sec:intro}
   176 
   177 In the last decades, combinatorial structures, and especially graphs
   178 have been considered with ever increasing interest, and applied to the
   179 solution of several new and revised questions.  The expressiveness,
   180 the simplicity and the studiedness of graphs make them practical for
   181 modelling and appear constantly in several seemingly independent
   182 fields, such as bioinformatics and chemistry.
   183 
   184 Complex biological systems arise from the interaction and cooperation
   185 of plenty of molecular components. Getting acquainted with such
   186 systems at the molecular level is of primary importance, since
   187 protein-protein interaction, DNA-protein interaction, metabolic
   188 interaction, transcription factor binding, neuronal networks, and
   189 hormone signaling networks can be understood this way.
   190 
   191 Many chemical and biological structures can easily be modeled
   192 as graphs, for instance, a molecular structure can be
   193 considered as a graph, whose nodes correspond to atoms and whose
   194 edges to chemical bonds. The similarity and dissimilarity of
   195 objects corresponding to nodes are incorporated to the model
   196 by \emph{node labels}. Understanding such networks basically
   197 requires finding specific subgraphs, thus calls for efficient
   198 graph matching algorithms.
   199 
   200 Other real-world fields related to some
   201 variants of graph matching include pattern recognition
   202 and machine vision \cite{HorstBunkeApplications}, symbol recognition
   203 \cite{CordellaVentoSymbolRecognition}, face identification
   204 \cite{JianzhuangYongFaceIdentification}.  \\
   205 
   206 Subgraph and induced subgraph matching problems are known to be
   207 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is
   208 one of the few problems in NP neither known to be in P nor
   209 NP-Complete. Although polynomial time isomorphism algorithms are known
   210 for various graph classes, like trees and planar
   211 graphs\cite{PlanarGraphIso}, bounded valence
   212 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
   213 or permutation graphs\cite{PermGraphIso}, and recently, an FPT algorithm has been presented for the coloured hypergraph isomorphism problem in \cite{ColoredHiperGraphIso}.
   214 
   215 In the following, some algorithms based on other approaches are
   216 summarized, which do not need any restrictions on the graphs. Even though,
   217 an overall polynomial behaviour is not expectable from such an
   218 alternative, it may often have good practical performance, in fact,
   219 it might be the best choice even on a graph class for which polynomial
   220 algorithm is known.
   221 
   222 The first practically usable approach was due to
   223 \emph{Ullmann}\cite{Ullmann} which is a commonly used depth-first
   224 search based algorithm with a complex heuristic for reducing the
   225 number of visited states. A major problem is its $\Theta(n^3)$ space
   226 complexity, which makes it impractical in the case of big sparse
   227 graphs.
   228 
   229 In a recent paper, Ullmann\cite{UllmannBit} presents an
   230 improved version of this algorithm based on a bit-vector solution for
   231 the binary Constraint Satisfaction Problem.
   232 
   233 The \emph{Nauty} algorithm\cite{Nauty} transforms the two graphs to
   234 a canonical form before starting to check for the isomorphism. It has
   235 been considered as one of the fastest graph isomorphism algorithms,
   236 although graph categories were shown in which it takes exponentially
   237 many steps. This algorithm handles only the graph isomorphism problem.
   238 
   239 The \emph{LAD} algorithm\cite{Lad} uses a depth-first search
   240 strategy and formulates the matching as a Constraint Satisfaction
   241 Problem to prune the search tree. The constraints are that the mapping
   242 has to be injective and edge-preserving, hence it is possible to
   243 handle new matching types as well.
   244 
   245 The \emph{RI} algorithm\cite{RI} and its variations are based on a
   246 state space representation. After reordering the nodes of the graphs,
   247 it uses some fast executable heuristic checks without using any
   248 complex pruning rules. It seems to run really efficiently on graphs
   249 coming from biology, and won the International Contest on Pattern
   250 Search in Biological Databases\cite{Content}.
   251 
   252 The currently most commonly used algorithm is the
   253 \emph{VF2}\cite{VF2}, the improved version of \emph{VF}\cite{VF}, which was
   254 designed for solving pattern matching and computer vision problems,
   255 and has been one of the best overall algorithms for more than a
   256 decade. Although, it can't be up to new specialized algorithms, it is
   257 still widely used due to its simplicity and space efficiency. VF2 uses
   258 a state space representation and checks some conditions in each state
   259 to prune the search tree.
   260 
   261 Meanwhile, another variant called \emph{VF2 Plus}\cite{VF2Plus} has
   262 been published. It is considered to be as efficient as the RI
   263 algorithm and has a strictly better behavior on large graphs.  The
   264 main idea of VF2 Plus is to precompute a heuristic node order of the
   265 small graph, in which the VF2 works more efficiently.
   266 
   267 This paper introduces \emph{VF2++}, a new further improved algorithm
   268 for the graph and (induced)subgraph isomorphism problem, which uses
   269 efficient cutting rules and determines a node order in which VF2 runs
   270 significantly faster on practical inputs.
   271 
   272 This project was initiated and sponsored by QuantumBio
   273 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
   274 code --- has been published as a part of LEMON\cite{LEMON} open source
   275 graph library.
   276 
   277 \section{Problem Statement}
   278 This section provides a formal description of the problems to be
   279 solved.
   280 \subsection{Definitions}
   281 
   282 Throughout the paper $G_{1}=(V_{1}, E_{1})$ and
   283 $G_{2}=(V_{2}, E_{2})$ denote two undirected graphs.
   284 
   285 \begin{definition}
   286 $\mathcal{L}: (V_{1}\cup V_{2}) \longrightarrow K$ is a \textbf{node
   287     label function}, where K is an arbitrary set. The elements in K
   288   are the \textbf{node labels}. Two nodes, u and v are said to be
   289   \textbf{equivalent} if $\mathcal{L}(u)=\mathcal{L}(v)$.
   290 \end{definition}
   291 
   292 For the sake of simplicity, in this paper the graph, subgraph and induced subgraph isomorphisms are defined in a more general way.
   293 
   294 \begin{definition}\label{sec:ismorphic}
   295 $G_{1}$ and $G_{2}$ are \textbf{isomorphic} (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}:
   296   V_{1} \longrightarrow V_{2}$ bijection, for which the
   297   following is true:
   298 \begin{center}
   299 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\
   300 $\forall u,v\in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in{E_{2}}$
   301 \end{center}
   302 \end{definition}
   303 
   304 \begin{definition}
   305 $G_{1}$ is a \textbf{subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}:
   306   V_{1}\longrightarrow V_{2}$ injection, for which the
   307   following is true:
   308 \begin{center}
   309 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\
   310 $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Rightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$
   311 \end{center}
   312 \end{definition}
   313 
   314 \begin{definition} 
   315 $G_{1}$ is an \textbf{induced subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists
   316   \mathfrak{m}: V_{1}\longrightarrow V_{2}$ injection, for which the
   317   following is true:
   318 \begin{center}
   319 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and
   320 
   321 $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow
   322   (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$
   323 \end{center}
   324 \end{definition}
   325 
   326 
   327 \subsection{Common problems}\label{sec:CommProb}
   328 
   329 The focus of this paper is on two extensively studied topics, the
   330 subgraph isomorphism and its variations. However, the following
   331 problems also appear in many applications.
   332 
   333 The \textbf{subgraph matching problem} is the following: is
   334 $G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node
   335 label?
   336 
   337 The \textbf{induced subgraph matching problem} asks the same about the
   338 existence of an induced subgraph.
   339 
   340 The \textbf{graph isomorphism problem} can be defined as induced
   341 subgraph matching problem where the sizes of the two graphs are equal.
   342 
   343 In addition to existence, it may be needed to show such a subgraph, or
   344 it may be necessary to list all of them.
   345 
   346 It should be noted that some authors misleadingly refer to the term
   347 \emph{subgraph isomorphism problem} as an \emph{induced subgraph
   348   isomorphism problem}.
   349 
   350 The following sections give the descriptions of VF2, VF2++, VF2 Plus
   351 and a particular comparison.
   352 
   353 \section{The VF2 Algorithm}
   354 This algorithm is the basis of both the VF2++ and the VF2 Plus.  VF2
   355 is able to handle all the variations mentioned in Section
   356   \ref{sec:CommProb}.  Although it can also handle directed graphs,
   357 for the sake of simplicity, only the undirected case will be
   358 discussed.
   359 
   360 
   361 \subsection{Common notations}
   362 \indent Assume $G_{1}$ is searched in $G_{2}$.  The following
   363 definitions and notations will be used throughout the whole paper.
   364 \begin{definition}
   365 An injection $\mathfrak{m} : D \longrightarrow V_2$ is called (partial) \textbf{mapping}, where $D\subseteq V_1$.
   366 \end{definition}
   367 
   368 \begin{notation}
   369 $\mathfrak{D}(f)$ and $\mathfrak{R}(f)$ denote the domain and the range of a function $f$, respectively.
   370 \end{notation}
   371 
   372 \begin{definition}
   373 Mapping $\mathfrak{m}$ \textbf{covers} a node $u\in V_1\cup V_2$ if $u\in \mathfrak{D}(\mathfrak{m})\cup \mathfrak{R}(\mathfrak{m})$.
   374 \end{definition}
   375 
   376 \begin{definition}
   377 A mapping $\mathfrak{m}$ is $\mathbf{whole\ mapping}$ if $\mathfrak{m}$ covers all the
   378 nodes of $V_{1}$, i.e. $\mathfrak{D}(\mathfrak{m})=V_1$.
   379 \end{definition}
   380 
   381 \begin{definition}
   382 Let \textbf{extend}$(\mathfrak{m},(u,v))$ denote the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$, for which $\forall w\in \mathfrak{D}(\mathfrak{m}) : \mathfrak{m}(w)=f(w)$ and $f(u)=v$ holds. Where $u\in V_1\setminus\mathfrak{D}(\mathfrak{m})$ and $v\in V_2\setminus\mathfrak{R}(\mathfrak{m})$, otherwise $extend(\mathfrak{m},(u,v))$ is undefined.
   383 \end{definition}
   384 
   385 \begin{notation}
   386 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
   387 which can be substituted by any of the $\mathbf{ISO}$, $\mathbf{SUB}$
   388 and $\mathbf{IND}$ problems.
   389 \end{notation}
   390 
   391 \begin{definition}
   392 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
   393 \textbf{consistency function by } $\mathbf{PT}$ if the following
   394 holds. If there exists a whole mapping $w$ satisfying the requirements of $PT$, for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$.
   395 \end{definition}
   396 
   397 \begin{definition} 
   398 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
   399 \textbf{cutting function by } $\mathbf{PT}$ if the following
   400 holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations, which results in a whole mapping satisfying the requirements of $PT$.
   401 \end{definition}
   402 
   403 \begin{definition}
   404 $\mathfrak{m}$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if
   405   $Cons_{PT}(\mathfrak{m})$ is true.
   406 \end{definition}
   407 
   408 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
   409 \begin{notation}
   410 Let $\mathbf{Cons_{PT}(p, \mathfrak{m})}:=Cons_{PT}(extend(\mathfrak{m},p))$, and
   411 $\mathbf{Cut_{PT}(p, \mathfrak{m})}:=Cut_{PT}(extend(\mathfrak{m},p))$, where
   412 $p\in{V_{1}\backslash\mathfrak{D}(\mathfrak{m}) \!\times\!V_{2}\backslash\mathfrak{R}(\mathfrak{m})}$.
   413 \end{notation}
   414 
   415 $Cons_{PT}$ will be used to check the consistency of the already
   416 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
   417 no whole consistent mapping can contain the current mapping.
   418 
   419 \subsection{Overview of the algorithm}
   420 VF2 uses a state space representation of mappings, $Cons_{PT}$ for
   421 excluding inconsistency with the problem type and $Cut_{PT}$ for
   422 pruning the search tree.
   423 
   424 Algorithm~\ref{alg:VF2Pseu} is a high level description of
   425 the VF2 matching algorithm. Each state of the matching process can
   426 be associated with a mapping $\mathfrak{m}$. The initial state
   427 is associated with a mapping $\mathfrak{m}$, for which
   428 $\mathfrak{D}(\mathfrak{m})=\emptyset$, i.e. it starts with an empty mapping.
   429 
   430 
   431 \begin{algorithm}
   432 \algtext*{EndIf}%ne nyomtasson end if-et
   433 \algtext*{EndFor}%ne
   434 \algtext*{EndProcedure}%ne nyomtasson ..
   435 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
   436 \begin{algorithmic}[1]
   437 
   438 \Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$}
   439   \If{$\mathfrak{m}$ covers
   440     $V_{1}$} \State Output($\mathfrak{m}$)
   441   \Else
   442   \State Compute the set $P_\mathfrak{m}$ of the pairs candidate for inclusion
   443   in $\mathfrak{m}$ \ForAll{$p\in{P_\mathfrak{m}}$} \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$
   444     $\neg$Cut$_{PT}$($p,\mathfrak{m}$)}
   445     \State \textbf{call}
   446   VF2($extend(\mathfrak{m},p)$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
   447 \end{algorithmic}
   448 \end{algorithm}
   449 
   450 
   451 For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of
   452 candidate node pairs for adding to the current mapping $\mathfrak{m}_s$.
   453 
   454 For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and
   455 $Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and
   456 the latter is false, the whole process is recursively applied to
   457 $extend(\mathfrak{m},p)$. Otherwise, $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it
   458 can be proved that $\mathfrak{m}$ can not be extended to a whole mapping.
   459 
   460 In order to make sure of the correctness, see
   461 \begin{claim}
   462 Through consistent mappings, only consistent whole mappings can be
   463 reached, and all the consistent whole mappings are reachable through
   464 consistent mappings.
   465 \end{claim}
   466 
   467 Note that a mapping may be reached in exponentially many different ways, since the
   468 order of extensions does not influence the nascent mapping.
   469 
   470 However, one may observe
   471 
   472 \begin{claim}
   473 \label{claim:claimTotOrd}
   474 Let $\prec$ be an arbitrary total ordering relation on $V_{1}$.  If
   475 the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$, for which
   476 \begin{center}
   477 $\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$,
   478 \end{center}
   479 then no mapping can be reached more than once, and each whole mapping remains reachable.
   480 \end{claim}
   481 
   482 Note that the cornerstone of the improvements to VF2 is a proper
   483 choice of a total ordering.
   484 
   485 \subsection{The candidate set}
   486 \label{candidateComputingVF2}
   487 Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$.
   488 
   489 \begin{notation}
   490 Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\backslash\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and
   491  $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\backslash\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$.
   492 \end{notation}
   493 
   494 The set $P_\mathfrak{m}$ includes the pairs of uncovered neighbours of covered
   495 nodes, and if there is not such a node pair, all the pairs containing
   496 two uncovered nodes are added. Formally, let
   497 \[
   498  P_\mathfrak{m}\!=\!
   499   \begin{cases} 
   500    T_{1}(\mathfrak{m})\times T_{2}(\mathfrak{m})&\hspace{-0.15cm}\text{if }
   501    T_{1}(\mathfrak{m})\!\neq\!\emptyset\ \text{and }T_{2}(\mathfrak{m})\!\neq
   502    \emptyset,\\ (V_{1}\!\setminus\!\mathfrak{D}(\mathfrak{m}))\!\times\!(V_{2}\!\setminus\!\mathfrak{R}(\mathfrak{m}))
   503    &\hspace{-0.15cm}\text{otherwise}.
   504   \end{cases}
   505 \]
   506 
   507 \subsection{Consistency}
   508 Suppose $p=(u,v)$, where $u\in V_{1}$ and $v\in V_{2}$, $\mathfrak{m}$ is a consistent mapping by
   509 $PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether
   510 including pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$.
   511 
   512 For example, the consistency function of induced subgraph isomorphism is as follows.
   513 \begin{notation}
   514 Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} :
   515 (u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2}
   516   (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$.
   517 \end{notation}
   518 
   519 $extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ $\Leftrightarrow
   520 (\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}
   521 \Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. The
   522 following formulation gives an efficient way of calculating
   523 $Cons_{IND}$.
   524 \begin{claim}
   525 $Cons_{IND}((u,v),\mathfrak{m}):=\mathcal{L}(u)\!\!=\!\!\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge
   526   (\forall \tilde{u}\in \Gamma_{1}(u)
   527   \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is a
   528   consistency function in the case of $IND$.
   529 \end{claim}
   530 
   531 \subsection{Cutting rules}
   532 $Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently
   533 verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can
   534 be true only if it is impossible to extend $extend(\mathfrak{m},p)$ to a
   535 whole mapping.
   536 
   537 As an example, the cutting function of induced subgraph isomorphism is presented.
   538 \begin{notation}
   539 Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\backslash
   540 \mathfrak{D}(\mathfrak{m}))\backslash T_{1}(\mathfrak{m})$, and
   541 \\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\backslash
   542 \mathfrak{R}(\mathfrak{m}))\backslash T_{2}(\mathfrak{m})$.
   543 \end{notation}
   544 
   545 \begin{claim}
   546 $Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| <
   547   |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap
   548   \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap
   549   \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function by $IND$.
   550 \end{claim}
   551 
   552 \section{The VF2++ Algorithm}
   553 Although any total ordering relation makes the search space of VF2 a
   554 tree, its choice turns out to dramatically influence the number of
   555 visited states. The goal is to determine an efficient one as quickly
   556 as possible.
   557 
   558 The main reason for VF2++' superiority over VF2 is twofold. Firstly,
   559 taking into account the structure and the node labeling of the graph,
   560 VF2++ determines a state order in which most of the unfruitful
   561 branches of the search space can be pruned immediately. Secondly,
   562 introducing more efficient --- nevertheless still easier to compute
   563 --- cutting rules reduces the chance of going astray even further.
   564 
   565 In addition to the usual subgraph isomorphism, specialized versions
   566 for induced subgraph isomorphism and for graph isomorphism have been
   567 designed. VF2++ has gained a runtime improvement of one order of
   568 magnitude respecting induced subgraph isomorphism and a better
   569 asymptotical behaviour in the case of graph isomorphism problem.
   570 
   571 Note that a weaker version of the cutting rules and the more efficient
   572 candidate set calculating were described in \cite{VF2Plus}, too.
   573 
   574 It should be noted that all the methods described in this section are
   575 extendable to handle directed graphs and edge labels as well.\newline
   576 
   577 The basic ideas and the detailed description of VF2++ are provided in
   578 the following.
   579 
   580 The goal is to find a matching order in which the algorithm is able to
   581 recognize inconsistency or prune the infeasible branches on the
   582 highest levels and goes deep only if it is needed.
   583 
   584 \begin{notation}
   585 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{1}(u)\cap H\}|$, that is the
   586 number of neighbours of u which are in H, where $u\in V_{1} $ and
   587 $H\subseteq V_{1}$.
   588 \end{notation}
   589 
   590 The principal question is the following. Suppose a mapping $\mathfrak{m}$ is
   591 given. For which node of $T_{1}(\mathfrak{m})$ is the hardest to find a
   592 consistent pair in $G_{2}$? The more covered neighbours a node in
   593 $T_{1}(\mathfrak{m})$ has --- i.e. the largest $Conn_{\mathfrak{D}(\mathfrak{m})}$ it has
   594 ---, the more rarely satisfiable consistency constraints for its pair
   595 are given.
   596 
   597 In biology, most of the graphs are sparse, thus several nodes in
   598 $T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes
   599 reasonable to define a secondary and a tertiary order between them.
   600 The observation above proves itself to be as determining, that the
   601 secondary ordering prefers nodes with the most uncovered neighbours
   602 among which have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$ to increase
   603 $Conn_{\mathfrak{D}(\mathfrak{m})}$ of uncovered nodes so much, as possible.  The
   604 tertiary ordering prefers nodes having the rarest uncovered labels.
   605 
   606 Note that the secondary ordering is the same as the ordering by $deg$,
   607 which is a static data in front of the above used.
   608 
   609 These rules can easily result in a matching order which contains the
   610 nodes of a long path successively, whose nodes may have low $Conn$ and
   611 is easily matchable into $G_{2}$. To avoid that, a BFS order is
   612 used, which provides the shortest possible paths.
   613 \newline
   614 
   615 In the following, some examples on which the VF2 may be slow are
   616 described, although they are easily solvable by using a proper
   617 matching order.
   618 
   619 \begin{example}
   620 Suppose $G_{1}$ can be mapped into $G_{2}$ in many ways
   621 without node labels. Let $u\in V_{1}$ and $v\in V_{2}$.
   622 \newline
   623 $\mathcal{L}(u):=black$
   624 \newline
   625 $\mathcal{L}(v):=black$
   626 \newline
   627 $\mathcal{L}(\tilde{u}):=red \ \forall \tilde{u}\in (V_{1}\backslash
   628 \{u\})$
   629 \newline
   630 $\mathcal{L}(\tilde{v}):=red \ \forall \tilde{v}\in (V_{2}\backslash
   631 \{v\})$
   632 \newline
   633 
   634 Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since
   635 $u$ is black and no node in $V_{2}$ has a black label except
   636 $v$. If unfortunately $u$ were the last node which will get covered,
   637 VF2 would check only in the last steps, whether $u$ can be matched to
   638 $v$.
   639 \newline
   640 However, had $u$ been the first matched node, u would have been
   641 matched immediately to v, so all the mappings would have been
   642 precluded in which node labels can not correspond.
   643 \end{example}
   644 
   645 \begin{example}
   646 Suppose there is no node label given, $G_{1}$ is a small graph and
   647 can not be mapped into $G_{2}$ and $u\in V_{1}$.
   648 \newline
   649 Let $G'_{1}:=(V_{1}\cup
   650 \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup
   651 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
   652 $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint
   653 from $G_{1}$ and one of its starting points is connected to $u\in
   654 V_{1}\}$.
   655 \newline
   656 Is there a subgraph of $G_{2}$, which is isomorph with
   657 $G'_{1}$?
   658 \newline
   659 If unfortunately the nodes of the path were the first $k$ nodes in the
   660 matching order, the algorithm would iterate through all the possible k
   661 long paths in $G_{2}$, and it would recognize that no path can be
   662 extended to $G'_{1}$.
   663 \newline
   664 However, had it started by the matching of $G_{1}$, it would not
   665 have matched any nodes of the path.
   666 \end{example}
   667 
   668 These examples may look artificial, but the same problems also appear
   669 in real-world instances, even though in a less obvious way.
   670 
   671 \subsection{Preparations}
   672 \begin{claim}
   673 \label{claim:claimCoverFromLeft}
   674 The total ordering relation uniquely determines a node order, in which
   675 the nodes of $V_{1}$ will be covered by VF2. From the point of
   676 view of the matching procedure, this means, that always the same node
   677 of $G_{1}$ will be covered on the d-th level.
   678 \end{claim}
   679 
   680 \begin{definition}
   681 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of
   682 $V_{1}$ is \textbf{matching order} if exists $\prec$ total
   683 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
   684 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{1}|\}$.
   685 \end{definition}
   686 
   687 \begin{claim}\label{claim:MOclaim}
   688 A total ordering is matching order iff the nodes of every component
   689 form an interval in the node sequence, and every node connects to a
   690 previous node in its component except the first node of each component.
   691 \end{claim}
   692 
   693 To summing up, a total ordering always uniquely determines a matching
   694 order, and every matching order can be determined by a total ordering,
   695 however, more than one different total orderings may determine the
   696 same matching order.
   697 
   698 \subsection{Total ordering}
   699 The matching order will be searched directly.
   700 \begin{notation}
   701 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} :
   702 l=\mathcal{L}(v)\}|-|\{u\in V_{1}\backslash \mathcal{M} : l=\mathcal{L}(u)\}|$ ,
   703 where $l$ is a label and $\mathcal{M}\subseteq V_{1}$.
   704 \end{notation}
   705 
   706 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
   707 \end{definition}
   708 
   709 \begin{algorithm}
   710 \algtext*{EndIf}
   711 \algtext*{EndProcedure}
   712 \algtext*{EndWhile}
   713 \algtext*{EndFor}
   714 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
   715 \begin{algorithmic}[1]
   716 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$
   717 \Comment{matching order} \While{$V_{1}\backslash \mathcal{M}
   718   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
   719 min$_{F_\mathcal{M}\circ \mathcal{L}}(V_{1}\backslash
   720 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
   721 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
   722 $d$-th level \State Process $V_d$ \Comment{See Algorithm
   723   \ref{alg:VF2PPProcess1}} \EndFor
   724 \EndWhile \EndProcedure
   725 \end{algorithmic}
   726 \end{algorithm}
   727 
   728 \begin{algorithm}
   729 \algtext*{EndIf}
   730 \algtext*{EndProcedure}%ne nyomtasson ..
   731 \algtext*{EndWhile}
   732 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
   733 \begin{algorithmic}[1]
   734 \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$}
   735 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ \mathcal{L}}($ arg max$_{deg}($arg
   736 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
   737 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
   738 $F_\mathcal{M}$ \EndWhile \EndProcedure
   739 \end{algorithmic}
   740 \end{algorithm}
   741 
   742 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
   743 matching order procedure of VF2++. It computes a BFS tree for each
   744 component in ascending order of their rarest node labels and largest $deg$,
   745 whose root vertex is the component's minimal
   746 node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending
   747 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately
   748 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately.
   749 
   750 Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}
   751 provides a matching order.
   752 
   753 
   754 \subsection{Cutting rules}
   755 \label{VF2PPCuttingRules}
   756 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels.
   757 \begin{notation}
   758 Let $\mathbf{\Gamma_{1}^{l}(u)}:=\{\tilde{u} : \mathcal{L}(\tilde{u})=l
   759 \wedge \tilde{u}\in \Gamma_{1} (u)\}$ and
   760 $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge
   761 \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in
   762 V_{2}$ and $l$ is a label.
   763 \end{notation}
   764 
   765 \subsubsection{Induced subgraph isomorphism}
   766 \begin{claim}
   767 \[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by IND.
   768 \end{claim}
   769 \subsubsection{Graph isomorphism}
   770 \begin{claim}
   771 \[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by ISO.
   772 \end{claim}
   773 
   774 \subsubsection{Subgraph isomorphism}
   775 \begin{claim}
   776 \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function by SUB.
   777 \end{claim}
   778 
   779 
   780 
   781 \section{Implementation details}
   782 This section provides a detailed summary of an efficient
   783 implementation of VF2++.
   784 \subsubsection{Storing a mapping}
   785 After fixing an arbitrary node order ($u_0, u_1, ..,
   786 u_{|G_{1}|-1}$) of $G_{1}$, an array $M$ is usable to store
   787 the current mapping in the following way.
   788 \[
   789  M[i] =
   790   \begin{cases} 
   791    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INV\!ALI\!D &
   792    if\ no\ node\ has\ been\ mapped\ to\ u_i,
   793   \end{cases}
   794 \]
   795 where $i\in\{0,1, ..,|G_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$
   796 means "no node".
   797 \subsubsection{Avoiding the recurrence}
   798 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
   799 as a \textit{while loop}, which has a loop counter $depth$ denoting the
   800 all-time depth of the recursion. Fixing a matching order, let $M$
   801 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft},
   802 $M$ is $INV\!ALI\!D$ from index $depth$+1 and not $INV\!ALI\!D$ before
   803 $depth$. $M[depth]$ changes
   804 while the state is being processed, but the property is held before
   805 both stepping back to a predecessor state and exploring a successor
   806 state.
   807 
   808 The necessary part of the candidate set is easily maintainable or
   809 computable by following
   810 Section~\ref{candidateComputingVF2}. A much faster method
   811 has been designed for biological- and sparse graphs, see the next
   812 section for details.
   813 
   814 \subsubsection{Calculating the candidates for a node}
   815 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
   816 task is not to maintain the candidate set, but to generate the
   817 candidate nodes in $G_{2}$ for a given node $u\in V_{1}$.  In
   818 case of any of the three problem types and a mapping $\mathfrak{m}$, if a node $v\in
   819 V_{2}$ is a potential pair of $u\in V_{1}$, then $\forall
   820 u'\in \mathfrak{D}(\mathfrak{m}) : (u,u')\in
   821 E_{1}\Rightarrow (v,\mathfrak{m}(u'))\in
   822 E_{2}$. That is, each covered neighbour of $u$ has to be mapped to
   823 a covered neighbour of $v$.
   824 
   825 Having said that, an algorithm running in $\Theta(deg)$ time is
   826 describable if there exists a covered node in the component containing
   827 $u$, and a linear one otherwise.
   828 
   829 
   830 \subsubsection{Determining the node order}
   831 This section describes how the node order preprocessing method of
   832 VF2++ can efficiently be implemented.
   833 
   834 For using lookup tables, the node labels are associated with the
   835 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
   836 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
   837 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
   838 in $V_{1}$ having label $i$, which is easy to compute in
   839 $\Theta(|V_{1}|)$ steps.
   840 
   841 Representing $\mathcal{M}\subseteq V_{1}$ as an array of
   842 size $|V_{1}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes.
   843 
   844 \subsubsection{Cutting rules}
   845 In Section~\ref{VF2PPCuttingRules}, the cutting rules were
   846 described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$
   847 and $\tilde T_{2}$, which are dependent on the all-time mapping
   848 (i.e. on the all-time state). The aim is to check the labeled cutting
   849 rules of VF2++ in $\Theta(deg)$ time.
   850 
   851 Firstly, suppose that these four sets are given in such a way, that
   852 checking whether a node is in a certain set takes constant time,
   853 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
   854 initially zero integer lookup table of size $|K|$. After incrementing
   855 $L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and
   856 decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap
   857 T_{2}(s)$, the first part of the cutting rules is checkable in
   858 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
   859 to zero takes $\Theta(deg)$ time again, which makes it possible to use
   860 the same table through the whole algorithm. The second part of the
   861 cutting rules can be verified using the same method with $\tilde
   862 T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and
   863 $T_{2}$. Thus, the overall complexity is $\Theta(deg)$.
   864 
   865 Another integer lookup table storing the number of covered neighbours
   866 of each node in $G_{2}$ gives all the information about the sets
   867 $T_{2}$ and $\tilde T_{2}$, which is maintainable in
   868 $\Theta(deg)$ time when a pair is added or substracted by incrementing
   869 or decrementing the proper indices. A further improvement is that the
   870 values of $L[\mathcal{L}(u')]$ in case of checking $u$ are dependent only on
   871 $u$, i.e. on the size of the mapping, so for each $u\in V_{1}$ an
   872 array of pairs (label, number of such labels) can be stored to skip
   873 the maintaining operations. Note that these arrays are at most of size
   874 $deg$.
   875 
   876 Using similar techniques, the consistency function can be evaluated in
   877 $\Theta(deg)$ steps, as well.
   878 
   879 \section{Experimental results}
   880 This section compares the performance of VF2++ and VF2 Plus. According to
   881 our experience, both algorithms run faster than VF2 with orders of
   882 magnitude, thus its inclusion was not reasonable.
   883 
   884 The algorithms were implemented in C++ using the open source
   885 LEMON graph and network optimization library\cite{LEMON}. The test were carried out on a linux based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM.
   886 \subsection{Biological graphs}
   887 The tests have been executed on a recent biological dataset created
   888 for the International Contest on Pattern Search in Biological
   889 Databases\cite{Content}, which has been constructed of molecule,
   890 protein and contact map graphs extracted from the Protein Data
   891 Bank\cite{ProteinDataBank}.
   892 
   893 The molecule dataset contains small graphs with less than 100 nodes
   894 and an average degree of less than 3. The protein dataset contains
   895 graphs having 500-10 000 nodes and an average degree of 4, while the
   896 contact map dataset contains graphs with 150-800 nodes and an average
   897 degree of 20.  \\
   898 
   899 In the following, both the induced subgraph isomorphism and the graph
   900 isomorphism will be examined.
   901 
   902 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For runtime results, please see Figure~\ref{fig:bioIND}.
   903 
   904 In an other experiment, the nodes of each graph in the database had been
   905 shuffled, and an isomorphism between the shuffled and the original
   906 graph was searched. The solution times are shown on Figure~\ref{fig:bioISO}.
   907 
   908 
   909 \begin{figure}[H]
   910 \vspace*{-2cm}
   911 \hspace*{-1.5cm}
   912 \begin{subfigure}[b]{0.55\textwidth}
   913 \begin{figure}[H]
   914 \begin{tikzpicture}[trim axis left, trim axis right]
   915 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   916 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   917   west},scaled x ticks = false,x tick label style={/pgf/number
   918   format/1000 sep = \thinspace}]
   919 %\addplot+[only marks] table {proteinsOrig.txt};
   920 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark
   921   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
   922 \end{axis}
   923 \end{tikzpicture}
   924 \caption{In the case of molecules, the algorithms have
   925   similar behaviour, but VF2++ is almost two times faster even on such
   926   small graphs.} \label{fig:INDMolecule}
   927 \end{figure}
   928 \end{subfigure}
   929 \hspace*{1.5cm}
   930 \begin{subfigure}[b]{0.55\textwidth}
   931 \begin{figure}[H]
   932 \begin{tikzpicture}[trim axis left, trim axis right]
   933 \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   934 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   935   west},scaled x ticks = false,x tick label style={/pgf/number
   936   format/1000 sep = \thinspace}]
   937 %\addplot+[only marks] table {proteinsOrig.txt};
   938 \addplot table {Orig/ContactMaps.128.txt};
   939 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
   940         {VF2PPLabel/ContactMaps.128.txt};
   941 \end{axis}
   942 \end{tikzpicture}
   943 \caption{On contact maps, VF2++ runs almost in constant time, while VF2
   944   Plus has a near linear behaviour.} \label{fig:INDContact}
   945 \end{figure}
   946 \end{subfigure}
   947 
   948 \begin{center}
   949 \vspace*{-0.5cm}
   950 \begin{subfigure}[b]{0.55\textwidth}
   951 \begin{figure}[H]
   952 \begin{tikzpicture}[trim axis left, trim axis right]
   953   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   954   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   955     west},scaled x ticks = false,x tick label style={/pgf/number
   956     format/1000 sep = \thinspace}] %\addplot+[only marks] table
   957     {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]
   958     table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark
   959       size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
   960   \end{axis}
   961   \end{tikzpicture}
   962 \caption{Both the algorithms have linear behaviour on protein
   963   graphs. VF2++ is more than 10 times faster than VF2
   964   Plus.} \label{fig:INDProt}
   965 \end{figure}
   966 \end{subfigure}
   967 \end{center}
   968 \vspace*{-0.5cm}
   969 \caption{\normalsize{Induced subgraph isomorphism on biological graphs}}\label{fig:bioIND}
   970 \end{figure}
   971 
   972 
   973 \begin{figure}[H]
   974 \vspace*{-2cm}
   975 \hspace*{-1.5cm}
   976 \begin{subfigure}[b]{0.55\textwidth}
   977 \begin{figure}[H]
   978 \begin{tikzpicture}[trim axis left, trim axis right]
   979 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   980 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   981   west},scaled x ticks = false,x tick label style={/pgf/number
   982   format/1000 sep = \thinspace}]
   983 %\addplot+[only marks] table {proteinsOrig.txt};
   984 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark
   985   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
   986 \end{axis}
   987 \end{tikzpicture}
   988 \caption{In the case of molecules, there is not such a significant
   989   difference, but VF2++ seems to be faster as the number of nodes
   990   increases.}\label{fig:ISOMolecule}
   991 \end{figure}
   992 \end{subfigure}
   993 \hspace*{1.5cm}
   994 \begin{subfigure}[b]{0.55\textwidth}
   995 \begin{figure}[H]
   996 \begin{tikzpicture}[trim axis left, trim axis right]
   997 \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   998 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   999   west},scaled x ticks = false,x tick label style={/pgf/number
  1000   format/1000 sep = \thinspace}]
  1001 %\addplot+[only marks] table {proteinsOrig.txt};
  1002 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark
  1003   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
  1004 \end{axis}
  1005 \end{tikzpicture}
  1006 \caption{The results are closer to each other on contact maps, but
  1007   VF2++ still performs consistently better.}\label{fig:ISOContact}
  1008 \end{figure}
  1009 \end{subfigure}
  1010 
  1011 \begin{center}
  1012 \vspace*{-0.5cm}
  1013 \begin{subfigure}[b]{0.55\textwidth}
  1014 \begin{figure}[H]
  1015 \begin{tikzpicture}[trim axis left, trim axis right]
  1016 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
  1017 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1018   west},scaled x ticks = false,x tick label style={/pgf/number
  1019   format/1000 sep = \thinspace}]
  1020 %\addplot+[only marks] table {proteinsOrig.txt};
  1021 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
  1022   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
  1023 \end{axis}
  1024 \end{tikzpicture}
  1025 \caption{On protein graphs, VF2 Plus has a super linear time
  1026   complexity, while VF2++ runs in near constant time. The difference
  1027   is about two order of magnitude on large graphs.}\label{fig:ISOProt}
  1028 \end{figure}
  1029 \end{subfigure}
  1030 \end{center}
  1031 \vspace*{-0.6cm}
  1032 \caption{\normalsize{Graph isomorphism on biological graphs}}\label{fig:bioISO}
  1033 \end{figure}
  1034 
  1035 
  1036 
  1037 
  1038 \subsection{Random graphs}
  1039 This section compares VF2++ with VF2 Plus on random graphs of a large
  1040 size. The node labels are uniformly distributed.  Let $\delta$ denote
  1041 the average degree.  For the parameters of problems solved in the
  1042 experiments, please see the top of each chart.
  1043 \subsubsection{Graph isomorphism}
  1044 To evaluate the efficiency of the algorithms in the case of graph
  1045 isomorphism, random connected graphs of less than 20 000 nodes have been
  1046 considered. Generating a random graph and shuffling its nodes, an
  1047 isomorphism had to be found. Figure \ref{fig:randISO} shows the runtime results
  1048 on graph sets of various density.
  1049 
  1050 
  1051 
  1052 
  1053 \begin{figure}
  1054 \vspace*{-1.5cm}
  1055 \hspace*{-1.5cm}
  1056 \begin{subfigure}[b]{0.55\textwidth}
  1057 \begin{center}
  1058 \begin{tikzpicture}
  1059 \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1060 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1061   west},scaled x ticks = false,x tick label style={/pgf/number
  1062   format/1000 sep = \space}]
  1063 %\addplot+[only marks] table {proteinsOrig.txt};
  1064 \addplot table {randGraph/iso/vf2pIso5_1.txt};
  1065 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1066         {randGraph/iso/vf2ppIso5_1.txt};
  1067 \end{axis}
  1068 \end{tikzpicture}
  1069 \end{center}
  1070 \end{subfigure}
  1071 %\hspace{1cm}
  1072 \begin{subfigure}[b]{0.55\textwidth}
  1073 \begin{center}
  1074 \begin{tikzpicture}
  1075 \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1076 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1077   west},scaled x ticks = false,x tick label style={/pgf/number
  1078   format/1000 sep = \space}]
  1079 %\addplot+[only marks] table {proteinsOrig.txt};
  1080 \addplot table {randGraph/iso/vf2pIso10_1.txt};
  1081 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1082         {randGraph/iso/vf2ppIso10_1.txt};
  1083 \end{axis}
  1084 \end{tikzpicture}
  1085 \end{center}
  1086 \end{subfigure}
  1087 %%\hspace{1cm}
  1088 \hspace*{-1.5cm}
  1089 \begin{subfigure}[b]{0.55\textwidth}
  1090 \begin{center}
  1091 \begin{tikzpicture}
  1092 \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1093 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1094   west},scaled x ticks = false,x tick label style={/pgf/number
  1095   format/1000 sep = \space}]
  1096 %\addplot+[only marks] table {proteinsOrig.txt};
  1097 \addplot table {randGraph/iso/vf2pIso15_1.txt};
  1098 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1099         {randGraph/iso/vf2ppIso15_1.txt};
  1100 \end{axis}
  1101 \end{tikzpicture}
  1102 \end{center}
  1103      \end{subfigure}
  1104      \begin{subfigure}[b]{0.55\textwidth}
  1105 \begin{center}
  1106 \begin{tikzpicture}
  1107 \begin{axis}[title={Random ISO, $\delta = 35$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1108 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1109   west},scaled x ticks = false,x tick label style={/pgf/number
  1110   format/1000 sep = \space}]
  1111 %\addplot+[only marks] table {proteinsOrig.txt};
  1112 \addplot table {randGraph/iso/vf2pIso35_1.txt};
  1113 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1114         {randGraph/iso/vf2ppIso35_1.txt};
  1115 \end{axis}
  1116 \end{tikzpicture}
  1117 \end{center}
  1118 \end{subfigure}
  1119 \begin{subfigure}[b]{0.55\textwidth}
  1120 \hspace*{-1.5cm}
  1121 \begin{tikzpicture}
  1122 \begin{axis}[title={Random ISO, $\delta = 45$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1123 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1124   west},scaled x ticks = false,x tick label style={/pgf/number
  1125   format/1000 sep = \space}]
  1126 %\addplot+[only marks] table {proteinsOrig.txt};
  1127 \addplot table {randGraph/iso/vf2pIso45_1.txt};
  1128 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1129         {randGraph/iso/vf2ppIso45_1.txt};
  1130 \end{axis}
  1131 \end{tikzpicture}
  1132 \end{subfigure}
  1133 \hspace*{-1.5cm}
  1134 \begin{subfigure}[b]{0.55\textwidth}
  1135 \begin{tikzpicture}
  1136 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1137 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1138   west},scaled x ticks = false,x tick label style={/pgf/number
  1139   format/1000 sep = \thinspace}]
  1140 %\addplot+[only marks] table {proteinsOrig.txt};
  1141 \addplot table {randGraph/iso/vf2pIso100_1.txt};
  1142 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1143         {randGraph/iso/vf2ppIso100_1.txt};
  1144 \end{axis}
  1145 \end{tikzpicture}
  1146 \end{subfigure}
  1147 \vspace*{-0.8cm}
  1148 \caption{IND on graphs having an average degree of
  1149   5.}\label{fig:randISO}
  1150 \end{figure}
  1151 
  1152 
  1153 \subsubsection{Induced subgraph isomorphism}
  1154 This section presents a comparison of VF2++ and VF2 Plus in the case
  1155 of induced subgraph isomorphism. In addition to the size of the large
  1156 graph, that of the small graph dramatically influences the hardness of
  1157 a given problem too, so the overall picture is provided by examining
  1158 small graphs of various size.
  1159 
  1160 For each chart, a number $0<\rho< 1$ has been fixed, and the following
  1161 has been executed 150 times. Generating a large graph $G_{2}$ of an average degree of $\delta$,
  1162 choose 10 of its induced subgraphs having $\rho\ |V_{2}|$ nodes,
  1163 and for all the 10 subgraphs find a mapping by using both the graph
  1164 matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
  1165 0.3, 0.6, 0.8, 0.95$ cases have been examined, see
  1166 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
  1167 \ref{fig:randIND35}.
  1168 
  1169 
  1170 
  1171 
  1172 
  1173 \begin{figure}
  1174 \vspace*{-1.5cm}
  1175 \hspace*{-1.5cm}
  1176 \begin{subfigure}[b]{0.55\textwidth}
  1177 \begin{center}
  1178 \begin{tikzpicture}
  1179 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1180 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1181   west},scaled x ticks = false,x tick label style={/pgf/number
  1182   format/1000 sep = \space}]
  1183 %\addplot+[only marks] table {proteinsOrig.txt};
  1184 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
  1185 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1186         {randGraph/ind/vf2ppInd5_0.05.txt};
  1187 \end{axis}
  1188 \end{tikzpicture}
  1189 \end{center}
  1190      \end{subfigure}
  1191      \begin{subfigure}[b]{0.55\textwidth}
  1192 \begin{center}
  1193 \begin{tikzpicture}
  1194 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1195 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1196   west},scaled x ticks = false,x tick label style={/pgf/number
  1197   format/1000 sep = \space}]
  1198 %\addplot+[only marks] table {proteinsOrig.txt};
  1199 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
  1200 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1201         {randGraph/ind/vf2ppInd5_0.1.txt};
  1202 \end{axis}
  1203 \end{tikzpicture}
  1204 \end{center}
  1205 \end{subfigure}
  1206 \hspace*{-1.5cm}
  1207 \begin{subfigure}[b]{0.55\textwidth}
  1208 \begin{center}
  1209 \begin{tikzpicture}
  1210 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1211 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1212   west},scaled x ticks = false,x tick label style={/pgf/number
  1213   format/1000 sep = \space}]
  1214 %\addplot+[only marks] table {proteinsOrig.txt};
  1215 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
  1216 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1217         {randGraph/ind/vf2ppInd5_0.3.txt};
  1218 \end{axis}
  1219 \end{tikzpicture}
  1220 \end{center}
  1221      \end{subfigure}
  1222      \begin{subfigure}[b]{0.55\textwidth}
  1223 \begin{center}
  1224 \begin{tikzpicture}
  1225 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1226 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1227   west},scaled x ticks = false,x tick label style={/pgf/number
  1228   format/1000 sep = \space}]
  1229 %\addplot+[only marks] table {proteinsOrig.txt};
  1230 \addplot table {randGraph/ind/vf2pInd5_0.6.txt};
  1231 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1232         {randGraph/ind/vf2ppInd5_0.6.txt};
  1233 \end{axis}
  1234 \end{tikzpicture}
  1235 \end{center}
  1236 \end{subfigure}
  1237 \begin{subfigure}[b]{0.55\textwidth}
  1238 \hspace*{-1.5cm}
  1239 \begin{tikzpicture}
  1240 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1241 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1242   west},scaled x ticks = false,x tick label style={/pgf/number
  1243   format/1000 sep = \space}]
  1244 %\addplot+[only marks] table {proteinsOrig.txt};
  1245 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
  1246 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1247         {randGraph/ind/vf2ppInd5_0.8.txt};
  1248 \end{axis}
  1249 \end{tikzpicture}
  1250      \end{subfigure}
  1251      \hspace*{-1.5cm}
  1252      \begin{subfigure}[b]{0.55\textwidth}
  1253 \begin{tikzpicture}
  1254 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1255 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1256   west},scaled x ticks = false,x tick label style={/pgf/number
  1257   format/1000 sep = \thinspace}]
  1258 %\addplot+[only marks] table {proteinsOrig.txt};
  1259 \addplot table {randGraph/ind/vf2pInd5_0.95.txt};
  1260 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1261         {randGraph/ind/vf2ppInd5_0.95.txt};
  1262 \end{axis}
  1263 \end{tikzpicture}
  1264 \end{subfigure}
  1265 \vspace*{-0.8cm}
  1266 \caption{IND on graphs having an average degree of
  1267   5.}\label{fig:randIND5}
  1268 \end{figure}
  1269 
  1270 
  1271 \begin{figure}[H]
  1272 \vspace*{-1.5cm}
  1273 \hspace*{-1.5cm}
  1274 \begin{subfigure}[b]{0.55\textwidth}
  1275 \begin{center}
  1276 \hspace*{-0.5cm}
  1277 \begin{tikzpicture}
  1278 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1279 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1280   west},scaled x ticks = false,x tick label style={/pgf/number
  1281   format/1000 sep = \space}]
  1282 %\addplot+[only marks] table {proteinsOrig.txt};
  1283 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
  1284 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1285         {randGraph/ind/vf2ppInd10_0.05.txt};
  1286 \end{axis}
  1287 \end{tikzpicture}
  1288 \end{center}
  1289      \end{subfigure}
  1290      \begin{subfigure}[b]{0.55\textwidth}
  1291 \begin{center}
  1292      \hspace*{-0.5cm}
  1293 \begin{tikzpicture}
  1294 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1295 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1296   west},scaled x ticks = false,x tick label style={/pgf/number
  1297   format/1000 sep = \space}]
  1298 %\addplot+[only marks] table {proteinsOrig.txt};
  1299 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
  1300 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1301         {randGraph/ind/vf2ppInd10_0.1.txt};
  1302 \end{axis}
  1303 \end{tikzpicture}
  1304 \end{center}
  1305 \end{subfigure}
  1306 \hspace*{-1.5cm}
  1307 \begin{subfigure}[b]{0.55\textwidth}
  1308 \begin{center}
  1309 \begin{tikzpicture}
  1310 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1311 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1312   west},scaled x ticks = false,x tick label style={/pgf/number
  1313   format/1000 sep = \space}]
  1314 %\addplot+[only marks] table {proteinsOrig.txt};
  1315 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
  1316 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1317         {randGraph/ind/vf2ppInd10_0.3.txt};
  1318 \end{axis}
  1319 \end{tikzpicture}
  1320 \end{center}
  1321      \end{subfigure}
  1322      \begin{subfigure}[b]{0.55\textwidth}
  1323 \begin{center}
  1324 \begin{tikzpicture}
  1325 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1326 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1327   west},scaled x ticks = false,x tick label style={/pgf/number
  1328   format/1000 sep = \space}]
  1329 %\addplot+[only marks] table {proteinsOrig.txt};
  1330 \addplot table {randGraph/ind/vf2pInd10_0.6.txt};
  1331 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1332         {randGraph/ind/vf2ppInd10_0.6.txt};
  1333 \end{axis}
  1334 \end{tikzpicture}
  1335 \end{center}
  1336 \end{subfigure}
  1337 \hspace*{-1.5cm}
  1338 \begin{subfigure}[b]{0.55\textwidth}
  1339 \begin{tikzpicture}
  1340 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1341 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1342   west},scaled x ticks = false,x tick label style={/pgf/number
  1343   format/1000 sep = \space}]
  1344 %\addplot+[only marks] table {proteinsOrig.txt};
  1345 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
  1346 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1347         {randGraph/ind/vf2ppInd10_0.8.txt};
  1348 \end{axis}
  1349 \end{tikzpicture}
  1350      \end{subfigure}
  1351      \begin{subfigure}[b]{0.55\textwidth}
  1352 \begin{tikzpicture}
  1353 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1354 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1355   west},scaled x ticks = false,x tick label style={/pgf/number
  1356   format/1000 sep = \thinspace}]
  1357 %\addplot+[only marks] table {proteinsOrig.txt};
  1358 \addplot table {randGraph/ind/vf2pInd10_0.95.txt};
  1359 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1360         {randGraph/ind/vf2ppInd10_0.95.txt};
  1361 \end{axis}
  1362 \end{tikzpicture}
  1363 \end{subfigure}
  1364 \vspace*{-0.8cm}
  1365 \caption{IND on graphs having an average degree of
  1366   10.}\label{fig:randIND10}
  1367 \end{figure}
  1368 
  1369 
  1370 
  1371 \begin{figure}[H]
  1372 \vspace*{-1.5cm}
  1373 \hspace*{-1.5cm}
  1374 \begin{subfigure}[b]{0.55\textwidth}
  1375 \begin{center}
  1376 \begin{tikzpicture}
  1377 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1378 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1379   west},scaled x ticks = false,x tick label style={/pgf/number
  1380   format/1000 sep = \space}]
  1381 %\addplot+[only marks] table {proteinsOrig.txt};
  1382 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
  1383 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1384         {randGraph/ind/vf2ppInd35_0.05.txt};
  1385 \end{axis}
  1386 \end{tikzpicture}
  1387 \end{center}
  1388      \end{subfigure}
  1389      \begin{subfigure}[b]{0.55\textwidth}
  1390 \begin{center}
  1391 \begin{tikzpicture}
  1392 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1393 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1394   west},scaled x ticks = false,x tick label style={/pgf/number
  1395   format/1000 sep = \space}]
  1396 %\addplot+[only marks] table {proteinsOrig.txt};
  1397 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
  1398 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1399         {randGraph/ind/vf2ppInd35_0.1.txt};
  1400 \end{axis}
  1401 \end{tikzpicture}
  1402 \end{center}
  1403 \end{subfigure}
  1404 \hspace*{-1.5cm}
  1405 \begin{subfigure}[b]{0.55\textwidth}
  1406 \begin{center}
  1407 \begin{tikzpicture}
  1408 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1409 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1410   west},scaled x ticks = false,x tick label style={/pgf/number
  1411   format/1000 sep = \space}]
  1412 %\addplot+[only marks] table {proteinsOrig.txt};
  1413 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
  1414 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1415         {randGraph/ind/vf2ppInd35_0.3.txt};
  1416 \end{axis}
  1417 \end{tikzpicture}
  1418 \end{center}
  1419      \end{subfigure}
  1420      \begin{subfigure}[b]{0.55\textwidth}
  1421 \begin{center}
  1422 \begin{tikzpicture}
  1423 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1424 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1425   west},scaled x ticks = false,x tick label style={/pgf/number
  1426   format/1000 sep = \space}]
  1427 %\addplot+[only marks] table {proteinsOrig.txt};
  1428 \addplot table {randGraph/ind/vf2pInd35_0.6.txt};
  1429 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1430         {randGraph/ind/vf2ppInd35_0.6.txt};
  1431 \end{axis}
  1432 \end{tikzpicture}
  1433 \end{center}
  1434 \end{subfigure}
  1435 \hspace*{-1.5cm}
  1436 \begin{subfigure}[b]{0.55\textwidth}
  1437 \begin{tikzpicture}
  1438 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1439 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1440   west},scaled x ticks = false,x tick label style={/pgf/number
  1441   format/1000 sep = \space}]
  1442 %\addplot+[only marks] table {proteinsOrig.txt};
  1443 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
  1444 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1445         {randGraph/ind/vf2ppInd35_0.8.txt};
  1446 \end{axis}
  1447 \end{tikzpicture}
  1448      \end{subfigure}
  1449      \begin{subfigure}[b]{0.55\textwidth}
  1450 \begin{tikzpicture}
  1451 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1452 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1453   west},scaled x ticks = false,x tick label style={/pgf/number
  1454   format/1000 sep = \thinspace}]
  1455 %\addplot+[only marks] table {proteinsOrig.txt};
  1456 \addplot table {randGraph/ind/vf2pInd35_0.95.txt};
  1457 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1458         {randGraph/ind/vf2ppInd35_0.95.txt};
  1459 \end{axis}
  1460 \end{tikzpicture}
  1461 \end{subfigure}
  1462 \vspace*{-0.8cm}
  1463 \caption{IND on graphs having an average degree of
  1464   35.}\label{fig:randIND35}
  1465 \end{figure}
  1466 
  1467 
  1468 Based on these experiments, VF2++ is faster than VF2 Plus and able to
  1469 handle really large graphs in milliseconds. Note that when $IND$ was
  1470 considered and the small graphs had proportionally few nodes ($\rho =
  1471 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
  1472 orders (e.g. see the $\delta=10$ case on
  1473 Figure~\ref{fig:randIND10}). If these instances had been excluded, the
  1474 charts would have seemed to be similar to the other ones.
  1475 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
  1476 Plus slow slightly down, but remain practically usable even on graphs
  1477 having 10 000 nodes.
  1478 
  1479 
  1480 
  1481 
  1482 
  1483 \section{Conclusion}
  1484 This paper presented VF2++, a new graph matching algorithm based on VF2, called VF2++, and analyzed it from a practical viewpoint.
  1485 
  1486 Recognizing the importance of the node order and determining an
  1487 efficient one, VF2++ is able to match graphs of thousands of nodes in
  1488 near practically linear time including preprocessing. In addition to
  1489 the proper order, VF2++ uses more efficient consistency and cutting
  1490 rules which are easy to compute and make the algorithm able to prune
  1491 most of the unfruitful branches without going astray.
  1492 
  1493 In order to show the efficiency of the new method, it has been
  1494 compared to VF2 Plus\cite{VF2Plus}, which is the best contemporary algorithm.
  1495 .
  1496 
  1497 The experiments show that VF2++ consistently outperforms VF2 Plus on
  1498 biological graphs. It seems to be asymptotically faster on protein and
  1499 on contact map graphs in the case of induced subgraph isomorphism,
  1500 while in the case of graph isomorphism, it has definitely better
  1501 asymptotic behaviour on protein graphs.
  1502 
  1503 Regarding random sparse graphs, not only has VF2++ proved itself to be
  1504 faster than VF2 Plus, but it also has a practically linear behaviour both
  1505 in the case of induced subgraph- and graph isomorphism.
  1506 
  1507 
  1508 
  1509 %% The Appendices part is started with the command \appendix;
  1510 %% appendix sections are then done as normal sections
  1511 %% \appendix
  1512 
  1513 %% \section{}
  1514 %% \label{}
  1515 
  1516 %% If you have bibdatabase file and want bibtex to generate the
  1517 %% bibitems, please use
  1518 %%
  1519 \bibliographystyle{elsarticle-num} \bibliography{bibliography}
  1520 
  1521 %% else use the following coding to input the bibitems directly in the
  1522 %% TeX file.
  1523 
  1524 %% \begin{thebibliography}{00}
  1525 
  1526 %% %% \bibitem{label}
  1527 %% %% Text of bibliographic item
  1528 
  1529 %% \bibitem{}
  1530 
  1531 %% \end{thebibliography}
  1532 
  1533 \end{document}
  1534 \endinput
  1535 %%
  1536 %% End of file `elsarticle-template-num.tex'.