damecco.tex
author Alpar Juttner <alpar@cs.elte.hu>
Wed, 30 Nov 2016 22:45:35 +0100
changeset 24 bdf97dafabfb
parent 23 b098561f70fe
child 25 217340b8dec7
child 26 42fbe17f0e3b
permissions -rw-r--r--
New title and abstract
     1 %% 
     2 %% Copyright 2007, 2008, 2009 Elsevier Ltd
     3 %% 
     4 %% This file is part of the 'Elsarticle Bundle'.
     5 %% ---------------------------------------------
     6 %% 
     7 %% It may be distributed under the conditions of the LaTeX Project Public
     8 %% License, either version 1.2 of this license or (at your option) any
     9 %% later version.  The latest version of this license is in
    10 %%    http://www.latex-project.org/lppl.txt
    11 %% and version 1.2 or later is part of all distributions of LaTeX
    12 %% version 1999/12/01 or later.
    13 %% 
    14 %% The list of all files belonging to the 'Elsarticle Bundle' is
    15 %% given in the file `manifest.txt'.
    16 %% 
    17 
    18 %% Template article for Elsevier's document class `elsarticle'
    19 %% with numbered style bibliographic references
    20 %% SP 2008/03/01
    21 
    22 \documentclass[preprint,12pt]{elsarticle}
    23 
    24 %% Use the option review to obtain double line spacing
    25 %% \documentclass[authoryear,preprint,review,12pt]{elsarticle}
    26 
    27 %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
    28 %% for a journal layout:
    29 %% \documentclass[final,1p,times]{elsarticle}
    30 %% \documentclass[final,1p,times,twocolumn]{elsarticle}
    31 %% \documentclass[final,3p,times]{elsarticle}
    32 %% \documentclass[final,3p,times,twocolumn]{elsarticle}
    33 %% \documentclass[final,5p,times]{elsarticle}
    34 %% \documentclass[final,5p,times,twocolumn]{elsarticle}
    35 
    36 %% For including figures, graphicx.sty has been loaded in
    37 %% elsarticle.cls. If you prefer to use the old commands
    38 %% please give \usepackage{epsfig}
    39 
    40 %% The amssymb package provides various useful mathematical symbols
    41 \usepackage{amssymb}
    42 %% The amsthm package provides extended theorem environments
    43 %% \usepackage{amsthm}
    44 
    45 %% The lineno packages adds line numbers. Start line numbering with
    46 %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
    47 %% for the whole article with \linenumbers.
    48 %% \usepackage{lineno}
    49 
    50 \usepackage{amsmath}
    51 %% \usepackage[pdftex]{graphicx}
    52 
    53 \usepackage{pgfplots}
    54 \pgfplotsset{width=9cm}
    55 \pgfplotsset{compat=1.8}
    56 
    57 \usepackage{caption}
    58 \usepackage{subcaption} 
    59 
    60 \usepackage{algorithm}
    61 \usepackage{algpseudocode}
    62 \usepackage{tikz}
    63 
    64 \usepackage{amsthm,amssymb}
    65 \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}
    66 
    67 \newtheorem{theorem}{Theorem}[subsection]
    68 \newtheorem{corollary}{Corollary}[theorem]
    69 \newtheorem{claim}[theorem]{Claim}
    70 
    71 \newtheorem{definition}{Definition}[subsection]
    72 \newtheorem{notation}{Notation}[subsection]
    73 \newtheorem{example}{Example}[subsection]
    74 \usetikzlibrary{decorations.markings}
    75 \let\oldproofname=\proofname
    76 %% \renewcommand{\proofname}{\rm\bf{Proof:}}
    77 
    78 \captionsetup{font=normalsize}
    79 
    80 \journal{Discrete Applied Mathematics}
    81 
    82 \begin{document}
    83 
    84 \begin{frontmatter}
    85 
    86 %% Title, authors and addresses
    87 
    88 %% use the tnoteref command within \title for footnotes;
    89 %% use the tnotetext command for theassociated footnote;
    90 %% use the fnref command within \author or \address for footnotes;
    91 %% use the fntext command for theassociated footnote;
    92 %% use the corref command within \author for corresponding author footnotes;
    93 %% use the cortext command for theassociated footnote;
    94 %% use the ead command for the email address,
    95 %% and the form \ead[url] for the home page:
    96 %% \title{Title\tnoteref{label1}}
    97 %% \tnotetext[label1]{}
    98 %% \author{Name\corref{cor1}\fnref{label2}}
    99 %% \ead{email address}
   100 %% \ead[url]{home page}
   101 %% \fntext[label2]{}
   102 %% \cortext[cor1]{}
   103 %% \address{Address\fnref{label3}}
   104 %% \fntext[label3]{}
   105 
   106 \title{VF2++ --- An Improved Subgraph Isomorphism Algorithm}
   107 
   108 %% use optional labels to link authors explicitly to addresses:
   109 %% \author[label1,label2]{}
   110 %% \address[label1]{}
   111 %% \address[label2]{}
   112 
   113 \author[egres,elte]{Alp{\'a}r J{\"u}ttner}
   114 \ead{alpar@cs.elte.hu}
   115 \author[elte]{P{\'e}ter Madarasi}
   116 \ead{madarasip@caesar.elte.hu}
   117 \address[egres]{MTA-ELTE Egerv{\'a}ry Research Group, Budapest, Hungary.}
   118 \address[elte]{Department of Operations Research, E{\"o}tv{\"o}s Lor{\'a}nd University, Budapest, Hungary.}
   119 
   120 \begin{abstract}
   121 
   122   This paper presents a largely improved version of the VF2 algorithm
   123   for the \emph{Subgraph Isomorphism Problem}. The improvements are
   124   twofold. Firstly, it is based on a new approach for determining the
   125   matching order of the nodes, and secondly, more efficient -
   126   nevertheless easier to compute - cutting rules significantly
   127   reducing the search space are applied.
   128 
   129   In addition to the usual subgraph isomorphism, the paper also
   130   presents specialized versions for the \emph{Induced Subgraph
   131     Isomorphism} and for the \emph{Graph Isomorphism Problems}.
   132 
   133   Finally, an extensive experimental evaluation is provided using a
   134   wide range of inputs, including both real life biological and
   135   chemical datasets and standard randomly generated graph series. The
   136   results show major and consistent running time improvements over the
   137   other known methods.
   138  
   139   The C++ implementations of the algorithms are available open source as
   140   the part of the LEMON graph and network optimization library.
   141   
   142 \end{abstract}
   143 
   144 \begin{keyword}
   145   Computational Biology, Subgraph Isomorphism Problem
   146 %% keywords here, in the form: keyword \sep keyword
   147 
   148 %% PACS codes here, in the form: \PACS code \sep code
   149 
   150 %% MSC codes here, in the form: \MSC code \sep code
   151 %% or \MSC[2008] code \sep code (2000 is the default)
   152 
   153 \end{keyword}
   154 
   155 \end{frontmatter}
   156 
   157 %% \linenumbers
   158 
   159 %% main text
   160 \section{Introduction}
   161 \label{sec:intro}
   162 
   163 In the last decades, combinatorial structures, and especially graphs
   164 have been considered with ever increasing interest, and applied to the
   165 solution of several new and revised questions.  The expressiveness,
   166 the simplicity and the studiedness of graphs make them practical for
   167 modelling and appear constantly in several seemingly independent
   168 fields, such as bioinformatics and chemistry.
   169 
   170 Complex biological systems arise from the interaction and cooperation
   171 of plenty of molecular components. Getting acquainted with such
   172 systems at the molecular level is of primary importance, since
   173 protein-protein interaction, DNA-protein interaction, metabolic
   174 interaction, transcription factor binding, neuronal networks, and
   175 hormone signaling networks can be understood this way.
   176 
   177 Many chemical and biological structures can easily be modeled
   178 as graphs, for instance, a molecular structure can be
   179 considered as a graph, whose nodes correspond to atoms and whose
   180 edges to chemical bonds. The similarity and dissimilarity of
   181 objects corresponding to nodes are incorporated to the model
   182 by \emph{node labels}. Understanding such networks basically
   183 requires finding specific subgraphs, thus calls for efficient
   184 graph matching algorithms.
   185 
   186 Other real-world fields related to some
   187 variants of graph matching include pattern recognition
   188 and machine vision \cite{HorstBunkeApplications}, symbol recognition
   189 \cite{CordellaVentoSymbolRecognition}, face identification
   190 \cite{JianzhuangYongFaceIdentification}.  \\
   191 
   192 Subgraph and induced subgraph matching problems are known to be
   193 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is
   194 one of the few problems in NP neither known to be in P nor
   195 NP-Complete. Although polynomial time isomorphism algorithms are known
   196 for various graph classes, like trees and planar
   197 graphs\cite{PlanarGraphIso}, bounded valence
   198 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
   199 or permutation graphs\cite{PermGraphIso}, and recently, an FPT algorithm has been presented for the coloured hypergraph isomorphism problem in \cite{ColoredHiperGraphIso}.
   200 
   201 In the following, some algorithms based on other approaches are
   202 summarized, which do not need any restrictions on the graphs. Even though,
   203 an overall polynomial behaviour is not expectable from such an
   204 alternative, it may often have good practical performance, in fact,
   205 it might be the best choice even on a graph class for which polynomial
   206 algorithm is known.
   207 
   208 The first practically usable approach was due to
   209 \emph{Ullmann}\cite{Ullmann} which is a commonly used depth-first
   210 search based algorithm with a complex heuristic for reducing the
   211 number of visited states. A major problem is its $\Theta(n^3)$ space
   212 complexity, which makes it impractical in the case of big sparse
   213 graphs.
   214 
   215 In a recent paper, Ullmann\cite{UllmannBit} presents an
   216 improved version of this algorithm based on a bit-vector solution for
   217 the binary Constraint Satisfaction Problem.
   218 
   219 The \emph{Nauty} algorithm\cite{Nauty} transforms the two graphs to
   220 a canonical form before starting to check for the isomorphism. It has
   221 been considered as one of the fastest graph isomorphism algorithms,
   222 although graph categories were shown in which it takes exponentially
   223 many steps. This algorithm handles only the graph isomorphism problem.
   224 
   225 The \emph{LAD} algorithm\cite{Lad} uses a depth-first search
   226 strategy and formulates the matching as a Constraint Satisfaction
   227 Problem to prune the search tree. The constraints are that the mapping
   228 has to be injective and edge-preserving, hence it is possible to
   229 handle new matching types as well.
   230 
   231 The \emph{RI} algorithm\cite{RI} and its variations are based on a
   232 state space representation. After reordering the nodes of the graphs,
   233 it uses some fast executable heuristic checks without using any
   234 complex pruning rules. It seems to run really efficiently on graphs
   235 coming from biology, and won the International Contest on Pattern
   236 Search in Biological Databases\cite{Content}.
   237 
   238 The currently most commonly used algorithm is the
   239 \emph{VF2}\cite{VF2}, the improved version of \emph{VF}\cite{VF}, which was
   240 designed for solving pattern matching and computer vision problems,
   241 and has been one of the best overall algorithms for more than a
   242 decade. Although, it can't be up to new specialized algorithms, it is
   243 still widely used due to its simplicity and space efficiency. VF2 uses
   244 a state space representation and checks some conditions in each state
   245 to prune the search tree.
   246 
   247 Meanwhile, another variant called \emph{VF2 Plus}\cite{VF2Plus} has
   248 been published. It is considered to be as efficient as the RI
   249 algorithm and has a strictly better behavior on large graphs.  The
   250 main idea of VF2 Plus is to precompute a heuristic node order of the
   251 small graph, in which the VF2 works more efficiently.
   252 
   253 This paper introduces \emph{VF2++}, a new further improved algorithm
   254 for the graph and (induced)subgraph isomorphism problem, which uses
   255 efficient cutting rules and determines a node order in which VF2 runs
   256 significantly faster on practical inputs.
   257 
   258 This project was initiated and sponsored by QuantumBio
   259 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
   260 code --- has been published as a part of LEMON\cite{LEMON} open source
   261 graph library.
   262 
   263 Outline: Section~\ref{sec:ProbStat} defines the problems to be solved, Section~\ref{sec:VF2Alg} provides a description of VF2, Section~\ref{sec:VF2ppAlg} introduces VF2++, a new graph matching algorithm, Section~\ref{sec:VF2ppImpl} presents the details of an efficient implementation of VF2++, and Section~\ref{sec:ExpRes} compares VF2++ to a state of the art algorithm.
   264 
   265 \section{Problem Statement}\label{sec:ProbStat}
   266 This section provides a formal description of the problems to be
   267 solved.
   268 \subsection{Definitions}
   269 
   270 Throughout the paper $G_{1}=(V_{1}, E_{1})$ and
   271 $G_{2}=(V_{2}, E_{2})$ denote two undirected graphs.
   272 
   273 \begin{definition}
   274 $\mathcal{L}: (V_{1}\cup V_{2}) \longrightarrow K$ is a \textbf{node
   275     label function}, where K is an arbitrary set. The elements in K
   276   are the \textbf{node labels}. Two nodes, u and v are said to be
   277   \textbf{equivalent} if $\mathcal{L}(u)=\mathcal{L}(v)$.
   278 \end{definition}
   279 
   280 For the sake of simplicity, in this paper the graph, subgraph and induced subgraph isomorphisms are defined in a more general way.
   281 
   282 \begin{definition}\label{sec:ismorphic}
   283 $G_{1}$ and $G_{2}$ are \textbf{isomorphic} (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}:
   284   V_{1} \longrightarrow V_{2}$ bijection, for which the
   285   following is true:
   286 \begin{center}
   287 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\
   288 $\forall u,v\in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in{E_{2}}$
   289 \end{center}
   290 \end{definition}
   291 
   292 \begin{definition}
   293 $G_{1}$ is a \textbf{subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}:
   294   V_{1}\longrightarrow V_{2}$ injection, for which the
   295   following is true:
   296 \begin{center}
   297 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\
   298 $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Rightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$
   299 \end{center}
   300 \end{definition}
   301 
   302 \begin{definition} 
   303 $G_{1}$ is an \textbf{induced subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists
   304   \mathfrak{m}: V_{1}\longrightarrow V_{2}$ injection, for which the
   305   following is true:
   306 \begin{center}
   307 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and
   308 
   309 $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow
   310   (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$
   311 \end{center}
   312 \end{definition}
   313 
   314 
   315 \subsection{Common problems}\label{sec:CommProb}
   316 
   317 The focus of this paper is on two extensively studied topics, the
   318 subgraph isomorphism and its variations. However, the following
   319 problems also appear in many applications.
   320 
   321 The \textbf{subgraph matching problem} is the following: is
   322 $G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node
   323 label?
   324 
   325 The \textbf{induced subgraph matching problem} asks the same about the
   326 existence of an induced subgraph.
   327 
   328 The \textbf{graph isomorphism problem} can be defined as induced
   329 subgraph matching problem where the sizes of the two graphs are equal.
   330 
   331 In addition, one may want to find a \textbf{single} mapping or \textbf{enumerate} all of them.
   332 
   333 Note that some authors refer to the term
   334 \emph{subgraph isomorphism problem} as an \emph{induced subgraph
   335   isomorphism problem}.
   336 
   337 \section{The VF2 Algorithm}\label{sec:VF2Alg}
   338 This algorithm is the basis of both the VF2++ and the VF2 Plus.  VF2
   339 is able to handle all the variations mentioned in Section
   340   \ref{sec:CommProb}.  Although it can also handle directed graphs,
   341 for the sake of simplicity, only the undirected case will be
   342 discussed.
   343 
   344 
   345 \subsection{Common notations}
   346 \indent Assume $G_{1}$ is searched in $G_{2}$.  The following
   347 definitions and notations will be used throughout the whole paper.
   348 \begin{definition}
   349 An injection $\mathfrak{m} : D \longrightarrow V_2$ is called (partial) \textbf{mapping}, where $D\subseteq V_1$.
   350 \end{definition}
   351 
   352 \begin{notation}
   353 $\mathfrak{D}(f)$ and $\mathfrak{R}(f)$ denote the domain and the range of a function $f$, respectively.
   354 \end{notation}
   355 
   356 \begin{definition}
   357 Mapping $\mathfrak{m}$ \textbf{covers} a node $u\in V_1\cup V_2$ if $u\in \mathfrak{D}(\mathfrak{m})\cup \mathfrak{R}(\mathfrak{m})$.
   358 \end{definition}
   359 
   360 \begin{definition}
   361 A mapping $\mathfrak{m}$ is $\mathbf{whole\ mapping}$ if $\mathfrak{m}$ covers all the
   362 nodes of $V_{1}$, i.e. $\mathfrak{D}(\mathfrak{m})=V_1$.
   363 \end{definition}
   364 
   365 \begin{definition}
   366 Let \textbf{extend}$(\mathfrak{m},(u,v))$ denote the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$, for which $\forall w\in \mathfrak{D}(\mathfrak{m}) : \mathfrak{m}(w)=f(w)$ and $f(u)=v$ holds. Where $u\in V_1\setminus\mathfrak{D}(\mathfrak{m})$ and $v\in V_2\setminus\mathfrak{R}(\mathfrak{m})$, otherwise $extend(\mathfrak{m},(u,v))$ is undefined.
   367 \end{definition}
   368 
   369 \begin{notation}
   370 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
   371 which can be substituted by any of the $\mathbf{ISO}$, $\mathbf{SUB}$
   372 and $\mathbf{IND}$ problems.
   373 \end{notation}
   374 
   375 \begin{definition}
   376 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
   377 \textbf{consistency function by } $\mathbf{PT}$ if the following
   378 holds. If there exists a whole mapping $w$ satisfying the requirements of $PT$, for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$.
   379 \end{definition}
   380 
   381 \begin{definition} 
   382 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
   383 \textbf{cutting function by } $\mathbf{PT}$ if the following
   384 holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations, which results in a whole mapping satisfying the requirements of $PT$.
   385 \end{definition}
   386 
   387 \begin{definition}
   388 $\mathfrak{m}$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if
   389   $Cons_{PT}(\mathfrak{m})$ is true.
   390 \end{definition}
   391 
   392 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
   393 \begin{notation}
   394 Let $\mathbf{Cons_{PT}(p, \mathfrak{m})}:=Cons_{PT}(extend(\mathfrak{m},p))$, and
   395 $\mathbf{Cut_{PT}(p, \mathfrak{m})}:=Cut_{PT}(extend(\mathfrak{m},p))$, where
   396 $p\in{V_{1}\backslash\mathfrak{D}(\mathfrak{m}) \!\times\!V_{2}\backslash\mathfrak{R}(\mathfrak{m})}$.
   397 \end{notation}
   398 
   399 $Cons_{PT}$ will be used to check the consistency of the already
   400 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
   401 no whole consistent mapping can contain the current mapping.
   402 
   403 \subsection{Overview of the algorithm}
   404 VF2 uses a state space representation of mappings, $Cons_{PT}$ for
   405 excluding inconsistency with the problem type and $Cut_{PT}$ for
   406 pruning the search tree.
   407 
   408 Algorithm~\ref{alg:VF2Pseu} is a high level description of
   409 the VF2 matching algorithm. Each state of the matching process can
   410 be associated with a mapping $\mathfrak{m}$. The initial state
   411 is associated with a mapping $\mathfrak{m}$, for which
   412 $\mathfrak{D}(\mathfrak{m})=\emptyset$, i.e. it starts with an empty mapping.
   413 
   414 
   415 \begin{algorithm}
   416 \algtext*{EndIf}%ne nyomtasson end if-et
   417 \algtext*{EndFor}%ne
   418 \algtext*{EndProcedure}%ne nyomtasson ..
   419 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
   420 \begin{algorithmic}[1]
   421 
   422 \Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$}
   423   \If{$\mathfrak{m}$ covers
   424     $V_{1}$} \State Output($\mathfrak{m}$)
   425   \Else
   426   \State Compute the set $P_\mathfrak{m}$ of the pairs candidate for inclusion
   427   in $\mathfrak{m}$ \ForAll{$p\in{P_\mathfrak{m}}$} \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$
   428     $\neg$Cut$_{PT}$($p,\mathfrak{m}$)}
   429     \State \textbf{call}
   430   VF2($extend(\mathfrak{m},p)$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
   431 \end{algorithmic}
   432 \end{algorithm}
   433 
   434 
   435 For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of
   436 candidate node pairs for adding to the current mapping $\mathfrak{m}_s$.
   437 
   438 For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and
   439 $Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and
   440 the latter is false, the whole process is recursively applied to
   441 $extend(\mathfrak{m},p)$. Otherwise, $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it
   442 can be proved that $\mathfrak{m}$ can not be extended to a whole mapping.
   443 
   444 In order to make sure of the correctness, see
   445 \begin{claim}
   446 Through consistent mappings, only consistent whole mappings can be
   447 reached, and all the consistent whole mappings are reachable through
   448 consistent mappings.
   449 \end{claim}
   450 
   451 Note that a mapping may be reached in exponentially many different ways, since the
   452 order of extensions does not influence the nascent mapping.
   453 
   454 However, one may observe
   455 
   456 \begin{claim}
   457 \label{claim:claimTotOrd}
   458 Let $\prec$ be an arbitrary total ordering relation on $V_{1}$.  If
   459 the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$, for which
   460 \begin{center}
   461 $\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$,
   462 \end{center}
   463 then no mapping can be reached more than once, and each whole mapping remains reachable.
   464 \end{claim}
   465 
   466 Note that the cornerstone of the improvements to VF2 is a proper
   467 choice of a total ordering.
   468 
   469 \subsection{The candidate set}
   470 \label{candidateComputingVF2}
   471 Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$.
   472 
   473 \begin{notation}
   474 Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\backslash\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and
   475  $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\backslash\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$.
   476 \end{notation}
   477 
   478 The set $P_\mathfrak{m}$ includes the pairs of uncovered neighbours of covered
   479 nodes, and if there is not such a node pair, all the pairs containing
   480 two uncovered nodes are added. Formally, let
   481 \[
   482  P_\mathfrak{m}\!=\!
   483   \begin{cases} 
   484    T_{1}(\mathfrak{m})\times T_{2}(\mathfrak{m})&\hspace{-0.15cm}\text{if }
   485    T_{1}(\mathfrak{m})\!\neq\!\emptyset\ \text{and }T_{2}(\mathfrak{m})\!\neq
   486    \emptyset,\\ (V_{1}\!\setminus\!\mathfrak{D}(\mathfrak{m}))\!\times\!(V_{2}\!\setminus\!\mathfrak{R}(\mathfrak{m}))
   487    &\hspace{-0.15cm}\text{otherwise}.
   488   \end{cases}
   489 \]
   490 
   491 \subsection{Consistency}
   492 Suppose $p=(u,v)$, where $u\in V_{1}$ and $v\in V_{2}$, $\mathfrak{m}$ is a consistent mapping by
   493 $PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether
   494 including pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$.
   495 
   496 For example, the consistency function of induced subgraph isomorphism is as follows.
   497 \begin{notation}
   498 Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} :
   499 (u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2}
   500   (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$.
   501 \end{notation}
   502 
   503 $extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ $\Leftrightarrow
   504 (\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}
   505 \Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. The
   506 following formulation gives an efficient way of calculating
   507 $Cons_{IND}$.
   508 \begin{claim}
   509 $Cons_{IND}((u,v),\mathfrak{m}):=\mathcal{L}(u)\!\!=\!\!\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge
   510   (\forall \tilde{u}\in \Gamma_{1}(u)
   511   \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is a
   512   consistency function in the case of $IND$.
   513 \end{claim}
   514 
   515 \subsection{Cutting rules}
   516 $Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently
   517 verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can
   518 be true only if it is impossible to extend $extend(\mathfrak{m},p)$ to a
   519 whole mapping.
   520 
   521 As an example, the cutting function of induced subgraph isomorphism is presented.
   522 \begin{notation}
   523 Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\backslash
   524 \mathfrak{D}(\mathfrak{m}))\backslash T_{1}(\mathfrak{m})$, and
   525 \\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\backslash
   526 \mathfrak{R}(\mathfrak{m}))\backslash T_{2}(\mathfrak{m})$.
   527 \end{notation}
   528 
   529 \begin{claim}
   530 $Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| <
   531   |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap
   532   \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap
   533   \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function by $IND$.
   534 \end{claim}
   535 
   536 \section{The VF2++ Algorithm}\label{sec:VF2ppAlg}
   537 Although any total ordering relation makes the search space of VF2 a
   538 tree, its choice turns out to dramatically influence the number of
   539 visited states. The goal is to determine an efficient one as quickly
   540 as possible.
   541 
   542 The main reason for VF2++' superiority over VF2 is twofold. Firstly,
   543 taking into account the structure and the node labeling of the graph,
   544 VF2++ determines a state order in which most of the unfruitful
   545 branches of the search space can be pruned immediately. Secondly,
   546 introducing more efficient --- nevertheless still easier to compute
   547 --- cutting rules reduces the chance of going astray even further.
   548 
   549 In addition to the usual subgraph isomorphism, specialized versions
   550 for induced subgraph isomorphism and for graph isomorphism have been
   551 designed.
   552 
   553 Note that a weaker version of the cutting rules and an efficient
   554 candidate set calculating were described in \cite{VF2Plus}.
   555 
   556 It should be noted that all the methods described in this section are
   557 extendable to handle directed graphs and edge labels as well.
   558 The basic ideas and the detailed description of VF2++ are provided in
   559 the following.\newline
   560 
   561 The goal is to find a matching order in which the algorithm is able to
   562 recognize inconsistency or prune the infeasible branches on the
   563 highest levels and goes deep only if it is needed.
   564 
   565 \begin{notation}
   566 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{1}(u)\cap H\}|$, that is the
   567 number of neighbours of u which are in H, where $u\in V_{1} $ and
   568 $H\subseteq V_{1}$.
   569 \end{notation}
   570 
   571 The principal question is the following. Suppose a mapping $\mathfrak{m}$ is
   572 given. For which node of $T_{1}(\mathfrak{m})$ is the hardest to find a
   573 consistent pair in $G_{2}$? The more covered neighbours a node in
   574 $T_{1}(\mathfrak{m})$ has --- i.e. the largest $Conn_{\mathfrak{D}(\mathfrak{m})}$ it has
   575 ---, the more rarely satisfiable consistency constraints for its pair
   576 are given.
   577 
   578 In biology, most of the graphs are sparse, thus several nodes in
   579 $T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes
   580 reasonable to define a secondary and a tertiary order between them.
   581 The observation above proves itself to be as determining, that the
   582 secondary ordering prefers nodes with the most uncovered neighbours
   583 among which have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$ to increase
   584 $Conn_{\mathfrak{D}(\mathfrak{m})}$ of uncovered nodes so much, as possible.  The
   585 tertiary ordering prefers nodes having the rarest uncovered labels.
   586 
   587 Note that the secondary ordering is the same as the ordering by $deg$,
   588 which is a static data in front of the above used.
   589 
   590 These rules can easily result in a matching order which contains the
   591 nodes of a long path successively, whose nodes may have low $Conn$ and
   592 is easily matchable into $G_{2}$. To avoid that, a BFS order is
   593 used, which provides the shortest possible paths.
   594 \newline
   595 
   596 In the following, some examples on which the VF2 may be slow are
   597 described, although they are easily solvable by using a proper
   598 matching order.
   599 
   600 \begin{example}
   601 Suppose $G_{1}$ can be mapped into $G_{2}$ in many ways
   602 without node labels. Let $u\in V_{1}$ and $v\in V_{2}$.
   603 \newline
   604 $\mathcal{L}(u):=black$
   605 \newline
   606 $\mathcal{L}(v):=black$
   607 \newline
   608 $\mathcal{L}(\tilde{u}):=red \ \forall \tilde{u}\in V_{1}\backslash
   609 \{u\}$
   610 \newline
   611 $\mathcal{L}(\tilde{v}):=red \ \forall \tilde{v}\in V_{2}\backslash
   612 \{v\}$
   613 \newline
   614 
   615 Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since
   616 $u$ is black and no node in $V_{2}$ has a black label except
   617 $v$. If unfortunately $u$ were the last node which will get covered,
   618 VF2 would check only in the last steps, whether $u$ can be matched to
   619 $v$.
   620 \newline
   621 However, had $u$ been the first matched node, u would have been
   622 matched immediately to v, so all the mappings would have been
   623 precluded in which node labels can not correspond.
   624 \end{example}
   625 
   626 \begin{example}
   627 Suppose there is no node label given, $G_{1}$ is a small graph and
   628 can not be mapped into $G_{2}$ and $u\in V_{1}$.
   629 \newline
   630 Let $G'_{1}:=(V_{1}\cup
   631 \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup
   632 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
   633 $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint
   634 from $G_{1}$ and one of its starting points is connected to $u\in
   635 V_{1}\}$.
   636 \newline
   637 Is there a subgraph of $G_{2}$, which is isomorph with
   638 $G'_{1}$?
   639 \newline
   640 If unfortunately the nodes of the path were the first $k$ nodes in the
   641 matching order, the algorithm would iterate through all the possible k
   642 long paths in $G_{2}$, and it would recognize that no path can be
   643 extended to $G'_{1}$.
   644 \newline
   645 However, had it started by the matching of $G_{1}$, it would not
   646 have matched any nodes of the path.
   647 \end{example}
   648 
   649 These examples may look artificial, but the same problems also appear
   650 in real-world instances, even though in a less obvious way.
   651 
   652 \subsection{Preparations}
   653 \begin{claim}
   654 \label{claim:claimCoverFromLeft}
   655 The total ordering relation uniquely determines a node order, in which
   656 the nodes of $V_{1}$ will be covered by VF2. From the point of
   657 view of the matching procedure, this means, that always the same node
   658 of $G_{1}$ will be covered on the d-th level.
   659 \end{claim}
   660 
   661 \begin{definition}
   662 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of
   663 $V_{1}$ is \textbf{matching order} if exists $\prec$ total
   664 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
   665 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{1}|\}$.
   666 \end{definition}
   667 
   668 \begin{claim}\label{claim:MOclaim}
   669 A total ordering is matching order iff the nodes of every component
   670 form an interval in the node sequence, and every node connects to a
   671 previous node in its component except the first node of each component.
   672 \end{claim}
   673 
   674 To summing up, a total ordering always uniquely determines a matching
   675 order, and every matching order can be determined by a total ordering,
   676 however, more than one different total orderings may determine the
   677 same matching order.
   678 
   679 \subsection{Total ordering}
   680 The matching order will be searched directly.
   681 \begin{notation}
   682 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} :
   683 l=\mathcal{L}(v)\}|-|\{u\in V_{1}\backslash \mathcal{M} : l=\mathcal{L}(u)\}|$ ,
   684 where $l$ is a label and $\mathcal{M}\subseteq V_{1}$.
   685 \end{notation}
   686 
   687 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
   688 \end{definition}
   689 
   690 \begin{algorithm}
   691 \algtext*{EndIf}
   692 \algtext*{EndProcedure}
   693 \algtext*{EndWhile}
   694 \algtext*{EndFor}
   695 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
   696 \begin{algorithmic}[1]
   697 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$
   698 \Comment{matching order} \While{$V_{1}\backslash \mathcal{M}
   699   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
   700 min$_{F_\mathcal{M}\circ \mathcal{L}}(V_{1}\backslash
   701 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
   702 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
   703 $d$-th level \State Process $V_d$ \Comment{See Algorithm
   704   \ref{alg:VF2PPProcess1}} \EndFor
   705 \EndWhile \EndProcedure
   706 \end{algorithmic}
   707 \end{algorithm}
   708 
   709 \begin{algorithm}
   710 \algtext*{EndIf}
   711 \algtext*{EndProcedure}%ne nyomtasson ..
   712 \algtext*{EndWhile}
   713 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
   714 \begin{algorithmic}[1]
   715 \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$}
   716 \State $m\in$ arg min$_{F_{\mathcal{M}\circ\ \mathcal{L}}}($ arg max$_{deg}($arg
   717 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
   718 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
   719 $F_\mathcal{M}$ \EndWhile \EndProcedure
   720 \end{algorithmic}
   721 \end{algorithm}
   722 
   723 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
   724 matching order procedure of VF2++. It computes a BFS tree for each
   725 component in ascending order of their rarest node labels and largest $deg$,
   726 whose root vertex is the component's minimal
   727 node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending
   728 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately
   729 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately.
   730 
   731 Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}
   732 provides a matching order.
   733 
   734 
   735 \subsection{Cutting rules}
   736 \label{VF2PPCuttingRules}
   737 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels.
   738 \begin{notation}
   739 Let $\mathbf{\Gamma_{1}^{l}(u)}:=\{\tilde{u} : \mathcal{L}(\tilde{u})=l
   740 \wedge \tilde{u}\in \Gamma_{1} (u)\}$ and
   741 $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge
   742 \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in
   743 V_{2}$ and $l$ is a label.
   744 \end{notation}
   745 
   746 \subsubsection{Induced subgraph isomorphism}
   747 \begin{claim}
   748 \[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by IND.
   749 \end{claim}
   750 \subsubsection{Graph isomorphism}
   751 \begin{claim}
   752 \[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by ISO.
   753 \end{claim}
   754 
   755 \subsubsection{Subgraph isomorphism}
   756 \begin{claim}
   757 \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function by SUB.
   758 \end{claim}
   759 
   760 
   761 
   762 \section{Implementation details}\label{sec:VF2ppImpl}
   763 This section provides a detailed summary of an efficient
   764 implementation of VF2++.
   765 \subsection{Storing a mapping}
   766 After fixing an arbitrary node order ($u_0, u_1, ..,
   767 u_{|G_{1}|-1}$) of $G_{1}$, an array $M$ is usable to store
   768 the current mapping in the following way.
   769 \[
   770  M[i] =
   771   \begin{cases} 
   772    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INV\!ALI\!D &
   773    if\ no\ node\ has\ been\ mapped\ to\ u_i,
   774   \end{cases}
   775 \]
   776 where $i\in\{0,1, ..,|G_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$
   777 means "no node".
   778 \subsection{Avoiding the recurrence}
   779 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
   780 as a \textit{while loop}, which has a loop counter $depth$ denoting the
   781 all-time depth of the recursion. Fixing a matching order, let $M$
   782 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft},
   783 $M$ is $INV\!ALI\!D$ from index $depth$+1 and not $INV\!ALI\!D$ before
   784 $depth$. $M[depth]$ changes
   785 while the state is being processed, but the property is held before
   786 both stepping back to a predecessor state and exploring a successor
   787 state.
   788 
   789 The necessary part of the candidate set is easily maintainable or
   790 computable by following
   791 Section~\ref{candidateComputingVF2}. A much faster method
   792 has been designed for biological- and sparse graphs, see the next
   793 section for details.
   794 
   795 \subsection{Calculating the candidates for a node}
   796 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
   797 task is not to maintain the candidate set, but to generate the
   798 candidate nodes in $G_{2}$ for a given node $u\in V_{1}$.  In
   799 case of any of the three problem types and a mapping $\mathfrak{m}$, if a node $v\in
   800 V_{2}$ is a potential pair of $u\in V_{1}$, then $\forall
   801 u'\in \mathfrak{D}(\mathfrak{m}) : (u,u')\in
   802 E_{1}\Rightarrow (v,\mathfrak{m}(u'))\in
   803 E_{2}$. That is, each covered neighbour of $u$ has to be mapped to
   804 a covered neighbour of $v$.
   805 
   806 Having said that, an algorithm running in $\Theta(deg)$ time is
   807 describable if there exists a covered node in the component containing
   808 $u$, and a linear one otherwise.
   809 
   810 
   811 \subsection{Determining the node order}
   812 This section describes how the node order preprocessing method of
   813 VF2++ can efficiently be implemented.
   814 
   815 For using lookup tables, the node labels are associated with the
   816 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
   817 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
   818 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
   819 in $V_{1}$ having label $i$, which is easy to compute in
   820 $\Theta(|V_{1}|)$ steps.
   821 
   822 Representing $\mathcal{M}\subseteq V_{1}$ as an array of
   823 size $|V_{1}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes.
   824 
   825 \subsection{Cutting rules}
   826 In Section~\ref{VF2PPCuttingRules}, the cutting rules were
   827 described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$
   828 and $\tilde T_{2}$, which are dependent on the all-time mapping
   829 (i.e. on the all-time state). The aim is to check the labeled cutting
   830 rules of VF2++ in $\Theta(deg)$ time.
   831 
   832 Firstly, suppose that these four sets are given in such a way, that
   833 checking whether a node is in a certain set takes constant time,
   834 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
   835 initially zero integer lookup table of size $|K|$. After incrementing
   836 $L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and
   837 decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap
   838 T_{2}(s)$, the first part of the cutting rules is checkable in
   839 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
   840 to zero takes $\Theta(deg)$ time again, which makes it possible to use
   841 the same table through the whole algorithm. The second part of the
   842 cutting rules can be verified using the same method with $\tilde
   843 T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and
   844 $T_{2}$. Thus, the overall complexity is $\Theta(deg)$.
   845 
   846 Another integer lookup table storing the number of covered neighbours
   847 of each node in $G_{2}$ gives all the information about the sets
   848 $T_{2}$ and $\tilde T_{2}$, which is maintainable in
   849 $\Theta(deg)$ time when a pair is added or substracted by incrementing
   850 or decrementing the proper indices. A further improvement is that the
   851 values of $L[\mathcal{L}(u')]$ in case of checking $u$ are dependent only on
   852 $u$, i.e. on the size of the mapping, so for each $u\in V_{1}$ an
   853 array of pairs (label, number of such labels) can be stored to skip
   854 the maintaining operations. Note that these arrays are at most of size
   855 $deg$.
   856 
   857 Using similar techniques, the consistency function can be evaluated in
   858 $\Theta(deg)$ steps, as well.
   859 
   860 \section{Experimental results}\label{sec:ExpRes}
   861 This section compares the performance of VF2++ and VF2 Plus. According to
   862 our experience, both algorithms run faster than VF2 with orders of
   863 magnitude, thus its inclusion was not reasonable.
   864 
   865 The algorithms were implemented in C++ using the open source
   866 LEMON graph and network optimization library\cite{LEMON}. The test were carried out on a linux based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM.
   867 \subsection{Biological graphs}
   868 The tests have been executed on a recent biological dataset created
   869 for the International Contest on Pattern Search in Biological
   870 Databases\cite{Content}, which has been constructed of molecule,
   871 protein and contact map graphs extracted from the Protein Data
   872 Bank\cite{ProteinDataBank}.
   873 
   874 The molecule dataset contains small graphs with less than 100 nodes
   875 and an average degree of less than 3. The protein dataset contains
   876 graphs having 500-10 000 nodes and an average degree of 4, while the
   877 contact map dataset contains graphs with 150-800 nodes and an average
   878 degree of 20.  \\
   879 
   880 In the following, both the induced subgraph isomorphism and the graph
   881 isomorphism will be examined.
   882 
   883 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For runtime results, please see Figure~\ref{fig:bioIND}.
   884 
   885 In an other experiment, the nodes of each graph in the database had been
   886 shuffled, and an isomorphism between the shuffled and the original
   887 graph was searched. The solution times are shown on Figure~\ref{fig:bioISO}.
   888 
   889 
   890 \begin{figure}[H]
   891 \vspace*{-2cm}
   892 \hspace*{-1.5cm}
   893 \begin{subfigure}[b]{0.55\textwidth}
   894 \begin{figure}[H]
   895 \begin{tikzpicture}[trim axis left, trim axis right]
   896 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   897 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   898   west},scaled x ticks = false,x tick label style={/pgf/number
   899   format/1000 sep = \thinspace}]
   900 %\addplot+[only marks] table {proteinsOrig.txt};
   901 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark
   902   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
   903 \end{axis}
   904 \end{tikzpicture}
   905 \caption{In the case of molecules, the algorithms have
   906   similar behaviour, but VF2++ is almost two times faster even on such
   907   small graphs.} \label{fig:INDMolecule}
   908 \end{figure}
   909 \end{subfigure}
   910 \hspace*{1.5cm}
   911 \begin{subfigure}[b]{0.55\textwidth}
   912 \begin{figure}[H]
   913 \begin{tikzpicture}[trim axis left, trim axis right]
   914 \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   915 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   916   west},scaled x ticks = false,x tick label style={/pgf/number
   917   format/1000 sep = \thinspace}]
   918 %\addplot+[only marks] table {proteinsOrig.txt};
   919 \addplot table {Orig/ContactMaps.128.txt};
   920 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
   921         {VF2PPLabel/ContactMaps.128.txt};
   922 \end{axis}
   923 \end{tikzpicture}
   924 \caption{On contact maps, VF2++ runs almost in constant time, while VF2
   925   Plus has a near linear behaviour.} \label{fig:INDContact}
   926 \end{figure}
   927 \end{subfigure}
   928 
   929 \begin{center}
   930 \vspace*{-0.5cm}
   931 \begin{subfigure}[b]{0.55\textwidth}
   932 \begin{figure}[H]
   933 \begin{tikzpicture}[trim axis left, trim axis right]
   934   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   935   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   936     west},scaled x ticks = false,x tick label style={/pgf/number
   937     format/1000 sep = \thinspace}] %\addplot+[only marks] table
   938     {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]
   939     table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark
   940       size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
   941   \end{axis}
   942   \end{tikzpicture}
   943 \caption{Both the algorithms have linear behaviour on protein
   944   graphs. VF2++ is more than 10 times faster than VF2
   945   Plus.} \label{fig:INDProt}
   946 \end{figure}
   947 \end{subfigure}
   948 \end{center}
   949 \vspace*{-0.5cm}
   950 \caption{\normalsize{Induced subgraph isomorphism on biological graphs}}\label{fig:bioIND}
   951 \end{figure}
   952 
   953 
   954 \begin{figure}[H]
   955 \vspace*{-2cm}
   956 \hspace*{-1.5cm}
   957 \begin{subfigure}[b]{0.55\textwidth}
   958 \begin{figure}[H]
   959 \begin{tikzpicture}[trim axis left, trim axis right]
   960 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   961 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   962   west},scaled x ticks = false,x tick label style={/pgf/number
   963   format/1000 sep = \thinspace}]
   964 %\addplot+[only marks] table {proteinsOrig.txt};
   965 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark
   966   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
   967 \end{axis}
   968 \end{tikzpicture}
   969 \caption{In the case of molecules, there is not such a significant
   970   difference, but VF2++ seems to be faster as the number of nodes
   971   increases.}\label{fig:ISOMolecule}
   972 \end{figure}
   973 \end{subfigure}
   974 \hspace*{1.5cm}
   975 \begin{subfigure}[b]{0.55\textwidth}
   976 \begin{figure}[H]
   977 \begin{tikzpicture}[trim axis left, trim axis right]
   978 \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   979 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   980   west},scaled x ticks = false,x tick label style={/pgf/number
   981   format/1000 sep = \thinspace}]
   982 %\addplot+[only marks] table {proteinsOrig.txt};
   983 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark
   984   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
   985 \end{axis}
   986 \end{tikzpicture}
   987 \caption{The results are closer to each other on contact maps, but
   988   VF2++ still performs consistently better.}\label{fig:ISOContact}
   989 \end{figure}
   990 \end{subfigure}
   991 
   992 \begin{center}
   993 \vspace*{-0.5cm}
   994 \begin{subfigure}[b]{0.55\textwidth}
   995 \begin{figure}[H]
   996 \begin{tikzpicture}[trim axis left, trim axis right]
   997 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
   998 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
   999   west},scaled x ticks = false,x tick label style={/pgf/number
  1000   format/1000 sep = \thinspace}]
  1001 %\addplot+[only marks] table {proteinsOrig.txt};
  1002 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
  1003   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
  1004 \end{axis}
  1005 \end{tikzpicture}
  1006 \caption{On protein graphs, VF2 Plus has a super linear time
  1007   complexity, while VF2++ runs in near constant time. The difference
  1008   is about two order of magnitude on large graphs.}\label{fig:ISOProt}
  1009 \end{figure}
  1010 \end{subfigure}
  1011 \end{center}
  1012 \vspace*{-0.6cm}
  1013 \caption{\normalsize{Graph isomorphism on biological graphs}}\label{fig:bioISO}
  1014 \end{figure}
  1015 
  1016 
  1017 
  1018 
  1019 \subsection{Random graphs}
  1020 This section compares VF2++ with VF2 Plus on random graphs of a large
  1021 size. The node labels are uniformly distributed.  Let $\delta$ denote
  1022 the average degree.  For the parameters of problems solved in the
  1023 experiments, please see the top of each chart.
  1024 \subsubsection{Graph isomorphism}
  1025 To evaluate the efficiency of the algorithms in the case of graph
  1026 isomorphism, random connected graphs of less than 20 000 nodes have been
  1027 considered. Generating a random graph and shuffling its nodes, an
  1028 isomorphism had to be found. Figure \ref{fig:randISO} shows the runtime results
  1029 on graph sets of various density.
  1030 
  1031 
  1032 
  1033 
  1034 \begin{figure}
  1035 \vspace*{-1.5cm}
  1036 \hspace*{-1.5cm}
  1037 \begin{subfigure}[b]{0.55\textwidth}
  1038 \begin{center}
  1039 \begin{tikzpicture}
  1040 \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1041 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1042   west},scaled x ticks = false,x tick label style={/pgf/number
  1043   format/1000 sep = \space}]
  1044 %\addplot+[only marks] table {proteinsOrig.txt};
  1045 \addplot table {randGraph/iso/vf2pIso5_1.txt};
  1046 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1047         {randGraph/iso/vf2ppIso5_1.txt};
  1048 \end{axis}
  1049 \end{tikzpicture}
  1050 \end{center}
  1051 \end{subfigure}
  1052 %\hspace{1cm}
  1053 \begin{subfigure}[b]{0.55\textwidth}
  1054 \begin{center}
  1055 \begin{tikzpicture}
  1056 \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1057 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1058   west},scaled x ticks = false,x tick label style={/pgf/number
  1059   format/1000 sep = \space}]
  1060 %\addplot+[only marks] table {proteinsOrig.txt};
  1061 \addplot table {randGraph/iso/vf2pIso10_1.txt};
  1062 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1063         {randGraph/iso/vf2ppIso10_1.txt};
  1064 \end{axis}
  1065 \end{tikzpicture}
  1066 \end{center}
  1067 \end{subfigure}
  1068 %%\hspace{1cm}
  1069 \hspace*{-1.5cm}
  1070 \begin{subfigure}[b]{0.55\textwidth}
  1071 \begin{center}
  1072 \begin{tikzpicture}
  1073 \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1074 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1075   west},scaled x ticks = false,x tick label style={/pgf/number
  1076   format/1000 sep = \space}]
  1077 %\addplot+[only marks] table {proteinsOrig.txt};
  1078 \addplot table {randGraph/iso/vf2pIso15_1.txt};
  1079 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1080         {randGraph/iso/vf2ppIso15_1.txt};
  1081 \end{axis}
  1082 \end{tikzpicture}
  1083 \end{center}
  1084      \end{subfigure}
  1085      \begin{subfigure}[b]{0.55\textwidth}
  1086 \begin{center}
  1087 \begin{tikzpicture}
  1088 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1089 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1090   west},scaled x ticks = false,x tick label style={/pgf/number
  1091   format/1000 sep = \thinspace}]
  1092 %\addplot+[only marks] table {proteinsOrig.txt};
  1093 \addplot table {randGraph/iso/vf2pIso100_1.txt};
  1094 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1095         {randGraph/iso/vf2ppIso100_1.txt};
  1096 \end{axis}
  1097 \end{tikzpicture}
  1098 \end{center}
  1099 \end{subfigure}
  1100 \vspace*{-0.8cm}
  1101 \caption{ISO on random graphs.
  1102 }\label{fig:randISO}
  1103 \end{figure}
  1104 
  1105 
  1106 \subsubsection{Induced subgraph isomorphism}
  1107 This section presents a comparison of VF2++ and VF2 Plus in the case
  1108 of induced subgraph isomorphism. In addition to the size of the large
  1109 graph, that of the small graph dramatically influences the hardness of
  1110 a given problem too, so the overall picture is provided by examining
  1111 small graphs of various size.
  1112 
  1113 For each chart, a number $0<\rho< 1$ has been fixed, and the following
  1114 has been executed 150 times. Generating a large graph $G_{2}$ of an average degree of $\delta$,
  1115 choose 10 of its induced subgraphs having $\rho\ |V_{2}|$ nodes,
  1116 and for all the 10 subgraphs find a mapping by using both the graph
  1117 matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
  1118 0.3, 0.8$ cases have been examined, see
  1119 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
  1120 \ref{fig:randIND35}.
  1121 
  1122 
  1123 
  1124 
  1125 
  1126 \begin{figure}
  1127 \vspace*{-1.5cm}
  1128 \hspace*{-1.5cm}
  1129 \begin{subfigure}[b]{0.55\textwidth}
  1130 \begin{center}
  1131 \begin{tikzpicture}
  1132 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1133 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1134   west},scaled x ticks = false,x tick label style={/pgf/number
  1135   format/1000 sep = \space}]
  1136 %\addplot+[only marks] table {proteinsOrig.txt};
  1137 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
  1138 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1139         {randGraph/ind/vf2ppInd5_0.05.txt};
  1140 \end{axis}
  1141 \end{tikzpicture}
  1142 \end{center}
  1143      \end{subfigure}
  1144      \begin{subfigure}[b]{0.55\textwidth}
  1145 \begin{center}
  1146 \begin{tikzpicture}
  1147 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1148 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1149   west},scaled x ticks = false,x tick label style={/pgf/number
  1150   format/1000 sep = \space}]
  1151 %\addplot+[only marks] table {proteinsOrig.txt};
  1152 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
  1153 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1154         {randGraph/ind/vf2ppInd5_0.1.txt};
  1155 \end{axis}
  1156 \end{tikzpicture}
  1157 \end{center}
  1158 \end{subfigure}
  1159 \hspace*{-1.5cm}
  1160 \begin{subfigure}[b]{0.55\textwidth}
  1161 \begin{center}
  1162 \begin{tikzpicture}
  1163 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1164 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1165   west},scaled x ticks = false,x tick label style={/pgf/number
  1166   format/1000 sep = \space}]
  1167 %\addplot+[only marks] table {proteinsOrig.txt};
  1168 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
  1169 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1170         {randGraph/ind/vf2ppInd5_0.3.txt};
  1171 \end{axis}
  1172 \end{tikzpicture}
  1173 \end{center}
  1174      \end{subfigure}
  1175      \begin{subfigure}[b]{0.55\textwidth}
  1176 \begin{center}
  1177 \begin{tikzpicture}
  1178 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1179 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1180   west},scaled x ticks = false,x tick label style={/pgf/number
  1181   format/1000 sep = \space}]
  1182 %\addplot+[only marks] table {proteinsOrig.txt};
  1183 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
  1184 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1185         {randGraph/ind/vf2ppInd5_0.8.txt};
  1186 \end{axis}
  1187 \end{tikzpicture}
  1188 \end{center}
  1189 \end{subfigure}
  1190 \vspace*{-0.8cm}
  1191 \caption{IND on graphs having an average degree of
  1192   5.}\label{fig:randIND5}
  1193 \end{figure}
  1194 
  1195 
  1196 \begin{figure}
  1197 \vspace*{-1.5cm}
  1198 \hspace*{-1.5cm}
  1199 \begin{subfigure}[b]{0.55\textwidth}
  1200 \begin{center}
  1201 \hspace*{-0.5cm}
  1202 \begin{tikzpicture}
  1203 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1204 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1205   west},scaled x ticks = false,x tick label style={/pgf/number
  1206   format/1000 sep = \space}]
  1207 %\addplot+[only marks] table {proteinsOrig.txt};
  1208 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
  1209 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1210         {randGraph/ind/vf2ppInd10_0.05.txt};
  1211 \end{axis}
  1212 \end{tikzpicture}
  1213 \end{center}
  1214      \end{subfigure}
  1215      \begin{subfigure}[b]{0.55\textwidth}
  1216 \begin{center}
  1217      \hspace*{-0.5cm}
  1218 \begin{tikzpicture}
  1219 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1220 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1221   west},scaled x ticks = false,x tick label style={/pgf/number
  1222   format/1000 sep = \space}]
  1223 %\addplot+[only marks] table {proteinsOrig.txt};
  1224 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
  1225 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1226         {randGraph/ind/vf2ppInd10_0.1.txt};
  1227 \end{axis}
  1228 \end{tikzpicture}
  1229 \end{center}
  1230 \end{subfigure}
  1231 \hspace*{-1.5cm}
  1232 \begin{subfigure}[b]{0.55\textwidth}
  1233 \begin{center}
  1234 \begin{tikzpicture}
  1235 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1236 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1237   west},scaled x ticks = false,x tick label style={/pgf/number
  1238   format/1000 sep = \space}]
  1239 %\addplot+[only marks] table {proteinsOrig.txt};
  1240 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
  1241 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1242         {randGraph/ind/vf2ppInd10_0.3.txt};
  1243 \end{axis}
  1244 \end{tikzpicture}
  1245 \end{center}
  1246      \end{subfigure}
  1247      \begin{subfigure}[b]{0.55\textwidth}
  1248 \begin{center}
  1249 \begin{tikzpicture}
  1250 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1251 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1252   west},scaled x ticks = false,x tick label style={/pgf/number
  1253   format/1000 sep = \space}]
  1254 %\addplot+[only marks] table {proteinsOrig.txt};
  1255 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
  1256 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1257         {randGraph/ind/vf2ppInd10_0.8.txt};
  1258 \end{axis}
  1259 \end{tikzpicture}
  1260 \end{center}
  1261 \end{subfigure}
  1262 \vspace*{-0.8cm}
  1263 \caption{IND on graphs having an average degree of
  1264   10.}\label{fig:randIND10}
  1265 \end{figure}
  1266 
  1267 
  1268 
  1269 \begin{figure}
  1270 \vspace*{-1.5cm}
  1271 \hspace*{-1.5cm}
  1272 \begin{subfigure}[b]{0.55\textwidth}
  1273 \begin{center}
  1274 \begin{tikzpicture}
  1275 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1276 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1277   west},scaled x ticks = false,x tick label style={/pgf/number
  1278   format/1000 sep = \space}]
  1279 %\addplot+[only marks] table {proteinsOrig.txt};
  1280 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
  1281 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1282         {randGraph/ind/vf2ppInd35_0.05.txt};
  1283 \end{axis}
  1284 \end{tikzpicture}
  1285 \end{center}
  1286      \end{subfigure}
  1287      \begin{subfigure}[b]{0.55\textwidth}
  1288 \begin{center}
  1289 \begin{tikzpicture}
  1290 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1291 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1292   west},scaled x ticks = false,x tick label style={/pgf/number
  1293   format/1000 sep = \space}]
  1294 %\addplot+[only marks] table {proteinsOrig.txt};
  1295 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
  1296 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1297         {randGraph/ind/vf2ppInd35_0.1.txt};
  1298 \end{axis}
  1299 \end{tikzpicture}
  1300 \end{center}
  1301 \end{subfigure}
  1302 \hspace*{-1.5cm}
  1303 \begin{subfigure}[b]{0.55\textwidth}
  1304 \begin{center}
  1305 \begin{tikzpicture}
  1306 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1307 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1308   west},scaled x ticks = false,x tick label style={/pgf/number
  1309   format/1000 sep = \space}]
  1310 %\addplot+[only marks] table {proteinsOrig.txt};
  1311 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
  1312 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1313         {randGraph/ind/vf2ppInd35_0.3.txt};
  1314 \end{axis}
  1315 \end{tikzpicture}
  1316 \end{center}
  1317      \end{subfigure}
  1318      \begin{subfigure}[b]{0.55\textwidth}
  1319 \begin{center}
  1320 \begin{tikzpicture}
  1321 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
  1322 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  1323   west},scaled x ticks = false,x tick label style={/pgf/number
  1324   format/1000 sep = \space}]
  1325 %\addplot+[only marks] table {proteinsOrig.txt};
  1326 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
  1327 \addplot[mark=triangle*,mark size=1.8pt,color=red] table
  1328         {randGraph/ind/vf2ppInd35_0.8.txt};
  1329 \end{axis}
  1330 \end{tikzpicture}
  1331 \end{center}
  1332 \end{subfigure}
  1333 \vspace*{-0.8cm}
  1334 \caption{IND on graphs having an average degree of
  1335   35.}\label{fig:randIND35}
  1336 \end{figure}
  1337 
  1338 
  1339 Based on these experiments, VF2++ is faster than VF2 Plus and able to
  1340 handle really large graphs in milliseconds. Note that when $IND$ was
  1341 considered and the small graphs had proportionally few nodes ($\rho =
  1342 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
  1343 orders (e.g. see the $\delta=10$ case on
  1344 Figure~\ref{fig:randIND10}). If these instances had been excluded, the
  1345 charts would have seemed to be similar to the other ones.
  1346 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
  1347 Plus slow slightly down, but remain practically usable even on graphs
  1348 having 10 000 nodes.
  1349 
  1350 
  1351 
  1352 
  1353 
  1354 \section{Conclusion}
  1355 This paper presented VF2++, a new graph matching algorithm based on VF2, called VF2++, and analyzed it from a practical viewpoint.
  1356 
  1357 Recognizing the importance of the node order and determining an
  1358 efficient one, VF2++ is able to match graphs of thousands of nodes in
  1359 near practically linear time including preprocessing. In addition to
  1360 the proper order, VF2++ uses more efficient consistency and cutting
  1361 rules which are easy to compute and make the algorithm able to prune
  1362 most of the unfruitful branches without going astray.
  1363 
  1364 In order to show the efficiency of the new method, it has been
  1365 compared to VF2 Plus\cite{VF2Plus}, which is the best contemporary algorithm.
  1366 .
  1367 
  1368 The experiments show that VF2++ consistently outperforms VF2 Plus on
  1369 biological graphs. It seems to be asymptotically faster on protein and
  1370 on contact map graphs in the case of induced subgraph isomorphism,
  1371 while in the case of graph isomorphism, it has definitely better
  1372 asymptotic behaviour on protein graphs.
  1373 
  1374 Regarding random sparse graphs, not only has VF2++ proved itself to be
  1375 faster than VF2 Plus, but it also has a practically linear behaviour both
  1376 in the case of induced subgraph- and graph isomorphism.
  1377 
  1378 
  1379 
  1380 %% The Appendices part is started with the command \appendix;
  1381 %% appendix sections are then done as normal sections
  1382 %% \appendix
  1383 
  1384 %% \section{}
  1385 %% \label{}
  1386 
  1387 %% If you have bibdatabase file and want bibtex to generate the
  1388 %% bibitems, please use
  1389 %%
  1390 \bibliographystyle{elsarticle-num} \bibliography{bibliography}
  1391 
  1392 %% else use the following coding to input the bibitems directly in the
  1393 %% TeX file.
  1394 
  1395 %% \begin{thebibliography}{00}
  1396 
  1397 %% %% \bibitem{label}
  1398 %% %% Text of bibliographic item
  1399 
  1400 %% \bibitem{}
  1401 
  1402 %% \end{thebibliography}
  1403 
  1404 \end{document}
  1405 \endinput
  1406 %%
  1407 %% End of file `elsarticle-template-num.tex'.