lemon-vf2: damecco.tex@497868c58d36

     1 %%

     2 %% Copyright 2007, 2008, 2009 Elsevier Ltd

     3 %%

     4 %% This file is part of the 'Elsarticle Bundle'.

     5 %% ---------------------------------------------

     6 %%

     7 %% It may be distributed under the conditions of the LaTeX Project Public

     8 %% License, either version 1.2 of this license or (at your option) any

     9 %% later version.  The latest version of this license is in

    10 %%    http://www.latex-project.org/lppl.txt

    11 %% and version 1.2 or later is part of all distributions of LaTeX

    12 %% version 1999/12/01 or later.

    13 %%

    14 %% The list of all files belonging to the 'Elsarticle Bundle' is

    15 %% given in the file `manifest.txt'.

    16 %%

    18 %% Template article for Elsevier's document class `elsarticle'

    19 %% with numbered style bibliographic references

    20 %% SP 2008/03/01

    22 \documentclass[preprint,12pt]{elsarticle}

    24 %% Use the option review to obtain double line spacing

    25 %% \documentclass[authoryear,preprint,review,12pt]{elsarticle}

    27 %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn

    28 %% for a journal layout:

    29 %% \documentclass[final,1p,times]{elsarticle}

    30 %% \documentclass[final,1p,times,twocolumn]{elsarticle}

    31 %% \documentclass[final,3p,times]{elsarticle}

    32 %% \documentclass[final,3p,times,twocolumn]{elsarticle}

    33 %% \documentclass[final,5p,times]{elsarticle}

    34 %% \documentclass[final,5p,times,twocolumn]{elsarticle}

    36 %% For including figures, graphicx.sty has been loaded in

    37 %% elsarticle.cls. If you prefer to use the old commands

    38 %% please give \usepackage{epsfig}

    40 %% The amssymb package provides various useful mathematical symbols

    41 \usepackage{amssymb}

    42 %% The amsthm package provides extended theorem environments

    43 %% \usepackage{amsthm}

    45 %% The lineno packages adds line numbers. Start line numbering with

    46 %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on

    47 %% for the whole article with \linenumbers.

    48 %% \usepackage{lineno}

    50 \usepackage{amsmath}

    51 %% \usepackage[pdftex]{graphicx}

    53 \usepackage{pgfplots}

    54 \pgfplotsset{width=9cm}

    55 \pgfplotsset{compat=1.8}

    57 \usepackage{caption}

    58 \usepackage{subcaption}

    60 \usepackage{algorithm}

    61 \usepackage{algpseudocode}

    62 \usepackage{tikz}

    64 \usepackage{amsthm,amssymb}

    65 \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}

    67 \newtheorem{theorem}{Theorem}[subsection]

    68 \newtheorem{corollary}{Corollary}[theorem]

    69 \newtheorem{claim}[theorem]{Claim}

    71 \newtheorem{definition}{Definition}[subsection]

    72 \newtheorem{notation}{Notation}[subsection]

    73 \newtheorem{example}{Example}[subsection]

    74 \usetikzlibrary{decorations.markings}

    75 \let\oldproofname=\proofname

    76 %% \renewcommand{\proofname}{\rm\bf{Proof:}}

    78 \captionsetup{font=normalsize}

    80 \journal{Discrete Applied Mathematics}

    82 \begin{document}

    84 \begin{frontmatter}

    86 %% Title, authors and addresses

    88 %% use the tnoteref command within \title for footnotes;

    89 %% use the tnotetext command for theassociated footnote;

    90 %% use the fnref command within \author or \address for footnotes;

    91 %% use the fntext command for theassociated footnote;

    92 %% use the corref command within \author for corresponding author footnotes;

    93 %% use the cortext command for theassociated footnote;

    94 %% use the ead command for the email address,

    95 %% and the form \ead[url] for the home page:

    96 %% \title{Title\tnoteref{label1}}

    97 %% \tnotetext[label1]{}

    98 %% \author{Name\corref{cor1}\fnref{label2}}

    99 %% \ead{email address}

   100 %% \ead[url]{home page}

   101 %% \fntext[label2]{}

   102 %% \cortext[cor1]{}

   103 %% \address{Address\fnref{label3}}

   104 %% \fntext[label3]{}

   106 \title{VF2++ --- An Improved Subgraph Isomorphism Algorithm}

   108 %% use optional labels to link authors explicitly to addresses:

   109 %% \author[label1,label2]{}

   110 %% \address[label1]{}

   111 %% \address[label2]{}

   113 \author[egres,elte]{Alp{\'a}r J{\"u}ttner}

   114 \ead{alpar@cs.elte.hu}

   115 \author[elte]{P{\'e}ter Madarasi}

   116 \ead{madarasip@caesar.elte.hu}

   117 \address[egres]{MTA-ELTE Egerv{\'a}ry Research Group, Budapest, Hungary.}

   118 \address[elte]{Department of Operations Research, E{\"o}tv{\"o}s Lor{\'a}nd University, Budapest, Hungary.}

   120 \begin{abstract}

   122   This paper presents a largely improved version of the VF2 algorithm

   123   for the \emph{Subgraph Isomorphism Problem}. The improvements are

   124   twofold. Firstly, it is based on a new approach for determining the

   125   matching order of the nodes, and secondly, more efficient -

   126   nevertheless easier to compute - cutting rules significantly

   127   reducing the search space are applied.

   129   In addition to the usual subgraph isomorphism, the paper also

   130   presents specialized versions for the \emph{Induced Subgraph

   131     Isomorphism} and for the \emph{Graph Isomorphism Problems}.

   133   Finally, an extensive experimental evaluation is provided using a

   134   wide range of inputs, including both real life biological and

   135   chemical datasets and standard randomly generated graph series. The

   136   results show major and consistent running time improvements over the

   137   other known methods.

   139   The C++ implementations of the algorithms are available open source as

   140   the part of the LEMON graph and network optimization library.

   142 \end{abstract}

   144 \begin{keyword}

   145   Computational Biology, Subgraph Isomorphism Problem

   146 %% keywords here, in the form: keyword \sep keyword

   148 %% PACS codes here, in the form: \PACS code \sep code

   150 %% MSC codes here, in the form: \MSC code \sep code

   151 %% or \MSC[2008] code \sep code (2000 is the default)

   153 \end{keyword}

   155 \end{frontmatter}

   157 %% \linenumbers

   159 %% main text

   160 \section{Introduction}

   161 \label{sec:intro}

   163 In the last decades, combinatorial structures, and especially graphs

   164 have been considered with ever increasing interest, and applied to the

   165 solution of several new and revised questions.  The expressiveness,

   166 the simplicity and the studiedness of graphs make them practical for

   167 modelling and appear constantly in several seemingly independent

   168 fields, such as bioinformatics and chemistry.

   170 Complex biological systems arise from the interaction and cooperation

   171 of plenty of molecular components. Getting acquainted with such

   172 systems at the molecular level is of primary importance, since

   173 protein-protein interaction, DNA-protein interaction, metabolic

   174 interaction, transcription factor binding, neuronal networks, and

   175 hormone signaling networks can be understood this way.

   177 Many chemical and biological structures can easily be modeled

   178 as graphs, for instance, a molecular structure can be

   179 considered as a graph, whose nodes correspond to atoms and whose

   180 edges to chemical bonds. The similarity and dissimilarity of

   181 objects corresponding to nodes are incorporated to the model

   182 by \emph{node labels}. Understanding such networks basically

   183 requires finding specific subgraphs, thus it calls for efficient

   184 graph matching algorithms.

   186 Other real-world fields related to some

   187 variants of graph matching include pattern recognition

   188 and machine vision \cite{HorstBunkeApplications}, symbol recognition

   189 \cite{CordellaVentoSymbolRecognition}, face identification

   190 \cite{JianzhuangYongFaceIdentification}.  \\

   192 Subgraph and induced subgraph matching problems are known to be

   193 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is

   194 one of the few problems in NP neither known to be in P nor

   195 NP-Complete. Although polynomial time isomorphism algorithms are known

   196 for various graph classes, like trees and planar

   197 graphs\cite{PlanarGraphIso}, bounded valence

   198 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}

   199 or permutation graphs\cite{PermGraphIso}, and recently, an FPT algorithm has been presented for the coloured hypergraph isomorphism problem in \cite{ColoredHiperGraphIso}.

   201 In the following, some algorithms based on other approaches are

   202 summarized, which do not need any restrictions on the graphs. Even though,

   203 an overall polynomial behaviour is not expectable from such an

   204 alternative, it may often have good practical performance, in fact,

   205 it might be the best choice even on a graph class for which polynomial

   206 algorithm is known.

   208 The first practically usable approach was due to

   209 \emph{Ullmann}\cite{Ullmann} which is a commonly used depth-first

   210 search based algorithm with a complex heuristic for reducing the

   211 number of visited states. A major problem is its $\Theta(n^3)$ space

   212 complexity, which makes it impractical in the case of big sparse

   213 graphs.

   215 In a recent paper, Ullmann\cite{UllmannBit} presents an

   216 improved version of this algorithm based on a bit-vector solution for

   217 the binary Constraint Satisfaction Problem.

   219 The \emph{Nauty} algorithm\cite{Nauty} transforms the two graphs to

   220 a canonical form before starting to check for the isomorphism. It has

   221 been considered as one of the fastest graph isomorphism algorithms,

   222 although graph categories were shown in which it takes exponentially

   223 many steps. This algorithm handles only the graph isomorphism problem.

   225 The \emph{LAD} algorithm\cite{Lad} uses a depth-first search

   226 strategy and formulates the matching as a Constraint Satisfaction

   227 Problem to prune the search tree. The constraints are that the mapping

   228 has to be injective and edge-preserving, hence it is possible to

   229 handle new matching types as well.

   231 The \emph{RI} algorithm\cite{RI} and its variations are based on a

   232 state space representation. After reordering the nodes of the graphs,

   233 it uses some fast executable heuristic checks without using any

   234 complex pruning rules. It seems to run really efficiently on graphs

   235 coming from biology, and won the International Contest on Pattern

   236 Search in Biological Databases\cite{Content}.

   238 The currently most commonly used algorithm is the

   239 \emph{VF2}\cite{VF2}, the improved version of \emph{VF}\cite{VF}, which was

   240 designed for solving pattern matching and computer vision problems,

   241 and has been one of the best overall algorithms for more than a

   242 decade. Although, it can't be up to new specialized algorithms, it is

   243 still widely used due to its simplicity and space efficiency. VF2 uses

   244 a state space representation and checks some conditions in each state

   245 to prune the search tree.

   247 Meanwhile, another variant called \emph{VF2 Plus}\cite{VF2Plus} has

   248 been published. It is considered to be as efficient as the RI

   249 algorithm and has a strictly better behavior on large graphs.  The

   250 main idea of VF2 Plus is to precompute a heuristic node order of the

   251 small graph, in which the VF2 works more efficiently.

   253 This paper introduces \emph{VF2++}, a new further improved algorithm

   254 for the graph and (induced)subgraph isomorphism problem, which uses

   255 efficient cutting rules and determines a node order in which VF2 runs

   256 significantly faster on practical inputs.

   258 The rest of the paper is structured as

   259 follows. Section~\ref{sec:ProbStat} defines the exact problems to be

   260 solved, Section~\ref{sec:VF2Alg} provides a description of VF2. Based

   261 on that, Section~\ref{sec:VF2ppAlg} introduces VF2++. Some technical

   262 details necessary for an efficient implementation are discussed in

   263 Section~\ref{sec:VF2ppImpl}.  Finally, Section~\ref{sec:ExpRes}

   264 provide a detailed experimental evaluation of VF2++ and its comparison

   265 to the state-of-the-art algorithm.

   267 It must also be mentioned that the C++ implementations of the

   268 algorithms have been made available for evaluation and use under an

   269 open source license as a part of LEMON\cite{LEMON} open source graph

   270 library.

   272 \section{Problem Statement}\label{sec:ProbStat}

   273 This section provides a formal description of the problems to be

   274 solved.

   275 \subsection{Definitions}

   277 Throughout the paper $G_{1}=(V_{1}, E_{1})$ and

   278 $G_{2}=(V_{2}, E_{2})$ denote two undirected graphs.

   280 \begin{definition}

   281 $\mathcal{L}: (V_{1}\cup V_{2}) \longrightarrow K$ is a \textbf{node

   282     label function}, where K is an arbitrary set. The elements in K

   283   are the \textbf{node labels}. Two nodes, u and v are said to be

   284   \textbf{equivalent} if $\mathcal{L}(u)=\mathcal{L}(v)$.

   285 \end{definition}

   287 For the sake of simplicity, in this paper the graph, subgraph and induced subgraph isomorphisms are defined in a more general way.

   289 \begin{definition}\label{sec:ismorphic}

   290 $G_{1}$ and $G_{2}$ are \textbf{isomorphic} (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}:

   291   V_{1} \longrightarrow V_{2}$ bijection, for which the

   292   following is true:

   293 \begin{center}

   294 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\

   295 $\forall u,v\in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in{E_{2}}$

   296 \end{center}

   297 \end{definition}

   299 \begin{definition}

   300 $G_{1}$ is a \textbf{subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}:

   301   V_{1}\longrightarrow V_{2}$ injection, for which the

   302   following is true:

   303 \begin{center}

   304 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\

   305 $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Rightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$

   306 \end{center}

   307 \end{definition}

   309 \begin{definition}

   310 $G_{1}$ is an \textbf{induced subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists

   311   \mathfrak{m}: V_{1}\longrightarrow V_{2}$ injection, for which the

   312   following is true:

   313 \begin{center}

   314 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and

   316 $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow

   317   (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$

   318 \end{center}

   319 \end{definition}

   322 \subsection{Common problems}\label{sec:CommProb}

   324 The focus of this paper is on two extensively studied topics, the

   325 subgraph isomorphism and its variations. However, the following

   326 problems also appear in many applications.

   328 The \textbf{subgraph matching problem} is the following: is

   329 $G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node

   330 label?

   332 The \textbf{induced subgraph matching problem} asks the same about the

   333 existence of an induced subgraph.

   335 The \textbf{graph isomorphism problem} can be defined as induced

   336 subgraph matching problem where the sizes of the two graphs are equal.

   338 In addition, one may want to find a \textbf{single} mapping or \textbf{enumerate} all of them.

   340 Note that some authors refer to the term

   341 \emph{subgraph isomorphism problem} as an \emph{induced subgraph

   342   isomorphism problem}.

   344 \section{The VF2 Algorithm}\label{sec:VF2Alg}

   345 This algorithm is the basis of both the VF2++ and the VF2 Plus.  VF2

   346 is able to handle all the variations mentioned in Section

   347   \ref{sec:CommProb}.  Although it can also handle directed graphs,

   348 for the sake of simplicity, only the undirected case will be

   349 discussed.

   352 \subsection{Common notations}

   353 \indent Assume $G_{1}$ is searched in $G_{2}$.  The following

   354 definitions and notations will be used throughout the whole paper.

   355 \begin{definition}

   356 An injection $\mathfrak{m} : D \longrightarrow V_2$ is called (partial) \textbf{mapping}, where $D\subseteq V_1$.

   357 \end{definition}

   359 \begin{notation}

   360 $\mathfrak{D}(f)$ and $\mathfrak{R}(f)$ denote the domain and the range of a function $f$, respectively.

   361 \end{notation}

   363 \begin{definition}

   364 Mapping $\mathfrak{m}$ \textbf{covers} a node $u\in V_1\cup V_2$ if $u\in \mathfrak{D}(\mathfrak{m})\cup \mathfrak{R}(\mathfrak{m})$.

   365 \end{definition}

   367 \begin{definition}

   368 A mapping $\mathfrak{m}$ is $\mathbf{whole\ mapping}$ if $\mathfrak{m}$ covers all the

   369 nodes of $V_{1}$, i.e. $\mathfrak{D}(\mathfrak{m})=V_1$.

   370 \end{definition}

   372 \begin{definition}

   373 Let \textbf{extend}$(\mathfrak{m},(u,v))$ denote the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$, for which $\forall w\in \mathfrak{D}(\mathfrak{m}) : \mathfrak{m}(w)=f(w)$ and $f(u)=v$ holds. Where $u\in V_1\setminus\mathfrak{D}(\mathfrak{m})$ and $v\in V_2\setminus\mathfrak{R}(\mathfrak{m})$, otherwise $extend(\mathfrak{m},(u,v))$ is undefined.

   374 \end{definition}

   376 \begin{notation}

   377 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type

   378 which can be substituted by any of the $\mathbf{ISO}$, $\mathbf{SUB}$

   379 and $\mathbf{IND}$ problems.

   380 \end{notation}

   382 \begin{definition}

   383 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a

   384 \textbf{consistency function by } $\mathbf{PT}$ if the following

   385 holds. If there exists a whole mapping $w$ satisfying the requirements of $PT$, for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$.

   386 \end{definition}

   388 \begin{definition}

   389 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a

   390 \textbf{cutting function by } $\mathbf{PT}$ if the following

   391 holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations, which results in a whole mapping satisfying the requirements of $PT$.

   392 \end{definition}

   394 \begin{definition}

   395 $\mathfrak{m}$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if

   396   $Cons_{PT}(\mathfrak{m})$ is true.

   397 \end{definition}

   399 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.

   400 \begin{notation}

   401 Let $\mathbf{Cons_{PT}(p, \mathfrak{m})}:=Cons_{PT}(extend(\mathfrak{m},p))$, and

   402 $\mathbf{Cut_{PT}(p, \mathfrak{m})}:=Cut_{PT}(extend(\mathfrak{m},p))$, where

   403 $p\in{V_{1}\backslash\mathfrak{D}(\mathfrak{m}) \!\times\!V_{2}\backslash\mathfrak{R}(\mathfrak{m})}$.

   404 \end{notation}

   406 $Cons_{PT}$ will be used to check the consistency of the already

   407 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if

   408 no whole consistent mapping can contain the current mapping.

   410 \subsection{Overview of the algorithm}

   411 VF2 uses a state space representation of mappings, $Cons_{PT}$ for

   412 excluding inconsistency with the problem type and $Cut_{PT}$ for

   413 pruning the search tree.

   415 Algorithm~\ref{alg:VF2Pseu} is a high level description of

   416 the VF2 matching algorithm. Each state of the matching process can

   417 be associated with a mapping $\mathfrak{m}$. The initial state

   418 is associated with a mapping $\mathfrak{m}$, for which

   419 $\mathfrak{D}(\mathfrak{m})=\emptyset$, i.e. it starts with an empty mapping.

   422 \begin{algorithm}

   423 \algtext*{EndIf}%ne nyomtasson end if-et

   424 \algtext*{EndFor}%ne

   425 \algtext*{EndProcedure}%ne nyomtasson ..

   426 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}

   427 \begin{algorithmic}[1]

   429 \Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$}

   430   \If{$\mathfrak{m}$ covers

   431     $V_{1}$} \State Output($\mathfrak{m}$)

   432   \Else

   433   \State Compute the set $P_\mathfrak{m}$ of the pairs candidate for inclusion

   434   in $\mathfrak{m}$ \ForAll{$p\in{P_\mathfrak{m}}$} \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$

   435     $\neg$Cut$_{PT}$($p,\mathfrak{m}$)}

   436     \State \textbf{call}

   437   VF2($extend(\mathfrak{m},p)$, $PT$) \EndIf \EndFor \EndIf \EndProcedure

   438 \end{algorithmic}

   439 \end{algorithm}

   442 For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of

   443 candidate node pairs for adding to the current mapping $\mathfrak{m}_s$.

   445 For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and

   446 $Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and

   447 the latter is false, the whole process is recursively applied to

   448 $extend(\mathfrak{m},p)$. Otherwise, $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it

   449 can be proved that $\mathfrak{m}$ can not be extended to a whole mapping.

   451 In order to make sure of the correctness, see

   452 \begin{claim}

   453 Through consistent mappings, only consistent whole mappings can be

   454 reached, and all the consistent whole mappings are reachable through

   455 consistent mappings.

   456 \end{claim}

   458 Note that a mapping may be reached in exponentially many different ways, since the

   459 order of extensions does not influence the nascent mapping.

   461 However, one may observe

   463 \begin{claim}

   464 \label{claim:claimTotOrd}

   465 Let $\prec$ be an arbitrary total ordering relation on $V_{1}$.  If

   466 the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$, for which

   467 \begin{center}

   468 $\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$,

   469 \end{center}

   470 then no mapping can be reached more than once, and each whole mapping remains reachable.

   471 \end{claim}

   473 Note that the cornerstone of the improvements to VF2 is a proper

   474 choice of a total ordering.

   476 \subsection{The candidate set}

   477 \label{candidateComputingVF2}

   478 Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$.

   480 \begin{notation}

   481 Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\backslash\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and

   482  $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\backslash\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$.

   483 \end{notation}

   485 The set $P_\mathfrak{m}$ includes the pairs of uncovered neighbours of covered

   486 nodes, and if there is not such a node pair, all the pairs containing

   487 two uncovered nodes are added. Formally, let

   488 \[

   489  P_\mathfrak{m}\!=\!

   490   \begin{cases}

   491    T_{1}(\mathfrak{m})\times T_{2}(\mathfrak{m})&\hspace{-0.15cm}\text{if }

   492    T_{1}(\mathfrak{m})\!\neq\!\emptyset\ \text{and }T_{2}(\mathfrak{m})\!\neq

   493    \emptyset,\\ (V_{1}\!\setminus\!\mathfrak{D}(\mathfrak{m}))\!\times\!(V_{2}\!\setminus\!\mathfrak{R}(\mathfrak{m}))

   494    &\hspace{-0.15cm}\text{otherwise}.

   495   \end{cases}

   496 \]

   498 \subsection{Consistency}

   499 Suppose $p=(u,v)$, where $u\in V_{1}$ and $v\in V_{2}$, $\mathfrak{m}$ is a consistent mapping by

   500 $PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether

   501 including pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$.

   503 For example, the consistency function of induced subgraph isomorphism is as follows.

   504 \begin{notation}

   505 Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} :

   506 (u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2}

   507   (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$.

   508 \end{notation}

   510 $extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ $\Leftrightarrow

   511 (\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}

   512 \Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. The

   513 following formulation gives an efficient way of calculating

   514 $Cons_{IND}$.

   515 \begin{claim}

   516 $Cons_{IND}((u,v),\mathfrak{m}):=\mathcal{L}(u)\!\!=\!\!\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge

   517   (\forall \tilde{u}\in \Gamma_{1}(u)

   518   \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is a

   519   consistency function in the case of $IND$.

   520 \end{claim}

   522 \subsection{Cutting rules}

   523 $Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently

   524 verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can

   525 be true only if it is impossible to extend $extend(\mathfrak{m},p)$ to a

   526 whole mapping.

   528 As an example, the cutting function of induced subgraph isomorphism is presented.

   529 \begin{notation}

   530 Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\backslash

   531 \mathfrak{D}(\mathfrak{m}))\backslash T_{1}(\mathfrak{m})$, and

   532 \\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\backslash

   533 \mathfrak{R}(\mathfrak{m}))\backslash T_{2}(\mathfrak{m})$.

   534 \end{notation}

   536 \begin{claim}

   537 $Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| <

   538   |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap

   539   \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap

   540   \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function by $IND$.

   541 \end{claim}

   543 \section{The VF2++ Algorithm}\label{sec:VF2ppAlg}

   544 Although any total ordering relation makes the search space of VF2 a

   545 tree, its choice turns out to dramatically influence the number of

   546 visited states. The goal is to determine an efficient one as quickly

   547 as possible.

   549 The main reason for VF2++' superiority over VF2 is twofold. Firstly,

   550 taking into account the structure and the node labeling of the graph,

   551 VF2++ determines a state order in which most of the unfruitful

   552 branches of the search space can be pruned immediately. Secondly,

   553 introducing more efficient --- nevertheless still easier to compute

   554 --- cutting rules reduces the chance of going astray even further.

   556 In addition to the usual subgraph isomorphism, specialized versions

   557 for induced subgraph isomorphism and for graph isomorphism have been

   558 designed.

   560 Note that a weaker version of the cutting rules and an efficient

   561 candidate set calculating were described in \cite{VF2Plus}.

   563 It should be noted that all the methods described in this section are

   564 extendable to handle directed graphs and edge labels as well.

   565 The basic ideas and the detailed description of VF2++ are provided in

   566 the following.\newline

   568 The goal is to find a matching order in which the algorithm is able to

   569 recognize inconsistency or prune the infeasible branches on the

   570 highest levels and goes deep only if it is needed.

   572 \begin{notation}

   573 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{1}(u)\cap H\}|$, that is the

   574 number of neighbours of u which are in H, where $u\in V_{1} $ and

   575 $H\subseteq V_{1}$.

   576 \end{notation}

   578 The principal question is the following. Suppose a mapping $\mathfrak{m}$ is

   579 given. For which node of $T_{1}(\mathfrak{m})$ is the hardest to find a

   580 consistent pair in $G_{2}$? The more covered neighbours a node in

   581 $T_{1}(\mathfrak{m})$ has --- i.e. the largest $Conn_{\mathfrak{D}(\mathfrak{m})}$ it has

   582 ---, the more rarely satisfiable consistency constraints for its pair

   583 are given.

   585 In biology, most of the graphs are sparse, thus several nodes in

   586 $T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes

   587 reasonable to define a secondary and a tertiary order between them.

   588 The observation above proves itself to be as determining, that the

   589 secondary ordering prefers nodes with the most uncovered neighbours

   590 among which have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$ to increase

   591 $Conn_{\mathfrak{D}(\mathfrak{m})}$ of uncovered nodes so much, as possible.  The

   592 tertiary ordering prefers nodes having the rarest uncovered labels.

   594 Note that the secondary ordering is the same as the ordering by $deg$,

   595 which is a static data in front of the above used.

   597 These rules can easily result in a matching order which contains the

   598 nodes of a long path successively, whose nodes may have low $Conn$ and

   599 is easily matchable into $G_{2}$. To avoid that, a BFS order is

   600 used, which provides the shortest possible paths.

   601 \newline

   603 In the following, some examples on which the VF2 may be slow are

   604 described, although they are easily solvable by using a proper

   605 matching order.

   607 \begin{example}

   608 Suppose $G_{1}$ can be mapped into $G_{2}$ in many ways

   609 without node labels. Let $u\in V_{1}$ and $v\in V_{2}$.

   610 \newline

   611 $\mathcal{L}(u):=black$

   612 \newline

   613 $\mathcal{L}(v):=black$

   614 \newline

   615 $\mathcal{L}(\tilde{u}):=red \ \forall \tilde{u}\in V_{1}\backslash

   616 \{u\}$

   617 \newline

   618 $\mathcal{L}(\tilde{v}):=red \ \forall \tilde{v}\in V_{2}\backslash

   619 \{v\}$

   620 \newline

   622 Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since

   623 $u$ is black and no node in $V_{2}$ has a black label except

   624 $v$. If unfortunately $u$ were the last node which will get covered,

   625 VF2 would check only in the last steps, whether $u$ can be matched to

   626 $v$.

   627 \newline

   628 However, had $u$ been the first matched node, u would have been

   629 matched immediately to v, so all the mappings would have been

   630 precluded in which node labels can not correspond.

   631 \end{example}

   633 \begin{example}

   634 Suppose there is no node label given, $G_{1}$ is a small graph and

   635 can not be mapped into $G_{2}$ and $u\in V_{1}$.

   636 \newline

   637 Let $G'_{1}:=(V_{1}\cup

   638 \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup

   639 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,

   640 $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint

   641 from $G_{1}$ and one of its starting points is connected to $u\in

   642 V_{1}\}$.

   643 \newline

   644 Is there a subgraph of $G_{2}$, which is isomorph with

   645 $G'_{1}$?

   646 \newline

   647 If unfortunately the nodes of the path were the first $k$ nodes in the

   648 matching order, the algorithm would iterate through all the possible k

   649 long paths in $G_{2}$, and it would recognize that no path can be

   650 extended to $G'_{1}$.

   651 \newline

   652 However, had it started by the matching of $G_{1}$, it would not

   653 have matched any nodes of the path.

   654 \end{example}

   656 These examples may look artificial, but the same problems also appear

   657 in real-world instances, even though in a less obvious way.

   659 \subsection{Preparations}

   660 \begin{claim}

   661 \label{claim:claimCoverFromLeft}

   662 The total ordering relation uniquely determines a node order, in which

   663 the nodes of $V_{1}$ will be covered by VF2. From the point of

   664 view of the matching procedure, this means, that always the same node

   665 of $G_{1}$ will be covered on the d-th level.

   666 \end{claim}

   668 \begin{definition}

   669 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of

   670 $V_{1}$ is \textbf{matching order} if exists $\prec$ total

   671 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds

   672 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{1}|\}$.

   673 \end{definition}

   675 \begin{claim}\label{claim:MOclaim}

   676 A total ordering is matching order iff the nodes of every component

   677 form an interval in the node sequence, and every node connects to a

   678 previous node in its component except the first node of each component.

   679 \end{claim}

   681 To summing up, a total ordering always uniquely determines a matching

   682 order, and every matching order can be determined by a total ordering,

   683 however, more than one different total orderings may determine the

   684 same matching order.

   686 \subsection{Total ordering}

   687 The matching order will be searched directly.

   688 \begin{notation}

   689 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} :

   690 l=\mathcal{L}(v)\}|-|\{u\in V_{1}\backslash \mathcal{M} : l=\mathcal{L}(u)\}|$ ,

   691 where $l$ is a label and $\mathcal{M}\subseteq V_{1}$.

   692 \end{notation}

   694 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.

   695 \end{definition}

   697 \begin{algorithm}

   698 \algtext*{EndIf}

   699 \algtext*{EndProcedure}

   700 \algtext*{EndWhile}

   701 \algtext*{EndFor}

   702 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}

   703 \begin{algorithmic}[1]

   704 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$

   705 \Comment{matching order} \While{$V_{1}\backslash \mathcal{M}

   706   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg

   707 min$_{F_\mathcal{M}\circ \mathcal{L}}(V_{1}\backslash

   708 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with

   709 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the

   710 $d$-th level \State Process $V_d$ \Comment{See Algorithm

   711   \ref{alg:VF2PPProcess1}} \EndFor

   712 \EndWhile \EndProcedure

   713 \end{algorithmic}

   714 \end{algorithm}

   716 \begin{algorithm}

   717 \algtext*{EndIf}

   718 \algtext*{EndProcedure}%ne nyomtasson ..

   719 \algtext*{EndWhile}

   720 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}

   721 \begin{algorithmic}[1]

   722 \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$}

   723 \State $m\in$ arg min$_{F_{\mathcal{M}\circ\ \mathcal{L}}}($ arg max$_{deg}($arg

   724 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$

   725 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh

   726 $F_\mathcal{M}$ \EndWhile \EndProcedure

   727 \end{algorithmic}

   728 \end{algorithm}

   730 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the

   731 matching order procedure of VF2++. It computes a BFS tree for each

   732 component in ascending order of their rarest node labels and largest $deg$,

   733 whose root vertex is the component's minimal

   734 node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending

   735 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately

   736 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately.

   738 Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}

   739 provides a matching order.

   742 \subsection{Cutting rules}

   743 \label{VF2PPCuttingRules}

   744 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels.

   745 \begin{notation}

   746 Let $\mathbf{\Gamma_{1}^{l}(u)}:=\{\tilde{u} : \mathcal{L}(\tilde{u})=l

   747 \wedge \tilde{u}\in \Gamma_{1} (u)\}$ and

   748 $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge

   749 \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in

   750 V_{2}$ and $l$ is a label.

   751 \end{notation}

   753 \subsubsection{Induced subgraph isomorphism}

   754 \begin{claim}

   755 \[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by IND.

   756 \end{claim}

   757 \subsubsection{Graph isomorphism}

   758 \begin{claim}

   759 \[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by ISO.

   760 \end{claim}

   762 \subsubsection{Subgraph isomorphism}

   763 \begin{claim}

   764 \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function by SUB.

   765 \end{claim}

   769 \section{Implementation details}\label{sec:VF2ppImpl}

   770 This section provides a detailed summary of an efficient

   771 implementation of VF2++.

   772 \subsection{Storing a mapping}

   773 After fixing an arbitrary node order ($u_0, u_1, ..,

   774 u_{|G_{1}|-1}$) of $G_{1}$, an array $M$ is usable to store

   775 the current mapping in the following way.

   776 \[

   777  M[i] =

   778   \begin{cases}

   779    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INV\!ALI\!D &

   780    if\ no\ node\ has\ been\ mapped\ to\ u_i,

   781   \end{cases}

   782 \]

   783 where $i\in\{0,1, ..,|G_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$

   784 means "no node".

   785 \subsection{Avoiding the recurrence}

   786 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized

   787 as a \textit{while loop}, which has a loop counter $depth$ denoting the

   788 all-time depth of the recursion. Fixing a matching order, let $M$

   789 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft},

   790 $M$ is $INV\!ALI\!D$ from index $depth$+1 and not $INV\!ALI\!D$ before

   791 $depth$. $M[depth]$ changes

   792 while the state is being processed, but the property is held before

   793 both stepping back to a predecessor state and exploring a successor

   794 state.

   796 The necessary part of the candidate set is easily maintainable or

   797 computable by following

   798 Section~\ref{candidateComputingVF2}. A much faster method

   799 has been designed for biological- and sparse graphs, see the next

   800 section for details.

   802 \subsection{Calculating the candidates for a node}

   803 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the

   804 task is not to maintain the candidate set, but to generate the

   805 candidate nodes in $G_{2}$ for a given node $u\in V_{1}$.  In

   806 case of any of the three problem types and a mapping $\mathfrak{m}$, if a node $v\in

   807 V_{2}$ is a potential pair of $u\in V_{1}$, then $\forall

   808 u'\in \mathfrak{D}(\mathfrak{m}) : (u,u')\in

   809 E_{1}\Rightarrow (v,\mathfrak{m}(u'))\in

   810 E_{2}$. That is, each covered neighbour of $u$ has to be mapped to

   811 a covered neighbour of $v$.

   813 Having said that, an algorithm running in $\Theta(deg)$ time is

   814 describable if there exists a covered node in the component containing

   815 $u$, and a linear one otherwise.

   818 \subsection{Determining the node order}

   819 This section describes how the node order preprocessing method of

   820 VF2++ can efficiently be implemented.

   822 For using lookup tables, the node labels are associated with the

   823 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It

   824 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order

   825 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes

   826 in $V_{1}$ having label $i$, which is easy to compute in

   827 $\Theta(|V_{1}|)$ steps.

   829 Representing $\mathcal{M}\subseteq V_{1}$ as an array of

   830 size $|V_{1}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes.

   832 \subsection{Cutting rules}

   833 In Section~\ref{VF2PPCuttingRules}, the cutting rules were

   834 described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$

   835 and $\tilde T_{2}$, which are dependent on the all-time mapping

   836 (i.e. on the all-time state). The aim is to check the labeled cutting

   837 rules of VF2++ in $\Theta(deg)$ time.

   839 Firstly, suppose that these four sets are given in such a way, that

   840 checking whether a node is in a certain set takes constant time,

   841 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an

   842 initially zero integer lookup table of size $|K|$. After incrementing

   843 $L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and

   844 decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap

   845 T_{2}(s)$, the first part of the cutting rules is checkable in

   846 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$

   847 to zero takes $\Theta(deg)$ time again, which makes it possible to use

   848 the same table through the whole algorithm. The second part of the

   849 cutting rules can be verified using the same method with $\tilde

   850 T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and

   851 $T_{2}$. Thus, the overall complexity is $\Theta(deg)$.

   853 Another integer lookup table storing the number of covered neighbours

   854 of each node in $G_{2}$ gives all the information about the sets

   855 $T_{2}$ and $\tilde T_{2}$, which is maintainable in

   856 $\Theta(deg)$ time when a pair is added or substracted by incrementing

   857 or decrementing the proper indices. A further improvement is that the

   858 values of $L[\mathcal{L}(u')]$ in case of checking $u$ are dependent only on

   859 $u$, i.e. on the size of the mapping, so for each $u\in V_{1}$ an

   860 array of pairs (label, number of such labels) can be stored to skip

   861 the maintaining operations. Note that these arrays are at most of size

   862 $deg$.

   864 Using similar techniques, the consistency function can be evaluated in

   865 $\Theta(deg)$ steps, as well.

   867 \section{Experimental results}\label{sec:ExpRes}

   868 This section compares the performance of VF2++ and VF2 Plus. According to

   869 our experience, both algorithms run faster than VF2 with orders of

   870 magnitude, thus its inclusion was not reasonable.

   872 The algorithms were implemented in C++ using the open source

   873 LEMON graph and network optimization library\cite{LEMON}. The test were carried out on a linux based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM.

   874 \subsection{Biological graphs}

   875 The tests have been executed on a recent biological dataset created

   876 for the International Contest on Pattern Search in Biological

   877 Databases\cite{Content}, which has been constructed of molecule,

   878 protein and contact map graphs extracted from the Protein Data

   879 Bank\cite{ProteinDataBank}.

   881 The molecule dataset contains small graphs with less than 100 nodes

   882 and an average degree of less than 3. The protein dataset contains

   883 graphs having 500-10 000 nodes and an average degree of 4, while the

   884 contact map dataset contains graphs with 150-800 nodes and an average

   885 degree of 20.  \\

   887 In the following, both the induced subgraph isomorphism and the graph

   888 isomorphism will be examined.

   890 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For runtime results, please see Figure~\ref{fig:bioIND}.

   892 In an other experiment, the nodes of each graph in the database had been

   893 shuffled, and an isomorphism between the shuffled and the original

   894 graph was searched. The solution times are shown on Figure~\ref{fig:bioISO}.

   897 \begin{figure}[H]

   898 \vspace*{-2cm}

   899 \hspace*{-1.5cm}

   900 \begin{subfigure}[b]{0.55\textwidth}

   901 \begin{figure}[H]

   902 \begin{tikzpicture}[trim axis left, trim axis right]

   903 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

   904 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

   905   west},scaled x ticks = false,x tick label style={/pgf/number

   906   format/1000 sep = \thinspace}]

   907 %\addplot+[only marks] table {proteinsOrig.txt};

   908 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark

   909   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};

   910 \end{axis}

   911 \end{tikzpicture}

   912 \caption{In the case of molecules, the algorithms have

   913   similar behaviour, but VF2++ is almost two times faster even on such

   914   small graphs.} \label{fig:INDMolecule}

   915 \end{figure}

   916 \end{subfigure}

   917 \hspace*{1.5cm}

   918 \begin{subfigure}[b]{0.55\textwidth}

   919 \begin{figure}[H]

   920 \begin{tikzpicture}[trim axis left, trim axis right]

   921 \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

   922 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

   923   west},scaled x ticks = false,x tick label style={/pgf/number

   924   format/1000 sep = \thinspace}]

   925 %\addplot+[only marks] table {proteinsOrig.txt};

   926 \addplot table {Orig/ContactMaps.128.txt};

   927 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

   928         {VF2PPLabel/ContactMaps.128.txt};

   929 \end{axis}

   930 \end{tikzpicture}

   931 \caption{On contact maps, VF2++ runs almost in constant time, while VF2

   932   Plus has a near linear behaviour.} \label{fig:INDContact}

   933 \end{figure}

   934 \end{subfigure}

   936 \begin{center}

   937 \vspace*{-0.5cm}

   938 \begin{subfigure}[b]{0.55\textwidth}

   939 \begin{figure}[H]

   940 \begin{tikzpicture}[trim axis left, trim axis right]

   941   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

   942   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

   943     west},scaled x ticks = false,x tick label style={/pgf/number

   944     format/1000 sep = \thinspace}] %\addplot+[only marks] table

   945     {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]

   946     table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark

   947       size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};

   948   \end{axis}

   949   \end{tikzpicture}

   950 \caption{Both the algorithms have linear behaviour on protein

   951   graphs. VF2++ is more than 10 times faster than VF2

   952   Plus.} \label{fig:INDProt}

   953 \end{figure}

   954 \end{subfigure}

   955 \end{center}

   956 \vspace*{-0.5cm}

   957 \caption{\normalsize{Induced subgraph isomorphism on biological graphs}}\label{fig:bioIND}

   958 \end{figure}

   961 \begin{figure}[H]

   962 \vspace*{-2cm}

   963 \hspace*{-1.5cm}

   964 \begin{subfigure}[b]{0.55\textwidth}

   965 \begin{figure}[H]

   966 \begin{tikzpicture}[trim axis left, trim axis right]

   967 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

   968 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

   969   west},scaled x ticks = false,x tick label style={/pgf/number

   970   format/1000 sep = \thinspace}]

   971 %\addplot+[only marks] table {proteinsOrig.txt};

   972 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark

   973   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};

   974 \end{axis}

   975 \end{tikzpicture}

   976 \caption{In the case of molecules, there is not such a significant

   977   difference, but VF2++ seems to be faster as the number of nodes

   978   increases.}\label{fig:ISOMolecule}

   979 \end{figure}

   980 \end{subfigure}

   981 \hspace*{1.5cm}

   982 \begin{subfigure}[b]{0.55\textwidth}

   983 \begin{figure}[H]

   984 \begin{tikzpicture}[trim axis left, trim axis right]

   985 \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

   986 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

   987   west},scaled x ticks = false,x tick label style={/pgf/number

   988   format/1000 sep = \thinspace}]

   989 %\addplot+[only marks] table {proteinsOrig.txt};

   990 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark

   991   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};

   992 \end{axis}

   993 \end{tikzpicture}

   994 \caption{The results are closer to each other on contact maps, but

   995   VF2++ still performs consistently better.}\label{fig:ISOContact}

   996 \end{figure}

   997 \end{subfigure}

   999 \begin{center}

  1000 \vspace*{-0.5cm}

  1001 \begin{subfigure}[b]{0.55\textwidth}

  1002 \begin{figure}[H]

  1003 \begin{tikzpicture}[trim axis left, trim axis right]

  1004 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

  1005 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1006   west},scaled x ticks = false,x tick label style={/pgf/number

  1007   format/1000 sep = \thinspace}]

  1008 %\addplot+[only marks] table {proteinsOrig.txt};

  1009 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark

  1010   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};

  1011 \end{axis}

  1012 \end{tikzpicture}

  1013 \caption{On protein graphs, VF2 Plus has a super linear time

  1014   complexity, while VF2++ runs in near constant time. The difference

  1015   is about two order of magnitude on large graphs.}\label{fig:ISOProt}

  1016 \end{figure}

  1017 \end{subfigure}

  1018 \end{center}

  1019 \vspace*{-0.6cm}

  1020 \caption{\normalsize{Graph isomorphism on biological graphs}}\label{fig:bioISO}

  1021 \end{figure}

  1026 \subsection{Random graphs}

  1027 This section compares VF2++ with VF2 Plus on random graphs of a large

  1028 size. The node labels are uniformly distributed.  Let $\delta$ denote

  1029 the average degree.  For the parameters of problems solved in the

  1030 experiments, please see the top of each chart.

  1031 \subsubsection{Graph isomorphism}

  1032 To evaluate the efficiency of the algorithms in the case of graph

  1033 isomorphism, random connected graphs of less than 20 000 nodes have been

  1034 considered. Generating a random graph and shuffling its nodes, an

  1035 isomorphism had to be found. Figure \ref{fig:randISO} shows the runtime results

  1036 on graph sets of various density.

  1041 \begin{figure}

  1042 \vspace*{-1.5cm}

  1043 \hspace*{-1.5cm}

  1044 \begin{subfigure}[b]{0.55\textwidth}

  1045 \begin{center}

  1046 \begin{tikzpicture}

  1047 \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1048 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1049   west},scaled x ticks = false,x tick label style={/pgf/number

  1050   format/1000 sep = \space}]

  1051 %\addplot+[only marks] table {proteinsOrig.txt};

  1052 \addplot table {randGraph/iso/vf2pIso5_1.txt};

  1053 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1054         {randGraph/iso/vf2ppIso5_1.txt};

  1055 \end{axis}

  1056 \end{tikzpicture}

  1057 \end{center}

  1058 \end{subfigure}

  1059 %\hspace{1cm}

  1060 \begin{subfigure}[b]{0.55\textwidth}

  1061 \begin{center}

  1062 \begin{tikzpicture}

  1063 \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1064 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1065   west},scaled x ticks = false,x tick label style={/pgf/number

  1066   format/1000 sep = \space}]

  1067 %\addplot+[only marks] table {proteinsOrig.txt};

  1068 \addplot table {randGraph/iso/vf2pIso10_1.txt};

  1069 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1070         {randGraph/iso/vf2ppIso10_1.txt};

  1071 \end{axis}

  1072 \end{tikzpicture}

  1073 \end{center}

  1074 \end{subfigure}

  1075 %%\hspace{1cm}

  1076 \hspace*{-1.5cm}

  1077 \begin{subfigure}[b]{0.55\textwidth}

  1078 \begin{center}

  1079 \begin{tikzpicture}

  1080 \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1081 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1082   west},scaled x ticks = false,x tick label style={/pgf/number

  1083   format/1000 sep = \space}]

  1084 %\addplot+[only marks] table {proteinsOrig.txt};

  1085 \addplot table {randGraph/iso/vf2pIso15_1.txt};

  1086 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1087         {randGraph/iso/vf2ppIso15_1.txt};

  1088 \end{axis}

  1089 \end{tikzpicture}

  1090 \end{center}

  1091      \end{subfigure}

  1092      \begin{subfigure}[b]{0.55\textwidth}

  1093 \begin{center}

  1094 \begin{tikzpicture}

  1095 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1096 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1097   west},scaled x ticks = false,x tick label style={/pgf/number

  1098   format/1000 sep = \thinspace}]

  1099 %\addplot+[only marks] table {proteinsOrig.txt};

  1100 \addplot table {randGraph/iso/vf2pIso100_1.txt};

  1101 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1102         {randGraph/iso/vf2ppIso100_1.txt};

  1103 \end{axis}

  1104 \end{tikzpicture}

  1105 \end{center}

  1106 \end{subfigure}

  1107 \vspace*{-0.8cm}

  1108 \caption{ISO on random graphs.

  1109 }\label{fig:randISO}

  1110 \end{figure}

  1113 \subsubsection{Induced subgraph isomorphism}

  1114 This section presents a comparison of VF2++ and VF2 Plus in the case

  1115 of induced subgraph isomorphism. In addition to the size of the large

  1116 graph, that of the small graph dramatically influences the hardness of

  1117 a given problem too, so the overall picture is provided by examining

  1118 small graphs of various size.

  1120 For each chart, a number $0<\rho< 1$ has been fixed, and the following

  1121 has been executed 150 times. Generating a large graph $G_{2}$ of an average degree of $\delta$,

  1122 choose 10 of its induced subgraphs having $\rho\ |V_{2}|$ nodes,

  1123 and for all the 10 subgraphs find a mapping by using both the graph

  1124 matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,

  1125 0.3, 0.8$ cases have been examined, see

  1126 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and

  1127 \ref{fig:randIND35}.

  1133 \begin{figure}

  1134 \vspace*{-1.5cm}

  1135 \hspace*{-1.5cm}

  1136 \begin{subfigure}[b]{0.55\textwidth}

  1137 \begin{center}

  1138 \begin{tikzpicture}

  1139 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1140 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1141   west},scaled x ticks = false,x tick label style={/pgf/number

  1142   format/1000 sep = \space}]

  1143 %\addplot+[only marks] table {proteinsOrig.txt};

  1144 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};

  1145 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1146         {randGraph/ind/vf2ppInd5_0.05.txt};

  1147 \end{axis}

  1148 \end{tikzpicture}

  1149 \end{center}

  1150      \end{subfigure}

  1151      \begin{subfigure}[b]{0.55\textwidth}

  1152 \begin{center}

  1153 \begin{tikzpicture}

  1154 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1155 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1156   west},scaled x ticks = false,x tick label style={/pgf/number

  1157   format/1000 sep = \space}]

  1158 %\addplot+[only marks] table {proteinsOrig.txt};

  1159 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};

  1160 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1161         {randGraph/ind/vf2ppInd5_0.1.txt};

  1162 \end{axis}

  1163 \end{tikzpicture}

  1164 \end{center}

  1165 \end{subfigure}

  1166 \hspace*{-1.5cm}

  1167 \begin{subfigure}[b]{0.55\textwidth}

  1168 \begin{center}

  1169 \begin{tikzpicture}

  1170 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1171 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1172   west},scaled x ticks = false,x tick label style={/pgf/number

  1173   format/1000 sep = \space}]

  1174 %\addplot+[only marks] table {proteinsOrig.txt};

  1175 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};

  1176 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1177         {randGraph/ind/vf2ppInd5_0.3.txt};

  1178 \end{axis}

  1179 \end{tikzpicture}

  1180 \end{center}

  1181      \end{subfigure}

  1182      \begin{subfigure}[b]{0.55\textwidth}

  1183 \begin{center}

  1184 \begin{tikzpicture}

  1185 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1186 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1187   west},scaled x ticks = false,x tick label style={/pgf/number

  1188   format/1000 sep = \space}]

  1189 %\addplot+[only marks] table {proteinsOrig.txt};

  1190 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};

  1191 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1192         {randGraph/ind/vf2ppInd5_0.8.txt};

  1193 \end{axis}

  1194 \end{tikzpicture}

  1195 \end{center}

  1196 \end{subfigure}

  1197 \vspace*{-0.8cm}

  1198 \caption{IND on graphs having an average degree of

  1199   5.}\label{fig:randIND5}

  1200 \end{figure}

  1203 \begin{figure}

  1204 \vspace*{-1.5cm}

  1205 \hspace*{-1.5cm}

  1206 \begin{subfigure}[b]{0.55\textwidth}

  1207 \begin{center}

  1208 \hspace*{-0.5cm}

  1209 \begin{tikzpicture}

  1210 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1211 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1212   west},scaled x ticks = false,x tick label style={/pgf/number

  1213   format/1000 sep = \space}]

  1214 %\addplot+[only marks] table {proteinsOrig.txt};

  1215 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};

  1216 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1217         {randGraph/ind/vf2ppInd10_0.05.txt};

  1218 \end{axis}

  1219 \end{tikzpicture}

  1220 \end{center}

  1221      \end{subfigure}

  1222      \begin{subfigure}[b]{0.55\textwidth}

  1223 \begin{center}

  1224      \hspace*{-0.5cm}

  1225 \begin{tikzpicture}

  1226 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1227 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1228   west},scaled x ticks = false,x tick label style={/pgf/number

  1229   format/1000 sep = \space}]

  1230 %\addplot+[only marks] table {proteinsOrig.txt};

  1231 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};

  1232 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1233         {randGraph/ind/vf2ppInd10_0.1.txt};

  1234 \end{axis}

  1235 \end{tikzpicture}

  1236 \end{center}

  1237 \end{subfigure}

  1238 \hspace*{-1.5cm}

  1239 \begin{subfigure}[b]{0.55\textwidth}

  1240 \begin{center}

  1241 \begin{tikzpicture}

  1242 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1243 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1244   west},scaled x ticks = false,x tick label style={/pgf/number

  1245   format/1000 sep = \space}]

  1246 %\addplot+[only marks] table {proteinsOrig.txt};

  1247 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};

  1248 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1249         {randGraph/ind/vf2ppInd10_0.3.txt};

  1250 \end{axis}

  1251 \end{tikzpicture}

  1252 \end{center}

  1253      \end{subfigure}

  1254      \begin{subfigure}[b]{0.55\textwidth}

  1255 \begin{center}

  1256 \begin{tikzpicture}

  1257 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1258 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1259   west},scaled x ticks = false,x tick label style={/pgf/number

  1260   format/1000 sep = \space}]

  1261 %\addplot+[only marks] table {proteinsOrig.txt};

  1262 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};

  1263 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1264         {randGraph/ind/vf2ppInd10_0.8.txt};

  1265 \end{axis}

  1266 \end{tikzpicture}

  1267 \end{center}

  1268 \end{subfigure}

  1269 \vspace*{-0.8cm}

  1270 \caption{IND on graphs having an average degree of

  1271   10.}\label{fig:randIND10}

  1272 \end{figure}

  1276 \begin{figure}

  1277 \vspace*{-1.5cm}

  1278 \hspace*{-1.5cm}

  1279 \begin{subfigure}[b]{0.55\textwidth}

  1280 \begin{center}

  1281 \begin{tikzpicture}

  1282 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1283 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1284   west},scaled x ticks = false,x tick label style={/pgf/number

  1285   format/1000 sep = \space}]

  1286 %\addplot+[only marks] table {proteinsOrig.txt};

  1287 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};

  1288 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1289         {randGraph/ind/vf2ppInd35_0.05.txt};

  1290 \end{axis}

  1291 \end{tikzpicture}

  1292 \end{center}

  1293      \end{subfigure}

  1294      \begin{subfigure}[b]{0.55\textwidth}

  1295 \begin{center}

  1296 \begin{tikzpicture}

  1297 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1298 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1299   west},scaled x ticks = false,x tick label style={/pgf/number

  1300   format/1000 sep = \space}]

  1301 %\addplot+[only marks] table {proteinsOrig.txt};

  1302 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};

  1303 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1304         {randGraph/ind/vf2ppInd35_0.1.txt};

  1305 \end{axis}

  1306 \end{tikzpicture}

  1307 \end{center}

  1308 \end{subfigure}

  1309 \hspace*{-1.5cm}

  1310 \begin{subfigure}[b]{0.55\textwidth}

  1311 \begin{center}

  1312 \begin{tikzpicture}

  1313 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1314 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1315   west},scaled x ticks = false,x tick label style={/pgf/number

  1316   format/1000 sep = \space}]

  1317 %\addplot+[only marks] table {proteinsOrig.txt};

  1318 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};

  1319 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1320         {randGraph/ind/vf2ppInd35_0.3.txt};

  1321 \end{axis}

  1322 \end{tikzpicture}

  1323 \end{center}

  1324      \end{subfigure}

  1325      \begin{subfigure}[b]{0.55\textwidth}

  1326 \begin{center}

  1327 \begin{tikzpicture}

  1328 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1329 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1330   west},scaled x ticks = false,x tick label style={/pgf/number

  1331   format/1000 sep = \space}]

  1332 %\addplot+[only marks] table {proteinsOrig.txt};

  1333 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};

  1334 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1335         {randGraph/ind/vf2ppInd35_0.8.txt};

  1336 \end{axis}

  1337 \end{tikzpicture}

  1338 \end{center}

  1339 \end{subfigure}

  1340 \vspace*{-0.8cm}

  1341 \caption{IND on graphs having an average degree of

  1342   35.}\label{fig:randIND35}

  1343 \end{figure}

  1346 Based on these experiments, VF2++ is faster than VF2 Plus and able to

  1347 handle really large graphs in milliseconds. Note that when $IND$ was

  1348 considered and the small graphs had proportionally few nodes ($\rho =

  1349 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node

  1350 orders (e.g. see the $\delta=10$ case on

  1351 Figure~\ref{fig:randIND10}). If these instances had been excluded, the

  1352 charts would have seemed to be similar to the other ones.

  1353 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2

  1354 Plus slow slightly down, but remain practically usable even on graphs

  1355 having 10 000 nodes.

  1361 \section{Conclusion}

  1362 This paper presented VF2++, a new graph matching algorithm based on VF2, called VF2++, and analyzed it from a practical viewpoint.

  1364 Recognizing the importance of the node order and determining an

  1365 efficient one, VF2++ is able to match graphs of thousands of nodes in

  1366 near practically linear time including preprocessing. In addition to

  1367 the proper order, VF2++ uses more efficient consistency and cutting

  1368 rules which are easy to compute and make the algorithm able to prune

  1369 most of the unfruitful branches without going astray.

  1371 In order to show the efficiency of the new method, it has been

  1372 compared to VF2 Plus\cite{VF2Plus}, which is the best contemporary algorithm.

  1373 .

  1375 The experiments show that VF2++ consistently outperforms VF2 Plus on

  1376 biological graphs. It seems to be asymptotically faster on protein and

  1377 on contact map graphs in the case of induced subgraph isomorphism,

  1378 while in the case of graph isomorphism, it has definitely better

  1379 asymptotic behaviour on protein graphs.

  1381 Regarding random sparse graphs, not only has VF2++ proved itself to be

  1382 faster than VF2 Plus, but it also has a practically linear behaviour both

  1383 in the case of induced subgraph- and graph isomorphism.

  1385 %%%%%%%%%%%%%%%%

  1386 \section*{Acknowledgement} \label{sec:ack}

  1387 %%%%%%%%%%%%%%%%

  1388 This research project was initiated and sponsored by QuantumBio

  1389 Inc.\cite{QUANTUMBIO}.

  1391 The authors were supported by the Hungarian Scientific Research Fund -

  1392 OTKA, K109240 and by the J\'anos Bolyai Research Fellowship program of

  1393 the Hungarian Academy of Sciences.

  1396 %% The Appendices part is started with the command \appendix;

  1397 %% appendix sections are then done as normal sections

  1398 %% \appendix

  1400 %% \section{}

  1401 %% \label{}

  1403 %% If you have bibdatabase file and want bibtex to generate the

  1404 %% bibitems, please use

  1405 %%

  1406 \bibliographystyle{elsarticle-num} \bibliography{bibliography}

  1408 %% else use the following coding to input the bibitems directly in the

  1409 %% TeX file.

  1411 %% \begin{thebibliography}{00}

  1413 %% %% \bibitem{label}

  1414 %% %% Text of bibliographic item

  1416 %% \bibitem{}

  1418 %% \end{thebibliography}

  1420 \end{document}

  1421 \endinput

  1422 %%

  1423 %% End of file `elsarticle-template-num.tex'.

author	Alpar Juttner <alpar@cs.elte.hu>
	Wed, 30 Nov 2016 23:26:43 +0100
changeset 27	497868c58d36
parent 25	217340b8dec7
parent 26	42fbe17f0e3b
child 28	523fddfd7a01
permissions	-rw-r--r--