%%
%% Copyright 2007, 2008, 2009 Elsevier Ltd
%%
%% This file is part of the 'Elsarticle Bundle'.
%% ---------------------------------------------
%%
%% It may be distributed under the conditions of the LaTeX Project Public
%% License, either version 1.2 of this license or (at your option) any
%% later version.  The latest version of this license is in
%%    http://www.latex-project.org/lppl.txt
%% and version 1.2 or later is part of all distributions of LaTeX
%% version 1999/12/01 or later.
%%
%% The list of all files belonging to the 'Elsarticle Bundle' is
%% given in the file `manifest.txt'.
%%

%% Template article for Elsevier's document class `elsarticle'
%% with numbered style bibliographic references
%% SP 2008/03/01

\documentclass[preprint,12pt]{elsarticle}

%% Use the option review to obtain double line spacing
%% \documentclass[authoryear,preprint,review,12pt]{elsarticle}

%% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
%% for a journal layout:
%% \documentclass[final,1p,times]{elsarticle}
%% \documentclass[final,1p,times,twocolumn]{elsarticle}
%% \documentclass[final,3p,times]{elsarticle}
%% \documentclass[final,3p,times,twocolumn]{elsarticle}
%% \documentclass[final,5p,times]{elsarticle}
%% \documentclass[final,5p,times,twocolumn]{elsarticle}

%% For including figures, graphicx.sty has been loaded in
%% elsarticle.cls. If you prefer to use the old commands
%% please give \usepackage{epsfig}

%% The amssymb package provides various useful mathematical symbols
\usepackage{amssymb}
%% The amsthm package provides extended theorem environments
%% \usepackage{amsthm}

%% The lineno package adds line numbers. Start line numbering with
%% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
%% for the whole article with \linenumbers.
%% \usepackage{lineno}

\usepackage{amsmath}
%% \usepackage[pdftex]{graphicx}

\usepackage{pgfplots}
\pgfplotsset{width=9cm}
\pgfplotsset{compat=1.8}

\usepackage{caption}
\usepackage{subcaption}

\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{tikz}

\usepackage{amsthm,amssymb}
\renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}

\newtheorem{theorem}{Theorem}[subsection]
\newtheorem{corollary}{Corollary}[theorem]
\newtheorem{claim}[theorem]{Claim}

\newtheorem{definition}{Definition}[subsection]
\newtheorem{notation}{Notation}[subsection]
\newtheorem{example}{Example}[subsection]
\usetikzlibrary{decorations.markings}
\let\oldproofname=\proofname
%% \renewcommand{\proofname}{\rm\bf{Proof:}}

\captionsetup{font=normalsize}

\journal{Discrete Applied Mathematics}

\begin{document}

\begin{frontmatter}

%% Title, authors and addresses

%% use the tnoteref command within \title for footnotes;
%% use the tnotetext command for the associated footnote;
%% use the fnref command within \author or \address for footnotes;
%% use the fntext command for the associated footnote;
%% use the corref command within \author for corresponding author footnotes;
%% use the cortext command for the associated footnote;
%% use the ead command for the email address,
%% and the form \ead[url] for the home page:
%% \title{Title\tnoteref{label1}}
%% \tnotetext[label1]{}
%% \author{Name\corref{cor1}\fnref{label2}}
%% \ead{email address}
%% \ead[url]{home page}
%% \fntext[label2]{}
%% \cortext[cor1]{}
%% \address{Address\fnref{label3}}
%% \fntext[label3]{}

\title{Improved Algorithms for Matching Biological Graphs}

%% use optional labels to link authors explicitly to addresses:
%% \author[label1,label2]{}
%% \address[label1]{}
%% \address[label2]{}

\author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi}

\address{Dept of Operations Research, ELTE}

\begin{abstract}
Subgraph isomorphism is a well-known NP-Complete problem, while its
special case, the graph isomorphism problem, is one of the few problems
in NP that are neither known to be in P nor known to be NP-Complete.
The appearance of these problems in many fields of application, such as
pattern analysis, computer vision and the analysis of chemical and
biological systems, has fostered the design of various algorithms for
handling special graph structures.

This paper presents VF2++, a new algorithm based on the original VF2,
which runs significantly faster on most test cases and performs
especially well on special graph classes stemming from biological
questions. VF2++ handles graphs of thousands of nodes in practically
near linear time, including preprocessing. Not only is it an improved
version of VF2, but it is in fact by far the fastest existing
algorithm, especially on biological graphs.

The reason for VF2++'s superiority over VF2 is twofold. Firstly, taking
into account the structure and the node labeling of the graph, VF2++
determines a state order in which most of the unfruitful branches of
the search space can be pruned immediately. Secondly, introducing more
efficient, yet still easier to compute, cutting rules reduces the
chance of going astray even further.

In addition to the usual subgraph isomorphism, specialized versions
for induced subgraph isomorphism and for graph isomorphism are
presented. VF2++ achieves a runtime improvement of one order of
magnitude for induced subgraph isomorphism and a better asymptotic
behaviour for the graph isomorphism problem.

After the description of VF2++, its effectiveness is evaluated by an
extensive comparison with other contemporary algorithms on a wide
range of inputs, including both real-life biological and chemical
datasets and standard randomly generated graph series.

The work was motivated and sponsored by QuantumBio Inc., and all the
developed algorithms are available as part of the open source
LEMON graph and network optimization library
(http://lemon.cs.elte.hu).
\end{abstract}

\begin{keyword}
%% keywords here, in the form: keyword \sep keyword

%% PACS codes here, in the form: \PACS code \sep code

%% MSC codes here, in the form: \MSC code \sep code
%% or \MSC[2008] code \sep code (2000 is the default)

\end{keyword}

\end{frontmatter}

%% \linenumbers

%% main text
\section{Introduction}
\label{sec:intro}

In recent decades, combinatorial structures, and especially graphs,
have attracted ever-increasing interest and have been applied to the
solution of many new and revisited questions. The expressiveness, the
simplicity and the well-studied theory of graphs make them practical
for modelling, and they appear constantly in seemingly unrelated
fields, such as bioinformatics and chemistry.

Complex biological systems arise from the interaction and cooperation
of many molecular components. Understanding such systems at the
molecular level is of primary importance, since protein-protein
interaction, DNA-protein interaction, metabolic interaction,
transcription factor binding, neuronal networks and hormone signaling
networks can be understood this way.

Many chemical and biological structures can easily be modelled as
graphs; for instance, a molecular structure can be considered as a
graph whose nodes correspond to atoms and whose edges correspond to
chemical bonds. The similarity and dissimilarity of the objects
corresponding to nodes are incorporated into the model by \emph{node
labels}. Understanding such networks essentially requires finding
specific subgraphs, and thus calls for efficient graph matching
algorithms.

Other real-world fields related to some variants of graph matching
include pattern recognition and machine
vision~\cite{HorstBunkeApplications}, symbol
recognition~\cite{CordellaVentoSymbolRecognition} and face
identification~\cite{JianzhuangYongFaceIdentification}.

Subgraph and induced subgraph matching problems are known to be
NP-Complete~\cite{SubgraphNPC}, while the graph isomorphism problem is
one of the few problems in NP that are neither known to be in P nor
known to be NP-Complete. Polynomial-time isomorphism algorithms are
known for various graph classes, such as trees and planar
graphs~\cite{PlanarGraphIso}, bounded valence
graphs~\cite{BondedDegGraphIso}, interval graphs~\cite{IntervalGraphIso}
and permutation graphs~\cite{PermGraphIso}. Moreover, an FPT algorithm
for the coloured hypergraph isomorphism problem has recently been
presented in~\cite{ColoredHiperGraphIso}.

In the following, some algorithms based on other approaches, which do
not impose any restrictions on the graphs, are summarized. Even though
an overall polynomial behaviour cannot be expected from such an
alternative, it often has good practical performance; in fact, it
might be the best choice even on a graph class for which a polynomial
algorithm is known.

The first practically usable approach was due to
\emph{Ullmann}~\cite{Ullmann}, a commonly used algorithm based on
depth-first search with a complex heuristic for reducing the number of
visited states. A major problem is its $\Theta(n^3)$ space complexity,
which makes it impractical in the case of big sparse graphs.

In a recent paper, Ullmann~\cite{UllmannBit} presents an improved
version of this algorithm based on a bit-vector solution for the
binary Constraint Satisfaction Problem.

The \emph{Nauty} algorithm~\cite{Nauty} transforms the two graphs into
a canonical form before checking them for isomorphism. It has been
considered one of the fastest graph isomorphism algorithms, although
graph classes have been exhibited on which it takes exponentially many
steps. This algorithm handles only the graph isomorphism problem.

The \emph{LAD} algorithm~\cite{Lad} uses a depth-first search
strategy and formulates the matching as a Constraint Satisfaction
Problem to prune the search tree. The constraints are that the mapping
has to be injective and edge-preserving; hence, it is possible to
handle new matching types as well.

The \emph{RI} algorithm~\cite{RI} and its variations are based on a
state space representation. After reordering the nodes of the graphs,
RI applies fast heuristic checks instead of complex pruning rules. It
appears to run very efficiently on graphs coming from biology, and it
won the International Contest on Pattern Search in Biological
Databases~\cite{Content}.

The currently most commonly used algorithm is
\emph{VF2}~\cite{VF2}, the improved version of \emph{VF}~\cite{VF},
which was designed for solving pattern matching and computer vision
problems and has been one of the best overall algorithms for more than
a decade. Although it cannot keep up with newer specialized
algorithms, it is still widely used due to its simplicity and space
efficiency. VF2 uses a state space representation and checks some
conditions in each state to prune the search tree.

Meanwhile, another variant called \emph{VF2 Plus}~\cite{VF2Plus} has
been published. It is considered to be as efficient as the RI
algorithm and to behave strictly better on large graphs. The main idea
of VF2 Plus is to precompute a heuristic node order of the small graph
in which VF2 works more efficiently.

This paper introduces \emph{VF2++}, a new, further improved algorithm
for the graph and (induced) subgraph isomorphism problems, which uses
efficient cutting rules and determines a node order in which VF2 runs
significantly faster on practical inputs.

This project was initiated and sponsored by QuantumBio
Inc.~\cite{QUANTUMBIO}, and the implementation, along with its source
code, has been published as part of the LEMON~\cite{LEMON} open source
graph library.

Outline: Section~\ref{sec:ProbStat} defines the problems to be solved; Section~\ref{sec:VF2Alg} provides a description of VF2; Section~\ref{sec:VF2ppAlg} introduces VF2++, a new graph matching algorithm; Section~\ref{sec:VF2ppImpl} presents the details of an efficient implementation of VF2++; and Section~\ref{sec:ExpRes} compares VF2++ to a state-of-the-art algorithm.

\section{Problem Statement}\label{sec:ProbStat}
This section provides a formal description of the problems to be
solved.
\subsection{Definitions}

Throughout the paper, $G_{1}=(V_{1}, E_{1})$ and
$G_{2}=(V_{2}, E_{2})$ denote two undirected graphs.

\begin{definition}
$\mathcal{L}: (V_{1}\cup V_{2}) \longrightarrow K$ is a \textbf{node
    label function}, where $K$ is an arbitrary set. The elements of $K$
  are the \textbf{node labels}. Two nodes $u$ and $v$ are said to be
  \textbf{equivalent} if $\mathcal{L}(u)=\mathcal{L}(v)$.
\end{definition}

For the sake of simplicity, the graph, subgraph and induced subgraph isomorphisms are defined in this paper in a more general way, taking the node labels into account.

\begin{definition}\label{sec:ismorphic}
$G_{1}$ and $G_{2}$ are \textbf{isomorphic} (by the node label function $\mathcal{L}$) if there exists a bijection $\mathfrak{m}:
  V_{1} \longrightarrow V_{2}$ for which the
  following holds:
\begin{center}
$\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\
$\forall u,v\in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in{E_{2}}$
\end{center}
\end{definition}

\begin{definition}
$G_{1}$ is a \textbf{subgraph} of $G_{2}$ (by the node label function $\mathcal{L}$) if there exists an injection $\mathfrak{m}:
  V_{1}\longrightarrow V_{2}$ for which the
  following holds:
\begin{center}
$\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\
$\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Rightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$
\end{center}
\end{definition}

\begin{definition}
$G_{1}$ is an \textbf{induced subgraph} of $G_{2}$ (by the node label function $\mathcal{L}$) if there exists an injection
  $\mathfrak{m}: V_{1}\longrightarrow V_{2}$ for which the
  following holds:
\begin{center}
$\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\
$\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow
  (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$
\end{center}
\end{definition}

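To make the three definitions concrete, the following Python sketch checks a given whole mapping against them and enumerates all valid mappings by brute force. It is only an illustration under simplifying assumptions, not part of the LEMON implementation: a graph is assumed to be a dictionary mapping each node to the set of its neighbours, $\mathcal{L}$ is assumed to be split into two dictionaries \texttt{lab1} and \texttt{lab2}, the problem types are encoded by the strings \texttt{"ISO"}, \texttt{"SUB"} and \texttt{"IND"}, and all function names are hypothetical.

\begin{verbatim}
from itertools import permutations


def is_match(m, adj1, adj2, lab1, lab2, pt):
    """Check a whole mapping m (dict V1 -> V2) for ISO, SUB or IND."""
    if len(set(m.values())) != len(m):          # must be injective
        return False
    if pt == "ISO" and len(adj1) != len(adj2):  # ISO: a bijection
        return False
    if any(lab1[u] != lab2[m[u]] for u in m):   # labels must agree
        return False
    for u in adj1:
        for w in adj1:
            if u == w:
                continue
            in_e1 = w in adj1[u]
            in_e2 = m[w] in adj2[m[u]]
            if pt == "SUB" and in_e1 and not in_e2:
                return False                    # an edge is lost
            if pt in ("ISO", "IND") and in_e1 != in_e2:
                return False                    # edge or non-edge lost
    return True


def brute_force(adj1, adj2, lab1, lab2, pt):
    """Try every injection V1 -> V2 and keep the valid whole mappings."""
    nodes1 = list(adj1)
    found = []
    for image in permutations(adj2, len(nodes1)):
        m = dict(zip(nodes1, image))
        if is_match(m, adj1, adj2, lab1, lab2, pt):
            found.append(m)
    return found
\end{verbatim}

Since every injection $V_1\longrightarrow V_2$ is tried, the number of candidates examined is $|V_2|!/(|V_2|-|V_1|)!$, so such an exhaustive search is only usable on tiny graphs.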
\subsection{Common problems}\label{sec:CommProb}

The focus of this paper is on two extensively studied topics: the
subgraph isomorphism problem and its variations. The following
problems appear in many applications.

The \textbf{subgraph matching problem} is the following: is
$G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node
label function?

The \textbf{induced subgraph matching problem} asks the same about the
existence of an induced subgraph.

The \textbf{graph isomorphism problem} can be defined as the induced
subgraph matching problem in the case where the two graphs have the
same number of nodes.

In addition, one may want to find a \textbf{single} mapping or \textbf{enumerate} all of them.

Note that some authors use the term \emph{subgraph isomorphism
  problem} for what is called the \emph{induced subgraph
  isomorphism problem} here.

\section{The VF2 Algorithm}\label{sec:VF2Alg}
This algorithm is the basis of both VF2++ and VF2 Plus. VF2
is able to handle all the variations mentioned in
Section~\ref{sec:CommProb}. Although it can also handle directed
graphs, for the sake of simplicity, only the undirected case will be
discussed.

\subsection{Common notations}
\indent Assume that $G_{1}$ is searched for in $G_{2}$. The following
definitions and notations will be used throughout the whole paper.
\begin{definition}
An injection $\mathfrak{m} : D \longrightarrow V_2$, where $D\subseteq V_1$, is called a (partial) \textbf{mapping}.
\end{definition}

\begin{notation}
$\mathfrak{D}(f)$ and $\mathfrak{R}(f)$ denote the domain and the range of a function $f$, respectively.
\end{notation}

\begin{definition}
Mapping $\mathfrak{m}$ \textbf{covers} a node $u\in V_1\cup V_2$ if $u\in \mathfrak{D}(\mathfrak{m})\cup \mathfrak{R}(\mathfrak{m})$.
\end{definition}

\begin{definition}
A mapping $\mathfrak{m}$ is a \textbf{whole mapping} if $\mathfrak{m}$ covers all the
nodes of $V_{1}$, i.e.\ $\mathfrak{D}(\mathfrak{m})=V_1$.
\end{definition}

\begin{definition}
Let $u\in V_1\setminus\mathfrak{D}(\mathfrak{m})$ and $v\in V_2\setminus\mathfrak{R}(\mathfrak{m})$. Then \textbf{extend}$(\mathfrak{m},(u,v))$ denotes the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$ for which $f(w)=\mathfrak{m}(w)$ for all $w\in \mathfrak{D}(\mathfrak{m})$, and $f(u)=v$. Otherwise, $extend(\mathfrak{m},(u,v))$ is undefined.
\end{definition}

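A minimal Python sketch of these notions follows. It assumes that a (partial) mapping is stored as a dictionary from nodes of $V_1$ to nodes of $V_2$ and that the node names of the two graphs are distinct; the function names are hypothetical and only mirror the definitions above, with \texttt{None} standing for ``undefined''.

\begin{verbatim}
def extend(m, pair, V1, V2):
    """extend(m, (u, v)) as a new dict, or None where it is undefined."""
    u, v = pair
    # defined only for an uncovered u in V1 and an uncovered v in V2
    if u not in V1 or v not in V2 or u in m or v in m.values():
        return None
    f = dict(m)   # f agrees with m on D(m) ...
    f[u] = v      # ... and maps u to v
    return f


def covers(m, node):
    """m covers a node lying in its domain D(m) or its range R(m)."""
    return node in m or node in m.values()


def is_whole(m, V1):
    """m is a whole mapping if D(m) = V1."""
    return set(m) == set(V1)
\end{verbatim}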
\begin{notation}
Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
which can be substituted by any of the $\mathbf{ISO}$, $\mathbf{SUB}$
and $\mathbf{IND}$ problems, that is, graph isomorphism, subgraph
isomorphism and induced subgraph isomorphism, respectively.
\end{notation}

\begin{definition}
Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
\textbf{consistency function by } $\mathbf{PT}$ if the following
holds. If there exists a whole mapping $w$ satisfying the requirements of $PT$ for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$, then $Cons_{PT}(\mathfrak{m})$ is true.
\end{definition}

\begin{definition}
Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
\textbf{cutting function by } $\mathbf{PT}$ if the following
holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations which results in a whole mapping satisfying the requirements of $PT$.
\end{definition}

\begin{definition}
$\mathfrak{m}$ is said to be a \textbf{consistent mapping by} $\mathbf{PT}$ if
  $Cons_{PT}(\mathfrak{m})$ is true.
\end{definition}

$Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
\begin{notation}
Let $\mathbf{Cons_{PT}(p, \mathfrak{m})}:=Cons_{PT}(extend(\mathfrak{m},p))$, and
$\mathbf{Cut_{PT}(p, \mathfrak{m})}:=Cut_{PT}(extend(\mathfrak{m},p))$, where
$p\in(V_{1}\setminus\mathfrak{D}(\mathfrak{m}))\times(V_{2}\setminus\mathfrak{R}(\mathfrak{m}))$.
\end{notation}

$Cons_{PT}$ will be used to check the consistency of the already
covered nodes, while $Cut_{PT}$ looks ahead to recognize when
no whole consistent mapping can contain the current mapping.

\subsection{Overview of the algorithm}
VF2 uses a state space representation of mappings, $Cons_{PT}$ for
excluding inconsistency with the problem type, and $Cut_{PT}$ for
pruning the search tree.

Algorithm~\ref{alg:VF2Pseu} is a high-level description of
the VF2 matching algorithm. Each state of the matching process can
be associated with a mapping $\mathfrak{m}$. The initial state
is associated with the mapping $\mathfrak{m}$ for which
$\mathfrak{D}(\mathfrak{m})=\emptyset$, i.e.\ the algorithm starts with an empty mapping.

\begin{algorithm}
\algtext*{EndIf}% do not print "end if"
\algtext*{EndFor}% do not print "end for"
\algtext*{EndProcedure}% do not print "end procedure"
\caption{\hspace{0.5cm}A high-level description of VF2}\label{alg:VF2Pseu}
\begin{algorithmic}[1]
\Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$}
  \If{$\mathfrak{m}$ covers $V_{1}$}
    \State Output($\mathfrak{m}$)
  \Else
    \State Compute the set $P_\mathfrak{m}$ of the candidate pairs for inclusion in $\mathfrak{m}$
    \ForAll{$p\in{P_\mathfrak{m}}$}
      \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$ $\neg$Cut$_{PT}$($p,\mathfrak{m}$)}
        \State \textbf{call} VF2($extend(\mathfrak{m},p)$, $PT$)
      \EndIf
    \EndFor
  \EndIf
\EndProcedure
\end{algorithmic}
\end{algorithm}

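The recursion of Algorithm~\ref{alg:VF2Pseu} can be transcribed almost line by line. The Python sketch below is a hypothetical illustration, not the LEMON code: the mapping is a dictionary that is extended in place and restored after each recursive call, and a problem instance is passed as a tuple \texttt{(adj1, adj2, lab1, lab2)} of adjacency sets and label dictionaries. As simplifying assumptions, the candidate set consists of all pairs of uncovered nodes, consistency is checked directly against the definition of $IND$, and no cutting is done. With this candidate set, the same whole mapping is output once for every order in which its pairs can be added.

\begin{verbatim}
def vf2(m, pt, G, candidates, cons, cut, output):
    """Recursive procedure of Algorithm 1; m is a dict V1 -> V2."""
    adj1 = G[0]
    if len(m) == len(adj1):                  # m covers V1
        output(dict(m))
        return
    for p in candidates(m, G):               # the candidate set P_m
        if cons(p, m, pt, G) and not cut(p, m, pt, G):
            u, v = p
            m[u] = v                         # descend into extend(m, p)
            vf2(m, pt, G, candidates, cons, cut, output)
            del m[u]                         # restore m on the way back


def all_uncovered_pairs(m, G):
    """Simplest valid candidate set: all pairs of uncovered nodes."""
    adj1, adj2 = G[0], G[1]
    used = set(m.values())
    return [(u, v) for u in adj1 if u not in m
            for v in adj2 if v not in used]


def cons_ind_by_definition(p, m, pt, G):
    """IND consistency of extend(m, p), checked against D(m)."""
    adj1, adj2, lab1, lab2 = G
    u, v = p
    return lab1[u] == lab2[v] and all(
        (w in adj1[u]) == (m[w] in adj2[v]) for w in m)


def never_cut(p, m, pt, G):
    """Trivial cutting function that never prunes."""
    return False
\end{verbatim}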
For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of
candidate node pairs for extending the current mapping $\mathfrak{m}$.

For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and
$Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and
the latter is false, the whole process is recursively applied to
$extend(\mathfrak{m},p)$. Otherwise, either $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it
can be proved that $extend(\mathfrak{m},p)$ cannot be extended to a whole mapping.

The correctness of the algorithm is based on the following claim.
\begin{claim}
Through consistent mappings, only consistent whole mappings can be
reached, and all the consistent whole mappings are reachable through
consistent mappings.
\end{claim}

Note that a mapping may be reached in exponentially many different ways, since the
order of extensions does not influence the resulting mapping.

However, one may observe the following.

\begin{claim}
\label{claim:claimTotOrd}
Let $\prec$ be an arbitrary total ordering relation on $V_{1}$. If
the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$ for which
\begin{center}
$\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$,
\end{center}
then no mapping can be reached more than once, and each whole mapping remains reachable.
\end{claim}

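The effect of Claim~\ref{claim:claimTotOrd} can be illustrated by filtering the candidate set so that only the pairs whose first node is $\prec$-minimal among the first nodes in $P_\mathfrak{m}$ are kept. In the hypothetical Python sketch below, $\prec$ is given by a rank function, graphs are adjacency-set dictionaries, the candidate set is simply the set of all pairs of uncovered nodes, and consistency is checked for $IND$; with the filter, every whole mapping is output exactly once, whichever order is chosen.

\begin{verbatim}
def restrict_to_min(P, rank):
    """Keep only the pairs whose first node is rank-minimal in P."""
    if not P:
        return []
    u_min = min((u for u, _ in P), key=rank)
    return [(u, v) for (u, v) in P if u == u_min]


def enumerate_ind(adj1, adj2, lab1, lab2, rank):
    """VF2-style enumeration of IND mappings with the claim's filter."""
    results = []

    def rec(m):
        if len(m) == len(adj1):              # whole mapping reached
            results.append(dict(m))
            return
        used = set(m.values())
        P = [(u, v) for u in adj1 if u not in m
             for v in adj2 if v not in used]
        for u, v in restrict_to_min(P, rank):
            consistent = lab1[u] == lab2[v] and all(
                (w in adj1[u]) == (m[w] in adj2[v]) for w in m)
            if consistent:
                m[u] = v
                rec(m)
                del m[u]

    rec({})
    return results
\end{verbatim}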
Note that the cornerstone of the improvements to VF2 is a proper
choice of a total ordering.

\subsection{The candidate set}
\label{candidateComputingVF2}
Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$.

\begin{notation}
Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\setminus\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and
 $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\setminus\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$.
\end{notation}

The set $P_\mathfrak{m}$ consists of the pairs of uncovered neighbours of covered
nodes; if there is no such node pair, it consists of all the pairs of
two uncovered nodes. Formally, let
\[
 P_\mathfrak{m}\!=\!
  \begin{cases}
   T_{1}(\mathfrak{m})\times T_{2}(\mathfrak{m})&\hspace{-0.15cm}\text{if }
   T_{1}(\mathfrak{m})\!\neq\!\emptyset\ \text{and }T_{2}(\mathfrak{m})\!\neq
   \emptyset,\\ (V_{1}\!\setminus\!\mathfrak{D}(\mathfrak{m}))\!\times\!(V_{2}\!\setminus\!\mathfrak{R}(\mathfrak{m}))
   &\hspace{-0.15cm}\text{otherwise}.
  \end{cases}
\]

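A direct Python transcription of $T_1(\mathfrak{m})$, $T_2(\mathfrak{m})$ and $P_\mathfrak{m}$ is sketched below, assuming adjacency-set dictionaries and a mapping stored as a dictionary; the names are hypothetical. It recomputes the sets from scratch for readability, whereas a practical implementation would rather maintain them incrementally as the mapping grows and shrinks.

\begin{verbatim}
def t_sets(m, adj1, adj2):
    """T_1(m), T_2(m): uncovered nodes with a covered neighbour."""
    D, R = set(m), set(m.values())
    T1 = {u for u in adj1 if u not in D and adj1[u] & D}
    T2 = {v for v in adj2 if v not in R and adj2[v] & R}
    return T1, T2


def candidate_pairs(m, adj1, adj2):
    """P_m: T_1(m) x T_2(m) if both are non-empty, else all pairs
    of uncovered nodes."""
    T1, T2 = t_sets(m, adj1, adj2)
    if T1 and T2:
        return {(u, v) for u in T1 for v in T2}
    D, R = set(m), set(m.values())
    return {(u, v) for u in adj1 if u not in D
            for v in adj2 if v not in R}
\end{verbatim}

For example, for the paths $a$--$b$--$c$ and $x$--$y$--$z$ and the mapping $\{a\mapsto x\}$, the sketch returns the single pair $(b,y)$, whereas for the empty mapping it returns all nine pairs.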
\subsection{Consistency}
Suppose $p=(u,v)$, where $u\in V_{1}$ and $v\in V_{2}$, and $\mathfrak{m}$ is a consistent mapping by
$PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether
including pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$.

For example, the consistency function of induced subgraph isomorphism is as follows.
\begin{notation}
Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} :
(u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2}
  (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$.
\end{notation}

$extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ $\Leftrightarrow
\mathcal{L}(u)=\mathcal{L}(v)\wedge(\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}
\Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. The
following formulation gives an efficient way of calculating
$Cons_{IND}$.
\begin{claim}
$Cons_{IND}((u,v),\mathfrak{m}):=\mathcal{L}(u)=\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge
  (\forall \tilde{u}\in \Gamma_{1}(u)
  \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is a
  consistency function in the case of $IND$.
\end{claim}

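The claim translates into the following Python sketch, assuming adjacency-set dictionaries and a mapping stored as a dictionary; the names are hypothetical. For brevity it rebuilds $\mathfrak{m}^{-1}$ on every call, whereas a practical implementation would keep the inverse mapping up to date instead; the two neighbour tests themselves only iterate over $\Gamma_1(u)$ and $\Gamma_2(v)$.

\begin{verbatim}
def cons_ind(u, v, m, adj1, adj2, lab1, lab2):
    """Cons_IND((u, v), m) in the form of the claim above."""
    if lab1[u] != lab2[v]:                        # L(u) = L(v)
        return False
    inv = {w2: w1 for w1, w2 in m.items()}        # m^{-1} on R(m)
    # every covered neighbour of v must be the image of a
    # neighbour of u ...
    if any(inv[w] not in adj1[u] for w in adj2[v] if w in inv):
        return False
    # ... and every covered neighbour of u must be mapped to a
    # neighbour of v
    return all(m[w] in adj2[v] for w in adj1[u] if w in m)
\end{verbatim}

For instance, for the path $a$--$b$--$c$, the triangle on $x,y,z$ and $\mathfrak{m}=\{a\mapsto x\}$, the pair $(c,z)$ is rejected, because $z$ is adjacent to the covered node $x$ while $c$ is not adjacent to $a$.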
\subsection{Cutting rules}
$Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently
verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can
be true only if $extend(\mathfrak{m},p)$ cannot be extended to a
whole mapping.

As an example, the cutting function of induced subgraph isomorphism is presented.
\begin{notation}
Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\setminus
\mathfrak{D}(\mathfrak{m}))\setminus T_{1}(\mathfrak{m})$, and
\\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\setminus
\mathfrak{R}(\mathfrak{m}))\setminus T_{2}(\mathfrak{m})$.
\end{notation}

\begin{claim}
$Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| <
  |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap
  \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap
  \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function by $IND$.
\end{claim}

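A literal Python sketch of $Cut_{IND}$ follows. It assumes adjacency-set dictionaries and a mapping stored as a dictionary, uses hypothetical names, and recomputes $T_1(\mathfrak{m})$, $T_2(\mathfrak{m})$ and the sets $\tilde{T}_1(\mathfrak{m})$, $\tilde{T}_2(\mathfrak{m})$ on each call purely for readability.

\begin{verbatim}
def cut_ind(u, v, m, adj1, adj2):
    """Cut_IND((u, v), m) exactly as stated in the claim above."""
    D, R = set(m), set(m.values())
    T1 = {w for w in adj1 if w not in D and adj1[w] & D}
    T2 = {w for w in adj2 if w not in R and adj2[w] & R}
    rest1 = set(adj1) - D - T1        # tilde T_1(m)
    rest2 = set(adj2) - R - T2        # tilde T_2(m)
    return (len(adj2[v] & T2) < len(adj1[u] & T1)
            or len(adj2[v] & rest2) < len(adj1[u] & rest1))
\end{verbatim}

For instance, for a star with centre $a$ and two leaves and a path $x$--$y$--$z$, the pair $(a,x)$ is cut already for the empty mapping, because $x$ has only one neighbour in $\tilde{T}_2(\emptyset)=V_2$ while $a$ has two neighbours in $\tilde{T}_1(\emptyset)=V_1$.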
\section{The VF2++ Algorithm}\label{sec:VF2ppAlg}
Although any total ordering relation makes the search space of VF2 a
tree, its choice turns out to influence the number of visited states
dramatically. The goal is to determine an efficient one as quickly
as possible.

The main reason for VF2++'s superiority over VF2 is twofold. Firstly,
taking into account the structure and the node labeling of the graph,
VF2++ determines a state order in which most of the unfruitful
branches of the search space can be pruned immediately. Secondly,
introducing more efficient, yet still easier to compute, cutting rules
reduces the chance of going astray even further.

In addition to the usual subgraph isomorphism, specialized versions
for induced subgraph isomorphism and for graph isomorphism have been
designed.

Note that a weaker version of the cutting rules and an efficient
method for calculating the candidate set were described
in~\cite{VF2Plus}.

It should be noted that all the methods described in this section are
extendable to handle directed graphs and edge labels as well.
The basic ideas and the detailed description of VF2++ are provided in
the following.\newline

The goal is to find a matching order in which the algorithm is able to
recognize inconsistency or prune the infeasible branches on the
highest levels of the search tree and goes deep only if it is needed.

\begin{notation}
Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{1}(u)\cap H|$, that is, the
number of neighbours of $u$ which are in $H$, where $u\in V_{1}$ and
$H\subseteq V_{1}$.
\end{notation}

The principal question is the following. Suppose a mapping $\mathfrak{m}$ is
given. For which node of $T_{1}(\mathfrak{m})$ is it the hardest to find a
consistent pair in $G_{2}$? The more covered neighbours a node in
$T_{1}(\mathfrak{m})$ has, i.e.\ the larger its $Conn_{\mathfrak{D}(\mathfrak{m})}$ is,
the harder it is to satisfy the consistency constraints on its pair.

In biology, most of the graphs are sparse; thus several nodes in
$T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes it
reasonable to define a secondary and a tertiary order between them.
The observation above turns out to be decisive here as well: among the
nodes having the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, the secondary
ordering prefers the nodes with the most uncovered neighbours, in order
to increase the $Conn_{\mathfrak{D}(\mathfrak{m})}$ of the uncovered nodes as much
as possible. The tertiary ordering prefers nodes having the rarest
uncovered labels.

Note that among nodes with the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, the
number of uncovered neighbours of a node $u$ is
$deg(u)-Conn_{\mathfrak{D}(\mathfrak{m})}(u)$, so the secondary ordering is the
same as the ordering by $deg$, which, unlike the quantities used above,
is static data.

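The three levels of preference can be expressed as a lexicographic key. The Python sketch below is a simplified illustration, not the exact VF2++ procedure: it assumes adjacency-set dictionaries, a set \texttt{D} standing for $\mathfrak{D}(\mathfrak{m})$, and a hypothetical dictionary \texttt{label\_count} that measures how frequent each label still is among the uncovered nodes (smaller counts mean rarer labels); how this frequency is measured is an assumption here. Remaining ties are broken by the smallest node name.

\begin{verbatim}
def conn(u, H, adj1):
    """Conn_H(u): number of neighbours of u inside the node set H."""
    return len(adj1[u] & H)


def preference_key(u, D, adj1, lab1, label_count):
    """Larger key = preferred node; D stands for the covered set."""
    return (conn(u, D, adj1),            # primary: covered neighbours
            len(adj1[u]),                # secondary: degree
            -label_count[lab1[u]])       # tertiary: rarest label


def pick_next(T1, D, adj1, lab1, label_count):
    """Most preferred node of T_1(m); ties go to the smallest name."""
    return max(sorted(T1),
               key=lambda u: preference_key(u, D, adj1, lab1,
                                            label_count))
\end{verbatim}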
These rules can easily result in a matching order which contains the
nodes of a long path consecutively; such nodes may have low $Conn$, and
the path is easily matchable into $G_{2}$. To avoid that, a BFS order is
used, which reaches every node along a shortest possible path.
\newline

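Combining the BFS order with the preferences described above gives a simple ordering heuristic, sketched below in Python. This is a hypothetical, simplified illustration under stated assumptions, not necessarily the exact rule of VF2++ (the total ordering actually used is described later): each connected component is traversed by BFS from a root with the rarest label (ties broken by larger degree), the nodes of each BFS level are ordered greedily by the key ($Conn$ to the already ordered nodes, degree, label rarity), the given label counts are decreased as nodes are ordered, and remaining ties are broken by node name.

\begin{verbatim}
from collections import Counter


def bfs_matching_order(adj1, lab1, label_count):
    """Order V_1 by BFS levels; inside a level pick greedily by
    (Conn to the already ordered nodes, degree, label rarity)."""
    count = Counter(label_count)      # copy, decreased when ordering
    order, placed, remaining = [], set(), set(adj1)
    while remaining:                  # one BFS tree per component
        # assumed root rule: rarest label, then largest degree
        root = min(sorted(remaining),
                   key=lambda u: (count[lab1[u]], -len(adj1[u])))
        level, seen = [root], {root}
        while level:
            todo = set(level)
            while todo:               # greedy order inside one level
                u = max(sorted(todo),
                        key=lambda w: (len(adj1[w] & placed),
                                       len(adj1[w]),
                                       -count[lab1[w]]))
                todo.remove(u)
                order.append(u)
                placed.add(u)
                remaining.discard(u)
                count[lab1[u]] -= 1
            next_level = []
            for u in level:           # nodes first reached from here
                for w in sorted(adj1[u]):
                    if w not in seen:
                        seen.add(w)
                        next_level.append(w)
            level = next_level
    return order
\end{verbatim}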
In the following, some examples are described on which VF2 may be
slow, although they are easily solvable by using a proper
matching order.

\begin{example}
Suppose $G_{1}$ can be mapped into $G_{2}$ in many ways
without node labels. Let $u\in V_{1}$ and $v\in V_{2}$.
\newline
$\mathcal{L}(u):=\text{black}$
\newline
$\mathcal{L}(v):=\text{black}$
\newline
$\mathcal{L}(\tilde{u}):=\text{red} \ \forall \tilde{u}\in V_{1}\setminus
\{u\}$
\newline
$\mathcal{L}(\tilde{v}):=\text{red} \ \forall \tilde{v}\in V_{2}\setminus
\{v\}$
\newline

Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since
$u$ is black and no node in $V_{2}$ has a black label except
$v$. If, unfortunately, $u$ were the last node to be covered,
VF2 would check only in the last steps whether $u$ can be matched to
$v$.
\newline
However, had $u$ been the first node to be matched, it would have been
matched to $v$ immediately, so all the mappings in which the node
labels cannot correspond would have been excluded at once.
\end{example}

   640 \begin{example}

   641 Suppose there is no node label given, $G_{1}$ is a small graph and

   642 can not be mapped into $G_{2}$ and $u\in V_{1}$.

   643 \newline

   644 Let $G'_{1}:=(V_{1}\cup

   645 \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup

   646 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,

   647 $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint

   648 from $G_{1}$ and one of its starting points is connected to $u\in

   649 V_{1}\}$.

   650 \newline

Is there a subgraph of $G_{2}$ that is isomorphic to
$G'_{1}$?

   653 \newline

If, unfortunately, the nodes of the path were the first $k$ nodes in the
matching order, the algorithm would iterate through all possible paths
on $k$ nodes in $G_{2}$, and for each of them it would recognize only
later that it cannot be extended to $G'_{1}$.
\newline
However, had it started by matching $G_{1}$, it would not
have matched any node of the path, since $G_{1}$ alone cannot be
mapped into $G_{2}$.

   661 \end{example}

These examples may look artificial, but the same problems also appear
in real-world instances, albeit in a less obvious way.

   666 \subsection{Preparations}

   667 \begin{claim}

   668 \label{claim:claimCoverFromLeft}

The total ordering relation uniquely determines the order in which
the nodes of $V_{1}$ are covered by VF2. From the point of
view of the matching procedure, this means that the same node
of $G_{1}$ is always covered on the $d$-th level.

   673 \end{claim}

   675 \begin{definition}

An order $(u_{\sigma(1)},u_{\sigma(2)},\ldots,u_{\sigma(|V_{1}|)})$ of
$V_{1}$ is a \textbf{matching order} if there exists a total
ordering relation $\prec$ such that VF2 with $\prec$ finds a
pair for $u_{\sigma(d)}$ on the $d$-th level for all $d\in\{1,\ldots,|V_{1}|\}$.

   680 \end{definition}

   682 \begin{claim}\label{claim:MOclaim}

An order of $V_{1}$ is a matching order iff the nodes of every component
form an interval in the node sequence, and every node, except the first
node of each component, is connected to a preceding node of its component.

   686 \end{claim}

To sum up, a total ordering always uniquely determines a matching
order, and every matching order can be determined by a total ordering;
however, several different total orderings may determine the
same matching order.
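The characterisation in Claim~\ref{claim:MOclaim} is easy to test mechanically. The following Python sketch checks both conditions in a single pass over the node sequence; it assumes that $G_{1}$ is undirected and given as a dictionary of neighbour sets, and the function names \texttt{component\_ids} and \texttt{is\_matching\_order} are hypothetical.

```python
from collections import deque


def component_ids(adj):
    """Assign a component id to every node of an undirected graph (BFS)."""
    comp, next_id = {}, 0
    for s in adj:
        if s in comp:
            continue
        comp[s] = next_id
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in comp:
                    comp[y] = next_id
                    queue.append(y)
        next_id += 1
    return comp


def is_matching_order(order, adj):
    """Test the two conditions of Claim MOclaim on a node sequence of G1.

    order: list containing every node of G1 exactly once
    adj:   dict mapping each node to the set of its neighbours
    """
    if len(order) != len(adj) or set(order) != set(adj):
        return False
    comp = component_ids(adj)
    closed = set()    # components whose interval has already ended
    placed = set()    # nodes preceding the current one
    current = None    # component of the previous node
    for x in order:
        if comp[x] != current:
            # x starts a new interval; its component must not have occurred before.
            if comp[x] in closed:
                return False
            if current is not None:
                closed.add(current)
            current = comp[x]
        elif not any(y in placed for y in adj[x]):
            # x is not the first node of its component, so it needs an earlier neighbour.
            return False
        placed.add(x)
    return True
```

For the path $a$--$b$--$c$ plus an isolated node $d$, the sequence $(b,a,c,d)$ passes, while $(a,c,b,d)$ and $(a,d,b,c)$ violate the second and the first condition, respectively.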

   693 \subsection{Total ordering}

Instead of a total ordering relation, VF2++ determines the matching order directly.

   695 \begin{notation}

Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} :
l=\mathcal{L}(v)\}|-|\{u\in V_{1}\backslash \mathcal{M} : l=\mathcal{L}(u)\}|$,

   698 where $l$ is a label and $\mathcal{M}\subseteq V_{1}$.

   699 \end{notation}

\begin{definition}Let $\mathbf{arg\,max}_{f}(S) :=\{u\in S : f(u)=\max_{v\in S} f(v)\}$ and $\mathbf{arg\,min}_{f}(S) := \mathrm{arg\,max}_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.

   702 \end{definition}
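Since $\mathbf{arg\,max}$ and $\mathbf{arg\,min}$ return sets, ties are kept and can be broken by a further criterion. A minimal Python sketch of the definition, with hypothetical function names:

```python
def arg_max(f, S):
    """arg max_f(S): all elements of the finite set S on which f is maximal."""
    S = list(S)
    if not S:
        return set()
    best = max(f(x) for x in S)
    return {x for x in S if f(x) == best}


def arg_min(f, S):
    """arg min_f(S) := arg max_{-f}(S)."""
    return arg_max(lambda x: -f(x), S)
```

Nesting the calls, as in the algorithms below, applies the criteria lexicographically: the outer function only breaks the ties left by the inner one.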

   704 \begin{algorithm}

   705 \algtext*{EndIf}

   706 \algtext*{EndProcedure}

   707 \algtext*{EndWhile}

   708 \algtext*{EndFor}

\caption{\hspace{0.5cm}The method of VF2++ for determining the node order}\label{alg:VF2PPPseu}

   710 \begin{algorithmic}[1]

   711 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$

   712 \Comment{matching order} \While{$V_{1}\backslash \mathcal{M}

   713   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg

   714 min$_{F_\mathcal{M}\circ \mathcal{L}}(V_{1}\backslash

   715 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with

   716 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the

   717 $d$-th level \State Process $V_d$ \Comment{See Algorithm

   718   \ref{alg:VF2PPProcess1}} \EndFor

   719 \EndWhile \EndProcedure

   720 \end{algorithmic}

   721 \end{algorithm}

   723 \begin{algorithm}

   724 \algtext*{EndIf}

\algtext*{EndProcedure}% do not print ..

   726 \algtext*{EndWhile}

\caption{\hspace{.5cm}The method for processing a level of the BFS tree}\label{alg:VF2PPProcess1}

   728 \begin{algorithmic}[1]

   729 \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$}

\State $m\in$ arg min$_{F_\mathcal{M}\circ \mathcal{L}}($arg max$_{deg}($arg
max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash \{m\}$

   732 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh

   733 $F_\mathcal{M}$ \EndWhile \EndProcedure

   734 \end{algorithmic}

   735 \end{algorithm}

Algorithm~\ref{alg:VF2PPPseu} is a high-level description of the
matching order procedure of VF2++. It repeatedly selects a root $r$
among the nodes not yet in $\mathcal{M}$, preferring the rarest label
(minimal $F_\mathcal{M}\circ\mathcal{L}$) and, among those, the largest
$deg$, and computes a BFS tree of the component of $r$ rooted at $r$.
Algorithm~\ref{alg:VF2PPProcess1} processes one level of this BFS tree:
it appends the nodes of the current level to $\mathcal{M}$ one by one,
in descending lexicographic order of $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$,
and refreshes $F_\mathcal{M}$ immediately after each node.
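The two procedures can be sketched in Python as follows. This is an illustrative sketch rather than the reference implementation: it assumes an undirected $G_{1}$ given by neighbour sets, assumes that $Conn_{\mathcal{M}}(u)$ is the number of neighbours of $u$ already in $\mathcal{M}$, breaks the remaining ties by the first node encountered, and recomputes $Conn_{\mathcal{M}}$ naively instead of maintaining it. The function name is hypothetical.

```python
from collections import Counter


def vf2pp_order(adj1, label1, label2):
    """Sketch of the VF2++ node-order procedure (Algorithms VF2PPPseu and VF2PPProcess1).

    adj1:   dict node -> set of neighbours of G1 (undirected)
    label1: dict node -> label of G1;  label2: dict node -> label of G2
    Returns the matching order M as a list of G1 nodes.
    """
    # F[l] = |{v in V2 : L(v) = l}| - |{u in V1 \ M : L(u) = l}|, with M empty.
    F = Counter(label2.values())
    F.subtract(Counter(label1.values()))
    deg = {u: len(adj1[u]) for u in adj1}
    order, in_order = [], set()

    def conn(u):
        # Assumed meaning of Conn_M(u): number of neighbours of u already in M.
        return sum(1 for w in adj1[u] if w in in_order)

    while len(order) < len(adj1):
        rest = [u for u in adj1 if u not in in_order]
        # r in arg max_deg(arg min_{F o L}(V1 \ M)): rarest label first, then largest degree.
        r = min(rest, key=lambda u: (F[label1[u]], -deg[u]))
        level, seen = [r], {r}
        while level:                       # one BFS level V_d per iteration
            pending = list(level)
            while pending:
                # m in arg min_F(arg max_deg(arg max_Conn(V_d)))
                m = max(pending, key=lambda u: (conn(u), deg[u], -F[label1[u]]))
                pending.remove(m)
                order.append(m)
                in_order.add(m)
                F[label1[m]] += 1          # m leaves V1 \ M, so F grows by one
            next_level = []
            for x in level:
                for w in adj1[x]:
                    if w not in seen:
                        seen.add(w)
                        next_level.append(w)
            level = next_level
    return order
```

Because each BFS tree covers a whole component and every non-root node is reached from its parent, the returned list satisfies the conditions of Claim~\ref{claim:MOclaim}.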

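By Claim~\ref{claim:MOclaim}, Algorithm~\ref{alg:VF2PPPseu}
provides a matching order: each component is appended as a contiguous
interval, and every non-root node is preceded by its BFS parent, to
which it is adjacent.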

   749 \subsection{Cutting rules}

   750 \label{VF2PPCuttingRules}

   751 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels.

   752 \begin{notation}

   753 Let $\mathbf{\Gamma_{1}^{l}(u)}:=\{\tilde{u} : \mathcal{L}(\tilde{u})=l

   754 \wedge \tilde{u}\in \Gamma_{1} (u)\}$ and

   755 $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge

   756 \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in

   757 V_{2}$ and $l$ is a label.

   758 \end{notation}

   760 \subsubsection{Induced subgraph isomorphism}

   761 \begin{claim}

\[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by IND.

   763 \end{claim}

   764 \subsubsection{Graph isomorphism}

   765 \begin{claim}

\[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\  \vee\]\[\bigvee_{l\ is\ label} |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by ISO.

   767 \end{claim}

   769 \subsubsection{Subgraph isomorphism}

   770 \begin{claim}

   771 \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function by SUB.

   772 \end{claim}
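A direct, set-based evaluation of the three labeled cutting functions makes their differences explicit. In the Python sketch below, the sets $T_{1}(\mathfrak{m})$, $T_{2}(\mathfrak{m})$, $\tilde T_{1}(\mathfrak{m})$ and $\tilde T_{2}(\mathfrak{m})$ are assumed to be given explicitly, and the function names are hypothetical; Section~\ref{sec:VF2ppImpl} discusses how to evaluate the same rules in $\Theta(deg)$ time.

```python
from collections import Counter


def label_profile(neighbours, label, S):
    """For every label l, the size of Gamma^l(x) intersected with S."""
    return Counter(label[y] for y in neighbours if y in S)


def lab_cut(problem, nbr_u, nbr_v, label1, label2, T1, T2, Tt1, Tt2):
    """Evaluate LabCut_IND, LabCut_ISO or LabCut_SUB for a candidate pair (u, v).

    nbr_u, nbr_v: neighbour sets Gamma_1(u) and Gamma_2(v)
    T1, T2, Tt1, Tt2: the sets T_1(m), T_2(m), ~T_1(m), ~T_2(m) of the current mapping
    Returns True if the pair can be cut.
    """
    def violated(S1, S2):
        c1 = label_profile(nbr_u, label1, S1)
        c2 = label_profile(nbr_v, label2, S2)
        labels = set(c1) | set(c2)
        if problem == "ISO":
            return any(c2[l] != c1[l] for l in labels)
        return any(c2[l] < c1[l] for l in labels)   # IND and SUB

    if violated(T1, T2):
        return True
    if problem == "SUB":
        return False               # LabCut_SUB has no ~T part
    return violated(Tt1, Tt2)      # IND and ISO also check the ~T sets
```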

   776 \section{Implementation details}\label{sec:VF2ppImpl}

   777 This section provides a detailed summary of an efficient

   778 implementation of VF2++.

   779 \subsection{Storing a mapping}

After fixing an arbitrary node order $(u_0, u_1, \ldots,
u_{|G_{1}|-1})$ of $G_{1}$, the current mapping can be stored
in an array $M$ as follows.

   783 \[

   784  M[i] =

   785   \begin{cases}

   v & \mbox{if } (u_i,v) \mbox{ is in the mapping,}\\ INV\!ALI\!D &
   \mbox{if no node has been mapped to } u_i,

   788   \end{cases}

   789 \]

where $i\in\{0,1,\ldots,|G_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$
means ``no node''.

\subsection{Avoiding the recursion}
The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
as a \textit{while loop} with a loop counter $depth$ denoting the
current depth of the recursion. Fixing a matching order, let $M$
denote the array storing the current mapping. Based on Claim~\ref{claim:claimCoverFromLeft},
$M$ is $INV\!ALI\!D$ from index $depth$+1 on and is not $INV\!ALI\!D$ before
$depth$. $M[depth]$ changes
while the state is being processed, but the property holds both before
stepping back to a predecessor state and before exploring a successor
state.
The necessary part of the candidate set can easily be maintained or
computed as described in
Section~\ref{candidateComputingVF2}. A much faster method
has been designed for biological and sparse graphs; see the next
section for details.

   809 \subsection{Calculating the candidates for a node}

By Claim~\ref{claim:claimCoverFromLeft}, the
task is not to maintain the candidate set, but to generate the
candidate nodes in $G_{2}$ for a given node $u\in V_{1}$. For
each of the three problem types and a mapping $\mathfrak{m}$, if a node $v\in
V_{2}$ is a potential pair of $u\in V_{1}$, then $\forall
u'\in \mathfrak{D}(\mathfrak{m}) : (u,u')\in
E_{1}\Rightarrow (v,\mathfrak{m}(u'))\in
E_{2}$. That is, each covered neighbour of $u$ has to be mapped to
a covered neighbour of $v$.

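Consequently, if the component containing $u$ already has a covered
node, then $u$ has a covered neighbour $u'$ by
Claim~\ref{claim:MOclaim}, and it suffices to scan the neighbours of
$\mathfrak{m}(u')$, which takes $\Theta(deg)$ time; otherwise, every
uncovered node of $G_{2}$ has to be considered, which takes linear time.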

   825 \subsection{Determining the node order}

   826 This section describes how the node order preprocessing method of

   827 VF2++ can efficiently be implemented.

To allow the use of lookup tables, the node labels are mapped to the
integers $\{0,1,\ldots,|K|-1\}$, where $K$ is the set of labels. This
enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
$\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
in $V_{2}$ having label $i$ minus the number of nodes in $V_{1}$ having
label $i$, which is easy to compute in
$\Theta(|V_{1}|+|V_{2}|)$ steps.
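The label indexing and the initial array $F_\mathcal{M}$ can be sketched as follows, directly following the definition of $F_\mathcal{M}$ given earlier; the function names are hypothetical. When a node is appended to $\mathcal{M}$, it leaves $V_{1}\backslash\mathcal{M}$, so the entry of its label increases by one.

```python
def init_label_arrays(label1, label2):
    """Map labels to 0..|K|-1 and build the array F_M for M = empty.

    label1, label2: dicts from the nodes of G1 and G2 to arbitrary hashable labels.
    """
    index = {}                     # label -> integer in {0, ..., |K|-1}
    for lab in list(label1.values()) + list(label2.values()):
        if lab not in index:
            index[lab] = len(index)
    F = [0] * len(index)
    for lab in label2.values():
        F[index[lab]] += 1         # |{v in V2 : L(v) = l}|
    for lab in label1.values():
        F[index[lab]] -= 1         # minus |{u in V1 \ M : L(u) = l}| with M empty
    return index, F


def append_to_order(u, label1, index, F, order):
    """Append u to M; u leaves V1 minus M, so F at its label grows by one."""
    order.append(u)
    F[index[label1[u]]] += 1
```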

If $\mathcal{M}\subseteq V_{1}$ is represented as an array of
size $|V_{1}|$, both the computation of the BFS tree and the processing of its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done in place by swapping nodes.

   839 \subsection{Cutting rules}

In Section~\ref{VF2PPCuttingRules}, the cutting rules were
described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$
and $\tilde T_{2}$, which depend on the current mapping
(i.e., on the current state). The aim is to check the labeled cutting
rules of VF2++ in $\Theta(deg)$ time.

First, suppose that these four sets are given in such a way that
checking whether a node is in a certain set takes constant time,
e.g. by their 0-1 characteristic vectors. Let $L$ be an
initially zero integer lookup table of size $|K|$. After incrementing
$L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and
decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap
T_{2}(\mathfrak{m})$, the first part of the cutting rules can be checked in
$\Theta(deg)$ time by inspecting the signs of the modified entries of
$L$: a positive entry triggers the cut for IND and SUB, and a nonzero
entry triggers it for ISO. Resetting these entries of $L$
to zero takes $\Theta(deg)$ time again, which makes it possible to use
the same table throughout the whole algorithm. The second part of the
cutting rules can be verified with the same method, using $\tilde
T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and
$T_{2}$. Thus, the overall complexity is $\Theta(deg)$.
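The lookup-table check of the first part can be sketched in Python as follows. The characteristic vectors are modelled as dictionaries, the labels are assumed to be already mapped to $\{0,\ldots,|K|-1\}$, and the function name is hypothetical. The second part is obtained by passing the characteristic vectors of $\tilde T_{1}$ and $\tilde T_{2}$ instead.

```python
def first_part_cut(problem, u, v, adj1, adj2, lab1, lab2, in_T1, in_T2, L):
    """Check the T-part of the labeled cutting rule in Theta(deg) time.

    lab1, lab2: integer labels in {0, ..., |K|-1}; in_T1, in_T2: 0-1 characteristic
    vectors (here dicts node -> 0/1); L: shared all-zero list of length |K|.
    """
    touched = []
    for w in adj1[u]:
        if in_T1[w]:
            L[lab1[w]] += 1        # counts |Gamma_1^l(u) cap T_1|
            touched.append(lab1[w])
    for w in adj2[v]:
        if in_T2[w]:
            L[lab2[w]] -= 1        # subtracts |Gamma_2^l(v) cap T_2|
            touched.append(lab2[w])
    if problem == "ISO":
        cut = any(L[l] != 0 for l in touched)
    else:                          # IND and SUB: cut iff some L[l] > 0
        cut = any(L[l] > 0 for l in touched)
    for l in touched:              # reset only the touched entries
        L[l] = 0
    return cut
```

Resetting only the touched entries keeps the cost proportional to the degrees, so the same table $L$ can be reused for every check.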

Another integer lookup table, storing the number of covered neighbours
of each node in $G_{2}$, gives all the information about the sets
$T_{2}$ and $\tilde T_{2}$; it can be maintained in
$\Theta(deg)$ time when a pair is added or removed by incrementing
or decrementing the proper entries. A further improvement is that the
values of $L[\mathcal{L}(u')]$ obtained when checking $u$ depend only on
$u$: since the matching order is fixed, the set of covered nodes at
that point is determined by $u$ (i.e., by the size of the mapping).
Hence, for each $u\in V_{1}$, an array of pairs (label, number of
neighbours with that label) can be precomputed to skip these
maintenance operations. Note that these arrays have size at most
$deg$.
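The covered-neighbour table can be sketched as the following Python class (the name is hypothetical). It assumes the usual VF2 definitions, namely that $T_{2}$ consists of the uncovered nodes of $G_{2}$ having at least one covered neighbour, and $\tilde T_{2}$ of the remaining uncovered nodes.

```python
class CoveredNeighbourTable:
    """Number of covered neighbours of every node of G2 (a sketch)."""

    def __init__(self, adj2):
        self.adj2 = adj2
        self.count = {v: 0 for v in adj2}
        self.covered = set()

    def add(self, v):
        """A pair (u, v) is added to the mapping: Theta(deg(v)) updates."""
        self.covered.add(v)
        for w in self.adj2[v]:
            self.count[w] += 1

    def remove(self, v):
        """A pair (u, v) is removed from the mapping."""
        self.covered.discard(v)
        for w in self.adj2[v]:
            self.count[w] -= 1

    def in_T2(self, v):
        return v not in self.covered and self.count[v] > 0

    def in_tilde_T2(self, v):
        return v not in self.covered and self.count[v] == 0
```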

   871 Using similar techniques, the consistency function can be evaluated in

   872 $\Theta(deg)$ steps, as well.

   874 \section{Experimental results}\label{sec:ExpRes}

This section compares the performance of VF2++ and VF2 Plus. In
our experience, both algorithms run orders of magnitude faster than
VF2, so VF2 was not included in the comparison.

The algorithms were implemented in C++ using the open source
LEMON graph and network optimization library~\cite{LEMON}. The tests were carried out on a Linux-based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM.

   881 \subsection{Biological graphs}

The tests were executed on a recent biological dataset created
for the International Contest on Pattern Search in Biological
Databases~\cite{Content}, which consists of molecule,
protein and contact map graphs extracted from the Protein Data
Bank~\cite{ProteinDataBank}.

The molecule dataset contains small graphs with fewer than 100 nodes
and an average degree of less than 3. The protein dataset contains
graphs having 500--10\,000 nodes and an average degree of 4, while the
contact map dataset contains graphs with 150--800 nodes and an average
degree of 20.

   894 In the following, both the induced subgraph isomorphism and the graph

   895 isomorphism will be examined.

This dataset provides graph pairs between which all induced subgraph isomorphisms have to be found. The runtime results are shown in Figure~\ref{fig:bioIND}.

In another experiment, the nodes of each graph in the database were
shuffled, and an isomorphism between the shuffled and the original
graph was searched for. The solution times are shown in Figure~\ref{fig:bioISO}.

   904 \begin{figure}[H]

   905 \vspace*{-2cm}

   906 \hspace*{-1.5cm}

   907 \begin{subfigure}[b]{0.55\textwidth}

   908 \begin{figure}[H]

   909 \begin{tikzpicture}[trim axis left, trim axis right]

   910 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

   911 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

   912   west},scaled x ticks = false,x tick label style={/pgf/number

   913   format/1000 sep = \thinspace}]

   914 %\addplot+[only marks] table {proteinsOrig.txt};

   915 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark

   916   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};

   917 \end{axis}

   918 \end{tikzpicture}

   919 \caption{In the case of molecules, the algorithms have

   920   similar behaviour, but VF2++ is almost two times faster even on such

   921   small graphs.} \label{fig:INDMolecule}

   922 \end{figure}

   923 \end{subfigure}

   924 \hspace*{1.5cm}

   925 \begin{subfigure}[b]{0.55\textwidth}

   926 \begin{figure}[H]

   927 \begin{tikzpicture}[trim axis left, trim axis right]

   928 \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

   929 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

   930   west},scaled x ticks = false,x tick label style={/pgf/number

   931   format/1000 sep = \thinspace}]

   932 %\addplot+[only marks] table {proteinsOrig.txt};

   933 \addplot table {Orig/ContactMaps.128.txt};

   934 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

   935         {VF2PPLabel/ContactMaps.128.txt};

   936 \end{axis}

   937 \end{tikzpicture}

\caption{On contact maps, VF2++ runs in almost constant time, while VF2
  Plus shows near-linear behaviour.} \label{fig:INDContact}

   940 \end{figure}

   941 \end{subfigure}

   943 \begin{center}

   944 \vspace*{-0.5cm}

   945 \begin{subfigure}[b]{0.55\textwidth}

   946 \begin{figure}[H]

   947 \begin{tikzpicture}[trim axis left, trim axis right]

   948   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

   949   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

   950     west},scaled x ticks = false,x tick label style={/pgf/number

    format/1000 sep = \thinspace}]
  %\addplot+[only marks] table {proteinsOrig.txt};
  \addplot[mark=*,mark size=1.2pt,color=blue]

   953     table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark

   954       size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};

   955   \end{axis}

   956   \end{tikzpicture}

\caption{Both algorithms show linear behaviour on protein
  graphs. VF2++ is more than 10 times faster than VF2
  Plus.} \label{fig:INDProt}

   960 \end{figure}

   961 \end{subfigure}

   962 \end{center}

   963 \vspace*{-0.5cm}

   964 \caption{\normalsize{Induced subgraph isomorphism on biological graphs}}\label{fig:bioIND}

   965 \end{figure}

   968 \begin{figure}[H]

   969 \vspace*{-2cm}

   970 \hspace*{-1.5cm}

   971 \begin{subfigure}[b]{0.55\textwidth}

   972 \begin{figure}[H]

   973 \begin{tikzpicture}[trim axis left, trim axis right]

   974 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

   975 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

   976   west},scaled x ticks = false,x tick label style={/pgf/number

   977   format/1000 sep = \thinspace}]

   978 %\addplot+[only marks] table {proteinsOrig.txt};

   979 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark

   980   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};

   981 \end{axis}

   982 \end{tikzpicture}

\caption{In the case of molecules, the difference is less
  significant, but VF2++ seems to be faster as the number of nodes
  increases.}\label{fig:ISOMolecule}

   986 \end{figure}

   987 \end{subfigure}

   988 \hspace*{1.5cm}

   989 \begin{subfigure}[b]{0.55\textwidth}

   990 \begin{figure}[H]

   991 \begin{tikzpicture}[trim axis left, trim axis right]

   992 \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

   993 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

   994   west},scaled x ticks = false,x tick label style={/pgf/number

   995   format/1000 sep = \thinspace}]

   996 %\addplot+[only marks] table {proteinsOrig.txt};

   997 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark

   998   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};

   999 \end{axis}

  1000 \end{tikzpicture}

  1001 \caption{The results are closer to each other on contact maps, but

  1002   VF2++ still performs consistently better.}\label{fig:ISOContact}

  1003 \end{figure}

  1004 \end{subfigure}

  1006 \begin{center}

  1007 \vspace*{-0.5cm}

  1008 \begin{subfigure}[b]{0.55\textwidth}

  1009 \begin{figure}[H]

  1010 \begin{tikzpicture}[trim axis left, trim axis right]

  1011 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

  1012 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1013   west},scaled x ticks = false,x tick label style={/pgf/number

  1014   format/1000 sep = \thinspace}]

  1015 %\addplot+[only marks] table {proteinsOrig.txt};

  1016 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark

  1017   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};

  1018 \end{axis}

  1019 \end{tikzpicture}

\caption{On protein graphs, VF2 Plus has superlinear running
  time, while VF2++ runs in near-constant time. The difference
  is about two orders of magnitude on large graphs.}\label{fig:ISOProt}

  1023 \end{figure}

  1024 \end{subfigure}

  1025 \end{center}

  1026 \vspace*{-0.6cm}

  1027 \caption{\normalsize{Graph isomorphism on biological graphs}}\label{fig:bioISO}

  1028 \end{figure}

  1033 \subsection{Random graphs}

This section compares VF2++ with VF2 Plus on large random graphs.
The node labels are uniformly distributed. Let $\delta$ denote
the average degree. The parameters of the problems solved in the
experiments are given at the top of each chart.

  1038 \subsubsection{Graph isomorphism}

To evaluate the efficiency of the algorithms in the case of graph
isomorphism, random connected graphs with fewer than 20\,000 nodes were
considered. After generating a random graph and shuffling its nodes, an
isomorphism between the two graphs had to be found.
Figure~\ref{fig:randISO} shows the runtime results
on graph sets of various densities.
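The generator of the random graphs is not specified here, so the following Python sketch shows only one plausible way to produce such instances: a random tree guarantees connectivity, further uniformly random edges bring the average degree to about $\delta$, and the labels are drawn uniformly. All names are hypothetical. The returned permutation is a known isomorphism, which can be used to validate the output of the solvers.

```python
import random


def random_connected_graph(n, avg_deg, n_labels, rng):
    """Hypothetical generator: a random tree (for connectivity) plus random extra edges.

    Returns (edges, labels) with nodes 0..n-1, about n*avg_deg/2 undirected edges and
    uniformly distributed labels in {0, ..., n_labels-1}.
    """
    target = min(n * (n - 1) // 2, max(n - 1, round(n * avg_deg / 2)))
    edges = set()
    for i in range(1, n):
        j = rng.randrange(i)               # attach node i to an earlier node
        edges.add((j, i))
    while len(edges) < target:
        a, b = rng.sample(range(n), 2)
        edges.add((min(a, b), max(a, b)))
    labels = [rng.randrange(n_labels) for _ in range(n)]
    return edges, labels


def shuffle_nodes(n, edges, labels, rng):
    """Rename the nodes by a random permutation; perm is a ground-truth isomorphism."""
    perm = list(range(n))
    rng.shuffle(perm)
    new_edges = {(min(perm[a], perm[b]), max(perm[a], perm[b])) for a, b in edges}
    new_labels = [None] * n
    for x in range(n):
        new_labels[perm[x]] = labels[x]
    return new_edges, new_labels, perm
```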

  1048 \begin{figure}

  1049 \vspace*{-1.5cm}

  1050 \hspace*{-1.5cm}

  1051 \begin{subfigure}[b]{0.55\textwidth}

  1052 \begin{center}

  1053 \begin{tikzpicture}

  1054 \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1055 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1056   west},scaled x ticks = false,x tick label style={/pgf/number

  1057   format/1000 sep = \space}]

  1058 %\addplot+[only marks] table {proteinsOrig.txt};

  1059 \addplot table {randGraph/iso/vf2pIso5_1.txt};

  1060 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1061         {randGraph/iso/vf2ppIso5_1.txt};

  1062 \end{axis}

  1063 \end{tikzpicture}

  1064 \end{center}

  1065 \end{subfigure}

  1066 %\hspace{1cm}

  1067 \begin{subfigure}[b]{0.55\textwidth}

  1068 \begin{center}

  1069 \begin{tikzpicture}

  1070 \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1071 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1072   west},scaled x ticks = false,x tick label style={/pgf/number

  1073   format/1000 sep = \space}]

  1074 %\addplot+[only marks] table {proteinsOrig.txt};

  1075 \addplot table {randGraph/iso/vf2pIso10_1.txt};

  1076 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1077         {randGraph/iso/vf2ppIso10_1.txt};

  1078 \end{axis}

  1079 \end{tikzpicture}

  1080 \end{center}

  1081 \end{subfigure}

  1082 %%\hspace{1cm}

  1083 \hspace*{-1.5cm}

  1084 \begin{subfigure}[b]{0.55\textwidth}

  1085 \begin{center}

  1086 \begin{tikzpicture}

  1087 \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1088 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1089   west},scaled x ticks = false,x tick label style={/pgf/number

  1090   format/1000 sep = \space}]

  1091 %\addplot+[only marks] table {proteinsOrig.txt};

  1092 \addplot table {randGraph/iso/vf2pIso15_1.txt};

  1093 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1094         {randGraph/iso/vf2ppIso15_1.txt};

  1095 \end{axis}

  1096 \end{tikzpicture}

  1097 \end{center}

  1098      \end{subfigure}

  1099      \begin{subfigure}[b]{0.55\textwidth}

  1100 \begin{center}

  1101 \begin{tikzpicture}

  1102 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1103 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1104   west},scaled x ticks = false,x tick label style={/pgf/number

  1105   format/1000 sep = \thinspace}]

  1106 %\addplot+[only marks] table {proteinsOrig.txt};

  1107 \addplot table {randGraph/iso/vf2pIso100_1.txt};

  1108 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1109         {randGraph/iso/vf2ppIso100_1.txt};

  1110 \end{axis}

  1111 \end{tikzpicture}

  1112 \end{center}

  1113 \end{subfigure}

  1114 \vspace*{-0.8cm}

  1115 \caption{ISO on random graphs.

  1116 }\label{fig:randISO}

  1117 \end{figure}

  1120 \subsubsection{Induced subgraph isomorphism}

This section presents a comparison of VF2++ and VF2 Plus in the case
of induced subgraph isomorphism. In addition to the size of the large
graph, the size of the small graph also dramatically influences the
hardness of a given problem, so small graphs of various sizes are
examined to give an overall picture.

For each chart, a number $0<\rho<1$ was fixed, and the following
procedure was executed 150 times: a large graph $G_{2}$ with an average
degree of $\delta$ was generated, 10 of its induced subgraphs having
$\rho|V_{2}|$ nodes were chosen, and for each of these 10 subgraphs a
mapping was sought using both graph matching algorithms. The cases
$\delta = 5, 10, 35$ and $\rho = 0.05, 0.1, 0.3, 0.8$ were examined;
see Figures~\ref{fig:randIND5}, \ref{fig:randIND10}
and~\ref{fig:randIND35}.
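A possible way to draw the pattern graphs is sketched below in Python. It assumes that the $\rho|V_{2}|$ nodes are chosen uniformly at random (the text does not state whether the subgraphs are required to be connected), and the function names are hypothetical.

```python
import random


def random_induced_subgraph(adj2, rho, rng):
    """Pick round(rho * |V2|) nodes uniformly at random; return the subgraph of G2 they induce.

    adj2: dict node -> set of neighbours (nodes must be sortable); rng: random.Random.
    """
    k = round(rho * len(adj2))
    chosen = set(rng.sample(sorted(adj2), k))
    # Keep exactly the edges of G2 whose both endpoints were chosen.
    return {x: adj2[x] & chosen for x in chosen}


def sample_instances(adj2, rho, rng, count=10):
    """The pattern graphs drawn from one target graph G2 in a single run."""
    return [random_induced_subgraph(adj2, rho, rng) for _ in range(count)]
```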

  1140 \begin{figure}

  1141 \vspace*{-1.5cm}

  1142 \hspace*{-1.5cm}

  1143 \begin{subfigure}[b]{0.55\textwidth}

  1144 \begin{center}

  1145 \begin{tikzpicture}

  1146 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1147 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1148   west},scaled x ticks = false,x tick label style={/pgf/number

  1149   format/1000 sep = \space}]

  1150 %\addplot+[only marks] table {proteinsOrig.txt};

  1151 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};

  1152 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1153         {randGraph/ind/vf2ppInd5_0.05.txt};

  1154 \end{axis}

  1155 \end{tikzpicture}

  1156 \end{center}

  1157      \end{subfigure}

  1158      \begin{subfigure}[b]{0.55\textwidth}

  1159 \begin{center}

  1160 \begin{tikzpicture}

  1161 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1162 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1163   west},scaled x ticks = false,x tick label style={/pgf/number

  1164   format/1000 sep = \space}]

  1165 %\addplot+[only marks] table {proteinsOrig.txt};

  1166 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};

  1167 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1168         {randGraph/ind/vf2ppInd5_0.1.txt};

  1169 \end{axis}

  1170 \end{tikzpicture}

  1171 \end{center}

  1172 \end{subfigure}

  1173 \hspace*{-1.5cm}

  1174 \begin{subfigure}[b]{0.55\textwidth}

  1175 \begin{center}

  1176 \begin{tikzpicture}

  1177 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1178 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1179   west},scaled x ticks = false,x tick label style={/pgf/number

  1180   format/1000 sep = \space}]

  1181 %\addplot+[only marks] table {proteinsOrig.txt};

  1182 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};

  1183 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1184         {randGraph/ind/vf2ppInd5_0.3.txt};

  1185 \end{axis}

  1186 \end{tikzpicture}

  1187 \end{center}

  1188      \end{subfigure}

  1189      \begin{subfigure}[b]{0.55\textwidth}

  1190 \begin{center}

  1191 \begin{tikzpicture}

  1192 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1193 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1194   west},scaled x ticks = false,x tick label style={/pgf/number

  1195   format/1000 sep = \space}]

  1196 %\addplot+[only marks] table {proteinsOrig.txt};

  1197 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};

  1198 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1199         {randGraph/ind/vf2ppInd5_0.8.txt};

  1200 \end{axis}

  1201 \end{tikzpicture}

  1202 \end{center}

  1203 \end{subfigure}

  1204 \vspace*{-0.8cm}

  1205 \caption{IND on graphs having an average degree of

  1206   5.}\label{fig:randIND5}

  1207 \end{figure}

  1210 \begin{figure}

  1211 \vspace*{-1.5cm}

  1212 \hspace*{-1.5cm}

  1213 \begin{subfigure}[b]{0.55\textwidth}

  1214 \begin{center}

  1215 \hspace*{-0.5cm}

  1216 \begin{tikzpicture}

  1217 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1218 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1219   west},scaled x ticks = false,x tick label style={/pgf/number

  1220   format/1000 sep = \space}]

  1221 %\addplot+[only marks] table {proteinsOrig.txt};

  1222 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};

  1223 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1224         {randGraph/ind/vf2ppInd10_0.05.txt};

  1225 \end{axis}

  1226 \end{tikzpicture}

  1227 \end{center}

  1228      \end{subfigure}

  1229      \begin{subfigure}[b]{0.55\textwidth}

  1230 \begin{center}

  1231      \hspace*{-0.5cm}

  1232 \begin{tikzpicture}

  1233 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1234 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1235   west},scaled x ticks = false,x tick label style={/pgf/number

  1236   format/1000 sep = \space}]

  1237 %\addplot+[only marks] table {proteinsOrig.txt};

  1238 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};

  1239 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1240         {randGraph/ind/vf2ppInd10_0.1.txt};

  1241 \end{axis}

  1242 \end{tikzpicture}

  1243 \end{center}

  1244 \end{subfigure}

  1245 \hspace*{-1.5cm}

  1246 \begin{subfigure}[b]{0.55\textwidth}

  1247 \begin{center}

  1248 \begin{tikzpicture}

  1249 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1250 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1251   west},scaled x ticks = false,x tick label style={/pgf/number

  1252   format/1000 sep = \space}]

  1253 %\addplot+[only marks] table {proteinsOrig.txt};

  1254 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};

  1255 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1256         {randGraph/ind/vf2ppInd10_0.3.txt};

  1257 \end{axis}

  1258 \end{tikzpicture}

  1259 \end{center}

  1260      \end{subfigure}

  1261      \begin{subfigure}[b]{0.55\textwidth}

  1262 \begin{center}

  1263 \begin{tikzpicture}

  1264 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1265 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1266   west},scaled x ticks = false,x tick label style={/pgf/number

  1267   format/1000 sep = \space}]

  1268 %\addplot+[only marks] table {proteinsOrig.txt};

  1269 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};

  1270 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1271         {randGraph/ind/vf2ppInd10_0.8.txt};

  1272 \end{axis}

  1273 \end{tikzpicture}

  1274 \end{center}

  1275 \end{subfigure}

  1276 \vspace*{-0.8cm}

  1277 \caption{IND on graphs having an average degree of

  1278   10.}\label{fig:randIND10}

  1279 \end{figure}

\begin{figure}
\vspace*{-1.5cm}
\hspace*{-1.5cm}
\begin{subfigure}[b]{0.55\textwidth}
\begin{center}
\begin{tikzpicture}
\begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  west},scaled x ticks = false,x tick label style={/pgf/number
  format/1000 sep = \space}]
%\addplot+[only marks] table {proteinsOrig.txt};
\addplot table {randGraph/ind/vf2pInd35_0.05.txt};
\addplot[mark=triangle*,mark size=1.8pt,color=red] table
        {randGraph/ind/vf2ppInd35_0.05.txt};
\end{axis}
\end{tikzpicture}
\end{center}
     \end{subfigure}
     \begin{subfigure}[b]{0.55\textwidth}
\begin{center}
\begin{tikzpicture}
\begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  west},scaled x ticks = false,x tick label style={/pgf/number
  format/1000 sep = \space}]
%\addplot+[only marks] table {proteinsOrig.txt};
\addplot table {randGraph/ind/vf2pInd35_0.1.txt};
\addplot[mark=triangle*,mark size=1.8pt,color=red] table
        {randGraph/ind/vf2ppInd35_0.1.txt};
\end{axis}
\end{tikzpicture}
\end{center}
\end{subfigure}
\hspace*{-1.5cm}
\begin{subfigure}[b]{0.55\textwidth}
\begin{center}
\begin{tikzpicture}
\begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  west},scaled x ticks = false,x tick label style={/pgf/number
  format/1000 sep = \space}]
%\addplot+[only marks] table {proteinsOrig.txt};
\addplot table {randGraph/ind/vf2pInd35_0.3.txt};
\addplot[mark=triangle*,mark size=1.8pt,color=red] table
        {randGraph/ind/vf2ppInd35_0.3.txt};
\end{axis}
\end{tikzpicture}
\end{center}
     \end{subfigure}
     \begin{subfigure}[b]{0.55\textwidth}
\begin{center}
\begin{tikzpicture}
\begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
  west},scaled x ticks = false,x tick label style={/pgf/number
  format/1000 sep = \space}]
%\addplot+[only marks] table {proteinsOrig.txt};
\addplot table {randGraph/ind/vf2pInd35_0.8.txt};
\addplot[mark=triangle*,mark size=1.8pt,color=red] table
        {randGraph/ind/vf2ppInd35_0.8.txt};
\end{axis}
\end{tikzpicture}
\end{center}
\end{subfigure}
\vspace*{-0.8cm}
\caption{IND on graphs having an average degree of
  35.}\label{fig:randIND35}
\end{figure}

Based on these experiments, VF2++ is faster than VF2 Plus and is able
to handle large graphs within milliseconds. Note that when IND was
considered and the small graphs had relatively few nodes
($\rho = 0.05$ or $\rho = 0.1$), VF2 Plus produced some inefficient
node orders (see, e.g., the $\delta = 10$ case in
Figure~\ref{fig:randIND10}). Had these instances been excluded, the
charts would look similar to the other ones. Unsurprisingly, both
VF2++ and VF2 Plus slow down slightly on denser graphs, but they
remain practically usable even on graphs with 10\,000 nodes.

\section{Conclusion}
This paper presented VF2++, a new graph matching algorithm based on
VF2, and analyzed it from a practical viewpoint.

By recognizing the importance of the node order and determining an
efficient one, VF2++ is able to match graphs of thousands of nodes in
practically linear time, including preprocessing. In addition to a
proper node order, VF2++ uses more efficient consistency and cutting
rules, which are easy to compute and enable the algorithm to prune
most of the unfruitful branches without going astray.

To demonstrate the efficiency of the new method, it has been compared
to VF2 Plus~\cite{VF2Plus}, which is the best contemporary algorithm.

The experiments show that VF2++ consistently outperforms VF2 Plus on
biological graphs. It seems to be asymptotically faster on protein and
contact map graphs in the case of induced subgraph isomorphism, while
in the case of graph isomorphism, it has clearly better asymptotic
behaviour on protein graphs.

Regarding random sparse graphs, not only has VF2++ proved itself to be
faster than VF2 Plus, but it also exhibits practically linear behaviour
for both induced subgraph isomorphism and graph isomorphism.

%% The Appendices part is started with the command \appendix;
%% appendix sections are then done as normal sections
%% \appendix

%% \section{}
%% \label{}

%% If you have bibdatabase file and want bibtex to generate the
%% bibitems, please use
%%
\bibliographystyle{elsarticle-num} \bibliography{bibliography}

%% else use the following coding to input the bibitems directly in the
%% TeX file.

%% \begin{thebibliography}{00}

%% %% \bibitem{label}
%% %% Text of bibliographic item

%% \bibitem{}

%% \end{thebibliography}

\end{document}
\endinput
%%
%% End of file `elsarticle-template-num.tex'.

author	Madarasi Peter
	Wed, 30 Nov 2016 21:51:10 +0100
changeset 23	b098561f70fe