
     1 %%

     2 %% Copyright 2007, 2008, 2009 Elsevier Ltd

     3 %%

     4 %% This file is part of the 'Elsarticle Bundle'.

     5 %% ---------------------------------------------

     6 %%

     7 %% It may be distributed under the conditions of the LaTeX Project Public

     8 %% License, either version 1.2 of this license or (at your option) any

     9 %% later version.  The latest version of this license is in

    10 %%    http://www.latex-project.org/lppl.txt

    11 %% and version 1.2 or later is part of all distributions of LaTeX

    12 %% version 1999/12/01 or later.

    13 %%

    14 %% The list of all files belonging to the 'Elsarticle Bundle' is

    15 %% given in the file `manifest.txt'.

    16 %%

    18 %% Template article for Elsevier's document class `elsarticle'

    19 %% with numbered style bibliographic references

    20 %% SP 2008/03/01

    22 \documentclass[preprint,12pt]{elsarticle}

    24 %% Use the option review to obtain double line spacing

    25 %% \documentclass[authoryear,preprint,review,12pt]{elsarticle}

    27 %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn

    28 %% for a journal layout:

    29 %% \documentclass[final,1p,times]{elsarticle}

    30 %% \documentclass[final,1p,times,twocolumn]{elsarticle}

    31 %% \documentclass[final,3p,times]{elsarticle}

    32 %% \documentclass[final,3p,times,twocolumn]{elsarticle}

    33 %% \documentclass[final,5p,times]{elsarticle}

    34 %% \documentclass[final,5p,times,twocolumn]{elsarticle}

    36 %% For including figures, graphicx.sty has been loaded in

    37 %% elsarticle.cls. If you prefer to use the old commands

    38 %% please give \usepackage{epsfig}

    40 %% The amssymb package provides various useful mathematical symbols

    41 \usepackage{amssymb}

    42 %% The amsthm package provides extended theorem environments

    43 %% \usepackage{amsthm}

%% The lineno package adds line numbers. Start line numbering with

    46 %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on

    47 %% for the whole article with \linenumbers.

    48 %% \usepackage{lineno}

    50 \usepackage{amsmath}

    51 %% \usepackage[pdftex]{graphicx}

    53 \usepackage{pgfplots}

    54 \pgfplotsset{width=9cm}

    55 \pgfplotsset{compat=1.8}

    57 \usepackage{caption}

    58 \usepackage{subcaption}

    60 \usepackage{algorithm}

    61 \usepackage{algpseudocode}

    62 \usepackage{tikz}

    64 \usepackage{amsthm,amssymb}

    65 \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}

    67 \newtheorem{theorem}{Theorem}[subsection]

    68 \newtheorem{corollary}{Corollary}[theorem]

    69 \newtheorem{claim}[theorem]{Claim}

    71 \newtheorem{definition}{Definition}[subsection]

    72 \newtheorem{notation}{Notation}[subsection]

    73 \newtheorem{example}{Example}[subsection]

    74 \usetikzlibrary{decorations.markings}

    75 \let\oldproofname=\proofname

    76 %% \renewcommand{\proofname}{\rm\bf{Proof:}}

    78 \captionsetup{font=normalsize}

    80 \journal{Discrete Applied Mathematics}

    82 \begin{document}

    84 \begin{frontmatter}

    86 %% Title, authors and addresses

    88 %% use the tnoteref command within \title for footnotes;

%% use the tnotetext command for the associated footnote;
%% use the fnref command within \author or \address for footnotes;
%% use the fntext command for the associated footnote;
%% use the corref command within \author for corresponding author footnotes;
%% use the cortext command for the associated footnote;

    94 %% use the ead command for the email address,

    95 %% and the form \ead[url] for the home page:

    96 %% \title{Title\tnoteref{label1}}

    97 %% \tnotetext[label1]{}

    98 %% \author{Name\corref{cor1}\fnref{label2}}

    99 %% \ead{email address}

   100 %% \ead[url]{home page}

   101 %% \fntext[label2]{}

   102 %% \cortext[cor1]{}

   103 %% \address{Address\fnref{label3}}

   104 %% \fntext[label3]{}

   106 \title{Improved Algorithms for Matching Biological Graphs}

   108 %% use optional labels to link authors explicitly to addresses:

   109 %% \author[label1,label2]{}

   110 %% \address[label1]{}

   111 %% \address[label2]{}

   113 \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi}

   115 \address{Dept of Operations Research, ELTE}

   117 \begin{abstract}

Subgraph isomorphism is a well-known NP-Complete problem, while its
special case, the graph isomorphism problem, is one of the few problems
in NP neither known to be in P nor NP-Complete. Their appearance in
many fields of application, such as pattern analysis, computer vision
and the analysis of chemical and biological systems, has
fostered the design of various algorithms for handling special graph
structures.

The idea of using a state space representation and checking some
conditions in each state to prune the search tree has made the VF2
algorithm one of the state-of-the-art graph matching algorithms for
more than a decade. Recently, biological questions of ever increasing
importance have required more efficient, specialized algorithms.

This paper presents VF2++, a new algorithm based on the original VF2,
which runs significantly faster on most test cases and performs
especially well on special graph classes stemming from biological
questions. VF2++ handles graphs of thousands of nodes in practically
near linear time including preprocessing. Not only is it an improved
version of VF2, but in fact, it is by far the fastest existing
algorithm on biological graphs.

The reason for the superiority of VF2++ over VF2 is twofold. Firstly, taking
into account the structure and the node labeling of the graph, VF2++
determines a state order in which most of the unfruitful branches of
the search space can be pruned immediately. Secondly, introducing
cutting rules that are more efficient, yet easier to compute,
reduces the chance of going astray even further.

In addition to the usual subgraph isomorphism, specialized versions
for induced subgraph isomorphism and for graph isomorphism are
presented. VF2++ achieves a runtime improvement of one order of
magnitude for induced subgraph isomorphism and a better
asymptotic behaviour in the case of the graph isomorphism problem.

After the description of VF2++, its effectiveness is evaluated by an
extensive comparison to other contemporary
algorithms, using a wide range of inputs, including both real-life
biological and chemical datasets and standard randomly generated
graph series.

The work was motivated and sponsored by QuantumBio Inc., and all the
developed algorithms are available as part of the open source
LEMON graph and network optimization library
(http://lemon.cs.elte.hu).

   163 \end{abstract}

   165 \begin{keyword}

   166 %% keywords here, in the form: keyword \sep keyword

   168 %% PACS codes here, in the form: \PACS code \sep code

   170 %% MSC codes here, in the form: \MSC code \sep code

   171 %% or \MSC[2008] code \sep code (2000 is the default)

   173 \end{keyword}

   175 \end{frontmatter}

   177 %% \linenumbers

   179 %% main text

   180 \section{Introduction}

   181 \label{sec:intro}

In the last decades, combinatorial structures, and especially graphs,
have been considered with ever increasing interest, and applied to the
solution of several new and revised questions.  Their expressiveness,
their simplicity and the extensive research on them make graphs practical for
modelling, and they appear constantly in several seemingly independent
fields.  Bioinformatics and chemistry are amongst the most relevant
and most important of these fields.

Complex biological systems arise from the interaction and cooperation
of a large number of molecular components. Understanding such
systems at the molecular level is of primary importance, since
protein-protein interaction, DNA-protein interaction, metabolic
interaction, transcription factor binding, neuronal networks, and
hormone signaling networks can be understood only this way.

For instance, a molecular structure can be considered as a graph
whose nodes correspond to atoms and whose edges correspond to chemical bonds. The
secondary structure of a protein can also be represented as a graph,
where the nodes are associated with amino acids and the edges with hydrogen
bonds. The nodes are often whole molecular components and the edges
represent some relationships among them.  The similarity and
dissimilarity of the objects corresponding to the nodes are incorporated into
the model by \emph{node labels}.  Many other chemical and biological
structures can easily be modeled in a similar way. Understanding such
networks basically requires finding specific subgraphs, which
inevitably calls for graph matching algorithms.

Finally, some other real-world fields related to
variants of graph matching are pattern recognition
and machine vision \cite{HorstBunkeApplications}, symbol recognition
\cite{CordellaVentoSymbolRecognition} and face identification
\cite{JianzhuangYongFaceIdentification}.

Subgraph and induced subgraph matching problems are known to be
NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is
one of the few problems in NP neither known to be in P nor
NP-Complete. However, polynomial time isomorphism algorithms are known
for various graph classes, such as trees and planar
graphs\cite{PlanarGraphIso}, bounded valence
graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
and permutation graphs\cite{PermGraphIso}.

In the following, some algorithms based on other approaches are
summarized, which do not need any restrictions on the graphs. Although
an overall polynomial behaviour cannot be expected from such an
algorithm, it may often perform well, even on a graph class
for which a polynomial algorithm is known. Note that this summary
covers only exact matching algorithms and is far from complete; in
particular, it does not cover all the recent algorithms.

The first practically usable approach was due to
Ullmann\cite{Ullmann}. It is a commonly used depth-first
search based algorithm with a complex heuristic for reducing the
number of visited states. A major problem is its $\Theta(n^3)$ space
complexity, which makes it impractical in the case of big sparse
graphs.

In a recent paper, Ullmann\cite{UllmannBit} presents an
improved version of this algorithm based on a bit-vector solution for
the binary Constraint Satisfaction Problem.

The Nauty algorithm\cite{Nauty} transforms the two graphs to
a canonical form before starting to check for isomorphism. It has
been considered one of the fastest graph isomorphism algorithms,
although graph classes have been shown on which it takes exponentially
many steps. This algorithm handles only the graph isomorphism problem.

   250 The \emph{LAD} algorithm\cite{Lad} uses a depth-first search

   251 strategy and formulates the matching as a Constraint Satisfaction

   252 Problem to prune the search tree. The constraints are that the mapping

   253 has to be injective and edge-preserving, hence it is possible to

   254 handle new matching types as well.

The \textbf{RI} algorithm\cite{RI} and its variations are based on a
state space representation. After reordering the nodes of the graphs,
it uses some quickly executable heuristic checks without using any
complex pruning rules. It seems to run really efficiently on graphs
coming from biology, and it won the International Contest on Pattern
Search in Biological Databases\cite{Content}.

The currently most commonly used algorithm is
\textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was
designed for solving pattern matching and computer vision problems,
and has been one of the best overall algorithms for more than a
decade. Although it cannot compete with newer specialized algorithms, it is
still widely used due to its simplicity and space efficiency. VF2 uses
a state space representation and checks some conditions in each state
to prune the search tree.

Our first graph matching algorithm was the first version of VF2 to
recognize the significance of the node ordering, as well as further opportunities
to increase the cutting efficiency and to reduce the computational
complexity. This project was initiated and sponsored by QuantumBio
Inc.\cite{QUANTUMBIO} and the implementation, along with its source
code, has been published as a part of the LEMON\cite{LEMON} open source
graph library.

This paper introduces \textbf{VF2++}, a new, further improved algorithm
for the graph and (induced) subgraph isomorphism problems, which uses
efficient cutting rules and determines a node order in which VF2 runs
significantly faster on practical inputs.

Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has
been published. It is considered to be as efficient as the RI
algorithm and has a strictly better behavior on large graphs.  The
main idea of VF2 Plus is to precompute a heuristic node order of the
small graph, in which VF2 works more efficiently.

   291 \section{Problem Statement}

   292 This section provides a detailed description of the problems to be

   293 solved.

   294 \subsection{Definitions}

   296 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and

   297 $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.

   298 \begin{definition}\label{sec:ismorphic}

   299 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M:

   300   V_{small} \longrightarrow V_{large}$ bijection, for which the

   301   following is true:

   302 \begin{center}

   303 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow

   304   (M(u),M(v))\in{E_{large}}$

   305 \end{center}

   306 \end{definition}

For the sake of simplicity, subgraphs and induced subgraphs are
defined in this paper in a more general way than usual:

   309 \begin{definition}

   310 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I:

   311   V_{small}\longrightarrow V_{large}$ injection, for which the

   312   following is true:

   313 \begin{center}

   314 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$

   315 \end{center}

   316 \end{definition}

   318 \begin{definition}

   319 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists

   320   I: V_{small}\longrightarrow V_{large}$ injection, for which the

   321   following is true:

   322 \begin{center}

   323 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow

   324   (I(u),I(v))\in E_{large}$

   325 \end{center}

   326 \end{definition}
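
The difference between the last two definitions can be illustrated on a
small instance. The graphs below are chosen here purely for illustration
and are not taken from the test sets of the paper.

\begin{example}
Let $G_{small}$ be the path with nodes $a,b,c$ and edges $(a,b),(b,c)$,
and let $G_{large}$ be the triangle with nodes $1,2,3$ and edges
$(1,2),(2,3),(1,3)$. The injection $I$ with $I(a)=1$, $I(b)=2$, $I(c)=3$
maps both edges of $G_{small}$ to edges of $G_{large}$, hence $G_{small}$
is a subgraph of $G_{large}$. However, $(a,c)\notin E_{small}$ while
$(I(a),I(c))=(1,3)\in E_{large}$, so $I$ violates the condition of the
induced case. Since any two nodes of the triangle are adjacent, every
injection fails in the same way, hence $G_{small}$ is not an induced
subgraph of $G_{large}$.
\end{example}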

   328 \begin{definition}

$lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node
    label function}, where $K$ is an arbitrary set. The elements of $K$
  are the \textbf{node labels}. Two nodes $u$ and $v$ are said to be
  \textbf{equivalent} if $lab(u)=lab(v)$.

   333 \end{definition}

When node labels are also given, the matched nodes must have the same
labels.  For example, node-labeled isomorphism is defined as follows.

   337 \begin{definition}

$G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label
    function} $\mathbf{lab}$ if $\exists M: V_{small} \longrightarrow V_{large}$
  bijection, for which the following is true:

   341 \begin{center}

   342 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow

   343   (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} :

   344   lab(u)=lab(M(u)))$

   345 \end{center}

   346 \end{definition}

   348 The other two definitions can be extended in the same way.
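
Assuming that the label condition is simply appended to the edge
condition, as in the labeled isomorphism above, the two extended
definitions would require an injection $I: V_{small}\longrightarrow
V_{large}$ satisfying the following. This is only a sketch of the
intended extension, spelled out here for convenience.
\begin{center}
labeled subgraph: $(\forall u,v \in{V_{small}} : (u,v)\in{E_{small}}
\Rightarrow (I(u),I(v))\in E_{large})$ and $(\forall u\in{V_{small}} :
lab(u)=lab(I(u)))$,\\
labeled induced subgraph: $(\forall u,v \in{V_{small}} :
(u,v)\in{E_{small}} \Leftrightarrow (I(u),I(v))\in E_{large})$ and
$(\forall u\in{V_{small}} : lab(u)=lab(I(u)))$.
\end{center}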

Note that an edge label function can be defined similarly to the node label
function, and all the definitions can be extended with additional
conditions, but this is beyond the scope of this work.

The equivalence of two nodes is usually defined by another relation,
$R\subseteq (V_{small}\cup V_{large})^2$. This coincides with the
definition given above if $R$ is an equivalence relation, which is not
a restriction in biological and chemical applications.

   359 \subsection{Common problems}\label{sec:CommProb}

The focus of this paper is on two extensively studied topics,
subgraph isomorphism and its variations. The following
problems appear in many applications.

The \textbf{subgraph matching problem} is the following: is
$G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node
label function?

The \textbf{induced subgraph matching problem} asks the same about the
existence of an induced subgraph.

The \textbf{graph isomorphism problem} can be defined as the induced
subgraph matching problem where the sizes of the two graphs are equal.

In addition to deciding existence, it may be necessary to exhibit such a
subgraph or to list all of them.

It should be noted that some authors misleadingly use the term
\emph{subgraph isomorphism problem} to refer to the \emph{induced subgraph
  isomorphism problem}.

The following sections give the descriptions of VF2, VF2++ and VF2 Plus,
followed by a detailed comparison.

   385 \section{The VF2 Algorithm}

This algorithm is the basis of both VF2++ and VF2 Plus.  VF2
is able to handle all the variations mentioned in Section
  \ref{sec:CommProb}.  Although it can also handle directed graphs,
for the sake of simplicity, only the undirected case will be
discussed.

   393 \subsection{Common notations}

   394 \indent Assume $G_{small}$ is searched in $G_{large}$.  The following

   395 definitions and notations will be used throughout the whole paper.

   396 \begin{definition}

A set $M\subseteq V_{small}\times V_{large}$ is called a
\textbf{mapping} if no node of $V_{small}$ or of $V_{large}$ appears
in more than one pair in $M$.  That is, $M$ uniquely associates some of
the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice
versa.
\end{definition}

\begin{definition}
A mapping $M$ \textbf{covers} a node $v$ if there exists a pair in $M$ which
contains $v$.
\end{definition}

\begin{definition}
A mapping $M$ is a \textbf{whole mapping} if $M$ covers all the
nodes in $V_{small}$.

   412 \end{definition}

   414 \begin{notation}

   415 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in

   416 V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in

   417 V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.

   418 \end{notation}

   420 \begin{notation}

For a mapping $M$ and a node $v\in V_{small}\cup V_{large}$, let
$\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$ if such a node
exists; otherwise $\mathbf{Pair(M,v)}$ is undefined.

   424 \end{notation}

Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique.
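
A small instance may help to fix the notation. The node names below are
hypothetical and serve only as an illustration.

\begin{example}
Let $V_{small}=\{a,b,c\}$ and $V_{large}=\{1,2,3,4\}$. The set
$M=\{(a,1),(b,3)\}$ is a mapping which covers $a$, $b$, $1$ and $3$, with
$Pair(M,a)=1$ and $Pair(M,3)=b$, while $Pair(M,c)$ and $Pair(M,2)$ are
undefined. $M$ is not a whole mapping, since it does not cover $c$, but
$M\cup\{(c,4)\}$ is a whole mapping. The set $\{(a,1),(b,1)\}$ is not a
mapping, because node $1$ appears in two pairs.
\end{example}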

The definitions of the isomorphism types can be rephrased in terms of the
existence of a special whole mapping $M$, since a whole mapping represents an
injection of $V_{small}$ into $V_{large}$. For example,
\begin{center}
$M\subseteq V_{small}\times V_{large}$ represents an induced subgraph
  isomorphism $\Leftrightarrow$ $M$ is a whole mapping and $\forall u,v
  \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
  (Pair(M,u),Pair(M,v))\in E_{large}$.
\end{center}

   438 \begin{definition}

A set of whole mappings is called a \textbf{problem type}.
\end{definition}
Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
which can be substituted by any problem type.

A whole mapping $W$ \textbf{is of type} $\mathbf{PT}$ if $W\in PT$. Using
this notation, VF2 searches for a whole mapping $W$ of type $PT$.

For example, the problem type of the graph isomorphism problem is the
following.  A whole mapping $W$ is in $\mathbf{ISO}$ iff the
bijection represented by $W$ satisfies Definition~\ref{sec:ismorphic}.
The subgraph and induced subgraph matching problems can be formalized
in a similar way. Let their problem types be denoted by $\mathbf{SUB}$
and $\mathbf{IND}$.
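
Spelled out with the notation introduced above, the three problem types
could be written as follows. This is a sketch for the unlabeled case; when
node labels are given, the condition $lab(u)=lab(Pair(W,u))$ for all $u\in
V_{small}$ would be added to each of them.
\begin{align*}
SUB &= \{W \text{ whole mapping} : \forall u,v\in V_{small}:\ (u,v)\in E_{small}
  \Rightarrow (Pair(W,u),Pair(W,v))\in E_{large}\},\\
IND &= \{W \text{ whole mapping} : \forall u,v\in V_{small}:\ (u,v)\in E_{small}
  \Leftrightarrow (Pair(W,u),Pair(W,v))\in E_{large}\},\\
ISO &= \{W\in IND : W \text{ covers all the nodes in } V_{large}\}.
\end{align*}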

   454 \begin{definition}

   455 \label{expPT}

   456 $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in

   457 PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow

(Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is, each edge of
$G_{small}$ has to be mapped to an edge of $G_{large}$ by each
mapping in $PT$.

   461 \end{definition}

   463 Note that $ISO$, $SUB$ and $IND$ are expanding problem types.

   465 This paper deals with the three problem types mentioned above only,

   466 but the following generic definitions make it possible to handle other

   467 types as well.  Although it may be challenging to find a proper

   468 consistency function and an efficient cutting function.

   470 \begin{definition}

Let $M$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
\textbf{consistency function by} $\mathbf{PT}$ if the following
holds. If there exists a whole mapping $W$ of type $PT$ for which
$M\subseteq W$, then $Cons_{PT}(M)$ is true.
\end{definition}

\begin{definition}
Let $M$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
\textbf{cutting function by} $\mathbf{PT}$ if the following
holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a
whole mapping $W$ of type $PT$.
\end{definition}

\begin{definition}
$M$ is said to be a \textbf{consistent mapping by} $\mathbf{PT}$ if
  $Cons_{PT}(M)$ is true.

   487 \end{definition}

   489 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.

   490 \begin{notation}

   491 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and

   492 $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where

$p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is a mapping.

   494 \end{notation}

   496 $Cons_{PT}$ will be used to check the consistency of the already

   497 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if

   498 no whole consistent mapping can contain the current mapping.

   500 \subsection{Overview of the algorithm}

   501 VF2 uses a state space representation of mappings, $Cons_{PT}$ for

   502 excluding inconsistency with the problem type and $Cut_{PT}$ for

   503 pruning the search tree.  Each state $s$ of the matching process can

   504 be associated with a mapping $M(s)$.

   506 Algorithm~\ref{alg:VF2Pseu} is a high level description of

   507 the VF2 matching algorithm.

   510 \begin{algorithm}

\algtext*{EndIf}% do not print ``end if''
\algtext*{EndFor}% do not print ``end for''
\algtext*{EndProcedure}% do not print ``end procedure''
\caption{\hspace{0.5cm}\textit{A high level description of VF2}}\label{alg:VF2Pseu}
\begin{algorithmic}[1]
\Procedure{VF2}{State $s$, ProblemType $PT$}
  \If{$M(s)$ covers $V_{small}$}
    \State Output($M(s)$)
  \Else
    \State Compute the set $P(s)$ of the candidate pairs for inclusion in $M(s)$
    \ForAll{$p\in{P(s)}$}
      \If{Cons$_{PT}$($p, M(s)$) $\wedge$ $\neg$Cut$_{PT}$($p, M(s)$)}
        \State Compute the nascent state $\tilde{s}$ by adding $p$ to $M(s)$
        \State \textbf{call} VF2($\tilde{s}$, $PT$)
      \EndIf
    \EndFor
  \EndIf
\EndProcedure

   525 \end{algorithmic}

   526 \end{algorithm}

   529 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it

   530 starts with an empty mapping.

   532 For each state $s$, the algorithm computes $P(s)$, the set of

   533 candidate node pairs for adding to the current state $s$.

For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and
$Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and
$Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}$ with
$M(\tilde{s})=M(s)\cup\{p\}$ is computed, and the whole process is
recursively applied to $\tilde{s}$. Otherwise, $M(s)\cup\{p\}$ is not
consistent by $PT$ or it can be proved that it cannot be extended to a
whole mapping of type $PT$.

The correctness of the algorithm relies on the following claim.

   543 \begin{claim}

   544 Through consistent mappings, only consistent whole mappings can be

   545 reached, and all of the whole mappings are reachable through

   546 consistent mappings.

   547 \end{claim}

Note that a state may be reached in many different ways, since the
order of insertions into $M$ does not influence the resulting mapping. In
fact, the number of different ways which lead to the same state can be
exponentially large. If $G_{small}$ and $G_{large}$ are cycles with $n$
nodes and $n$ different node labels, there exists exactly one graph
isomorphism between them, but it will be reached in $n!$ different
ways.
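
The smallest instance of this phenomenon can be written out explicitly.
The labels and node names below are hypothetical, and the count assumes
that in each state every pair of uncovered nodes with equal labels is
offered as a candidate.

\begin{example}
Let $G_{small}$ and $G_{large}$ be cycles on the nodes $a,b,c$ and
$1,2,3$, respectively, with three different labels such that
$lab(a)=lab(1)$, $lab(b)=lab(2)$ and $lab(c)=lab(3)$. The only graph
isomorphism is $W=\{p_1,p_2,p_3\}$ with $p_1=(a,1)$, $p_2=(b,2)$ and
$p_3=(c,3)$. The state associated with $W$ is reached by each of the
$3!=6$ insertion orders
\[
(p_1,p_2,p_3),\ (p_1,p_3,p_2),\ (p_2,p_1,p_3),\ (p_2,p_3,p_1),\
(p_3,p_1,p_2),\ (p_3,p_2,p_1).
\]
\end{example}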

However, one may observe the following.

   559 \begin{claim}

   560 \label{claim:claimTotOrd}

Let $\prec$ be an arbitrary total ordering relation on $V_{small}$.  If
the algorithm ignores each $p=(u,v) \in P(s)$ for which
\begin{center}
$\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$,
\end{center}
then no state can be reached more than once and each state associated
with a whole mapping remains reachable.

   568 \end{claim}
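
The effect of the rule can be followed on a tiny instance. The graphs,
the labels and the ordering below are hypothetical, and it is assumed
that every pair of uncovered nodes belongs to $P(s)$.

\begin{example}
Let $G_{small}$ and $G_{large}$ be cycles on the nodes $a,b,c$ and
$1,2,3$, with $lab(a)=lab(1)$, $lab(b)=lab(2)$, $lab(c)=lab(3)$ being
three different labels, and let $a\prec b\prec c$. In the initial state,
all the pairs $(b,\cdot)$ and $(c,\cdot)$ are ignored, because
$(a,1)\in P(s_0)$ and $a\prec b$, $a\prec c$. Among the remaining pairs
only $(a,1)$ respects the labels. In the next state the pairs
$(c,\cdot)$ are ignored because of $(b,2)$, and finally $(c,3)$ is added.
Hence the whole mapping $\{(a,1),(b,2),(c,3)\}$ is reached by exactly one
insertion order.
\end{example}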

   570 Note that the cornerstone of the improvements to VF2 is a proper

   571 choice of a total ordering.

   573 \subsection{The candidate set P(s)}

   574 \label{candidateComputingVF2}

   575 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.

   576 Suppose that $PT$ is an expanding problem type, see

   577 Definition~\ref{expPT}.

   579 \begin{notation}

   580 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by

   581 $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}}

   582 \wedge \tilde{u}$ is covered by $M(s)\}$, and

   583 \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not

   584 covered by

   585 $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}}

   586 \wedge \tilde{v}$ is covered by $M(s)\}$

   587 \end{notation}

The set $P(s)$ includes the pairs of uncovered neighbours of covered
nodes, and if there is no such node pair, all the pairs containing
two uncovered nodes are added. Formally, let

   592 \[

 P(s)\!=\!
  \begin{cases}
   T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if }
   T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq
   \emptyset,\\
   (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s))
   &\hspace{-0.15cm}\text{otherwise.}
  \end{cases}

   600 \]
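
The two cases of the formula can be checked on a small instance. The
graphs below are hypothetical and serve only as an illustration.

\begin{example}
Let $G_{small}$ be the path with nodes $a,b,c$ and edges $(a,b),(b,c)$,
and let $G_{large}$ be the cycle with nodes $1,2,3,4$ and edges
$(1,2),(2,3),(3,4),(4,1)$.
In the initial state, $M(s_0)=\emptyset$, hence
$T_{small}(s_0)=T_{large}(s_0)=\emptyset$ and $P(s_0)=V_{small}\times
V_{large}$ consists of all the $12$ pairs.
If $M(s)=\{(a,1)\}$, then $T_{small}(s)=\{b\}$ and
$T_{large}(s)=\{2,4\}$, thus $P(s)=\{(b,2),(b,4)\}$.
If $M(s)=\{(a,1),(b,2)\}$, then $T_{small}(s)=\{c\}$ and
$T_{large}(s)=\{3,4\}$, thus $P(s)=\{(c,3),(c,4)\}$.
\end{example}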

   602 \subsection{Consistency}

   603 This section defines the consistency functions for the different

   604 problem types mentioned in Section~\ref{sec:CommProb}.

   605 \begin{notation}

   606 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} :

   607 (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large}

   608   (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$

   609 \end{notation}

Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is
a state of the matching procedure, $M(s)$ is a consistent mapping by
$PT$ and $lab(u)=lab(v)$.  $Cons_{PT}(p,M(s))$ checks whether
including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.

   615 \subsubsection{Induced subgraph isomorphism}

$M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow
(\forall \tilde{u}\in M_{small}(s): (u,\tilde{u})\in E_{small}
\Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline The
following formulation gives an efficient way of calculating
$Cons_{IND}$.

   621 \begin{claim}

   622 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v)

   623   \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge

   624   (\forall \tilde{u}\in \Gamma_{small}(u)

   625   \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a

   626   consistency function in the case of $IND$.

   627 \end{claim}
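
The formula of the claim can be evaluated by two scans over the
neighbours of $v$ and of $u$. The listing below is only a sketch of this
idea: it assumes that membership in $M_{small}(s)$ and $M_{large}(s)$,
the value of $Pair$ and the adjacency tests are available as elementary
operations, which is an implementation assumption not stated above. The
name \textsc{ConsIND} is hypothetical.

\begin{algorithm}
\caption{\hspace{0.5cm}\textit{A sketch of evaluating $Cons_{IND}$}}
\begin{algorithmic}[1]
\Function{ConsIND}{Pair $(u,v)$, State $s$}
  \ForAll{$\tilde{v}\in \Gamma_{large}(v)$}
    \If{$\tilde{v}\in M_{large}(s)$ $\wedge$ $(Pair(M(s),\tilde{v}),u)\notin E_{small}$}
      \State \Return false
    \EndIf
  \EndFor
  \ForAll{$\tilde{u}\in \Gamma_{small}(u)$}
    \If{$\tilde{u}\in M_{small}(s)$ $\wedge$ $(v,Pair(M(s),\tilde{u}))\notin E_{large}$}
      \State \Return false
    \EndIf
  \EndFor
  \State \Return true
\EndFunction
\end{algorithmic}
\end{algorithm}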

   629 \subsubsection{Graph isomorphism}

   630 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$

   631 $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by

   632 $IND$.

   633 \begin{claim}

   634 $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and

   635   only if it is a consistency function by $IND$.

   636 \end{claim}

   637 \subsubsection{Subgraph isomorphism}

$M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow
(\forall \tilde{u}\in M_{small}(s): (u,\tilde{u})\in E_{small}
\Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
\newline
The following formulation gives an efficient way of calculating
$Cons_{SUB}$.

   644 \begin{claim}

   645 $Cons_{SUB}((u,v),M(s)):= (\forall \tilde{u}\in \Gamma_{small}(u)

   646   \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a

   647   consistency function by $SUB$.

   648 \end{claim}
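
The difference between $Cons_{SUB}$ and $Cons_{IND}$ can be seen on a
small instance. The graphs below are hypothetical and serve only as an
illustration.

\begin{example}
Let $G_{small}$ be the path with nodes $a,b,c$ and edges $(a,b),(b,c)$,
let $G_{large}$ be the triangle with nodes $1,2,3$, and suppose that all
the nodes have the same label. Let $M(s)=\{(a,1),(b,2)\}$ and $p=(c,3)$.
Since $\Gamma_{small}(c)\cap M_{small}(s)=\{b\}$ and
$(3,Pair(M(s),b))=(3,2)\in E_{large}$, $Cons_{SUB}(p,M(s))$ is true.
On the other hand, $1\in\Gamma_{large}(3)\cap M_{large}(s)$ and
$(Pair(M(s),1),c)=(a,c)\notin E_{small}$, hence $Cons_{IND}(p,M(s))$ is
false.
\end{example}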

   650 \subsection{Cutting rules}

$Cut_{PT}(p,M(s))$ is defined by a collection of efficiently
verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can
be true only if it is impossible to extend $M(s)\cup \{p\}$ to a
whole mapping.

   655 \begin{notation}

   657 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash

   658 M_{small}(s))\backslash T_{small}(s)$, and

   659 \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash

   660 M_{large}(s))\backslash T_{large}(s)$.

   661 \end{notation}

   662 \subsubsection{Induced subgraph isomorphism}

   663 \begin{claim}

   664 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <

   665   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap

   666   \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap

   667   \tilde{T}_{small}(s)|$ is a cutting function by $IND$.

   668 \end{claim}
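
As an illustration, consider how the rule behaves in the initial state.
The instance below is hypothetical.

\begin{example}
In the initial state $s_0$, $T_{small}(s_0)=T_{large}(s_0)=\emptyset$,
$\tilde{T}_{small}(s_0)=V_{small}$ and $\tilde{T}_{large}(s_0)=V_{large}$.
Therefore, the first inequality cannot hold, and $Cut_{IND}((u,v),M(s_0))$
reduces to the degree condition $|\Gamma_{large}(v)| < |\Gamma_{small}(u)|$.
For instance, if $u$ is the middle node of a path on three nodes and $v$
has only one neighbour in $G_{large}$, then the pair $(u,v)$ is cut,
since the two neighbours of $u$ could not be mapped to different
neighbours of $v$.
\end{example}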

   669 \subsubsection{Graph isomorphism}

Note that the cutting function of induced subgraph isomorphism defined
above is a cutting function by $ISO$, too; however, it is less
efficient than the following one, while their computational complexity is
the same.

   674 \begin{claim}

   675 $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq

   676   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap

   677   \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap

   678   \tilde{T}_{small}(s)|$ is a cutting function by $ISO$.

   679 \end{claim}

   681 \subsubsection{Subgraph isomorphism}

   682 \begin{claim}

   683 $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <

   684   |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by

   685   $SUB$.

   686 \end{claim}

   687 Note that there is a significant difference between induced and

   688 non-induced subgraph isomorphism:

   690 \begin{claim}

   691 \label{claimSUB}

   692 $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <

   693 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap

   694 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$

   695 is \textbf{not} a cutting function by $SUB$.

   696 \end{claim}
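A small instance, constructed here for illustration and not taken from the original text, indicates why the second condition may not be used for $SUB$. It assumes that $T_{small}(s)$ and $T_{large}(s)$ denote the uncovered nodes having at least one covered neighbour.
\begin{example}
Let $G_{small}$ be the path with edges $(u_1,u_2)$ and $(u_2,u_3)$, let $G_{large}$ be the triangle on the nodes $v_1,v_2,v_3$, and let $M(s)=\{(u_1,v_1)\}$. Then $T_{small}(s)=\{u_2\}$, $\tilde{T}_{small}(s)=\{u_3\}$, $T_{large}(s)=\{v_2,v_3\}$ and $\tilde{T}_{large}(s)=\emptyset$. For the pair $(u_2,v_2)$,
\[|\Gamma_{large}(v_2)\cap \tilde{T}_{large}(s)| = 0 < 1 = |\Gamma_{small}(u_2)\cap \tilde{T}_{small}(s)|,\]
so $Cut_{SUB}'((u_2,v_2),M(s))$ is true. Nevertheless, $\{(u_1,v_1),(u_2,v_2),(u_3,v_3)\}$ is a subgraph isomorphism, since both edges of the path are mapped to edges of the triangle. The extra edge $(v_1,v_3)$, which is allowed in the non-induced case, moves $v_3$ from $\tilde{T}_{large}(s)$ into $T_{large}(s)$. In contrast, $Cut_{SUB}((u_2,v_2),M(s))$ is false, because $|\Gamma_{large}(v_2)\cap T_{large}(s)|=1\geq 0=|\Gamma_{small}(u_2)\cap T_{small}(s)|$.
\end{example}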

   698 \section{The VF2++ Algorithm}

   699 Although any total ordering relation makes the search space of VF2 a

   700 tree, its choice turns out to dramatically influence the number of

   701 visited states. The goal is to determine an efficient one as quickly

   702 as possible.

The superiority of VF2++ over VF2 has two main reasons. Firstly,
taking into account the structure and the node labeling of the graph,
VF2++ determines a state order in which most of the unfruitful
branches of the search space can be pruned immediately. Secondly,
introducing cutting rules that are more efficient, yet still easier to
compute, reduces the chance of going astray even further.

   711 In addition to the usual subgraph isomorphism, specialized versions

   712 for induced subgraph isomorphism and for graph isomorphism have been

designed. VF2++ achieves a runtime improvement of one order of
magnitude for induced subgraph isomorphism and a better
asymptotic behaviour for the graph isomorphism problem.

Note that a weaker version of the cutting rules and the more efficient
candidate set calculation were described in \cite{VF2Plus}, too.

   720 It should be noted that all the methods described in this section are

   721 extendable to handle directed graphs and edge labels as well.

   723 The basic ideas and the detailed description of VF2++ are provided in

   724 the following.

   726 \subsection{Preparations}

   727 \begin{claim}

   728 \label{claim:claimCoverFromLeft}

The total ordering relation uniquely determines a node order in which
the nodes of $V_{small}$ will be covered by VF2. From the point of
view of the matching procedure, this means that always the same node
of $G_{small}$ will be covered on the $d$-th level.

   733 \end{claim}

   735 \begin{definition}

An order $(u_{\sigma(1)},u_{\sigma(2)},\dots,u_{\sigma(|V_{small}|)})$ of
$V_{small}$ is a \textbf{matching order} if there exists a total
ordering relation $\prec$ such that VF2 with $\prec$ finds a pair for
$u_{\sigma(d)}$ on the $d$-th level for all $d\in\{1,\dots,|V_{small}|\}$.

   740 \end{definition}

   742 \begin{claim}\label{claim:MOclaim}

An order of $V_{small}$ is a matching order if and only if the nodes
of every component form an interval in the node sequence, and every
node is adjacent to a previous node of its component, except for the
first node of the component. The order of the components is
arbitrary. \\Formally, an order
$(u_{\sigma(1)},u_{\sigma(2)},\dots,u_{\sigma(|V_{small}|)})$ of
$V_{small}$ is a matching order $\Leftrightarrow$ $\forall
G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall
i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq
l\leq i \Rightarrow u_{\sigma(l)}\in V'_{small}) \wedge
(u_{\sigma(k)},u_{\sigma(i)})\in E'_{small}$, where $i,j,k,l\in
\{1,\dots,|V_{small}|\}$.

   756 \end{claim}
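The following hypothetical instance, added only as an illustration, shows both conditions of the claim at work.
\begin{example}
Let $G_{small}$ be the path with edges $(u_1,u_2)$ and $(u_2,u_3)$. The orders $(u_1,u_2,u_3)$ and $(u_2,u_1,u_3)$ are matching orders, since every node except the first one is adjacent to a previous node. The order $(u_1,u_3,u_2)$ is not a matching order, because $u_3$ is not adjacent to $u_1$, the only node preceding it. If an isolated node $u_4$ is added to $G_{small}$, then $(u_1,u_2,u_4,u_3)$ is not a matching order either, because the nodes of the component $\{u_1,u_2,u_3\}$ no longer form an interval.
\end{example}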

To sum up, a total ordering always uniquely determines a matching
order, and every matching order can be determined by a total ordering.
However, more than one total ordering may determine the same matching
order.

   762 \subsection{Idea behind the algorithm}

The goal is to find a matching order with which the algorithm is able to
recognize inconsistency or prune the infeasible branches on the
highest levels, so that it goes deep only if needed.

   767 \begin{notation}

Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H|$, that is, the
number of neighbours of $u$ which are in $H$, where $u\in V_{small}$ and
$H\subseteq V_{small}$.

   771 \end{notation}

The principal question is the following. Suppose a state $s$ is
given. For which node of $T_{small}(s)$ is it the hardest to find a
consistent pair in $G_{large}$? The more covered neighbours a node in
$T_{small}(s)$ has, i.e. the larger its $Conn_{M_{small}(s)}$ is,
the more rarely satisfiable the consistency constraints for its pair
are.

In biology, most of the graphs are sparse, thus several nodes in
$T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes it
reasonable to define a secondary and a tertiary order between them.
The observation above proves to be so decisive that the secondary
ordering prefers, among the nodes having the same
$Conn_{M_{small}(s)}$, those with the most uncovered neighbours, in
order to increase $Conn_{M_{small}(s)}$ of the uncovered nodes as much
as possible. The tertiary ordering prefers nodes having the rarest
uncovered labels.

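Note that the secondary ordering is the same as the ordering by $deg$,
since the number of uncovered neighbours of a node is its degree minus
its $Conn_{M_{small}(s)}$. In contrast to the quantities used above,
$deg$ is static data.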

These rules can easily result in a matching order which contains the
nodes of a long path successively. Such nodes may have low $Conn$, and
the path is easily matchable into $G_{large}$. To avoid that, a BFS
order is used, which provides the shortest possible paths.

   796 \newline

In the following, some examples are described on which VF2 may be
slow, although they are easily solvable by using a proper
matching order.

   802 \begin{example}

   803 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways

   804 without node labels. Let $u\in V_{small}$ and $v\in V_{large}$.

   805 \newline

   806 $lab(u):=black$

   807 \newline

   808 $lab(v):=black$

   809 \newline

   810 $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash

   811 \{u\})$

   812 \newline

   813 $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash

   814 \{v\})$

   815 \newline

Now, any mapping respecting the node label $lab$ must contain $(u,v)$,
since $u$ is black and no node in $V_{large}$ has a black label except
$v$. If unfortunately $u$ were the last node to be covered,
VF2 would check only in the last steps whether $u$ can be matched to
$v$.
\newline
However, had $u$ been the first matched node, $u$ would have been
matched immediately to $v$, so all the mappings in which the node
labels do not correspond would have been precluded.

   826 \end{example}

   828 \begin{example}

Suppose no node labels are given, $G_{small}$ is a small graph which
cannot be mapped into $G_{large}$, and $u\in V_{small}$.
\newline
Let $G'_{small}:=(V_{small}\cup
\{u'_{1},u'_{2},\dots,u'_{k}\},E_{small}\cup
\{(u,u'_{1}),(u'_{1},u'_{2}),\dots,(u'_{k-1},u'_{k})\})$, that is,
$G'_{small}$ consists of $G_{small}$ and a path on $k$ new nodes,
which is disjoint from $G_{small}$ and one of whose end nodes is
connected to $u\in V_{small}$.
\newline
Is there a subgraph of $G_{large}$ which is isomorphic to
$G'_{small}$?
\newline
If unfortunately the nodes of the path were the first $k$ nodes in the
matching order, the algorithm would iterate through all the possible
paths on $k$ nodes in $G_{large}$, and it would recognize only
afterwards that none of them can be extended to $G'_{small}$.
\newline
However, had it started by the matching of $G_{small}$, it would not
have matched any nodes of the path.

   849 \end{example}

   851 These examples may look artificial, but the same problems also appear

   852 in real-world instances, even though in a less obvious way.

   854 \subsection{Total ordering}

   855 Instead of the total ordering relation, the matching order will be

   856 searched directly.

   857 \begin{notation}

   858 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} :

   859 l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ ,

   860 where $l$ is a label and $\mathcal{M}\subseteq V_{small}$.

   861 \end{notation}
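As a numerical illustration with made-up values, suppose that $V_{large}$ contains five red and two black nodes, $V_{small}$ contains three red nodes and one black node, and $\mathcal{M}=\emptyset$. Then
\[F_\mathcal{M}(red)=5-3=2 \quad\textrm{and}\quad F_\mathcal{M}(black)=2-1=1,\]
so black is the rarer label in the sense that fewer spare nodes of $V_{large}$ remain for it. After the black node of $V_{small}$ has been put into $\mathcal{M}$, the value becomes $F_\mathcal{M}(black)=2-0=2$.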

   863 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.

   864 \end{definition}

   866 \begin{algorithm}

   867 \algtext*{EndIf}

   868 \algtext*{EndProcedure}

   869 \algtext*{EndWhile}

   870 \algtext*{EndFor}

   871 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}

   872 \begin{algorithmic}[1]

   873 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$

   874 \Comment{matching order} \While{$V_{small}\backslash \mathcal{M}

   875   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg

   876 min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash

   877 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with

   878 root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the

   879 $d$-th level \State Process $V_d$ \Comment{See Algorithm

   880   \ref{alg:VF2PPProcess1}} \EndFor

   881 \EndWhile \EndProcedure

   882 \end{algorithmic}

   883 \end{algorithm}

   885 \begin{algorithm}

   886 \algtext*{EndIf}

\algtext*{EndProcedure}% do not print "end procedure"

   888 \algtext*{EndWhile}

   889 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}

   890 \begin{algorithmic}[1]

   891 \Procedure{VF2++ProcessLevel1}{$V_{d}$} \While{$V_d\neq\emptyset$}

   892 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg

max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash \{m\}$

   894 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh

   895 $F_\mathcal{M}$ \EndWhile \EndProcedure

   896 \end{algorithmic}

   897 \end{algorithm}

Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
matching order procedure of VF2++. It repeatedly selects a root node
$r$ among the nodes not yet in $\mathcal{M}$, namely one having the
rarest $lab$ and, among those, the largest $deg$, and computes a BFS
tree rooted at $r$. Thus the components are processed in the order
determined by their roots.
Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of
the BFS tree. It appends the nodes of the current level to
$\mathcal{M}$ one by one, in descending lexicographic order by
$(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$, and refreshes
$F_\mathcal{M}$ immediately after each node.

   907 Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}

   908 provides a matching order.
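
The following run on a hypothetical unlabelled graph, added only as an illustration, traces the two procedures step by step. Ties are assumed to be broken arbitrarily.
\begin{example}
Let $G_{small}$ have the nodes $u_1,u_2,u_3,u_4$ and the edges $(u_1,u_2)$, $(u_1,u_3)$, $(u_2,u_3)$, $(u_3,u_4)$, so that $deg(u_1)=deg(u_2)=2$, $deg(u_3)=3$ and $deg(u_4)=1$. Since all labels coincide, the root is the node of largest degree, $r=u_3$. The BFS levels are $V_0=\{u_3\}$ and $V_1=\{u_1,u_2,u_4\}$. Processing $V_0$ gives $\mathcal{M}=(u_3)$. In $V_1$, every node has $Conn_{\mathcal{M}}=1$, and the largest degree is attained by $u_1$ and $u_2$. Choosing $u_1$ gives $\mathcal{M}=(u_3,u_1)$. Now $Conn_{\mathcal{M}}(u_2)=2$ and $Conn_{\mathcal{M}}(u_4)=1$, so $u_2$ is appended next, followed by $u_4$. The resulting order $(u_3,u_1,u_2,u_4)$ satisfies the condition of Claim~\ref{claim:MOclaim}, since each of $u_1,u_2,u_4$ is adjacent to $u_3$.
\end{example}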

   911 \subsection{Cutting rules}

   912 \label{VF2PPCuttingRules}

   913 This section presents the cutting rules of VF2++, which are improved

   914 by using extra information coming from the node labels.

   915 \begin{notation}

   916 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l

   917 \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and

   918 $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge

   919 \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in

   920 V_{large}$ and $l$ is a label.

   921 \end{notation}

   923 \subsubsection{Induced subgraph isomorphism}

   924 \begin{claim}

\[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]
\[\bigvee_{l\ is\ label} |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\]
is a cutting function by $IND$.

   926 \end{claim}

   928 \subsubsection{Graph isomorphism}

   929 \begin{claim}

\[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]
\[\bigvee_{l\ is\ label} |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\]
is a cutting function by $ISO$.

   931 \end{claim}

   933 \subsubsection{Subgraph isomorphism}

   934 \begin{claim}

\[LabCut_{SUB}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\]
is a cutting function by $SUB$.

   936 \end{claim}
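A small hypothetical case, added as an illustration, shows how the labelled rule can prune where the unlabelled one cannot. Suppose that $\Gamma_{small}(u)\cap T_{small}(s)$ consists of two red nodes, while $\Gamma_{large}(v)\cap T_{large}(s)$ consists of one red and one blue node. The unlabelled rule compares only the sizes,
\[|\Gamma_{large}(v)\cap T_{large}(s)| = 2 \not< 2 = |\Gamma_{small}(u)\cap T_{small}(s)|,\]
so $Cut_{SUB}$ is false. For the label $l=red$, however,
\[|\Gamma_{large}^{red}(v)\cap T_{large}(s)| = 1 < 2 = |\Gamma_{small}^{red}(u)\cap T_{small}(s)|,\]
so $LabCut_{SUB}$ is true and the pair $(u,v)$ is pruned.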

   940 \subsection{Implementation details}

   941 This section provides a detailed summary of an efficient

   942 implementation of VF2++.

   943 \subsubsection{Storing a mapping}

After fixing an arbitrary node order ($u_0, u_1, \dots,
u_{|V_{small}|-1}$) of $G_{small}$, an array $M$ can be used to store
the current mapping in the following way.
\[
 M[i] =
  \begin{cases}
   v & \text{if } (u_i,v) \text{ is in the mapping},\\
   INVALID & \text{if no node has been mapped to } u_i,
  \end{cases}
\]
where $i\in\{0,1,\dots,|V_{small}|-1\}$, $v\in V_{large}$ and $INVALID$
means ``no node''.

\subsubsection{Avoiding the recursion}

The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
as a \textit{while loop}, which has a loop counter $depth$ denoting the
current depth of the recursion. Fixing a matching order, let $M$
denote the array storing the current mapping. Based on Claim~\ref{claim:claimCoverFromLeft},
$M$ is $INVALID$ from index $depth+1$ on and not $INVALID$ before
index $depth$. $M[depth]$ changes
while the state is being processed, but the property holds before
both stepping back to a predecessor state and exploring a successor
state.
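
The loop described above could be organized as in the following sketch. It is only one possible arrangement, written under the assumption that the candidates of a node can be enumerated in a fixed order, so that ``the next candidate after $M[depth]$'' is well defined (and is the first candidate if $M[depth]$ is $INVALID$). The procedure name and the label of the sketch are hypothetical.
\begin{algorithm}
\algtext*{EndIf}
\algtext*{EndProcedure}
\algtext*{EndWhile}
\caption{\hspace{0.5cm}A possible iterative realization of the recursion (sketch)}\label{alg:iterSketch}
\begin{algorithmic}[1]
\Procedure{IterativeMatching}{}
\State $M[i]:=INVALID$ for all $i$, $depth:=0$
\While{$depth\geq 0$}
\If{$depth=|V_{small}|$}
\State Output $M$ \Comment{a whole mapping has been found}
\State $depth:=depth-1$ \Comment{step back}
\Else
\State $v:=$ the next candidate after $M[depth]$ for the node at index $depth$ that is consistent and not cut, or $INVALID$ if there is none
\State $M[depth]:=v$
\If{$v\neq INVALID$}
\State $depth:=depth+1$ \Comment{explore a successor state}
\Else
\State $depth:=depth-1$ \Comment{step back to the predecessor state}
\EndIf
\EndIf
\EndWhile
\EndProcedure
\end{algorithmic}
\end{algorithm}
When the loop steps back, $M[depth]$ still holds the candidate tried last on that level, which is what allows the enumeration to continue from there without a recursion stack.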

The necessary part of the candidate set is easily maintainable or
computable by following
Section~\ref{candidateComputingVF2}. A much faster method
has been designed for biological and sparse graphs, see the next
section for details.

   973 \subsubsection{Calculating the candidates for a node}

Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
task is not to maintain the candidate set, but to generate the
candidate nodes in $G_{large}$ for a given node $u\in V_{small}$.  In
case of an expanding problem type and a mapping $M$, if a node $v\in
V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall
u'\in V_{small} : ((u,u')\in E_{small} \wedge u' \text{ is covered by }
M) \Rightarrow (v,Pair(M,u'))\in E_{large}$. That is, the pair of each
covered neighbour of $u$ has to be a covered neighbour of $v$.

Consequently, the candidates can be generated in $\Theta(deg)$ time
if there exists a covered node in the component containing
$u$, and in linear time otherwise.
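
One possible reading of this remark, given here as an assumption rather than as the exact procedure of the implementation, is the following. If $u$ has a covered neighbour $u'$, then by the condition above every candidate $v$ satisfies
\[v\in \{\tilde{v}\in\Gamma_{large}(Pair(M,u')) : \tilde{v} \textrm{ is not covered by } M\},\]
so it suffices to scan the neighbours of the single node $Pair(M,u')$. By Claim~\ref{claim:MOclaim}, such a $u'$ exists whenever the component of $u$ already contains a covered node. Otherwise $u$ is the first node of its component, no adjacency constraint applies, and all uncovered nodes of $V_{large}$ have to be enumerated.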

   989 \subsubsection{Determining the node order}

   990 This section describes how the node order preprocessing method of

   991 VF2++ can efficiently be implemented.

To allow the use of lookup tables, the node labels are associated with
the numbers $\{0,1,\dots,|K|-1\}$, where $K$ is the set of the labels.
This enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
$\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
in $V_{large}$ having label $i$ minus the number of nodes in
$V_{small}$ having label $i$, which is easy to compute in
$\Theta(|V_{small}|+|V_{large}|)$ steps.

Representing $\mathcal{M}\subseteq V_{small}$ as an array of
size $|V_{small}|$, both the computation of the BFS tree and the processing of its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done in place by swapping nodes.

  1003 \subsubsection{Cutting rules}

In Section~\ref{VF2PPCuttingRules}, the cutting rules were
described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$
and $\tilde T_{large}$, which depend on the current mapping
(i.e. on the current state). The aim is to check the labeled cutting
rules of VF2++ in $\Theta(deg)$ time.

  1010 Firstly, suppose that these four sets are given in such a way, that

  1011 checking whether a node is in a certain set takes constant time,

  1012 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an

  1013 initially zero integer lookup table of size $|K|$. After incrementing

  1014 $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and

  1015 decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap

  1016 T_{large}(s)$, the first part of the cutting rules is checkable in

  1017 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$

  1018 to zero takes $\Theta(deg)$ time again, which makes it possible to use

  1019 the same table through the whole algorithm. The second part of the

  1020 cutting rules can be verified using the same method with $\tilde

  1021 T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and

  1022 $T_{large}$. Thus, the overall complexity is $\Theta(deg)$.
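
The bookkeeping described above can be sketched as follows for the first part of $LabCut_{IND}$ and $LabCut_{SUB}$. This is an illustrative arrangement only; the procedure name and the label of the sketch are hypothetical. Iterating over $\Gamma_{small}(u)\cap T_{small}(s)$ means scanning $\Gamma_{small}(u)$ and testing membership in $T_{small}(s)$ in constant time, and similarly for $G_{large}$.
\begin{algorithm}
\algtext*{EndIf}
\algtext*{EndProcedure}
\algtext*{EndFor}
\caption{\hspace{0.5cm}Sketch of checking the first part of the labeled cutting rules}\label{alg:labCutSketch}
\begin{algorithmic}[1]
\Procedure{FirstPartCut}{$u,v,s$} \Comment{$L$ is all zero on entry}
\For{$u'\in\Gamma_{small}(u)\cap T_{small}(s)$}
\State $L[lab(u')]:=L[lab(u')]+1$
\EndFor
\For{$v'\in\Gamma_{large}(v)\cap T_{large}(s)$}
\State $L[lab(v')]:=L[lab(v')]-1$
\EndFor
\State $cut:=false$
\For{$u'\in\Gamma_{small}(u)\cap T_{small}(s)$}
\If{$L[lab(u')]>0$} \Comment{too few neighbours of $v$ with this label}
\State $cut:=true$
\EndIf
\EndFor
\For{$w\in(\Gamma_{small}(u)\cap T_{small}(s))\cup(\Gamma_{large}(v)\cap T_{large}(s))$}
\State $L[lab(w)]:=0$ \Comment{reset the table for the next call}
\EndFor
\State \textbf{return} $cut$
\EndProcedure
\end{algorithmic}
\end{algorithm}
A positive entry can only belong to a label occurring among the neighbours of $u$, which is why scanning these suffices. For $LabCut_{ISO}$, the test would be $L[\cdot]\neq 0$, applied to the labels of the neighbours of both $u$ and $v$.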

Another integer lookup table storing the number of covered neighbours
of each node in $G_{large}$ gives all the information about the sets
$T_{large}$ and $\tilde T_{large}$, and it is maintainable in
$\Theta(deg)$ time when a pair is added or removed, by incrementing
or decrementing the proper entries. A further improvement is that the
values of $L[lab(u')]$ in case of checking $u$ depend only on
$u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an
array of pairs (label, number of such labels) can be stored to skip
the maintaining operations. Note that these arrays are at most of size
$deg$. Without this trick, the number of covered neighbours has to be
stored for each node of $G_{small}$ as well to get the sets
$T_{small}$ and $\tilde T_{small}$.

  1037 Using similar tricks, the consistency function can be evaluated in

  1038 $\Theta(deg)$ steps, as well.

  1040 \section{The VF2 Plus Algorithm}

The VF2 Plus algorithm is a recently improved version of VF2. It was
compared with the state-of-the-art algorithms in \cite{VF2Plus} and
has proven to be competitive with RI, the best algorithm on
biological graphs.  \\ A short summary of VF2 Plus follows, which uses
the notation and the conventions of the original paper.

  1047 \subsection{Ordering procedure}

VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$
with the lowest probability to find a pair in $V_{large}$ and the
highest number of connections with the nodes already sorted by the
algorithm.

  1053 \begin{definition}

  1054 $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and

  1055   $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$.

  1056 \end{definition}

  1057 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in

  1058 $V_{large}$

  1059 \newline

  1060 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in

  1061 $V_{large}$

  1062 \newline

$P(u):=P_{lab}(L)\cdot\sum_{d'\geq d}P_{deg}(d')$, where $L=lab(u)$ and
$d=deg(u)$\\ $M$ is the set of
already sorted nodes, $T$ is the set of candidate nodes to be
selected, and $degreeM$ of a node is the number of its neighbours in
$M$.

  1067 \begin{algorithm}

\algtext*{EndIf}% do not print "end if"
\algtext*{EndFor}% do not print "end for"
\algtext*{EndProcedure}% do not print "end procedure"

  1070 \algtext*{EndWhile}

  1071 \caption{}\label{alg:VF2PlusPseu}

  1072 \begin{algorithmic}[1]

  1073 \Procedure{VF2 Plus order}{} \State Select the node with the lowest

  1074 $P$.  \If {more nodes share the same $P$} \State select the one with

  1075 maximum degree \EndIf \If {more nodes share the same $P$ and have the

  1076   max degree} \State select the first \EndIf \State Put the selected

  1077 node in the set $M$. \label{alg:putIn} \State Put all its unsorted

  1078 neighbours in the set $T$.  \If {$M\neq V_{small}$} \State From set

  1079 $T$ select the node with maximum $degreeM$.  \If {more nodes have

  1080   maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf

  1081 \If {more nodes have maximum $degreeM$ and $P$} \State Select the

  1082 first.  \EndIf \State \textbf{goto \ref{alg:putIn}.}  \EndIf

  1083 \EndProcedure

  1084 \end{algorithmic}

  1085 \end{algorithm}

  1087 Using these notations, Algorithm~\ref{alg:VF2PlusPseu}

  1088 provides the description of the sorting procedure.

Note that $P(u)$ is not the exact probability of finding a consistent
pair for $u$ by choosing a node of $V_{large}$ randomly, since
$P_{lab}$ and $P_{deg}$ are not independent. Calculating the
real probability would, however, take quadratic time, which may be
reduced by using suitable lookup tables.
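
As a numerical illustration with made-up values, read the second factor of $P(u)$ as the probability that a random node of $V_{large}$ has degree at least $deg(u)$, as the definition of a feasible pair suggests. Suppose that $|V_{large}|=10$, that four nodes of $V_{large}$ are red, and that three nodes of $V_{large}$ have degree at least $3$. For a red node $u\in V_{small}$ with $deg(u)=3$,
\[P(u)=\frac{4}{10}\cdot\frac{3}{10}=0.12.\]
If all three nodes of degree at least $3$ happen to be red, then the exact fraction of feasible pairs for $u$ is $3/10=0.3$, which shows how the dependence between labels and degrees makes $P(u)$ only an estimate.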

  1096 \section{Experimental results}

This section compares the performance of VF2++ and VF2 Plus. Both
algorithms ran orders of magnitude faster than VF2, so VF2 is not
included in the comparison.

  1100 \subsection{Biological graphs}

The tests have been executed on a recent biological dataset created
for the International Contest on Pattern Search in Biological
Databases~\cite{Content}, which has been constructed from molecule,
protein and contact map graphs extracted from the Protein Data
Bank~\cite{ProteinDataBank}.

  1107 The molecule dataset contains small graphs with less than 100 nodes

  1108 and an average degree of less than 3. The protein dataset contains

  1109 graphs having 500-10 000 nodes and an average degree of 4, while the

  1110 contact map dataset contains graphs with 150-800 nodes and an average

  1111 degree of 20.  \\

  1113 In the following, the induced subgraph isomorphism and the graph

  1114 isomorphism will be examined.

  1116 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For run time results, please see Figure~\ref{fig:bioIND}.

In another experiment, the nodes of each graph in the database were
shuffled, and an isomorphism between the shuffled and the original
graph was searched for. The solution times are shown in Figure~\ref{fig:bioISO}.

  1124 \begin{figure}[H]

  1125 \vspace*{-2cm}

  1126 \hspace*{-1.5cm}

  1127 \begin{subfigure}[b]{0.55\textwidth}

  1128 \begin{figure}[H]

  1129 \begin{tikzpicture}[trim axis left, trim axis right]

  1130 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

  1131 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1132   west},scaled x ticks = false,x tick label style={/pgf/number

  1133   format/1000 sep = \thinspace}]

  1134 %\addplot+[only marks] table {proteinsOrig.txt};

  1135 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark

  1136   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};

  1137 \end{axis}

  1138 \end{tikzpicture}

  1139 \caption{In the case of molecules, there is not such a significant

  1140   difference, but VF2++ seems to be faster as the number of nodes

  1141   increases.}\label{fig:ISOMolecule}

  1142 \end{figure}

  1143 \end{subfigure}

  1144 \hspace*{1.5cm}

  1145 \begin{subfigure}[b]{0.55\textwidth}

  1146 \begin{figure}[H]

  1147 \begin{tikzpicture}[trim axis left, trim axis right]

  1148 \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

  1149 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1150   west},scaled x ticks = false,x tick label style={/pgf/number

  1151   format/1000 sep = \thinspace}]

  1152 %\addplot+[only marks] table {proteinsOrig.txt};

  1153 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark

  1154   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};

  1155 \end{axis}

  1156 \end{tikzpicture}

  1157 \caption{The results are closer to each other on contact maps, but

  1158   VF2++ still performs consistently better.}\label{fig:ISOContact}

  1159 \end{figure}

  1160 \end{subfigure}

  1162 \begin{center}

  1163 \vspace*{-0.5cm}

  1164 \begin{subfigure}[b]{0.55\textwidth}

  1165 \begin{figure}[H]

  1166 \begin{tikzpicture}[trim axis left, trim axis right]

  1167 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

  1168 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1169   west},scaled x ticks = false,x tick label style={/pgf/number

  1170   format/1000 sep = \thinspace}]

  1171 %\addplot+[only marks] table {proteinsOrig.txt};

  1172 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark

  1173   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};

  1174 \end{axis}

  1175 \end{tikzpicture}

\caption{On protein graphs, VF2 Plus has a superlinear time
  complexity, while VF2++ runs in near constant time. The difference
  is about two orders of magnitude on large graphs.}\label{fig:ISOProt}

  1179 \end{figure}

  1180 \end{subfigure}

  1181 \end{center}

  1182 \vspace*{-0.6cm}

\caption{\normalsize{Graph isomorphism on biological graphs}}\label{fig:bioISO}

  1184 \end{figure}

  1187 \begin{figure}[H]

  1188 \vspace*{-2cm}

  1189 \hspace*{-1.5cm}

  1190 \begin{subfigure}[b]{0.55\textwidth}

  1191 \begin{figure}[H]

  1192 \begin{tikzpicture}[trim axis left, trim axis right]

  1193 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

  1194 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1195   west},scaled x ticks = false,x tick label style={/pgf/number

  1196   format/1000 sep = \thinspace}]

  1197 %\addplot+[only marks] table {proteinsOrig.txt};

  1198 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark

  1199   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};

  1200 \end{axis}

  1201 \end{tikzpicture}

  1202 \caption{In the case of molecules, the algorithms have

  1203   similar behaviour, but VF2++ is almost two times faster even on such

  1204   small graphs.} \label{fig:INDMolecule}

  1205 \end{figure}

  1206 \end{subfigure}

  1207 \hspace*{1.5cm}

  1208 \begin{subfigure}[b]{0.55\textwidth}

  1209 \begin{figure}[H]

  1210 \begin{tikzpicture}[trim axis left, trim axis right]

  1211 \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

  1212 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1213   west},scaled x ticks = false,x tick label style={/pgf/number

  1214   format/1000 sep = \thinspace}]

  1215 %\addplot+[only marks] table {proteinsOrig.txt};

  1216 \addplot table {Orig/ContactMaps.128.txt};

  1217 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1218         {VF2PPLabel/ContactMaps.128.txt};

  1219 \end{axis}

  1220 \end{tikzpicture}

  1221 \caption{On contact maps, VF2++ runs in near constant time, while VF2

  1222   Plus has a near linear behaviour.} \label{fig:INDContact}

  1223 \end{figure}

  1224 \end{subfigure}

  1226 \begin{center}

  1227 \vspace*{-0.5cm}

  1228 \begin{subfigure}[b]{0.55\textwidth}

  1229 \begin{figure}[H]

  1230 \begin{tikzpicture}[trim axis left, trim axis right]

  1231   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid

  1232   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1233     west},scaled x ticks = false,x tick label style={/pgf/number

    format/1000 sep = \thinspace}]
    %\addplot+[only marks] table {proteinsOrig.txt};
    \addplot[mark=*,mark size=1.2pt,color=blue] table {Orig/Proteins.256.txt};
    \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};

  1238   \end{axis}

  1239   \end{tikzpicture}

\caption{Both algorithms have linear behaviour on protein
  graphs. VF2++ is more than 10 times faster than VF2
  Plus.} \label{fig:INDProt}

  1243 \end{figure}

  1244 \end{subfigure}

  1245 \end{center}

  1246 \vspace*{-0.5cm}

\caption{\normalsize{Induced subgraph isomorphism on biological graphs}}\label{fig:bioIND}

  1248 \end{figure}

  1254 \subsection{Random graphs}

This section compares VF2++ with VF2 Plus on large random graphs.
The node labels are uniformly distributed.  Let $\delta$ denote
the average degree.  The parameters of the problems solved in the
experiments are shown at the top of each chart.

  1259 \subsubsection{Graph isomorphism}

To evaluate the efficiency of the algorithms in the case of graph
isomorphism, connected graphs of less than 20 000 nodes have been
considered. A random graph was generated and its nodes were shuffled,
then an isomorphism between the two graphs had to be found.
Figure~\ref{fig:randISO} shows the runtime results
on graph sets of various densities.

  1269 \begin{figure}

  1270 \vspace*{-1.5cm}

  1271 \hspace*{-1.5cm}

  1272 \begin{subfigure}[b]{0.55\textwidth}

  1273 \begin{center}

  1274 \begin{tikzpicture}

  1275 \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1276 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1277   west},scaled x ticks = false,x tick label style={/pgf/number

  1278   format/1000 sep = \space}]

  1279 %\addplot+[only marks] table {proteinsOrig.txt};

  1280 \addplot table {randGraph/iso/vf2pIso5_1.txt};

  1281 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1282         {randGraph/iso/vf2ppIso5_1.txt};

  1283 \end{axis}

  1284 \end{tikzpicture}

  1285 \end{center}

  1286 \end{subfigure}

  1287 %\hspace{1cm}

  1288 \begin{subfigure}[b]{0.55\textwidth}

  1289 \begin{center}

  1290 \begin{tikzpicture}

  1291 \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1292 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1293   west},scaled x ticks = false,x tick label style={/pgf/number

  1294   format/1000 sep = \space}]

  1295 %\addplot+[only marks] table {proteinsOrig.txt};

  1296 \addplot table {randGraph/iso/vf2pIso10_1.txt};

  1297 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1298         {randGraph/iso/vf2ppIso10_1.txt};

  1299 \end{axis}

  1300 \end{tikzpicture}

  1301 \end{center}

  1302 \end{subfigure}

  1303 %%\hspace{1cm}

  1304 \hspace*{-1.5cm}

  1305 \begin{subfigure}[b]{0.55\textwidth}

  1306 \begin{center}

  1307 \begin{tikzpicture}

  1308 \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1309 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1310   west},scaled x ticks = false,x tick label style={/pgf/number

  1311   format/1000 sep = \space}]

  1312 %\addplot+[only marks] table {proteinsOrig.txt};

  1313 \addplot table {randGraph/iso/vf2pIso15_1.txt};

  1314 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1315         {randGraph/iso/vf2ppIso15_1.txt};

  1316 \end{axis}

  1317 \end{tikzpicture}

  1318 \end{center}

  1319      \end{subfigure}

  1320      \begin{subfigure}[b]{0.55\textwidth}

  1321 \begin{center}

  1322 \begin{tikzpicture}

  1323 \begin{axis}[title={Random ISO, $\delta = 35$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1324 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1325   west},scaled x ticks = false,x tick label style={/pgf/number

  1326   format/1000 sep = \space}]

  1327 %\addplot+[only marks] table {proteinsOrig.txt};

  1328 \addplot table {randGraph/iso/vf2pIso35_1.txt};

  1329 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1330         {randGraph/iso/vf2ppIso35_1.txt};

  1331 \end{axis}

  1332 \end{tikzpicture}

  1333 \end{center}

  1334 \end{subfigure}

  1335 \begin{subfigure}[b]{0.55\textwidth}

  1336 \hspace*{-1.5cm}

  1337 \begin{tikzpicture}

  1338 \begin{axis}[title={Random ISO, $\delta = 45$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1339 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1340   west},scaled x ticks = false,x tick label style={/pgf/number

  1341   format/1000 sep = \space}]

  1342 %\addplot+[only marks] table {proteinsOrig.txt};

  1343 \addplot table {randGraph/iso/vf2pIso45_1.txt};

  1344 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1345         {randGraph/iso/vf2ppIso45_1.txt};

  1346 \end{axis}

  1347 \end{tikzpicture}

  1348 \end{subfigure}

  1349 \hspace*{-1.5cm}

  1350 \begin{subfigure}[b]{0.55\textwidth}

  1351 \begin{tikzpicture}

  1352 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1353 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1354   west},scaled x ticks = false,x tick label style={/pgf/number

  1355   format/1000 sep = \thinspace}]

  1356 %\addplot+[only marks] table {proteinsOrig.txt};

  1357 \addplot table {randGraph/iso/vf2pIso100_1.txt};

  1358 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1359         {randGraph/iso/vf2ppIso100_1.txt};

  1360 \end{axis}

  1361 \end{tikzpicture}

  1362 \end{subfigure}

  1363 \vspace*{-0.8cm}

\caption{ISO on random graphs having an average degree of
  $\delta = 5, 10, 15, 35, 45, 100$.}\label{fig:randISO}

  1366 \end{figure}

Considering the graph isomorphism problem, VF2++ consistently
outperforms its rival, especially on sparse graphs. The reason for the
slightly superlinear behaviour of VF2++ on denser graphs is the
larger number of nodes in the BFS tree constructed in
Algorithm~\ref{alg:VF2PPPseu}.

  1383 \subsubsection{Induced subgraph isomorphism}

This section provides a comparison of VF2++ and VF2 Plus in the case
of induced subgraph isomorphism. In addition to the size of the large
graph, that of the small graph also dramatically influences the hardness
of a given problem, so the overall picture is provided by examining
small graphs of various sizes.

For each chart, a number $0<\rho<1$ has been fixed, and the following
procedure has been executed 150 times: a large graph $G_{large}$ is
generated, 10 of its induced subgraphs having $\rho\,|V_{large}|$ nodes
are chosen, and for each of the 10 subgraphs a mapping is sought by both
graph matching algorithms. The cases $\delta = 5, 10, 35$ and
$\rho = 0.05, 0.1, 0.3, 0.6, 0.8, 0.95$ have been examined; see
Figures~\ref{fig:randIND5}, \ref{fig:randIND10} and
\ref{fig:randIND35}.
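
As a rough illustration of the scale of this setup (a back-of-the-envelope
count, assuming that every repetition generates a fresh large graph and that
$\delta$ denotes the average degree, as in the figure captions), each chart
aggregates
\[
  150 \cdot 10 = 1500
\]
induced subgraph instances per algorithm, and the $3 \cdot 6 = 18$ examined
$(\delta,\rho)$ pairs give $18 \cdot 1500 = 27\,000$ instances in total. For
example, a large graph with $|V_{large}| = 10\,000$ and $\delta = 5$ has about
$\delta\,|V_{large}|/2 = 25\,000$ edges, and for $\rho = 0.05$ each of its
small graphs has $\rho\,|V_{large}| = 500$ nodes.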

  1403 \begin{figure}

  1404 \vspace*{-1.5cm}

  1405 \hspace*{-1.5cm}

  1406 \begin{subfigure}[b]{0.55\textwidth}

  1407 \begin{center}

  1408 \begin{tikzpicture}

  1409 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1410 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1411   west},scaled x ticks = false,x tick label style={/pgf/number

  1412   format/1000 sep = \space}]

  1413 %\addplot+[only marks] table {proteinsOrig.txt};

  1414 \addplot table {randGraph/ind/vf2pInd5_0.05.txt};

  1415 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1416         {randGraph/ind/vf2ppInd5_0.05.txt};

  1417 \end{axis}

  1418 \end{tikzpicture}

  1419 \end{center}

  1420      \end{subfigure}

  1421      \begin{subfigure}[b]{0.55\textwidth}

  1422 \begin{center}

  1423 \begin{tikzpicture}

  1424 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1425 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1426   west},scaled x ticks = false,x tick label style={/pgf/number

  1427   format/1000 sep = \space}]

  1428 %\addplot+[only marks] table {proteinsOrig.txt};

  1429 \addplot table {randGraph/ind/vf2pInd5_0.1.txt};

  1430 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1431         {randGraph/ind/vf2ppInd5_0.1.txt};

  1432 \end{axis}

  1433 \end{tikzpicture}

  1434 \end{center}

  1435 \end{subfigure}

  1436 \hspace*{-1.5cm}

  1437 \begin{subfigure}[b]{0.55\textwidth}

  1438 \begin{center}

  1439 \begin{tikzpicture}

  1440 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1441 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1442   west},scaled x ticks = false,x tick label style={/pgf/number

  1443   format/1000 sep = \space}]

  1444 %\addplot+[only marks] table {proteinsOrig.txt};

  1445 \addplot table {randGraph/ind/vf2pInd5_0.3.txt};

  1446 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1447         {randGraph/ind/vf2ppInd5_0.3.txt};

  1448 \end{axis}

  1449 \end{tikzpicture}

  1450 \end{center}

  1451      \end{subfigure}

  1452      \begin{subfigure}[b]{0.55\textwidth}

  1453 \begin{center}

  1454 \begin{tikzpicture}

  1455 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1456 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1457   west},scaled x ticks = false,x tick label style={/pgf/number

  1458   format/1000 sep = \space}]

  1459 %\addplot+[only marks] table {proteinsOrig.txt};

  1460 \addplot table {randGraph/ind/vf2pInd5_0.6.txt};

  1461 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1462         {randGraph/ind/vf2ppInd5_0.6.txt};

  1463 \end{axis}

  1464 \end{tikzpicture}

  1465 \end{center}

  1466 \end{subfigure}

  1467 \begin{subfigure}[b]{0.55\textwidth}

  1468 \hspace*{-1.5cm}

  1469 \begin{tikzpicture}

  1470 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1471 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1472   west},scaled x ticks = false,x tick label style={/pgf/number

  1473   format/1000 sep = \space}]

  1474 %\addplot+[only marks] table {proteinsOrig.txt};

  1475 \addplot table {randGraph/ind/vf2pInd5_0.8.txt};

  1476 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1477         {randGraph/ind/vf2ppInd5_0.8.txt};

  1478 \end{axis}

  1479 \end{tikzpicture}

  1480      \end{subfigure}

  1481      \hspace*{-1.5cm}

  1482      \begin{subfigure}[b]{0.55\textwidth}

  1483 \begin{tikzpicture}

  1484 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1485 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1486   west},scaled x ticks = false,x tick label style={/pgf/number

  1487   format/1000 sep = \thinspace}]

  1488 %\addplot+[only marks] table {proteinsOrig.txt};

  1489 \addplot table {randGraph/ind/vf2pInd5_0.95.txt};

  1490 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1491         {randGraph/ind/vf2ppInd5_0.95.txt};

  1492 \end{axis}

  1493 \end{tikzpicture}

  1494 \end{subfigure}

  1495 \vspace*{-0.8cm}

  1496 \caption{IND on graphs having an average degree of

  1497   5.}\label{fig:randIND5}

  1498 \end{figure}

  1501 \begin{figure}[H]

  1502 \vspace*{-1.5cm}

  1503 \hspace*{-1.5cm}

  1504 \begin{subfigure}[b]{0.55\textwidth}

  1505 \begin{center}

  1506 \hspace*{-0.5cm}

  1507 \begin{tikzpicture}

  1508 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1509 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1510   west},scaled x ticks = false,x tick label style={/pgf/number

  1511   format/1000 sep = \space}]

  1512 %\addplot+[only marks] table {proteinsOrig.txt};

  1513 \addplot table {randGraph/ind/vf2pInd10_0.05.txt};

  1514 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1515         {randGraph/ind/vf2ppInd10_0.05.txt};

  1516 \end{axis}

  1517 \end{tikzpicture}

  1518 \end{center}

  1519      \end{subfigure}

  1520      \begin{subfigure}[b]{0.55\textwidth}

  1521 \begin{center}

  1522      \hspace*{-0.5cm}

  1523 \begin{tikzpicture}

  1524 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1525 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1526   west},scaled x ticks = false,x tick label style={/pgf/number

  1527   format/1000 sep = \space}]

  1528 %\addplot+[only marks] table {proteinsOrig.txt};

  1529 \addplot table {randGraph/ind/vf2pInd10_0.1.txt};

  1530 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1531         {randGraph/ind/vf2ppInd10_0.1.txt};

  1532 \end{axis}

  1533 \end{tikzpicture}

  1534 \end{center}

  1535 \end{subfigure}

  1536 \hspace*{-1.5cm}

  1537 \begin{subfigure}[b]{0.55\textwidth}

  1538 \begin{center}

  1539 \begin{tikzpicture}

  1540 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1541 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1542   west},scaled x ticks = false,x tick label style={/pgf/number

  1543   format/1000 sep = \space}]

  1544 %\addplot+[only marks] table {proteinsOrig.txt};

  1545 \addplot table {randGraph/ind/vf2pInd10_0.3.txt};

  1546 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1547         {randGraph/ind/vf2ppInd10_0.3.txt};

  1548 \end{axis}

  1549 \end{tikzpicture}

  1550 \end{center}

  1551      \end{subfigure}

  1552      \begin{subfigure}[b]{0.55\textwidth}

  1553 \begin{center}

  1554 \begin{tikzpicture}

  1555 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1556 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1557   west},scaled x ticks = false,x tick label style={/pgf/number

  1558   format/1000 sep = \space}]

  1559 %\addplot+[only marks] table {proteinsOrig.txt};

  1560 \addplot table {randGraph/ind/vf2pInd10_0.6.txt};

  1561 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1562         {randGraph/ind/vf2ppInd10_0.6.txt};

  1563 \end{axis}

  1564 \end{tikzpicture}

  1565 \end{center}

  1566 \end{subfigure}

  1567 \hspace*{-1.5cm}

  1568 \begin{subfigure}[b]{0.55\textwidth}

  1569 \begin{tikzpicture}

  1570 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1571 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1572   west},scaled x ticks = false,x tick label style={/pgf/number

  1573   format/1000 sep = \space}]

  1574 %\addplot+[only marks] table {proteinsOrig.txt};

  1575 \addplot table {randGraph/ind/vf2pInd10_0.8.txt};

  1576 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1577         {randGraph/ind/vf2ppInd10_0.8.txt};

  1578 \end{axis}

  1579 \end{tikzpicture}

  1580      \end{subfigure}

  1581      \begin{subfigure}[b]{0.55\textwidth}

  1582 \begin{tikzpicture}

  1583 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1584 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1585   west},scaled x ticks = false,x tick label style={/pgf/number

  1586   format/1000 sep = \thinspace}]

  1587 %\addplot+[only marks] table {proteinsOrig.txt};

  1588 \addplot table {randGraph/ind/vf2pInd10_0.95.txt};

  1589 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1590         {randGraph/ind/vf2ppInd10_0.95.txt};

  1591 \end{axis}

  1592 \end{tikzpicture}

  1593 \end{subfigure}

  1594 \vspace*{-0.8cm}

  1595 \caption{IND on graphs having an average degree of

  1596   10.}\label{fig:randIND10}

  1597 \end{figure}

  1601 \begin{figure}[H]

  1602 \vspace*{-1.5cm}

  1603 \hspace*{-1.5cm}

  1604 \begin{subfigure}[b]{0.55\textwidth}

  1605 \begin{center}

  1606 \begin{tikzpicture}

  1607 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1608 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1609   west},scaled x ticks = false,x tick label style={/pgf/number

  1610   format/1000 sep = \space}]

  1611 %\addplot+[only marks] table {proteinsOrig.txt};

  1612 \addplot table {randGraph/ind/vf2pInd35_0.05.txt};

  1613 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1614         {randGraph/ind/vf2ppInd35_0.05.txt};

  1615 \end{axis}

  1616 \end{tikzpicture}

  1617 \end{center}

  1618      \end{subfigure}

  1619      \begin{subfigure}[b]{0.55\textwidth}

  1620 \begin{center}

  1621 \begin{tikzpicture}

  1622 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1623 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1624   west},scaled x ticks = false,x tick label style={/pgf/number

  1625   format/1000 sep = \space}]

  1626 %\addplot+[only marks] table {proteinsOrig.txt};

  1627 \addplot table {randGraph/ind/vf2pInd35_0.1.txt};

  1628 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1629         {randGraph/ind/vf2ppInd35_0.1.txt};

  1630 \end{axis}

  1631 \end{tikzpicture}

  1632 \end{center}

  1633 \end{subfigure}

  1634 \hspace*{-1.5cm}

  1635 \begin{subfigure}[b]{0.55\textwidth}

  1636 \begin{center}

  1637 \begin{tikzpicture}

  1638 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1639 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1640   west},scaled x ticks = false,x tick label style={/pgf/number

  1641   format/1000 sep = \space}]

  1642 %\addplot+[only marks] table {proteinsOrig.txt};

  1643 \addplot table {randGraph/ind/vf2pInd35_0.3.txt};

  1644 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1645         {randGraph/ind/vf2ppInd35_0.3.txt};

  1646 \end{axis}

  1647 \end{tikzpicture}

  1648 \end{center}

  1649      \end{subfigure}

  1650      \begin{subfigure}[b]{0.55\textwidth}

  1651 \begin{center}

  1652 \begin{tikzpicture}

  1653 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1654 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1655   west},scaled x ticks = false,x tick label style={/pgf/number

  1656   format/1000 sep = \space}]

  1657 %\addplot+[only marks] table {proteinsOrig.txt};

  1658 \addplot table {randGraph/ind/vf2pInd35_0.6.txt};

  1659 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1660         {randGraph/ind/vf2ppInd35_0.6.txt};

  1661 \end{axis}

  1662 \end{tikzpicture}

  1663 \end{center}

  1664 \end{subfigure}

  1665 \hspace*{-1.5cm}

  1666 \begin{subfigure}[b]{0.55\textwidth}

  1667 \begin{tikzpicture}

  1668 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1669 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1670   west},scaled x ticks = false,x tick label style={/pgf/number

  1671   format/1000 sep = \space}]

  1672 %\addplot+[only marks] table {proteinsOrig.txt};

  1673 \addplot table {randGraph/ind/vf2pInd35_0.8.txt};

  1674 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1675         {randGraph/ind/vf2ppInd35_0.8.txt};

  1676 \end{axis}

  1677 \end{tikzpicture}

  1678      \end{subfigure}

  1679      \begin{subfigure}[b]{0.55\textwidth}

  1680 \begin{tikzpicture}

  1681 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid

  1682 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north

  1683   west},scaled x ticks = false,x tick label style={/pgf/number

  1684   format/1000 sep = \thinspace}]

  1685 %\addplot+[only marks] table {proteinsOrig.txt};

  1686 \addplot table {randGraph/ind/vf2pInd35_0.95.txt};

  1687 \addplot[mark=triangle*,mark size=1.8pt,color=red] table

  1688         {randGraph/ind/vf2ppInd35_0.95.txt};

  1689 \end{axis}

  1690 \end{tikzpicture}

  1691 \end{subfigure}

  1692 \vspace*{-0.8cm}

  1693 \caption{IND on graphs having an average degree of

  1694   35.}\label{fig:randIND35}

  1695 \end{figure}

Based on these experiments, VF2++ is faster than VF2 Plus and is able to
handle really large graphs in milliseconds. Note that when $IND$ was
considered and the small graphs had proportionally few nodes ($\rho =
0.05$ or $\rho = 0.1$), VF2 Plus produced some inefficient node
orders (see, e.g., the $\delta=10$ case in
Figure~\ref{fig:randIND10}). If these instances had been excluded, the
charts would have looked similar to the other ones.
Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
Plus slow down slightly, but they remain practically usable even on
graphs having 10\,000 nodes.

  1713 \section{Conclusion}

  1714 In this paper, after providing a short summary of the recent

  1715 algorithms, a new graph matching algorithm based on VF2, called VF2++,

  1716 has been presented and analyzed from a practical viewpoint.

By recognizing the importance of the node order and determining an
efficient one, VF2++ is able to match graphs of thousands of nodes in
practically near-linear time, including preprocessing. In addition to
the proper order, VF2++ uses more efficient consistency and cutting
rules, which are easy to compute and enable the algorithm to prune
most of the unfruitful branches without going astray.

In order to show the efficiency of the new method, it has been
compared to VF2 Plus, which is the best competing algorithm according
to \cite{VF2Plus}.

The experiments show that VF2++ consistently outperforms VF2 Plus on
biological graphs. It seems to be asymptotically faster on protein and
on contact map graphs in the case of induced subgraph isomorphism,
while in the case of graph isomorphism, it shows clearly better
asymptotic behaviour on protein graphs.

Regarding random sparse graphs, not only has VF2++ proved to be
faster than VF2 Plus, but it also shows practically linear behaviour
in the case of both induced subgraph isomorphism and graph isomorphism.

  1741 %% The Appendices part is started with the command \appendix;

  1742 %% appendix sections are then done as normal sections

  1743 %% \appendix

  1745 %% \section{}

  1746 %% \label{}

  1748 %% If you have bibdatabase file and want bibtex to generate the

  1749 %% bibitems, please use

  1750 %%

  1751 \bibliographystyle{elsarticle-num} \bibliography{bibliography}

  1753 %% else use the following coding to input the bibitems directly in the

  1754 %% TeX file.

  1756 %% \begin{thebibliography}{00}

  1758 %% %% \bibitem{label}

  1759 %% %% Text of bibliographic item

  1761 %% \bibitem{}

  1763 %% \end{thebibliography}

  1765 \end{document}

  1766 \endinput

  1767 %%

  1768 %% End of file `elsarticle-template-num.tex'.

author	Madarasi Peter
	Wed, 23 Nov 2016 21:45:11 +0100
changeset 13	a21760ed63d6