alpar@0: %% 
alpar@0: %% Copyright 2007, 2008, 2009 Elsevier Ltd
alpar@0: %% 
alpar@0: %% This file is part of the 'Elsarticle Bundle'.
alpar@0: %% ---------------------------------------------
alpar@0: %% 
alpar@0: %% It may be distributed under the conditions of the LaTeX Project Public
alpar@0: %% License, either version 1.2 of this license or (at your option) any
alpar@0: %% later version.  The latest version of this license is in
alpar@0: %%    http://www.latex-project.org/lppl.txt
alpar@0: %% and version 1.2 or later is part of all distributions of LaTeX
alpar@0: %% version 1999/12/01 or later.
alpar@0: %% 
alpar@0: %% The list of all files belonging to the 'Elsarticle Bundle' is
alpar@0: %% given in the file `manifest.txt'.
alpar@0: %% 
alpar@0: 
alpar@0: %% Template article for Elsevier's document class `elsarticle'
alpar@0: %% with numbered style bibliographic references
alpar@0: %% SP 2008/03/01
alpar@0: 
alpar@0: \documentclass[preprint,12pt]{elsarticle}
alpar@0: 
alpar@0: %% Use the option review to obtain double line spacing
alpar@0: %% \documentclass[authoryear,preprint,review,12pt]{elsarticle}
alpar@0: 
alpar@0: %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
alpar@0: %% for a journal layout:
alpar@0: %% \documentclass[final,1p,times]{elsarticle}
alpar@0: %% \documentclass[final,1p,times,twocolumn]{elsarticle}
alpar@0: %% \documentclass[final,3p,times]{elsarticle}
alpar@0: %% \documentclass[final,3p,times,twocolumn]{elsarticle}
alpar@0: %% \documentclass[final,5p,times]{elsarticle}
alpar@0: %% \documentclass[final,5p,times,twocolumn]{elsarticle}
alpar@0: 
alpar@0: %% For including figures, graphicx.sty has been loaded in
alpar@0: %% elsarticle.cls. If you prefer to use the old commands
alpar@0: %% please give \usepackage{epsfig}
alpar@0: 
alpar@0: %% The amssymb package provides various useful mathematical symbols
alpar@0: \usepackage{amssymb}
alpar@0: %% The amsthm package provides extended theorem environments
alpar@0: %% \usepackage{amsthm}
alpar@0: 
alpar@0: %% The lineno packages adds line numbers. Start line numbering with
alpar@0: %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
alpar@0: %% for the whole article with \linenumbers.
alpar@0: %% \usepackage{lineno}
alpar@0: 
alpar@2: \usepackage{amsmath}
alpar@2: %% \usepackage[pdftex]{graphicx}
alpar@2: 
alpar@2: \usepackage{pgfplots}
alpar@2: \pgfplotsset{width=9cm}
alpar@2: \pgfplotsset{compat=1.8}
alpar@2: 
alpar@2: \usepackage{caption}
alpar@2: \usepackage{subcaption} 
alpar@2: 
alpar@2: \usepackage{algorithm}
alpar@2: \usepackage{algpseudocode}
alpar@2: \usepackage{tikz}
alpar@2: 
alpar@2: \usepackage{amsthm,amssymb}
alpar@2: \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}}
alpar@2: 
alpar@2: \newtheorem{theorem}{Theorem}[subsection]
alpar@2: \newtheorem{corollary}{Corollary}[theorem]
alpar@2: \newtheorem{claim}[theorem]{Claim}
alpar@2: 
alpar@2: \newtheorem{definition}{Definition}[subsection]
alpar@2: \newtheorem{notation}{Notation}[subsection]
alpar@2: \newtheorem{example}{Example}[subsection]
alpar@2: \usetikzlibrary{decorations.markings}
alpar@2: \let\oldproofname=\proofname
alpar@2: %% \renewcommand{\proofname}{\rm\bf{Proof:}}
alpar@2: 
Madarasi@7: \captionsetup{font=normalsize}
Madarasi@7: 
alpar@1: \journal{Discrete Applied Mathematics}
alpar@0: 
alpar@0: \begin{document}
alpar@0: 
alpar@0: \begin{frontmatter}
alpar@0: 
alpar@0: %% Title, authors and addresses
alpar@0: 
alpar@0: %% use the tnoteref command within \title for footnotes;
alpar@0: %% use the tnotetext command for theassociated footnote;
alpar@0: %% use the fnref command within \author or \address for footnotes;
alpar@0: %% use the fntext command for theassociated footnote;
alpar@0: %% use the corref command within \author for corresponding author footnotes;
alpar@0: %% use the cortext command for theassociated footnote;
alpar@0: %% use the ead command for the email address,
alpar@0: %% and the form \ead[url] for the home page:
alpar@0: %% \title{Title\tnoteref{label1}}
alpar@0: %% \tnotetext[label1]{}
alpar@0: %% \author{Name\corref{cor1}\fnref{label2}}
alpar@0: %% \ead{email address}
alpar@0: %% \ead[url]{home page}
alpar@0: %% \fntext[label2]{}
alpar@0: %% \cortext[cor1]{}
alpar@0: %% \address{Address\fnref{label3}}
alpar@0: %% \fntext[label3]{}
alpar@0: 
alpar@1: \title{Improved Algorithms for Matching Biological Graphs}
alpar@0: 
alpar@0: %% use optional labels to link authors explicitly to addresses:
alpar@0: %% \author[label1,label2]{}
alpar@0: %% \address[label1]{}
alpar@0: %% \address[label2]{}
alpar@0: 
alpar@1: \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi}
alpar@0: 
alpar@1: \address{Dept of Operations Research, ELTE}
alpar@0: 
alpar@0: \begin{abstract}
alpar@1: Subgraph isomorphism is a well-known NP-Complete problem, while its
alpar@1: special case, the graph isomorphism problem is one of the few problems
alpar@1: in NP neither known to be in P nor NP-Complete. Their appearance in
alpar@1: many fields of application such as pattern analysis, computer vision
alpar@1: questions and the analysis of chemical and biological systems has
alpar@1: fostered the design of various algorithms for handling special graph
alpar@1: structures.
alpar@0: 
alpar@1: The idea of using state space representation and checking some
alpar@1: conditions in each state to prune the search tree has made the VF2
alpar@1: algorithm one of the state of the art graph matching algorithms for
alpar@1: more than a decade. Recently, biological questions of ever increasing
alpar@1: importance have required more efficient, specialized algorithms.
alpar@1: 
alpar@1: This paper presents VF2++, a new algorithm based on the original VF2,
alpar@1: which runs significantly faster on most test cases and performs
alpar@1: especially well on special graph classes stemming from biological
alpar@1: questions. VF2++ handles graphs of thousands of nodes in practically
alpar@1: near linear time including preprocessing. Not only is it an improved
alpar@1: version of VF2, but in fact, it is by far the fastest existing
alpar@1: algorithm regarding biological graphs.
alpar@1: 
alpar@1: The reason for VF2++' superiority over VF2 is twofold. Firstly, taking
alpar@1: into account the structure and the node labeling of the graph, VF2++
alpar@1: determines a state order in which most of the unfruitful branches of
alpar@1: the search space can be pruned immediately. Secondly, introducing more
alpar@1: efficient - nevertheless still easier to compute - cutting rules
alpar@1: reduces the chance of going astray even further.
alpar@1: 
alpar@1: In addition to the usual subgraph isomorphism, specialized versions
alpar@1: for induced subgraph isomorphism and for graph isomorphism are
alpar@1: presented. VF2++ has gained a runtime improvement of one order of
alpar@1: magnitude respecting induced subgraph isomorphism and a better
alpar@1: asymptotical behaviour in the case of graph isomorphism problem.
alpar@1: 
alpar@1: After having provided the description of VF2++, in order to evaluate
alpar@1: its effectiveness, an extensive comparison to the contemporary other
alpar@1: algorithms is shown, using a wide range of inputs, including both real
alpar@1: life biological and chemical datasets and standard randomly generated
alpar@1: graph series.
alpar@1: 
alpar@1: The work was motivated and sponsored by QuantumBio Inc., and all the
alpar@1: developed algorithms are available as the part of the open source
alpar@1: LEMON graph and network optimization library
alpar@1: (http://lemon.cs.elte.hu).
alpar@0: \end{abstract}
alpar@0: 
alpar@0: \begin{keyword}
alpar@0: %% keywords here, in the form: keyword \sep keyword
alpar@0: 
alpar@0: %% PACS codes here, in the form: \PACS code \sep code
alpar@0: 
alpar@0: %% MSC codes here, in the form: \MSC code \sep code
alpar@0: %% or \MSC[2008] code \sep code (2000 is the default)
alpar@0: 
alpar@0: \end{keyword}
alpar@0: 
alpar@0: \end{frontmatter}
alpar@0: 
alpar@0: %% \linenumbers
alpar@0: 
alpar@0: %% main text
alpar@2: \section{Introduction}
alpar@2: \label{sec:intro}
alpar@2: 
alpar@3: In the last decades, combinatorial structures, and especially graphs
alpar@3: have been considered with ever increasing interest, and applied to the
alpar@3: solution of several new and revised questions.  The expressiveness,
alpar@3: the simplicity and the studiedness of graphs make them practical for
alpar@3: modelling and appear constantly in several seemingly independent
alpar@3: fields.  Bioinformatics and chemistry are amongst the most relevant
alpar@3: and most important fields.
alpar@2: 
alpar@3: Complex biological systems arise from the interaction and cooperation
alpar@3: of plenty of molecular components. Getting acquainted with such
alpar@3: systems at the molecular level has primary importance, since
alpar@3: protein-protein interaction, DNA-protein interaction, metabolic
alpar@3: interaction, transcription factor binding, neuronal networks, and
alpar@3: hormone signaling networks can be understood only this way.
alpar@2: 
alpar@3: For instance, a molecular structure can be considered as a graph,
alpar@3: whose nodes correspond to atoms and whose edges to chemical bonds. The
alpar@3: secondary structure of a protein can also be represented as a graph,
alpar@3: where nodes are associated with aminoacids and the edges with hydrogen
alpar@3: bonds. The nodes are often whole molecular components and the edges
alpar@3: represent some relationships among them.  The similarity and
alpar@3: dissimilarity of objects corresponding to nodes are incorporated to
alpar@3: the model by \emph{node labels}.  Many other chemical and biological
alpar@3: structures can easily be modeled in a similar way. Understanding such
alpar@3: networks basically requires finding specific subgraphs, which can not
alpar@3: avoid the application of graph matching algorithms.
alpar@2: 
alpar@3: Finally, let some of the other real-world fields related to some
alpar@3: variants of graph matching be briefly mentioned: pattern recognition
alpar@3: and machine vision \cite{HorstBunkeApplications}, symbol recognition
alpar@3: \cite{CordellaVentoSymbolRecognition}, face identification
alpar@3: \cite{JianzhuangYongFaceIdentification}.  \\
alpar@2: 
alpar@3: Subgraph and induced subgraph matching problems are known to be
alpar@3: NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is
alpar@3: one of the few problems in NP neither known to be in P nor
alpar@3: NP-Complete. Although polynomial time isomorphism algorithms are known
alpar@3: for various graph classes, like trees and planar
alpar@3: graphs\cite{PlanarGraphIso}, bounded valence
alpar@3: graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}
alpar@3: or permutation graphs\cite{PermGraphIso}.
alpar@2: 
alpar@3: In the following, some algorithms based on other approaches are
alpar@3: summarized, which do not need any restrictions on the graphs. However,
alpar@3: an overall polynomial behaviour is not expectable from such an
alpar@3: alternative, it may often have good performance, even on a graph class
alpar@3: for which polynomial algorithm is known. Note that this summary
alpar@3: containing only exact matching algorithms is far not complete, neither
alpar@3: does it cover all the recent algorithms.
alpar@2: 
alpar@3: The first practically usable approach was due to
alpar@4: Ullmann\cite{Ullmann} which is a commonly used depth-first
alpar@3: search based algorithm with a complex heuristic for reducing the
alpar@3: number of visited states. A major problem is its $\Theta(n^3)$ space
alpar@3: complexity, which makes it impractical in the case of big sparse
alpar@3: graphs.
alpar@2: 
alpar@4: In a recent paper, Ullmann\cite{UllmannBit} presents an
alpar@3: improved version of this algorithm based on a bit-vector solution for
alpar@3: the binary Constraint Satisfaction Problem.
alpar@2: 
alpar@4: The Nauty algorithm\cite{Nauty} transforms the two graphs to
alpar@3: a canonical form before starting to check for the isomorphism. It has
alpar@3: been considered as one of the fastest graph isomorphism algorithms,
alpar@3: although graph categories were shown in which it takes exponentially
alpar@3: many steps. This algorithm handles only the graph isomorphism problem.
alpar@2: 
alpar@4: The \emph{LAD} algorithm\cite{Lad} uses a depth-first search
alpar@3: strategy and formulates the matching as a Constraint Satisfaction
alpar@3: Problem to prune the search tree. The constraints are that the mapping
alpar@3: has to be injective and edge-preserving, hence it is possible to
alpar@3: handle new matching types as well.
alpar@2: 
alpar@3: The \textbf{RI} algorithm\cite{RI} and its variations are based on a
alpar@3: state space representation. After reordering the nodes of the graphs,
alpar@3: it uses some fast executable heuristic checks without using any
alpar@3: complex pruning rules. It seems to run really efficiently on graphs
alpar@3: coming from biology, and won the International Contest on Pattern
alpar@3: Search in Biological Databases\cite{Content}.
alpar@2: 
alpar@3: The currently most commonly used algorithm is the
alpar@3: \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was
alpar@3: designed for solving pattern matching and computer vision problems,
alpar@3: and has been one of the best overall algorithms for more than a
alpar@3: decade. Although, it can't be up to new specialized algorithms, it is
alpar@3: still widely used due to its simplicity and space efficiency. VF2 uses
alpar@3: a state space representation and checks some conditions in each state
alpar@3: to prune the search tree.
alpar@2: 
alpar@3: Our first graph matching algorithm was the first version of VF2 which
alpar@3: recognizes the significance of the node ordering, more opportunities
alpar@3: to increase the cutting efficiency and reduce its computational
alpar@3: complexity. This project was initiated and sponsored by QuantumBio
alpar@3: Inc.\cite{QUANTUMBIO} and the implementation --- along with a source
alpar@3: code --- has been published as a part of LEMON\cite{LEMON} open source
alpar@3: graph library.
alpar@2: 
alpar@3: This paper introduces \textbf{VF2++}, a new further improved algorithm
alpar@3: for the graph and (induced)subgraph isomorphism problem, which uses
alpar@3: efficient cutting rules and determines a node order in which VF2 runs
alpar@3: significantly faster on practical inputs.
alpar@2: 
alpar@3: Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has
alpar@3: been published. It is considered to be as efficient as the RI
alpar@3: algorithm and has a strictly better behavior on large graphs.  The
alpar@3: main idea of VF2 Plus is to precompute a heuristic node order of the
alpar@3: small graph, in which the VF2 works more efficiently.
alpar@2: 
alpar@2: \section{Problem Statement}
alpar@3: This section provides a detailed description of the problems to be
alpar@3: solved.
alpar@2: \subsection{Definitions}
alpar@2: 
alpar@3: Throughout the paper $G_{small}=(V_{small}, E_{small})$ and
alpar@3: $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.
alpar@2: \begin{definition}\label{sec:ismorphic}
alpar@3: $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M:
alpar@3:   V_{small} \longrightarrow V_{large}$ bijection, for which the
alpar@3:   following is true:
alpar@2: \begin{center}
alpar@3: $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
alpar@3:   (M(u),M(v))\in{E_{large}}$
alpar@2: \end{center}
alpar@2: \end{definition}
alpar@3: For the sake of simplicity in this paper subgraphs and induced
alpar@3: subgraphs are defined in a more general way than usual:
alpar@2: \begin{definition}
alpar@3: $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I:
alpar@3:   V_{small}\longrightarrow V_{large}$ injection, for which the
alpar@3:   following is true:
alpar@2: \begin{center}
alpar@2: $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$
alpar@2: \end{center}
alpar@2: \end{definition}
alpar@2: 
alpar@2: \begin{definition} 
alpar@3: $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists
alpar@3:   I: V_{small}\longrightarrow V_{large}$ injection, for which the
alpar@3:   following is true:
alpar@2: \begin{center}
alpar@3: $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
alpar@3:   (I(u),I(v))\in E_{large}$
alpar@2: \end{center}
alpar@2: \end{definition}
alpar@2: 
alpar@2: \begin{definition}
alpar@3: $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node
alpar@3:     label function}, where K is an arbitrary set. The elements in K
alpar@3:   are the \textbf{node labels}. Two nodes, u and v are said to be
alpar@3:   \textbf{equivalent}, if $lab(u)=lab(v)$.
alpar@2: \end{definition}
alpar@2: 
alpar@3: When node labels are also given, the matched nodes must have the same
alpar@3: labels.  For example, the node labeled isomorphism is phrased by
alpar@2: \begin{definition}
alpar@3: $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label
alpar@3:     function lab} if $\exists M: V_{small} \longrightarrow V_{large}$
alpar@3:   bijection, for which the following is true:
alpar@2: \begin{center}
alpar@3: $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
alpar@3:   (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} :
alpar@3:   lab(u)=lab(M(u)))$
alpar@2: \end{center}
alpar@2: \end{definition}
alpar@2: 
alpar@2: The other two definitions can be extended in the same way.
alpar@2: 
alpar@3: Note that edge label function can be defined similarly to node label
alpar@3: function, and all the definitions can be extended with additional
alpar@3: conditions, but it is out of the scope of this work.
alpar@2: 
alpar@3: The equivalence of two nodes is usually defined by another relation,
alpar@3: $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the
alpar@3: definition given above if R is an equivalence relation, which does not
alpar@3: mean restriction in biological and chemical applications.
alpar@2: 
alpar@2: \subsection{Common problems}\label{sec:CommProb}
alpar@2: 
alpar@3: The focus of this paper is on two extensively studied topics, the
alpar@3: subgraph isomorphism and its variations. However, the following
alpar@3: problems also appear in many applications.
alpar@2: 
alpar@3: The \textbf{subgraph matching problem} is the following: is
alpar@3: $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node
alpar@3: label?
alpar@2: 
alpar@3: The \textbf{induced subgraph matching problem} asks the same about the
alpar@3: existence of an induced subgraph.
alpar@2: 
alpar@3: The \textbf{graph isomorphism problem} can be defined as induced
alpar@3: subgraph matching problem where the sizes of the two graphs are equal.
alpar@2: 
alpar@3: In addition to existence, it may be needed to show such a subgraph, or
alpar@3: it may be necessary to list all of them.
alpar@2: 
alpar@3: It should be noted that some authors misleadingly refer to the term
alpar@3: \emph{subgraph isomorphism problem} as an \emph{induced subgraph
alpar@3:   isomorphism problem}.
alpar@2: 
alpar@3: The following sections give the descriptions of VF2, VF2++, VF2 Plus
alpar@3: and a particular comparison.
alpar@2: 
alpar@2: \section{The VF2 Algorithm}
alpar@3: This algorithm is the basis of both the VF2++ and the VF2 Plus.  VF2
alpar@4: is able to handle all the variations mentioned in Section
alpar@4:   \ref{sec:CommProb}.  Although it can also handle directed graphs,
alpar@3: for the sake of simplicity, only the undirected case will be
alpar@3: discussed.
alpar@2: 
alpar@2: 
alpar@2: \subsection{Common notations}
alpar@3: \indent Assume $G_{small}$ is searched in $G_{large}$.  The following
alpar@3: definitions and notations will be used throughout the whole paper.
alpar@2: \begin{definition}
alpar@3: A set $M\subseteq V_{small}\times V_{large}$ is called
alpar@3: \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears
alpar@3: in more than one pair in M.  That is, M uniquely associates some of
alpar@3: the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice
alpar@3: versa.
alpar@2: \end{definition}
alpar@2: 
alpar@2: \begin{definition}
alpar@3: Mapping M \textbf{covers} a node v, if there exists a pair in M, which
alpar@3: contains v.
alpar@2: \end{definition}
alpar@2: 
alpar@2: \begin{definition}
alpar@3: A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the
alpar@3: nodes in $V_{small}$.
alpar@2: \end{definition}
alpar@2: 
alpar@2: \begin{notation}
alpar@3: Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in
alpar@3: V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in
alpar@3: V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.
alpar@2: \end{notation}
alpar@2: 
alpar@2: \begin{notation}
alpar@3: Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node
alpar@3: exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$
alpar@3: and $v\in V_{small}\cup V_{large}$.
alpar@2: \end{notation}
alpar@2: 
alpar@2: Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique
alpar@2: 
alpar@3: The definitions of the isomorphism types can be rephrased on the
alpar@3: existence of a special whole mapping $M$, since it represents a
alpar@3: bijection. For example
alpar@2: \begin{center}
alpar@3: $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph
alpar@3:   isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v
alpar@3:   \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow
alpar@3:   (Pair(M,u),Pair(M,v))\in E_{large}$.
alpar@2: \end{center}
alpar@2: 
alpar@2: \begin{definition}
alpar@2: A set of whole mappings is called \textbf{problem type}.
alpar@2: \end{definition}
alpar@3: Throughout the paper, $\mathbf{PT}$ denotes a generic problem type
alpar@3: which can be substituted by any problem type.
alpar@2: 
alpar@3: A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using
alpar@3: this notations, VF2 searches a whole mapping $W$ of type $PT$.
alpar@2: 
alpar@3: For example the problem type of graph isomorphism problem is the
alpar@3: following.  A whole mapping $W$ is in $\mathbf{ISO}$, iff the
alpar@4: bijection represented by $W$ satisfies Definition~\ref{sec:ismorphic}.
alpar@4: The subgraph- and induced subgraph matching problems can be formalized
alpar@4: in a similar way. Let their problem types be denoted as $\mathbf{SUB}$
alpar@4: and $\mathbf{IND}$.
alpar@2: 
alpar@2: \begin{definition}
alpar@2: \label{expPT}
alpar@3: $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in
alpar@3: PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow
alpar@3: (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of
alpar@3: $G_{small}$ has to be mapped to an edge of $G_{large}$ for each
alpar@3: mapping in $PT$.
alpar@2: \end{definition}
alpar@2: 
alpar@2: Note that $ISO$, $SUB$ and $IND$ are expanding problem types.
alpar@2: 
alpar@3: This paper deals with the three problem types mentioned above only,
alpar@3: but the following generic definitions make it possible to handle other
alpar@3: types as well.  Although it may be challenging to find a proper
alpar@3: consistency function and an efficient cutting function.
alpar@2: 
alpar@2: \begin{definition}
alpar@3: Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a
alpar@3: \textbf{consistency function by } $\mathbf{PT}$, if the following
alpar@3: holds. If there exists whole mapping $W$ of type PT for which
alpar@3: $M\subseteq W$, then $Cons_{PT}(M)$ is true.
alpar@2: \end{definition}
alpar@2: 
alpar@2: \begin{definition} 
alpar@3: Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a
alpar@3: \textbf{cutting function by } $\mathbf{PT}$, if the following
alpar@3: holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a
alpar@3: whole mapping W of type PT.
alpar@2: \end{definition}
alpar@2: 
alpar@2: \begin{definition}
alpar@3: $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if
alpar@3:   $Cons_{PT}(M)$ is true.
alpar@2: \end{definition}
alpar@2: 
alpar@2: $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.
alpar@2: \begin{notation}
alpar@3: Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and
alpar@3: $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where
alpar@3: $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping.
alpar@2: \end{notation}
alpar@2: 
alpar@3: $Cons_{PT}$ will be used to check the consistency of the already
alpar@3: covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if
alpar@3: no whole consistent mapping can contain the current mapping.
alpar@2: 
alpar@2: \subsection{Overview of the algorithm}
alpar@3: VF2 uses a state space representation of mappings, $Cons_{PT}$ for
alpar@3: excluding inconsistency with the problem type and $Cut_{PT}$ for
alpar@3: pruning the search tree.  Each state $s$ of the matching process can
alpar@3: be associated with a mapping $M(s)$.
alpar@2: 
alpar@4: Algorithm~\ref{alg:VF2Pseu} is a high level description of
alpar@3: the VF2 matching algorithm.
alpar@2: 
alpar@2: 
alpar@2: \begin{algorithm}
Madarasi@13: \algtext*{EndIf}%ne nyomtasson end if-et
Madarasi@13: \algtext*{EndFor}%ne
Madarasi@13: \algtext*{EndProcedure}%ne nyomtasson ..
alpar@2: \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
alpar@2: \begin{algorithmic}[1]
alpar@2: 
alpar@3: \Procedure{VF2}{State $s$, ProblemType $PT$} \If{$M(s$) covers
alpar@3:   $V_{small}$} \State Output($M(s)$) \Else
alpar@2:   
alpar@3:   \State Compute the set $P(s)$ of the pairs candidate for inclusion
alpar@3:   in $M(s)$ \ForAll{$p\in{P(s)}$} \If{Cons$_{PT}$($p, M(s)$) $\wedge$
alpar@3:     $\neg$Cut$_{PT}$($p, M(s)$)} \State Compute the nascent state
alpar@3:   $\tilde{s}$ by adding $p$ to $M(s)$ \State \textbf{call}
alpar@3:   VF2($\tilde{s}$, $PT$) \EndIf \EndFor \EndIf \EndProcedure
alpar@2: \end{algorithmic}
alpar@2: \end{algorithm}
alpar@2: 
alpar@2: 
alpar@3: The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it
alpar@3: starts with an empty mapping.
alpar@2: 
alpar@3: For each state $s$, the algorithm computes $P(s)$, the set of
alpar@3: candidate node pairs for adding to the current state $s$.
alpar@2: 
alpar@3: For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and
alpar@3: $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and
alpar@3: $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup
alpar@3: \{p\}$ is computed, and the whole process is recursively applied to
alpar@3: $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it
alpar@3: can be proved that $s$ can not be extended to a whole mapping.
alpar@2: 
Madarasi@11: In order to make sure of the correctness, see
alpar@2: \begin{claim}
alpar@3: Through consistent mappings, only consistent whole mappings can be
alpar@3: reached, and all of the whole mappings are reachable through
alpar@3: consistent mappings.
alpar@2: \end{claim}
alpar@2: 
alpar@3: Note that a state may be reached in many different ways, since the
alpar@3: order of insertions into M does not influence the nascent mapping. In
alpar@3: fact, the number of different ways which lead to the same state can be
alpar@3: exponentially large. If $G_{small}$ and $G_{large}$ are circles with n
alpar@3: nodes and n different node labels, there exists exactly one graph
alpar@3: isomorphism between them, but it will be reached in $n!$ different
alpar@3: ways.
alpar@2: 
alpar@2: However, one may observe
alpar@2: 
alpar@2: \begin{claim}
alpar@2: \label{claim:claimTotOrd}
alpar@3: Let $\prec$ an arbitrary total ordering relation on $V_{small}$.  If
alpar@3: the algorithm ignores each $p=(u,v) \in P(s)$, for which
alpar@2: \begin{center}
alpar@2: $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$,
alpar@2: \end{center}
alpar@3: then no state can be reached more than ones and each state associated
alpar@3: with a whole mapping remains reachable.
alpar@2: \end{claim}
alpar@2: 
alpar@3: Note that the cornerstone of the improvements to VF2 is a proper
alpar@3: choice of a total ordering.
alpar@2: 
alpar@2: \subsection{The candidate set P(s)}
alpar@2: \label{candidateComputingVF2}
alpar@2: $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.
alpar@3: Suppose that $PT$ is an expanding problem type, see
alpar@4: Definition~\ref{expPT}.
alpar@2: 
alpar@2: \begin{notation}
alpar@3: Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by
alpar@3: $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}}
alpar@3: \wedge \tilde{u}$ is covered by $M(s)\}$, and
alpar@3: \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not
alpar@3: covered by
alpar@3: $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}}
alpar@3: \wedge \tilde{v}$ is covered by $M(s)\}$
alpar@2: \end{notation}
alpar@2: 
alpar@3: The set $P(s)$ includes the pairs of uncovered neighbours of covered
alpar@3: nodes and if there is not such a node pair, all the pairs containing
alpar@3: two uncovered nodes are added. Formally, let
alpar@2: \[
alpar@2:  P(s)\!=\!
alpar@2:   \begin{cases} 
alpar@3:    T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if }
alpar@3:    T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq
alpar@3:    \emptyset,\\ (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s))
alpar@3:    &\hspace{-0.15cm}otherwise.
alpar@2:   \end{cases}
alpar@2: \]
alpar@2: 
alpar@2: \subsection{Consistency}
alpar@3: This section defines the consistency functions for the different
alpar@4: problem types mentioned in Section~\ref{sec:CommProb}.
alpar@2: \begin{notation}
alpar@3: Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} :
alpar@3: (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large}
alpar@3:   (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$
alpar@2: \end{notation}
alpar@3: Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is
alpar@3: a state of the matching procedure, $M(s)$ is consistent mapping by
alpar@3: $PT$ and $lab(u)=lab(v)$.  $Cons_{PT}(p,M(s))$ checks whether
alpar@3: including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.
alpar@2: 
alpar@2: \subsubsection{Induced subgraph isomorphism}
alpar@3: $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow
alpar@3: (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small}
alpar@3: \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline The
alpar@3: following formulation gives an efficient way of calculating
alpar@3: $Cons_{IND}$.
alpar@2: \begin{claim}
alpar@3: $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v)
alpar@3:   \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge
alpar@3:   (\forall \tilde{u}\in \Gamma_{small}(u)
alpar@3:   \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
alpar@3:   consistency function in the case of $IND$.
alpar@2: \end{claim}
alpar@2: 
alpar@2: \subsubsection{Graph isomorphism}
alpar@3: $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$
alpar@3: $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by
alpar@3: $IND$.
alpar@2: \begin{claim}
alpar@3: $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and
alpar@3:   only if it is a consistency function by $IND$.
alpar@2: \end{claim}
alpar@2: \subsubsection{Subgraph isomorphism}
alpar@3: $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow
alpar@3: (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small}
alpar@3: \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
alpar@2: \newline
alpar@3: The following formulation gives an efficient way of calculating
alpar@3: $Cons_{SUB}$.
alpar@2: \begin{claim}
alpar@3: $Cons_{SUB}((u,v),M(s)):= (\forall \tilde{u}\in \Gamma_{small}(u)
alpar@3:   \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a
alpar@3:   consistency function by $SUB$.
alpar@2: \end{claim}
alpar@2: 
alpar@2: \subsection{Cutting rules}
alpar@3: $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently
alpar@3: verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can
alpar@3: be true only if it is impossible to extended $M(s)\cup \{p\}$ to a
alpar@3: whole mapping.
alpar@2: \begin{notation}
alpar@2: 
alpar@3: Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash
alpar@3: M_{small}(s))\backslash T_{small}(s)$, and
alpar@3: \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash
alpar@3: M_{large}(s))\backslash T_{large}(s)$.
alpar@2: \end{notation}
alpar@2: \subsubsection{Induced subgraph isomorphism}
alpar@2: \begin{claim}
alpar@3: $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
alpar@3:   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
alpar@3:   \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap
alpar@3:   \tilde{T}_{small}(s)|$ is a cutting function by $IND$.
alpar@2: \end{claim}
alpar@2: \subsubsection{Graph isomorphism}
alpar@3: Note that the cutting function of induced subgraph isomorphism defined
alpar@3: above is a cutting function by $ISO$, too, however it is less
alpar@3: efficient than the following while their computational complexity is
alpar@3: the same.
alpar@2: \begin{claim}
alpar@3: $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq
alpar@3:   |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
alpar@3:   \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap
alpar@3:   \tilde{T}_{small}(s)|$ is a cutting function by $ISO$.
alpar@2: \end{claim}
alpar@2: 
alpar@2: \subsubsection{Subgraph isomorphism}
alpar@2: \begin{claim}
alpar@3: $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
alpar@3:   |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by
alpar@3:   $SUB$.
alpar@2: \end{claim}
alpar@3: Note that there is a significant difference between induced and
alpar@3: non-induced subgraph isomorphism:
alpar@2: 
alpar@2: \begin{claim}
alpar@2: \label{claimSUB}
alpar@3: $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| <
alpar@3: |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap
alpar@3: \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$
alpar@3: is \textbf{not} a cutting function by $SUB$.
alpar@2: \end{claim}
alpar@2: 
alpar@2: \section{The VF2++ Algorithm}
alpar@3: Although any total ordering relation makes the search space of VF2 a
alpar@3: tree, its choice turns out to dramatically influence the number of
alpar@3: visited states. The goal is to determine an efficient one as quickly
alpar@3: as possible.
alpar@2: 
alpar@3: The main reason for VF2++' superiority over VF2 is twofold. Firstly,
alpar@3: taking into account the structure and the node labeling of the graph,
alpar@3: VF2++ determines a state order in which most of the unfruitful
alpar@3: branches of the search space can be pruned immediately. Secondly,
alpar@3: introducing more efficient --- nevertheless still easier to compute
alpar@3: --- cutting rules reduces the chance of going astray even further.
alpar@2: 
alpar@3: In addition to the usual subgraph isomorphism, specialized versions
alpar@3: for induced subgraph isomorphism and for graph isomorphism have been
alpar@3: designed. VF2++ has gained a runtime improvement of one order of
alpar@3: magnitude respecting induced subgraph isomorphism and a better
alpar@3: asymptotical behaviour in the case of graph isomorphism problem.
alpar@2: 
alpar@3: Note that a weaker version of the cutting rules and the more efficient
alpar@3: candidate set calculating were described in \cite{VF2Plus}, too.
alpar@2: 
alpar@3: It should be noted that all the methods described in this section are
alpar@3: extendable to handle directed graphs and edge labels as well.
alpar@2: 
alpar@3: The basic ideas and the detailed description of VF2++ are provided in
alpar@3: the following.
alpar@2: 
alpar@2: \subsection{Preparations}
alpar@2: \begin{claim}
alpar@2: \label{claim:claimCoverFromLeft}
alpar@3: The total ordering relation uniquely determines a node order, in which
alpar@3: the nodes of $V_{small}$ will be covered by VF2. From the point of
alpar@3: view of the matching procedure, this means, that always the same node
alpar@3: of $G_{small}$ will be covered on the d-th level.
alpar@2: \end{claim}
alpar@2: 
alpar@2: \begin{definition}
alpar@3: An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
alpar@3: $V_{small}$ is \textbf{matching order}, if exists $\prec$ total
alpar@3: ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds
alpar@3: pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$.
alpar@2: \end{definition}
alpar@2: 
alpar@2: \begin{claim}\label{claim:MOclaim}
alpar@3: A total ordering is matching order, iff the nodes of every component
alpar@3: form an interval in the node sequence, and every node connects to a
alpar@3: previous node in its component except the first node of the
alpar@3: component. The order of the components is arbitrary.  \\Formally
alpar@3: spoken, an order
alpar@3: $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of
alpar@3: $V_{small}$ is matching order $\Leftrightarrow$ $\forall
alpar@3: G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall
alpar@3: i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in
alpar@3: V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq
alpar@3: l\leq i \Rightarrow u_{l}\in V'_{small}) \wedge
alpar@3: (u_{\sigma{(k)}},u_{\sigma{(i)}})\in E'_{small}$, where $i,j,k,l\in
alpar@3: \{1,..,|V_{small}|\}$\newline
alpar@2: \end{claim}
alpar@2: 
alpar@3: To summing up, a total ordering always uniquely determines a matching
alpar@3: order, and every matching order can be determined by a total ordering,
alpar@3: however, more than one different total orderings may determine the
alpar@3: same matching order.
alpar@2: \subsection{Idea behind the algorithm}
alpar@3: The goal is to find a matching order in which the algorithm is able to
alpar@3: recognize inconsistency or prune the infeasible branches on the
alpar@3: highest levels and goes deep only if it is needed.
alpar@2: 
alpar@2: \begin{notation}
alpar@3: Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the
alpar@3: number of neighbours of u which are in H, where $u\in V_{small} $ and
alpar@3: $H\subseteq V_{small}$.
alpar@2: \end{notation}
alpar@2: 
alpar@3: The principal question is the following. Suppose a state $s$ is
alpar@3: given. For which node of $T_{small}(s)$ is the hardest to find a
alpar@3: consistent pair in $G_{large}$? The more covered neighbours a node in
alpar@3: $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has
alpar@3: ---, the more rarely satisfiable consistency constraints for its pair
alpar@3: are given.
alpar@2: 
alpar@3: In biology, most of the graphs are sparse, thus several nodes in
alpar@3: $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes
alpar@3: reasonable to define a secondary and a tertiary order between them.
alpar@3: The observation above proves itself to be as determining, that the
alpar@3: secondary ordering prefers nodes with the most uncovered neighbours
alpar@3: among which have the same $Conn_{M_{small}(s)}$ to increase
alpar@3: $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible.  The
alpar@3: tertiary ordering prefers nodes having the rarest uncovered labels.
alpar@2: 
alpar@3: Note that the secondary ordering is the same as the ordering by $deg$,
alpar@3: which is a static data in front of the above used.
alpar@2: 
alpar@3: These rules can easily result in a matching order which contains the
alpar@3: nodes of a long path successively, whose nodes may have low $Conn$ and
alpar@3: is easily matchable into $G_{large}$. To avoid that, a BFS order is
alpar@3: used, which provides the shortest possible paths.
alpar@2: \newline
alpar@2: 
alpar@3: In the following, some examples on which the VF2 may be slow are
alpar@3: described, although they are easily solvable by using a proper
alpar@3: matching order.
alpar@2: 
alpar@2: \begin{example}
alpar@3: Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways
alpar@3: without node labels. Let $u\in V_{small}$ and $v\in V_{large}$.
alpar@2: \newline
alpar@2: $lab(u):=black$
alpar@2: \newline
alpar@2: $lab(v):=black$
alpar@2: \newline
alpar@3: $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash
alpar@3: \{u\})$
alpar@2: \newline
alpar@3: $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash
alpar@3: \{v\})$
alpar@2: \newline
alpar@2: 
alpar@3: Now, any mapping by the node label $lab$ must contain $(u,v)$, since
alpar@3: $u$ is black and no node in $V_{large}$ has a black label except
alpar@3: $v$. If unfortunately $u$ were the last node which will get covered,
alpar@3: VF2 would check only in the last steps, whether $u$ can be matched to
alpar@3: $v$.
alpar@2: \newline
alpar@3: However, had $u$ been the first matched node, u would have been
alpar@3: matched immediately to v, so all the mappings would have been
alpar@3: precluded in which node labels can not correspond.
alpar@2: \end{example}
alpar@2: 
alpar@2: \begin{example}
alpar@3: Suppose there is no node label given, $G_{small}$ is a small graph and
alpar@3: can not be mapped into $G_{large}$ and $u\in V_{small}$.
alpar@2: \newline
alpar@3: Let $G'_{small}:=(V_{small}\cup
alpar@3: \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup
alpar@3: \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is,
alpar@3: $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint
alpar@3: from $G_{small}$ and one of its starting points is connected to $u\in
alpar@3: V_{small}\}$.
alpar@2: \newline
alpar@3: Is there a subgraph of $G_{large}$, which is isomorph with
alpar@3: $G'_{small}$?
alpar@2: \newline
alpar@3: If unfortunately the nodes of the path were the first $k$ nodes in the
alpar@3: matching order, the algorithm would iterate through all the possible k
alpar@3: long paths in $G_{large}$, and it would recognize that no path can be
alpar@3: extended to $G'_{small}$.
alpar@2: \newline
alpar@3: However, had it started by the matching of $G_{small}$, it would not
alpar@3: have matched any nodes of the path.
alpar@2: \end{example}
alpar@2: 
alpar@3: These examples may look artificial, but the same problems also appear
Madarasi@7: in real-world instances, even though in a less obvious way.
alpar@2: 
alpar@2: \subsection{Total ordering}
alpar@3: Instead of the total ordering relation, the matching order will be
alpar@3: searched directly.
alpar@2: \begin{notation}
alpar@3: Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} :
alpar@3: l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ ,
alpar@3: where $l$ is a label and $\mathcal{M}\subseteq V_{small}$.
alpar@2: \end{notation}
alpar@2: 
alpar@2: \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$.
alpar@2: \end{definition}
alpar@2: 
alpar@2: \begin{algorithm}
Madarasi@8: \algtext*{EndIf}
Madarasi@8: \algtext*{EndProcedure}
alpar@2: \algtext*{EndWhile}
Madarasi@13: \algtext*{EndFor}
alpar@2: \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu}
alpar@2: \begin{algorithmic}[1]
alpar@3: \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$
alpar@3: \Comment{matching order} \While{$V_{small}\backslash \mathcal{M}
alpar@3:   \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg
alpar@3: min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash
alpar@3: \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
alpar@3: root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
alpar@3: $d$-th level \State Process $V_d$ \Comment{See Algorithm
Madarasi@8:   \ref{alg:VF2PPProcess1}} \EndFor
alpar@3: \EndWhile \EndProcedure
alpar@2: \end{algorithmic}
alpar@2: \end{algorithm}
alpar@2: 
alpar@2: \begin{algorithm}
Madarasi@8: \algtext*{EndIf}
Madarasi@8: \algtext*{EndProcedure}%ne nyomtasson ..
alpar@2: \algtext*{EndWhile}
Madarasi@8: \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1}
alpar@2: \begin{algorithmic}[1]
alpar@3: \Procedure{VF2++ProcessLevel1}{$V_{d}$} \While{$V_d\neq\emptyset$}
alpar@3: \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg
alpar@3: max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$
alpar@3: \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh
alpar@3: $F_\mathcal{M}$ \EndWhile \EndProcedure
alpar@2: \end{algorithmic}
alpar@2: \end{algorithm}
alpar@2: 
alpar@4: Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
alpar@4: matching order procedure of VF2++. It computes a BFS tree for each
alpar@3: component in ascending order of their rarest $lab$ and largest $deg$,
alpar@4: whose root vertex is the component's minimal
Madarasi@8: node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending
Madarasi@8: lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately
Madarasi@8: to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately.
alpar@2: 
alpar@4: Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}
alpar@4: provides a matching order.
alpar@2: 
alpar@2: 
alpar@2: \subsection{Cutting rules}
alpar@2: \label{VF2PPCuttingRules}
alpar@3: This section presents the cutting rules of VF2++, which are improved
alpar@3: by using extra information coming from the node labels.
alpar@2: \begin{notation}
alpar@3: Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l
alpar@3: \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and
alpar@3: $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge
alpar@3: \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in
alpar@3: V_{large}$ and $l$ is a label.
alpar@2: \end{notation}
alpar@2: 
alpar@2: \subsubsection{Induced subgraph isomorphism}
alpar@2: \begin{claim}
alpar@2: \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND.
alpar@2: \end{claim}
Madarasi@13: 
alpar@2: \subsubsection{Graph isomorphism}
alpar@2: \begin{claim}
alpar@2: \[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\  \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by ISO.
alpar@2: \end{claim}
alpar@2: 
alpar@2: \subsubsection{Subgraph isomorphism}
alpar@2: \begin{claim}
Madarasi@7: \[LabCut_{SUB}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\] is a cutting function by SUB.
alpar@2: \end{claim}
alpar@2: 
alpar@2: 
alpar@2: 
alpar@2: \subsection{Implementation details}
alpar@3: This section provides a detailed summary of an efficient
alpar@3: implementation of VF2++.
alpar@2: \subsubsection{Storing a mapping}
alpar@3: After fixing an arbitrary node order ($u_0, u_1, ..,
alpar@3: u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store
alpar@3: the current mapping in the following way.
alpar@2: \[
alpar@3:  M[i] =
alpar@2:   \begin{cases} 
alpar@3:    v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INVALID &
alpar@3:    if\ no\ node\ has\ been\ mapped\ to\ u_i.
alpar@2:   \end{cases}
alpar@2: \]
alpar@3: Where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$
alpar@3: means "no node".
alpar@2: \subsubsection{Avoiding the recurrence}
alpar@4: The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
Madarasi@9: as a \textit{while loop}, which has a loop counter $depth$ denoting the
Madarasi@9: all-time depth of the recursion. Fixing a matching order, let $M$
Madarasi@9: denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft},
alpar@3: $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before
Madarasi@9: $depth$. $M[depth]$ changes
alpar@3: while the state is being processed, but the property is held before
alpar@3: both stepping back to a predecessor state and exploring a successor
alpar@3: state.
alpar@2: 
alpar@3: The necessary part of the candidate set is easily maintainable or
alpar@3: computable by following
alpar@4: Section~\ref{candidateComputingVF2}. A much faster method
alpar@3: has been designed for biological- and sparse graphs, see the next
alpar@3: section for details.
alpar@2: 
alpar@2: \subsubsection{Calculating the candidates for a node}
alpar@4: Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
alpar@3: task is not to maintain the candidate set, but to generate the
alpar@3: candidate nodes in $G_{large}$ for a given node $u\in V_{small}$.  In
alpar@3: case of an expanding problem type and $M$ mapping, if a node $v\in
alpar@3: V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall
alpar@3: u'\in V_{small} : (u,u')\in
alpar@3: E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in
alpar@3: E_{large}$. That is, each covered neighbour of $u$ has to be mapped to
alpar@3: a covered neighbour of $v$.
alpar@2: 
alpar@3: Having said that, an algorithm running in $\Theta(deg)$ time is
alpar@3: describable if there exists a covered node in the component containing
Madarasi@9: $u$, and a linear one other wise.
alpar@2: 
alpar@2: 
alpar@2: \subsubsection{Determining the node order}
alpar@3: This section describes how the node order preprocessing method of
alpar@3: VF2++ can efficiently be implemented.
alpar@2: 
alpar@3: For using lookup tables, the node labels are associated with the
alpar@3: numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It
Madarasi@9: enables $F_\mathcal{M}$ to be stored in an array. At first, the node order
alpar@3: $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes
alpar@3: in $V_{small}$ having label i, which is easy to compute in
alpar@3: $\Theta(|V_{small}|)$ steps.
alpar@2: 
Madarasi@9: Representing $\mathcal{M}\subseteq V_{small}$ as an array of
Madarasi@9: size $|V_{small}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes.
alpar@2: 
alpar@2: \subsubsection{Cutting rules}
alpar@4: In Section~\ref{VF2PPCuttingRules}, the cutting rules were
alpar@3: described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$
alpar@3: and $\tilde T_{large}$, which are dependent on the all-time mapping
alpar@3: (i.e. on the all-time state). The aim is to check the labeled cutting
alpar@3: rules of VF2++ in $\Theta(deg)$ time.
alpar@2: 
alpar@3: Firstly, suppose that these four sets are given in such a way, that
alpar@3: checking whether a node is in a certain set takes constant time,
alpar@3: e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an
alpar@3: initially zero integer lookup table of size $|K|$. After incrementing
alpar@3: $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and
alpar@3: decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap
alpar@3: T_{large}(s)$, the first part of the cutting rules is checkable in
alpar@3: $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$
alpar@3: to zero takes $\Theta(deg)$ time again, which makes it possible to use
Madarasi@9: the same table through the whole algorithm. The second part of the
alpar@3: cutting rules can be verified using the same method with $\tilde
alpar@3: T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and
alpar@3: $T_{large}$. Thus, the overall complexity is $\Theta(deg)$.
alpar@2: 
alpar@3: An other integer lookup table storing the number of covered neighbours
alpar@3: of each node in $G_{large}$ gives all the information about the sets
alpar@3: $T_{large}$ and $\tilde T_{large}$, which is maintainable in
alpar@3: $\Theta(deg)$ time when a pair is added or substracted by incrementing
alpar@3: or decrementing the proper indices. A further improvement is that the
alpar@3: values of $L[lab(u')]$ in case of checking $u$ is dependent only on
alpar@3: $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an
alpar@3: array of pairs (label, number of such labels) can be stored to skip
alpar@3: the maintaining operations. Note that these arrays are at most of size
alpar@3: $deg$. Skipping this trick, the number of covered neighbours has to be
alpar@3: stored for each node of $G_{small}$ as well to get the sets
alpar@3: $T_{small}$ and $\tilde T_{small}$.
alpar@2: 
alpar@3: Using similar tricks, the consistency function can be evaluated in
alpar@3: $\Theta(deg)$ steps, as well.
alpar@2: 
alpar@2: \section{The VF2 Plus Algorithm}
alpar@3: The VF2 Plus algorithm is a recently improved version of VF2. It was
alpar@3: compared with the state of the art algorithms in \cite{VF2Plus} and
alpar@3: has proven itself to be competitive with RI, the best algorithm on
alpar@3: biological graphs.  \\ A short summary of VF2 Plus follows, which uses
alpar@3: the notation and the conventions of the original paper.
alpar@2: 
alpar@2: \subsection{Ordering procedure}
alpar@3: VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$
alpar@3: with the lowest probability to find a pair in $V_{small}$ and the
alpar@3: highest number of connections with the nodes already sorted by the
alpar@3: algorithm.
alpar@2: 
alpar@2: \begin{definition}
alpar@3: $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and
alpar@3:   $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$.
alpar@2: \end{definition}
alpar@3: $P_{lab}(L):=$ a priori probability to find a node with label $L$ in
alpar@3: $V_{large}$
alpar@2: \newline
alpar@3: $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in
alpar@3: $V_{large}$
alpar@2: \newline
alpar@3: $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ $M$ is the set of
alpar@3: already sorted nodes, $T$ is the set of nodes candidate to be
alpar@3: selected, and $degreeM$ of a node is the number of its neighbours in
alpar@3: $M$.
alpar@2: \begin{algorithm}
alpar@3: \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne
alpar@3: nyomtasson ..  \algtext*{EndProcedure}%ne nyomtasson ..
alpar@2: \algtext*{EndWhile}
alpar@2: \caption{}\label{alg:VF2PlusPseu}
alpar@2: \begin{algorithmic}[1]
alpar@3: \Procedure{VF2 Plus order}{} \State Select the node with the lowest
alpar@3: $P$.  \If {more nodes share the same $P$} \State select the one with
alpar@3: maximum degree \EndIf \If {more nodes share the same $P$ and have the
alpar@3:   max degree} \State select the first \EndIf \State Put the selected
alpar@3: node in the set $M$. \label{alg:putIn} \State Put all its unsorted
alpar@3: neighbours in the set $T$.  \If {$M\neq V_{small}$} \State From set
alpar@3: $T$ select the node with maximum $degreeM$.  \If {more nodes have
alpar@3:   maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf
alpar@3: \If {more nodes have maximum $degreeM$ and $P$} \State Select the
alpar@3: first.  \EndIf \State \textbf{goto \ref{alg:putIn}.}  \EndIf
alpar@2: \EndProcedure
alpar@2: \end{algorithmic}
alpar@2: \end{algorithm}
alpar@2: 
alpar@4: Using these notations, Algorithm~\ref{alg:VF2PlusPseu}
alpar@3: provides the description of the sorting procedure.
alpar@2: 
alpar@3: Note that $P(u)$ is not the exact probability of finding a consistent
alpar@3: pair for $u$ by choosing a node of $V_{large}$ randomly, since
alpar@3: $P_{lab}$ and $P_{deg}$ are not independent, though calculating the
alpar@3: real probability would take quadratic time, which may be reduced by
alpar@3: using fittingly lookup tables.
alpar@2: 
alpar@2: \section{Experimental results}
alpar@3: This section compares the performance of VF2++ and VF2 Plus. Both
alpar@3: algorithms have run faster with orders of magnitude than VF2, thus its
alpar@3: inclusion was not reasonable.
alpar@2: \subsection{Biological graphs}
alpar@3: The tests have been executed on a recent biological dataset created
alpar@3: for the International Contest on Pattern Search in Biological
Madarasi@7: Databases\cite{Content}, which has been constructed of molecule,
Madarasi@7: protein and contact map graphs extracted from the Protein Data
alpar@3: Bank\cite{ProteinDataBank}.
alpar@2: 
alpar@3: The molecule dataset contains small graphs with less than 100 nodes
alpar@3: and an average degree of less than 3. The protein dataset contains
alpar@3: graphs having 500-10 000 nodes and an average degree of 4, while the
alpar@3: contact map dataset contains graphs with 150-800 nodes and an average
alpar@3: degree of 20.  \\
alpar@2: 
alpar@3: In the following, the induced subgraph isomorphism and the graph
alpar@3: isomorphism will be examined.
alpar@2: 
Madarasi@7: This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For run time results, please see Figure~\ref{fig:bioIND}.
Madarasi@7: 
Madarasi@7: In an other experiment, the nodes of each graph in the database had been
Madarasi@7: shuffled, and an isomorphism between the shuffled and the original
Madarasi@7: graph was searched. The solution times are shown on Figure~\ref{fig:bioISO}.
Madarasi@7: 
Madarasi@7: 
alpar@2: 
alpar@2: \begin{figure}[H]
Madarasi@7: \vspace*{-2cm}
Madarasi@7: \hspace*{-1.5cm}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
Madarasi@7: \begin{figure}[H]
Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right]
Madarasi@7: \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
Madarasi@7:   west},scaled x ticks = false,x tick label style={/pgf/number
Madarasi@7:   format/1000 sep = \thinspace}]
Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt};
Madarasi@7: \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark
Madarasi@7:   size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt};
Madarasi@7: \end{axis}
Madarasi@7: \end{tikzpicture}
Madarasi@7: \caption{In the case of molecules, there is not such a significant
Madarasi@7:   difference, but VF2++ seems to be faster as the number of nodes
Madarasi@7:   increases.}\label{fig:ISOMolecule}
Madarasi@7: \end{figure}
Madarasi@7: \end{subfigure}
Madarasi@7: \hspace*{1.5cm}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
Madarasi@7: \begin{figure}[H]
Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right]
Madarasi@7: \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
Madarasi@7:   west},scaled x ticks = false,x tick label style={/pgf/number
Madarasi@7:   format/1000 sep = \thinspace}]
Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt};
Madarasi@7: \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark
Madarasi@7:   size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt};
Madarasi@7: \end{axis}
Madarasi@7: \end{tikzpicture}
Madarasi@7: \caption{The results are closer to each other on contact maps, but
Madarasi@7:   VF2++ still performs consistently better.}\label{fig:ISOContact}
Madarasi@7: \end{figure}
Madarasi@7: \end{subfigure}
Madarasi@7: 
alpar@2: \begin{center}
Madarasi@7: \vspace*{-0.5cm}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
Madarasi@7: \begin{figure}[H]
Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right]
Madarasi@7: \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
Madarasi@7:   west},scaled x ticks = false,x tick label style={/pgf/number
Madarasi@7:   format/1000 sep = \thinspace}]
Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt};
Madarasi@7: \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
Madarasi@7:   size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
Madarasi@7: \end{axis}
Madarasi@7: \end{tikzpicture}
Madarasi@7: \caption{On protein graphs, VF2 Plus has a super linear time
Madarasi@7:   complexity, while VF2++ runs in near constant time. The difference
Madarasi@7:   is about two order of magnitude on large graphs.}\label{fig:ISOProt}
Madarasi@7: \end{figure}
Madarasi@7: \end{subfigure}
Madarasi@7: \end{center}
Madarasi@7: \vspace*{-0.6cm}
Madarasi@7: \caption{\normalsize{Graph isomomorphism on biological graphs}}\label{fig:bioISO}
Madarasi@7: \end{figure}
Madarasi@7: 
Madarasi@7: 
Madarasi@7: \begin{figure}[H]
Madarasi@7: \vspace*{-2cm}
Madarasi@7: \hspace*{-1.5cm}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
Madarasi@7: \begin{figure}[H]
Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right]
Madarasi@7: \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
Madarasi@7:   west},scaled x ticks = false,x tick label style={/pgf/number
Madarasi@7:   format/1000 sep = \thinspace}]
Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt};
Madarasi@7: \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark
Madarasi@7:   size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt};
Madarasi@7: \end{axis}
Madarasi@7: \end{tikzpicture}
Madarasi@7: \caption{In the case of molecules, the algorithms have
Madarasi@7:   similar behaviour, but VF2++ is almost two times faster even on such
Madarasi@7:   small graphs.} \label{fig:INDMolecule}
Madarasi@7: \end{figure}
Madarasi@7: \end{subfigure}
Madarasi@7: \hspace*{1.5cm}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
Madarasi@7: \begin{figure}[H]
Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right]
Madarasi@7: \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
Madarasi@7:   west},scaled x ticks = false,x tick label style={/pgf/number
Madarasi@7:   format/1000 sep = \thinspace}]
Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt};
Madarasi@7: \addplot table {Orig/ContactMaps.128.txt};
Madarasi@7: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
Madarasi@7:         {VF2PPLabel/ContactMaps.128.txt};
Madarasi@7: \end{axis}
Madarasi@7: \end{tikzpicture}
Madarasi@7: \caption{On contact maps, VF2++ runs in near constant time, while VF2
Madarasi@7:   Plus has a near linear behaviour.} \label{fig:INDContact}
Madarasi@7: \end{figure}
Madarasi@7: \end{subfigure}
Madarasi@7: 
Madarasi@7: \begin{center}
Madarasi@7: \vspace*{-0.5cm}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
Madarasi@7: \begin{figure}[H]
Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right]
alpar@2:   \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
alpar@3:   =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:     west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:     format/1000 sep = \thinspace}] %\addplot+[only marks] table
alpar@3:     {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue]
alpar@3:     table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark
alpar@3:       size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt};
alpar@2:   \end{axis}
alpar@2:   \end{tikzpicture}
alpar@3: \caption{Both the algorithms have linear behaviour on protein
alpar@3:   graphs. VF2++ is more than 10 times faster than VF2
alpar@3:   Plus.} \label{fig:INDProt}
alpar@2: \end{figure}
Madarasi@7: \end{subfigure}
alpar@2: \end{center}
Madarasi@7: \vspace*{-0.5cm}
Madarasi@7: \caption{\normalsize{Graph isomomorphism on biological graphs}}\label{fig:bioIND}
alpar@2: \end{figure}
alpar@2: 
alpar@2: 
alpar@2: 
alpar@2: 
alpar@2: 
alpar@2: \subsection{Random graphs}
alpar@3: This section compares VF2++ with VF2 Plus on random graphs of a large
alpar@3: size. The node labels are uniformly distributed.  Let $\delta$ denote
alpar@3: the average degree.  For the parameters of problems solved in the
alpar@3: experiments, please see the top of each chart.
alpar@2: \subsubsection{Graph isomorphism}
alpar@3: To evaluate the efficiency of the algorithms in the case of graph
alpar@3: isomorphism, connected graphs of less than 20 000 nodes have been
alpar@3: considered. Generating a random graph and shuffling its nodes, an
Madarasi@7: isomorphism had to be found. Figure \ref{fig:randISO} shows the runtime results
alpar@4: on graph sets of various density.
alpar@2: 
Madarasi@7: 
Madarasi@7: 
Madarasi@7: 
Madarasi@12: \begin{figure}
Madarasi@7: \vspace*{-1.5cm}
Madarasi@7: \hspace*{-1.5cm}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
Madarasi@7:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/iso/vf2pIso5_1.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/iso/vf2ppIso5_1.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
Madarasi@7: \end{subfigure}
Madarasi@7: %\hspace{1cm}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
Madarasi@7:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/iso/vf2pIso10_1.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/iso/vf2ppIso10_1.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
Madarasi@7: \end{subfigure}
Madarasi@7: %%\hspace{1cm}
Madarasi@7: \hspace*{-1.5cm}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
Madarasi@7:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/iso/vf2pIso15_1.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/iso/vf2ppIso15_1.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
Madarasi@7:      \end{subfigure}
Madarasi@7:      \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 35$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
Madarasi@7:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/iso/vf2pIso35_1.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/iso/vf2ppIso35_1.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
Madarasi@7: \end{subfigure}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
Madarasi@7: \hspace*{-1.5cm}
alpar@2: \begin{tikzpicture}
Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 45$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
Madarasi@7:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/iso/vf2pIso45_1.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/iso/vf2ppIso45_1.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
Madarasi@7: \end{subfigure}
Madarasi@7: \hspace*{-1.5cm}
Madarasi@7: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{tikzpicture}
Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \thinspace}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/iso/vf2pIso100_1.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/iso/vf2ppIso100_1.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
Madarasi@7: \end{subfigure}
alpar@2: \vspace*{-0.8cm}
Madarasi@7: \caption{IND on graphs having an average degree of
Madarasi@7:   5.}\label{fig:randISO}
alpar@2: \end{figure}
alpar@2: 
alpar@2: 
Madarasi@7: 
Madarasi@7: 
Madarasi@7: 
Madarasi@7: 
Madarasi@7: 
Madarasi@7: 
Madarasi@7: 
Madarasi@7: 
alpar@3: Considering the graph isomorphism problem, VF2++ consistently
alpar@3: outperforms its rival especially on sparse graphs. The reason for the
alpar@3: slightly super linear behaviour of VF2++ on denser graphs is the
alpar@3: larger number of nodes in the BFS tree constructed in
alpar@4: Algorithm~\ref{alg:VF2PPPseu}.
alpar@2: 
alpar@2: \subsubsection{Induced subgraph isomorphism}
alpar@3: This section provides a comparison of VF2++ and VF2 Plus in the case
alpar@3: of induced subgraph isomorphism. In addition to the size of the large
alpar@3: graph, that of the small graph dramatically influences the hardness of
alpar@3: a given problem too, so the overall picture is provided by examining
alpar@3: small graphs of various size.
alpar@2: 
alpar@3: For each chart, a number $0<\rho< 1$ has been fixed and the following
alpar@3: has been executed 150 times. Generating a large graph $G_{large}$,
alpar@3: choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes,
alpar@3: and for all the 10 subgraphs find a mapping by using both the graph
alpar@3: matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
Madarasi@10: 0.3, 0.6, 0.8, 0.95$ cases have been examined, see
alpar@4: Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
Madarasi@10: \ref{fig:randIND35}.
alpar@2: 
alpar@2: 
alpar@2: 
alpar@2: 
alpar@2: 
Madarasi@12: \begin{figure}
Madarasi@7: \vspace*{-1.5cm}
Madarasi@7: \hspace*{-1.5cm}
alpar@2: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.05.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd5_0.05.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2:      \end{subfigure}
alpar@2:      \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.1.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd5_0.1.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2: \end{subfigure}
Madarasi@7: \hspace*{-1.5cm}
alpar@2: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.3.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd5_0.3.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2:      \end{subfigure}
alpar@2:      \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.6.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd5_0.6.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2: \end{subfigure}
alpar@2: \begin{subfigure}[b]{0.55\textwidth}
Madarasi@7: \hspace*{-1.5cm}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.8.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd5_0.8.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2:      \end{subfigure}
Madarasi@7:      \hspace*{-1.5cm}
alpar@2:      \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \thinspace}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.95.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd5_0.95.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{subfigure}
alpar@2: \vspace*{-0.8cm}
alpar@3: \caption{IND on graphs having an average degree of
alpar@3:   5.}\label{fig:randIND5}
alpar@2: \end{figure}
alpar@2: 
alpar@2: 
alpar@2: \begin{figure}[H]
Madarasi@7: \vspace*{-1.5cm}
Madarasi@7: \hspace*{-1.5cm}
alpar@2: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
Madarasi@7: \hspace*{-0.5cm}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.05.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd10_0.05.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2:      \end{subfigure}
alpar@2:      \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
Madarasi@7:      \hspace*{-0.5cm}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.1.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd10_0.1.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2: \end{subfigure}
Madarasi@7: \hspace*{-1.5cm}
alpar@2: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.3.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd10_0.3.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2:      \end{subfigure}
alpar@2:      \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.6.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd10_0.6.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2: \end{subfigure}
Madarasi@7: \hspace*{-1.5cm}
alpar@2: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.8.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd10_0.8.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2:      \end{subfigure}
alpar@2:      \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \thinspace}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.95.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd10_0.95.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{subfigure}
alpar@2: \vspace*{-0.8cm}
alpar@3: \caption{IND on graphs having an average degree of
alpar@3:   10.}\label{fig:randIND10}
alpar@2: \end{figure}
alpar@2: 
alpar@2: 
alpar@2: 
alpar@2: \begin{figure}[H]
Madarasi@7: \vspace*{-1.5cm}
Madarasi@7: \hspace*{-1.5cm}
alpar@2: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.05.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd35_0.05.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2:      \end{subfigure}
alpar@2:      \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.1.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd35_0.1.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2: \end{subfigure}
Madarasi@7: \hspace*{-1.5cm}
alpar@2: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.3.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd35_0.3.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2:      \end{subfigure}
alpar@2:      \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{center}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.6.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd35_0.6.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{center}
alpar@2: \end{subfigure}
Madarasi@7: \hspace*{-1.5cm}
alpar@2: \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \space}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.8.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd35_0.8.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2:      \end{subfigure}
alpar@2:      \begin{subfigure}[b]{0.55\textwidth}
alpar@2: \begin{tikzpicture}
alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid
alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
alpar@3:   west},scaled x ticks = false,x tick label style={/pgf/number
alpar@3:   format/1000 sep = \thinspace}]
alpar@2: %\addplot+[only marks] table {proteinsOrig.txt};
alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.95.txt};
alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table
alpar@3:         {randGraph/ind/vf2ppInd35_0.95.txt};
alpar@2: \end{axis}
alpar@2: \end{tikzpicture}
alpar@2: \end{subfigure}
alpar@2: \vspace*{-0.8cm}
alpar@3: \caption{IND on graphs having an average degree of
alpar@3:   35.}\label{fig:randIND35}
alpar@2: \end{figure}
alpar@2: 
alpar@2: 
alpar@3: Based on these experiments, VF2++ is faster than VF2 Plus and able to
alpar@3: handle really large graphs in milliseconds. Note that when $IND$ was
alpar@3: considered and the small graphs had proportionally few nodes ($\rho =
alpar@3: 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
alpar@4: orders (e.g. see the $\delta=10$ case on
alpar@4: Figure~\ref{fig:randIND10}). If these examples had been excluded, the
alpar@3: charts would have seemed to be similar to the other ones.
alpar@3: Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
alpar@3: Plus slow slightly down, but remain practically usable even on graphs
alpar@3: having 10 000 nodes.
alpar@2: 
alpar@2: 
alpar@2: 
alpar@2: 
alpar@3: 
alpar@2: \section{Conclusion}
alpar@3: In this paper, after providing a short summary of the recent
alpar@3: algorithms, a new graph matching algorithm based on VF2, called VF2++,
alpar@3: has been presented and analyzed from a practical viewpoint.
alpar@2: 
alpar@3: Recognizing the importance of the node order and determining an
alpar@3: efficient one, VF2++ is able to match graphs of thousands of nodes in
alpar@3: near practically linear time including preprocessing. In addition to
alpar@3: the proper order, VF2++ uses more efficient consistency and cutting
alpar@3: rules which are easy to compute and make the algorithm able to prune
alpar@3: most of the unfruitful branches without going astray.
alpar@2: 
alpar@3: In order to show the efficiency of the new method, it has been
alpar@3: compared to VF2 Plus, which is the best concurrent algorithm based on
alpar@3: \cite{VF2Plus}.
alpar@2: 
alpar@3: The experiments show that VF2++ consistently outperforms VF2 Plus on
alpar@3: biological graphs. It seems to be asymptotically faster on protein and
alpar@3: on contact map graphs in the case of induced subgraph isomorphism,
alpar@3: while in the case of graph isomorphism, it has definitely better
alpar@3: asymptotic behaviour on protein graphs.
alpar@2: 
alpar@3: Regarding random sparse graphs, not only has VF2++ proved itself to be
alpar@3: faster than VF2 Plus, but it has a practically linear behaviour both
alpar@3: in the case of induced subgraph- and graph isomorphism, as well.
alpar@2: 
alpar@2: 
alpar@0: 
alpar@0: %% The Appendices part is started with the command \appendix;
alpar@0: %% appendix sections are then done as normal sections
alpar@0: %% \appendix
alpar@0: 
alpar@0: %% \section{}
alpar@0: %% \label{}
alpar@0: 
alpar@0: %% If you have bibdatabase file and want bibtex to generate the
alpar@0: %% bibitems, please use
alpar@0: %%
alpar@3: \bibliographystyle{elsarticle-num} \bibliography{bibliography}
alpar@0: 
alpar@0: %% else use the following coding to input the bibitems directly in the
alpar@0: %% TeX file.
alpar@0: 
alpar@2: %% \begin{thebibliography}{00}
alpar@0: 
alpar@2: %% %% \bibitem{label}
alpar@2: %% %% Text of bibliographic item
alpar@0: 
alpar@2: %% \bibitem{}
alpar@0: 
alpar@2: %% \end{thebibliography}
alpar@2: 
alpar@0: \end{document}
alpar@0: \endinput
alpar@0: %%
alpar@0: %% End of file `elsarticle-template-num.tex'.