alpar@0: %% alpar@0: %% Copyright 2007, 2008, 2009 Elsevier Ltd alpar@0: %% alpar@0: %% This file is part of the 'Elsarticle Bundle'. alpar@0: %% --------------------------------------------- alpar@0: %% alpar@0: %% It may be distributed under the conditions of the LaTeX Project Public alpar@0: %% License, either version 1.2 of this license or (at your option) any alpar@0: %% later version. The latest version of this license is in alpar@0: %% http://www.latex-project.org/lppl.txt alpar@0: %% and version 1.2 or later is part of all distributions of LaTeX alpar@0: %% version 1999/12/01 or later. alpar@0: %% alpar@0: %% The list of all files belonging to the 'Elsarticle Bundle' is alpar@0: %% given in the file `manifest.txt'. alpar@0: %% alpar@0: alpar@0: %% Template article for Elsevier's document class `elsarticle' alpar@0: %% with numbered style bibliographic references alpar@0: %% SP 2008/03/01 alpar@0: alpar@0: \documentclass[preprint,12pt]{elsarticle} alpar@0: alpar@0: %% Use the option review to obtain double line spacing alpar@0: %% \documentclass[authoryear,preprint,review,12pt]{elsarticle} alpar@0: alpar@0: %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn alpar@0: %% for a journal layout: alpar@0: %% \documentclass[final,1p,times]{elsarticle} alpar@0: %% \documentclass[final,1p,times,twocolumn]{elsarticle} alpar@0: %% \documentclass[final,3p,times]{elsarticle} alpar@0: %% \documentclass[final,3p,times,twocolumn]{elsarticle} alpar@0: %% \documentclass[final,5p,times]{elsarticle} alpar@0: %% \documentclass[final,5p,times,twocolumn]{elsarticle} alpar@0: alpar@0: %% For including figures, graphicx.sty has been loaded in alpar@0: %% elsarticle.cls. If you prefer to use the old commands alpar@0: %% please give \usepackage{epsfig} alpar@0: alpar@0: %% The amssymb package provides various useful mathematical symbols alpar@0: \usepackage{amssymb} alpar@0: %% The amsthm package provides extended theorem environments alpar@0: %% \usepackage{amsthm} alpar@0: alpar@0: %% The lineno packages adds line numbers. Start line numbering with alpar@0: %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on alpar@0: %% for the whole article with \linenumbers. alpar@0: %% \usepackage{lineno} alpar@0: alpar@2: \usepackage{amsmath} alpar@2: %% \usepackage[pdftex]{graphicx} alpar@2: alpar@2: \usepackage{pgfplots} alpar@2: \pgfplotsset{width=9cm} alpar@2: \pgfplotsset{compat=1.8} alpar@2: alpar@2: \usepackage{caption} alpar@2: \usepackage{subcaption} alpar@2: alpar@2: \usepackage{algorithm} alpar@2: \usepackage{algpseudocode} alpar@2: \usepackage{tikz} alpar@2: alpar@2: \usepackage{amsthm,amssymb} alpar@2: \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}} alpar@2: alpar@2: \newtheorem{theorem}{Theorem}[subsection] alpar@2: \newtheorem{corollary}{Corollary}[theorem] alpar@2: \newtheorem{claim}[theorem]{Claim} alpar@2: alpar@2: \newtheorem{definition}{Definition}[subsection] alpar@2: \newtheorem{notation}{Notation}[subsection] alpar@2: \newtheorem{example}{Example}[subsection] alpar@2: \usetikzlibrary{decorations.markings} alpar@2: \let\oldproofname=\proofname alpar@2: %% \renewcommand{\proofname}{\rm\bf{Proof:}} alpar@2: Madarasi@7: \captionsetup{font=normalsize} Madarasi@7: alpar@1: \journal{Discrete Applied Mathematics} alpar@0: alpar@0: \begin{document} alpar@0: alpar@0: \begin{frontmatter} alpar@0: alpar@0: %% Title, authors and addresses alpar@0: alpar@0: %% use the tnoteref command within \title for footnotes; alpar@0: %% use the tnotetext command for theassociated footnote; alpar@0: %% use the fnref command within \author or \address for footnotes; alpar@0: %% use the fntext command for theassociated footnote; alpar@0: %% use the corref command within \author for corresponding author footnotes; alpar@0: %% use the cortext command for theassociated footnote; alpar@0: %% use the ead command for the email address, alpar@0: %% and the form \ead[url] for the home page: alpar@0: %% \title{Title\tnoteref{label1}} alpar@0: %% \tnotetext[label1]{} alpar@0: %% \author{Name\corref{cor1}\fnref{label2}} alpar@0: %% \ead{email address} alpar@0: %% \ead[url]{home page} alpar@0: %% \fntext[label2]{} alpar@0: %% \cortext[cor1]{} alpar@0: %% \address{Address\fnref{label3}} alpar@0: %% \fntext[label3]{} alpar@0: alpar@1: \title{Improved Algorithms for Matching Biological Graphs} alpar@0: alpar@0: %% use optional labels to link authors explicitly to addresses: alpar@0: %% \author[label1,label2]{} alpar@0: %% \address[label1]{} alpar@0: %% \address[label2]{} alpar@0: alpar@1: \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi} alpar@0: alpar@1: \address{Dept of Operations Research, ELTE} alpar@0: alpar@0: \begin{abstract} alpar@1: Subgraph isomorphism is a well-known NP-Complete problem, while its alpar@1: special case, the graph isomorphism problem is one of the few problems alpar@1: in NP neither known to be in P nor NP-Complete. Their appearance in alpar@1: many fields of application such as pattern analysis, computer vision alpar@1: questions and the analysis of chemical and biological systems has alpar@1: fostered the design of various algorithms for handling special graph alpar@1: structures. alpar@0: alpar@1: This paper presents VF2++, a new algorithm based on the original VF2, alpar@1: which runs significantly faster on most test cases and performs alpar@1: especially well on special graph classes stemming from biological alpar@1: questions. VF2++ handles graphs of thousands of nodes in practically alpar@1: near linear time including preprocessing. Not only is it an improved alpar@1: version of VF2, but in fact, it is by far the fastest existing Madarasi@19: algorithm especially on biological graphs. alpar@1: alpar@1: The reason for VF2++' superiority over VF2 is twofold. Firstly, taking alpar@1: into account the structure and the node labeling of the graph, VF2++ alpar@1: determines a state order in which most of the unfruitful branches of alpar@1: the search space can be pruned immediately. Secondly, introducing more alpar@1: efficient - nevertheless still easier to compute - cutting rules alpar@1: reduces the chance of going astray even further. alpar@1: alpar@1: In addition to the usual subgraph isomorphism, specialized versions alpar@1: for induced subgraph isomorphism and for graph isomorphism are alpar@1: presented. VF2++ has gained a runtime improvement of one order of alpar@1: magnitude respecting induced subgraph isomorphism and a better alpar@1: asymptotical behaviour in the case of graph isomorphism problem. alpar@1: alpar@1: After having provided the description of VF2++, in order to evaluate alpar@1: its effectiveness, an extensive comparison to the contemporary other alpar@1: algorithms is shown, using a wide range of inputs, including both real alpar@1: life biological and chemical datasets and standard randomly generated alpar@1: graph series. alpar@1: alpar@1: The work was motivated and sponsored by QuantumBio Inc., and all the alpar@1: developed algorithms are available as the part of the open source alpar@1: LEMON graph and network optimization library alpar@1: (http://lemon.cs.elte.hu). alpar@0: \end{abstract} alpar@0: alpar@0: \begin{keyword} alpar@0: %% keywords here, in the form: keyword \sep keyword alpar@0: alpar@0: %% PACS codes here, in the form: \PACS code \sep code alpar@0: alpar@0: %% MSC codes here, in the form: \MSC code \sep code alpar@0: %% or \MSC[2008] code \sep code (2000 is the default) alpar@0: alpar@0: \end{keyword} alpar@0: alpar@0: \end{frontmatter} alpar@0: alpar@0: %% \linenumbers alpar@0: alpar@0: %% main text alpar@2: \section{Introduction} alpar@2: \label{sec:intro} alpar@2: alpar@3: In the last decades, combinatorial structures, and especially graphs alpar@3: have been considered with ever increasing interest, and applied to the alpar@3: solution of several new and revised questions. The expressiveness, alpar@3: the simplicity and the studiedness of graphs make them practical for alpar@3: modelling and appear constantly in several seemingly independent Madarasi@19: fields, such as bioinformatics and chemistry. alpar@2: alpar@3: Complex biological systems arise from the interaction and cooperation alpar@3: of plenty of molecular components. Getting acquainted with such Madarasi@19: systems at the molecular level is of primary importance, since alpar@3: protein-protein interaction, DNA-protein interaction, metabolic alpar@3: interaction, transcription factor binding, neuronal networks, and Madarasi@19: hormone signaling networks can be understood this way. alpar@2: Madarasi@19: Many chemical and biological structures can easily be modeled Madarasi@19: as graphs, for instance, a molecular structure can be Madarasi@19: considered as a graph, whose nodes correspond to atoms and whose Madarasi@19: edges to chemical bonds. The similarity and dissimilarity of Madarasi@19: objects corresponding to nodes are incorporated to the model Madarasi@19: by \emph{node labels}. Understanding such networks basically Madarasi@19: requires finding specific subgraphs, thus calls for efficient Madarasi@19: graph matching algorithms. alpar@2: Madarasi@19: Other real-world fields related to some Madarasi@19: variants of graph matching include pattern recognition alpar@3: and machine vision \cite{HorstBunkeApplications}, symbol recognition alpar@3: \cite{CordellaVentoSymbolRecognition}, face identification alpar@3: \cite{JianzhuangYongFaceIdentification}. \\ alpar@2: alpar@3: Subgraph and induced subgraph matching problems are known to be alpar@3: NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is alpar@3: one of the few problems in NP neither known to be in P nor alpar@3: NP-Complete. Although polynomial time isomorphism algorithms are known alpar@3: for various graph classes, like trees and planar alpar@3: graphs\cite{PlanarGraphIso}, bounded valence alpar@3: graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso} Madarasi@19: or permutation graphs\cite{PermGraphIso}, and recently, an FPT algorithm has been presented for the coloured hypergraph isomorphism problem in \cite{ColoredHiperGraphIso}. alpar@2: alpar@3: In the following, some algorithms based on other approaches are Madarasi@19: summarized, which do not need any restrictions on the graphs. Even though, alpar@3: an overall polynomial behaviour is not expectable from such an Madarasi@19: alternative, it may often have good practical performance, in fact, Madarasi@19: it might be the best choice even on a graph class for which polynomial Madarasi@19: algorithm is known. alpar@2: alpar@3: The first practically usable approach was due to Madarasi@19: \emph{Ullmann}\cite{Ullmann} which is a commonly used depth-first alpar@3: search based algorithm with a complex heuristic for reducing the alpar@3: number of visited states. A major problem is its $\Theta(n^3)$ space alpar@3: complexity, which makes it impractical in the case of big sparse alpar@3: graphs. alpar@2: alpar@4: In a recent paper, Ullmann\cite{UllmannBit} presents an alpar@3: improved version of this algorithm based on a bit-vector solution for alpar@3: the binary Constraint Satisfaction Problem. alpar@2: Madarasi@19: The \emph{Nauty} algorithm\cite{Nauty} transforms the two graphs to alpar@3: a canonical form before starting to check for the isomorphism. It has alpar@3: been considered as one of the fastest graph isomorphism algorithms, alpar@3: although graph categories were shown in which it takes exponentially alpar@3: many steps. This algorithm handles only the graph isomorphism problem. alpar@2: alpar@4: The \emph{LAD} algorithm\cite{Lad} uses a depth-first search alpar@3: strategy and formulates the matching as a Constraint Satisfaction alpar@3: Problem to prune the search tree. The constraints are that the mapping alpar@3: has to be injective and edge-preserving, hence it is possible to alpar@3: handle new matching types as well. alpar@2: Madarasi@19: The \emph{RI} algorithm\cite{RI} and its variations are based on a alpar@3: state space representation. After reordering the nodes of the graphs, alpar@3: it uses some fast executable heuristic checks without using any alpar@3: complex pruning rules. It seems to run really efficiently on graphs alpar@3: coming from biology, and won the International Contest on Pattern alpar@3: Search in Biological Databases\cite{Content}. alpar@2: alpar@3: The currently most commonly used algorithm is the Madarasi@19: \emph{VF2}\cite{VF2}, the improved version of \emph{VF}\cite{VF}, which was alpar@3: designed for solving pattern matching and computer vision problems, alpar@3: and has been one of the best overall algorithms for more than a alpar@3: decade. Although, it can't be up to new specialized algorithms, it is alpar@3: still widely used due to its simplicity and space efficiency. VF2 uses alpar@3: a state space representation and checks some conditions in each state alpar@3: to prune the search tree. alpar@2: Madarasi@19: Meanwhile, another variant called \emph{VF2 Plus}\cite{VF2Plus} has alpar@3: been published. It is considered to be as efficient as the RI alpar@3: algorithm and has a strictly better behavior on large graphs. The alpar@3: main idea of VF2 Plus is to precompute a heuristic node order of the alpar@3: small graph, in which the VF2 works more efficiently. alpar@2: Madarasi@19: This paper introduces \emph{VF2++}, a new further improved algorithm Madarasi@19: for the graph and (induced)subgraph isomorphism problem, which uses Madarasi@19: efficient cutting rules and determines a node order in which VF2 runs Madarasi@19: significantly faster on practical inputs. Madarasi@19: Madarasi@19: This project was initiated and sponsored by QuantumBio Madarasi@19: Inc.\cite{QUANTUMBIO} and the implementation --- along with a source Madarasi@19: code --- has been published as a part of LEMON\cite{LEMON} open source Madarasi@19: graph library. Madarasi@19: alpar@2: \section{Problem Statement} Madarasi@19: This section provides a formal description of the problems to be alpar@3: solved. alpar@2: \subsection{Definitions} alpar@2: Madarasi@19: Throughout the paper $G_{1}=(V_{1}, E_{1})$ and Madarasi@19: $G_{2}=(V_{2}, E_{2})$ denote two undirected graphs. Madarasi@19: Madarasi@19: \begin{definition} Madarasi@19: $\mathcal{L}: (V_{1}\cup V_{2}) \longrightarrow K$ is a \textbf{node Madarasi@19: label function}, where K is an arbitrary set. The elements in K Madarasi@19: are the \textbf{node labels}. Two nodes, u and v are said to be Madarasi@19: \textbf{equivalent} if $\mathcal{L}(u)=\mathcal{L}(v)$. Madarasi@19: \end{definition} Madarasi@19: Madarasi@19: For the sake of simplicity, in this paper the graph, subgraph and induced subgraph isomorphisms are defined in a more general way. Madarasi@19: alpar@2: \begin{definition}\label{sec:ismorphic} Madarasi@19: $G_{1}$ and $G_{2}$ are \textbf{isomorphic} (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}: Madarasi@19: V_{1} \longrightarrow V_{2}$ bijection, for which the alpar@3: following is true: alpar@2: \begin{center} Madarasi@20: $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\ Madarasi@19: $\forall u,v\in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in{E_{2}}$ alpar@2: \end{center} alpar@2: \end{definition} Madarasi@19: alpar@2: \begin{definition} Madarasi@19: $G_{1}$ is a \textbf{subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}: Madarasi@19: V_{1}\longrightarrow V_{2}$ injection, for which the alpar@3: following is true: alpar@2: \begin{center} Madarasi@20: $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and\\ Madarasi@19: $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Rightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$ alpar@2: \end{center} alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} Madarasi@19: $G_{1}$ is an \textbf{induced subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists Madarasi@19: \mathfrak{m}: V_{1}\longrightarrow V_{2}$ injection, for which the alpar@3: following is true: alpar@2: \begin{center} Madarasi@19: $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and Madarasi@19: Madarasi@19: $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow Madarasi@19: (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$ alpar@2: \end{center} alpar@2: \end{definition} alpar@2: alpar@2: alpar@2: \subsection{Common problems}\label{sec:CommProb} alpar@2: alpar@3: The focus of this paper is on two extensively studied topics, the alpar@3: subgraph isomorphism and its variations. However, the following alpar@3: problems also appear in many applications. alpar@2: alpar@3: The \textbf{subgraph matching problem} is the following: is Madarasi@19: $G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node alpar@3: label? alpar@2: alpar@3: The \textbf{induced subgraph matching problem} asks the same about the alpar@3: existence of an induced subgraph. alpar@2: alpar@3: The \textbf{graph isomorphism problem} can be defined as induced alpar@3: subgraph matching problem where the sizes of the two graphs are equal. alpar@2: alpar@3: In addition to existence, it may be needed to show such a subgraph, or alpar@3: it may be necessary to list all of them. alpar@2: alpar@3: It should be noted that some authors misleadingly refer to the term alpar@3: \emph{subgraph isomorphism problem} as an \emph{induced subgraph alpar@3: isomorphism problem}. alpar@2: alpar@3: The following sections give the descriptions of VF2, VF2++, VF2 Plus alpar@3: and a particular comparison. alpar@2: alpar@2: \section{The VF2 Algorithm} alpar@3: This algorithm is the basis of both the VF2++ and the VF2 Plus. VF2 alpar@4: is able to handle all the variations mentioned in Section alpar@4: \ref{sec:CommProb}. Although it can also handle directed graphs, alpar@3: for the sake of simplicity, only the undirected case will be alpar@3: discussed. alpar@2: alpar@2: alpar@2: \subsection{Common notations} Madarasi@19: \indent Assume $G_{1}$ is searched in $G_{2}$. The following alpar@3: definitions and notations will be used throughout the whole paper. alpar@2: \begin{definition} Madarasi@19: An injection $\mathfrak{m} : D \longrightarrow V_2$ is called (partial) \textbf{mapping}, where $D\subseteq V_1$. Madarasi@19: \end{definition} Madarasi@19: Madarasi@19: \begin{notation} Madarasi@19: $\mathfrak{D}(f)$ and $\mathfrak{R}(f)$ denote the domain and the range of a function $f$, respectively. Madarasi@19: \end{notation} Madarasi@19: Madarasi@19: \begin{definition} Madarasi@19: Mapping $\mathfrak{m}$ \textbf{covers} a node $u\in V_1\cup V_2$ if $u\in \mathfrak{D}(\mathfrak{m})\cup \mathfrak{R}(\mathfrak{m})$. alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} Madarasi@19: A mapping $\mathfrak{m}$ is $\mathbf{whole\ mapping}$ if $\mathfrak{m}$ covers all the Madarasi@19: nodes of $V_{1}$, i.e. $\mathfrak{D}(\mathfrak{m})=V_1$. alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} Madarasi@20: Let \textbf{extend}$(\mathfrak{m},(u,v))$ denote the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$, for which $\forall w\in \mathfrak{D}(\mathfrak{m}) : \mathfrak{m}(w)=f(w)$ and $f(u)=v$ holds. Where $u\in V_1\setminus\mathfrak{D}(\mathfrak{m})$ and $v\in V_2\setminus\mathfrak{R}(\mathfrak{m})$, otherwise $extend(\mathfrak{m},(u,v))$ is undefined. alpar@2: \end{definition} alpar@2: alpar@2: \begin{notation} Madarasi@19: Throughout the paper, $\mathbf{PT}$ denotes a generic problem type Madarasi@19: which can be substituted by any of the $\mathbf{ISO}$, $\mathbf{SUB}$ Madarasi@19: and $\mathbf{IND}$ problems. alpar@2: \end{notation} alpar@2: alpar@2: \begin{definition} Madarasi@19: Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a Madarasi@17: \textbf{consistency function by } $\mathbf{PT}$ if the following Madarasi@19: holds. If there exists a whole mapping $w$ satisfying the requirements of $PT$, for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$. alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} Madarasi@19: Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a Madarasi@17: \textbf{cutting function by } $\mathbf{PT}$ if the following Madarasi@19: holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations, which results in a whole mapping satisfying the requirements of $PT$. alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} Madarasi@19: $\mathfrak{m}$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if Madarasi@19: $Cons_{PT}(\mathfrak{m})$ is true. alpar@2: \end{definition} alpar@2: alpar@2: $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form. alpar@2: \begin{notation} Madarasi@19: Let $\mathbf{Cons_{PT}(p, \mathfrak{m})}:=Cons_{PT}(extend(\mathfrak{m},p))$, and Madarasi@19: $\mathbf{Cut_{PT}(p, \mathfrak{m})}:=Cut_{PT}(extend(\mathfrak{m},p))$, where Madarasi@19: $p\in{V_{1}\backslash\mathfrak{D}(\mathfrak{m}) \!\times\!V_{2}\backslash\mathfrak{R}(\mathfrak{m})}$. alpar@2: \end{notation} alpar@2: alpar@3: $Cons_{PT}$ will be used to check the consistency of the already alpar@3: covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if alpar@3: no whole consistent mapping can contain the current mapping. alpar@2: alpar@2: \subsection{Overview of the algorithm} alpar@3: VF2 uses a state space representation of mappings, $Cons_{PT}$ for alpar@3: excluding inconsistency with the problem type and $Cut_{PT}$ for Madarasi@19: pruning the search tree. alpar@2: alpar@4: Algorithm~\ref{alg:VF2Pseu} is a high level description of Madarasi@19: the VF2 matching algorithm. Each state of the matching process can Madarasi@19: be associated with a mapping $\mathfrak{m}$. The initial state Madarasi@19: is associated with a mapping $\mathfrak{m}$, for which Madarasi@19: $\mathfrak{D}(\mathfrak{m})=\emptyset$, i.e. it starts with an empty mapping. alpar@2: alpar@2: alpar@2: \begin{algorithm} Madarasi@13: \algtext*{EndIf}%ne nyomtasson end if-et Madarasi@13: \algtext*{EndFor}%ne Madarasi@13: \algtext*{EndProcedure}%ne nyomtasson .. alpar@2: \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu} alpar@2: \begin{algorithmic}[1] alpar@2: Madarasi@19: \Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$} Madarasi@19: \If{$\mathfrak{m}$ covers Madarasi@19: $V_{1}$} \State Output($\mathfrak{m}$) Madarasi@19: \Else Madarasi@19: \State Compute the set $P_\mathfrak{m}$ of the pairs candidate for inclusion Madarasi@19: in $\mathfrak{m}$ \ForAll{$p\in{P_\mathfrak{m}}$} \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$ Madarasi@19: $\neg$Cut$_{PT}$($p,\mathfrak{m}$)} Madarasi@19: \State \textbf{call} Madarasi@19: VF2($extend(\mathfrak{m},p)$, $PT$) \EndIf \EndFor \EndIf \EndProcedure alpar@2: \end{algorithmic} alpar@2: \end{algorithm} alpar@2: alpar@2: Madarasi@19: For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of Madarasi@19: candidate node pairs for adding to the current mapping $\mathfrak{m}_s$. alpar@2: Madarasi@19: For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and Madarasi@19: $Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and Madarasi@19: the latter is false, the whole process is recursively applied to Madarasi@19: $extend(\mathfrak{m},p)$. Otherwise, $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it Madarasi@19: can be proved that $\mathfrak{m}$ can not be extended to a whole mapping. alpar@2: Madarasi@11: In order to make sure of the correctness, see alpar@2: \begin{claim} alpar@3: Through consistent mappings, only consistent whole mappings can be Madarasi@19: reached, and all the consistent whole mappings are reachable through alpar@3: consistent mappings. alpar@2: \end{claim} alpar@2: Madarasi@19: Note that a mapping may be reached in exponentially many different ways, since the Madarasi@19: order of extensions does not influence the nascent mapping. alpar@2: alpar@2: However, one may observe alpar@2: alpar@2: \begin{claim} alpar@2: \label{claim:claimTotOrd} Madarasi@19: Let $\prec$ be an arbitrary total ordering relation on $V_{1}$. If Madarasi@19: the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$, for which alpar@2: \begin{center} Madarasi@19: $\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$, alpar@2: \end{center} Madarasi@19: then no mapping can be reached more than once, and each whole mapping remains reachable. alpar@2: \end{claim} alpar@2: alpar@3: Note that the cornerstone of the improvements to VF2 is a proper alpar@3: choice of a total ordering. alpar@2: Madarasi@19: \subsection{The candidate set} alpar@2: \label{candidateComputingVF2} Madarasi@19: Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$. alpar@2: alpar@2: \begin{notation} Madarasi@19: Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\backslash\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and Madarasi@19: $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\backslash\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$. alpar@2: \end{notation} alpar@2: Madarasi@19: The set $P_\mathfrak{m}$ includes the pairs of uncovered neighbours of covered Madarasi@17: nodes, and if there is not such a node pair, all the pairs containing alpar@3: two uncovered nodes are added. Formally, let alpar@2: \[ Madarasi@19: P_\mathfrak{m}\!=\! alpar@2: \begin{cases} Madarasi@19: T_{1}(\mathfrak{m})\times T_{2}(\mathfrak{m})&\hspace{-0.15cm}\text{if } Madarasi@19: T_{1}(\mathfrak{m})\!\neq\!\emptyset\ \text{and }T_{2}(\mathfrak{m})\!\neq Madarasi@19: \emptyset,\\ (V_{1}\!\setminus\!\mathfrak{D}(\mathfrak{m}))\!\times\!(V_{2}\!\setminus\!\mathfrak{R}(\mathfrak{m})) Madarasi@19: &\hspace{-0.15cm}\text{otherwise}. alpar@2: \end{cases} alpar@2: \] alpar@2: alpar@2: \subsection{Consistency} Madarasi@19: Suppose $p=(u,v)$, where $u\in V_{1}$ and $v\in V_{2}$, $\mathfrak{m}$ is a consistent mapping by Madarasi@19: $PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether Madarasi@19: including pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$. Madarasi@15: Madarasi@15: For example, the consistency function of induced subgraph isomorphism is as follows. alpar@2: \begin{notation} Madarasi@19: Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} : Madarasi@19: (u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2} Madarasi@19: (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$. alpar@2: \end{notation} alpar@2: Madarasi@19: $extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ $\Leftrightarrow Madarasi@19: (\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1} Madarasi@19: \Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. The alpar@3: following formulation gives an efficient way of calculating alpar@3: $Cons_{IND}$. alpar@2: \begin{claim} Madarasi@19: $Cons_{IND}((u,v),\mathfrak{m}):=\mathcal{L}(u)\!\!=\!\!\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge Madarasi@19: (\forall \tilde{u}\in \Gamma_{1}(u) Madarasi@19: \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is a alpar@3: consistency function in the case of $IND$. alpar@2: \end{claim} alpar@2: alpar@2: \subsection{Cutting rules} Madarasi@19: $Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently Madarasi@19: verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can Madarasi@19: be true only if it is impossible to extend $extend(\mathfrak{m},p)$ to a alpar@3: whole mapping. Madarasi@15: Madarasi@15: As an example, the cutting function of induced subgraph isomorphism is presented. alpar@2: \begin{notation} Madarasi@19: Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\backslash Madarasi@19: \mathfrak{D}(\mathfrak{m}))\backslash T_{1}(\mathfrak{m})$, and Madarasi@19: \\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\backslash Madarasi@19: \mathfrak{R}(\mathfrak{m}))\backslash T_{2}(\mathfrak{m})$. alpar@2: \end{notation} Madarasi@15: alpar@2: \begin{claim} Madarasi@19: $Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| < Madarasi@19: |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap Madarasi@19: \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap Madarasi@19: \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function by $IND$. alpar@2: \end{claim} alpar@2: alpar@2: \section{The VF2++ Algorithm} alpar@3: Although any total ordering relation makes the search space of VF2 a alpar@3: tree, its choice turns out to dramatically influence the number of alpar@3: visited states. The goal is to determine an efficient one as quickly alpar@3: as possible. alpar@2: alpar@3: The main reason for VF2++' superiority over VF2 is twofold. Firstly, alpar@3: taking into account the structure and the node labeling of the graph, alpar@3: VF2++ determines a state order in which most of the unfruitful alpar@3: branches of the search space can be pruned immediately. Secondly, alpar@3: introducing more efficient --- nevertheless still easier to compute alpar@3: --- cutting rules reduces the chance of going astray even further. alpar@2: alpar@3: In addition to the usual subgraph isomorphism, specialized versions alpar@3: for induced subgraph isomorphism and for graph isomorphism have been alpar@3: designed. VF2++ has gained a runtime improvement of one order of alpar@3: magnitude respecting induced subgraph isomorphism and a better alpar@3: asymptotical behaviour in the case of graph isomorphism problem. alpar@2: alpar@3: Note that a weaker version of the cutting rules and the more efficient alpar@3: candidate set calculating were described in \cite{VF2Plus}, too. alpar@2: alpar@3: It should be noted that all the methods described in this section are Madarasi@19: extendable to handle directed graphs and edge labels as well.\newline alpar@2: alpar@3: The basic ideas and the detailed description of VF2++ are provided in alpar@3: the following. alpar@2: Madarasi@19: The goal is to find a matching order in which the algorithm is able to Madarasi@19: recognize inconsistency or prune the infeasible branches on the Madarasi@19: highest levels and goes deep only if it is needed. Madarasi@19: Madarasi@19: \begin{notation} Madarasi@19: Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{1}(u)\cap H\}|$, that is the Madarasi@19: number of neighbours of u which are in H, where $u\in V_{1} $ and Madarasi@19: $H\subseteq V_{1}$. Madarasi@19: \end{notation} Madarasi@19: Madarasi@19: The principal question is the following. Suppose a mapping $\mathfrak{m}$ is Madarasi@19: given. For which node of $T_{1}(\mathfrak{m})$ is the hardest to find a Madarasi@19: consistent pair in $G_{2}$? The more covered neighbours a node in Madarasi@19: $T_{1}(\mathfrak{m})$ has --- i.e. the largest $Conn_{\mathfrak{D}(\mathfrak{m})}$ it has Madarasi@19: ---, the more rarely satisfiable consistency constraints for its pair Madarasi@19: are given. Madarasi@19: Madarasi@19: In biology, most of the graphs are sparse, thus several nodes in Madarasi@19: $T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes Madarasi@19: reasonable to define a secondary and a tertiary order between them. Madarasi@19: The observation above proves itself to be as determining, that the Madarasi@19: secondary ordering prefers nodes with the most uncovered neighbours Madarasi@19: among which have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$ to increase Madarasi@19: $Conn_{\mathfrak{D}(\mathfrak{m})}$ of uncovered nodes so much, as possible. The Madarasi@19: tertiary ordering prefers nodes having the rarest uncovered labels. Madarasi@19: Madarasi@19: Note that the secondary ordering is the same as the ordering by $deg$, Madarasi@19: which is a static data in front of the above used. Madarasi@19: Madarasi@19: These rules can easily result in a matching order which contains the Madarasi@19: nodes of a long path successively, whose nodes may have low $Conn$ and Madarasi@19: is easily matchable into $G_{2}$. To avoid that, a BFS order is Madarasi@19: used, which provides the shortest possible paths. Madarasi@19: \newline Madarasi@19: Madarasi@19: In the following, some examples on which the VF2 may be slow are Madarasi@19: described, although they are easily solvable by using a proper Madarasi@19: matching order. Madarasi@19: Madarasi@19: \begin{example} Madarasi@19: Suppose $G_{1}$ can be mapped into $G_{2}$ in many ways Madarasi@19: without node labels. Let $u\in V_{1}$ and $v\in V_{2}$. Madarasi@19: \newline Madarasi@19: $\mathcal{L}(u):=black$ Madarasi@19: \newline Madarasi@19: $\mathcal{L}(v):=black$ Madarasi@19: \newline Madarasi@19: $\mathcal{L}(\tilde{u}):=red \ \forall \tilde{u}\in (V_{1}\backslash Madarasi@19: \{u\})$ Madarasi@19: \newline Madarasi@19: $\mathcal{L}(\tilde{v}):=red \ \forall \tilde{v}\in (V_{2}\backslash Madarasi@19: \{v\})$ Madarasi@19: \newline Madarasi@19: Madarasi@19: Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since Madarasi@19: $u$ is black and no node in $V_{2}$ has a black label except Madarasi@19: $v$. If unfortunately $u$ were the last node which will get covered, Madarasi@19: VF2 would check only in the last steps, whether $u$ can be matched to Madarasi@19: $v$. Madarasi@19: \newline Madarasi@19: However, had $u$ been the first matched node, u would have been Madarasi@19: matched immediately to v, so all the mappings would have been Madarasi@19: precluded in which node labels can not correspond. Madarasi@19: \end{example} Madarasi@19: Madarasi@19: \begin{example} Madarasi@19: Suppose there is no node label given, $G_{1}$ is a small graph and Madarasi@19: can not be mapped into $G_{2}$ and $u\in V_{1}$. Madarasi@19: \newline Madarasi@19: Let $G'_{1}:=(V_{1}\cup Madarasi@19: \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup Madarasi@19: \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is, Madarasi@19: $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint Madarasi@19: from $G_{1}$ and one of its starting points is connected to $u\in Madarasi@19: V_{1}\}$. Madarasi@19: \newline Madarasi@19: Is there a subgraph of $G_{2}$, which is isomorph with Madarasi@19: $G'_{1}$? Madarasi@19: \newline Madarasi@19: If unfortunately the nodes of the path were the first $k$ nodes in the Madarasi@19: matching order, the algorithm would iterate through all the possible k Madarasi@19: long paths in $G_{2}$, and it would recognize that no path can be Madarasi@19: extended to $G'_{1}$. Madarasi@19: \newline Madarasi@19: However, had it started by the matching of $G_{1}$, it would not Madarasi@19: have matched any nodes of the path. Madarasi@19: \end{example} Madarasi@19: Madarasi@19: These examples may look artificial, but the same problems also appear Madarasi@19: in real-world instances, even though in a less obvious way. Madarasi@19: alpar@2: \subsection{Preparations} alpar@2: \begin{claim} alpar@2: \label{claim:claimCoverFromLeft} alpar@3: The total ordering relation uniquely determines a node order, in which Madarasi@19: the nodes of $V_{1}$ will be covered by VF2. From the point of alpar@3: view of the matching procedure, this means, that always the same node Madarasi@19: of $G_{1}$ will be covered on the d-th level. alpar@2: \end{claim} alpar@2: alpar@2: \begin{definition} Madarasi@19: An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of Madarasi@19: $V_{1}$ is \textbf{matching order} if exists $\prec$ total alpar@3: ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds Madarasi@19: pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{1}|\}$. alpar@2: \end{definition} alpar@2: alpar@2: \begin{claim}\label{claim:MOclaim} Madarasi@17: A total ordering is matching order iff the nodes of every component alpar@3: form an interval in the node sequence, and every node connects to a Madarasi@17: previous node in its component except the first node of each component. alpar@2: \end{claim} alpar@2: alpar@3: To summing up, a total ordering always uniquely determines a matching alpar@3: order, and every matching order can be determined by a total ordering, alpar@3: however, more than one different total orderings may determine the alpar@3: same matching order. alpar@2: alpar@2: \subsection{Total ordering} Madarasi@19: The matching order will be searched directly. alpar@2: \begin{notation} Madarasi@19: Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} : Madarasi@19: l=\mathcal{L}(v)\}|-|\{u\in V_{1}\backslash \mathcal{M} : l=\mathcal{L}(u)\}|$ , Madarasi@19: where $l$ is a label and $\mathcal{M}\subseteq V_{1}$. alpar@2: \end{notation} alpar@2: Madarasi@17: \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$. alpar@2: \end{definition} alpar@2: alpar@2: \begin{algorithm} Madarasi@8: \algtext*{EndIf} Madarasi@8: \algtext*{EndProcedure} alpar@2: \algtext*{EndWhile} Madarasi@13: \algtext*{EndFor} alpar@2: \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu} alpar@2: \begin{algorithmic}[1] alpar@3: \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$ Madarasi@19: \Comment{matching order} \While{$V_{1}\backslash \mathcal{M} alpar@3: \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg Madarasi@19: min$_{F_\mathcal{M}\circ \mathcal{L}}(V_{1}\backslash alpar@3: \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with alpar@3: root node $r$. \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the alpar@3: $d$-th level \State Process $V_d$ \Comment{See Algorithm Madarasi@8: \ref{alg:VF2PPProcess1}} \EndFor alpar@3: \EndWhile \EndProcedure alpar@2: \end{algorithmic} alpar@2: \end{algorithm} alpar@2: alpar@2: \begin{algorithm} Madarasi@8: \algtext*{EndIf} Madarasi@8: \algtext*{EndProcedure}%ne nyomtasson .. alpar@2: \algtext*{EndWhile} Madarasi@8: \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1} alpar@2: \begin{algorithmic}[1] Madarasi@17: \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$} Madarasi@19: \State $m\in$ arg min$_{F_\mathcal{M}\circ\ \mathcal{L}}($ arg max$_{deg}($arg alpar@3: max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$ alpar@3: \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh alpar@3: $F_\mathcal{M}$ \EndWhile \EndProcedure alpar@2: \end{algorithmic} alpar@2: \end{algorithm} alpar@2: alpar@4: Algorithm~\ref{alg:VF2PPPseu} is a high level description of the alpar@4: matching order procedure of VF2++. It computes a BFS tree for each Madarasi@19: component in ascending order of their rarest node labels and largest $deg$, alpar@4: whose root vertex is the component's minimal Madarasi@8: node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending Madarasi@8: lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately Madarasi@8: to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately. alpar@2: alpar@4: Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu} alpar@4: provides a matching order. alpar@2: alpar@2: alpar@2: \subsection{Cutting rules} alpar@2: \label{VF2PPCuttingRules} Madarasi@19: This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels. alpar@2: \begin{notation} Madarasi@19: Let $\mathbf{\Gamma_{1}^{l}(u)}:=\{\tilde{u} : \mathcal{L}(\tilde{u})=l Madarasi@19: \wedge \tilde{u}\in \Gamma_{1} (u)\}$ and Madarasi@19: $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge Madarasi@19: \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in Madarasi@19: V_{2}$ and $l$ is a label. alpar@2: \end{notation} alpar@2: Madarasi@19: \subsubsection{Induced subgraph isomorphism} alpar@2: \begin{claim} Madarasi@19: \[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by IND. Madarasi@19: \end{claim} Madarasi@19: \subsubsection{Graph isomorphism} Madarasi@19: \begin{claim} Madarasi@19: \[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by ISO. alpar@2: \end{claim} Madarasi@13: Madarasi@19: \subsubsection{Subgraph isomorphism} Madarasi@19: \begin{claim} Madarasi@19: \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function by SUB. Madarasi@19: \end{claim} alpar@2: Madarasi@19: Madarasi@19: Madarasi@19: \section{Implementation details} alpar@3: This section provides a detailed summary of an efficient alpar@3: implementation of VF2++. alpar@2: \subsubsection{Storing a mapping} alpar@3: After fixing an arbitrary node order ($u_0, u_1, .., Madarasi@19: u_{|G_{1}|-1}$) of $G_{1}$, an array $M$ is usable to store alpar@3: the current mapping in the following way. alpar@2: \[ alpar@3: M[i] = alpar@2: \begin{cases} Madarasi@19: v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INV\!ALI\!D & Madarasi@17: if\ no\ node\ has\ been\ mapped\ to\ u_i, alpar@2: \end{cases} alpar@2: \] Madarasi@19: where $i\in\{0,1, ..,|G_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$ alpar@3: means "no node". alpar@2: \subsubsection{Avoiding the recurrence} alpar@4: The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized Madarasi@9: as a \textit{while loop}, which has a loop counter $depth$ denoting the Madarasi@9: all-time depth of the recursion. Fixing a matching order, let $M$ Madarasi@9: denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft}, Madarasi@19: $M$ is $INV\!ALI\!D$ from index $depth$+1 and not $INV\!ALI\!D$ before Madarasi@9: $depth$. $M[depth]$ changes alpar@3: while the state is being processed, but the property is held before alpar@3: both stepping back to a predecessor state and exploring a successor alpar@3: state. alpar@2: alpar@3: The necessary part of the candidate set is easily maintainable or alpar@3: computable by following alpar@4: Section~\ref{candidateComputingVF2}. A much faster method alpar@3: has been designed for biological- and sparse graphs, see the next alpar@3: section for details. alpar@2: alpar@2: \subsubsection{Calculating the candidates for a node} alpar@4: Being aware of Claim~\ref{claim:claimCoverFromLeft}, the alpar@3: task is not to maintain the candidate set, but to generate the Madarasi@19: candidate nodes in $G_{2}$ for a given node $u\in V_{1}$. In Madarasi@20: case of any of the three problem types and a mapping $\mathfrak{m}$, if a node $v\in Madarasi@19: V_{2}$ is a potential pair of $u\in V_{1}$, then $\forall Madarasi@20: u'\in \mathfrak{D}(\mathfrak{m}) : (u,u')\in Madarasi@20: E_{1}\Rightarrow (v,\mathfrak{m}(u'))\in Madarasi@19: E_{2}$. That is, each covered neighbour of $u$ has to be mapped to alpar@3: a covered neighbour of $v$. alpar@2: alpar@3: Having said that, an algorithm running in $\Theta(deg)$ time is alpar@3: describable if there exists a covered node in the component containing Madarasi@17: $u$, and a linear one otherwise. alpar@2: alpar@2: alpar@2: \subsubsection{Determining the node order} alpar@3: This section describes how the node order preprocessing method of alpar@3: VF2++ can efficiently be implemented. alpar@2: alpar@3: For using lookup tables, the node labels are associated with the alpar@3: numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It Madarasi@9: enables $F_\mathcal{M}$ to be stored in an array. At first, the node order alpar@3: $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes Madarasi@19: in $V_{1}$ having label $i$, which is easy to compute in Madarasi@19: $\Theta(|V_{1}|)$ steps. alpar@2: Madarasi@19: Representing $\mathcal{M}\subseteq V_{1}$ as an array of Madarasi@19: size $|V_{1}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes. alpar@2: alpar@2: \subsubsection{Cutting rules} alpar@4: In Section~\ref{VF2PPCuttingRules}, the cutting rules were Madarasi@19: described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$ Madarasi@19: and $\tilde T_{2}$, which are dependent on the all-time mapping alpar@3: (i.e. on the all-time state). The aim is to check the labeled cutting alpar@3: rules of VF2++ in $\Theta(deg)$ time. alpar@2: alpar@3: Firstly, suppose that these four sets are given in such a way, that alpar@3: checking whether a node is in a certain set takes constant time, alpar@3: e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an alpar@3: initially zero integer lookup table of size $|K|$. After incrementing Madarasi@19: $L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and Madarasi@19: decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap Madarasi@19: T_{2}(s)$, the first part of the cutting rules is checkable in alpar@3: $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$ alpar@3: to zero takes $\Theta(deg)$ time again, which makes it possible to use Madarasi@9: the same table through the whole algorithm. The second part of the alpar@3: cutting rules can be verified using the same method with $\tilde Madarasi@19: T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and Madarasi@19: $T_{2}$. Thus, the overall complexity is $\Theta(deg)$. alpar@2: Madarasi@19: Another integer lookup table storing the number of covered neighbours Madarasi@19: of each node in $G_{2}$ gives all the information about the sets Madarasi@19: $T_{2}$ and $\tilde T_{2}$, which is maintainable in alpar@3: $\Theta(deg)$ time when a pair is added or substracted by incrementing alpar@3: or decrementing the proper indices. A further improvement is that the Madarasi@19: values of $L[\mathcal{L}(u')]$ in case of checking $u$ are dependent only on Madarasi@19: $u$, i.e. on the size of the mapping, so for each $u\in V_{1}$ an alpar@3: array of pairs (label, number of such labels) can be stored to skip alpar@3: the maintaining operations. Note that these arrays are at most of size Madarasi@19: $deg$. alpar@2: Madarasi@19: Using similar techniques, the consistency function can be evaluated in alpar@3: $\Theta(deg)$ steps, as well. alpar@2: Madarasi@19: \section{Experimental results} Madarasi@19: This section compares the performance of VF2++ and VF2 Plus. According to Madarasi@19: our experience, both algorithms run faster than VF2 with orders of Madarasi@19: magnitude, thus its inclusion was not reasonable. alpar@2: Madarasi@19: The algorithms were implemented in C++ using the open source Madarasi@19: LEMON graph and network optimization library\cite{LEMON}. The test were carried out on a linux based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM. alpar@2: \subsection{Biological graphs} alpar@3: The tests have been executed on a recent biological dataset created alpar@3: for the International Contest on Pattern Search in Biological Madarasi@7: Databases\cite{Content}, which has been constructed of molecule, Madarasi@7: protein and contact map graphs extracted from the Protein Data alpar@3: Bank\cite{ProteinDataBank}. alpar@2: alpar@3: The molecule dataset contains small graphs with less than 100 nodes alpar@3: and an average degree of less than 3. The protein dataset contains alpar@3: graphs having 500-10 000 nodes and an average degree of 4, while the alpar@3: contact map dataset contains graphs with 150-800 nodes and an average alpar@3: degree of 20. \\ alpar@2: Madarasi@19: In the following, both the induced subgraph isomorphism and the graph alpar@3: isomorphism will be examined. alpar@2: Madarasi@17: This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For runtime results, please see Figure~\ref{fig:bioIND}. Madarasi@7: Madarasi@7: In an other experiment, the nodes of each graph in the database had been Madarasi@7: shuffled, and an isomorphism between the shuffled and the original Madarasi@7: graph was searched. The solution times are shown on Figure~\ref{fig:bioISO}. Madarasi@7: Madarasi@7: Madarasi@17: \begin{figure}[H] Madarasi@17: \vspace*{-2cm} Madarasi@17: \hspace*{-1.5cm} Madarasi@17: \begin{subfigure}[b]{0.55\textwidth} Madarasi@17: \begin{figure}[H] Madarasi@17: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@17: \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@17: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@17: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@17: format/1000 sep = \thinspace}] Madarasi@17: %\addplot+[only marks] table {proteinsOrig.txt}; Madarasi@17: \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark Madarasi@17: size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt}; Madarasi@17: \end{axis} Madarasi@17: \end{tikzpicture} Madarasi@17: \caption{In the case of molecules, the algorithms have Madarasi@17: similar behaviour, but VF2++ is almost two times faster even on such Madarasi@17: small graphs.} \label{fig:INDMolecule} Madarasi@17: \end{figure} Madarasi@17: \end{subfigure} Madarasi@17: \hspace*{1.5cm} Madarasi@17: \begin{subfigure}[b]{0.55\textwidth} Madarasi@17: \begin{figure}[H] Madarasi@17: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@17: \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@17: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@17: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@17: format/1000 sep = \thinspace}] Madarasi@17: %\addplot+[only marks] table {proteinsOrig.txt}; Madarasi@17: \addplot table {Orig/ContactMaps.128.txt}; Madarasi@17: \addplot[mark=triangle*,mark size=1.8pt,color=red] table Madarasi@17: {VF2PPLabel/ContactMaps.128.txt}; Madarasi@17: \end{axis} Madarasi@17: \end{tikzpicture} Madarasi@17: \caption{On contact maps, VF2++ runs almost in constant time, while VF2 Madarasi@17: Plus has a near linear behaviour.} \label{fig:INDContact} Madarasi@17: \end{figure} Madarasi@17: \end{subfigure} Madarasi@17: Madarasi@17: \begin{center} Madarasi@17: \vspace*{-0.5cm} Madarasi@17: \begin{subfigure}[b]{0.55\textwidth} Madarasi@17: \begin{figure}[H] Madarasi@17: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@17: \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@17: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@17: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@17: format/1000 sep = \thinspace}] %\addplot+[only marks] table Madarasi@17: {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue] Madarasi@17: table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark Madarasi@17: size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt}; Madarasi@17: \end{axis} Madarasi@17: \end{tikzpicture} Madarasi@17: \caption{Both the algorithms have linear behaviour on protein Madarasi@17: graphs. VF2++ is more than 10 times faster than VF2 Madarasi@17: Plus.} \label{fig:INDProt} Madarasi@17: \end{figure} Madarasi@17: \end{subfigure} Madarasi@17: \end{center} Madarasi@17: \vspace*{-0.5cm} Madarasi@17: \caption{\normalsize{Induced subgraph isomorphism on biological graphs}}\label{fig:bioIND} Madarasi@17: \end{figure} Madarasi@17: alpar@2: alpar@2: \begin{figure}[H] Madarasi@7: \vspace*{-2cm} Madarasi@7: \hspace*{-1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \begin{figure}[H] Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@7: \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@7: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \thinspace}] Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt}; Madarasi@7: \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark Madarasi@7: size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt}; Madarasi@7: \end{axis} Madarasi@7: \end{tikzpicture} Madarasi@7: \caption{In the case of molecules, there is not such a significant Madarasi@7: difference, but VF2++ seems to be faster as the number of nodes Madarasi@7: increases.}\label{fig:ISOMolecule} Madarasi@7: \end{figure} Madarasi@7: \end{subfigure} Madarasi@7: \hspace*{1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \begin{figure}[H] Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@7: \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@7: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \thinspace}] Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt}; Madarasi@7: \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark Madarasi@7: size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt}; Madarasi@7: \end{axis} Madarasi@7: \end{tikzpicture} Madarasi@7: \caption{The results are closer to each other on contact maps, but Madarasi@7: VF2++ still performs consistently better.}\label{fig:ISOContact} Madarasi@7: \end{figure} Madarasi@7: \end{subfigure} Madarasi@7: alpar@2: \begin{center} Madarasi@7: \vspace*{-0.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \begin{figure}[H] Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@7: \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@7: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \thinspace}] Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt}; Madarasi@7: \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark Madarasi@7: size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt}; Madarasi@7: \end{axis} Madarasi@7: \end{tikzpicture} Madarasi@7: \caption{On protein graphs, VF2 Plus has a super linear time Madarasi@7: complexity, while VF2++ runs in near constant time. The difference Madarasi@7: is about two order of magnitude on large graphs.}\label{fig:ISOProt} Madarasi@7: \end{figure} Madarasi@7: \end{subfigure} Madarasi@7: \end{center} Madarasi@7: \vspace*{-0.6cm} Madarasi@17: \caption{\normalsize{Graph isomorphism on biological graphs}}\label{fig:bioISO} Madarasi@7: \end{figure} Madarasi@7: Madarasi@7: alpar@2: alpar@2: alpar@2: \subsection{Random graphs} alpar@3: This section compares VF2++ with VF2 Plus on random graphs of a large alpar@3: size. The node labels are uniformly distributed. Let $\delta$ denote alpar@3: the average degree. For the parameters of problems solved in the alpar@3: experiments, please see the top of each chart. alpar@2: \subsubsection{Graph isomorphism} alpar@3: To evaluate the efficiency of the algorithms in the case of graph Madarasi@17: isomorphism, random connected graphs of less than 20 000 nodes have been alpar@3: considered. Generating a random graph and shuffling its nodes, an Madarasi@7: isomorphism had to be found. Figure \ref{fig:randISO} shows the runtime results alpar@4: on graph sets of various density. alpar@2: Madarasi@7: Madarasi@7: Madarasi@7: Madarasi@12: \begin{figure} Madarasi@7: \vspace*{-1.5cm} Madarasi@7: \hspace*{-1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso5_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso5_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} Madarasi@7: \end{subfigure} Madarasi@7: %\hspace{1cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso10_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso10_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} Madarasi@7: \end{subfigure} Madarasi@7: %%\hspace{1cm} Madarasi@7: \hspace*{-1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso15_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso15_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} Madarasi@7: \end{subfigure} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 35$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso35_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso35_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} Madarasi@7: \end{subfigure} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 45$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso45_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso45_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} Madarasi@7: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \thinspace}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso100_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso100_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} Madarasi@7: \end{subfigure} alpar@2: \vspace*{-0.8cm} Madarasi@7: \caption{IND on graphs having an average degree of Madarasi@7: 5.}\label{fig:randISO} alpar@2: \end{figure} alpar@2: alpar@2: alpar@2: \subsubsection{Induced subgraph isomorphism} Madarasi@17: This section presents a comparison of VF2++ and VF2 Plus in the case alpar@3: of induced subgraph isomorphism. In addition to the size of the large alpar@3: graph, that of the small graph dramatically influences the hardness of alpar@3: a given problem too, so the overall picture is provided by examining alpar@3: small graphs of various size. alpar@2: Madarasi@17: For each chart, a number $0<\rho< 1$ has been fixed, and the following Madarasi@19: has been executed 150 times. Generating a large graph $G_{2}$ of an average degree of $\delta$, Madarasi@19: choose 10 of its induced subgraphs having $\rho\ |V_{2}|$ nodes, alpar@3: and for all the 10 subgraphs find a mapping by using both the graph alpar@3: matching algorithms. The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1, Madarasi@10: 0.3, 0.6, 0.8, 0.95$ cases have been examined, see alpar@4: Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and Madarasi@10: \ref{fig:randIND35}. alpar@2: alpar@2: alpar@2: alpar@2: alpar@2: Madarasi@12: \begin{figure} Madarasi@7: \vspace*{-1.5cm} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.05.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.05.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.3.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.3.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.6.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.6.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.8.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.8.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \thinspace}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.95.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.95.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} alpar@2: \vspace*{-0.8cm} alpar@3: \caption{IND on graphs having an average degree of alpar@3: 5.}\label{fig:randIND5} alpar@2: \end{figure} alpar@2: alpar@2: alpar@2: \begin{figure}[H] Madarasi@7: \vspace*{-1.5cm} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} Madarasi@7: \hspace*{-0.5cm} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.05.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.05.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} Madarasi@7: \hspace*{-0.5cm} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.3.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.3.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.6.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.6.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.8.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.8.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \thinspace}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.95.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.95.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} alpar@2: \vspace*{-0.8cm} alpar@3: \caption{IND on graphs having an average degree of alpar@3: 10.}\label{fig:randIND10} alpar@2: \end{figure} alpar@2: alpar@2: alpar@2: alpar@2: \begin{figure}[H] Madarasi@7: \vspace*{-1.5cm} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.05.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.05.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.3.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.3.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.6.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.6.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.8.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.8.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \thinspace}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.95.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.95.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} alpar@2: \vspace*{-0.8cm} alpar@3: \caption{IND on graphs having an average degree of alpar@3: 35.}\label{fig:randIND35} alpar@2: \end{figure} alpar@2: alpar@2: alpar@3: Based on these experiments, VF2++ is faster than VF2 Plus and able to alpar@3: handle really large graphs in milliseconds. Note that when $IND$ was alpar@3: considered and the small graphs had proportionally few nodes ($\rho = alpar@3: 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node alpar@4: orders (e.g. see the $\delta=10$ case on Madarasi@17: Figure~\ref{fig:randIND10}). If these instances had been excluded, the alpar@3: charts would have seemed to be similar to the other ones. alpar@3: Unsurprisingly, as denser graphs are considered, both VF2++ and VF2 alpar@3: Plus slow slightly down, but remain practically usable even on graphs alpar@3: having 10 000 nodes. alpar@2: alpar@2: alpar@2: alpar@2: alpar@3: alpar@2: \section{Conclusion} Madarasi@19: This paper presented VF2++, a new graph matching algorithm based on VF2, called VF2++, and analyzed it from a practical viewpoint. alpar@2: alpar@3: Recognizing the importance of the node order and determining an alpar@3: efficient one, VF2++ is able to match graphs of thousands of nodes in alpar@3: near practically linear time including preprocessing. In addition to alpar@3: the proper order, VF2++ uses more efficient consistency and cutting alpar@3: rules which are easy to compute and make the algorithm able to prune alpar@3: most of the unfruitful branches without going astray. alpar@2: alpar@3: In order to show the efficiency of the new method, it has been Madarasi@19: compared to VF2 Plus\cite{VF2Plus}, which is the best contemporary algorithm. Madarasi@19: . alpar@2: alpar@3: The experiments show that VF2++ consistently outperforms VF2 Plus on alpar@3: biological graphs. It seems to be asymptotically faster on protein and alpar@3: on contact map graphs in the case of induced subgraph isomorphism, alpar@3: while in the case of graph isomorphism, it has definitely better alpar@3: asymptotic behaviour on protein graphs. alpar@2: alpar@3: Regarding random sparse graphs, not only has VF2++ proved itself to be Madarasi@19: faster than VF2 Plus, but it also has a practically linear behaviour both Madarasi@19: in the case of induced subgraph- and graph isomorphism. alpar@2: alpar@2: alpar@0: alpar@0: %% The Appendices part is started with the command \appendix; alpar@0: %% appendix sections are then done as normal sections alpar@0: %% \appendix alpar@0: alpar@0: %% \section{} alpar@0: %% \label{} alpar@0: alpar@0: %% If you have bibdatabase file and want bibtex to generate the alpar@0: %% bibitems, please use alpar@0: %% alpar@3: \bibliographystyle{elsarticle-num} \bibliography{bibliography} alpar@0: alpar@0: %% else use the following coding to input the bibitems directly in the alpar@0: %% TeX file. alpar@0: alpar@2: %% \begin{thebibliography}{00} alpar@0: alpar@2: %% %% \bibitem{label} alpar@2: %% %% Text of bibliographic item alpar@0: alpar@2: %% \bibitem{} alpar@0: alpar@2: %% \end{thebibliography} alpar@2: alpar@0: \end{document} alpar@0: \endinput alpar@0: %% alpar@0: %% End of file `elsarticle-template-num.tex'.