alpar@0: %% alpar@0: %% Copyright 2007, 2008, 2009 Elsevier Ltd alpar@0: %% alpar@0: %% This file is part of the 'Elsarticle Bundle'. alpar@0: %% --------------------------------------------- alpar@0: %% alpar@0: %% It may be distributed under the conditions of the LaTeX Project Public alpar@0: %% License, either version 1.2 of this license or (at your option) any alpar@0: %% later version. The latest version of this license is in alpar@0: %% http://www.latex-project.org/lppl.txt alpar@0: %% and version 1.2 or later is part of all distributions of LaTeX alpar@0: %% version 1999/12/01 or later. alpar@0: %% alpar@0: %% The list of all files belonging to the 'Elsarticle Bundle' is alpar@0: %% given in the file `manifest.txt'. alpar@0: %% alpar@0: alpar@0: %% Template article for Elsevier's document class `elsarticle' alpar@0: %% with numbered style bibliographic references alpar@0: %% SP 2008/03/01 alpar@0: alpar@0: \documentclass[preprint,12pt]{elsarticle} alpar@0: alpar@0: %% Use the option review to obtain double line spacing alpar@0: %% \documentclass[authoryear,preprint,review,12pt]{elsarticle} alpar@0: alpar@0: %% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn alpar@0: %% for a journal layout: alpar@0: %% \documentclass[final,1p,times]{elsarticle} alpar@0: %% \documentclass[final,1p,times,twocolumn]{elsarticle} alpar@0: %% \documentclass[final,3p,times]{elsarticle} alpar@0: %% \documentclass[final,3p,times,twocolumn]{elsarticle} alpar@0: %% \documentclass[final,5p,times]{elsarticle} alpar@0: %% \documentclass[final,5p,times,twocolumn]{elsarticle} alpar@0: alpar@0: %% For including figures, graphicx.sty has been loaded in alpar@0: %% elsarticle.cls. If you prefer to use the old commands alpar@0: %% please give \usepackage{epsfig} alpar@0: alpar@0: %% The amssymb package provides various useful mathematical symbols alpar@0: \usepackage{amssymb} alpar@0: %% The amsthm package provides extended theorem environments alpar@0: %% \usepackage{amsthm} alpar@0: alpar@0: %% The lineno packages adds line numbers. Start line numbering with alpar@0: %% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on alpar@0: %% for the whole article with \linenumbers. alpar@0: %% \usepackage{lineno} alpar@0: alpar@2: \usepackage{amsmath} alpar@2: %% \usepackage[pdftex]{graphicx} alpar@2: alpar@2: \usepackage{pgfplots} alpar@2: \pgfplotsset{width=9cm} alpar@2: \pgfplotsset{compat=1.8} alpar@2: alpar@2: \usepackage{caption} alpar@2: \usepackage{subcaption} alpar@2: alpar@2: \usepackage{algorithm} alpar@2: \usepackage{algpseudocode} alpar@2: \usepackage{tikz} alpar@2: alpar@2: \usepackage{amsthm,amssymb} alpar@2: \renewcommand{\qedsymbol}{\rule{0.7em}{0.7em}} alpar@2: alpar@2: \newtheorem{theorem}{Theorem}[subsection] alpar@2: \newtheorem{corollary}{Corollary}[theorem] alpar@2: \newtheorem{claim}[theorem]{Claim} alpar@2: alpar@2: \newtheorem{definition}{Definition}[subsection] alpar@2: \newtheorem{notation}{Notation}[subsection] alpar@2: \newtheorem{example}{Example}[subsection] alpar@2: \usetikzlibrary{decorations.markings} alpar@2: \let\oldproofname=\proofname alpar@2: %% \renewcommand{\proofname}{\rm\bf{Proof:}} alpar@2: Madarasi@7: \captionsetup{font=normalsize} Madarasi@7: alpar@1: \journal{Discrete Applied Mathematics} alpar@0: alpar@0: \begin{document} alpar@0: alpar@0: \begin{frontmatter} alpar@0: alpar@0: %% Title, authors and addresses alpar@0: alpar@0: %% use the tnoteref command within \title for footnotes; alpar@0: %% use the tnotetext command for theassociated footnote; alpar@0: %% use the fnref command within \author or \address for footnotes; alpar@0: %% use the fntext command for theassociated footnote; alpar@0: %% use the corref command within \author for corresponding author footnotes; alpar@0: %% use the cortext command for theassociated footnote; alpar@0: %% use the ead command for the email address, alpar@0: %% and the form \ead[url] for the home page: alpar@0: %% \title{Title\tnoteref{label1}} alpar@0: %% \tnotetext[label1]{} alpar@0: %% \author{Name\corref{cor1}\fnref{label2}} alpar@0: %% \ead{email address} alpar@0: %% \ead[url]{home page} alpar@0: %% \fntext[label2]{} alpar@0: %% \cortext[cor1]{} alpar@0: %% \address{Address\fnref{label3}} alpar@0: %% \fntext[label3]{} alpar@0: alpar@1: \title{Improved Algorithms for Matching Biological Graphs} alpar@0: alpar@0: %% use optional labels to link authors explicitly to addresses: alpar@0: %% \author[label1,label2]{} alpar@0: %% \address[label1]{} alpar@0: %% \address[label2]{} alpar@0: alpar@1: \author{Alp{\'a}r J{\"u}ttner and P{\'e}ter Madarasi} alpar@0: alpar@1: \address{Dept of Operations Research, ELTE} alpar@0: alpar@0: \begin{abstract} alpar@1: Subgraph isomorphism is a well-known NP-Complete problem, while its alpar@1: special case, the graph isomorphism problem is one of the few problems alpar@1: in NP neither known to be in P nor NP-Complete. Their appearance in alpar@1: many fields of application such as pattern analysis, computer vision alpar@1: questions and the analysis of chemical and biological systems has alpar@1: fostered the design of various algorithms for handling special graph alpar@1: structures. alpar@0: alpar@1: The idea of using state space representation and checking some alpar@1: conditions in each state to prune the search tree has made the VF2 alpar@1: algorithm one of the state of the art graph matching algorithms for alpar@1: more than a decade. Recently, biological questions of ever increasing alpar@1: importance have required more efficient, specialized algorithms. alpar@1: alpar@1: This paper presents VF2++, a new algorithm based on the original VF2, alpar@1: which runs significantly faster on most test cases and performs alpar@1: especially well on special graph classes stemming from biological alpar@1: questions. VF2++ handles graphs of thousands of nodes in practically alpar@1: near linear time including preprocessing. Not only is it an improved alpar@1: version of VF2, but in fact, it is by far the fastest existing alpar@1: algorithm regarding biological graphs. alpar@1: alpar@1: The reason for VF2++' superiority over VF2 is twofold. Firstly, taking alpar@1: into account the structure and the node labeling of the graph, VF2++ alpar@1: determines a state order in which most of the unfruitful branches of alpar@1: the search space can be pruned immediately. Secondly, introducing more alpar@1: efficient - nevertheless still easier to compute - cutting rules alpar@1: reduces the chance of going astray even further. alpar@1: alpar@1: In addition to the usual subgraph isomorphism, specialized versions alpar@1: for induced subgraph isomorphism and for graph isomorphism are alpar@1: presented. VF2++ has gained a runtime improvement of one order of alpar@1: magnitude respecting induced subgraph isomorphism and a better alpar@1: asymptotical behaviour in the case of graph isomorphism problem. alpar@1: alpar@1: After having provided the description of VF2++, in order to evaluate alpar@1: its effectiveness, an extensive comparison to the contemporary other alpar@1: algorithms is shown, using a wide range of inputs, including both real alpar@1: life biological and chemical datasets and standard randomly generated alpar@1: graph series. alpar@1: alpar@1: The work was motivated and sponsored by QuantumBio Inc., and all the alpar@1: developed algorithms are available as the part of the open source alpar@1: LEMON graph and network optimization library alpar@1: (http://lemon.cs.elte.hu). alpar@0: \end{abstract} alpar@0: alpar@0: \begin{keyword} alpar@0: %% keywords here, in the form: keyword \sep keyword alpar@0: alpar@0: %% PACS codes here, in the form: \PACS code \sep code alpar@0: alpar@0: %% MSC codes here, in the form: \MSC code \sep code alpar@0: %% or \MSC[2008] code \sep code (2000 is the default) alpar@0: alpar@0: \end{keyword} alpar@0: alpar@0: \end{frontmatter} alpar@0: alpar@0: %% \linenumbers alpar@0: alpar@0: %% main text alpar@2: \section{Introduction} alpar@2: \label{sec:intro} alpar@2: alpar@3: In the last decades, combinatorial structures, and especially graphs alpar@3: have been considered with ever increasing interest, and applied to the alpar@3: solution of several new and revised questions. The expressiveness, alpar@3: the simplicity and the studiedness of graphs make them practical for alpar@3: modelling and appear constantly in several seemingly independent alpar@3: fields. Bioinformatics and chemistry are amongst the most relevant alpar@3: and most important fields. alpar@2: alpar@3: Complex biological systems arise from the interaction and cooperation alpar@3: of plenty of molecular components. Getting acquainted with such alpar@3: systems at the molecular level has primary importance, since alpar@3: protein-protein interaction, DNA-protein interaction, metabolic alpar@3: interaction, transcription factor binding, neuronal networks, and alpar@3: hormone signaling networks can be understood only this way. alpar@2: alpar@3: For instance, a molecular structure can be considered as a graph, alpar@3: whose nodes correspond to atoms and whose edges to chemical bonds. The alpar@3: secondary structure of a protein can also be represented as a graph, alpar@3: where nodes are associated with aminoacids and the edges with hydrogen alpar@3: bonds. The nodes are often whole molecular components and the edges alpar@3: represent some relationships among them. The similarity and alpar@3: dissimilarity of objects corresponding to nodes are incorporated to alpar@3: the model by \emph{node labels}. Many other chemical and biological alpar@3: structures can easily be modeled in a similar way. Understanding such alpar@3: networks basically requires finding specific subgraphs, which can not alpar@3: avoid the application of graph matching algorithms. alpar@2: alpar@3: Finally, let some of the other real-world fields related to some alpar@3: variants of graph matching be briefly mentioned: pattern recognition alpar@3: and machine vision \cite{HorstBunkeApplications}, symbol recognition alpar@3: \cite{CordellaVentoSymbolRecognition}, face identification alpar@3: \cite{JianzhuangYongFaceIdentification}. \\ alpar@2: alpar@3: Subgraph and induced subgraph matching problems are known to be alpar@3: NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is alpar@3: one of the few problems in NP neither known to be in P nor alpar@3: NP-Complete. Although polynomial time isomorphism algorithms are known alpar@3: for various graph classes, like trees and planar alpar@3: graphs\cite{PlanarGraphIso}, bounded valence alpar@3: graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso} alpar@3: or permutation graphs\cite{PermGraphIso}. alpar@2: alpar@3: In the following, some algorithms based on other approaches are alpar@3: summarized, which do not need any restrictions on the graphs. However, alpar@3: an overall polynomial behaviour is not expectable from such an alpar@3: alternative, it may often have good performance, even on a graph class alpar@3: for which polynomial algorithm is known. Note that this summary alpar@3: containing only exact matching algorithms is far not complete, neither alpar@3: does it cover all the recent algorithms. alpar@2: alpar@3: The first practically usable approach was due to alpar@4: Ullmann\cite{Ullmann} which is a commonly used depth-first alpar@3: search based algorithm with a complex heuristic for reducing the alpar@3: number of visited states. A major problem is its $\Theta(n^3)$ space alpar@3: complexity, which makes it impractical in the case of big sparse alpar@3: graphs. alpar@2: alpar@4: In a recent paper, Ullmann\cite{UllmannBit} presents an alpar@3: improved version of this algorithm based on a bit-vector solution for alpar@3: the binary Constraint Satisfaction Problem. alpar@2: alpar@4: The Nauty algorithm\cite{Nauty} transforms the two graphs to alpar@3: a canonical form before starting to check for the isomorphism. It has alpar@3: been considered as one of the fastest graph isomorphism algorithms, alpar@3: although graph categories were shown in which it takes exponentially alpar@3: many steps. This algorithm handles only the graph isomorphism problem. alpar@2: alpar@4: The \emph{LAD} algorithm\cite{Lad} uses a depth-first search alpar@3: strategy and formulates the matching as a Constraint Satisfaction alpar@3: Problem to prune the search tree. The constraints are that the mapping alpar@3: has to be injective and edge-preserving, hence it is possible to alpar@3: handle new matching types as well. alpar@2: alpar@3: The \textbf{RI} algorithm\cite{RI} and its variations are based on a alpar@3: state space representation. After reordering the nodes of the graphs, alpar@3: it uses some fast executable heuristic checks without using any alpar@3: complex pruning rules. It seems to run really efficiently on graphs alpar@3: coming from biology, and won the International Contest on Pattern alpar@3: Search in Biological Databases\cite{Content}. alpar@2: alpar@3: The currently most commonly used algorithm is the alpar@3: \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was alpar@3: designed for solving pattern matching and computer vision problems, alpar@3: and has been one of the best overall algorithms for more than a alpar@3: decade. Although, it can't be up to new specialized algorithms, it is alpar@3: still widely used due to its simplicity and space efficiency. VF2 uses alpar@3: a state space representation and checks some conditions in each state alpar@3: to prune the search tree. alpar@2: alpar@3: Our first graph matching algorithm was the first version of VF2 which alpar@3: recognizes the significance of the node ordering, more opportunities alpar@3: to increase the cutting efficiency and reduce its computational alpar@3: complexity. This project was initiated and sponsored by QuantumBio alpar@3: Inc.\cite{QUANTUMBIO} and the implementation --- along with a source alpar@3: code --- has been published as a part of LEMON\cite{LEMON} open source alpar@3: graph library. alpar@2: alpar@3: This paper introduces \textbf{VF2++}, a new further improved algorithm alpar@3: for the graph and (induced)subgraph isomorphism problem, which uses alpar@3: efficient cutting rules and determines a node order in which VF2 runs alpar@3: significantly faster on practical inputs. alpar@2: alpar@3: Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has alpar@3: been published. It is considered to be as efficient as the RI alpar@3: algorithm and has a strictly better behavior on large graphs. The alpar@3: main idea of VF2 Plus is to precompute a heuristic node order of the alpar@3: small graph, in which the VF2 works more efficiently. alpar@2: alpar@2: \section{Problem Statement} alpar@3: This section provides a detailed description of the problems to be alpar@3: solved. alpar@2: \subsection{Definitions} alpar@2: alpar@3: Throughout the paper $G_{small}=(V_{small}, E_{small})$ and alpar@3: $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs. alpar@2: \begin{definition}\label{sec:ismorphic} alpar@3: $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M: alpar@3: V_{small} \longrightarrow V_{large}$ bijection, for which the alpar@3: following is true: alpar@2: \begin{center} alpar@3: $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow alpar@3: (M(u),M(v))\in{E_{large}}$ alpar@2: \end{center} alpar@2: \end{definition} alpar@3: For the sake of simplicity in this paper subgraphs and induced alpar@3: subgraphs are defined in a more general way than usual: alpar@2: \begin{definition} alpar@3: $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I: alpar@3: V_{small}\longrightarrow V_{large}$ injection, for which the alpar@3: following is true: alpar@2: \begin{center} alpar@2: $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$ alpar@2: \end{center} alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} alpar@3: $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists alpar@3: I: V_{small}\longrightarrow V_{large}$ injection, for which the alpar@3: following is true: alpar@2: \begin{center} alpar@3: $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow alpar@3: (I(u),I(v))\in E_{large}$ alpar@2: \end{center} alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} alpar@3: $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node alpar@3: label function}, where K is an arbitrary set. The elements in K alpar@3: are the \textbf{node labels}. Two nodes, u and v are said to be alpar@3: \textbf{equivalent}, if $lab(u)=lab(v)$. alpar@2: \end{definition} alpar@2: alpar@3: When node labels are also given, the matched nodes must have the same alpar@3: labels. For example, the node labeled isomorphism is phrased by alpar@2: \begin{definition} alpar@3: $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label alpar@3: function lab} if $\exists M: V_{small} \longrightarrow V_{large}$ alpar@3: bijection, for which the following is true: alpar@2: \begin{center} alpar@3: $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow alpar@3: (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} : alpar@3: lab(u)=lab(M(u)))$ alpar@2: \end{center} alpar@2: \end{definition} alpar@2: alpar@2: The other two definitions can be extended in the same way. alpar@2: alpar@3: Note that edge label function can be defined similarly to node label alpar@3: function, and all the definitions can be extended with additional alpar@3: conditions, but it is out of the scope of this work. alpar@2: alpar@3: The equivalence of two nodes is usually defined by another relation, alpar@3: $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the alpar@3: definition given above if R is an equivalence relation, which does not alpar@3: mean restriction in biological and chemical applications. alpar@2: alpar@2: \subsection{Common problems}\label{sec:CommProb} alpar@2: alpar@3: The focus of this paper is on two extensively studied topics, the alpar@3: subgraph isomorphism and its variations. However, the following alpar@3: problems also appear in many applications. alpar@2: alpar@3: The \textbf{subgraph matching problem} is the following: is alpar@3: $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node alpar@3: label? alpar@2: alpar@3: The \textbf{induced subgraph matching problem} asks the same about the alpar@3: existence of an induced subgraph. alpar@2: alpar@3: The \textbf{graph isomorphism problem} can be defined as induced alpar@3: subgraph matching problem where the sizes of the two graphs are equal. alpar@2: alpar@3: In addition to existence, it may be needed to show such a subgraph, or alpar@3: it may be necessary to list all of them. alpar@2: alpar@3: It should be noted that some authors misleadingly refer to the term alpar@3: \emph{subgraph isomorphism problem} as an \emph{induced subgraph alpar@3: isomorphism problem}. alpar@2: alpar@3: The following sections give the descriptions of VF2, VF2++, VF2 Plus alpar@3: and a particular comparison. alpar@2: alpar@2: \section{The VF2 Algorithm} alpar@3: This algorithm is the basis of both the VF2++ and the VF2 Plus. VF2 alpar@4: is able to handle all the variations mentioned in Section alpar@4: \ref{sec:CommProb}. Although it can also handle directed graphs, alpar@3: for the sake of simplicity, only the undirected case will be alpar@3: discussed. alpar@2: alpar@2: alpar@2: \subsection{Common notations} alpar@3: \indent Assume $G_{small}$ is searched in $G_{large}$. The following alpar@3: definitions and notations will be used throughout the whole paper. alpar@2: \begin{definition} alpar@3: A set $M\subseteq V_{small}\times V_{large}$ is called alpar@3: \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears alpar@3: in more than one pair in M. That is, M uniquely associates some of alpar@3: the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice alpar@3: versa. alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} alpar@3: Mapping M \textbf{covers} a node v, if there exists a pair in M, which alpar@3: contains v. alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} alpar@3: A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the alpar@3: nodes in $V_{small}$. alpar@2: \end{definition} alpar@2: alpar@2: \begin{notation} alpar@3: Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in alpar@3: V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in alpar@3: V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$. alpar@2: \end{notation} alpar@2: alpar@2: \begin{notation} alpar@3: Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node alpar@3: exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$ alpar@3: and $v\in V_{small}\cup V_{large}$. alpar@2: \end{notation} alpar@2: alpar@2: Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique alpar@2: alpar@3: The definitions of the isomorphism types can be rephrased on the alpar@3: existence of a special whole mapping $M$, since it represents a alpar@3: bijection. For example alpar@2: \begin{center} alpar@3: $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph alpar@3: isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v alpar@3: \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow alpar@3: (Pair(M,u),Pair(M,v))\in E_{large}$. alpar@2: \end{center} alpar@2: alpar@2: \begin{definition} alpar@2: A set of whole mappings is called \textbf{problem type}. alpar@2: \end{definition} alpar@3: Throughout the paper, $\mathbf{PT}$ denotes a generic problem type alpar@3: which can be substituted by any problem type. alpar@2: alpar@3: A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using alpar@3: this notations, VF2 searches a whole mapping $W$ of type $PT$. alpar@2: alpar@3: For example the problem type of graph isomorphism problem is the alpar@3: following. A whole mapping $W$ is in $\mathbf{ISO}$, iff the alpar@4: bijection represented by $W$ satisfies Definition~\ref{sec:ismorphic}. alpar@4: The subgraph- and induced subgraph matching problems can be formalized alpar@4: in a similar way. Let their problem types be denoted as $\mathbf{SUB}$ alpar@4: and $\mathbf{IND}$. alpar@2: alpar@2: \begin{definition} alpar@2: \label{expPT} alpar@3: $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in alpar@3: PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow alpar@3: (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of alpar@3: $G_{small}$ has to be mapped to an edge of $G_{large}$ for each alpar@3: mapping in $PT$. alpar@2: \end{definition} alpar@2: alpar@2: Note that $ISO$, $SUB$ and $IND$ are expanding problem types. alpar@2: alpar@3: This paper deals with the three problem types mentioned above only, alpar@3: but the following generic definitions make it possible to handle other alpar@3: types as well. Although it may be challenging to find a proper alpar@3: consistency function and an efficient cutting function. alpar@2: alpar@2: \begin{definition} alpar@3: Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a alpar@3: \textbf{consistency function by } $\mathbf{PT}$, if the following alpar@3: holds. If there exists whole mapping $W$ of type PT for which alpar@3: $M\subseteq W$, then $Cons_{PT}(M)$ is true. alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} alpar@3: Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a alpar@3: \textbf{cutting function by } $\mathbf{PT}$, if the following alpar@3: holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a alpar@3: whole mapping W of type PT. alpar@2: \end{definition} alpar@2: alpar@2: \begin{definition} alpar@3: $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if alpar@3: $Cons_{PT}(M)$ is true. alpar@2: \end{definition} alpar@2: alpar@2: $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form. alpar@2: \begin{notation} alpar@3: Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and alpar@3: $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where alpar@3: $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping. alpar@2: \end{notation} alpar@2: alpar@3: $Cons_{PT}$ will be used to check the consistency of the already alpar@3: covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if alpar@3: no whole consistent mapping can contain the current mapping. alpar@2: alpar@2: \subsection{Overview of the algorithm} alpar@3: VF2 uses a state space representation of mappings, $Cons_{PT}$ for alpar@3: excluding inconsistency with the problem type and $Cut_{PT}$ for alpar@3: pruning the search tree. Each state $s$ of the matching process can alpar@3: be associated with a mapping $M(s)$. alpar@2: alpar@4: Algorithm~\ref{alg:VF2Pseu} is a high level description of alpar@3: the VF2 matching algorithm. alpar@2: alpar@2: alpar@2: \begin{algorithm} Madarasi@13: \algtext*{EndIf}%ne nyomtasson end if-et Madarasi@13: \algtext*{EndFor}%ne Madarasi@13: \algtext*{EndProcedure}%ne nyomtasson .. alpar@2: \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu} alpar@2: \begin{algorithmic}[1] alpar@2: alpar@3: \Procedure{VF2}{State $s$, ProblemType $PT$} \If{$M(s$) covers alpar@3: $V_{small}$} \State Output($M(s)$) \Else alpar@2: alpar@3: \State Compute the set $P(s)$ of the pairs candidate for inclusion alpar@3: in $M(s)$ \ForAll{$p\in{P(s)}$} \If{Cons$_{PT}$($p, M(s)$) $\wedge$ alpar@3: $\neg$Cut$_{PT}$($p, M(s)$)} \State Compute the nascent state alpar@3: $\tilde{s}$ by adding $p$ to $M(s)$ \State \textbf{call} alpar@3: VF2($\tilde{s}$, $PT$) \EndIf \EndFor \EndIf \EndProcedure alpar@2: \end{algorithmic} alpar@2: \end{algorithm} alpar@2: alpar@2: alpar@3: The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it alpar@3: starts with an empty mapping. alpar@2: alpar@3: For each state $s$, the algorithm computes $P(s)$, the set of alpar@3: candidate node pairs for adding to the current state $s$. alpar@2: alpar@3: For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and alpar@3: $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and alpar@3: $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup alpar@3: \{p\}$ is computed, and the whole process is recursively applied to alpar@3: $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it alpar@3: can be proved that $s$ can not be extended to a whole mapping. alpar@2: Madarasi@11: In order to make sure of the correctness, see alpar@2: \begin{claim} alpar@3: Through consistent mappings, only consistent whole mappings can be alpar@3: reached, and all of the whole mappings are reachable through alpar@3: consistent mappings. alpar@2: \end{claim} alpar@2: alpar@3: Note that a state may be reached in many different ways, since the alpar@3: order of insertions into M does not influence the nascent mapping. In alpar@3: fact, the number of different ways which lead to the same state can be alpar@3: exponentially large. If $G_{small}$ and $G_{large}$ are circles with n alpar@3: nodes and n different node labels, there exists exactly one graph alpar@3: isomorphism between them, but it will be reached in $n!$ different alpar@3: ways. alpar@2: alpar@2: However, one may observe alpar@2: alpar@2: \begin{claim} alpar@2: \label{claim:claimTotOrd} alpar@3: Let $\prec$ an arbitrary total ordering relation on $V_{small}$. If alpar@3: the algorithm ignores each $p=(u,v) \in P(s)$, for which alpar@2: \begin{center} alpar@2: $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$, alpar@2: \end{center} alpar@3: then no state can be reached more than ones and each state associated alpar@3: with a whole mapping remains reachable. alpar@2: \end{claim} alpar@2: alpar@3: Note that the cornerstone of the improvements to VF2 is a proper alpar@3: choice of a total ordering. alpar@2: alpar@2: \subsection{The candidate set P(s)} alpar@2: \label{candidateComputingVF2} alpar@2: $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$. alpar@3: Suppose that $PT$ is an expanding problem type, see alpar@4: Definition~\ref{expPT}. alpar@2: alpar@2: \begin{notation} alpar@3: Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by alpar@3: $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}} alpar@3: \wedge \tilde{u}$ is covered by $M(s)\}$, and alpar@3: \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not alpar@3: covered by alpar@3: $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}} alpar@3: \wedge \tilde{v}$ is covered by $M(s)\}$ alpar@2: \end{notation} alpar@2: alpar@3: The set $P(s)$ includes the pairs of uncovered neighbours of covered alpar@3: nodes and if there is not such a node pair, all the pairs containing alpar@3: two uncovered nodes are added. Formally, let alpar@2: \[ alpar@2: P(s)\!=\! alpar@2: \begin{cases} alpar@3: T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if } alpar@3: T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq alpar@3: \emptyset,\\ (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s)) alpar@3: &\hspace{-0.15cm}otherwise. alpar@2: \end{cases} alpar@2: \] alpar@2: alpar@2: \subsection{Consistency} alpar@3: This section defines the consistency functions for the different alpar@4: problem types mentioned in Section~\ref{sec:CommProb}. alpar@2: \begin{notation} alpar@3: Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} : alpar@3: (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large} alpar@3: (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$ alpar@2: \end{notation} alpar@3: Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is alpar@3: a state of the matching procedure, $M(s)$ is consistent mapping by alpar@3: $PT$ and $lab(u)=lab(v)$. $Cons_{PT}(p,M(s))$ checks whether alpar@3: including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$. alpar@2: alpar@2: \subsubsection{Induced subgraph isomorphism} alpar@3: $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow alpar@3: (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small} alpar@3: \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline The alpar@3: following formulation gives an efficient way of calculating alpar@3: $Cons_{IND}$. alpar@2: \begin{claim} alpar@3: $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v) alpar@3: \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge alpar@3: (\forall \tilde{u}\in \Gamma_{small}(u) alpar@3: \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a alpar@3: consistency function in the case of $IND$. alpar@2: \end{claim} alpar@2: alpar@2: \subsubsection{Graph isomorphism} alpar@3: $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$ alpar@3: $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by alpar@3: $IND$. alpar@2: \begin{claim} alpar@3: $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and alpar@3: only if it is a consistency function by $IND$. alpar@2: \end{claim} alpar@2: \subsubsection{Subgraph isomorphism} alpar@3: $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow alpar@3: (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small} alpar@3: \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$. alpar@2: \newline alpar@3: The following formulation gives an efficient way of calculating alpar@3: $Cons_{SUB}$. alpar@2: \begin{claim} alpar@3: $Cons_{SUB}((u,v),M(s)):= (\forall \tilde{u}\in \Gamma_{small}(u) alpar@3: \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a alpar@3: consistency function by $SUB$. alpar@2: \end{claim} alpar@2: alpar@2: \subsection{Cutting rules} alpar@3: $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently alpar@3: verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can alpar@3: be true only if it is impossible to extended $M(s)\cup \{p\}$ to a alpar@3: whole mapping. alpar@2: \begin{notation} alpar@2: alpar@3: Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash alpar@3: M_{small}(s))\backslash T_{small}(s)$, and alpar@3: \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash alpar@3: M_{large}(s))\backslash T_{large}(s)$. alpar@2: \end{notation} alpar@2: \subsubsection{Induced subgraph isomorphism} alpar@2: \begin{claim} alpar@3: $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < alpar@3: |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap alpar@3: \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap alpar@3: \tilde{T}_{small}(s)|$ is a cutting function by $IND$. alpar@2: \end{claim} alpar@2: \subsubsection{Graph isomorphism} alpar@3: Note that the cutting function of induced subgraph isomorphism defined alpar@3: above is a cutting function by $ISO$, too, however it is less alpar@3: efficient than the following while their computational complexity is alpar@3: the same. alpar@2: \begin{claim} alpar@3: $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq alpar@3: |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap alpar@3: \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap alpar@3: \tilde{T}_{small}(s)|$ is a cutting function by $ISO$. alpar@2: \end{claim} alpar@2: alpar@2: \subsubsection{Subgraph isomorphism} alpar@2: \begin{claim} alpar@3: $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < alpar@3: |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by alpar@3: $SUB$. alpar@2: \end{claim} alpar@3: Note that there is a significant difference between induced and alpar@3: non-induced subgraph isomorphism: alpar@2: alpar@2: \begin{claim} alpar@2: \label{claimSUB} alpar@3: $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < alpar@3: |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap alpar@3: \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ alpar@3: is \textbf{not} a cutting function by $SUB$. alpar@2: \end{claim} alpar@2: alpar@2: \section{The VF2++ Algorithm} alpar@3: Although any total ordering relation makes the search space of VF2 a alpar@3: tree, its choice turns out to dramatically influence the number of alpar@3: visited states. The goal is to determine an efficient one as quickly alpar@3: as possible. alpar@2: alpar@3: The main reason for VF2++' superiority over VF2 is twofold. Firstly, alpar@3: taking into account the structure and the node labeling of the graph, alpar@3: VF2++ determines a state order in which most of the unfruitful alpar@3: branches of the search space can be pruned immediately. Secondly, alpar@3: introducing more efficient --- nevertheless still easier to compute alpar@3: --- cutting rules reduces the chance of going astray even further. alpar@2: alpar@3: In addition to the usual subgraph isomorphism, specialized versions alpar@3: for induced subgraph isomorphism and for graph isomorphism have been alpar@3: designed. VF2++ has gained a runtime improvement of one order of alpar@3: magnitude respecting induced subgraph isomorphism and a better alpar@3: asymptotical behaviour in the case of graph isomorphism problem. alpar@2: alpar@3: Note that a weaker version of the cutting rules and the more efficient alpar@3: candidate set calculating were described in \cite{VF2Plus}, too. alpar@2: alpar@3: It should be noted that all the methods described in this section are alpar@3: extendable to handle directed graphs and edge labels as well. alpar@2: alpar@3: The basic ideas and the detailed description of VF2++ are provided in alpar@3: the following. alpar@2: alpar@2: \subsection{Preparations} alpar@2: \begin{claim} alpar@2: \label{claim:claimCoverFromLeft} alpar@3: The total ordering relation uniquely determines a node order, in which alpar@3: the nodes of $V_{small}$ will be covered by VF2. From the point of alpar@3: view of the matching procedure, this means, that always the same node alpar@3: of $G_{small}$ will be covered on the d-th level. alpar@2: \end{claim} alpar@2: alpar@2: \begin{definition} alpar@3: An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of alpar@3: $V_{small}$ is \textbf{matching order}, if exists $\prec$ total alpar@3: ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds alpar@3: pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$. alpar@2: \end{definition} alpar@2: alpar@2: \begin{claim}\label{claim:MOclaim} alpar@3: A total ordering is matching order, iff the nodes of every component alpar@3: form an interval in the node sequence, and every node connects to a alpar@3: previous node in its component except the first node of the alpar@3: component. The order of the components is arbitrary. \\Formally alpar@3: spoken, an order alpar@3: $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of alpar@3: $V_{small}$ is matching order $\Leftrightarrow$ $\forall alpar@3: G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall alpar@3: i: (\exists j : jd}P_{deg}(d')$\\ $M$ is the set of alpar@3: already sorted nodes, $T$ is the set of nodes candidate to be alpar@3: selected, and $degreeM$ of a node is the number of its neighbours in alpar@3: $M$. alpar@2: \begin{algorithm} alpar@3: \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne alpar@3: nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson .. alpar@2: \algtext*{EndWhile} alpar@2: \caption{}\label{alg:VF2PlusPseu} alpar@2: \begin{algorithmic}[1] alpar@3: \Procedure{VF2 Plus order}{} \State Select the node with the lowest alpar@3: $P$. \If {more nodes share the same $P$} \State select the one with alpar@3: maximum degree \EndIf \If {more nodes share the same $P$ and have the alpar@3: max degree} \State select the first \EndIf \State Put the selected alpar@3: node in the set $M$. \label{alg:putIn} \State Put all its unsorted alpar@3: neighbours in the set $T$. \If {$M\neq V_{small}$} \State From set alpar@3: $T$ select the node with maximum $degreeM$. \If {more nodes have alpar@3: maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf alpar@3: \If {more nodes have maximum $degreeM$ and $P$} \State Select the alpar@3: first. \EndIf \State \textbf{goto \ref{alg:putIn}.} \EndIf alpar@2: \EndProcedure alpar@2: \end{algorithmic} alpar@2: \end{algorithm} alpar@2: alpar@4: Using these notations, Algorithm~\ref{alg:VF2PlusPseu} alpar@3: provides the description of the sorting procedure. alpar@2: alpar@3: Note that $P(u)$ is not the exact probability of finding a consistent alpar@3: pair for $u$ by choosing a node of $V_{large}$ randomly, since alpar@3: $P_{lab}$ and $P_{deg}$ are not independent, though calculating the alpar@3: real probability would take quadratic time, which may be reduced by alpar@3: using fittingly lookup tables. alpar@2: alpar@2: \section{Experimental results} alpar@3: This section compares the performance of VF2++ and VF2 Plus. Both alpar@3: algorithms have run faster with orders of magnitude than VF2, thus its alpar@3: inclusion was not reasonable. alpar@2: \subsection{Biological graphs} alpar@3: The tests have been executed on a recent biological dataset created alpar@3: for the International Contest on Pattern Search in Biological Madarasi@7: Databases\cite{Content}, which has been constructed of molecule, Madarasi@7: protein and contact map graphs extracted from the Protein Data alpar@3: Bank\cite{ProteinDataBank}. alpar@2: alpar@3: The molecule dataset contains small graphs with less than 100 nodes alpar@3: and an average degree of less than 3. The protein dataset contains alpar@3: graphs having 500-10 000 nodes and an average degree of 4, while the alpar@3: contact map dataset contains graphs with 150-800 nodes and an average alpar@3: degree of 20. \\ alpar@2: alpar@3: In the following, the induced subgraph isomorphism and the graph alpar@3: isomorphism will be examined. alpar@2: Madarasi@7: This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For run time results, please see Figure~\ref{fig:bioIND}. Madarasi@7: Madarasi@7: In an other experiment, the nodes of each graph in the database had been Madarasi@7: shuffled, and an isomorphism between the shuffled and the original Madarasi@7: graph was searched. The solution times are shown on Figure~\ref{fig:bioISO}. Madarasi@7: Madarasi@7: alpar@2: alpar@2: \begin{figure}[H] Madarasi@7: \vspace*{-2cm} Madarasi@7: \hspace*{-1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \begin{figure}[H] Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@7: \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@7: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \thinspace}] Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt}; Madarasi@7: \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark Madarasi@7: size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt}; Madarasi@7: \end{axis} Madarasi@7: \end{tikzpicture} Madarasi@7: \caption{In the case of molecules, there is not such a significant Madarasi@7: difference, but VF2++ seems to be faster as the number of nodes Madarasi@7: increases.}\label{fig:ISOMolecule} Madarasi@7: \end{figure} Madarasi@7: \end{subfigure} Madarasi@7: \hspace*{1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \begin{figure}[H] Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@7: \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@7: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \thinspace}] Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt}; Madarasi@7: \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark Madarasi@7: size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt}; Madarasi@7: \end{axis} Madarasi@7: \end{tikzpicture} Madarasi@7: \caption{The results are closer to each other on contact maps, but Madarasi@7: VF2++ still performs consistently better.}\label{fig:ISOContact} Madarasi@7: \end{figure} Madarasi@7: \end{subfigure} Madarasi@7: alpar@2: \begin{center} Madarasi@7: \vspace*{-0.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \begin{figure}[H] Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@7: \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@7: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \thinspace}] Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt}; Madarasi@7: \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark Madarasi@7: size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt}; Madarasi@7: \end{axis} Madarasi@7: \end{tikzpicture} Madarasi@7: \caption{On protein graphs, VF2 Plus has a super linear time Madarasi@7: complexity, while VF2++ runs in near constant time. The difference Madarasi@7: is about two order of magnitude on large graphs.}\label{fig:ISOProt} Madarasi@7: \end{figure} Madarasi@7: \end{subfigure} Madarasi@7: \end{center} Madarasi@7: \vspace*{-0.6cm} Madarasi@7: \caption{\normalsize{Graph isomomorphism on biological graphs}}\label{fig:bioISO} Madarasi@7: \end{figure} Madarasi@7: Madarasi@7: Madarasi@7: \begin{figure}[H] Madarasi@7: \vspace*{-2cm} Madarasi@7: \hspace*{-1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \begin{figure}[H] Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@7: \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@7: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \thinspace}] Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt}; Madarasi@7: \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark Madarasi@7: size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt}; Madarasi@7: \end{axis} Madarasi@7: \end{tikzpicture} Madarasi@7: \caption{In the case of molecules, the algorithms have Madarasi@7: similar behaviour, but VF2++ is almost two times faster even on such Madarasi@7: small graphs.} \label{fig:INDMolecule} Madarasi@7: \end{figure} Madarasi@7: \end{subfigure} Madarasi@7: \hspace*{1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \begin{figure}[H] Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right] Madarasi@7: \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid Madarasi@7: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north Madarasi@7: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \thinspace}] Madarasi@7: %\addplot+[only marks] table {proteinsOrig.txt}; Madarasi@7: \addplot table {Orig/ContactMaps.128.txt}; Madarasi@7: \addplot[mark=triangle*,mark size=1.8pt,color=red] table Madarasi@7: {VF2PPLabel/ContactMaps.128.txt}; Madarasi@7: \end{axis} Madarasi@7: \end{tikzpicture} Madarasi@7: \caption{On contact maps, VF2++ runs in near constant time, while VF2 Madarasi@7: Plus has a near linear behaviour.} \label{fig:INDContact} Madarasi@7: \end{figure} Madarasi@7: \end{subfigure} Madarasi@7: Madarasi@7: \begin{center} Madarasi@7: \vspace*{-0.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \begin{figure}[H] Madarasi@7: \begin{tikzpicture}[trim axis left, trim axis right] alpar@2: \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \thinspace}] %\addplot+[only marks] table alpar@3: {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue] alpar@3: table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark alpar@3: size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@3: \caption{Both the algorithms have linear behaviour on protein alpar@3: graphs. VF2++ is more than 10 times faster than VF2 alpar@3: Plus.} \label{fig:INDProt} alpar@2: \end{figure} Madarasi@7: \end{subfigure} alpar@2: \end{center} Madarasi@7: \vspace*{-0.5cm} Madarasi@7: \caption{\normalsize{Graph isomomorphism on biological graphs}}\label{fig:bioIND} alpar@2: \end{figure} alpar@2: alpar@2: alpar@2: alpar@2: alpar@2: alpar@2: \subsection{Random graphs} alpar@3: This section compares VF2++ with VF2 Plus on random graphs of a large alpar@3: size. The node labels are uniformly distributed. Let $\delta$ denote alpar@3: the average degree. For the parameters of problems solved in the alpar@3: experiments, please see the top of each chart. alpar@2: \subsubsection{Graph isomorphism} alpar@3: To evaluate the efficiency of the algorithms in the case of graph alpar@3: isomorphism, connected graphs of less than 20 000 nodes have been alpar@3: considered. Generating a random graph and shuffling its nodes, an Madarasi@7: isomorphism had to be found. Figure \ref{fig:randISO} shows the runtime results alpar@4: on graph sets of various density. alpar@2: Madarasi@7: Madarasi@7: Madarasi@7: Madarasi@12: \begin{figure} Madarasi@7: \vspace*{-1.5cm} Madarasi@7: \hspace*{-1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 5$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso5_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso5_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} Madarasi@7: \end{subfigure} Madarasi@7: %\hspace{1cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 10$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso10_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso10_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} Madarasi@7: \end{subfigure} Madarasi@7: %%\hspace{1cm} Madarasi@7: \hspace*{-1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 15$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso15_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso15_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} Madarasi@7: \end{subfigure} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 35$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso35_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso35_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} Madarasi@7: \end{subfigure} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 45$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number Madarasi@7: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso45_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso45_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} Madarasi@7: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} Madarasi@7: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} Madarasi@7: \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \thinspace}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/iso/vf2pIso100_1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/iso/vf2ppIso100_1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} Madarasi@7: \end{subfigure} alpar@2: \vspace*{-0.8cm} Madarasi@7: \caption{IND on graphs having an average degree of Madarasi@7: 5.}\label{fig:randISO} alpar@2: \end{figure} alpar@2: alpar@2: Madarasi@7: Madarasi@7: Madarasi@7: Madarasi@7: Madarasi@7: Madarasi@7: Madarasi@7: Madarasi@7: alpar@3: Considering the graph isomorphism problem, VF2++ consistently alpar@3: outperforms its rival especially on sparse graphs. The reason for the alpar@3: slightly super linear behaviour of VF2++ on denser graphs is the alpar@3: larger number of nodes in the BFS tree constructed in alpar@4: Algorithm~\ref{alg:VF2PPPseu}. alpar@2: alpar@2: \subsubsection{Induced subgraph isomorphism} alpar@3: This section provides a comparison of VF2++ and VF2 Plus in the case alpar@3: of induced subgraph isomorphism. In addition to the size of the large alpar@3: graph, that of the small graph dramatically influences the hardness of alpar@3: a given problem too, so the overall picture is provided by examining alpar@3: small graphs of various size. alpar@2: alpar@3: For each chart, a number $0<\rho< 1$ has been fixed and the following alpar@3: has been executed 150 times. Generating a large graph $G_{large}$, alpar@3: choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes, alpar@3: and for all the 10 subgraphs find a mapping by using both the graph alpar@3: matching algorithms. The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1, Madarasi@10: 0.3, 0.6, 0.8, 0.95$ cases have been examined, see alpar@4: Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and Madarasi@10: \ref{fig:randIND35}. alpar@2: alpar@2: alpar@2: alpar@2: alpar@2: Madarasi@12: \begin{figure} Madarasi@7: \vspace*{-1.5cm} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.05.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.05.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.3.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.3.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.6.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.6.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.8.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.8.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \thinspace}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd5_0.95.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd5_0.95.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} alpar@2: \vspace*{-0.8cm} alpar@3: \caption{IND on graphs having an average degree of alpar@3: 5.}\label{fig:randIND5} alpar@2: \end{figure} alpar@2: alpar@2: alpar@2: \begin{figure}[H] Madarasi@7: \vspace*{-1.5cm} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} Madarasi@7: \hspace*{-0.5cm} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.05.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.05.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} Madarasi@7: \hspace*{-0.5cm} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.3.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.3.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.6.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.6.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.8.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.8.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \thinspace}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd10_0.95.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd10_0.95.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} alpar@2: \vspace*{-0.8cm} alpar@3: \caption{IND on graphs having an average degree of alpar@3: 10.}\label{fig:randIND10} alpar@2: \end{figure} alpar@2: alpar@2: alpar@2: alpar@2: \begin{figure}[H] Madarasi@7: \vspace*{-1.5cm} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.05.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.05.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.1.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.1.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.3.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.3.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{center} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.6.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.6.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{center} alpar@2: \end{subfigure} Madarasi@7: \hspace*{-1.5cm} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \space}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.8.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.8.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} alpar@2: \begin{subfigure}[b]{0.55\textwidth} alpar@2: \begin{tikzpicture} alpar@2: \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid alpar@3: =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north alpar@3: west},scaled x ticks = false,x tick label style={/pgf/number alpar@3: format/1000 sep = \thinspace}] alpar@2: %\addplot+[only marks] table {proteinsOrig.txt}; alpar@2: \addplot table {randGraph/ind/vf2pInd35_0.95.txt}; alpar@3: \addplot[mark=triangle*,mark size=1.8pt,color=red] table alpar@3: {randGraph/ind/vf2ppInd35_0.95.txt}; alpar@2: \end{axis} alpar@2: \end{tikzpicture} alpar@2: \end{subfigure} alpar@2: \vspace*{-0.8cm} alpar@3: \caption{IND on graphs having an average degree of alpar@3: 35.}\label{fig:randIND35} alpar@2: \end{figure} alpar@2: alpar@2: alpar@3: Based on these experiments, VF2++ is faster than VF2 Plus and able to alpar@3: handle really large graphs in milliseconds. Note that when $IND$ was alpar@3: considered and the small graphs had proportionally few nodes ($\rho = alpar@3: 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node alpar@4: orders (e.g. see the $\delta=10$ case on alpar@4: Figure~\ref{fig:randIND10}). If these examples had been excluded, the alpar@3: charts would have seemed to be similar to the other ones. alpar@3: Unsurprisingly, as denser graphs are considered, both VF2++ and VF2 alpar@3: Plus slow slightly down, but remain practically usable even on graphs alpar@3: having 10 000 nodes. alpar@2: alpar@2: alpar@2: alpar@2: alpar@3: alpar@2: \section{Conclusion} alpar@3: In this paper, after providing a short summary of the recent alpar@3: algorithms, a new graph matching algorithm based on VF2, called VF2++, alpar@3: has been presented and analyzed from a practical viewpoint. alpar@2: alpar@3: Recognizing the importance of the node order and determining an alpar@3: efficient one, VF2++ is able to match graphs of thousands of nodes in alpar@3: near practically linear time including preprocessing. In addition to alpar@3: the proper order, VF2++ uses more efficient consistency and cutting alpar@3: rules which are easy to compute and make the algorithm able to prune alpar@3: most of the unfruitful branches without going astray. alpar@2: alpar@3: In order to show the efficiency of the new method, it has been alpar@3: compared to VF2 Plus, which is the best concurrent algorithm based on alpar@3: \cite{VF2Plus}. alpar@2: alpar@3: The experiments show that VF2++ consistently outperforms VF2 Plus on alpar@3: biological graphs. It seems to be asymptotically faster on protein and alpar@3: on contact map graphs in the case of induced subgraph isomorphism, alpar@3: while in the case of graph isomorphism, it has definitely better alpar@3: asymptotic behaviour on protein graphs. alpar@2: alpar@3: Regarding random sparse graphs, not only has VF2++ proved itself to be alpar@3: faster than VF2 Plus, but it has a practically linear behaviour both alpar@3: in the case of induced subgraph- and graph isomorphism, as well. alpar@2: alpar@2: alpar@0: alpar@0: %% The Appendices part is started with the command \appendix; alpar@0: %% appendix sections are then done as normal sections alpar@0: %% \appendix alpar@0: alpar@0: %% \section{} alpar@0: %% \label{} alpar@0: alpar@0: %% If you have bibdatabase file and want bibtex to generate the alpar@0: %% bibitems, please use alpar@0: %% alpar@3: \bibliographystyle{elsarticle-num} \bibliography{bibliography} alpar@0: alpar@0: %% else use the following coding to input the bibitems directly in the alpar@0: %% TeX file. alpar@0: alpar@2: %% \begin{thebibliography}{00} alpar@0: alpar@2: %% %% \bibitem{label} alpar@2: %% %% Text of bibliographic item alpar@0: alpar@2: %% \bibitem{} alpar@0: alpar@2: %% \end{thebibliography} alpar@2: alpar@0: \end{document} alpar@0: \endinput alpar@0: %% alpar@0: %% End of file `elsarticle-template-num.tex'.