\section{Introduction}
\label{sec:intro}

In recent decades, combinatorial structures, and especially graphs, have been studied with ever-increasing interest and applied to the solution of several new and revisited problems. The expressiveness, simplicity, and well-studied nature of graphs make them practical for modelling, and they appear constantly in several seemingly independent fields, bioinformatics and chemistry being amongst the most relevant and most important ones.

Complex biological systems arise from the interaction and cooperation of a large number of molecular components. Understanding such systems at the molecular level is of primary importance, since protein-protein interaction, DNA-protein interaction, metabolic interaction, transcription factor binding, neuronal networks, and hormone signaling networks can only be understood this way.

For instance, a molecular structure can be viewed as a graph whose nodes correspond to atoms and whose edges to chemical bonds. The secondary structure of a protein can also be represented as a graph, where nodes are associated with amino acids and edges with hydrogen bonds. The nodes are often whole molecular components, and the edges represent relationships among them. The similarity and dissimilarity of the objects corresponding to nodes are incorporated into the model by \emph{node labels}. Many other chemical and biological structures can easily be modeled in a similar way. Understanding such networks essentially requires finding specific subgraphs, which cannot be done without graph matching algorithms.
Finally, some other real-world fields related to variants of graph matching should be mentioned briefly: pattern recognition and machine vision \cite{HorstBunkeApplications}, symbol recognition \cite{CordellaVentoSymbolRecognition}, and face identification \cite{JianzhuangYongFaceIdentification}.

Subgraph and induced subgraph matching problems are known to be NP-complete\cite{SubgraphNPC}, while the graph isomorphism problem is one of the few problems in NP that is neither known to be in P nor known to be NP-complete, although polynomial-time isomorphism algorithms are known for various graph classes, such as trees and planar graphs\cite{PlanarGraphIso}, bounded valence graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso}, and permutation graphs\cite{PermGraphIso}.

In the following, some algorithms based on other approaches are summarized; they do not require any restriction on the graphs. Although an overall polynomial behaviour cannot be expected from such an algorithm, it may still perform well even on graph classes for which a polynomial-time algorithm is known. Note that this summary covers only exact matching algorithms, is far from complete, and does not cover all recent algorithms.

The first practically usable approach is due to \textbf{Ullmann}\cite{Ullmann}: a commonly used depth-first search based algorithm with a complex heuristic for reducing the number of visited states. A major problem is its $\Theta(n^3)$ space complexity, which makes it impractical for large sparse graphs.

In a recent paper, \textbf{Ullmann}\cite{UllmannBit} presents an improved version of this algorithm based on a bit-vector solution for the binary Constraint Satisfaction Problem.
The \textbf{Nauty} algorithm\cite{Nauty} transforms the two graphs into a canonical form before checking for isomorphism. It has been considered one of the fastest graph isomorphism algorithms, although graph classes have been exhibited on which it takes exponentially many steps. This algorithm handles only the graph isomorphism problem.

The \textbf{LAD} algorithm\cite{Lad} uses a depth-first search strategy and formulates the matching as a Constraint Satisfaction Problem to prune the search tree. The constraints are that the mapping has to be injective and edge-preserving, hence it is possible to handle further matching types as well.

The \textbf{RI} algorithm\cite{RI} and its variations are based on a state space representation. After reordering the nodes of the graphs, it applies fast heuristic checks without using any complex pruning rules. It runs very efficiently on graphs arising in biology, and it won the International Contest on Pattern Search in Biological Databases\cite{Content}.

The currently most commonly used algorithm is \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was designed for solving pattern matching and computer vision problems and has been one of the best overall algorithms for more than a decade. Although it cannot keep up with the newest specialized algorithms, it is still widely used due to its simplicity and space efficiency. VF2 uses a state space representation and checks a set of conditions in each state to prune the search tree.
Our first graph matching algorithm was a version of VF2 that already recognized the significance of the node ordering, which gives more opportunities to increase the cutting efficiency and to reduce the computational complexity. This project was initiated and sponsored by QuantumBio Inc.\cite{QUANTUMBIO}, and the implementation --- along with its source code --- has been published as part of the LEMON\cite{LEMON} open source graph library.

This paper introduces \textbf{VF2++}, a new, further improved algorithm for the graph and (induced) subgraph isomorphism problems, which uses efficient cutting rules and determines a node order in which VF2 runs significantly faster on practical inputs.

Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has been published. It is considered to be as efficient as the RI algorithm and to have strictly better behaviour on large graphs. The main idea of VF2 Plus is to precompute a heuristic node order of the small graph in which VF2 works more efficiently.

\section{Problem Statement}
This section provides a detailed description of the problems to be solved.

\subsection{Definitions}
Throughout the paper, $G_{small}=(V_{small}, E_{small})$ and $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs.
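For concreteness, the short C++ sketches included below for illustration use a minimal adjacency-list representation of node-labeled undirected graphs along the following lines. The \texttt{Graph} struct, its field names and the integer label encoding are assumptions made for these sketches only; they are not part of the paper's LEMON-based implementation.

\begin{verbatim}
#include <vector>

// Illustrative representation of an undirected, node-labeled graph.
// Nodes are 0..n-1; every undirected edge {u,v} is stored in both
// adjacency lists, and label[u] is an integer code of lab(u).
struct Graph {
    int n = 0;
    std::vector<std::vector<int>> adj;  // adj[u] = neighbours of u
    std::vector<int> label;             // label[u] = lab(u)

    void addEdge(int u, int v) {
        adj[u].push_back(v);
        adj[v].push_back(u);
    }
};

// Example: a triangle whose nodes carry the labels 0, 0, 1.
inline Graph makeTriangle() {
    Graph g;
    g.n = 3;
    g.adj.assign(3, {});
    g.label = {0, 0, 1};
    g.addEdge(0, 1);
    g.addEdge(1, 2);
    g.addEdge(2, 0);
    return g;
}
\end{verbatim}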
\begin{definition}\label{sec:ismorphic}
$G_{small}$ and $G_{large}$ are \textbf{isomorphic} if there exists a bijection $M: V_{small} \longrightarrow V_{large}$ for which the following holds:
\begin{center}
$\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (M(u),M(v))\in{E_{large}}$
\end{center}
\end{definition}

For the sake of simplicity, in this paper subgraphs and induced subgraphs are defined in a more general way than usual:

\begin{definition}
$G_{small}$ is a \textbf{subgraph} of $G_{large}$ if there exists an injection $I: V_{small}\longrightarrow V_{large}$ for which the following holds:
\begin{center}
$\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$
\end{center}
\end{definition}

\begin{definition}
$G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if there exists an injection $I: V_{small}\longrightarrow V_{large}$ for which the following holds:
\begin{center}
$\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (I(u),I(v))\in E_{large}$
\end{center}
\end{definition}

\begin{definition}
$lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node label function}, where $K$ is an arbitrary set. The elements of $K$ are the \textbf{node labels}. Two nodes, $u$ and $v$, are said to be \textbf{equivalent} if $lab(u)=lab(v)$.
\end{definition}

When node labels are also given, the matched nodes must have the same labels. For example, node labeled isomorphism is phrased as follows.

\begin{definition}
$G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label function lab} if there exists a bijection $M: V_{small} \longrightarrow V_{large}$ for which the following holds:
\begin{center}
$(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} : lab(u)=lab(M(u)))$
\end{center}
\end{definition}

The other two definitions can be extended in the same way.
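As a small illustrative example (not taken from the paper), let $G_{small}$ be the path on the nodes $a,b,c$ with edges $(a,b)$ and $(b,c)$, and let $G_{large}$ be the triangle on the nodes $x,y,z$. Then
\begin{center}
$I(a)=x,\ I(b)=y,\ I(c)=z$ maps both edges of $G_{small}$ to edges of $G_{large}$, so $G_{small}$ is a subgraph of $G_{large}$; it is not an induced subgraph, since $(a,c)\notin E_{small}$ while $(I(a),I(c))=(x,z)\in E_{large}$, and the same holds for every injection, because all node pairs of the triangle are adjacent.
\end{center}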
Note that an edge label function can be defined similarly to the node label function, and all the definitions can be extended with additional conditions, but this is out of the scope of this work.

The equivalence of two nodes is usually defined by another relation, $R\subseteq (V_{small}\cup V_{large})^2$. This coincides with the definition given above if $R$ is an equivalence relation, which is not a restriction in biological and chemical applications.

\subsection{Common problems}\label{sec:CommProb}

The focus of this paper is on the extensively studied subgraph isomorphism problem and its variations. The following problems appear in many applications.

The \textbf{subgraph matching problem} is the following: is $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node label function?

The \textbf{induced subgraph matching problem} asks the same about the existence of an induced subgraph.

The \textbf{graph isomorphism problem} can be defined as an induced subgraph matching problem in which the sizes of the two graphs are equal.

In addition to deciding existence, it may be necessary to exhibit such a subgraph, or to list all of them.

It should be noted that some authors misleadingly use the term \emph{subgraph isomorphism problem} for the \emph{induced subgraph isomorphism problem}.

The following sections describe VF2, VF2++ and VF2 Plus, and provide a comparison of them.

\section{The VF2 Algorithm}

This algorithm is the basis of both VF2++ and VF2 Plus. VF2 is able to handle all the variations mentioned in \textbf{Section \ref{sec:CommProb}}. Although it can also handle directed graphs, for the sake of simplicity only the undirected case will be discussed.

\subsection{Common notations}
Assume that $G_{small}$ is searched in $G_{large}$. The following definitions and notations will be used throughout the whole paper.

\begin{definition}
A set $M\subseteq V_{small}\times V_{large}$ is called a \textbf{mapping} if no node of $V_{small}$ or $V_{large}$ appears in more than one pair of $M$. That is, $M$ uniquely associates some of the nodes of $V_{small}$ with some nodes of $V_{large}$ and vice versa.
\end{definition}

\begin{definition}
A mapping $M$ \textbf{covers} a node $v$ if there exists a pair in $M$ which contains $v$.
\end{definition}

\begin{definition}
A mapping $M$ is a $\mathbf{whole\ mapping}$ if $M$ covers all the nodes of $V_{small}$.
\end{definition}

\begin{notation}
Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$.
\end{notation}

\begin{notation}
For a mapping $M$ and a node $v\in V_{small}\cup V_{large}$, let $\mathbf{Pair(M,v)}$ denote the pair of $v$ in $M$ if such a node exists; otherwise $\mathbf{Pair(M,v)}$ is undefined.
\end{notation}

Note that if $Pair(M,v)$ exists, then it is unique.

The definitions of the isomorphism types can be rephrased in terms of the existence of a special whole mapping $M$, since such a mapping represents a bijection. For example,
\begin{center}
$M\subseteq V_{small}\times V_{large}$ represents an induced subgraph isomorphism $\Leftrightarrow$ $M$ is a whole mapping and $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (Pair(M,u),Pair(M,v))\in E_{large}$.
\end{center}

\begin{definition}
A set of whole mappings is called a \textbf{problem type}.
\end{definition}

Throughout the paper, $\mathbf{PT}$ denotes a generic problem type, which can be substituted by any concrete problem type.

A whole mapping $W\mathbf{\ is\ of\ type\ PT}$ if $W\in PT$. Using this notation, VF2 searches for a whole mapping $W$ of type $PT$.
For example, the problem type of the graph isomorphism problem is the following: a whole mapping $W$ is in $\mathbf{ISO}$ iff the bijection represented by $W$ satisfies \textbf{Definition \ref{sec:ismorphic}}. The subgraph and induced subgraph matching problems can be formalized in a similar way; let their problem types be denoted by $\mathbf{SUB}$ and $\mathbf{IND}$.

\begin{definition} \label{expPT}
$PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is, each edge of $G_{small}$ has to be mapped to an edge of $G_{large}$ by each mapping in $PT$.
\end{definition}

Note that $ISO$, $SUB$ and $IND$ are expanding problem types.

This paper deals only with the three problem types mentioned above, but the following generic definitions make it possible to handle other types as well, although it may be challenging to find a proper consistency function and an efficient cutting function for them.

\begin{definition}
Let $M$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a \textbf{consistency function by} $\mathbf{PT}$ if the following holds: if there exists a whole mapping $W$ of type $PT$ for which $M\subseteq W$, then $Cons_{PT}(M)$ is true.
\end{definition}

\begin{definition}
Let $M$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a \textbf{cutting function by} $\mathbf{PT}$ if the following holds: $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a whole mapping $W$ of type $PT$.
\end{definition}

\begin{definition}
$M$ is said to be a \textbf{consistent mapping by} $\mathbf{PT}$ if $Cons_{PT}(M)$ is true.
\end{definition}

$Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form.

\begin{notation}
Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where $p\in{V_{small}\times V_{large}}$ and $M\cup\{p\}$ is a mapping.
\end{notation}
$Cons_{PT}$ will be used to check the consistency of the already covered nodes, while $Cut_{PT}$ looks ahead to recognize when no whole consistent mapping can contain the current mapping.

\subsection{Overview of the algorithm}

VF2 uses a state space representation of mappings, $Cons_{PT}$ for excluding inconsistency with the problem type, and $Cut_{PT}$ for pruning the search tree. Each state $s$ of the matching process can be associated with a mapping $M(s)$.

\textbf{Algorithm~\ref{alg:VF2Pseu}} is a high level description of the VF2 matching algorithm.

\begin{algorithm}
\algtext*{EndIf}% do not print "end if"
\algtext*{EndFor}% do not print "end for"
\algtext*{EndProcedure}% do not print "end procedure"
\caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu}
\begin{algorithmic}[1]
\Procedure{VF2}{State $s$, ProblemType $PT$}
  \If{$M(s)$ covers $V_{small}$}
    \State Output($M(s)$)
  \Else
    \State Compute the set $P(s)$ of the pairs candidate for inclusion in $M(s)$
    \ForAll{$p\in{P(s)}$}
      \If{Cons$_{PT}$($p, M(s)$) $\wedge$ $\neg$Cut$_{PT}$($p, M(s)$)}
        \State Compute the nascent state $\tilde{s}$ by adding $p$ to $M(s)$
        \State \textbf{call} VF2($\tilde{s}$, $PT$)
      \EndIf
    \EndFor
  \EndIf
\EndProcedure
\end{algorithmic}
\end{algorithm}

The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. the algorithm starts with an empty mapping.

For each state $s$, the algorithm computes $P(s)$, the set of candidate node pairs for inclusion in $M(s)$.

For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup \{p\}$ is computed, and the whole process is applied recursively to $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$, or it can be proved that $s$ cannot be extended to a whole mapping.
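Purely as an illustration, the recursion above can be sketched in C++ roughly as follows. The mapping is stored in an array indexed by the nodes of $G_{small}$, and the candidate-pair computation as well as the $Cons_{PT}$ and $Cut_{PT}$ predicates are passed in as callables; all names and interfaces here are assumptions of this sketch, not the paper's implementation.

\begin{verbatim}
#include <functional>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;   // (u in V_small, v in V_large)
using Mapping = std::vector<int>;   // mapping[u] = v, or -1 if u is uncovered

// Recursive skeleton of the high level description above: 'candidates'
// plays the role of P(s), 'cons' and 'cut' of Cons_PT and Cut_PT, and
// 'output' is called once for every whole mapping found.
void vf2(Mapping& mapping, int covered, int nSmall,
         const std::function<std::vector<Pair>(const Mapping&)>& candidates,
         const std::function<bool(Pair, const Mapping&)>& cons,
         const std::function<bool(Pair, const Mapping&)>& cut,
         const std::function<void(const Mapping&)>& output) {
    if (covered == nSmall) {              // M(s) covers V_small
        output(mapping);
        return;
    }
    for (Pair p : candidates(mapping)) {  // pairs candidate for inclusion
        if (cons(p, mapping) && !cut(p, mapping)) {
            mapping[p.first] = p.second;  // extend M(s) by p
            vf2(mapping, covered + 1, nSmall, candidates, cons, cut, output);
            mapping[p.first] = -1;        // backtrack to state s
        }
    }
}
\end{verbatim}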
The correctness of this search is guaranteed by the following claim.

\begin{claim}
Through consistent mappings, only consistent whole mappings can be reached, and all of the whole mappings are reachable through consistent mappings.
\end{claim}

Note that a state may be reached in many different ways, since the order of insertions into $M$ does not influence the nascent mapping. In fact, the number of different ways leading to the same state can be exponentially large. If $G_{small}$ and $G_{large}$ are cycles with $n$ nodes and $n$ different node labels, there exists exactly one graph isomorphism between them, but it will be reached in $n!$ different ways.

However, one may observe the following.

\begin{claim} \label{claim:claimTotOrd}
Let $\prec$ be an arbitrary total ordering relation on $V_{small}$. If the algorithm ignores each $p=(u,v) \in P(s)$ for which
\begin{center}
$\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$,
\end{center}
then no state can be reached more than once, and each state associated with a whole mapping remains reachable.
\end{claim}

Note that the cornerstone of the improvements to VF2 is a proper choice of the total ordering.

\subsection{The candidate set P(s)}
\label{candidateComputingVF2}

$P(s)$ is the set of the candidate pairs for inclusion in $M(s)$. Suppose that $PT$ is an expanding problem type, see \textbf{Definition~\ref{expPT}}.

\begin{notation}
Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}} \wedge \tilde{u}$ is covered by $M(s)\}$, and\\
$\mathbf{T_{large}(s)}:=\{v \in V_{large} : v$ is not covered by $M(s)\wedge\exists \tilde{v}\in{V_{large}: (v,\tilde{v})\in E_{large}} \wedge \tilde{v}$ is covered by $M(s)\}$.
\end{notation}

The set $P(s)$ consists of the pairs of uncovered neighbours of covered nodes; if there is no such pair, all the pairs containing two uncovered nodes are added.
Formally, let
\[
P(s)=
\begin{cases}
 T_{small}(s)\times T_{large}(s) & \text{if } T_{small}(s)\neq\emptyset \wedge T_{large}(s)\neq\emptyset,\\
 (V_{small}\setminus M_{small}(s))\times(V_{large}\setminus M_{large}(s)) & \text{otherwise.}
\end{cases}
\]
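Using the illustrative representation introduced earlier (adjacency lists, and mapping arrays with $-1$ marking uncovered nodes), the sets $T_{small}(s)$, $T_{large}(s)$ and the candidate set $P(s)$ can be computed directly from their definitions, for instance as in the following sketch; the helper names are assumptions of the sketch.

\begin{verbatim}
#include <utility>
#include <vector>

// Uncovered nodes having at least one covered neighbour (T_small / T_large).
// 'adj' is an adjacency list and 'pairOf[x]' holds the partner of x or -1.
static std::vector<int> uncoveredWithCoveredNeighbour(
    const std::vector<std::vector<int>>& adj, const std::vector<int>& pairOf) {
    std::vector<int> t;
    for (int u = 0; u < (int)adj.size(); ++u) {
        if (pairOf[u] != -1) continue;                      // u must be uncovered
        for (int w : adj[u])
            if (pairOf[w] != -1) { t.push_back(u); break; } // covered neighbour
    }
    return t;
}

// P(s): T_small x T_large if both are non-empty, otherwise all pairs of
// uncovered nodes, following the case distinction above.
std::vector<std::pair<int,int>> candidatePairs(
    const std::vector<std::vector<int>>& adjSmall, const std::vector<int>& pairSmall,
    const std::vector<std::vector<int>>& adjLarge, const std::vector<int>& pairLarge) {
    std::vector<int> tSmall = uncoveredWithCoveredNeighbour(adjSmall, pairSmall);
    std::vector<int> tLarge = uncoveredWithCoveredNeighbour(adjLarge, pairLarge);
    if (tSmall.empty() || tLarge.empty()) {
        tSmall.clear(); tLarge.clear();
        for (int u = 0; u < (int)pairSmall.size(); ++u)
            if (pairSmall[u] == -1) tSmall.push_back(u);
        for (int v = 0; v < (int)pairLarge.size(); ++v)
            if (pairLarge[v] == -1) tLarge.push_back(v);
    }
    std::vector<std::pair<int,int>> p;
    for (int u : tSmall)
        for (int v : tLarge) p.push_back({u, v});
    return p;
}
\end{verbatim}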
\subsection{Consistency}

This section defines the consistency functions for the different problem types mentioned in \textbf{Section \ref{sec:CommProb}}.

\begin{notation}
Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} : (u,\tilde{u})\in E_{small}\}$ and\\
$\mathbf{\Gamma_{large} (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$.
\end{notation}

Suppose that $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, that $s$ is a state of the matching procedure, that $M(s)$ is a consistent mapping by $PT$, and that $lab(u)=lab(v)$. $Cons_{PT}(p,M(s))$ checks whether including the pair $p$ into $M(s)$ leads to a consistent mapping by $PT$.

\subsubsection{Induced subgraph isomorphism}

$M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow (\forall \tilde{u}\in M_{small}(s): (u,\tilde{u})\in E_{small} \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
The following formulation gives an efficient way of calculating $Cons_{IND}$.

\begin{claim}
$Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v) \cap M_{large}(s): (Pair(M(s),\tilde{v}),u)\in E_{small})\wedge(\forall \tilde{u}\in \Gamma_{small}(u) \cap M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a consistency function in the case of $IND$.
\end{claim}

\subsubsection{Graph isomorphism}

$M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$ $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$.

\begin{claim}
$Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and only if it is a consistency function by $IND$.
\end{claim}

\subsubsection{Subgraph isomorphism}

$M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow (\forall \tilde{u}\in M_{small}(s): (u,\tilde{u})\in E_{small} \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.
The following formulation gives an efficient way of calculating $Cons_{SUB}$.

\begin{claim}
$Cons_{SUB}((u,v),M(s)):= (\forall \tilde{u}\in \Gamma_{small}(u) \cap M_{small}(s): (v,Pair(M(s),\tilde{u}))\in E_{large})$ is a consistency function by $SUB$.
\end{claim}
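In terms of the earlier illustrative representation, the two claims above translate into simple neighbourhood checks. The following sketch assumes an adjacency-list edge test (hash sets or an adjacency matrix would make it constant time); it is an illustration, not the paper's implementation.

\begin{verbatim}
#include <algorithm>
#include <vector>

// Illustrative edge test over an adjacency-list graph.
static bool hasEdge(const std::vector<std::vector<int>>& adj, int a, int b) {
    return std::find(adj[a].begin(), adj[a].end(), b) != adj[a].end();
}

// Cons_SUB((u,v), M(s)): every covered neighbour of u must be mapped to a
// neighbour of v.  pairSmall[x] is the partner of x in V_large or -1.
bool consSub(int u, int v,
             const std::vector<std::vector<int>>& adjSmall,
             const std::vector<std::vector<int>>& adjLarge,
             const std::vector<int>& pairSmall) {
    for (int x : adjSmall[u])
        if (pairSmall[x] != -1 && !hasEdge(adjLarge, v, pairSmall[x]))
            return false;
    return true;
}

// Cons_IND((u,v), M(s)): in addition, every covered neighbour of v must be
// mapped from a neighbour of u.  pairLarge[y] is the partner of y or -1.
bool consInd(int u, int v,
             const std::vector<std::vector<int>>& adjSmall,
             const std::vector<std::vector<int>>& adjLarge,
             const std::vector<int>& pairSmall,
             const std::vector<int>& pairLarge) {
    if (!consSub(u, v, adjSmall, adjLarge, pairSmall)) return false;
    for (int y : adjLarge[v])
        if (pairLarge[y] != -1 && !hasEdge(adjSmall, u, pairLarge[y]))
            return false;
    return true;
}
\end{verbatim}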
\subsection{Cutting rules}

$Cut_{PT}(p,M(s))$ is defined by a collection of efficiently verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ may be true only if it is impossible to extend $M(s)\cup \{p\}$ to a whole mapping.

\begin{notation}
Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash M_{small}(s))\backslash T_{small}(s)$, and\\
$\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash M_{large}(s))\backslash T_{large}(s)$.
\end{notation}

\subsubsection{Induced subgraph isomorphism}

\begin{claim}
$Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\cap T_{large}(s)| < |\Gamma_{small} (u)\cap T_{small}(s)| \vee |\Gamma_{large}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is a cutting function by $IND$.
\end{claim}

\subsubsection{Graph isomorphism}

Note that the cutting function for induced subgraph isomorphism defined above is a cutting function by $ISO$ as well; however, it is less efficient than the following one, while their computational complexity is the same.

\begin{claim}
$Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\cap T_{large}(s)| \neq |\Gamma_{small} (u)\cap T_{small}(s)| \vee |\Gamma_{large}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is a cutting function by $ISO$.
\end{claim}

\subsubsection{Subgraph isomorphism}

\begin{claim}
$Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\cap T_{large}(s)| < |\Gamma_{small} (u)\cap T_{small}(s)|$ is a cutting function by $SUB$.
\end{claim}

Note that there is a significant difference between induced and non-induced subgraph isomorphism:

\begin{claim} \label{claimSUB}
$Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\cap T_{large}(s)| < |\Gamma_{small} (u)\cap T_{small}(s)| \vee |\Gamma_{large}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is \textbf{not} a cutting function by $SUB$.
\end{claim}

\begin{proof}$ $\\
\vspace*{-0.5cm}
\begin{figure}
\begin{center}
\begin{tikzpicture}[scale=.8,auto=left,every node/.style={circle,fill=black!15}]
  \node[rectangle,fill=black!15] at (4,6) {$G_{small}$};
  \node (u4) at (2.5,10) {$u_4$};
  \node (u3) at (5.5,10) {$u_3$};
  \node (u1) at (2.5,7) {$u_1$};
  \node (u2) at (5.5,7) {$u_2$};

  \node[rectangle,fill=black!30] at (13.5,6) {$G_{large}$};
  \node[fill=black!30] (v4) at (12,10) {$v_4$};
  \node[fill=black!30] (v3) at (15,10) {$v_3$};
  \node[fill=black!30] (v1) at (12,7) {$v_1$};
  \node[fill=black!30] (v2) at (15,7) {$v_2$};

  \foreach \from/\to in {u1/u2,u2/u3,u3/u4,u4/u1}
    \draw (\from) -- (\to);
  \foreach \from/\to in {v1/v2,v2/v3,v3/v4,v4/v1,v1/v3}
    \draw (\from) -- (\to);
\end{tikzpicture}
\caption{Graphs for the proof of \textbf{Claim \ref{claimSUB}}}\label{fig:proofSUB}
\end{center}
\end{figure}
Let the two graphs of \textbf{Figure \ref{fig:proofSUB}} be the input graphs. Suppose that the total ordering relation is $u_1 \prec u_2 \prec u_3 \prec u_4$, that $M(s)=\{(u_1,v_1)\}$, and that VF2 tries to add $(u_2,v_2)\in P(s)$.\newline
$Cons_{SUB}((u_2,v_2),M(s))=true$, so $M(s)\cup \{(u_2,v_2)\}$ is consistent by $SUB$. The cutting function $Cut_{SUB}((u_2,v_2),M(s))$ is false, so it does not prune the search tree.\newline
On the other hand, $Cut_{SUB}'((u_2,v_2),M(s))$ is true, since $0=|\Gamma_{large}(v_2)\cap \tilde{T}_{large}(s)|<|\Gamma_{small}(u_2)\cap \tilde{T}_{small}(s)|=1$; but the tree must not be pruned here, because otherwise the whole mapping $\{(u_1,v_1),(u_2,v_2),(u_3,v_3),(u_4,v_4)\}$ could not be found.
\end{proof}

\section{The VF2++ Algorithm}

Although any total ordering relation makes the search space of VF2 a tree, its choice turns out to dramatically influence the number of visited states. The goal is to determine an efficient ordering as quickly as possible.

The main reason for the superiority of VF2++ over VF2 is twofold. Firstly, taking into account the structure and the node labeling of the graph, VF2++ determines a state order in which most of the unfruitful branches of the search space can be pruned immediately. Secondly, introducing more efficient --- yet still easy to compute --- cutting rules reduces the chance of going astray even further.

In addition to the usual subgraph isomorphism, specialized versions for induced subgraph isomorphism and for graph isomorphism have been designed. VF2++ achieves a runtime improvement of an order of magnitude for induced subgraph isomorphism and a better asymptotic behaviour in the case of the graph isomorphism problem.

Note that a weaker version of the cutting rules and a more efficient candidate set calculation were also described in \cite{VF2Plus}.
It should be noted that all the methods described in this section can be extended to handle directed graphs and edge labels as well.

The basic ideas and a detailed description of VF2++ are provided in the following.

\subsection{Preparations}

\begin{claim}
\label{claim:claimCoverFromLeft}
The total ordering relation uniquely determines the node order in which the nodes of $V_{small}$ are covered by VF2. From the point of view of the matching procedure, this means that always the same node of $G_{small}$ is covered on the $d$-th level.
\end{claim}

\begin{proof}
In order to make the search space a tree, the pairs in $\{(u,v)\in P(s) : \exists (\hat{u},\hat{v})\in P(s) : \hat{u}\prec u\}$ are excluded from $P(s)$.
\newline
Let $\tilde{P}(s):=P(s)\backslash \{(u,v)\in P(s) : \exists (\hat{u},\hat{v})\in P(s) : \hat{u}\prec u\}$.
\newline
The relation $\prec$ is a total ordering, so $\exists!\ \tilde{u} : \forall\ (u,v)\in \tilde{P}(s): u=\tilde u$. Since a pair from $\tilde{P}(s)$ is chosen for inclusion in $M(s)$, it is obvious that only $\tilde{u}$ can be covered in $V_{small}$. Actually, $\tilde{u}$ is the smallest element of $T_{small}(s)$ (or of $V_{small}\backslash M_{small}(s)$ if $T_{small}(s)$ is empty), and $T_{small}(s)$ depends only on the covered nodes of $G_{small}$.
\newline
A simple induction on $d$ shows that the set of covered nodes of $G_{small}$ is unique if $d$ is given, so $\tilde{u}$ is unique if $d$ is given.
\end{proof}

\begin{definition}
An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of $V_{small}$ is a \textbf{matching order} if there exists a total ordering relation $\prec$ such that VF2 with $\prec$ finds a pair for $u_{\sigma(d)}$ on the $d$-th level, for all $d\in\{1,..,|V_{small}|\}$.
\end{definition}

\begin{claim}\label{claim:MOclaim}
A total ordering is a matching order iff the nodes of every component form an interval in the node sequence and, except for the first node of a component, every node is connected to a previous node of its component. The order of the components is arbitrary.
\end{claim}
\subsection{Implementation details}

The recursion of \textbf{Algorithm~\ref{alg:VF2Pseu}} can be realized as a while loop with a loop counter $depth$ denoting the current depth of the recursion. Fixing a matching order, let $M$ denote the array storing the current mapping. The initial state is associated with the empty mapping, which means that $\forall i: M[i]=INVALID$ and $depth=0$. In the case of a recursive call, $depth$ has to be incremented, while in the case of a return it has to be decremented. Based on \textbf{Claim~\ref{claim:claimCoverFromLeft}}, $M$ is $INVALID$ from index $depth+1$ and not $INVALID$ before $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes while the state is being processed, but the property holds both before stepping back to a predecessor state and before exploring a successor state.
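A possible C++ realization of this loop is sketched below, again for illustration only: the small-graph node processed at depth $d$ is fixed by the matching order, $INVALID$ is represented by $-1$, and the per-depth candidate enumeration (including the consistency and cutting checks) is abstracted into a callable. Recomputing the candidate list on every visit keeps the sketch short; a real implementation would cache it.

\begin{verbatim}
#include <vector>

// Loop-based realization of the recursion.  M[d] is the large-graph node
// assigned at depth d (the small-graph node handled at depth d is fixed by
// the matching order), -1 plays the role of INVALID, and candidates(d, M)
// enumerates the feasible large-graph nodes for depth d, with the
// consistency and cutting checks assumed to happen inside it.
template <typename CandidateFn, typename OutputFn>
void vf2Loop(int nSmall, CandidateFn candidates, OutputFn output) {
    std::vector<int> M(nSmall, -1);    // current mapping, initially empty
    std::vector<int> next(nSmall, 0);  // next candidate index per depth
    int depth = 0;                     // current depth of the "recursion"
    while (depth >= 0) {
        if (depth == nSmall) {         // M covers V_small: report it
            output(M);
            --depth;                   // step back to the predecessor state
            continue;
        }
        M[depth] = -1;                 // M is INVALID from index depth on
        std::vector<int> cand = candidates(depth, M);
        if (next[depth] < (int)cand.size()) {
            M[depth] = cand[next[depth]++];   // extend the mapping ("call")
            if (depth + 1 < nSmall) next[depth + 1] = 0;
            ++depth;
        } else {
            next[depth] = 0;           // all candidates exhausted
            --depth;                   // "return"
        }
    }
}
\end{verbatim}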
The necessary part of the candidate set is easily maintainable or computable by following \textbf{Section~\ref{candidateComputingVF2}}. A much faster method has been designed for biological and other sparse graphs; see the next section for details.

\subsubsection{Calculating the candidates for a node}

Being aware of \textbf{Claim~\ref{claim:claimCoverFromLeft}}, the task is not to maintain the candidate set, but to generate the candidate nodes in $G_{large}$ for a given node $u\in V_{small}$. In the case of an expanding problem type and a mapping $M$, if a node $v\in V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall u'\in V_{small} : (u,u')\in E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in E_{large}$. That is, each covered neighbour of $u$ has to be mapped to a covered neighbour of $v$.

Having said that, an algorithm running in $\Theta(deg)$ time can be given if there exists a covered node in the component containing $u$. In this case, choose a covered neighbour $u'$ of $u$ arbitrarily --- such a node exists based on \textbf{Claim~\ref{claim:MOclaim}}. Since all the candidates of $u$ are among the uncovered neighbours of $Pair(M,u')$, there are only $deg(Pair(M,u'))$ nodes to check.

An easy trick is to choose a $u'$ for which the number of uncovered neighbours of $Pair(M,u')$ is as small as possible.

Note that if $u$ is the first node of its component, then all the uncovered nodes of $G_{large}$ are candidates, so giving a sublinear method is impossible.
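A sketch of this candidate generation, under the same illustrative representation as before, might look as follows; the remaining $Cons_{PT}$/$Cut_{PT}$ checks are still applied to the returned nodes, and the function and variable names are assumptions of the sketch.

\begin{verbatim}
#include <vector>

// Candidate large-graph nodes for a single small-graph node u.
// pairSmall[x] / pairLarge[y] hold the partner node or -1 if uncovered.
std::vector<int> candidatesOf(int u,
                              const std::vector<std::vector<int>>& adjSmall,
                              const std::vector<std::vector<int>>& adjLarge,
                              const std::vector<int>& pairSmall,
                              const std::vector<int>& pairLarge) {
    // Pick the covered neighbour u' of u whose pair has the fewest
    // uncovered neighbours in the large graph.
    int best = -1, bestCount = -1;
    for (int x : adjSmall[u]) {
        if (pairSmall[x] == -1) continue;       // x must be covered
        int cnt = 0;
        for (int y : adjLarge[pairSmall[x]])
            if (pairLarge[y] == -1) ++cnt;      // uncovered neighbour of Pair(M,x)
        if (best == -1 || cnt < bestCount) { best = pairSmall[x]; bestCount = cnt; }
    }
    std::vector<int> cand;
    if (best == -1) {
        // u is the first node of its component: every uncovered node of
        // the large graph is a candidate.
        for (int v = 0; v < (int)pairLarge.size(); ++v)
            if (pairLarge[v] == -1) cand.push_back(v);
        return cand;
    }
    // Only the uncovered neighbours of Pair(M,u') have to be checked.
    for (int v : adjLarge[best])
        if (pairLarge[v] == -1) cand.push_back(v);
    return cand;
}
\end{verbatim}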
+After a node $u$ gets to the next place of the node order, +$F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is +one less covered node in $V_{large}$ with label $lab(u)$, that is why +min selection sort is preferred which gives the elements from left to +right in descending order, see \textbf{Algorithm + \ref{alg:VF2PPProcess1})}. -Note that using a $\Theta(n^2)$ sort absolutely does not slow down the procedure on biological (and on sparse) graphs, since they have few nodes on a level. If a level had a large number of nodes, \textbf{Algorithm \ref{alg:VF2PPProcess2})} would seem to be a better choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not be immediately refreshed, so it is unable to provide up-to-date label information. +Note that using a $\Theta(n^2)$ sort absolutely does not slow down the +procedure on biological (and on sparse) graphs, since they have few +nodes on a level. If a level had a large number of nodes, +\textbf{Algorithm \ref{alg:VF2PPProcess2})} would seem to be a better +choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the +efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not +be immediately refreshed, so it is unable to provide up-to-date label +information. -Note that the \textit{while loop} of \textbf{Algorithm \ref{alg:VF2PPPseu})} takes one iteration per graph component and the graphs in biology are mostly connected. +Note that the \textit{while loop} of \textbf{Algorithm + \ref{alg:VF2PPPseu})} takes one iteration per graph component and +the graphs in biology are mostly connected. \subsubsection{Cutting rules} -In \textbf{Section \ref{VF2PPCuttingRules})}, the cutting rules were described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$ and $\tilde T_{large}$, which are dependent on the all-time mapping (i.e. on the all-time state). The aim is to check the labeled cutting rules of VF2++ in $\Theta(deg)$ time. +In \textbf{Section \ref{VF2PPCuttingRules})}, the cutting rules were +described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$ +and $\tilde T_{large}$, which are dependent on the all-time mapping +(i.e. on the all-time state). The aim is to check the labeled cutting +rules of VF2++ in $\Theta(deg)$ time. -Firstly, suppose that these four sets are given in such a way, that checking whether a node is in a certain set takes constant time, e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an initially zero integer lookup table of size $|K|$. After incrementing $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap T_{large}(s)$, the first part of the cutting rules is checkable in $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$ to zero takes $\Theta(deg)$ time again, which makes it possible to use the same table through the whole algorithm. -The second part of the cutting rules can be verified using the same method with $\tilde T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and $T_{large}$. Thus, the overall complexity is $\Theta(deg)$. +Firstly, suppose that these four sets are given in such a way, that +checking whether a node is in a certain set takes constant time, +e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an +initially zero integer lookup table of size $|K|$. 
After incrementing +$L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and +decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap +T_{large}(s)$, the first part of the cutting rules is checkable in +$\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$ +to zero takes $\Theta(deg)$ time again, which makes it possible to use +the same table through the whole algorithm. The second part of the +cutting rules can be verified using the same method with $\tilde +T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and +$T_{large}$. Thus, the overall complexity is $\Theta(deg)$. -An other integer lookup table storing the number of covered neighbours of each node in $G_{large}$ gives all the information about the sets $T_{large}$ and $\tilde T_{large}$, which is maintainable in $\Theta(deg)$ time when a pair is added or substracted by incrementing or decrementing the proper indices. A further improvement is that the values of $L[lab(u')]$ in case of checking $u$ is dependent only on $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an array of pairs (label, number of such labels) can be stored to skip the maintaining operations. Note that these arrays are at most of size $deg$. Skipping this trick, the number of covered neighbours has to be stored for each node of $G_{small}$ as well to get the sets $T_{small}$ and $\tilde T_{small}$. +An other integer lookup table storing the number of covered neighbours +of each node in $G_{large}$ gives all the information about the sets +$T_{large}$ and $\tilde T_{large}$, which is maintainable in +$\Theta(deg)$ time when a pair is added or substracted by incrementing +or decrementing the proper indices. A further improvement is that the +values of $L[lab(u')]$ in case of checking $u$ is dependent only on +$u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an +array of pairs (label, number of such labels) can be stored to skip +the maintaining operations. Note that these arrays are at most of size +$deg$. Skipping this trick, the number of covered neighbours has to be +stored for each node of $G_{small}$ as well to get the sets +$T_{small}$ and $\tilde T_{small}$. -Using similar tricks, the consistency function can be evaluated in $\Theta(deg)$ steps, as well. +Using similar tricks, the consistency function can be evaluated in +$\Theta(deg)$ steps, as well. \section{The VF2 Plus Algorithm} -The VF2 Plus algorithm is a recently improved version of VF2. It was compared with the state of the art algorithms in \cite{VF2Plus} and has proven itself to be competitive with RI, the best algorithm on biological graphs. -\\ -A short summary of VF2 Plus follows, which uses the notation and the conventions of the original paper. +The VF2 Plus algorithm is a recently improved version of VF2. It was +compared with the state of the art algorithms in \cite{VF2Plus} and +has proven itself to be competitive with RI, the best algorithm on +biological graphs. \\ A short summary of VF2 Plus follows, which uses +the notation and the conventions of the original paper. \subsection{Ordering procedure} -VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$ with the lowest probability to find a pair in $V_{small}$ and the highest number of connections with the nodes already sorted by the algorithm. 
+VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$
+with the lowest probability to find a pair in $V_{large}$ and the
+highest number of connections with the nodes already sorted by the
+algorithm.
\begin{definition}
-$(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$.
+$(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and
+ $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$.
\end{definition}
-$P_{lab}(L):=$ a priori probability to find a node with label $L$ in $V_{large}$
+$P_{lab}(L):=$ a priori probability to find a node with label $L$ in
+$V_{large}$
\newline
-$P_{deg}(d):=$ a priori probability to find a node with degree $d$ in $V_{large}$
+$P_{deg}(d):=$ a priori probability to find a node with degree $d$ in
+$V_{large}$
\newline
-$P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\
-$M$ is the set of already sorted nodes, $T$ is the set of nodes candidate to be selected, and $degreeM$ of a node is the number of its neighbours in $M$.
+$P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ $M$ is the set of
+already sorted nodes, $T$ is the set of nodes candidate to be
+selected, and $degreeM$ of a node is the number of its neighbours in
+$M$.
\begin{algorithm}
-\algtext*{EndIf}% do not print "end if"
-\algtext*{EndFor}% do not print "end for"
-\algtext*{EndProcedure}% do not print "end procedure"
+\algtext*{EndIf}% do not print "end if"
+\algtext*{EndFor}% do not print "end for"
+\algtext*{EndProcedure}% do not print "end procedure"
\algtext*{EndWhile}
\caption{}\label{alg:VF2PlusPseu}
\begin{algorithmic}[1]
-\Procedure{VF2 Plus order}{}
-  \State Select the node with the lowest $P$.
-  \If {more nodes share the same $P$}
-    \State select the one with maximum degree
-  \EndIf
-  \If {more nodes share the same $P$ and have the max degree}
-    \State select the first
-  \EndIf
-  \State Put the selected node in the set $M$. \label{alg:putIn}
-  \State Put all its unsorted neighbours in the set $T$.
-  \If {$M\neq V_{small}$}
-    \State From set $T$ select the node with maximum $degreeM$.
-    \If {more nodes have maximum $degreeM$}
-      \State Select the one with the lowest $P$
-    \EndIf
-    \If {more nodes have maximum $degreeM$ and $P$}
-      \State Select the first.
-    \EndIf
-    \State \textbf{goto \ref{alg:putIn}.}
-  \EndIf
+\Procedure{VF2 Plus order}{} \State Select the node with the lowest
+$P$. \If {more nodes share the same $P$} \State select the one with
+maximum degree \EndIf \If {more nodes share the same $P$ and have the
+  max degree} \State select the first \EndIf \State Put the selected
+node in the set $M$. \label{alg:putIn} \State Put all its unsorted
+neighbours in the set $T$. \If {$M\neq V_{small}$} \State From set
+$T$ select the node with maximum $degreeM$. \If {more nodes have
+  maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf
+\If {more nodes have maximum $degreeM$ and $P$} \State Select the
+first. \EndIf \State \textbf{goto \ref{alg:putIn}.} \EndIf
\EndProcedure
\end{algorithmic}
\end{algorithm}
-Using these notations, \textbf{Algorithm~\ref{alg:VF2PlusPseu})} provides the description of the sorting procedure.
+Using these notations, \textbf{Algorithm~\ref{alg:VF2PlusPseu})}
+provides the description of the sorting procedure.
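As a rough sketch of how these quantities could be computed (a free
interpretation only, not the reference VF2 Plus code; all identifiers are
invented), the label and degree statistics of $V_{large}$ can be tabulated
once, after which the greedy selection follows the steps of
\textbf{Algorithm~\ref{alg:VF2PlusPseu})}:
\begin{verbatim}
#include <algorithm>
#include <vector>

using Node = int;
struct Graph {
  std::vector<std::vector<Node>> adj;   // adjacency lists
  std::vector<int> lab;                 // labels 0..numLabels-1
};

// Greedy VF2 Plus-style ordering of V_small (a sketch only).
std::vector<Node> vf2PlusOrder(const Graph& small, const Graph& large,
                               int numLabels) {
  const int nS = static_cast<int>(small.adj.size());
  const int nL = static_cast<int>(large.adj.size());
  std::vector<Node> order;
  if (nS == 0) return order;

  // A priori label probabilities in V_large and the probability of
  // drawing a node whose degree is at least d.
  std::vector<double> pLab(numLabels, 0.0);
  int maxDeg = 0;
  for (Node v = 0; v < nL; ++v) {
    pLab[large.lab[v]] += 1.0 / nL;
    maxDeg = std::max(maxDeg, static_cast<int>(large.adj[v].size()));
  }
  std::vector<double> pDegGe(maxDeg + 2, 0.0);
  for (Node v = 0; v < nL; ++v) pDegGe[large.adj[v].size()] += 1.0 / nL;
  for (int d = maxDeg - 1; d >= 0; --d) pDegGe[d] += pDegGe[d + 1];

  auto P = [&](Node u) {
    const int d = static_cast<int>(small.adj[u].size());
    return pLab[small.lab[u]] * (d <= maxDeg ? pDegGe[d] : 0.0);
  };

  std::vector<char> placed(nS, 0);
  std::vector<int> degreeM(nS, 0);   // neighbours already in the order
  auto place = [&](Node u) {
    placed[u] = 1;
    order.push_back(u);
    for (Node w : small.adj[u]) ++degreeM[w];
  };

  // Seed: lowest P, ties broken by the larger degree.
  Node seed = 0;
  for (Node u = 1; u < nS; ++u)
    if (P(u) < P(seed) ||
        (P(u) == P(seed) && small.adj[u].size() > small.adj[seed].size()))
      seed = u;
  place(seed);

  while (static_cast<int>(order.size()) < nS) {
    // Prefer the unplaced node with the most placed neighbours, ties
    // broken by the lowest P; a new component restarts at degreeM == 0,
    // which the same scan handles.
    Node best = -1;
    for (Node u = 0; u < nS; ++u) {
      if (placed[u]) continue;
      if (best == -1 || degreeM[u] > degreeM[best] ||
          (degreeM[u] == degreeM[best] && P(u) < P(best)))
        best = u;
    }
    place(best);
  }
  return order;
}
\end{verbatim}
Precomputing \texttt{pLab} and \texttt{pDegGe} makes each evaluation of $P$
constant time; the quadratic selection loop is kept deliberately simple here.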
-Note that $P(u)$ is not the exact probability of finding a consistent pair for $u$ by choosing a node of $V_{large}$ randomly, since $P_{lab}$ and $P_{deg}$ are not independent, though calculating the real probability would take quadratic time, which may be reduced by using suitable lookup tables.
+Note that $P(u)$ is not the exact probability of finding a consistent
+pair for $u$ by choosing a node of $V_{large}$ randomly, since
+$P_{lab}$ and $P_{deg}$ are not independent, though calculating the
+real probability would take quadratic time, which may be reduced by
+using suitable lookup tables.
-\newpage
\section{Experimental results}
-This section compares the performance of VF2++ and VF2 Plus. Both algorithms have run orders of magnitude faster than VF2, thus its inclusion was not reasonable.
+This section compares the performance of VF2++ and VF2 Plus. Both
+algorithms have run orders of magnitude faster than VF2, thus its
+inclusion was not reasonable.
\subsection{Biological graphs}
-The tests have been executed on a recent biological dataset created for the International Contest on Pattern Search in Biological Databases\cite{Content}, which has been constructed from Molecule, Protein and Contact Map graphs extracted from the Protein Data Bank\cite{ProteinDataBank}.
+The tests have been executed on a recent biological dataset created
+for the International Contest on Pattern Search in Biological
+Databases\cite{Content}, which has been constructed from Molecule,
+Protein and Contact Map graphs extracted from the Protein Data
+Bank\cite{ProteinDataBank}.
-The molecule dataset contains small graphs with less than 100 nodes and an average degree of less than 3. The protein dataset contains graphs having 500-10 000 nodes and an average degree of 4, while the contact map dataset contains graphs with 150-800 nodes and an average degree of 20.
-\\
+The molecule dataset contains small graphs with less than 100 nodes
+and an average degree of less than 3. The protein dataset contains
+graphs having 500-10 000 nodes and an average degree of 4, while the
+contact map dataset contains graphs with 150-800 nodes and an average
+degree of 20. \\
-In the following, the induced subgraph isomorphism and the graph isomorphism will be examined.
+In the following, the induced subgraph isomorphism and the graph
+isomorphism will be examined.
\subsubsection{Induced subgraph isomorphism}
-This dataset contains a set of graph pairs, and \textbf{all} the induced subgraph isomorphisms have to be found between them. \textbf{Figure \ref{fig:INDProt}), \ref{fig:INDContact}),} and \textbf{ \ref{fig:INDMolecule})} show the solution time of the problems in the problem set.
+This dataset contains a set of graph pairs, and \textbf{all} the
+induced subgraph isomorphisms have to be found between
+them. \textbf{Figure \ref{fig:INDProt}), \ref{fig:INDContact}),} and
+\textbf{ \ref{fig:INDMolecule})} show the solution time of the
+problems in the problem set.
\begin{figure}[H] \begin{center} \begin{tikzpicture} \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid - =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] - %\addplot+[only marks] table {proteinsOrig.txt}; - \addplot[mark=*,mark size=1.2pt,color=blue] table {Orig/Proteins.256.txt}; - \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt}; + =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table + {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue] + table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark + size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt}; \end{axis} \end{tikzpicture} \end{center} \vspace*{-0.8cm} -\caption{Both the algorithms have linear behaviour on protein graphs. VF2++ is more than 10 times faster than VF2 Plus.} \label{fig:INDProt} +\caption{Both the algorithms have linear behaviour on protein + graphs. VF2++ is more than 10 times faster than VF2 + Plus.} \label{fig:INDProt} \end{figure} \begin{figure}[H] \begin{center} \begin{tikzpicture} \begin{axis}[title=Contact Maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {Orig/ContactMaps.128.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/ContactMaps.128.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {VF2PPLabel/ContactMaps.128.txt}; \end{axis} \end{tikzpicture} \end{center} \vspace*{-0.8cm} -\caption{On Contact Maps, VF2++ runs in near constant time, while VF2 Plus has a near linear behaviour.} \label{fig:INDContact} +\caption{On Contact Maps, VF2++ runs in near constant time, while VF2 + Plus has a near linear behaviour.} \label{fig:INDContact} \end{figure} \begin{figure}[H] \begin{center} \begin{tikzpicture} \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; -\addplot table {Orig/Molecules.32.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt}; +\addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark + size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt}; \end{axis} \end{tikzpicture} \end{center} \vspace*{-0.8cm} -\caption{In the case of Molecules, the algorithms seem to have a similar behaviour, but VF2++ is almost two times faster even on such small graphs.} \label{fig:INDMolecule} +\caption{In the case of Molecules, the algorithms seem to have a + similar behaviour, but VF2++ is almost two times faster even on such + small graphs.} 
\label{fig:INDMolecule}
\end{figure}
\subsubsection{Graph isomorphism}
-In this experiment, the nodes of each graph in the database have been shuffled and an isomorphism between the shuffled and the original graph has been searched for. For runtime results, see \textbf{Figure \ref{fig:ISOProt}), \ref{fig:ISOContact}),} and \textbf{\ref{fig:ISOMolecule})}.
+In this experiment, the nodes of each graph in the database have been
+shuffled and an isomorphism between the shuffled and the original
+graph has been searched for. For runtime results, see \textbf{Figure
+  \ref{fig:ISOProt}), \ref{fig:ISOContact}),} and
+\textbf{\ref{fig:ISOMolecule})}.
\begin{figure}[H]
\begin{center}
\begin{tikzpicture}
\begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid
-=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}]
+=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north
+  west},scaled x ticks = false,x tick label style={/pgf/number
+  format/1000 sep = \thinspace}]
%\addplot+[only marks] table {proteinsOrig.txt};
-\addplot table {Orig/proteinsIso.txt};
-\addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
+\addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark
+  size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt};
\end{axis}
\end{tikzpicture}
\end{center}
\vspace*{-0.8cm}
-\caption{On protein graphs, VF2 Plus has a superlinear time complexity, while VF2++ runs in near constant time. The difference is about two order of magnitude on large graphs.}\label{fig:ISOProt}
+\caption{On protein graphs, VF2 Plus has a superlinear time
+  complexity, while VF2++ runs in near constant time.
The difference + is about two order of magnitude on large graphs.}\label{fig:ISOProt} \end{figure} \begin{figure}[H] \begin{center} \begin{tikzpicture} \begin{axis}[title=Contact Maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; -\addplot table {Orig/contactMapsIso.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt}; +\addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark + size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt}; \end{axis} \end{tikzpicture} \end{center} \vspace*{-0.8cm} -\caption{The results are closer to each other on Contact Maps, but VF2++ still performs consistently better.}\label{fig:ISOContact} +\caption{The results are closer to each other on Contact Maps, but + VF2++ still performs consistently better.}\label{fig:ISOContact} \end{figure} \begin{figure}[H] \begin{center} \begin{tikzpicture} \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; -\addplot table {Orig/moleculesIso.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt}; +\addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark + size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt}; \end{axis} \end{tikzpicture} \end{center} \vspace*{-0.8cm} -\caption{In the case of Molecules, there is not such a significant difference, but VF2++ seems to be faster as the number of nodes increases.}\label{fig:ISOMolecule} +\caption{In the case of Molecules, there is not such a significant + difference, but VF2++ seems to be faster as the number of nodes + increases.}\label{fig:ISOMolecule} \end{figure} \subsection{Random graphs} -This section compares VF2++ with VF2 Plus on random graphs of a large size. The node labels are uniformly distributed. -Let $\delta$ denote the average degree. -For the parameters of problems solved in the experiments, please see the top of each chart. +This section compares VF2++ with VF2 Plus on random graphs of a large +size. The node labels are uniformly distributed. Let $\delta$ denote +the average degree. For the parameters of problems solved in the +experiments, please see the top of each chart. \subsubsection{Graph isomorphism} -To evaluate the efficiency of the algorithms in the case of graph isomorphism, connected graphs of less than 20 000 nodes have been considered. Generating a random graph and shuffling its nodes, an isomorphism had to be found. \textbf{Figure \ref{fig:randISO5}), \ref{fig:randISO10}), \ref{fig:randISO15}), \ref{fig:randISO35}), \ref{fig:randISO45}),} and \textbf{\ref{fig:randISO100}) } show the runtime results on graph sets of various density. 
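The text does not specify the generator, so the following C++ fragment is only
a plausible sketch of how one such test pair could be produced: a random
labelled graph with roughly $\delta\,n/2$ edges and an isomorphic copy obtained
by renaming its nodes with a random permutation. Connectivity, which the
experiments above require, is not enforced in this simplified version, and all
names are made up for the illustration.
\begin{verbatim}
#include <algorithm>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

struct LabeledGraph {
  std::vector<std::pair<int, int>> edges;
  std::vector<int> lab;
};

// One test instance for the ISO experiments: a random labelled graph g
// with average degree about delta, and an isomorphic copy h obtained by
// shuffling the node identifiers.  Parallel edges, self-loops and
// connectivity are not handled; a real generator would be more careful.
std::pair<LabeledGraph, LabeledGraph>
makeIsoInstance(int n, double delta, int numLabels, std::mt19937& rng) {
  std::uniform_int_distribution<int> node(0, n - 1), label(0, numLabels - 1);

  LabeledGraph g;
  g.lab.resize(n);
  for (int& l : g.lab) l = label(rng);
  const long long m = static_cast<long long>(n * delta / 2.0);
  for (long long i = 0; i < m; ++i) {
    const int a = node(rng), b = node(rng);
    if (a != b) g.edges.push_back({a, b});
  }

  std::vector<int> perm(n);            // node i of g becomes perm[i] of h
  std::iota(perm.begin(), perm.end(), 0);
  std::shuffle(perm.begin(), perm.end(), rng);

  LabeledGraph h;
  h.lab.resize(n);
  for (int i = 0; i < n; ++i) h.lab[perm[i]] = g.lab[i];
  for (const auto& e : g.edges)
    h.edges.push_back({perm[e.first], perm[e.second]});
  return {g, h};
}
\end{verbatim}
Feeding the two graphs to the matchers then reproduces the shuffled-isomorphism
task described above.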
+To evaluate the efficiency of the algorithms in the case of graph +isomorphism, connected graphs of less than 20 000 nodes have been +considered. Generating a random graph and shuffling its nodes, an +isomorphism had to be found. \textbf{Figure \ref{fig:randISO5}), + \ref{fig:randISO10}), \ref{fig:randISO15}), \ref{fig:randISO35}), + \ref{fig:randISO45}),} and \textbf{\ref{fig:randISO100}) } show the +runtime results on graph sets of various density. \begin{figure}[H] \begin{center} \begin{tikzpicture} \begin{axis}[title={Random ISO, $\delta = 5$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/iso/vf2pIso5_1.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso5_1.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/iso/vf2ppIso5_1.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -972,10 +1454,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random ISO, $\delta = 10$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/iso/vf2pIso10_1.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso10_1.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/iso/vf2ppIso10_1.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -987,10 +1472,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random ISO, $\delta = 15$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/iso/vf2pIso15_1.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso15_1.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/iso/vf2ppIso15_1.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1002,10 +1490,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random ISO, $\delta = 35$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/iso/vf2pIso35_1.txt}; -\addplot[mark=triangle*,mark 
size=1.8pt,color=red] table {randGraph/iso/vf2ppIso35_1.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/iso/vf2ppIso35_1.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1017,10 +1508,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random ISO, $\delta = 45$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/iso/vf2pIso45_1.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso45_1.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/iso/vf2ppIso45_1.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1032,10 +1526,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random ISO, $\delta = 100$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/iso/vf2pIso100_1.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso100_1.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/iso/vf2ppIso100_1.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1044,13 +1541,30 @@ \end{figure} -Considering the graph isomorphism problem, VF2++ consistently outperforms its rival especially on sparse graphs. The reason for the slightly super linear behaviour of VF2++ on denser graphs is the larger number of nodes in the BFS tree constructed in \textbf{Algorithm \ref{alg:VF2PPPseu})}. +Considering the graph isomorphism problem, VF2++ consistently +outperforms its rival especially on sparse graphs. The reason for the +slightly super linear behaviour of VF2++ on denser graphs is the +larger number of nodes in the BFS tree constructed in +\textbf{Algorithm \ref{alg:VF2PPPseu})}. \subsubsection{Induced subgraph isomorphism} -This section provides a comparison of VF2++ and VF2 Plus in the case of induced subgraph isomorphism. In addition to the size of the large graph, that of the small graph dramatically influences the hardness of a given problem too, so the overall picture is provided by examining small graphs of various size. +This section provides a comparison of VF2++ and VF2 Plus in the case +of induced subgraph isomorphism. In addition to the size of the large +graph, that of the small graph dramatically influences the hardness of +a given problem too, so the overall picture is provided by examining +small graphs of various size. -For each chart, a number $0<\rho< 1$ has been fixed and the following has been executed 150 times. Generating a large graph $G_{large}$, choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes, and for all the 10 subgraphs find a mapping by using both the graph matching algorithms. 
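The sampling of the patterns is likewise not spelled out, so the sketch below
shows only one straightforward possibility, with invented identifiers: pick
about $\rho\,|V_{large}|$ nodes uniformly at random and keep exactly the edges
running inside the picked set. A BFS-grown node set could be used instead if
connected patterns were required.
\begin{verbatim}
#include <algorithm>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Cut a random induced pattern with about rho*n nodes out of a graph
// given as an edge list on the node set {0,...,n-1}.  Returns the edge
// list of the pattern (renumbered to 0..k-1) and the chosen original
// node identifiers; illustrative only.
std::pair<std::vector<std::pair<int, int>>, std::vector<int>>
randomInducedSubgraph(int n, const std::vector<std::pair<int, int>>& edges,
                      double rho, std::mt19937& rng) {
  const int k = static_cast<int>(rho * n + 0.5);
  std::vector<int> nodes(n);
  std::iota(nodes.begin(), nodes.end(), 0);
  std::shuffle(nodes.begin(), nodes.end(), rng);

  // newId[v] = position of v inside the pattern, or -1 if v is not picked.
  std::vector<int> newId(n, -1);
  for (int i = 0; i < k; ++i) newId[nodes[i]] = i;

  std::vector<std::pair<int, int>> sub;
  for (const auto& e : edges)
    if (newId[e.first] != -1 && newId[e.second] != -1)
      sub.push_back({newId[e.first], newId[e.second]});

  nodes.resize(k);
  return {sub, nodes};
}
\end{verbatim}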
-The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1, 0.3, 0.6, 0.8, 0.95$ cases have been examined (see \textbf{Figure \ref{fig:randIND5}), \ref{fig:randIND10})} and \textbf{\ref{fig:randIND35})}), and for each $\delta$, a cumulative chart is given as well, which excludes $\rho = 0.05$ and $0.1$ for the sake of perspicuity (see \textbf{Figure \ref{fig:randIND5Sum}), \ref{fig:randIND10Sum})} and \textbf{\ref{fig:randIND35Sum})}). +For each chart, a number $0<\rho< 1$ has been fixed and the following +has been executed 150 times. Generating a large graph $G_{large}$, +choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes, +and for all the 10 subgraphs find a mapping by using both the graph +matching algorithms. The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1, +0.3, 0.6, 0.8, 0.95$ cases have been examined (see \textbf{Figure + \ref{fig:randIND5}), \ref{fig:randIND10})} and +\textbf{\ref{fig:randIND35})}), and for each $\delta$, a cumulative +chart is given as well, which excludes $\rho = 0.05$ and $0.1$ for the +sake of perspicuity (see \textbf{Figure \ref{fig:randIND5Sum}), + \ref{fig:randIND10Sum})} and \textbf{\ref{fig:randIND35Sum})}). @@ -1062,10 +1576,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd5_0.05.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.05.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd5_0.05.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1074,10 +1591,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd5_0.1.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.1.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd5_0.1.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1087,10 +1607,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd5_0.3.txt}; 
-\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.3.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd5_0.3.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1099,10 +1622,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd5_0.6.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.6.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd5_0.6.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1111,41 +1637,58 @@ \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd5_0.8.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.8.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd5_0.8.txt}; \end{axis} \end{tikzpicture} \end{subfigure} \begin{subfigure}[b]{0.55\textwidth} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd5_0.95.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.95.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd5_0.95.txt}; \end{axis} \end{tikzpicture} \end{subfigure} \vspace*{-0.8cm} -\caption{IND on graphs having an average degree of 5.}\label{fig:randIND5} +\caption{IND on graphs having an average degree of + 5.}\label{fig:randIND5} \end{figure} \begin{figure}[H] \begin{center} \begin{tikzpicture} \begin{axis}[title={Rand IND Summary, $\delta = 5$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid -=major,mark size=1pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label 
style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.3.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.3.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.6.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.6.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.8.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.8.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd5_0.95.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd5_0.95.txt}; +\addplot[mark=*,mark size=1.5pt,color=blue] table + {randGraph/ind/vf2pInd5_0.3.txt}; \addplot[mark=triangle*,mark + size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd5_0.3.txt}; \addplot[mark=*,mark + size=1.5pt,color=blue] table + {randGraph/ind/vf2pInd5_0.6.txt}; \addplot[mark=triangle*,mark + size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd5_0.6.txt}; \addplot[mark=*,mark + size=1.5pt,color=blue] table + {randGraph/ind/vf2pInd5_0.8.txt}; \addplot[mark=triangle*,mark + size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd5_0.8.txt}; \addplot[mark=*,mark + size=1.5pt,color=blue] table + {randGraph/ind/vf2pInd5_0.95.txt}; + \addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd5_0.95.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1161,10 +1704,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd10_0.05.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.05.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd10_0.05.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1173,10 +1719,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd10_0.1.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.1.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd10_0.1.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1186,10 +1735,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near 
ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd10_0.3.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.3.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd10_0.3.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1198,10 +1750,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd10_0.6.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.6.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd10_0.6.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1210,41 +1765,65 @@ \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd10_0.8.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.8.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd10_0.8.txt}; \end{axis} \end{tikzpicture} \end{subfigure} \begin{subfigure}[b]{0.55\textwidth} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd10_0.95.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.95.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd10_0.95.txt}; \end{axis} \end{tikzpicture} \end{subfigure} \vspace*{-0.8cm} -\caption{IND on graphs having an average degree of 10.}\label{fig:randIND10} +\caption{IND on graphs having an average degree of + 10.}\label{fig:randIND10} \end{figure} \begin{figure}[H] \begin{center} \begin{tikzpicture} 
\begin{axis}[title={Rand IND Summary, $\delta = 10$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid -=major,mark size=1pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.3.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.3.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.6.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.6.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.8.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.8.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd10_0.95.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd10_0.95.txt}; +\addplot[mark=*,mark size=1.5pt,color=blue] table + {randGraph/ind/vf2pInd10_0.3.txt}; + \addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd10_0.3.txt}; + \addplot[mark=*,mark size=1.5pt,color=blue] table + {randGraph/ind/vf2pInd10_0.6.txt}; + \addplot[mark=triangle*,mark + size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd10_0.6.txt}; + \addplot[mark=*,mark + size=1.5pt,color=blue] table + {randGraph/ind/vf2pInd10_0.8.txt}; + \addplot[mark=triangle*,mark + size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd10_0.8.txt}; + \addplot[mark=*,mark + size=1.5pt,color=blue] + table + {randGraph/ind/vf2pInd10_0.95.txt}; + \addplot[mark=triangle*,mark + size=1.8pt,color=red] + table + {randGraph/ind/vf2ppInd10_0.95.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1260,10 +1839,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd35_0.05.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.05.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd35_0.05.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1272,10 +1854,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; 
\addplot table {randGraph/ind/vf2pInd35_0.1.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.1.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd35_0.1.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1285,10 +1870,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd35_0.3.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.3.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd35_0.3.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1297,10 +1885,13 @@ \begin{center} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd35_0.6.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.6.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd35_0.6.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1309,41 +1900,65 @@ \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \space}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd35_0.8.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.8.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd35_0.8.txt}; \end{axis} \end{tikzpicture} \end{subfigure} \begin{subfigure}[b]{0.55\textwidth} \begin{tikzpicture} \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid -=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1.2pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; \addplot table {randGraph/ind/vf2pInd35_0.95.txt}; 
-\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.95.txt}; +\addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd35_0.95.txt}; \end{axis} \end{tikzpicture} \end{subfigure} \vspace*{-0.8cm} -\caption{IND on graphs having an average degree of 35.}\label{fig:randIND35} +\caption{IND on graphs having an average degree of + 35.}\label{fig:randIND35} \end{figure} \begin{figure}[H] \begin{center} \begin{tikzpicture} \begin{axis}[title={Rand IND Summary, $\delta = 35$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid -=major,mark size=1pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] +=major,mark size=1pt, legend style={at={(0,1)},anchor=north + west},scaled x ticks = false,x tick label style={/pgf/number + format/1000 sep = \thinspace}] %\addplot+[only marks] table {proteinsOrig.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.3.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.3.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.6.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.6.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.8.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.8.txt}; -\addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.95.txt}; -\addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.95.txt}; +\addplot[mark=*,mark size=1.5pt,color=blue] table + {randGraph/ind/vf2pInd35_0.3.txt}; + \addplot[mark=triangle*,mark size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd35_0.3.txt}; + \addplot[mark=*,mark size=1.5pt,color=blue] table + {randGraph/ind/vf2pInd35_0.6.txt}; + \addplot[mark=triangle*,mark + size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd35_0.6.txt}; + \addplot[mark=*,mark + size=1.5pt,color=blue] table + {randGraph/ind/vf2pInd35_0.8.txt}; + \addplot[mark=triangle*,mark + size=1.8pt,color=red] table + {randGraph/ind/vf2ppInd35_0.8.txt}; + \addplot[mark=*,mark + size=1.5pt,color=blue] + table + {randGraph/ind/vf2pInd35_0.95.txt}; + \addplot[mark=triangle*,mark + size=1.8pt,color=red] + table + {randGraph/ind/vf2ppInd35_0.95.txt}; \end{axis} \end{tikzpicture} \end{center} @@ -1351,23 +1966,46 @@ \caption{Cummulative chart for $\delta=35$.}\label{fig:randIND35Sum} \end{figure} -Based on these experiments, VF2++ is faster than VF2 Plus and able to handle really large graphs in milliseconds. Note that when $IND$ was considered and the small graphs had proportionally few nodes ($\rho = 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node orders(e.g. see the $\delta=10$ case on \textbf{Figure \ref{fig:randIND10})}). If these examples had been excluded, the charts would have seemed to be similar to the other ones. -Unsurprisingly, as denser graphs are considered, both VF2++ and VF2 Plus slow slightly down, but remain practically usable even on graphs having 10 000 nodes. +Based on these experiments, VF2++ is faster than VF2 Plus and able to +handle really large graphs in milliseconds. 
Note that when $IND$ was
+considered and the small graphs had proportionally few nodes ($\rho =
+0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
+orders (e.g. see the $\delta=10$ case on \textbf{Figure
+  \ref{fig:randIND10})}). If these examples were excluded, the charts
+would look similar to the other ones. Unsurprisingly, as denser graphs
+are considered, both VF2++ and VF2 Plus slow down slightly, but remain
+practically usable even on graphs having 10 000 nodes.
-\newpage
+
\section{Conclusion}
-In this thesis, after providing a short summary of the recent algorithms, a new graph matching algorithm based on VF2, called VF2++, has been presented and analyzed from a practical viewpoint.
+In this paper, after providing a short summary of the recent
+algorithms, a new graph matching algorithm based on VF2, called VF2++,
+has been presented and analyzed from a practical viewpoint.
-Recognizing the importance of the node order and determining an efficient one, VF2++ is able to match graphs of thousands of nodes in practically linear time, including preprocessing. In addition to the proper order, VF2++ uses more efficient consistency and cutting rules which are easy to compute and make the algorithm able to prune most of the unfruitful branches without going astray.
+Recognizing the importance of the node order and determining an
+efficient one, VF2++ is able to match graphs of thousands of nodes in
+practically linear time, including preprocessing. In addition to the
+proper order, VF2++ uses more efficient consistency and cutting rules
+which are easy to compute and make the algorithm able to prune most of
+the unfruitful branches without going astray.
-In order to show the efficiency of the new method, it has been compared to VF2 Plus, which is the best competing algorithm according to \cite{VF2Plus}.
+In order to show the efficiency of the new method, it has been
+compared to VF2 Plus, which is the best competing algorithm according
+to \cite{VF2Plus}.
-The experiments show that VF2++ consistently outperforms VF2 Plus on biological graphs. It seems to be asymptotically faster on protein and on contact map graphs in the case of induced subgraph isomorphism, while in the case of graph isomorphism, it has definitely better asymptotic behaviour on protein graphs.
+The experiments show that VF2++ consistently outperforms VF2 Plus on
+biological graphs. It seems to be asymptotically faster on protein and
+on contact map graphs in the case of induced subgraph isomorphism,
+while in the case of graph isomorphism, it has definitely better
+asymptotic behaviour on protein graphs.
-Regarding random sparse graphs, not only has VF2++ proved itself to be faster than VF2 Plus, but it also shows practically linear behaviour in the case of both induced subgraph isomorphism and graph isomorphism.
+Regarding random sparse graphs, not only has VF2++ proved itself to be
+faster than VF2 Plus, but it also shows practically linear behaviour
+in the case of both induced subgraph isomorphism and graph
+isomorphism.
@@ -1381,8 +2019,7 @@
%% If you have bibdatabase file and want bibtex to generate the
%% bibitems, please use
%%
-\bibliographystyle{elsarticle-num}
-\bibliography{bibliography}
+\bibliographystyle{elsarticle-num} \bibliography{bibliography}
%% else use the following coding to input the bibitems directly in the
%% TeX file.