176 |
176 |
177 %% main text |
177 %% main text |
178 \section{Introduction} |
178 \section{Introduction} |
179 \label{sec:intro} |
179 \label{sec:intro} |
180 |
180 |
181 In the last decades, combinatorial structures, and especially graphs have been considered with ever increasing interest, and applied to the solution of several new and revised questions. |
181 In the last decades, combinatorial structures, and especially graphs |
182 The expressiveness, the simplicity and the studiedness of graphs make them practical for modelling and appear constantly in several seemingly independent fields. |
182 have been considered with ever increasing interest, and applied to the |
183 Bioinformatics and chemistry are amongst the most relevant and most important fields. |
183 solution of several new and revised questions. The expressiveness, |
184 |
184 the simplicity and the studiedness of graphs make them practical for |
185 Complex biological systems arise from the interaction and cooperation of plenty of molecular components. Getting acquainted with such systems at the molecular level has primary importance, since protein-protein interaction, DNA-protein interaction, metabolic interaction, transcription factor binding, neuronal networks, and hormone signaling networks can be understood only this way. |
185 modelling and appear constantly in several seemingly independent |
186 |
186 fields. Bioinformatics and chemistry are amongst the most relevant |
187 For instance, a molecular structure can be considered as a graph, whose nodes correspond to atoms and whose edges to chemical bonds. The secondary structure of a protein can also be represented as a graph, where nodes are associated with aminoacids and the edges with hydrogen bonds. The nodes are often whole molecular components and the edges represent some relationships among them. |
187 and most important fields. |
188 The similarity and dissimilarity of objects corresponding to nodes are incorporated to the model by \emph{node labels}. |
188 |
189 Many other chemical and biological structures can easily be modeled in a similar way. Understanding such networks basically requires finding specific subgraphs, which can not avoid the application of graph matching algorithms. |
189 Complex biological systems arise from the interaction and cooperation |
190 |
190 of plenty of molecular components. Getting acquainted with such |
191 Finally, let some of the other real-world fields related to some variants of graph matching be briefly mentioned: pattern recognition and machine vision \cite{HorstBunkeApplications}, symbol recognition \cite{CordellaVentoSymbolRecognition}, face identification \cite{JianzhuangYongFaceIdentification}. |
191 systems at the molecular level has primary importance, since |
192 \\ |
192 protein-protein interaction, DNA-protein interaction, metabolic |
193 |
193 interaction, transcription factor binding, neuronal networks, and |
194 Subgraph and induced subgraph matching problems are known to be NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is one of the few problems in NP neither known to be in P nor NP-Complete. Although polynomial time isomorphism algorithms are known for various graph classes, like trees and planar graphs\cite{PlanarGraphIso}, bounded valence graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso} or permutation graphs\cite{PermGraphIso}. |
194 hormone signaling networks can be understood only this way. |
195 |
195 |
196 In the following, some algorithms based on other approaches are summarized, which do not need any restrictions on the graphs. However, an overall polynomial behaviour is not expectable from such an alternative, it may often have good performance, even on a graph class for which polynomial algorithm is known. Note that this summary containing only exact matching algorithms is far not complete, neither does it cover all the recent algorithms. |
196 For instance, a molecular structure can be considered as a graph, |
197 |
197 whose nodes correspond to atoms and whose edges to chemical bonds. The |
198 The first practically usable approach was due to \textbf{Ullmann}\cite{Ullmann} which is a commonly used depth-first search based algorithm with a complex heuristic for reducing the number of visited states. A major problem is its $\Theta(n^3)$ space complexity, which makes it impractical in the case of big sparse graphs. |
198 secondary structure of a protein can also be represented as a graph, |
199 |
199 where nodes are associated with aminoacids and the edges with hydrogen |
200 In a recent paper, \textbf{Ullmann}\cite{UllmannBit} presents an improved version of this algorithm based on a bit-vector solution for the binary Constraint Satisfaction Problem. |
200 bonds. The nodes are often whole molecular components and the edges |
201 |
201 represent some relationships among them. The similarity and |
202 The \textbf{Nauty} algorithm\cite{Nauty} transforms the two graphs to a canonical form before starting to check for the isomorphism. It has been considered as one of the fastest graph isomorphism algorithms, although graph categories were shown in which it takes exponentially many steps. This algorithm handles only the graph isomorphism problem. |
202 dissimilarity of objects corresponding to nodes are incorporated to |
203 |
203 the model by \emph{node labels}. Many other chemical and biological |
204 The \textbf{LAD} algorithm\cite{Lad} uses a depth-first search strategy and formulates the matching as a Constraint Satisfaction Problem to prune the search tree. The constraints are that the mapping has to be injective and edge-preserving, hence it is possible to handle new matching types as well. |
204 structures can easily be modeled in a similar way. Understanding such |
205 |
205 networks basically requires finding specific subgraphs, which can not |
206 The \textbf{RI} algorithm\cite{RI} and its variations are based on a state space representation. After reordering the nodes of the graphs, it uses some fast executable heuristic checks without using any complex pruning rules. It seems to run really efficiently on graphs coming from biology, and won the International Contest on Pattern Search in Biological Databases\cite{Content}. |
206 avoid the application of graph matching algorithms. |
207 |
207 |
208 The currently most commonly used algorithm is the \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was designed for solving pattern matching and computer vision problems, and has been one of the best overall algorithms for more than a decade. Although, it can't be up to new specialized algorithms, it is still widely used due to its simplicity and space efficiency. VF2 uses a state space representation and checks some conditions in each state to prune the search tree. |
208 Finally, let some of the other real-world fields related to some |
209 |
209 variants of graph matching be briefly mentioned: pattern recognition |
210 Our first graph matching algorithm was the first version of VF2 which recognizes the significance of the node ordering, more opportunities to increase the cutting efficiency and reduce its computational complexity. This project was initiated and sponsored by QuantumBio Inc.\cite{QUANTUMBIO} and the implementation --- along with a source code --- has been published as a part of LEMON\cite{LEMON} open source graph library. |
210 and machine vision \cite{HorstBunkeApplications}, symbol recognition |
211 |
211 \cite{CordellaVentoSymbolRecognition}, face identification |
212 This thesis introduces \textbf{VF2++}, a new further improved algorithm for the graph and (induced)subgraph isomorphism problem, which uses efficient cutting rules and determines a node order in which VF2 runs significantly faster on practical inputs. |
212 \cite{JianzhuangYongFaceIdentification}. \\ |
213 |
213 |
214 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has been published. It is considered to be as efficient as the RI algorithm and has a strictly better behavior on large graphs. The main idea of VF2 Plus is to precompute a heuristic node order of the small graph, in which the VF2 works more efficiently.\newline |
214 Subgraph and induced subgraph matching problems are known to be |
215 \newline |
215 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is |
|
216 one of the few problems in NP neither known to be in P nor |
|
217 NP-Complete. Although polynomial time isomorphism algorithms are known |
|
218 for various graph classes, like trees and planar |
|
219 graphs\cite{PlanarGraphIso}, bounded valence |
|
220 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso} |
|
221 or permutation graphs\cite{PermGraphIso}. |
|
222 |
|
223 In the following, some algorithms based on other approaches are |
|
224 summarized, which do not need any restrictions on the graphs. However, |
|
225 an overall polynomial behaviour is not expectable from such an |
|
226 alternative, it may often have good performance, even on a graph class |
|
227 for which polynomial algorithm is known. Note that this summary |
|
228 containing only exact matching algorithms is far not complete, neither |
|
229 does it cover all the recent algorithms. |
|
230 |
|
231 The first practically usable approach was due to |
|
232 \textbf{Ullmann}\cite{Ullmann} which is a commonly used depth-first |
|
233 search based algorithm with a complex heuristic for reducing the |
|
234 number of visited states. A major problem is its $\Theta(n^3)$ space |
|
235 complexity, which makes it impractical in the case of big sparse |
|
236 graphs. |
|
237 |
|
238 In a recent paper, \textbf{Ullmann}\cite{UllmannBit} presents an |
|
239 improved version of this algorithm based on a bit-vector solution for |
|
240 the binary Constraint Satisfaction Problem. |
|
241 |
|
242 The \textbf{Nauty} algorithm\cite{Nauty} transforms the two graphs to |
|
243 a canonical form before starting to check for the isomorphism. It has |
|
244 been considered as one of the fastest graph isomorphism algorithms, |
|
245 although graph categories were shown in which it takes exponentially |
|
246 many steps. This algorithm handles only the graph isomorphism problem. |
|
247 |
|
248 The \textbf{LAD} algorithm\cite{Lad} uses a depth-first search |
|
249 strategy and formulates the matching as a Constraint Satisfaction |
|
250 Problem to prune the search tree. The constraints are that the mapping |
|
251 has to be injective and edge-preserving, hence it is possible to |
|
252 handle new matching types as well. |
|
253 |
|
254 The \textbf{RI} algorithm\cite{RI} and its variations are based on a |
|
255 state space representation. After reordering the nodes of the graphs, |
|
256 it uses some fast executable heuristic checks without using any |
|
257 complex pruning rules. It seems to run really efficiently on graphs |
|
258 coming from biology, and won the International Contest on Pattern |
|
259 Search in Biological Databases\cite{Content}. |
|
260 |
|
261 The currently most commonly used algorithm is the |
|
262 \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was |
|
263 designed for solving pattern matching and computer vision problems, |
|
264 and has been one of the best overall algorithms for more than a |
|
265 decade. Although, it can't be up to new specialized algorithms, it is |
|
266 still widely used due to its simplicity and space efficiency. VF2 uses |
|
267 a state space representation and checks some conditions in each state |
|
268 to prune the search tree. |
|
269 |
|
270 Our first graph matching algorithm was the first version of VF2 which |
|
271 recognizes the significance of the node ordering, more opportunities |
|
272 to increase the cutting efficiency and reduce its computational |
|
273 complexity. This project was initiated and sponsored by QuantumBio |
|
274 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source |
|
275 code --- has been published as a part of LEMON\cite{LEMON} open source |
|
276 graph library. |
|
277 |
|
278 This paper introduces \textbf{VF2++}, a new further improved algorithm |
|
279 for the graph and (induced)subgraph isomorphism problem, which uses |
|
280 efficient cutting rules and determines a node order in which VF2 runs |
|
281 significantly faster on practical inputs. |
|
282 |
|
283 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has |
|
284 been published. It is considered to be as efficient as the RI |
|
285 algorithm and has a strictly better behavior on large graphs. The |
|
286 main idea of VF2 Plus is to precompute a heuristic node order of the |
|
287 small graph, in which the VF2 works more efficiently. |
216 |
288 |
217 \section{Problem Statement} |
289 \section{Problem Statement} |
218 This section provides a detailed description of the problems to be solved. |
290 This section provides a detailed description of the problems to be |
|
291 solved. |
219 \subsection{Definitions} |
292 \subsection{Definitions} |
220 |
293 |
221 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs. |
294 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and |
|
295 $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs. |
222 \begin{definition}\label{sec:ismorphic} |
296 \begin{definition}\label{sec:ismorphic} |
223 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M: V_{small} \longrightarrow V_{large}$ bijection, for which the following is true: |
297 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M: |
224 \begin{center} |
298 V_{small} \longrightarrow V_{large}$ bijection, for which the |
225 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (M(u),M(v))\in{E_{large}}$ |
299 following is true: |
|
300 \begin{center} |
|
301 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow |
|
302 (M(u),M(v))\in{E_{large}}$ |
226 \end{center} |
303 \end{center} |
227 \end{definition} |
304 \end{definition} |
228 For the sake of simplicity in this paper subgraphs and induced subgraphs are defined in a more general way than usual: |
305 For the sake of simplicity in this paper subgraphs and induced |
|
306 subgraphs are defined in a more general way than usual: |
229 \begin{definition} |
307 \begin{definition} |
230 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I: V_{small}\longrightarrow V_{large}$ injection, for which the following is true: |
308 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I: |
|
309 V_{small}\longrightarrow V_{large}$ injection, for which the |
|
310 following is true: |
231 \begin{center} |
311 \begin{center} |
232 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$ |
312 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$ |
233 \end{center} |
313 \end{center} |
234 \end{definition} |
314 \end{definition} |
235 |
315 |
236 \begin{definition} |
316 \begin{definition} |
237 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists I: V_{small}\longrightarrow V_{large}$ injection, for which the following is true: |
317 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists |
238 \begin{center} |
318 I: V_{small}\longrightarrow V_{large}$ injection, for which the |
239 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (I(u),I(v))\in E_{large}$ |
319 following is true: |
|
320 \begin{center} |
|
321 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow |
|
322 (I(u),I(v))\in E_{large}$ |
240 \end{center} |
323 \end{center} |
241 \end{definition} |
324 \end{definition} |
242 |
325 |
243 \begin{definition} |
326 \begin{definition} |
244 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node label function}, where K is an arbitrary set. The elements in K are the \textbf{node labels}. Two nodes, u and v are said to be \textbf{equivalent}, if $lab(u)=lab(v)$. |
327 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node |
|
328 label function}, where K is an arbitrary set. The elements in K |
|
329 are the \textbf{node labels}. Two nodes, u and v are said to be |
|
330 \textbf{equivalent}, if $lab(u)=lab(v)$. |
245 \end{definition} |
331 \end{definition} |
246 |
332 |
247 When node labels are also given, the matched nodes must have the same labels. |
333 When node labels are also given, the matched nodes must have the same |
248 For example, the node labeled isomorphism is phrased by |
334 labels. For example, the node labeled isomorphism is phrased by |
249 \begin{definition} |
335 \begin{definition} |
250 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label function lab} if $\exists M: V_{small} \longrightarrow V_{large}$ bijection, for which the following is true: |
336 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label |
251 \begin{center} |
337 function lab} if $\exists M: V_{small} \longrightarrow V_{large}$ |
252 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (M(u),M(v))\in{E_{large}})$ |
338 bijection, for which the following is true: |
253 and $(\forall u\in{V_{small}} : lab(u)=lab(M(u)))$ |
339 \begin{center} |
|
340 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow |
|
341 (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} : |
|
342 lab(u)=lab(M(u)))$ |
254 \end{center} |
343 \end{center} |
255 \end{definition} |
344 \end{definition} |
256 |
345 |
257 The other two definitions can be extended in the same way. |
346 The other two definitions can be extended in the same way. |
258 |
347 |
259 Note that edge label function can be defined similarly to node label function, and all the definitions can be extended with additional conditions, but it is out of the scope of this work. |
348 Note that edge label function can be defined similarly to node label |
260 |
349 function, and all the definitions can be extended with additional |
261 The equivalence of two nodes is usually defined by another relation, $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the definition given above if R is an equivalence relation, which does not mean restriction in biological and chemical applications. |
350 conditions, but it is out of the scope of this work. |
|
351 |
|
352 The equivalence of two nodes is usually defined by another relation, |
|
353 $\\R\subseteq (V_{small}\cup V_{large})^2$. This overlaps with the |
|
354 definition given above if R is an equivalence relation, which does not |
|
355 mean restriction in biological and chemical applications. |
262 |
356 |
263 \subsection{Common problems}\label{sec:CommProb} |
357 \subsection{Common problems}\label{sec:CommProb} |
264 |
358 |
265 The focus of this paper is on two extensively studied topics, the subgraph isomorphism and its variations. However, the following problems also appear in many applications. |
359 The focus of this paper is on two extensively studied topics, the |
266 |
360 subgraph isomorphism and its variations. However, the following |
267 The \textbf{subgraph matching problem} is the following: is $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node label? |
361 problems also appear in many applications. |
268 |
362 |
269 The \textbf{induced subgraph matching problem} asks the same about the existence of an induced subgraph. |
363 The \textbf{subgraph matching problem} is the following: is |
270 |
364 $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node |
271 The \textbf{graph isomorphism problem} can be defined as induced subgraph matching problem where the sizes of the two graphs are equal. |
365 label? |
272 |
366 |
273 In addition to existence, it may be needed to show such a subgraph, or it may be necessary to list all of them. |
367 The \textbf{induced subgraph matching problem} asks the same about the |
274 |
368 existence of an induced subgraph. |
275 It should be noted that some authors misleadingly refer to the term \emph{subgraph isomorphism problem} as an \emph{induced subgraph isomorphism problem}. |
369 |
276 |
370 The \textbf{graph isomorphism problem} can be defined as induced |
277 The following sections give the descriptions of VF2, VF2++, VF2 Plus and a particular comparison. |
371 subgraph matching problem where the sizes of the two graphs are equal. |
|
372 |
|
373 In addition to existence, it may be needed to show such a subgraph, or |
|
374 it may be necessary to list all of them. |
|
375 |
|
376 It should be noted that some authors misleadingly refer to the term |
|
377 \emph{subgraph isomorphism problem} as an \emph{induced subgraph |
|
378 isomorphism problem}. |
|
379 |
|
380 The following sections give the descriptions of VF2, VF2++, VF2 Plus |
|
381 and a particular comparison. |
278 |
382 |
279 \section{The VF2 Algorithm} |
383 \section{The VF2 Algorithm} |
280 This algorithm is the basis of both the VF2++ and the VF2 Plus. |
384 This algorithm is the basis of both the VF2++ and the VF2 Plus. VF2 |
281 VF2 is able to handle all the variations mentioned in \textbf{Section \ref{sec:CommProb})}. |
385 is able to handle all the variations mentioned in \textbf{Section |
282 Although it can also handle directed graphs, for the sake of simplicity, only the undirected case will be discussed. |
386 \ref{sec:CommProb})}. Although it can also handle directed graphs, |
|
387 for the sake of simplicity, only the undirected case will be |
|
388 discussed. |
283 |
389 |
284 |
390 |
285 \subsection{Common notations} |
391 \subsection{Common notations} |
286 \indent |
392 \indent Assume $G_{small}$ is searched in $G_{large}$. The following |
287 Assume $G_{small}$ is searched in $G_{large}$. |
393 definitions and notations will be used throughout the whole paper. |
288 The following definitions and notations will be used throughout the whole paper. |
|
289 \begin{definition} |
394 \begin{definition} |
290 A set $M\subseteq V_{small}\times V_{large}$ is called \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears in more than one pair in M. |
395 A set $M\subseteq V_{small}\times V_{large}$ is called |
291 That is, M uniquely associates some of the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice versa. |
396 \textbf{mapping}, if no node of $V_{small}$ or of $V_{large}$ appears |
|
397 in more than one pair in M. That is, M uniquely associates some of |
|
398 the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice |
|
399 versa. |
292 \end{definition} |
400 \end{definition} |
293 |
401 |
294 \begin{definition} |
402 \begin{definition} |
295 Mapping M \textbf{covers} a node v, if there exists a pair in M, which contains v. |
403 Mapping M \textbf{covers} a node v, if there exists a pair in M, which |
|
404 contains v. |
296 \end{definition} |
405 \end{definition} |
297 |
406 |
298 \begin{definition} |
407 \begin{definition} |
299 A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the nodes in $V_{small}$. |
408 A mapping $M$ is $\mathbf{whole\ mapping}$, if $M$ covers all the |
|
409 nodes in $V_{small}$. |
300 \end{definition} |
410 \end{definition} |
301 |
411 |
302 \begin{notation} |
412 \begin{notation} |
303 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$. |
413 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in |
|
414 V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in |
|
415 V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$. |
304 \end{notation} |
416 \end{notation} |
305 |
417 |
306 \begin{notation} |
418 \begin{notation} |
307 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$ and $v\in V_{small}\cup V_{large}$. |
419 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$, if such a node |
|
420 exist, otherwise $\mathbf{Pair(M,v)}$ is undefined. For a mapping $M$ |
|
421 and $v\in V_{small}\cup V_{large}$. |
308 \end{notation} |
422 \end{notation} |
309 |
423 |
310 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique |
424 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique |
311 |
425 |
312 The definitions of the isomorphism types can be rephrased on the existence of a special whole mapping $M$, since it represents a bijection. For example |
426 The definitions of the isomorphism types can be rephrased on the |
313 \begin{center} |
427 existence of a special whole mapping $M$, since it represents a |
314 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow (Pair(M,u),Pair(M,v))\in E_{large}$. |
428 bijection. For example |
|
429 \begin{center} |
|
430 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph |
|
431 isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v |
|
432 \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow |
|
433 (Pair(M,u),Pair(M,v))\in E_{large}$. |
315 \end{center} |
434 \end{center} |
316 |
435 |
317 \begin{definition} |
436 \begin{definition} |
318 A set of whole mappings is called \textbf{problem type}. |
437 A set of whole mappings is called \textbf{problem type}. |
319 \end{definition} |
438 \end{definition} |
320 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type which can be substituted by any problem type. |
439 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type |
321 |
440 which can be substituted by any problem type. |
322 A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using this notations, VF2 searches a whole mapping $W$ of type $PT$. |
441 |
323 |
442 A whole mapping $W\mathbf{\ is\ of\ type\ PT}$, if $W\in PT$. Using |
324 For example the problem type of graph isomorphism problem is the following. |
443 this notations, VF2 searches a whole mapping $W$ of type $PT$. |
325 A whole mapping $W$ is in $\mathbf{ISO}$, iff the bijection represented by $W$ satisfies \textbf{Definition \ref{sec:ismorphic})}. |
444 |
326 The subgraph- and induced subgraph matching problems can be formalized in a similar way. Let their problem types be denoted as $\mathbf{SUB}$ and $\mathbf{IND}$. |
445 For example the problem type of graph isomorphism problem is the |
|
446 following. A whole mapping $W$ is in $\mathbf{ISO}$, iff the |
|
447 bijection represented by $W$ satisfies \textbf{Definition |
|
448 \ref{sec:ismorphic})}. The subgraph- and induced subgraph matching |
|
449 problems can be formalized in a similar way. Let their problem types |
|
450 be denoted as $\mathbf{SUB}$ and $\mathbf{IND}$. |
327 |
451 |
328 \begin{definition} |
452 \begin{definition} |
329 \label{expPT} |
453 \label{expPT} |
330 $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of $G_{small}$ has to be mapped to an edge of $G_{large}$ for each mapping in $PT$. |
454 $PT$ is an \textbf{expanding problem type} if $\ \forall\ W\in |
|
455 PT:\ \forall u_1,u_2\in V_{small}:\ (u_1,u_2)\in E_{small}\Rightarrow |
|
456 (Pair(W,u_1),Pair(W,u_2))\in E_{large}$, that is each edge of |
|
457 $G_{small}$ has to be mapped to an edge of $G_{large}$ for each |
|
458 mapping in $PT$. |
331 \end{definition} |
459 \end{definition} |
332 |
460 |
333 Note that $ISO$, $SUB$ and $IND$ are expanding problem types. |
461 Note that $ISO$, $SUB$ and $IND$ are expanding problem types. |
334 |
462 |
335 This paper deals with the three problem types mentioned above only, but |
463 This paper deals with the three problem types mentioned above only, |
336 the following generic definitions make it possible to handle other types as well. |
464 but the following generic definitions make it possible to handle other |
337 Although it may be challenging to find a proper consistency function and an efficient |
465 types as well. Although it may be challenging to find a proper |
338 cutting function. |
466 consistency function and an efficient cutting function. |
339 |
467 |
340 \begin{definition} |
468 \begin{definition} |
341 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a \textbf{consistency function by } $\mathbf{PT}$, if the following holds. If there exists whole mapping $W$ of type PT for which $M\subseteq W$, then $Cons_{PT}(M)$ is true. |
469 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a |
|
470 \textbf{consistency function by } $\mathbf{PT}$, if the following |
|
471 holds. If there exists whole mapping $W$ of type PT for which |
|
472 $M\subseteq W$, then $Cons_{PT}(M)$ is true. |
342 \end{definition} |
473 \end{definition} |
343 |
474 |
344 \begin{definition} |
475 \begin{definition} |
345 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a \textbf{cutting function by } $\mathbf{PT}$, if the following holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a whole mapping W of type PT. |
476 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a |
|
477 \textbf{cutting function by } $\mathbf{PT}$, if the following |
|
478 holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a |
|
479 whole mapping W of type PT. |
346 \end{definition} |
480 \end{definition} |
347 |
481 |
348 \begin{definition} |
482 \begin{definition} |
349 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if $Cons_{PT}(M)$ is true. |
483 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$, if |
|
484 $Cons_{PT}(M)$ is true. |
350 \end{definition} |
485 \end{definition} |
351 |
486 |
352 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form. |
487 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form. |
353 \begin{notation} |
488 \begin{notation} |
354 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping. |
489 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$ and |
|
490 $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where |
|
491 $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping. |
355 \end{notation} |
492 \end{notation} |
356 |
493 |
357 $Cons_{PT}$ will be used to check the consistency of the already covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if no whole consistent mapping can contain the current mapping. |
494 $Cons_{PT}$ will be used to check the consistency of the already |
|
495 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if |
|
496 no whole consistent mapping can contain the current mapping. |
358 |
497 |
359 \subsection{Overview of the algorithm} |
498 \subsection{Overview of the algorithm} |
360 VF2 uses a state space representation of mappings, $Cons_{PT}$ for excluding inconsistency with the problem type and $Cut_{PT}$ for pruning the search tree. |
499 VF2 uses a state space representation of mappings, $Cons_{PT}$ for |
361 Each state $s$ of the matching process can be associated with a mapping $M(s)$. |
500 excluding inconsistency with the problem type and $Cut_{PT}$ for |
362 |
501 pruning the search tree. Each state $s$ of the matching process can |
363 \textbf{Algorithm~\ref{alg:VF2Pseu})} is a high level description of the VF2 matching algorithm. |
502 be associated with a mapping $M(s)$. |
|
503 |
|
504 \textbf{Algorithm~\ref{alg:VF2Pseu})} is a high level description of |
|
505 the VF2 matching algorithm. |
364 |
506 |
365 |
507 |
366 \begin{algorithm} |
508 \begin{algorithm} |
367 \algtext*{EndIf}%ne nyomtasson end if-et |
509 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne |
368 \algtext*{EndFor}%ne nyomtasson .. |
510 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson .. |
369 \algtext*{EndProcedure}%ne nyomtasson .. |
|
370 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu} |
511 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu} |
371 \begin{algorithmic}[1] |
512 \begin{algorithmic}[1] |
372 |
513 |
373 \Procedure{VF2}{State $s$, ProblemType $PT$} |
514 \Procedure{VF2}{State $s$, ProblemType $PT$} \If{$M(s$) covers |
374 \If{$M(s$) covers $V_{small}$} |
515 $V_{small}$} \State Output($M(s)$) \Else |
375 \State Output($M(s)$) |
|
376 \Else |
|
377 |
516 |
378 \State Compute the set $P(s)$ of the pairs candidate for inclusion in $M(s)$ |
517 \State Compute the set $P(s)$ of the pairs candidate for inclusion |
379 \ForAll{$p\in{P(s)}$} |
518 in $M(s)$ \ForAll{$p\in{P(s)}$} \If{Cons$_{PT}$($p, M(s)$) $\wedge$ |
380 \If{Cons$_{PT}$($p, M(s)$) $\wedge$ $\neg$Cut$_{PT}$($p, M(s)$)} |
519 $\neg$Cut$_{PT}$($p, M(s)$)} \State Compute the nascent state |
381 \State Compute the nascent state $\tilde{s}$ by adding $p$ to $M(s)$ |
520 $\tilde{s}$ by adding $p$ to $M(s)$ \State \textbf{call} |
382 \State \textbf{call} VF2($\tilde{s}$, $PT$) |
521 VF2($\tilde{s}$, $PT$) \EndIf \EndFor \EndIf \EndProcedure |
383 \EndIf |
|
384 \EndFor |
|
385 \EndIf |
|
386 \EndProcedure |
|
387 \end{algorithmic} |
522 \end{algorithmic} |
388 \end{algorithm} |
523 \end{algorithm} |
389 |
524 |
390 |
525 |
391 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it starts with an empty mapping. |
526 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it |
392 |
527 starts with an empty mapping. |
393 For each state $s$, the algorithm computes $P(s)$, the set of candidate node pairs for adding to the current state $s$. |
528 |
394 |
529 For each state $s$, the algorithm computes $P(s)$, the set of |
395 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup \{p\}$ is computed, and the whole process is recursively applied to $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it can be proved that $s$ can not be extended to a whole mapping. |
530 candidate node pairs for adding to the current state $s$. |
|
531 |
|
532 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and |
|
533 $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and |
|
534 $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup |
|
535 \{p\}$ is computed, and the whole process is recursively applied to |
|
536 $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$ or it |
|
537 can be proved that $s$ can not be extended to a whole mapping. |
396 |
538 |
397 In order to make sure of the correctness see |
539 In order to make sure of the correctness see |
398 \begin{claim} |
540 \begin{claim} |
399 Through consistent mappings, only consistent whole mappings can be reached, and all of the whole mappings are reachable through consistent mappings. |
541 Through consistent mappings, only consistent whole mappings can be |
|
542 reached, and all of the whole mappings are reachable through |
|
543 consistent mappings. |
400 \end{claim} |
544 \end{claim} |
401 |
545 |
402 Note that a state may be reached in many different ways, since the order of insertions into M does not influence the nascent mapping. In fact, the number of different ways which lead to the same state can be exponentially large. If $G_{small}$ and $G_{large}$ are circles with n nodes and n different node labels, there exists exactly one graph isomorphism between them, but it will be reached in $n!$ different ways. |
546 Note that a state may be reached in many different ways, since the |
|
547 order of insertions into M does not influence the nascent mapping. In |
|
548 fact, the number of different ways which lead to the same state can be |
|
549 exponentially large. If $G_{small}$ and $G_{large}$ are circles with n |
|
550 nodes and n different node labels, there exists exactly one graph |
|
551 isomorphism between them, but it will be reached in $n!$ different |
|
552 ways. |
403 |
553 |
404 However, one may observe |
554 However, one may observe |
405 |
555 |
406 \begin{claim} |
556 \begin{claim} |
407 \label{claim:claimTotOrd} |
557 \label{claim:claimTotOrd} |
408 Let $\prec$ an arbitrary total ordering relation on $V_{small}$. |
558 Let $\prec$ an arbitrary total ordering relation on $V_{small}$. If |
409 If the algorithm ignores each $p=(u,v) \in P(s)$, for which |
559 the algorithm ignores each $p=(u,v) \in P(s)$, for which |
410 \begin{center} |
560 \begin{center} |
411 $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$, |
561 $\exists (\hat{u},\hat{v})\in P(s): \hat{u} \prec u$, |
412 \end{center} |
562 \end{center} |
413 then no state can be reached more than ones and each state associated with a whole mapping remains reachable. |
563 then no state can be reached more than ones and each state associated |
|
564 with a whole mapping remains reachable. |
414 \end{claim} |
565 \end{claim} |
415 |
566 |
416 Note that the cornerstone of the improvements to VF2 is a proper choice of a total ordering. |
567 Note that the cornerstone of the improvements to VF2 is a proper |
|
568 choice of a total ordering. |
417 |
569 |
418 \subsection{The candidate set P(s)} |
570 \subsection{The candidate set P(s)} |
419 \label{candidateComputingVF2} |
571 \label{candidateComputingVF2} |
420 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$. |
572 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$. |
421 Suppose that $PT$ is an expanding problem type, see \textbf{Definition~\ref{expPT})}. |
573 Suppose that $PT$ is an expanding problem type, see |
|
574 \textbf{Definition~\ref{expPT})}. |
422 |
575 |
423 \begin{notation} |
576 \begin{notation} |
424 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}} \wedge \tilde{u}$ is covered by $M(s)\}$, and \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not covered by $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}} \wedge \tilde{v}$ is covered by $M(s)\}$ |
577 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by |
|
578 $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}} |
|
579 \wedge \tilde{u}$ is covered by $M(s)\}$, and |
|
580 \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not |
|
581 covered by |
|
582 $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}} |
|
583 \wedge \tilde{v}$ is covered by $M(s)\}$ |
425 \end{notation} |
584 \end{notation} |
426 |
585 |
427 The set $P(s)$ includes the pairs of uncovered neighbours of covered nodes and if there is not such a node pair, all the pairs containing two uncovered nodes are added. Formally, let |
586 The set $P(s)$ includes the pairs of uncovered neighbours of covered |
|
587 nodes and if there is not such a node pair, all the pairs containing |
|
588 two uncovered nodes are added. Formally, let |
428 \[ |
589 \[ |
429 P(s)\!=\! |
590 P(s)\!=\! |
430 \begin{cases} |
591 \begin{cases} |
431 T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if } T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq \emptyset,\\ |
592 T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if } |
432 (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s)) &\hspace{-0.15cm}otherwise. |
593 T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq |
|
594 \emptyset,\\ (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s)) |
|
595 &\hspace{-0.15cm}otherwise. |
433 \end{cases} |
596 \end{cases} |
434 \] |
597 \] |
435 |
598 |
436 \subsection{Consistency} |
599 \subsection{Consistency} |
437 This section defines the consistency functions for the different problem types mentioned in \textbf{Section \ref{sec:CommProb})}. |
600 This section defines the consistency functions for the different |
|
601 problem types mentioned in \textbf{Section \ref{sec:CommProb})}. |
438 \begin{notation} |
602 \begin{notation} |
439 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} : (u,\tilde{u})\in E_{small}\}$\\ |
603 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} : |
440 Let $\mathbf{\Gamma_{large} (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$ |
604 (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large} |
|
605 (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$ |
441 \end{notation} |
606 \end{notation} |
442 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, |
607 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is |
443 $s$ is a state of the matching procedure, |
608 a state of the matching procedure, $M(s)$ is consistent mapping by |
444 $M(s)$ is consistent mapping by $PT$ and $lab(u)=lab(v)$. |
609 $PT$ and $lab(u)=lab(v)$. $Cons_{PT}(p,M(s))$ checks whether |
445 $Cons_{PT}(p,M(s))$ checks whether including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$. |
610 including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$. |
446 |
611 |
447 \subsubsection{Induced subgraph isomorphism} |
612 \subsubsection{Induced subgraph isomorphism} |
448 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small} \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline |
613 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow |
449 The following formulation gives an efficient way of calculating $Cons_{IND}$. |
614 (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small} |
|
615 \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$.\newline The |
|
616 following formulation gives an efficient way of calculating |
|
617 $Cons_{IND}$. |
450 \begin{claim} |
618 \begin{claim} |
451 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v) \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge |
619 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v) |
452 (\forall \tilde{u}\in \Gamma_{small}(u) \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a consistency function in the case of $IND$. |
620 \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge |
|
621 (\forall \tilde{u}\in \Gamma_{small}(u) |
|
622 \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a |
|
623 consistency function in the case of $IND$. |
453 \end{claim} |
624 \end{claim} |
454 |
625 |
455 \subsubsection{Graph isomorphism} |
626 \subsubsection{Graph isomorphism} |
456 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$ $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$. |
627 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $ISO$ |
|
628 $\Leftrightarrow$ $M(s)\cup \{(u,v)\}$ is a consistent mapping by |
|
629 $IND$. |
457 \begin{claim} |
630 \begin{claim} |
458 $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and only if it is a consistency function by $IND$. |
631 $Cons_{ISO}((u,v),M(s))$ is a consistency function by $ISO$ if and |
|
632 only if it is a consistency function by $IND$. |
459 \end{claim} |
633 \end{claim} |
460 \subsubsection{Subgraph isomorphism} |
634 \subsubsection{Subgraph isomorphism} |
461 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small} \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$. |
635 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $SUB$ $\Leftrightarrow |
|
636 (\forall \tilde{u}\in M_{small}:\\(u,\tilde{u})\in E_{small} |
|
637 \Rightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$. |
462 \newline |
638 \newline |
463 The following formulation gives an efficient way of calculating $Cons_{SUB}$. |
639 The following formulation gives an efficient way of calculating |
|
640 $Cons_{SUB}$. |
464 \begin{claim} |
641 \begin{claim} |
465 $Cons_{SUB}((u,v),M(s)):= |
642 $Cons_{SUB}((u,v),M(s)):= (\forall \tilde{u}\in \Gamma_{small}(u) |
466 (\forall \tilde{u}\in \Gamma_{small}(u) \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a consistency function by $SUB$. |
643 \ \cap\ M_{small}(s):\\(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a |
|
644 consistency function by $SUB$. |
467 \end{claim} |
645 \end{claim} |
468 |
646 |
469 \subsection{Cutting rules} |
647 \subsection{Cutting rules} |
470 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can be true only if it is impossible to extended $M(s)\cup \{p\}$ to a whole mapping. |
648 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently |
|
649 verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can |
|
650 be true only if it is impossible to extended $M(s)\cup \{p\}$ to a |
|
651 whole mapping. |
471 \begin{notation} |
652 \begin{notation} |
472 |
653 |
473 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash M_{small}(s))\backslash T_{small}(s)$, and \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash M_{large}(s))\backslash T_{large}(s)$. |
654 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash |
|
655 M_{small}(s))\backslash T_{small}(s)$, and |
|
656 \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash |
|
657 M_{large}(s))\backslash T_{large}(s)$. |
474 \end{notation} |
658 \end{notation} |
475 \subsubsection{Induced subgraph isomorphism} |
659 \subsubsection{Induced subgraph isomorphism} |
476 \begin{claim} |
660 \begin{claim} |
477 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is a cutting function by $IND$. |
661 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |
|
662 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap |
|
663 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap |
|
664 \tilde{T}_{small}(s)|$ is a cutting function by $IND$. |
478 \end{claim} |
665 \end{claim} |
479 \subsubsection{Graph isomorphism} |
666 \subsubsection{Graph isomorphism} |
480 Note that the cutting function of induced subgraph isomorphism defined above is a cutting function by $ISO$, too, however it is less efficient than the following while their computational complexity is the same. |
667 Note that the cutting function of induced subgraph isomorphism defined |
|
668 above is a cutting function by $ISO$, too, however it is less |
|
669 efficient than the following while their computational complexity is |
|
670 the same. |
481 \begin{claim} |
671 \begin{claim} |
482 $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is a cutting function by $ISO$. |
672 $Cut_{ISO}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| \neq |
|
673 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap |
|
674 \tilde{T}_{large}(s)| \neq |\Gamma_{small}(u)\cap |
|
675 \tilde{T}_{small}(s)|$ is a cutting function by $ISO$. |
483 \end{claim} |
676 \end{claim} |
484 |
677 |
485 \subsubsection{Subgraph isomorphism} |
678 \subsubsection{Subgraph isomorphism} |
486 \begin{claim} |
679 \begin{claim} |
487 $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by $SUB$. |
680 $Cut_{SUB}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |
|
681 |\Gamma_{small} (u)\ \cap\ T_{small}(s)|$ is a cutting function by |
|
682 $SUB$. |
488 \end{claim} |
683 \end{claim} |
489 Note that there is a significant difference between induced and non-induced subgraph isomorphism: |
684 Note that there is a significant difference between induced and |
|
685 non-induced subgraph isomorphism: |
490 |
686 |
491 \begin{claim} |
687 \begin{claim} |
492 \label{claimSUB} |
688 \label{claimSUB} |
493 $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ is \textbf{not} a cutting function by $SUB$. |
689 $Cut_{SUB}'((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |
|
690 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap |
|
691 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap \tilde{T}_{small}(s)|$ |
|
692 is \textbf{not} a cutting function by $SUB$. |
494 \end{claim} |
693 \end{claim} |
495 \begin{proof}$ $\\ |
694 \begin{proof}$ $\\ |
496 \vspace*{-0.5cm} |
695 \vspace*{-0.5cm} |
497 |
696 |
498 \begin{figure} |
697 \begin{figure} |
499 \begin{center} |
698 \begin{center} |
500 \begin{tikzpicture} |
699 \begin{tikzpicture} |
501 [scale=.8,auto=left,every node/.style={circle,fill=black!15}] |
700 [scale=.8,auto=left,every node/.style={circle,fill=black!15}] |
502 \node[rectangle,fill=black!15] at (4,6) {$G_{small}$}; |
701 \node[rectangle,fill=black!15] at (4,6) {$G_{small}$}; \node (u4) at |
503 \node (u4) at (2.5,10) {$u_4$}; |
702 (2.5,10) {$u_4$}; \node (u3) at (5.5,10) {$u_3$}; \node (u1) at |
504 \node (u3) at (5.5,10) {$u_3$}; |
703 (2.5,7) {$u_1$}; \node (u2) at (5.5,7) {$u_2$}; |
505 \node (u1) at (2.5,7) {$u_1$}; |
|
506 \node (u2) at (5.5,7) {$u_2$}; |
|
507 |
704 |
508 \node[rectangle,fill=black!30] at (13.5,6) {$G_{large}$}; |
705 \node[rectangle,fill=black!30] at (13.5,6) {$G_{large}$}; |
509 \node[fill=black!30] (v4) at (12,10) {$v_4$}; |
706 \node[fill=black!30] (v4) at (12,10) {$v_4$}; \node[fill=black!30] |
510 \node[fill=black!30] (v3) at (15,10) {$v_3$}; |
707 (v3) at (15,10) {$v_3$}; \node[fill=black!30] (v1) at (12,7) |
511 \node[fill=black!30] (v1) at (12,7) {$v_1$}; |
708 {$v_1$}; \node[fill=black!30] (v2) at (15,7) {$v_2$}; |
512 \node[fill=black!30] (v2) at (15,7) {$v_2$}; |
|
513 |
709 |
514 |
710 |
515 \foreach \from/\to in {u1/u2,u2/u3,u3/u4,u4/u1} |
711 \foreach \from/\to in {u1/u2,u2/u3,u3/u4,u4/u1} \draw (\from) -- |
516 \draw (\from) -- (\to); |
712 (\to); \foreach \from/\to in {v1/v2,v2/v3,v3/v4,v4/v1,v1/v3} \draw |
517 \foreach \from/\to in {v1/v2,v2/v3,v3/v4,v4/v1,v1/v3} |
713 (\from) -- (\to); |
518 \draw (\from) -- (\to); |
|
519 % \draw[dashed] (\from) -- (\to); |
714 % \draw[dashed] (\from) -- (\to); |
520 \end{tikzpicture} |
715 \end{tikzpicture} |
521 \caption{Graphs for the proof of \textbf{Claim \ref{claimSUB}}} \label{fig:proofSUB} |
716 \caption{Graphs for the proof of \textbf{Claim |
|
717 \ref{claimSUB}}} \label{fig:proofSUB} |
522 \end{center} |
718 \end{center} |
523 \end{figure} |
719 \end{figure} |
524 Let the two graphs of \textbf{Figure \ref{fig:proofSUB})} be the input graphs. |
720 Let the two graphs of \textbf{Figure \ref{fig:proofSUB})} be the input |
525 Suppose the total ordering relation is $u_1 \prec u_2 \prec u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add $(u_2,v_2)\in P(s)$.\newline |
721 graphs. Suppose the total ordering relation is $u_1 \prec u_2 \prec |
526 $Cons_{SUB}((u_2,v_2),M(s))=true$, so $M\cup \{(u_2,v_2)\}$ is consistent by $SUB$. The cutting function $Cut_{SUB}((u_2,v_2),M(s))$ is false, so it does not let cut the tree.\newline |
722 u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add |
527 On the other hand $Cut_{SUB}'((u_2,v_2),M(s))$ is true, since\\$0=|\Gamma_{large}(v_2)\cap \tilde{T}_{large}(s)|<|\Gamma_{small}(u_2)\cap \tilde{T}_{small}(s)|=1$ is true, but still the tree can not be pruned, because otherwise the $\{(u_1,v_1)(u_2,v_2)(u_3,v_3)(u_4,v_4)\}$ mapping can not be found. |
723 $(u_2,v_2)\in P(s)$.\newline $Cons_{SUB}((u_2,v_2),M(s))=true$, so |
|
724 $M\cup \{(u_2,v_2)\}$ is consistent by $SUB$. The cutting function |
|
725 $Cut_{SUB}((u_2,v_2),M(s))$ is false, so it does not let cut the |
|
726 tree.\newline On the other hand $Cut_{SUB}'((u_2,v_2),M(s))$ is true, |
|
727 since\\$0=|\Gamma_{large}(v_2)\cap |
|
728 \tilde{T}_{large}(s)|<|\Gamma_{small}(u_2)\cap |
|
729 \tilde{T}_{small}(s)|=1$ is true, but still the tree can not be |
|
730 pruned, because otherwise the |
|
731 $\{(u_1,v_1)(u_2,v_2)(u_3,v_3)(u_4,v_4)\}$ mapping can not be found. |
528 \end{proof} |
732 \end{proof} |
529 |
733 |
530 \newpage |
|
531 \section{The VF2++ Algorithm} |
734 \section{The VF2++ Algorithm} |
532 Although any total ordering relation makes the search space of VF2 a tree, its |
735 Although any total ordering relation makes the search space of VF2 a |
533 choice turns out to dramatically influence the number of visited states. The goal is to determine an efficient one as quickly as possible. |
736 tree, its choice turns out to dramatically influence the number of |
534 |
737 visited states. The goal is to determine an efficient one as quickly |
535 The main reason for VF2++' superiority over VF2 is twofold. Firstly, taking into account the structure and the node labeling of the graph, VF2++ determines a state order in which most of the unfruitful branches of the search space can be pruned immediately. Secondly, introducing more efficient --- nevertheless still easier to compute --- cutting rules reduces the chance of going astray even further. |
738 as possible. |
536 |
739 |
537 In addition to the usual subgraph isomorphism, specialized versions for induced subgraph isomorphism and for graph isomorphism have been designed. VF2++ has gained a runtime improvement of one order of magnitude respecting induced subgraph isomorphism and a better asymptotical behaviour in the case of graph isomorphism problem. |
740 The main reason for VF2++' superiority over VF2 is twofold. Firstly, |
538 |
741 taking into account the structure and the node labeling of the graph, |
539 Note that a weaker version of the cutting rules and the more efficient candidate |
742 VF2++ determines a state order in which most of the unfruitful |
540 set calculating were described in \cite{VF2Plus}, too. |
743 branches of the search space can be pruned immediately. Secondly, |
541 |
744 introducing more efficient --- nevertheless still easier to compute |
542 It should be noted that all the methods described in this section are extendable to handle directed graphs and edge labels as well. |
745 --- cutting rules reduces the chance of going astray even further. |
543 |
746 |
544 The basic ideas and the detailed description of VF2++ are provided in the following. |
747 In addition to the usual subgraph isomorphism, specialized versions |
|
748 for induced subgraph isomorphism and for graph isomorphism have been |
|
749 designed. VF2++ has gained a runtime improvement of one order of |
|
750 magnitude respecting induced subgraph isomorphism and a better |
|
751 asymptotical behaviour in the case of graph isomorphism problem. |
|
752 |
|
753 Note that a weaker version of the cutting rules and the more efficient |
|
754 candidate set calculating were described in \cite{VF2Plus}, too. |
|
755 |
|
756 It should be noted that all the methods described in this section are |
|
757 extendable to handle directed graphs and edge labels as well. |
|
758 |
|
759 The basic ideas and the detailed description of VF2++ are provided in |
|
760 the following. |
545 |
761 |
546 \subsection{Preparations} |
762 \subsection{Preparations} |
547 \begin{claim} |
763 \begin{claim} |
548 \label{claim:claimCoverFromLeft} |
764 \label{claim:claimCoverFromLeft} |
549 The total ordering relation uniquely determines a node order, in which the nodes of $V_{small}$ will be covered by VF2. From the point of view of the matching procedure, this means, that always the same node of $G_{small}$ will be covered on the d-th level. |
765 The total ordering relation uniquely determines a node order, in which |
|
766 the nodes of $V_{small}$ will be covered by VF2. From the point of |
|
767 view of the matching procedure, this means, that always the same node |
|
768 of $G_{small}$ will be covered on the d-th level. |
550 \end{claim} |
769 \end{claim} |
551 \begin{proof} |
770 \begin{proof} |
552 In order to make the search space a tree, the pairs in $\{(u,v)\in P(s) : \exists \hat{u} : \hat{u}\prec u\}$ are excluded from $P(s)$. |
771 In order to make the search space a tree, the pairs in $\{(u,v)\in |
|
772 P(s) : \exists \hat{u} : \hat{u}\prec u\}$ are excluded from $P(s)$. |
553 \newline |
773 \newline |
554 Let $\tilde{P}(s):=P(s)\backslash \{(u,v)\in P(s) : \exists \hat{u} : \hat{u}\prec u\}$ |
774 Let $\tilde{P}(s):=P(s)\backslash \{(u,v)\in P(s) : \exists \hat{u} : |
|
775 \hat{u}\prec u\}$ |
555 \newline |
776 \newline |
556 The relation $\prec$ is a total ordering, so $\exists!\ \tilde{u} : \forall\ (u,v)\in \tilde{P}(s): u=\tilde u$. Since a pair form $\tilde{P}(s)$ is chosen for including into $M(s)$, it is obvious, that only $\tilde{u}$ can be covered in $V_{small}$. Actually, $\tilde{u}$ is the smallest element in $T_{small}(s)$ (or in $V_{small}\backslash M_{small}(s)$, if $T_{small}(s)$ were empty), and $T_{small}(s)$ depends only on the covered nodes of $G_{small}$. |
777 The relation $\prec$ is a total ordering, so $\exists!\ \tilde{u} : |
|
778 \forall\ (u,v)\in \tilde{P}(s): u=\tilde u$. Since a pair form |
|
779 $\tilde{P}(s)$ is chosen for including into $M(s)$, it is obvious, |
|
780 that only $\tilde{u}$ can be covered in $V_{small}$. Actually, |
|
781 $\tilde{u}$ is the smallest element in $T_{small}(s)$ (or in |
|
782 $V_{small}\backslash M_{small}(s)$, if $T_{small}(s)$ were empty), and |
|
783 $T_{small}(s)$ depends only on the covered nodes of $G_{small}$. |
557 \newline |
784 \newline |
558 Simple induction on $d$ shows that the set of covered nodes of $G_{small}$ is unique, if $d$ is given, so $\tilde{u}$ is unique if $d$ is given. |
785 Simple induction on $d$ shows that the set of covered nodes of |
|
786 $G_{small}$ is unique, if $d$ is given, so $\tilde{u}$ is unique if |
|
787 $d$ is given. |
559 \end{proof} |
788 \end{proof} |
560 |
789 |
561 \begin{definition} |
790 \begin{definition} |
562 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of $V_{small}$ is \textbf{matching order}, if exists $\prec$ total ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$. |
791 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of |
|
792 $V_{small}$ is \textbf{matching order}, if exists $\prec$ total |
|
793 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds |
|
794 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$. |
563 \end{definition} |
795 \end{definition} |
564 |
796 |
565 \begin{claim}\label{claim:MOclaim} |
797 \begin{claim}\label{claim:MOclaim} |
566 A total ordering is matching order, iff the nodes of every component form an interval in the node sequence, and every node connects to a previous node in its component except the first node of the component. The order of the components is arbitrary. |
798 A total ordering is matching order, iff the nodes of every component |
567 \\Formally spoken, an order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of $V_{small}$ is matching order $\Leftrightarrow$ $\forall G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}) \wedge (u_{\sigma{(k)}},u_{\sigma{(i)}})\in E'_{small}$, where $i,j,k,l\in \{1,..,|V_{small}|\}$\newline |
799 form an interval in the node sequence, and every node connects to a |
|
800 previous node in its component except the first node of the |
|
801 component. The order of the components is arbitrary. \\Formally |
|
802 spoken, an order |
|
803 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of |
|
804 $V_{small}$ is matching order $\Leftrightarrow$ $\forall |
|
805 G'_{small}=(V'_{small},E'_{small})\ component\ of\ G_{small}: \forall |
|
806 i: (\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in |
|
807 V'_{small})\Rightarrow \exists k : k < i \wedge (\forall l: k\leq |
|
808 l\leq i \Rightarrow u_{l}\in V'_{small}) \wedge |
|
809 (u_{\sigma{(k)}},u_{\sigma{(i)}})\in E'_{small}$, where $i,j,k,l\in |
|
810 \{1,..,|V_{small}|\}$\newline |
568 \end{claim} |
811 \end{claim} |
569 \begin{proof} |
812 \begin{proof} |
570 Suppose a matching order is given. It has to be shown that the node sequence has a structure described above.\\ |
813 Suppose a matching order is given. It has to be shown that the node |
571 Let $G'_{small}=(V'_{small},E'_{small})$ be an arbitrary component and $i$ an arbitrary index. |
814 sequence has a structure described above.\\ Let |
|
815 $G'_{small}=(V'_{small},E'_{small})$ be an arbitrary component and $i$ |
|
816 an arbitrary index. |
572 \newline |
817 \newline |
573 $(\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in V'_{small})\Rightarrow u_{\sigma(i)}$ is not the first covered node in $G_{small}\ \Rightarrow u_{\sigma(i)}$ is connected to a covered node $u_{\sigma(k)}$ where $k<i$, since $u_{\sigma(i)}\in T_{small}(s)$ for some $s$ and $T_{small}(s)\subseteq{V'_{small}}$ contains only nodes connected to at least one covered node. It's easy to see, that $\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}$, since $T_{small}(s)$ contains only nodes connected to some covered ones, while it is not empty, but if it were empty, then $u_{\sigma(k)}$ and $u_{\sigma(i)}$ were not in the same component. |
818 $(\exists j : j<i\wedge u_{\sigma(j)},u_{\sigma(i)}\in |
574 |
819 V'_{small})\Rightarrow u_{\sigma(i)}$ is not the first covered node in |
575 Now, let us show that if a node sequence has the special structure described above, it has to be matching order.\\ $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ is a total ordering, which determines the matching order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$. |
820 $G_{small}\ \Rightarrow u_{\sigma(i)}$ is connected to a covered node |
|
821 $u_{\sigma(k)}$ where $k<i$, since $u_{\sigma(i)}\in T_{small}(s)$ for |
|
822 some $s$ and $T_{small}(s)\subseteq{V'_{small}}$ contains only nodes |
|
823 connected to at least one covered node. It's easy to see, that |
|
824 $\forall l: k\leq l\leq i \Rightarrow u_{l}\in V'_{small}$, since |
|
825 $T_{small}(s)$ contains only nodes connected to some covered ones, |
|
826 while it is not empty, but if it were empty, then $u_{\sigma(k)}$ and |
|
827 $u_{\sigma(i)}$ were not in the same component. |
|
828 |
|
829 Now, let us show that if a node sequence has the special structure |
|
830 described above, it has to be matching |
|
831 order.\\ $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ is |
|
832 a total ordering, which determines the matching order |
|
833 $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$. |
576 \end{proof} |
834 \end{proof} |
577 |
835 |
578 To summing up, a total ordering always uniquely determines a matching order, and every matching order can be determined by a total ordering, however, more than one different total orderings may determine the same matching order. |
836 To summing up, a total ordering always uniquely determines a matching |
|
837 order, and every matching order can be determined by a total ordering, |
|
838 however, more than one different total orderings may determine the |
|
839 same matching order. |
579 \subsection{Idea behind the algorithm} |
840 \subsection{Idea behind the algorithm} |
580 The goal is to find a matching order in which the algorithm is able to recognize inconsistency or prune the infeasible branches on the highest levels and goes deep only if it is needed. |
841 The goal is to find a matching order in which the algorithm is able to |
|
842 recognize inconsistency or prune the infeasible branches on the |
|
843 highest levels and goes deep only if it is needed. |
581 |
844 |
582 \begin{notation} |
845 \begin{notation} |
583 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the number of neighbours of u which are in H, where $u\in V_{small} $ and $H\subseteq V_{small}$. |
846 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the |
|
847 number of neighbours of u which are in H, where $u\in V_{small} $ and |
|
848 $H\subseteq V_{small}$. |
584 \end{notation} |
849 \end{notation} |
585 |
850 |
586 The principal question is the following. Suppose a state $s$ is given. For which node of $T_{small}(s)$ is the hardest to find a consistent pair in $G_{large}$? The more covered neighbours a node in $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has ---, the more rarely satisfiable consistency constraints for its pair are given. |
851 The principal question is the following. Suppose a state $s$ is |
587 |
852 given. For which node of $T_{small}(s)$ is the hardest to find a |
588 In biology, most of the graphs are sparse, thus several nodes in $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes reasonable to define a secondary and a tertiary order between them. |
853 consistent pair in $G_{large}$? The more covered neighbours a node in |
589 The observation above proves itself to be as determining, that the secondary ordering prefers nodes with the most uncovered neighbours among which have the same $Conn_{M_{small}(s)}$ to increase $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible. |
854 $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has |
590 The tertiary ordering prefers nodes having the rarest uncovered labels. |
855 ---, the more rarely satisfiable consistency constraints for its pair |
591 |
856 are given. |
592 Note that the secondary ordering is the same as the ordering by $deg$, which is a static data in front of the above used. |
857 |
593 |
858 In biology, most of the graphs are sparse, thus several nodes in |
594 These rules can easily result in a matching order which contains the nodes of a long path successively, whose nodes may have low $Conn$ and is easily matchable into $G_{large}$. To avoid that, a BFS order is used, which provides the shortest possible paths. |
859 $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes |
|
860 reasonable to define a secondary and a tertiary order between them. |
|
861 The observation above proves itself to be as determining, that the |
|
862 secondary ordering prefers nodes with the most uncovered neighbours |
|
863 among which have the same $Conn_{M_{small}(s)}$ to increase |
|
864 $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible. The |
|
865 tertiary ordering prefers nodes having the rarest uncovered labels. |
|
866 |
|
867 Note that the secondary ordering is the same as the ordering by $deg$, |
|
868 which is a static data in front of the above used. |
|
869 |
|
870 These rules can easily result in a matching order which contains the |
|
871 nodes of a long path successively, whose nodes may have low $Conn$ and |
|
872 is easily matchable into $G_{large}$. To avoid that, a BFS order is |
|
873 used, which provides the shortest possible paths. |
595 \newline |
874 \newline |
596 |
875 |
597 In the following, some examples on which the VF2 may be slow are described, although they are easily solvable by using a proper matching order. |
876 In the following, some examples on which the VF2 may be slow are |
|
877 described, although they are easily solvable by using a proper |
|
878 matching order. |
598 |
879 |
599 \begin{example} |
880 \begin{example} |
600 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways without node labels. Let $u\in V_{small}$ and $v\in V_{large}$. |
881 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways |
|
882 without node labels. Let $u\in V_{small}$ and $v\in V_{large}$. |
601 \newline |
883 \newline |
602 $lab(u):=black$ |
884 $lab(u):=black$ |
603 \newline |
885 \newline |
604 $lab(v):=black$ |
886 $lab(v):=black$ |
605 \newline |
887 \newline |
606 $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash \{u\})$ |
888 $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash |
|
889 \{u\})$ |
607 \newline |
890 \newline |
608 $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash \{v\})$ |
891 $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash |
|
892 \{v\})$ |
609 \newline |
893 \newline |
610 |
894 |
611 Now, any mapping by the node label $lab$ must contain $(u,v)$, since $u$ is black and no node in $V_{large}$ has a black label except $v$. If unfortunately $u$ were the last node which will get covered, VF2 would check only in the last steps, whether $u$ can be matched to $v$. |
895 Now, any mapping by the node label $lab$ must contain $(u,v)$, since |
|
896 $u$ is black and no node in $V_{large}$ has a black label except |
|
897 $v$. If unfortunately $u$ were the last node which will get covered, |
|
898 VF2 would check only in the last steps, whether $u$ can be matched to |
|
899 $v$. |
612 \newline |
900 \newline |
613 However, had $u$ been the first matched node, u would have been matched immediately to v, so all the mappings would have been precluded in which node labels can not correspond. |
901 However, had $u$ been the first matched node, u would have been |
|
902 matched immediately to v, so all the mappings would have been |
|
903 precluded in which node labels can not correspond. |
614 \end{example} |
904 \end{example} |
615 |
905 |
616 \begin{example} |
906 \begin{example} |
617 Suppose there is no node label given, $G_{small}$ is a small graph and can not be mapped into $G_{large}$ and $u\in V_{small}$. |
907 Suppose there is no node label given, $G_{small}$ is a small graph and |
|
908 can not be mapped into $G_{large}$ and $u\in V_{small}$. |
618 \newline |
909 \newline |
619 Let $G'_{small}:=(V_{small}\cup \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is, $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint from $G_{small}$ and one of its starting points is connected to $u\in V_{small}\}$. |
910 Let $G'_{small}:=(V_{small}\cup |
|
911 \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup |
|
912 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is, |
|
913 $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint |
|
914 from $G_{small}$ and one of its starting points is connected to $u\in |
|
915 V_{small}\}$. |
620 \newline |
916 \newline |
621 Is there a subgraph of $G_{large}$, which is isomorph with $G'_{small}$? |
917 Is there a subgraph of $G_{large}$, which is isomorph with |
|
918 $G'_{small}$? |
622 \newline |
919 \newline |
623 If unfortunately the nodes of the path were the first $k$ nodes in the matching order, the algorithm would iterate through all the possible k long paths in $G_{large}$, and it would recognize that no path can be extended to $G'_{small}$. |
920 If unfortunately the nodes of the path were the first $k$ nodes in the |
|
921 matching order, the algorithm would iterate through all the possible k |
|
922 long paths in $G_{large}$, and it would recognize that no path can be |
|
923 extended to $G'_{small}$. |
624 \newline |
924 \newline |
625 However, had it started by the matching of $G_{small}$, it would not have matched any nodes of the path. |
925 However, had it started by the matching of $G_{small}$, it would not |
|
926 have matched any nodes of the path. |
626 \end{example} |
927 \end{example} |
627 |
928 |
628 These examples may look artificial, but the same problems also appear in real-world examples, even though in a less obvious way. |
929 These examples may look artificial, but the same problems also appear |
|
930 in real-world examples, even though in a less obvious way. |
629 |
931 |
630 \subsection{Total ordering} |
932 \subsection{Total ordering} |
631 Instead of the total ordering relation, the matching order will be searched directly. |
933 Instead of the total ordering relation, the matching order will be |
|
934 searched directly. |
632 \begin{notation} |
935 \begin{notation} |
633 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} : l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ , where $l$ is a label and $\mathcal{M}\subseteq V_{small}$. |
936 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} : |
|
937 l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ , |
|
938 where $l$ is a label and $\mathcal{M}\subseteq V_{small}$. |
634 \end{notation} |
939 \end{notation} |
635 |
940 |
636 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$. |
941 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u : u\in S \wedge f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$. |
637 \end{definition} |
942 \end{definition} |
638 |
943 |
639 \begin{algorithm} |
944 \begin{algorithm} |
640 \algtext*{EndIf}%ne nyomtasson end if-et |
945 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne |
641 \algtext*{EndFor}%ne nyomtasson .. |
946 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson .. |
642 \algtext*{EndProcedure}%ne nyomtasson .. |
|
643 \algtext*{EndWhile} |
947 \algtext*{EndWhile} |
644 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu} |
948 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu} |
645 \begin{algorithmic}[1] |
949 \begin{algorithmic}[1] |
646 \Procedure{VF2++order}{} |
950 \Procedure{VF2++order}{} \State $\mathcal{M}$ := $\emptyset$ |
647 \State $\mathcal{M}$ := $\emptyset$ \Comment{matching order} |
951 \Comment{matching order} \While{$V_{small}\backslash \mathcal{M} |
648 \While{$V_{small}\backslash \mathcal{M} \neq\emptyset$} |
952 \neq\emptyset$} \State $r\in$ arg max$_{deg}$ (arg |
649 \State $r\in$ arg max$_{deg}$ (arg min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash \mathcal{M})$)\label{alg:findMin} |
953 min$_{F_\mathcal{M}\circ lab}(V_{small}\backslash |
650 \State Compute $T$, a BFS tree with root node $r$. |
954 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with |
651 \For{$d=0,1,...,depth(T)$} |
955 root node $r$. \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the |
652 \State $V_d$:=nodes of the $d$-th level |
956 $d$-th level \State Process $V_d$ \Comment{See Algorithm |
653 \State Process $V_d$ \Comment{See Algorithm \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2})} |
957 \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2})} \EndFor |
654 \EndFor |
958 \EndWhile \EndProcedure |
655 \EndWhile |
|
656 \EndProcedure |
|
657 \end{algorithmic} |
959 \end{algorithmic} |
658 \end{algorithm} |
960 \end{algorithm} |
659 |
961 |
660 \begin{algorithm} |
962 \begin{algorithm} |
661 \algtext*{EndIf}%ne nyomtasson end if-et |
963 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne |
662 \algtext*{EndFor}%ne nyomtasson .. |
964 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson .. |
663 \algtext*{EndProcedure}%ne nyomtasson .. |
|
664 \algtext*{EndWhile} |
965 \algtext*{EndWhile} |
665 \caption{\hspace{.5cm}$A\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1} |
966 \caption{\hspace{.5cm}$A\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1} |
666 \begin{algorithmic}[1] |
967 \begin{algorithmic}[1] |
667 \Procedure{VF2++ProcessLevel1}{$V_{d}$} |
968 \Procedure{VF2++ProcessLevel1}{$V_{d}$} \While{$V_d\neq\emptyset$} |
668 \While{$V_d\neq\emptyset$} |
969 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg |
669 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ lab}($ arg max$_{deg}($arg max$_{Conn_{\mathcal{M}}}(V_{d})))$ |
970 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$ |
670 \State $V_d:=V_d\backslash m$ |
971 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh |
671 \State Append node $m$ to the end of $\mathcal{M}$ |
972 $F_\mathcal{M}$ \EndWhile \EndProcedure |
672 \State Refresh $F_\mathcal{M}$ |
|
673 \EndWhile |
|
674 \EndProcedure |
|
675 \end{algorithmic} |
973 \end{algorithmic} |
676 \end{algorithm} |
974 \end{algorithm} |
677 |
975 |
678 \begin{algorithm} |
976 \begin{algorithm} |
679 \algtext*{EndIf}%ne nyomtasson end if-et |
977 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne |
680 \algtext*{EndFor}%ne nyomtasson .. |
978 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson .. |
681 \algtext*{EndProcedure}%ne nyomtasson .. |
|
682 \algtext*{EndWhile} |
979 \algtext*{EndWhile} |
683 \caption{\hspace{0.5cm}$Another\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess2} |
980 \caption{\hspace{0.5cm}$Another\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess2} |
684 \begin{algorithmic}[1] |
981 \begin{algorithmic}[1] |
685 \Procedure{VF2++ProcessLevel2}{$V_{d}$} |
982 \Procedure{VF2++ProcessLevel2}{$V_{d}$} \State Sort $V_d$ in |
686 \State Sort $V_d$ in descending lex. order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ |
983 descending lex. order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ |
687 \State Append the sorted $V_d$ to the end of $\mathcal{M}$ |
984 \State Append the sorted $V_d$ to the end of $\mathcal{M}$ \State |
688 \State Refresh $F_\mathcal{M}$ |
985 Refresh $F_\mathcal{M}$ \EndProcedure |
689 \EndProcedure |
|
690 \end{algorithmic} |
986 \end{algorithmic} |
691 \end{algorithm} |
987 \end{algorithm} |
692 \textbf{Algorithm~\ref{alg:VF2PPPseu})} is a high level description of the matching order procedure of VF2++. It computes a BFS tree for each component in ascending order of their rarest $lab$ and largest $deg$, whose root vertex is the component's minimal node. \textbf{Algorithm \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} are two different methods to process a level of the BFS tree. |
988 \textbf{Algorithm~\ref{alg:VF2PPPseu})} is a high level description of |
693 |
989 the matching order procedure of VF2++. It computes a BFS tree for each |
694 After sorting the nodes of the current level in descending lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$, \textbf{Algorithm \ref{alg:VF2PPProcess2})} appends them simultaneously to the matching order $\mathcal{M}$ and refreshes $F_\mathcal{M}$ just once, whereas \textbf{Algorithm \ref{alg:VF2PPProcess1})} appends the nodes separately to $\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it provides up-to-date label information and may result in a more efficient matching order. |
990 component in ascending order of their rarest $lab$ and largest $deg$, |
695 |
991 whose root vertex is the component's minimal node. \textbf{Algorithm |
696 \textbf{Claim~\ref{claim:MOclaim})} shows that \textbf{Algorithm \ref{alg:VF2PPPseu})} provides a matching order. |
992 \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} are |
|
993 two different methods to process a level of the BFS tree. |
|
994 |
|
995 After sorting the nodes of the current level in descending |
|
996 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$, |
|
997 \textbf{Algorithm \ref{alg:VF2PPProcess2})} appends them |
|
998 simultaneously to the matching order $\mathcal{M}$ and refreshes |
|
999 $F_\mathcal{M}$ just once, whereas \textbf{Algorithm |
|
1000 \ref{alg:VF2PPProcess1})} appends the nodes separately to |
|
1001 $\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it |
|
1002 provides up-to-date label information and may result in a more |
|
1003 efficient matching order. |
|
1004 |
|
1005 \textbf{Claim~\ref{claim:MOclaim})} shows that \textbf{Algorithm |
|
1006 \ref{alg:VF2PPPseu})} provides a matching order. |
697 |
1007 |
698 |
1008 |
699 \subsection{Cutting rules} |
1009 \subsection{Cutting rules} |
700 \label{VF2PPCuttingRules} |
1010 \label{VF2PPCuttingRules} |
701 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels. |
1011 This section presents the cutting rules of VF2++, which are improved |
|
1012 by using extra information coming from the node labels. |
702 \begin{notation} |
1013 \begin{notation} |
703 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in V_{large}$ and $l$ is a label. |
1014 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l |
|
1015 \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and |
|
1016 $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge |
|
1017 \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in |
|
1018 V_{large}$ and $l$ is a label. |
704 \end{notation} |
1019 \end{notation} |
705 |
1020 |
706 \subsubsection{Induced subgraph isomorphism} |
1021 \subsubsection{Induced subgraph isomorphism} |
707 \begin{claim} |
1022 \begin{claim} |
708 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND. |
1023 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND. |
709 \end{claim} |
1024 \end{claim} |
710 \begin{proof} |
1025 \begin{proof} |
711 It has to be shown, that $LabCut_{IND}((u,v),M(s))=true\Rightarrow$ the mapping can not be extended to a whole mapping.\\ |
1026 It has to be shown, that $LabCut_{IND}((u,v),M(s))=true\Rightarrow$ |
712 $LabCut_{IND}((u,v),M(s))=true,$ iff the following holds. $\\\exists l: |\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$. |
1027 the mapping can not be extended to a whole |
713 |
1028 mapping.\\ $LabCut_{IND}((u,v),M(s))=true,$ iff the following |
714 Suppose that $|\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| < |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)|$. Each node of $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$ has to be matched to a node in $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$, so $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$ can not be smaller than $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$. That is why $M(s)$ can not be extended to a whole mapping. |
1029 holds. $\\\exists l: |\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| < |
715 |
1030 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)| \vee |
716 Otherwise $|\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$ has to be true. Similarly, each node of $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ has to be matched to a node in $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, i.e. $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ can not be smaller than $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, if $M(s)$ is extendible. |
1031 |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |
|
1032 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$. |
|
1033 |
|
1034 Suppose that $|\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)| < |
|
1035 |\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)|$. Each node of |
|
1036 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$ has to be matched to a |
|
1037 node in $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$, so |
|
1038 $\Gamma_{large}^{l} (v)\ \cap\ T_{large}(s)$ can not be smaller than |
|
1039 $\Gamma_{small}^{l} (u)\ \cap\ T_{small}(s)$. That is why $M(s)$ can |
|
1040 not be extended to a whole mapping. |
|
1041 |
|
1042 Otherwise $|\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |
|
1043 |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|$ has to be |
|
1044 true. Similarly, each node of $\Gamma_{large}^{l}(v)\cap |
|
1045 \tilde{T}_{large}(s)$ has to be matched to a node in |
|
1046 $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, |
|
1047 i.e. $\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)$ can not be |
|
1048 smaller than $\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)$, if |
|
1049 $M(s)$ is extendible. |
717 \end{proof} |
1050 \end{proof} |
718 The following claims can be proven similarly. |
1051 The following claims can be proven similarly. |
719 \subsubsection{Graph isomorphism} |
1052 \subsubsection{Graph isomorphism} |
720 \begin{claim} |
1053 \begin{claim} |
721 \[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by ISO. |
1054 \[LabCut_{ISO}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!\neq\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| \neq |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by ISO. |
727 \end{claim} |
1060 \end{claim} |
728 |
1061 |
729 |
1062 |
730 |
1063 |
731 \subsection{Implementation details} |
1064 \subsection{Implementation details} |
732 This section provides a detailed summary of an efficient implementation of VF2++. |
1065 This section provides a detailed summary of an efficient |
|
1066 implementation of VF2++. |
733 \subsubsection{Storing a mapping} |
1067 \subsubsection{Storing a mapping} |
734 After fixing an arbitrary node order ($u_0, u_1, .., u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store the current mapping in the following way. |
1068 After fixing an arbitrary node order ($u_0, u_1, .., |
|
1069 u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store |
|
1070 the current mapping in the following way. |
735 \[ |
1071 \[ |
736 M[i] = |
1072 M[i] = |
737 \begin{cases} |
1073 \begin{cases} |
738 v & if\ (u_i,v)\ is\ in\ the\ mapping\\ |
1074 v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INVALID & |
739 INVALID & if\ no\ node\ has\ been\ mapped\ to\ u_i. |
1075 if\ no\ node\ has\ been\ mapped\ to\ u_i. |
740 \end{cases} |
1076 \end{cases} |
741 \] |
1077 \] |
742 Where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$ means "no node". |
1078 Where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$ |
|
1079 means "no node". |
743 \subsubsection{Avoiding the recurrence} |
1080 \subsubsection{Avoiding the recurrence} |
744 Exploring the state space was described in a recursive fashion using sets (see \textbf{Algorithm~\ref{alg:VF2Pseu})}), which makes the algorithm easy to understand but does not show directly an efficient way of implementation. The following approach avoiding recursion and using lookup tables instead of sets is not only fast but has linear space complexity as well. |
1081 Exploring the state space was described in a recursive fashion using |
745 |
1082 sets (see \textbf{Algorithm~\ref{alg:VF2Pseu})}), which makes the |
746 The recursion of \textbf{Algorithm~\ref{alg:VF2Pseu})} can be realized as a while loop, which has a loop counter $depth$ denoting the all-time depth of the recursion. |
1083 algorithm easy to understand but does not show directly an efficient |
747 Fixing a matching order, let $M$ denote the array storing the all-time mapping. |
1084 way of implementation. The following approach avoiding recursion and |
748 The initial state is associated with the empty mapping, which means that $\forall i: M[i]=INVALID$ and $depth=0$. |
1085 using lookup tables instead of sets is not only fast but has linear |
749 In case of a recursive call, $depth$ has to be incremented, while in case of a return, it has to be decremented. |
1086 space complexity as well. |
750 Based on \textbf{Claim~\ref{claim:claimCoverFromLeft})}, $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes while the state is being processed, but the property is held before both stepping back to a predecessor state and exploring a successor state. |
1087 |
751 |
1088 The recursion of \textbf{Algorithm~\ref{alg:VF2Pseu})} can be realized |
752 The necessary part of the candidate set is easily maintainable or computable by following \textbf{Section~\ref{candidateComputingVF2})}. A much faster method has been designed for biological- and sparse graphs, see the next section for details. |
1089 as a while loop, which has a loop counter $depth$ denoting the |
|
1090 all-time depth of the recursion. Fixing a matching order, let $M$ |
|
1091 denote the array storing the all-time mapping. The initial state is |
|
1092 associated with the empty mapping, which means that $\forall i: |
|
1093 M[i]=INVALID$ and $depth=0$. In case of a recursive call, $depth$ has |
|
1094 to be incremented, while in case of a return, it has to be |
|
1095 decremented. Based on \textbf{Claim~\ref{claim:claimCoverFromLeft})}, |
|
1096 $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before |
|
1097 $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and |
|
1098 $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes |
|
1099 while the state is being processed, but the property is held before |
|
1100 both stepping back to a predecessor state and exploring a successor |
|
1101 state. |
|
1102 |
|
1103 The necessary part of the candidate set is easily maintainable or |
|
1104 computable by following |
|
1105 \textbf{Section~\ref{candidateComputingVF2})}. A much faster method |
|
1106 has been designed for biological- and sparse graphs, see the next |
|
1107 section for details. |
753 |
1108 |
754 \subsubsection{Calculating the candidates for a node} |
1109 \subsubsection{Calculating the candidates for a node} |
755 Being aware of \textbf{Claim~\ref{claim:claimCoverFromLeft})}, the task is not to maintain the candidate set, but to generate the candidate nodes in $G_{large}$ for a given node $u\in V_{small}$. |
1110 Being aware of \textbf{Claim~\ref{claim:claimCoverFromLeft})}, the |
756 In case of an expanding problem type and $M$ mapping, if a node $v\in V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall u'\in V_{small} : (u,u')\in E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in E_{large}$. That is, each covered neighbour of $u$ has to be mapped to a covered neighbour of $v$. |
1111 task is not to maintain the candidate set, but to generate the |
757 |
1112 candidate nodes in $G_{large}$ for a given node $u\in V_{small}$. In |
758 Having said that, an algorithm running in $\Theta(deg)$ time is describable if there exists a covered node in the component containing $u$. In this case choose a covered neighbour $u'$ of $u$ arbitrarily --- such a node exists based on \textbf{Claim~\ref{claim:MOclaim})}. With all the candidates of $u$ being among the uncovered neighbours of $Pair(M,u')$, there are solely $deg(Pair(M,u'))$ nodes to check. |
1113 case of an expanding problem type and $M$ mapping, if a node $v\in |
759 |
1114 V_{large}$ is a potential pair of $u\in V_{small}$, then $\forall |
760 An easy trick is to choose an $u'$, for which $|\{uncovered\ neighbours\ $ $of\ Pair(M,u')\}|$ is the smallest possible. |
1115 u'\in V_{small} : (u,u')\in |
761 |
1116 E_{small}\ and\ u'\ is\ covered\ by\ M\ \Rightarrow (v,Pair(M,u'))\in |
762 Note that if $u$ is the first node of its component, then all the uncovered nodes of $G_{large}$ are candidates, so giving a sublinear method is impossible. |
1117 E_{large}$. That is, each covered neighbour of $u$ has to be mapped to |
|
1118 a covered neighbour of $v$. |
|
1119 |
|
1120 Having said that, an algorithm running in $\Theta(deg)$ time is |
|
1121 describable if there exists a covered node in the component containing |
|
1122 $u$. In this case choose a covered neighbour $u'$ of $u$ arbitrarily |
|
1123 --- such a node exists based on |
|
1124 \textbf{Claim~\ref{claim:MOclaim})}. With all the candidates of $u$ |
|
1125 being among the uncovered neighbours of $Pair(M,u')$, there are solely |
|
1126 $deg(Pair(M,u'))$ nodes to check. |
|
1127 |
|
1128 An easy trick is to choose an $u'$, for which |
|
1129 $|\{uncovered\ neighbours\ $ $of\ Pair(M,u')\}|$ is the smallest |
|
1130 possible. |
|
1131 |
|
1132 Note that if $u$ is the first node of its component, then all the |
|
1133 uncovered nodes of $G_{large}$ are candidates, so giving a sublinear |
|
1134 method is impossible. |
763 |
1135 |
764 |
1136 |
765 \subsubsection{Determining the node order} |
1137 \subsubsection{Determining the node order} |
766 This section describes how the node order preprocessing method of VF2++ can efficiently be implemented. |
1138 This section describes how the node order preprocessing method of |
767 |
1139 VF2++ can efficiently be implemented. |
768 For using lookup tables, the node labels are associated with the numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It enables $F_\mathcal{M}$ to be stored in an array, for which $F_\mathcal{M}[i]=F_\mathcal{M}(i)$ where $i=0,1,..,|K|-1$. At first, $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes in $V_{small}$ having label i, which is easy to compute in $\Theta(|V_{small}|)$ steps. |
1140 |
769 |
1141 For using lookup tables, the node labels are associated with the |
770 $\mathcal{M}\subseteq V_{small}$ can be represented as an array of size $|V_{small}|$. |
1142 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It |
771 |
1143 enables $F_\mathcal{M}$ to be stored in an array, for which |
772 The BFS tree is computed by using a FIFO data structure which is usually implemented as a linked list, but one can avoid it by using the array $\mathcal{M}$ itself. $\mathcal{M}$ contains all the nodes seen before, a pointer shows where the first node of the FIFO is, and another one shows where the next explored node has to be inserted. So the nodes of each level of the BFS tree can be processed by \textbf{Algorithm \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} in place by swapping nodes. |
1144 $F_\mathcal{M}[i]=F_\mathcal{M}(i)$ where $i=0,1,..,|K|-1$. At first, |
773 |
1145 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes |
774 After a node $u$ gets to the next place of the node order, $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is one less covered node in $V_{large}$ with label $lab(u)$, that is why min selection sort is preferred which gives the elements from left to right in descending order, see \textbf{Algorithm \ref{alg:VF2PPProcess1})}. |
1146 in $V_{small}$ having label i, which is easy to compute in |
775 |
1147 $\Theta(|V_{small}|)$ steps. |
776 Note that using a $\Theta(n^2)$ sort absolutely does not slow down the procedure on biological (and on sparse) graphs, since they have few nodes on a level. If a level had a large number of nodes, \textbf{Algorithm \ref{alg:VF2PPProcess2})} would seem to be a better choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not be immediately refreshed, so it is unable to provide up-to-date label information. |
1148 |
777 |
1149 $\mathcal{M}\subseteq V_{small}$ can be represented as an array of |
778 Note that the \textit{while loop} of \textbf{Algorithm \ref{alg:VF2PPPseu})} takes one iteration per graph component and the graphs in biology are mostly connected. |
1150 size $|V_{small}|$. |
|
1151 |
|
1152 The BFS tree is computed by using a FIFO data structure which is |
|
1153 usually implemented as a linked list, but one can avoid it by using |
|
1154 the array $\mathcal{M}$ itself. $\mathcal{M}$ contains all the nodes |
|
1155 seen before, a pointer shows where the first node of the FIFO is, and |
|
1156 another one shows where the next explored node has to be inserted. So |
|
1157 the nodes of each level of the BFS tree can be processed by |
|
1158 \textbf{Algorithm \ref{alg:VF2PPProcess1})} and |
|
1159 \textbf{\ref{alg:VF2PPProcess2})} in place by swapping nodes. |
|
1160 |
|
1161 After a node $u$ gets to the next place of the node order, |
|
1162 $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is |
|
1163 one less covered node in $V_{large}$ with label $lab(u)$, that is why |
|
1164 min selection sort is preferred which gives the elements from left to |
|
1165 right in descending order, see \textbf{Algorithm |
|
1166 \ref{alg:VF2PPProcess1})}. |
|
1167 |
|
1168 Note that using a $\Theta(n^2)$ sort absolutely does not slow down the |
|
1169 procedure on biological (and on sparse) graphs, since they have few |
|
1170 nodes on a level. If a level had a large number of nodes, |
|
1171 \textbf{Algorithm \ref{alg:VF2PPProcess2})} would seem to be a better |
|
1172 choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the |
|
1173 efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not |
|
1174 be immediately refreshed, so it is unable to provide up-to-date label |
|
1175 information. |
|
1176 |
|
1177 Note that the \textit{while loop} of \textbf{Algorithm |
|
1178 \ref{alg:VF2PPPseu})} takes one iteration per graph component and |
|
1179 the graphs in biology are mostly connected. |
779 \subsubsection{Cutting rules} |
1180 \subsubsection{Cutting rules} |
780 In \textbf{Section \ref{VF2PPCuttingRules})}, the cutting rules were described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$ and $\tilde T_{large}$, which are dependent on the all-time mapping (i.e. on the all-time state). The aim is to check the labeled cutting rules of VF2++ in $\Theta(deg)$ time. |
1181 In \textbf{Section \ref{VF2PPCuttingRules})}, the cutting rules were |
781 |
1182 described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$ |
782 Firstly, suppose that these four sets are given in such a way, that checking whether a node is in a certain set takes constant time, e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an initially zero integer lookup table of size $|K|$. After incrementing $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap T_{large}(s)$, the first part of the cutting rules is checkable in $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$ to zero takes $\Theta(deg)$ time again, which makes it possible to use the same table through the whole algorithm. |
1183 and $\tilde T_{large}$, which are dependent on the all-time mapping |
783 The second part of the cutting rules can be verified using the same method with $\tilde T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and $T_{large}$. Thus, the overall complexity is $\Theta(deg)$. |
1184 (i.e. on the all-time state). The aim is to check the labeled cutting |
784 |
1185 rules of VF2++ in $\Theta(deg)$ time. |
785 An other integer lookup table storing the number of covered neighbours of each node in $G_{large}$ gives all the information about the sets $T_{large}$ and $\tilde T_{large}$, which is maintainable in $\Theta(deg)$ time when a pair is added or substracted by incrementing or decrementing the proper indices. A further improvement is that the values of $L[lab(u')]$ in case of checking $u$ is dependent only on $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an array of pairs (label, number of such labels) can be stored to skip the maintaining operations. Note that these arrays are at most of size $deg$. Skipping this trick, the number of covered neighbours has to be stored for each node of $G_{small}$ as well to get the sets $T_{small}$ and $\tilde T_{small}$. |
1186 |
786 |
1187 Firstly, suppose that these four sets are given in such a way, that |
787 Using similar tricks, the consistency function can be evaluated in $\Theta(deg)$ steps, as well. |
1188 checking whether a node is in a certain set takes constant time, |
|
1189 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an |
|
1190 initially zero integer lookup table of size $|K|$. After incrementing |
|
1191 $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and |
|
1192 decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap |
|
1193 T_{large}(s)$, the first part of the cutting rules is checkable in |
|
1194 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$ |
|
1195 to zero takes $\Theta(deg)$ time again, which makes it possible to use |
|
1196 the same table through the whole algorithm. The second part of the |
|
1197 cutting rules can be verified using the same method with $\tilde |
|
1198 T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and |
|
1199 $T_{large}$. Thus, the overall complexity is $\Theta(deg)$. |
|
1200 |
|
1201 An other integer lookup table storing the number of covered neighbours |
|
1202 of each node in $G_{large}$ gives all the information about the sets |
|
1203 $T_{large}$ and $\tilde T_{large}$, which is maintainable in |
|
1204 $\Theta(deg)$ time when a pair is added or substracted by incrementing |
|
1205 or decrementing the proper indices. A further improvement is that the |
|
1206 values of $L[lab(u')]$ in case of checking $u$ is dependent only on |
|
1207 $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an |
|
1208 array of pairs (label, number of such labels) can be stored to skip |
|
1209 the maintaining operations. Note that these arrays are at most of size |
|
1210 $deg$. Skipping this trick, the number of covered neighbours has to be |
|
1211 stored for each node of $G_{small}$ as well to get the sets |
|
1212 $T_{small}$ and $\tilde T_{small}$. |
|
1213 |
|
1214 Using similar tricks, the consistency function can be evaluated in |
|
1215 $\Theta(deg)$ steps, as well. |
788 |
1216 |
789 \section{The VF2 Plus Algorithm} |
1217 \section{The VF2 Plus Algorithm} |
790 The VF2 Plus algorithm is a recently improved version of VF2. It was compared with the state of the art algorithms in \cite{VF2Plus} and has proven itself to be competitive with RI, the best algorithm on biological graphs. |
1218 The VF2 Plus algorithm is a recently improved version of VF2. It was |
791 \\ |
1219 compared with the state of the art algorithms in \cite{VF2Plus} and |
792 A short summary of VF2 Plus follows, which uses the notation and the conventions of the original paper. |
1220 has proven itself to be competitive with RI, the best algorithm on |
|
1221 biological graphs. \\ A short summary of VF2 Plus follows, which uses |
|
1222 the notation and the conventions of the original paper. |
793 |
1223 |
794 \subsection{Ordering procedure} |
1224 \subsection{Ordering procedure} |
795 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$ with the lowest probability to find a pair in $V_{small}$ and the highest number of connections with the nodes already sorted by the algorithm. |
1225 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$ |
|
1226 with the lowest probability to find a pair in $V_{small}$ and the |
|
1227 highest number of connections with the nodes already sorted by the |
|
1228 algorithm. |
796 |
1229 |
797 \begin{definition} |
1230 \begin{definition} |
798 $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$. |
1231 $(u,v)$ is a \textbf{feasible pair}, if $lab(u)=lab(v)$ and |
|
1232 $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$. |
799 \end{definition} |
1233 \end{definition} |
800 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in $V_{large}$ |
1234 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in |
|
1235 $V_{large}$ |
801 \newline |
1236 \newline |
802 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in $V_{large}$ |
1237 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in |
|
1238 $V_{large}$ |
803 \newline |
1239 \newline |
804 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ |
1240 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ $M$ is the set of |
805 $M$ is the set of already sorted nodes, $T$ is the set of nodes candidate to be selected, and $degreeM$ of a node is the number of its neighbours in $M$. |
1241 already sorted nodes, $T$ is the set of nodes candidate to be |
|
1242 selected, and $degreeM$ of a node is the number of its neighbours in |
|
1243 $M$. |
806 \begin{algorithm} |
1244 \begin{algorithm} |
807 \algtext*{EndIf}%ne nyomtasson end if-et |
1245 \algtext*{EndIf}%ne nyomtasson end if-et \algtext*{EndFor}%ne |
808 \algtext*{EndFor}%ne nyomtasson .. |
1246 nyomtasson .. \algtext*{EndProcedure}%ne nyomtasson .. |
809 \algtext*{EndProcedure}%ne nyomtasson .. |
|
810 \algtext*{EndWhile} |
1247 \algtext*{EndWhile} |
811 \caption{}\label{alg:VF2PlusPseu} |
1248 \caption{}\label{alg:VF2PlusPseu} |
812 \begin{algorithmic}[1] |
1249 \begin{algorithmic}[1] |
813 \Procedure{VF2 Plus order}{} |
1250 \Procedure{VF2 Plus order}{} \State Select the node with the lowest |
814 \State Select the node with the lowest $P$. |
1251 $P$. \If {more nodes share the same $P$} \State select the one with |
815 \If {more nodes share the same $P$} |
1252 maximum degree \EndIf \If {more nodes share the same $P$ and have the |
816 \State select the one with maximum degree |
1253 max degree} \State select the first \EndIf \State Put the selected |
817 \EndIf |
1254 node in the set $M$. \label{alg:putIn} \State Put all its unsorted |
818 \If {more nodes share the same $P$ and have the max degree} |
1255 neighbours in the set $T$. \If {$M\neq V_{small}$} \State From set |
819 \State select the first |
1256 $T$ select the node with maximum $degreeM$. \If {more nodes have |
820 \EndIf |
1257 maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf |
821 \State Put the selected node in the set $M$. \label{alg:putIn} |
1258 \If {more nodes have maximum $degreeM$ and $P$} \State Select the |
822 \State Put all its unsorted neighbours in the set $T$. |
1259 first. \EndIf \State \textbf{goto \ref{alg:putIn}.} \EndIf |
823 \If {$M\neq V_{small}$} |
|
824 \State From set $T$ select the node with maximum $degreeM$. |
|
825 \If {more nodes have maximum $degreeM$} |
|
826 \State Select the one with the lowest $P$ |
|
827 \EndIf |
|
828 \If {more nodes have maximum $degreeM$ and $P$} |
|
829 \State Select the first. |
|
830 \EndIf |
|
831 \State \textbf{goto \ref{alg:putIn}.} |
|
832 \EndIf |
|
833 \EndProcedure |
1260 \EndProcedure |
834 \end{algorithmic} |
1261 \end{algorithmic} |
835 \end{algorithm} |
1262 \end{algorithm} |
836 |
1263 |
837 Using these notations, \textbf{Algorithm~\ref{alg:VF2PlusPseu})} provides the description of the sorting procedure. |
1264 Using these notations, \textbf{Algorithm~\ref{alg:VF2PlusPseu})} |
838 |
1265 provides the description of the sorting procedure. |
839 Note that $P(u)$ is not the exact probability of finding a consistent pair for $u$ by choosing a node of $V_{large}$ randomly, since $P_{lab}$ and $P_{deg}$ are not independent, though calculating the real probability would take quadratic time, which may be reduced by using fittingly lookup tables. |
1266 |
840 |
1267 Note that $P(u)$ is not the exact probability of finding a consistent |
841 \newpage |
1268 pair for $u$ by choosing a node of $V_{large}$ randomly, since |
|
1269 $P_{lab}$ and $P_{deg}$ are not independent, though calculating the |
|
1270 real probability would take quadratic time, which may be reduced by |
|
1271 using fittingly lookup tables. |
|
1272 |
842 \section{Experimental results} |
1273 \section{Experimental results} |
843 This section compares the performance of VF2++ and VF2 Plus. Both algorithms have run faster with orders of magnitude than VF2, thus its inclusion was not reasonable. |
1274 This section compares the performance of VF2++ and VF2 Plus. Both |
|
1275 algorithms have run faster with orders of magnitude than VF2, thus its |
|
1276 inclusion was not reasonable. |
844 \subsection{Biological graphs} |
1277 \subsection{Biological graphs} |
845 The tests have been executed on a recent biological dataset created for the International Contest on Pattern Search in Biological Databases\cite{Content}, which has been constructed of Molecule, Protein and Contact Map graphs extracted from the Protein Data Bank\cite{ProteinDataBank}. |
1278 The tests have been executed on a recent biological dataset created |
846 |
1279 for the International Contest on Pattern Search in Biological |
847 The molecule dataset contains small graphs with less than 100 nodes and an average degree of less than 3. The protein dataset contains graphs having 500-10 000 nodes and an average degree of 4, while the contact map dataset contains graphs with 150-800 nodes and an average degree of 20. |
1280 Databases\cite{Content}, which has been constructed of Molecule, |
848 \\ |
1281 Protein and Contact Map graphs extracted from the Protein Data |
849 |
1282 Bank\cite{ProteinDataBank}. |
850 In the following, the induced subgraph isomorphism and the graph isomorphism will be examined. |
1283 |
|
1284 The molecule dataset contains small graphs with less than 100 nodes |
|
1285 and an average degree of less than 3. The protein dataset contains |
|
1286 graphs having 500-10 000 nodes and an average degree of 4, while the |
|
1287 contact map dataset contains graphs with 150-800 nodes and an average |
|
1288 degree of 20. \\ |
|
1289 |
|
1290 In the following, the induced subgraph isomorphism and the graph |
|
1291 isomorphism will be examined. |
851 |
1292 |
852 \subsubsection{Induced subgraph isomorphism} |
1293 \subsubsection{Induced subgraph isomorphism} |
853 This dataset contains a set of graph pairs, and \textbf{all} the induced subgraph ismorphisms have to be found between them. \textbf{Figure \ref{fig:INDProt}), \ref{fig:INDContact}),} and \textbf{ \ref{fig:INDMolecule})} show the solution time of the problems in the problem set. |
1294 This dataset contains a set of graph pairs, and \textbf{all} the |
|
1295 induced subgraph ismorphisms have to be found between |
|
1296 them. \textbf{Figure \ref{fig:INDProt}), \ref{fig:INDContact}),} and |
|
1297 \textbf{ \ref{fig:INDMolecule})} show the solution time of the |
|
1298 problems in the problem set. |
854 |
1299 |
855 \begin{figure}[H] |
1300 \begin{figure}[H] |
856 \begin{center} |
1301 \begin{center} |
857 \begin{tikzpicture} |
1302 \begin{tikzpicture} |
858 \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
1303 \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
859 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] |
1304 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
860 %\addplot+[only marks] table {proteinsOrig.txt}; |
1305 west},scaled x ticks = false,x tick label style={/pgf/number |
861 \addplot[mark=*,mark size=1.2pt,color=blue] table {Orig/Proteins.256.txt}; |
1306 format/1000 sep = \thinspace}] %\addplot+[only marks] table |
862 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt}; |
1307 {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue] |
|
1308 table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark |
|
1309 size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt}; |
863 \end{axis} |
1310 \end{axis} |
864 \end{tikzpicture} |
1311 \end{tikzpicture} |
865 \end{center} |
1312 \end{center} |
866 \vspace*{-0.8cm} |
1313 \vspace*{-0.8cm} |
867 \caption{Both the algorithms have linear behaviour on protein graphs. VF2++ is more than 10 times faster than VF2 Plus.} \label{fig:INDProt} |
1314 \caption{Both the algorithms have linear behaviour on protein |
|
1315 graphs. VF2++ is more than 10 times faster than VF2 |
|
1316 Plus.} \label{fig:INDProt} |
868 \end{figure} |
1317 \end{figure} |
869 |
1318 |
870 \begin{figure}[H] |
1319 \begin{figure}[H] |
871 \begin{center} |
1320 \begin{center} |
872 \begin{tikzpicture} |
1321 \begin{tikzpicture} |
873 \begin{axis}[title=Contact Maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
1322 \begin{axis}[title=Contact Maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
874 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] |
1323 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
|
1324 west},scaled x ticks = false,x tick label style={/pgf/number |
|
1325 format/1000 sep = \thinspace}] |
875 %\addplot+[only marks] table {proteinsOrig.txt}; |
1326 %\addplot+[only marks] table {proteinsOrig.txt}; |
876 \addplot table {Orig/ContactMaps.128.txt}; |
1327 \addplot table {Orig/ContactMaps.128.txt}; |
877 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/ContactMaps.128.txt}; |
1328 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
|
1329 {VF2PPLabel/ContactMaps.128.txt}; |
878 \end{axis} |
1330 \end{axis} |
879 \end{tikzpicture} |
1331 \end{tikzpicture} |
880 \end{center} |
1332 \end{center} |
881 \vspace*{-0.8cm} |
1333 \vspace*{-0.8cm} |
882 \caption{On Contact Maps, VF2++ runs in near constant time, while VF2 Plus has a near linear behaviour.} \label{fig:INDContact} |
1334 \caption{On Contact Maps, VF2++ runs in near constant time, while VF2 |
|
1335 Plus has a near linear behaviour.} \label{fig:INDContact} |
883 \end{figure} |
1336 \end{figure} |
884 |
1337 |
885 \begin{figure}[H] |
1338 \begin{figure}[H] |
886 \begin{center} |
1339 \begin{center} |
887 \begin{tikzpicture} |
1340 \begin{tikzpicture} |
888 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
1341 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
889 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] |
1342 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
890 %\addplot+[only marks] table {proteinsOrig.txt}; |
1343 west},scaled x ticks = false,x tick label style={/pgf/number |
891 \addplot table {Orig/Molecules.32.txt}; |
1344 format/1000 sep = \thinspace}] |
892 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt}; |
1345 %\addplot+[only marks] table {proteinsOrig.txt}; |
|
1346 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark |
|
1347 size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt}; |
893 \end{axis} |
1348 \end{axis} |
894 \end{tikzpicture} |
1349 \end{tikzpicture} |
895 \end{center} |
1350 \end{center} |
896 \vspace*{-0.8cm} |
1351 \vspace*{-0.8cm} |
897 \caption{In the case of Molecules, the algorithms seem to have a similar behaviour, but VF2++ is almost two times faster even on such small graphs.} \label{fig:INDMolecule} |
1352 \caption{In the case of Molecules, the algorithms seem to have a |
|
1353 similar behaviour, but VF2++ is almost two times faster even on such |
|
1354 small graphs.} \label{fig:INDMolecule} |
898 \end{figure} |
1355 \end{figure} |
899 |
1356 |
900 |
1357 |
901 \subsubsection{Graph ismorphism} |
1358 \subsubsection{Graph ismorphism} |
902 In this experiment, the nodes of each graph in the database have been shuffled and an isomorphism between the shuffled and the original graph has been searched. For runtime results, see \textbf{Figure \ref{fig:ISOProt}), \ref{fig:ISOContact}),} and \textbf{\ref{fig:ISOMolecule})}. |
1359 In this experiment, the nodes of each graph in the database have been |
|
1360 shuffled and an isomorphism between the shuffled and the original |
|
1361 graph has been searched. For runtime results, see \textbf{Figure |
|
1362 \ref{fig:ISOProt}), \ref{fig:ISOContact}),} and |
|
1363 \textbf{\ref{fig:ISOMolecule})}. |
903 \begin{figure}[H] |
1364 \begin{figure}[H] |
904 \begin{center} |
1365 \begin{center} |
905 \begin{tikzpicture} |
1366 \begin{tikzpicture} |
906 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
1367 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
907 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] |
1368 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
908 %\addplot+[only marks] table {proteinsOrig.txt}; |
1369 west},scaled x ticks = false,x tick label style={/pgf/number |
909 \addplot table {Orig/proteinsIso.txt}; |
1370 format/1000 sep = \thinspace}] |
910 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt}; |
1371 %\addplot+[only marks] table {proteinsOrig.txt}; |
|
1372 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark |
|
1373 size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt}; |
911 \end{axis} |
1374 \end{axis} |
912 \end{tikzpicture} |
1375 \end{tikzpicture} |
913 \end{center} |
1376 \end{center} |
914 \vspace*{-0.8cm} |
1377 \vspace*{-0.8cm} |
915 \caption{On protein graphs, VF2 Plus has a super linear time complexity, while VF2++ runs in near constant time. The difference is about two order of magnitude on large graphs.}\label{fig:ISOProt} |
1378 \caption{On protein graphs, VF2 Plus has a super linear time |
|
1379 complexity, while VF2++ runs in near constant time. The difference |
|
1380 is about two order of magnitude on large graphs.}\label{fig:ISOProt} |
916 \end{figure} |
1381 \end{figure} |
917 |
1382 |
918 \begin{figure}[H] |
1383 \begin{figure}[H] |
919 \begin{center} |
1384 \begin{center} |
920 \begin{tikzpicture} |
1385 \begin{tikzpicture} |
921 \begin{axis}[title=Contact Maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
1386 \begin{axis}[title=Contact Maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
922 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] |
1387 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
923 %\addplot+[only marks] table {proteinsOrig.txt}; |
1388 west},scaled x ticks = false,x tick label style={/pgf/number |
924 \addplot table {Orig/contactMapsIso.txt}; |
1389 format/1000 sep = \thinspace}] |
925 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt}; |
1390 %\addplot+[only marks] table {proteinsOrig.txt}; |
|
1391 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark |
|
1392 size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt}; |
926 \end{axis} |
1393 \end{axis} |
927 \end{tikzpicture} |
1394 \end{tikzpicture} |
928 \end{center} |
1395 \end{center} |
929 \vspace*{-0.8cm} |
1396 \vspace*{-0.8cm} |
930 \caption{The results are closer to each other on Contact Maps, but VF2++ still performs consistently better.}\label{fig:ISOContact} |
1397 \caption{The results are closer to each other on Contact Maps, but |
|
1398 VF2++ still performs consistently better.}\label{fig:ISOContact} |
931 \end{figure} |
1399 \end{figure} |
932 |
1400 |
933 \begin{figure}[H] |
1401 \begin{figure}[H] |
934 \begin{center} |
1402 \begin{center} |
935 \begin{tikzpicture} |
1403 \begin{tikzpicture} |
936 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
1404 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
937 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] |
1405 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
938 %\addplot+[only marks] table {proteinsOrig.txt}; |
1406 west},scaled x ticks = false,x tick label style={/pgf/number |
939 \addplot table {Orig/moleculesIso.txt}; |
1407 format/1000 sep = \thinspace}] |
940 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt}; |
1408 %\addplot+[only marks] table {proteinsOrig.txt}; |
|
1409 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark |
|
1410 size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt}; |
941 \end{axis} |
1411 \end{axis} |
942 \end{tikzpicture} |
1412 \end{tikzpicture} |
943 \end{center} |
1413 \end{center} |
944 \vspace*{-0.8cm} |
1414 \vspace*{-0.8cm} |
945 \caption{In the case of Molecules, there is not such a significant difference, but VF2++ seems to be faster as the number of nodes increases.}\label{fig:ISOMolecule} |
1415 \caption{In the case of Molecules, there is not such a significant |
|
1416 difference, but VF2++ seems to be faster as the number of nodes |
|
1417 increases.}\label{fig:ISOMolecule} |
946 \end{figure} |
1418 \end{figure} |
947 |
1419 |
948 |
1420 |
949 \subsection{Random graphs} |
1421 \subsection{Random graphs} |
950 This section compares VF2++ with VF2 Plus on random graphs of a large size. The node labels are uniformly distributed. |
1422 This section compares VF2++ with VF2 Plus on random graphs of a large |
951 Let $\delta$ denote the average degree. |
1423 size. The node labels are uniformly distributed. Let $\delta$ denote |
952 For the parameters of problems solved in the experiments, please see the top of each chart. |
1424 the average degree. For the parameters of problems solved in the |
|
1425 experiments, please see the top of each chart. |
953 \subsubsection{Graph isomorphism} |
1426 \subsubsection{Graph isomorphism} |
954 To evaluate the efficiency of the algorithms in the case of graph isomorphism, connected graphs of less than 20 000 nodes have been considered. Generating a random graph and shuffling its nodes, an isomorphism had to be found. \textbf{Figure \ref{fig:randISO5}), \ref{fig:randISO10}), \ref{fig:randISO15}), \ref{fig:randISO35}), \ref{fig:randISO45}),} and \textbf{\ref{fig:randISO100}) } show the runtime results on graph sets of various density. |
1427 To evaluate the efficiency of the algorithms in the case of graph |
|
1428 isomorphism, connected graphs of less than 20 000 nodes have been |
|
1429 considered. Generating a random graph and shuffling its nodes, an |
|
1430 isomorphism had to be found. \textbf{Figure \ref{fig:randISO5}), |
|
1431 \ref{fig:randISO10}), \ref{fig:randISO15}), \ref{fig:randISO35}), |
|
1432 \ref{fig:randISO45}),} and \textbf{\ref{fig:randISO100}) } show the |
|
1433 runtime results on graph sets of various density. |
955 |
1434 |
956 \begin{figure}[H] |
1435 \begin{figure}[H] |
957 \begin{center} |
1436 \begin{center} |
958 \begin{tikzpicture} |
1437 \begin{tikzpicture} |
959 \begin{axis}[title={Random ISO, $\delta = 5$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
1438 \begin{axis}[title={Random ISO, $\delta = 5$},xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
960 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] |
1439 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
|
1440 west},scaled x ticks = false,x tick label style={/pgf/number |
|
1441 format/1000 sep = \thinspace}] |
961 %\addplot+[only marks] table {proteinsOrig.txt}; |
1442 %\addplot+[only marks] table {proteinsOrig.txt}; |
962 \addplot table {randGraph/iso/vf2pIso5_1.txt}; |
1443 \addplot table {randGraph/iso/vf2pIso5_1.txt}; |
963 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/iso/vf2ppIso5_1.txt}; |
1444 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
|
1445 {randGraph/iso/vf2ppIso5_1.txt}; |
964 \end{axis} |
1446 \end{axis} |
965 \end{tikzpicture} |
1447 \end{tikzpicture} |
966 \end{center} |
1448 \end{center} |
967 \vspace*{-0.8cm} |
1449 \vspace*{-0.8cm} |
968 \caption{}\label{fig:randISO5} |
1450 \caption{}\label{fig:randISO5} |
1258 \vspace*{-0.8cm} |
1837 \vspace*{-0.8cm} |
1259 \begin{subfigure}[b]{0.55\textwidth} |
1838 \begin{subfigure}[b]{0.55\textwidth} |
1260 \begin{center} |
1839 \begin{center} |
1261 \begin{tikzpicture} |
1840 \begin{tikzpicture} |
1262 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1841 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1263 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] |
1842 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
|
1843 west},scaled x ticks = false,x tick label style={/pgf/number |
|
1844 format/1000 sep = \space}] |
1264 %\addplot+[only marks] table {proteinsOrig.txt}; |
1845 %\addplot+[only marks] table {proteinsOrig.txt}; |
1265 \addplot table {randGraph/ind/vf2pInd35_0.05.txt}; |
1846 \addplot table {randGraph/ind/vf2pInd35_0.05.txt}; |
1266 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.05.txt}; |
1847 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
|
1848 {randGraph/ind/vf2ppInd35_0.05.txt}; |
1267 \end{axis} |
1849 \end{axis} |
1268 \end{tikzpicture} |
1850 \end{tikzpicture} |
1269 \end{center} |
1851 \end{center} |
1270 \end{subfigure} |
1852 \end{subfigure} |
1271 \begin{subfigure}[b]{0.55\textwidth} |
1853 \begin{subfigure}[b]{0.55\textwidth} |
1272 \begin{center} |
1854 \begin{center} |
1273 \begin{tikzpicture} |
1855 \begin{tikzpicture} |
1274 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1856 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.1$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1275 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] |
1857 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
|
1858 west},scaled x ticks = false,x tick label style={/pgf/number |
|
1859 format/1000 sep = \space}] |
1276 %\addplot+[only marks] table {proteinsOrig.txt}; |
1860 %\addplot+[only marks] table {proteinsOrig.txt}; |
1277 \addplot table {randGraph/ind/vf2pInd35_0.1.txt}; |
1861 \addplot table {randGraph/ind/vf2pInd35_0.1.txt}; |
1278 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.1.txt}; |
1862 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
|
1863 {randGraph/ind/vf2ppInd35_0.1.txt}; |
1279 \end{axis} |
1864 \end{axis} |
1280 \end{tikzpicture} |
1865 \end{tikzpicture} |
1281 \end{center} |
1866 \end{center} |
1282 \end{subfigure} |
1867 \end{subfigure} |
1283 \hspace{1cm} |
1868 \hspace{1cm} |
1284 \begin{subfigure}[b]{0.55\textwidth} |
1869 \begin{subfigure}[b]{0.55\textwidth} |
1285 \begin{center} |
1870 \begin{center} |
1286 \begin{tikzpicture} |
1871 \begin{tikzpicture} |
1287 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1872 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.3$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1288 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] |
1873 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
|
1874 west},scaled x ticks = false,x tick label style={/pgf/number |
|
1875 format/1000 sep = \space}] |
1289 %\addplot+[only marks] table {proteinsOrig.txt}; |
1876 %\addplot+[only marks] table {proteinsOrig.txt}; |
1290 \addplot table {randGraph/ind/vf2pInd35_0.3.txt}; |
1877 \addplot table {randGraph/ind/vf2pInd35_0.3.txt}; |
1291 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.3.txt}; |
1878 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
|
1879 {randGraph/ind/vf2ppInd35_0.3.txt}; |
1292 \end{axis} |
1880 \end{axis} |
1293 \end{tikzpicture} |
1881 \end{tikzpicture} |
1294 \end{center} |
1882 \end{center} |
1295 \end{subfigure} |
1883 \end{subfigure} |
1296 \begin{subfigure}[b]{0.55\textwidth} |
1884 \begin{subfigure}[b]{0.55\textwidth} |
1297 \begin{center} |
1885 \begin{center} |
1298 \begin{tikzpicture} |
1886 \begin{tikzpicture} |
1299 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1887 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.6$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1300 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] |
1888 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
|
1889 west},scaled x ticks = false,x tick label style={/pgf/number |
|
1890 format/1000 sep = \space}] |
1301 %\addplot+[only marks] table {proteinsOrig.txt}; |
1891 %\addplot+[only marks] table {proteinsOrig.txt}; |
1302 \addplot table {randGraph/ind/vf2pInd35_0.6.txt}; |
1892 \addplot table {randGraph/ind/vf2pInd35_0.6.txt}; |
1303 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.6.txt}; |
1893 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
|
1894 {randGraph/ind/vf2ppInd35_0.6.txt}; |
1304 \end{axis} |
1895 \end{axis} |
1305 \end{tikzpicture} |
1896 \end{tikzpicture} |
1306 \end{center} |
1897 \end{center} |
1307 \end{subfigure} |
1898 \end{subfigure} |
1308 \begin{subfigure}[b]{0.55\textwidth} |
1899 \begin{subfigure}[b]{0.55\textwidth} |
1309 |
1900 |
1310 \begin{tikzpicture} |
1901 \begin{tikzpicture} |
1311 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1902 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1312 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \space}] |
1903 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
|
1904 west},scaled x ticks = false,x tick label style={/pgf/number |
|
1905 format/1000 sep = \space}] |
1313 %\addplot+[only marks] table {proteinsOrig.txt}; |
1906 %\addplot+[only marks] table {proteinsOrig.txt}; |
1314 \addplot table {randGraph/ind/vf2pInd35_0.8.txt}; |
1907 \addplot table {randGraph/ind/vf2pInd35_0.8.txt}; |
1315 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.8.txt}; |
1908 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
|
1909 {randGraph/ind/vf2ppInd35_0.8.txt}; |
1316 \end{axis} |
1910 \end{axis} |
1317 \end{tikzpicture} |
1911 \end{tikzpicture} |
1318 \end{subfigure} |
1912 \end{subfigure} |
1319 \begin{subfigure}[b]{0.55\textwidth} |
1913 \begin{subfigure}[b]{0.55\textwidth} |
1320 \begin{tikzpicture} |
1914 \begin{tikzpicture} |
1321 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1915 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.95$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1322 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] |
1916 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
|
1917 west},scaled x ticks = false,x tick label style={/pgf/number |
|
1918 format/1000 sep = \thinspace}] |
1323 %\addplot+[only marks] table {proteinsOrig.txt}; |
1919 %\addplot+[only marks] table {proteinsOrig.txt}; |
1324 \addplot table {randGraph/ind/vf2pInd35_0.95.txt}; |
1920 \addplot table {randGraph/ind/vf2pInd35_0.95.txt}; |
1325 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.95.txt}; |
1921 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
|
1922 {randGraph/ind/vf2ppInd35_0.95.txt}; |
1326 \end{axis} |
1923 \end{axis} |
1327 \end{tikzpicture} |
1924 \end{tikzpicture} |
1328 \end{subfigure} |
1925 \end{subfigure} |
1329 \vspace*{-0.8cm} |
1926 \vspace*{-0.8cm} |
1330 \caption{IND on graphs having an average degree of 35.}\label{fig:randIND35} |
1927 \caption{IND on graphs having an average degree of |
|
1928 35.}\label{fig:randIND35} |
1331 \end{figure} |
1929 \end{figure} |
1332 |
1930 |
1333 \begin{figure}[H] |
1931 \begin{figure}[H] |
1334 \begin{center} |
1932 \begin{center} |
1335 \begin{tikzpicture} |
1933 \begin{tikzpicture} |
1336 \begin{axis}[title={Rand IND Summary, $\delta = 35$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid |
1934 \begin{axis}[title={Rand IND Summary, $\delta = 35$, $\rho = 0.3, 0.6, 0.8, 0.95$},height=17cm,width=16cm,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},line width=0.8pt,grid |
1337 =major,mark size=1pt, legend style={at={(0,1)},anchor=north west},scaled x ticks = false,x tick label style={/pgf/number format/1000 sep = \thinspace}] |
1935 =major,mark size=1pt, legend style={at={(0,1)},anchor=north |
1338 %\addplot+[only marks] table {proteinsOrig.txt}; |
1936 west},scaled x ticks = false,x tick label style={/pgf/number |
1339 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.3.txt}; |
1937 format/1000 sep = \thinspace}] |
1340 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.3.txt}; |
1938 %\addplot+[only marks] table {proteinsOrig.txt}; |
1341 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.6.txt}; |
1939 \addplot[mark=*,mark size=1.5pt,color=blue] table |
1342 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.6.txt}; |
1940 {randGraph/ind/vf2pInd35_0.3.txt}; |
1343 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.8.txt}; |
1941 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1344 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.8.txt}; |
1942 {randGraph/ind/vf2ppInd35_0.3.txt}; |
1345 \addplot[mark=*,mark size=1.5pt,color=blue] table {randGraph/ind/vf2pInd35_0.95.txt}; |
1943 \addplot[mark=*,mark size=1.5pt,color=blue] table |
1346 \addplot[mark=triangle*,mark size=1.8pt,color=red] table {randGraph/ind/vf2ppInd35_0.95.txt}; |
1944 {randGraph/ind/vf2pInd35_0.6.txt}; |
|
1945 \addplot[mark=triangle*,mark |
|
1946 size=1.8pt,color=red] table |
|
1947 {randGraph/ind/vf2ppInd35_0.6.txt}; |
|
1948 \addplot[mark=*,mark |
|
1949 size=1.5pt,color=blue] table |
|
1950 {randGraph/ind/vf2pInd35_0.8.txt}; |
|
1951 \addplot[mark=triangle*,mark |
|
1952 size=1.8pt,color=red] table |
|
1953 {randGraph/ind/vf2ppInd35_0.8.txt}; |
|
1954 \addplot[mark=*,mark |
|
1955 size=1.5pt,color=blue] |
|
1956 table |
|
1957 {randGraph/ind/vf2pInd35_0.95.txt}; |
|
1958 \addplot[mark=triangle*,mark |
|
1959 size=1.8pt,color=red] |
|
1960 table |
|
1961 {randGraph/ind/vf2ppInd35_0.95.txt}; |
1347 \end{axis} |
1962 \end{axis} |
1348 \end{tikzpicture} |
1963 \end{tikzpicture} |
1349 \end{center} |
1964 \end{center} |
1350 \vspace*{-0.8cm} |
1965 \vspace*{-0.8cm} |
1351 \caption{Cummulative chart for $\delta=35$.}\label{fig:randIND35Sum} |
1966 \caption{Cummulative chart for $\delta=35$.}\label{fig:randIND35Sum} |
1352 \end{figure} |
1967 \end{figure} |
1353 |
1968 |
1354 Based on these experiments, VF2++ is faster than VF2 Plus and able to handle really large graphs in milliseconds. Note that when $IND$ was considered and the small graphs had proportionally few nodes ($\rho = 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node orders(e.g. see the $\delta=10$ case on \textbf{Figure \ref{fig:randIND10})}). If these examples had been excluded, the charts would have seemed to be similar to the other ones. |
1969 Based on these experiments, VF2++ is faster than VF2 Plus and able to |
1355 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2 Plus slow slightly down, but remain practically usable even on graphs having 10 000 nodes. |
1970 handle really large graphs in milliseconds. Note that when $IND$ was |
1356 |
1971 considered and the small graphs had proportionally few nodes ($\rho = |
1357 |
1972 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node |
1358 |
1973 orders(e.g. see the $\delta=10$ case on \textbf{Figure |
1359 |
1974 \ref{fig:randIND10})}). If these examples had been excluded, the |
1360 \newpage |
1975 charts would have seemed to be similar to the other ones. |
|
1976 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2 |
|
1977 Plus slow slightly down, but remain practically usable even on graphs |
|
1978 having 10 000 nodes. |
|
1979 |
|
1980 |
|
1981 |
|
1982 |
|
1983 |
1361 \section{Conclusion} |
1984 \section{Conclusion} |
1362 In this thesis, after providing a short summary of the recent algorithms, a new graph matching algorithm based on VF2, called VF2++, has been presented and analyzed from a practical viewpoint. |
1985 In this paper, after providing a short summary of the recent |
1363 |
1986 algorithms, a new graph matching algorithm based on VF2, called VF2++, |
1364 Recognizing the importance of the node order and determining an efficient one, VF2++ is able to match graphs of thousands of nodes in near practically linear time including preprocessing. In addition to the proper order, VF2++ uses more efficient consistency and cutting rules which are easy to compute and make the algorithm able to prune most of the unfruitful branches without going astray. |
1987 has been presented and analyzed from a practical viewpoint. |
1365 |
1988 |
1366 In order to show the efficiency of the new method, it has been compared to VF2 Plus, which is the best concurrent algorithm based on \cite{VF2Plus}. |
1989 Recognizing the importance of the node order and determining an |
1367 |
1990 efficient one, VF2++ is able to match graphs of thousands of nodes in |
1368 The experiments show that VF2++ consistently outperforms VF2 Plus on biological graphs. It seems to be asymptotically faster on protein and on contact map graphs in the case of induced subgraph isomorphism, while in the case of graph isomorphism, it has definitely better asymptotic behaviour on protein graphs. |
1991 near practically linear time including preprocessing. In addition to |
1369 |
1992 the proper order, VF2++ uses more efficient consistency and cutting |
1370 Regarding random sparse graphs, not only has VF2++ proved itself to be faster than VF2 Plus, but it has a practically linear behaviour both in the case of induced subgraph- and graph isomorphism, as well. |
1993 rules which are easy to compute and make the algorithm able to prune |
|
1994 most of the unfruitful branches without going astray. |
|
1995 |
|
1996 In order to show the efficiency of the new method, it has been |
|
1997 compared to VF2 Plus, which is the best concurrent algorithm based on |
|
1998 \cite{VF2Plus}. |
|
1999 |
|
2000 The experiments show that VF2++ consistently outperforms VF2 Plus on |
|
2001 biological graphs. It seems to be asymptotically faster on protein and |
|
2002 on contact map graphs in the case of induced subgraph isomorphism, |
|
2003 while in the case of graph isomorphism, it has definitely better |
|
2004 asymptotic behaviour on protein graphs. |
|
2005 |
|
2006 Regarding random sparse graphs, not only has VF2++ proved itself to be |
|
2007 faster than VF2 Plus, but it has a practically linear behaviour both |
|
2008 in the case of induced subgraph- and graph isomorphism, as well. |
1371 |
2009 |
1372 |
2010 |
1373 |
2011 |
1374 %% The Appendices part is started with the command \appendix; |
2012 %% The Appendices part is started with the command \appendix; |
1375 %% appendix sections are then done as normal sections |
2013 %% appendix sections are then done as normal sections |