183 In the last decades, combinatorial structures, and especially graphs |
177 In the last decades, combinatorial structures, and especially graphs |
184 have been considered with ever increasing interest, and applied to the |
178 have been considered with ever increasing interest, and applied to the |
185 solution of several new and revised questions. The expressiveness, |
179 solution of several new and revised questions. The expressiveness, |
186 the simplicity and the studiedness of graphs make them practical for |
180 the simplicity and the studiedness of graphs make them practical for |
187 modelling and appear constantly in several seemingly independent |
181 modelling and appear constantly in several seemingly independent |
188 fields. Bioinformatics and chemistry are amongst the most relevant |
182 fields, such as bioinformatics and chemistry. |
189 and most important fields. |
|
190 |
183 |
191 Complex biological systems arise from the interaction and cooperation |
184 Complex biological systems arise from the interaction and cooperation |
192 of plenty of molecular components. Getting acquainted with such |
185 of plenty of molecular components. Getting acquainted with such |
193 systems at the molecular level has primary importance, since |
186 systems at the molecular level is of primary importance, since |
194 protein-protein interaction, DNA-protein interaction, metabolic |
187 protein-protein interaction, DNA-protein interaction, metabolic |
195 interaction, transcription factor binding, neuronal networks, and |
188 interaction, transcription factor binding, neuronal networks, and |
196 hormone signaling networks can be understood only this way. |
189 hormone signaling networks can be understood this way. |
197 |
190 |
198 For instance, a molecular structure can be considered as a graph, |
191 Many chemical and biological structures can easily be modeled |
199 whose nodes correspond to atoms and whose edges to chemical bonds. The |
192 as graphs, for instance, a molecular structure can be |
200 secondary structure of a protein can also be represented as a graph, |
193 considered as a graph, whose nodes correspond to atoms and whose |
201 where nodes are associated with aminoacids and the edges with hydrogen |
194 edges to chemical bonds. The similarity and dissimilarity of |
202 bonds. The nodes are often whole molecular components and the edges |
195 objects corresponding to nodes are incorporated to the model |
203 represent some relationships among them. The similarity and |
196 by \emph{node labels}. Understanding such networks basically |
204 dissimilarity of objects corresponding to nodes are incorporated to |
197 requires finding specific subgraphs, thus calls for efficient |
205 the model by \emph{node labels}. Many other chemical and biological |
198 graph matching algorithms. |
206 structures can easily be modeled in a similar way. Understanding such |
199 |
207 networks basically requires finding specific subgraphs, which can not |
200 Other real-world fields related to some |
208 avoid the application of graph matching algorithms. |
201 variants of graph matching include pattern recognition |
209 |
|
210 Finally, let some of the other real-world fields related to some |
|
211 variants of graph matching be briefly mentioned: pattern recognition |
|
212 and machine vision \cite{HorstBunkeApplications}, symbol recognition |
202 and machine vision \cite{HorstBunkeApplications}, symbol recognition |
213 \cite{CordellaVentoSymbolRecognition}, face identification |
203 \cite{CordellaVentoSymbolRecognition}, face identification |
214 \cite{JianzhuangYongFaceIdentification}. \\ |
204 \cite{JianzhuangYongFaceIdentification}. \\ |
215 |
205 |
216 Subgraph and induced subgraph matching problems are known to be |
206 Subgraph and induced subgraph matching problems are known to be |
251 strategy and formulates the matching as a Constraint Satisfaction |
240 strategy and formulates the matching as a Constraint Satisfaction |
252 Problem to prune the search tree. The constraints are that the mapping |
241 Problem to prune the search tree. The constraints are that the mapping |
253 has to be injective and edge-preserving, hence it is possible to |
242 has to be injective and edge-preserving, hence it is possible to |
254 handle new matching types as well. |
243 handle new matching types as well. |
255 |
244 |
256 The \textbf{RI} algorithm\cite{RI} and its variations are based on a |
245 The \emph{RI} algorithm\cite{RI} and its variations are based on a |
257 state space representation. After reordering the nodes of the graphs, |
246 state space representation. After reordering the nodes of the graphs, |
258 it uses some fast executable heuristic checks without using any |
247 it uses some fast executable heuristic checks without using any |
259 complex pruning rules. It seems to run really efficiently on graphs |
248 complex pruning rules. It seems to run really efficiently on graphs |
260 coming from biology, and won the International Contest on Pattern |
249 coming from biology, and won the International Contest on Pattern |
261 Search in Biological Databases\cite{Content}. |
250 Search in Biological Databases\cite{Content}. |
262 |
251 |
263 The currently most commonly used algorithm is the |
252 The currently most commonly used algorithm is the |
264 \textbf{VF2}\cite{VF2}, the improved version of VF\cite{VF}, which was |
253 \emph{VF2}\cite{VF2}, the improved version of \emph{VF}\cite{VF}, which was |
265 designed for solving pattern matching and computer vision problems, |
254 designed for solving pattern matching and computer vision problems, |
266 and has been one of the best overall algorithms for more than a |
255 and has been one of the best overall algorithms for more than a |
267 decade. Although, it can't be up to new specialized algorithms, it is |
256 decade. Although, it can't be up to new specialized algorithms, it is |
268 still widely used due to its simplicity and space efficiency. VF2 uses |
257 still widely used due to its simplicity and space efficiency. VF2 uses |
269 a state space representation and checks some conditions in each state |
258 a state space representation and checks some conditions in each state |
270 to prune the search tree. |
259 to prune the search tree. |
271 |
260 |
272 Our first graph matching algorithm was the first version of VF2 which |
261 Meanwhile, another variant called \emph{VF2 Plus}\cite{VF2Plus} has |
273 recognizes the significance of the node ordering, more opportunities |
|
274 to increase the cutting efficiency and reduce its computational |
|
275 complexity. This project was initiated and sponsored by QuantumBio |
|
276 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source |
|
277 code --- has been published as a part of LEMON\cite{LEMON} open source |
|
278 graph library. |
|
279 |
|
280 This paper introduces \textbf{VF2++}, a new further improved algorithm |
|
281 for the graph and (induced)subgraph isomorphism problem, which uses |
|
282 efficient cutting rules and determines a node order in which VF2 runs |
|
283 significantly faster on practical inputs. |
|
284 |
|
285 Meanwhile, another variant called \textbf{VF2 Plus}\cite{VF2Plus} has |
|
286 been published. It is considered to be as efficient as the RI |
262 been published. It is considered to be as efficient as the RI |
287 algorithm and has a strictly better behavior on large graphs. The |
263 algorithm and has a strictly better behavior on large graphs. The |
288 main idea of VF2 Plus is to precompute a heuristic node order of the |
264 main idea of VF2 Plus is to precompute a heuristic node order of the |
289 small graph, in which the VF2 works more efficiently. |
265 small graph, in which the VF2 works more efficiently. |
290 |
266 |
|
267 This paper introduces \emph{VF2++}, a new further improved algorithm |
|
268 for the graph and (induced)subgraph isomorphism problem, which uses |
|
269 efficient cutting rules and determines a node order in which VF2 runs |
|
270 significantly faster on practical inputs. |
|
271 |
|
272 This project was initiated and sponsored by QuantumBio |
|
273 Inc.\cite{QUANTUMBIO} and the implementation --- along with a source |
|
274 code --- has been published as a part of LEMON\cite{LEMON} open source |
|
275 graph library. |
|
276 |
291 \section{Problem Statement} |
277 \section{Problem Statement} |
292 This section provides a detailed description of the problems to be |
278 This section provides a formal description of the problems to be |
293 solved. |
279 solved. |
294 \subsection{Definitions} |
280 \subsection{Definitions} |
295 |
281 |
296 Throughout the paper $G_{small}=(V_{small}, E_{small})$ and |
282 Throughout the paper $G_{1}=(V_{1}, E_{1})$ and |
297 $G_{large}=(V_{large}, E_{large})$ denote two undirected graphs. |
283 $G_{2}=(V_{2}, E_{2})$ denote two undirected graphs. |
298 \begin{definition}\label{sec:ismorphic} |
284 |
299 $G_{small}$ and $G_{large}$ are \textbf{isomorphic} if $\exists M: |
|
300 V_{small} \longrightarrow V_{large}$ bijection, for which the |
|
301 following is true: |
|
302 \begin{center} |
|
303 $\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow |
|
304 (M(u),M(v))\in{E_{large}}$ |
|
305 \end{center} |
|
306 \end{definition} |
|
307 For the sake of simplicity in this paper subgraphs and induced |
|
308 subgraphs are defined in a more general way than usual: |
|
309 \begin{definition} |
285 \begin{definition} |
310 $G_{small}$ is a \textbf{subgraph} of $G_{large}$ if $\exists I: |
286 $\mathcal{L}: (V_{1}\cup V_{2}) \longrightarrow K$ is a \textbf{node |
311 V_{small}\longrightarrow V_{large}$ injection, for which the |
|
312 following is true: |
|
313 \begin{center} |
|
314 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Rightarrow (I(u),I(v))\in E_{large}$ |
|
315 \end{center} |
|
316 \end{definition} |
|
317 |
|
318 \begin{definition} |
|
319 $G_{small}$ is an \textbf{induced subgraph} of $G_{large}$ if $\exists |
|
320 I: V_{small}\longrightarrow V_{large}$ injection, for which the |
|
321 following is true: |
|
322 \begin{center} |
|
323 $\forall u,v \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow |
|
324 (I(u),I(v))\in E_{large}$ |
|
325 \end{center} |
|
326 \end{definition} |
|
327 |
|
328 \begin{definition} |
|
329 $lab: (V_{small}\cup V_{large}) \longrightarrow K$ is a \textbf{node |
|
330 label function}, where K is an arbitrary set. The elements in K |
287 label function}, where K is an arbitrary set. The elements in K |
331 are the \textbf{node labels}. Two nodes, u and v are said to be |
288 are the \textbf{node labels}. Two nodes, u and v are said to be |
332 \textbf{equivalent} if $lab(u)=lab(v)$. |
289 \textbf{equivalent} if $\mathcal{L}(u)=\mathcal{L}(v)$. |
333 \end{definition} |
290 \end{definition} |
334 |
291 |
335 When node labels are given, the matched nodes must have the same |
292 For the sake of simplicity, in this paper the graph, subgraph and induced subgraph isomorphisms are defined in a more general way. |
336 labels. For example, the node labeled isomorphism is phrased by |
293 |
|
294 \begin{definition}\label{sec:ismorphic} |
|
295 $G_{1}$ and $G_{2}$ are \textbf{isomorphic} (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}: |
|
296 V_{1} \longrightarrow V_{2}$ bijection, for which the |
|
297 following is true: |
|
298 \begin{center} |
|
299 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and |
|
300 $\forall u,v\in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in{E_{2}}$ |
|
301 \end{center} |
|
302 \end{definition} |
|
303 |
337 \begin{definition} |
304 \begin{definition} |
338 $G_{small}$ and $G_{large}$ are \textbf{isomorphic by the node label |
305 $G_{1}$ is a \textbf{subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists \mathfrak{m}: |
339 function lab} if $\exists M: V_{small} \longrightarrow V_{large}$ |
306 V_{1}\longrightarrow V_{2}$ injection, for which the |
340 bijection, for which the following is true: |
307 following is true: |
341 \begin{center} |
308 \begin{center} |
342 $(\forall u,v\in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow |
309 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and |
343 (M(u),M(v))\in{E_{large}})$ and $(\forall u\in{V_{small}} : |
310 $\mathbb{M}$ |
344 lab(u)=lab(M(u)))$ |
311 $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Rightarrow (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$ |
345 \end{center} |
312 \end{center} |
346 \end{definition} |
313 \end{definition} |
347 |
314 |
348 Note that he other two definitions can be extended in the same way. |
315 \begin{definition} |
|
316 $G_{1}$ is an \textbf{induced subgraph} of $G_{2}$ (by the node label $\mathcal{L}$) if $\exists |
|
317 \mathfrak{m}: V_{1}\longrightarrow V_{2}$ injection, for which the |
|
318 following is true: |
|
319 \begin{center} |
|
320 $\forall u\in{V_{1}} : \mathcal{L}(u)=\mathcal{L}(\mathfrak{m}(u))$ and |
|
321 |
|
322 $\forall u,v \in{V_{1}} : (u,v)\in{E_{1}} \Leftrightarrow |
|
323 (\mathfrak{m}(u),\mathfrak{m}(v))\in E_{2}$ |
|
324 \end{center} |
|
325 \end{definition} |
|
326 |
349 |
327 |
350 \subsection{Common problems}\label{sec:CommProb} |
328 \subsection{Common problems}\label{sec:CommProb} |
351 |
329 |
352 The focus of this paper is on two extensively studied topics, the |
330 The focus of this paper is on two extensively studied topics, the |
353 subgraph isomorphism and its variations. However, the following |
331 subgraph isomorphism and its variations. However, the following |
354 problems also appear in many applications. |
332 problems also appear in many applications. |
355 |
333 |
356 The \textbf{subgraph matching problem} is the following: is |
334 The \textbf{subgraph matching problem} is the following: is |
357 $G_{small}$ isomorphic to any subgraph of $G_{large}$ by a given node |
335 $G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node |
358 label? |
336 label? |
359 |
337 |
360 The \textbf{induced subgraph matching problem} asks the same about the |
338 The \textbf{induced subgraph matching problem} asks the same about the |
361 existence of an induced subgraph. |
339 existence of an induced subgraph. |
362 |
340 |
380 for the sake of simplicity, only the undirected case will be |
358 for the sake of simplicity, only the undirected case will be |
381 discussed. |
359 discussed. |
382 |
360 |
383 |
361 |
384 \subsection{Common notations} |
362 \subsection{Common notations} |
385 \indent Assume $G_{small}$ is searched in $G_{large}$. The following |
363 \indent Assume $G_{1}$ is searched in $G_{2}$. The following |
386 definitions and notations will be used throughout the whole paper. |
364 definitions and notations will be used throughout the whole paper. |
387 \begin{definition} |
365 \begin{definition} |
388 A set $M\subseteq V_{small}\times V_{large}$ is called |
366 An injection $\mathfrak{m} : D \longrightarrow V_2$ is called (partial) \textbf{mapping}, where $D\subseteq V_1$. |
389 \textbf{mapping} if no node of $V_{small}\cup V_{large}$ appears |
|
390 in more than one pair in M. That is, M uniquely associates some of |
|
391 the nodes in $V_{small}$ with some nodes of $V_{large}$ and vice |
|
392 versa. |
|
393 \end{definition} |
367 \end{definition} |
394 |
368 |
|
369 \begin{notation} |
|
370 $\mathfrak{D}(f)$ and $\mathfrak{R}(f)$ denote the domain and the range of a function $f$, respectively. |
|
371 \end{notation} |
|
372 |
395 \begin{definition} |
373 \begin{definition} |
396 Mapping $M$ \textbf{covers} a node v if there exists a pair in M, which |
374 Mapping $\mathfrak{m}$ \textbf{covers} a node $u\in V_1\cup V_2$ if $u\in \mathfrak{D}(\mathfrak{m})\cup \mathfrak{R}(\mathfrak{m})$. |
397 contains v. |
|
398 \end{definition} |
375 \end{definition} |
399 |
376 |
400 \begin{definition} |
377 \begin{definition} |
401 A mapping $M$ is $\mathbf{whole\ mapping}$ if $M$ covers all the |
378 A mapping $\mathfrak{m}$ is $\mathbf{whole\ mapping}$ if $\mathfrak{m}$ covers all the |
402 nodes in $V_{small}$. |
379 nodes of $V_{1}$, i.e. $\mathfrak{D}(\mathfrak{m})=V_1$. |
403 \end{definition} |
380 \end{definition} |
404 |
381 |
|
382 \begin{definition} |
|
383 Let \textbf{extend}$(\mathfrak{m},(u,v))$ denote the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$, for which $\forall w\in \mathfrak{D}(\mathfrak{m}) : \mathfrak{m}(w)=f(w)$ and $f(u)=v$ holds. Where $u\notin\mathfrak{D}(\mathfrak{m})$ and $v\notin\mathfrak{R}(\mathfrak{m})$, otherwise $extend(\mathfrak{m},(u,v))$ is undefined. |
|
384 \end{definition} |
|
385 |
405 \begin{notation} |
386 \begin{notation} |
406 Let $\mathbf{M_{small}(s)} := \{u\in V_{small} : \exists v\in |
387 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type |
407 V_{large}: (u,v)\in M(s)\}$ and $\mathbf{M_{large}(s)} := \{v\in |
388 which can be substituted by any of the $\mathbf{ISO}$, $\mathbf{SUB}$ |
408 V_{large} : \exists u\in V_{small}: (u,v)\in M(s)\}$. |
389 and $\mathbf{IND}$ problems. |
409 \end{notation} |
390 \end{notation} |
410 |
391 |
411 \begin{notation} |
|
412 Let $\mathbf{Pair(M,v)}$ be the pair of $v$ in $M$ if such a node |
|
413 exist, otherwise $\mathbf{Pair(M,v)}$ is undefined, where $M$ is a mapping and $v\in V_{small}\cup V_{large}$. |
|
414 \end{notation} |
|
415 |
|
416 Note that if $\mathbf{Pair(M,v)}$ exists, then it is unique |
|
417 |
|
418 The definitions of the isomorphism types can be rephrased on the |
|
419 existence of a special whole mapping $M$, since it represents a |
|
420 bijection. For example |
|
421 \begin{center} |
|
422 $M\subseteq V_{small}\times V_{large}$ represents an induced subgraph |
|
423 isomorphism $\Leftrightarrow$ $M$ is whole mapping and $\forall u,v |
|
424 \in{V_{small}} : (u,v)\in{E_{small}} \Leftrightarrow |
|
425 (Pair(M,u),Pair(M,v))\in E_{large}$. |
|
426 \end{center} |
|
427 |
|
428 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type |
|
429 which can be substituted by any of $\mathbf{ISO}$, $\mathbf{SUB}$ |
|
430 and $\mathbf{IND}$. |
|
431 |
|
432 \begin{definition} |
392 \begin{definition} |
433 Let M be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a |
393 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a |
434 \textbf{consistency function by } $\mathbf{PT}$ if the following |
394 \textbf{consistency function by } $\mathbf{PT}$ if the following |
435 holds. If there exists whole mapping $W$ of type $PT$ for which |
395 holds. If there exists a whole mapping $w$ satisfying the requirements of $PT$, for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$. |
436 $M\subseteq W$, then $Cons_{PT}(M)$ is true. |
|
437 \end{definition} |
396 \end{definition} |
438 |
397 |
439 \begin{definition} |
398 \begin{definition} |
440 Let M be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a |
399 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a |
441 \textbf{cutting function by } $\mathbf{PT}$ if the following |
400 \textbf{cutting function by } $\mathbf{PT}$ if the following |
442 holds. $\mathbf{Cut_{PT}(M)}$ is false if $M$ can be extended to a |
401 holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations, which results in a whole mapping satisfying the requirements of $PT$. |
443 whole mapping $W$ of type $PT$. |
|
444 \end{definition} |
402 \end{definition} |
445 |
403 |
446 \begin{definition} |
404 \begin{definition} |
447 $M$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if |
405 $\mathfrak{m}$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if |
448 $Cons_{PT}(M)$ is true. |
406 $Cons_{PT}(\mathfrak{m})$ is true. |
449 \end{definition} |
407 \end{definition} |
450 |
408 |
451 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form. |
409 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form. |
452 \begin{notation} |
410 \begin{notation} |
453 Let $\mathbf{Cons_{PT}(p, M)}:=Cons_{PT}(M\cup\{p\})$, and |
411 Let $\mathbf{Cons_{PT}(p, \mathfrak{m})}:=Cons_{PT}(extend(\mathfrak{m},p))$, and |
454 $\mathbf{Cut_{PT}(p, M)}:=Cut_{PT}(M\cup\{p\})$, where |
412 $\mathbf{Cut_{PT}(p, \mathfrak{m})}:=Cut_{PT}(extend(\mathfrak{m},p))$, where |
455 $p\in{V_{small}\!\times\!V_{large}}$ and $M\cup\{p\}$ is mapping. |
413 $p\in{V_{1}\backslash\mathfrak{D}(\mathfrak{m}) \!\times\!V_{2}\backslash\mathfrak{R}(\mathfrak{m})}$. |
456 \end{notation} |
414 \end{notation} |
457 |
415 |
458 $Cons_{PT}$ will be used to check the consistency of the already |
416 $Cons_{PT}$ will be used to check the consistency of the already |
459 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if |
417 covered nodes, while $Cut_{PT}$ is for looking ahead to recognize if |
460 no whole consistent mapping can contain the current mapping. |
418 no whole consistent mapping can contain the current mapping. |
461 |
419 |
462 \subsection{Overview of the algorithm} |
420 \subsection{Overview of the algorithm} |
463 VF2 uses a state space representation of mappings, $Cons_{PT}$ for |
421 VF2 uses a state space representation of mappings, $Cons_{PT}$ for |
464 excluding inconsistency with the problem type and $Cut_{PT}$ for |
422 excluding inconsistency with the problem type and $Cut_{PT}$ for |
465 pruning the search tree. Each state $s$ of the matching process can |
423 pruning the search tree. |
466 be associated with a mapping $M(s)$. |
|
467 |
424 |
468 Algorithm~\ref{alg:VF2Pseu} is a high level description of |
425 Algorithm~\ref{alg:VF2Pseu} is a high level description of |
469 the VF2 matching algorithm. |
426 the VF2 matching algorithm. Each state of the matching process can |
|
427 be associated with a mapping $\mathfrak{m}$. The initial state |
|
428 is associated with a mapping $\mathfrak{m}$, for which |
|
429 $\mathfrak{D}(\mathfrak{m})=\emptyset$, i.e. it starts with an empty mapping. |
470 |
430 |
471 |
431 |
472 \begin{algorithm} |
432 \begin{algorithm} |
473 \algtext*{EndIf}%ne nyomtasson end if-et |
433 \algtext*{EndIf}%ne nyomtasson end if-et |
474 \algtext*{EndFor}%ne |
434 \algtext*{EndFor}%ne |
475 \algtext*{EndProcedure}%ne nyomtasson .. |
435 \algtext*{EndProcedure}%ne nyomtasson .. |
476 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu} |
436 \caption{\hspace{0.5cm}$A\ high\ level\ description\ of\ VF2$}\label{alg:VF2Pseu} |
477 \begin{algorithmic}[1] |
437 \begin{algorithmic}[1] |
478 |
438 |
479 \Procedure{VF2}{State $s$, ProblemType $PT$} \If{$M(s$) covers |
439 \Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$} |
480 $V_{small}$} \State Output($M(s)$) \Else |
440 \If{$\mathfrak{m}$ covers |
481 |
441 $V_{1}$} \State Output($\mathfrak{m}$) |
482 \State Compute the set $P(s)$ of the pairs candidate for inclusion |
442 \Else |
483 in $M(s)$ \ForAll{$p\in{P(s)}$} \If{Cons$_{PT}$($p, M(s)$) $\wedge$ |
443 \State Compute the set $P_\mathfrak{m}$ of the pairs candidate for inclusion |
484 $\neg$Cut$_{PT}$($p, M(s)$)} \State Compute the nascent state |
444 in $\mathfrak{m}$ \ForAll{$p\in{P_\mathfrak{m}}$} \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$ |
485 $\tilde{s}$ by adding $p$ to $M(s)$ \State \textbf{call} |
445 $\neg$Cut$_{PT}$($p,\mathfrak{m}$)} |
486 VF2($\tilde{s}$, $PT$) \EndIf \EndFor \EndIf \EndProcedure |
446 \State \textbf{call} |
|
447 VF2($extend(\mathfrak{m},p)$, $PT$) \EndIf \EndFor \EndIf \EndProcedure |
487 \end{algorithmic} |
448 \end{algorithmic} |
488 \end{algorithm} |
449 \end{algorithm} |
489 |
450 |
490 |
451 |
491 The initial state $s_0$ is associated with $M(s_0)=\emptyset$, i.e. it |
452 For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of |
492 starts with an empty mapping. |
453 candidate node pairs for adding to the current mapping $\mathfrak{m}_s$. |
493 |
454 |
494 For each state $s$, the algorithm computes $P(s)$, the set of |
455 For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and |
495 candidate node pairs for adding to the current state $s$. |
456 $Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and |
496 |
457 the latter is false, the whole process is recursively applied to |
497 For each pair $p$ in $P(s)$, $Cons_{PT}(p,M(s))$ and |
458 $extend(\mathfrak{m},p)$. Otherwise, $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it |
498 $Cut_{PT}(p,M(s))$ are evaluated. If $Cons_{PT}(p,M(s))$ is true and |
459 can be proved that $\mathfrak{m}$ can not be extended to a whole mapping. |
499 $Cut_{PT}(p,M(s))$ is false, the successor state $\tilde{s}=s\cup |
|
500 \{p\}$ is computed, and the whole process is recursively applied to |
|
501 $\tilde{s}$. Otherwise, $\tilde{s}$ is not consistent by $PT$, or it |
|
502 can be proved that $s$ can not be extended to a whole mapping. |
|
503 |
460 |
504 In order to make sure of the correctness, see |
461 In order to make sure of the correctness, see |
505 \begin{claim} |
462 \begin{claim} |
506 Through consistent mappings, only consistent whole mappings can be |
463 Through consistent mappings, only consistent whole mappings can be |
507 reached, and all of the whole mappings are reachable through |
464 reached, and all the consistent whole mappings are reachable through |
508 consistent mappings. |
465 consistent mappings. |
509 \end{claim} |
466 \end{claim} |
510 |
467 |
511 Note that a state may be reached in exponentially many different ways, since the |
468 Note that a mapping may be reached in exponentially many different ways, since the |
512 order of insertions into $M$ does not influence the nascent mapping. |
469 order of extensions does not influence the nascent mapping. |
513 |
470 |
514 However, one may observe |
471 However, one may observe |
515 |
472 |
516 \begin{claim} |
473 \begin{claim} |
517 \label{claim:claimTotOrd} |
474 \label{claim:claimTotOrd} |
518 Let $\prec$ an arbitrary total ordering relation on $V_{small}$. If |
475 Let $\prec$ be an arbitrary total ordering relation on $V_{1}$. If |
519 the algorithm ignores each $p=(u,v) \in P(s)$, for which |
476 the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$, for which |
520 \begin{center} |
477 \begin{center} |
521 $\exists (\tilde{u},\tilde{v})\in P(s): \tilde{u} \prec u$, |
478 $\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$, |
522 \end{center} |
479 \end{center} |
523 then no state can be reached more than once, and each state associated |
480 then no mapping can be reached more than once, and each whole mapping remains reachable. |
524 with a whole mapping remains reachable. |
|
525 \end{claim} |
481 \end{claim} |
526 |
482 |
527 Note that the cornerstone of the improvements to VF2 is a proper |
483 Note that the cornerstone of the improvements to VF2 is a proper |
528 choice of a total ordering. |
484 choice of a total ordering. |
529 |
485 |
530 \subsection{The candidate set P(s)} |
486 \subsection{The candidate set} |
531 \label{candidateComputingVF2} |
487 \label{candidateComputingVF2} |
532 Let $P(s)$ be the set of the candidate pairs for inclusion in $M(s)$. |
488 Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$. |
533 |
489 |
534 \begin{notation} |
490 \begin{notation} |
535 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by |
491 Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\backslash\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and |
536 $M(s)\wedge\exists \tilde{u}\in{V_{small}: (u,\tilde{u})\in E_{small}} |
492 $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\backslash\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$. |
537 \wedge \tilde{u}$ is covered by $M(s)\}$, and |
|
538 \\ $\mathbf{T_{large}(s)}\!:=\!\{v \in\!V_{large}\!:\!v$ is not |
|
539 covered by |
|
540 $M(s)\wedge\!\exists\tilde{v}\!\in\!{V_{large}\!:\!(v,\tilde{v})\in\!E_{large}} |
|
541 \wedge \tilde{v}$ is covered by $M(s)\}$ |
|
542 \end{notation} |
493 \end{notation} |
543 |
494 |
544 The set $P(s)$ includes the pairs of uncovered neighbours of covered |
495 The set $P_\mathfrak{m}$ includes the pairs of uncovered neighbours of covered |
545 nodes, and if there is not such a node pair, all the pairs containing |
496 nodes, and if there is not such a node pair, all the pairs containing |
546 two uncovered nodes are added. Formally, let |
497 two uncovered nodes are added. Formally, let |
547 \[ |
498 \[ |
548 P(s)\!=\! |
499 P_\mathfrak{m}\!=\! |
549 \begin{cases} |
500 \begin{cases} |
550 T_{small}(s)\times T_{large}(s)&\hspace{-0.15cm}\text{if } |
501 T_{1}(\mathfrak{m})\times T_{2}(\mathfrak{m})&\hspace{-0.15cm}\text{if } |
551 T_{small}(s)\!\neq\!\emptyset\!\wedge\!T_{large}(s)\!\neq |
502 T_{1}(\mathfrak{m})\!\neq\!\emptyset\ \text{and }T_{2}(\mathfrak{m})\!\neq |
552 \emptyset,\\ (V_{small}\!\setminus\!M_{small}(s))\!\times\!(V_{large}\!\setminus\!M_{large}(s)) |
503 \emptyset,\\ (V_{1}\!\setminus\!\mathfrak{D}(\mathfrak{m}))\!\times\!(V_{2}\!\setminus\!\mathfrak{R}(\mathfrak{m})) |
553 &\hspace{-0.15cm}otherwise. |
504 &\hspace{-0.15cm}\text{otherwise}. |
554 \end{cases} |
505 \end{cases} |
555 \] |
506 \] |
556 |
507 |
557 \subsection{Consistency} |
508 \subsection{Consistency} |
558 Suppose $p=(u,v)$, where $u\in V_{small}$ and $v\in V_{large}$, $s$ is |
509 Suppose $p=(u,v)$, where $u\in V_{1}$ and $v\in V_{2}$, $\mathfrak{m}$ is a consistent mapping by |
559 a state of the matching procedure, $M(s)$ is consistent mapping by |
510 $PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether |
560 $PT$ and $lab(u)=lab(v)$. $Cons_{PT}(p,M(s))$ checks whether |
511 including pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$. |
561 including pair $p$ into $M(s)$ leads to a consistent mapping by $PT$. |
|
562 |
512 |
563 For example, the consistency function of induced subgraph isomorphism is as follows. |
513 For example, the consistency function of induced subgraph isomorphism is as follows. |
564 \begin{notation} |
514 \begin{notation} |
565 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} : |
515 Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} : |
566 (u,\tilde{u})\in E_{small}\}$, and $\mathbf{\Gamma_{large} |
516 (u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2} |
567 (v)}:=\{\tilde{v}\in V_{large} : (v,\tilde{v})\in E_{large}\}$, where $u\in V_{small}$ and $v\in V_{large}$. |
517 (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$. |
568 \end{notation} |
518 \end{notation} |
569 |
519 |
570 $M(s)\cup \{(u,v)\}$ is a consistent mapping by $IND$ $\Leftrightarrow |
520 $extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ $\Leftrightarrow |
571 (\forall \tilde{u}\in M_{small}: (u,\tilde{u})\in E_{small} |
521 (\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1} |
572 \Leftrightarrow (v,Pair(M(s),\tilde{u}))\in E_{large})$. The |
522 \Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. The |
573 following formulation gives an efficient way of calculating |
523 following formulation gives an efficient way of calculating |
574 $Cons_{IND}$. |
524 $Cons_{IND}$. |
575 \begin{claim} |
525 \begin{claim} |
576 $Cons_{IND}((u,v),M(s)):=(\forall \tilde{v}\in \Gamma_{large}(v) |
526 $Cons_{IND}((u,v),\mathfrak{m}):=\mathcal{L}(u)\!\!=\!\!\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge |
577 \ \cap\ M_{large}(s):\\(Pair(M(s),\tilde{v}),u)\in E_{small})\wedge |
527 (\forall \tilde{u}\in \Gamma_{1}(u) |
578 (\forall \tilde{u}\in \Gamma_{small}(u) |
528 \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is a |
579 \ \cap\ M_{small}(s):(v,Pair(M(s),\tilde{u}))\in E_{large})$ is a |
|
580 consistency function in the case of $IND$. |
529 consistency function in the case of $IND$. |
581 \end{claim} |
530 \end{claim} |
582 |
531 |
583 \subsection{Cutting rules} |
532 \subsection{Cutting rules} |
584 $Cut_{PT}(p,M(s))$ is defined by a collection of efficiently |
533 $Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently |
585 verifiable conditions. The requirement is that $Cut_{PT}(p,M(s))$ can |
534 verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can |
586 be true only if it is impossible to extended $M(s)\cup \{p\}$ to a |
535 be true only if it is impossible to extend $extend(\mathfrak{m},p)$ to a |
587 whole mapping. |
536 whole mapping. |
588 |
537 |
589 As an example, the cutting function of induced subgraph isomorphism is presented. |
538 As an example, the cutting function of induced subgraph isomorphism is presented. |
590 \begin{notation} |
539 \begin{notation} |
591 Let $\mathbf{\tilde{T}_{small}}(s):=(V_{small}\backslash |
540 Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\backslash |
592 M_{small}(s))\backslash T_{small}(s)$, and |
541 \mathfrak{D}(\mathfrak{m}))\backslash T_{1}(\mathfrak{m})$, and |
593 \\ $\mathbf{\tilde{T}_{large}}(s):=(V_{large}\backslash |
542 \\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\backslash |
594 M_{large}(s))\backslash T_{large}(s)$. |
543 \mathfrak{R}(\mathfrak{m}))\backslash T_{2}(\mathfrak{m})$. |
595 \end{notation} |
544 \end{notation} |
596 |
545 |
597 \begin{claim} |
546 \begin{claim} |
598 $Cut_{IND}((u,v),M(s)):= |\Gamma_{large} (v)\ \cap\ T_{large}(s)| < |
547 $Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| < |
599 |\Gamma_{small} (u)\ \cap\ T_{small}(s)| \vee |\Gamma_{large}(v)\cap |
548 |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap |
600 \tilde{T}_{large}(s)| < |\Gamma_{small}(u)\cap |
549 \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap |
601 \tilde{T}_{small}(s)|$ is a cutting function by $IND$. |
550 \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function by $IND$. |
602 \end{claim} |
551 \end{claim} |
603 |
|
604 |
552 |
605 \section{The VF2++ Algorithm} |
553 \section{The VF2++ Algorithm} |
606 Although any total ordering relation makes the search space of VF2 a |
554 Although any total ordering relation makes the search space of VF2 a |
607 tree, its choice turns out to dramatically influence the number of |
555 tree, its choice turns out to dramatically influence the number of |
608 visited states. The goal is to determine an efficient one as quickly |
556 visited states. The goal is to determine an efficient one as quickly |
623 |
571 |
624 Note that a weaker version of the cutting rules and the more efficient |
572 Note that a weaker version of the cutting rules and the more efficient |
625 candidate set calculating were described in \cite{VF2Plus}, too. |
573 candidate set calculating were described in \cite{VF2Plus}, too. |
626 |
574 |
627 It should be noted that all the methods described in this section are |
575 It should be noted that all the methods described in this section are |
628 extendable to handle directed graphs and edge labels as well. |
576 extendable to handle directed graphs and edge labels as well.\newline |
629 |
577 |
630 The basic ideas and the detailed description of VF2++ are provided in |
578 The basic ideas and the detailed description of VF2++ are provided in |
631 the following. |
579 the following. |
632 |
580 |
633 \subsection{Preparations} |
|
634 \begin{claim} |
|
635 \label{claim:claimCoverFromLeft} |
|
636 The total ordering relation uniquely determines a node order, in which |
|
637 the nodes of $V_{small}$ will be covered by VF2. From the point of |
|
638 view of the matching procedure, this means, that always the same node |
|
639 of $G_{small}$ will be covered on the d-th level. |
|
640 \end{claim} |
|
641 |
|
642 \begin{definition} |
|
643 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{small}|)})$ of |
|
644 $V_{small}$ is \textbf{matching order} if exists $\prec$ total |
|
645 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds |
|
646 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{small}|\}$. |
|
647 \end{definition} |
|
648 |
|
649 \begin{claim}\label{claim:MOclaim} |
|
650 A total ordering is matching order iff the nodes of every component |
|
651 form an interval in the node sequence, and every node connects to a |
|
652 previous node in its component except the first node of each component. |
|
653 \end{claim} |
|
654 |
|
655 To summing up, a total ordering always uniquely determines a matching |
|
656 order, and every matching order can be determined by a total ordering, |
|
657 however, more than one different total orderings may determine the |
|
658 same matching order. |
|
659 \subsection{Idea behind the algorithm} |
|
660 The goal is to find a matching order in which the algorithm is able to |
581 The goal is to find a matching order in which the algorithm is able to |
661 recognize inconsistency or prune the infeasible branches on the |
582 recognize inconsistency or prune the infeasible branches on the |
662 highest levels and goes deep only if it is needed. |
583 highest levels and goes deep only if it is needed. |
663 |
584 |
664 \begin{notation} |
585 \begin{notation} |
665 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{small}(u)\cap H\}|$, that is the |
586 Let $\mathbf{Conn_{H}(u)}:=|\Gamma_{1}(u)\cap H\}|$, that is the |
666 number of neighbours of u which are in H, where $u\in V_{small} $ and |
587 number of neighbours of u which are in H, where $u\in V_{1} $ and |
667 $H\subseteq V_{small}$. |
588 $H\subseteq V_{1}$. |
668 \end{notation} |
589 \end{notation} |
669 |
590 |
670 The principal question is the following. Suppose a state $s$ is |
591 The principal question is the following. Suppose a mapping $\mathfrak{m}$ is |
671 given. For which node of $T_{small}(s)$ is the hardest to find a |
592 given. For which node of $T_{1}(\mathfrak{m})$ is the hardest to find a |
672 consistent pair in $G_{large}$? The more covered neighbours a node in |
593 consistent pair in $G_{2}$? The more covered neighbours a node in |
673 $T_{small}(s)$ has --- i.e. the largest $Conn_{M_{small}(s)}$ it has |
594 $T_{1}(\mathfrak{m})$ has --- i.e. the largest $Conn_{\mathfrak{D}(\mathfrak{m})}$ it has |
674 ---, the more rarely satisfiable consistency constraints for its pair |
595 ---, the more rarely satisfiable consistency constraints for its pair |
675 are given. |
596 are given. |
676 |
597 |
677 In biology, most of the graphs are sparse, thus several nodes in |
598 In biology, most of the graphs are sparse, thus several nodes in |
678 $T_{small}(s)$ may have the same $Conn_{M_{small}(s)}$, which makes |
599 $T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes |
679 reasonable to define a secondary and a tertiary order between them. |
600 reasonable to define a secondary and a tertiary order between them. |
680 The observation above proves itself to be as determining, that the |
601 The observation above proves itself to be as determining, that the |
681 secondary ordering prefers nodes with the most uncovered neighbours |
602 secondary ordering prefers nodes with the most uncovered neighbours |
682 among which have the same $Conn_{M_{small}(s)}$ to increase |
603 among which have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$ to increase |
683 $Conn_{M_{small}(s)}$ of uncovered nodes so much, as possible. The |
604 $Conn_{\mathfrak{D}(\mathfrak{m})}$ of uncovered nodes so much, as possible. The |
684 tertiary ordering prefers nodes having the rarest uncovered labels. |
605 tertiary ordering prefers nodes having the rarest uncovered labels. |
685 |
606 |
686 Note that the secondary ordering is the same as the ordering by $deg$, |
607 Note that the secondary ordering is the same as the ordering by $deg$, |
687 which is a static data in front of the above used. |
608 which is a static data in front of the above used. |
688 |
609 |
689 These rules can easily result in a matching order which contains the |
610 These rules can easily result in a matching order which contains the |
690 nodes of a long path successively, whose nodes may have low $Conn$ and |
611 nodes of a long path successively, whose nodes may have low $Conn$ and |
691 is easily matchable into $G_{large}$. To avoid that, a BFS order is |
612 is easily matchable into $G_{2}$. To avoid that, a BFS order is |
692 used, which provides the shortest possible paths. |
613 used, which provides the shortest possible paths. |
693 \newline |
614 \newline |
694 |
615 |
695 In the following, some examples on which the VF2 may be slow are |
616 In the following, some examples on which the VF2 may be slow are |
696 described, although they are easily solvable by using a proper |
617 described, although they are easily solvable by using a proper |
697 matching order. |
618 matching order. |
698 |
619 |
699 \begin{example} |
620 \begin{example} |
700 Suppose $G_{small}$ can be mapped into $G_{large}$ in many ways |
621 Suppose $G_{1}$ can be mapped into $G_{2}$ in many ways |
701 without node labels. Let $u\in V_{small}$ and $v\in V_{large}$. |
622 without node labels. Let $u\in V_{1}$ and $v\in V_{2}$. |
702 \newline |
623 \newline |
703 $lab(u):=black$ |
624 $\mathcal{L}(u):=black$ |
704 \newline |
625 \newline |
705 $lab(v):=black$ |
626 $\mathcal{L}(v):=black$ |
706 \newline |
627 \newline |
707 $lab(\tilde{u}):=red \ \forall \tilde{u}\in (V_{small}\backslash |
628 $\mathcal{L}(\tilde{u}):=red \ \forall \tilde{u}\in (V_{1}\backslash |
708 \{u\})$ |
629 \{u\})$ |
709 \newline |
630 \newline |
710 $lab(\tilde{v}):=red \ \forall \tilde{v}\in (V_{large}\backslash |
631 $\mathcal{L}(\tilde{v}):=red \ \forall \tilde{v}\in (V_{2}\backslash |
711 \{v\})$ |
632 \{v\})$ |
712 \newline |
633 \newline |
713 |
634 |
714 Now, any mapping by the node label $lab$ must contain $(u,v)$, since |
635 Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since |
715 $u$ is black and no node in $V_{large}$ has a black label except |
636 $u$ is black and no node in $V_{2}$ has a black label except |
716 $v$. If unfortunately $u$ were the last node which will get covered, |
637 $v$. If unfortunately $u$ were the last node which will get covered, |
717 VF2 would check only in the last steps, whether $u$ can be matched to |
638 VF2 would check only in the last steps, whether $u$ can be matched to |
718 $v$. |
639 $v$. |
719 \newline |
640 \newline |
720 However, had $u$ been the first matched node, u would have been |
641 However, had $u$ been the first matched node, u would have been |
721 matched immediately to v, so all the mappings would have been |
642 matched immediately to v, so all the mappings would have been |
722 precluded in which node labels can not correspond. |
643 precluded in which node labels can not correspond. |
723 \end{example} |
644 \end{example} |
724 |
645 |
725 \begin{example} |
646 \begin{example} |
726 Suppose there is no node label given, $G_{small}$ is a small graph and |
647 Suppose there is no node label given, $G_{1}$ is a small graph and |
727 can not be mapped into $G_{large}$ and $u\in V_{small}$. |
648 can not be mapped into $G_{2}$ and $u\in V_{1}$. |
728 \newline |
649 \newline |
729 Let $G'_{small}:=(V_{small}\cup |
650 Let $G'_{1}:=(V_{1}\cup |
730 \{u'_{1},u'_{2},..,u'_{k}\},E_{small}\cup |
651 \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup |
731 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is, |
652 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is, |
732 $G'_{small}$ is $G_{small}\cup \{ a\ k$ long path, which is disjoint |
653 $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint |
733 from $G_{small}$ and one of its starting points is connected to $u\in |
654 from $G_{1}$ and one of its starting points is connected to $u\in |
734 V_{small}\}$. |
655 V_{1}\}$. |
735 \newline |
656 \newline |
736 Is there a subgraph of $G_{large}$, which is isomorph with |
657 Is there a subgraph of $G_{2}$, which is isomorph with |
737 $G'_{small}$? |
658 $G'_{1}$? |
738 \newline |
659 \newline |
739 If unfortunately the nodes of the path were the first $k$ nodes in the |
660 If unfortunately the nodes of the path were the first $k$ nodes in the |
740 matching order, the algorithm would iterate through all the possible k |
661 matching order, the algorithm would iterate through all the possible k |
741 long paths in $G_{large}$, and it would recognize that no path can be |
662 long paths in $G_{2}$, and it would recognize that no path can be |
742 extended to $G'_{small}$. |
663 extended to $G'_{1}$. |
743 \newline |
664 \newline |
744 However, had it started by the matching of $G_{small}$, it would not |
665 However, had it started by the matching of $G_{1}$, it would not |
745 have matched any nodes of the path. |
666 have matched any nodes of the path. |
746 \end{example} |
667 \end{example} |
747 |
668 |
748 These examples may look artificial, but the same problems also appear |
669 These examples may look artificial, but the same problems also appear |
749 in real-world instances, even though in a less obvious way. |
670 in real-world instances, even though in a less obvious way. |
750 |
671 |
|
672 \subsection{Preparations} |
|
673 \begin{claim} |
|
674 \label{claim:claimCoverFromLeft} |
|
675 The total ordering relation uniquely determines a node order, in which |
|
676 the nodes of $V_{1}$ will be covered by VF2. From the point of |
|
677 view of the matching procedure, this means, that always the same node |
|
678 of $G_{1}$ will be covered on the d-th level. |
|
679 \end{claim} |
|
680 |
|
681 \begin{definition} |
|
682 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of |
|
683 $V_{1}$ is \textbf{matching order} if exists $\prec$ total |
|
684 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds |
|
685 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{1}|\}$. |
|
686 \end{definition} |
|
687 |
|
688 \begin{claim}\label{claim:MOclaim} |
|
689 A total ordering is matching order iff the nodes of every component |
|
690 form an interval in the node sequence, and every node connects to a |
|
691 previous node in its component except the first node of each component. |
|
692 \end{claim} |
|
693 |
|
694 To summing up, a total ordering always uniquely determines a matching |
|
695 order, and every matching order can be determined by a total ordering, |
|
696 however, more than one different total orderings may determine the |
|
697 same matching order. |
|
698 |
751 \subsection{Total ordering} |
699 \subsection{Total ordering} |
752 Instead of the total ordering relation, the matching order will be |
700 The matching order will be searched directly. |
753 searched directly. |
|
754 \begin{notation} |
701 \begin{notation} |
755 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{large} : |
702 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} : |
756 l=lab(v)\}|-|\{u\in V_{small}\backslash \mathcal{M} : l=lab(u)\}|$ , |
703 l=\mathcal{L}(v)\}|-|\{u\in V_{1}\backslash \mathcal{M} : l=\mathcal{L}(u)\}|$ , |
757 where $l$ is a label and $\mathcal{M}\subseteq V_{small}$. |
704 where $l$ is a label and $\mathcal{M}\subseteq V_{1}$. |
758 \end{notation} |
705 \end{notation} |
759 |
706 |
760 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$. |
707 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$. |
761 \end{definition} |
708 \end{definition} |
762 |
709 |
805 provides a matching order. |
752 provides a matching order. |
806 |
753 |
807 |
754 |
808 \subsection{Cutting rules} |
755 \subsection{Cutting rules} |
809 \label{VF2PPCuttingRules} |
756 \label{VF2PPCuttingRules} |
810 This section presents the cutting rule of VF2++ in the case of IND, which is improved |
757 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels. |
811 by using extra information coming from the node labels. For other problem types, the rules can be formulated similarly. |
|
812 \begin{notation} |
758 \begin{notation} |
813 Let $\mathbf{\Gamma_{small}^{l}(u)}:=\{\tilde{u} : lab(\tilde{u})=l |
759 Let $\mathbf{\Gamma_{1}^{l}(u)}:=\{\tilde{u} : \mathcal{L}(\tilde{u})=l |
814 \wedge \tilde{u}\in \Gamma_{small} (u)\}$ and |
760 \wedge \tilde{u}\in \Gamma_{1} (u)\}$ and |
815 $\mathbf{\Gamma_{large}^{l}(v)}:=\{\tilde{v} : lab(\tilde{v})=l \wedge |
761 $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge |
816 \tilde{v}\in \Gamma_{large} (v)\}$, where $u\in V_{small}$, $v\in |
762 \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in |
817 V_{large}$ and $l$ is a label. |
763 V_{2}$ and $l$ is a label. |
818 \end{notation} |
764 \end{notation} |
819 |
765 |
|
766 \subsubsection{Induced subgraph isomorphism} |
820 \begin{claim} |
767 \begin{claim} |
821 \[LabCut_{IND}((u,v),M(s))\!:=\!\!\!\!\!\bigvee_{l\ is\ label}\!\!\!\!\!\!\!|\Gamma_{large}^{l} (v) \cap T_{large}(s)|\!<\!|\Gamma_{small}^{l}(u)\cap T_{small}(s)|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{large}^{l}(v)\cap \tilde{T}_{large}(s)| < |\Gamma_{small}^{l}(u)\cap \tilde{T}_{small}(s)|\] is a cutting function by IND. |
768 \[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by IND. |
822 \end{claim} |
769 \end{claim} |
823 |
770 \subsubsection{Graph isomorphism} |
824 |
771 \begin{claim} |
825 \subsection{Implementation details} |
772 \[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by ISO. |
|
773 \end{claim} |
|
774 |
|
775 \subsubsection{Subgraph isomorphism} |
|
776 \begin{claim} |
|
777 \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function by SUB. |
|
778 \end{claim} |
|
779 |
|
780 |
|
781 |
|
782 \section{Implementation details} |
826 This section provides a detailed summary of an efficient |
783 This section provides a detailed summary of an efficient |
827 implementation of VF2++. |
784 implementation of VF2++. |
828 \subsubsection{Storing a mapping} |
785 \subsubsection{Storing a mapping} |
829 After fixing an arbitrary node order ($u_0, u_1, .., |
786 After fixing an arbitrary node order ($u_0, u_1, .., |
830 u_{|G_{small}|-1}$) of $G_{small}$, an array $M$ is usable to store |
787 u_{|G_{1}|-1}$) of $G_{1}$, an array $M$ is usable to store |
831 the current mapping in the following way. |
788 the current mapping in the following way. |
832 \[ |
789 \[ |
833 M[i] = |
790 M[i] = |
834 \begin{cases} |
791 \begin{cases} |
835 v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INVALID & |
792 v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INV\!ALI\!D & |
836 if\ no\ node\ has\ been\ mapped\ to\ u_i, |
793 if\ no\ node\ has\ been\ mapped\ to\ u_i, |
837 \end{cases} |
794 \end{cases} |
838 \] |
795 \] |
839 where $i\in\{0,1, ..,|G_{small}|-1\}$, $v\in V_{large}$ and $INVALID$ |
796 where $i\in\{0,1, ..,|G_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$ |
840 means "no node". |
797 means "no node". |
841 \subsubsection{Avoiding the recurrence} |
798 \subsubsection{Avoiding the recurrence} |
842 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized |
799 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized |
843 as a \textit{while loop}, which has a loop counter $depth$ denoting the |
800 as a \textit{while loop}, which has a loop counter $depth$ denoting the |
844 all-time depth of the recursion. Fixing a matching order, let $M$ |
801 all-time depth of the recursion. Fixing a matching order, let $M$ |
845 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft}, |
802 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft}, |
846 $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before |
803 $M$ is $INV\!ALI\!D$ from index $depth$+1 and not $INV\!ALI\!D$ before |
847 $depth$. $M[depth]$ changes |
804 $depth$. $M[depth]$ changes |
848 while the state is being processed, but the property is held before |
805 while the state is being processed, but the property is held before |
849 both stepping back to a predecessor state and exploring a successor |
806 both stepping back to a predecessor state and exploring a successor |
850 state. |
807 state. |
851 |
808 |
877 |
834 |
878 For using lookup tables, the node labels are associated with the |
835 For using lookup tables, the node labels are associated with the |
879 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It |
836 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It |
880 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order |
837 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order |
881 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes |
838 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes |
882 in $V_{small}$ having label i, which is easy to compute in |
839 in $V_{1}$ having label $i$, which is easy to compute in |
883 $\Theta(|V_{small}|)$ steps. |
840 $\Theta(|V_{1}|)$ steps. |
884 |
841 |
885 Representing $\mathcal{M}\subseteq V_{small}$ as an array of |
842 Representing $\mathcal{M}\subseteq V_{1}$ as an array of |
886 size $|V_{small}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes. |
843 size $|V_{1}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes. |
887 |
844 |
888 \subsubsection{Cutting rules} |
845 \subsubsection{Cutting rules} |
889 In Section~\ref{VF2PPCuttingRules}, the cutting rules were |
846 In Section~\ref{VF2PPCuttingRules}, the cutting rules were |
890 described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$ |
847 described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$ |
891 and $\tilde T_{large}$, which are dependent on the all-time mapping |
848 and $\tilde T_{2}$, which are dependent on the all-time mapping |
892 (i.e. on the all-time state). The aim is to check the labeled cutting |
849 (i.e. on the all-time state). The aim is to check the labeled cutting |
893 rules of VF2++ in $\Theta(deg)$ time. |
850 rules of VF2++ in $\Theta(deg)$ time. |
894 |
851 |
895 Firstly, suppose that these four sets are given in such a way, that |
852 Firstly, suppose that these four sets are given in such a way, that |
896 checking whether a node is in a certain set takes constant time, |
853 checking whether a node is in a certain set takes constant time, |
897 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an |
854 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an |
898 initially zero integer lookup table of size $|K|$. After incrementing |
855 initially zero integer lookup table of size $|K|$. After incrementing |
899 $L[lab(u')]$ for all $u'\in \Gamma_{small}(u) \cap T_{small}(s)$ and |
856 $L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and |
900 decrementing $L[lab(v')]$ for all $v'\in\Gamma_{large} (v) \cap |
857 decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap |
901 T_{large}(s)$, the first part of the cutting rules is checkable in |
858 T_{2}(s)$, the first part of the cutting rules is checkable in |
902 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$ |
859 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$ |
903 to zero takes $\Theta(deg)$ time again, which makes it possible to use |
860 to zero takes $\Theta(deg)$ time again, which makes it possible to use |
904 the same table through the whole algorithm. The second part of the |
861 the same table through the whole algorithm. The second part of the |
905 cutting rules can be verified using the same method with $\tilde |
862 cutting rules can be verified using the same method with $\tilde |
906 T_{small}$ and $\tilde T_{large}$ instead of $T_{small}$ and |
863 T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and |
907 $T_{large}$. Thus, the overall complexity is $\Theta(deg)$. |
864 $T_{2}$. Thus, the overall complexity is $\Theta(deg)$. |
908 |
865 |
909 An other integer lookup table storing the number of covered neighbours |
866 Another integer lookup table storing the number of covered neighbours |
910 of each node in $G_{large}$ gives all the information about the sets |
867 of each node in $G_{2}$ gives all the information about the sets |
911 $T_{large}$ and $\tilde T_{large}$, which is maintainable in |
868 $T_{2}$ and $\tilde T_{2}$, which is maintainable in |
912 $\Theta(deg)$ time when a pair is added or substracted by incrementing |
869 $\Theta(deg)$ time when a pair is added or substracted by incrementing |
913 or decrementing the proper indices. A further improvement is that the |
870 or decrementing the proper indices. A further improvement is that the |
914 values of $L[lab(u')]$ in case of checking $u$ is dependent only on |
871 values of $L[\mathcal{L}(u')]$ in case of checking $u$ are dependent only on |
915 $u$, i.e. on the size of the mapping, so for each $u\in V_{small}$ an |
872 $u$, i.e. on the size of the mapping, so for each $u\in V_{1}$ an |
916 array of pairs (label, number of such labels) can be stored to skip |
873 array of pairs (label, number of such labels) can be stored to skip |
917 the maintaining operations. Note that these arrays are at most of size |
874 the maintaining operations. Note that these arrays are at most of size |
918 $deg$. Skipping this trick, the number of covered neighbours has to be |
875 $deg$. |
919 stored for each node of $G_{small}$ as well to get the sets |
876 |
920 $T_{small}$ and $\tilde T_{small}$. |
877 Using similar techniques, the consistency function can be evaluated in |
921 |
|
922 Using similar tricks, the consistency function can be evaluated in |
|
923 $\Theta(deg)$ steps, as well. |
878 $\Theta(deg)$ steps, as well. |
924 |
879 |
925 \section{The VF2 Plus Algorithm} |
|
926 The VF2 Plus algorithm is a recently improved version of VF2. It was |
|
927 compared with the state of the art algorithms in \cite{VF2Plus} and |
|
928 has proved itself to be competitive with RI, the best algorithm on |
|
929 biological graphs. \\ A short summary of VF2 Plus follows, which uses |
|
930 the notation and the conventions of the original paper. |
|
931 |
|
932 \subsection{Ordering procedure} |
|
933 VF2 Plus uses a sorting procedure that prefers nodes in $V_{small}$ |
|
934 with the lowest probability to find a pair in $V_{small}$ and the |
|
935 highest number of connections with the nodes already sorted by the |
|
936 algorithm. |
|
937 |
|
938 \begin{definition} |
|
939 $(u,v)$ is a \textbf{feasible pair} if $lab(u)=lab(v)$ and |
|
940 $deg(u)\leq deg(v)$, where $u\in{V_{small}}$ and $ v\in{V_{large}}$. |
|
941 \end{definition} |
|
942 $P_{lab}(L):=$ a priori probability to find a node with label $L$ in |
|
943 $V_{large}$ |
|
944 \newline |
|
945 $P_{deg}(d):=$ a priori probability to find a node with degree $d$ in |
|
946 $V_{large}$ |
|
947 \newline |
|
948 $P(u):=P_{lab}(L)*\bigcup_{d'>d}P_{deg}(d')$\\ $M$ is the set of |
|
949 already sorted nodes, $T$ is the set of nodes candidate to be |
|
950 selected, and $degreeM$ of a node is the number of its neighbours in |
|
951 $M$. |
|
952 \begin{algorithm} |
|
953 \algtext*{EndIf}%ne nyomtasson end if-et |
|
954 \algtext*{EndFor}%nenyomtasson .. |
|
955 \algtext*{EndProcedure}%ne nyomtasson .. |
|
956 \algtext*{EndWhile} |
|
957 \caption{}\label{alg:VF2PlusPseu} |
|
958 \begin{algorithmic}[1] |
|
959 \Procedure{VF2 Plus order}{} \State Select the node with the lowest |
|
960 $P$. \If {more nodes share the same $P$} \State select the one with |
|
961 maximum degree \EndIf \If {more nodes share the same $P$ and have the |
|
962 max degree} \State select the first \EndIf \State Put the selected |
|
963 node in the set $M$. \label{alg:putIn} \State Put all its unsorted |
|
964 neighbours in the set $T$. \If {$M\neq V_{small}$} \State From set |
|
965 $T$ select the node with maximum $degreeM$. \If {more nodes have |
|
966 maximum $degreeM$} \State Select the one with the lowest $P$ \EndIf |
|
967 \If {more nodes have maximum $degreeM$ and $P$} \State Select the |
|
968 first. \EndIf \State \textbf{goto \ref{alg:putIn}.} \EndIf |
|
969 \EndProcedure |
|
970 \end{algorithmic} |
|
971 \end{algorithm} |
|
972 |
|
973 Using these notations, Algorithm~\ref{alg:VF2PlusPseu} |
|
974 provides the description of the sorting procedure. |
|
975 |
|
976 Note that $P(u)$ is not the exact probability of finding a consistent |
|
977 pair for $u$ by choosing a node of $V_{large}$ randomly, since |
|
978 $P_{lab}$ and $P_{deg}$ are not independent, though calculating the |
|
979 real probability would take quadratic time, which may be reduced by |
|
980 using fittingly lookup tables. |
|
981 |
|
982 \section{Experimental results} |
880 \section{Experimental results} |
983 This section compares the performance of VF2++ and VF2 Plus. Both |
881 This section compares the performance of VF2++ and VF2 Plus. According to |
984 algorithms have run faster with orders of magnitude than VF2, thus its |
882 our experience, both algorithms run faster than VF2 with orders of |
985 inclusion was not reasonable. |
883 magnitude, thus its inclusion was not reasonable. |
|
884 |
|
885 The algorithms were implemented in C++ using the open source |
|
886 LEMON graph and network optimization library\cite{LEMON}. The test were carried out on a linux based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM. |
986 \subsection{Biological graphs} |
887 \subsection{Biological graphs} |
987 The tests have been executed on a recent biological dataset created |
888 The tests have been executed on a recent biological dataset created |
988 for the International Contest on Pattern Search in Biological |
889 for the International Contest on Pattern Search in Biological |
989 Databases\cite{Content}, which has been constructed of molecule, |
890 Databases\cite{Content}, which has been constructed of molecule, |
990 protein and contact map graphs extracted from the Protein Data |
891 protein and contact map graphs extracted from the Protein Data |