170 Complex biological systems arise from the interaction and cooperation |
170 Complex biological systems arise from the interaction and cooperation |
171 of plenty of molecular components. Getting acquainted with such |
171 of plenty of molecular components. Getting acquainted with such |
172 systems at the molecular level is of primary importance, since |
172 systems at the molecular level is of primary importance, since |
173 protein-protein interaction, DNA-protein interaction, metabolic |
173 protein-protein interaction, DNA-protein interaction, metabolic |
174 interaction, transcription factor binding, neuronal networks, and |
174 interaction, transcription factor binding, neuronal networks, and |
175 hormone signaling networks can be understood this way. |
175 hormone signalling networks can be understood this way. |
176 |
176 |
177 Many chemical and biological structures can easily be modeled |
177 Many chemical and biological structures can easily be modelled |
178 as graphs, for instance, a molecular structure can be |
178 as graphs, for instance, a molecular structure can be |
179 considered as a graph, whose nodes correspond to atoms and whose |
179 considered as a graph, whose nodes correspond to atoms and whose |
180 edges to chemical bonds. The similarity and dissimilarity of |
180 edges to chemical bonds. The similarity and dissimilarity of |
181 objects corresponding to nodes are incorporated to the model |
181 objects corresponding to nodes are incorporated to the model |
182 by \emph{node labels}. Understanding such networks basically |
182 by \emph{node labels}. Understanding such networks basically |
183 requires finding specific subgraphs, thus it calls for efficient |
183 requires finding specific subgraphs, thus it calls for efficient |
184 graph matching algorithms. |
184 graph matching algorithms. |
185 |
185 |
186 Other real-world fields related to some |
186 Other real-world fields related to some |
187 variants of graph matching include pattern recognition |
187 variants of graph matching include pattern recognition |
188 and machine vision \cite{HorstBunkeApplications}, symbol recognition |
188 and machine vision~\cite{HorstBunkeApplications}, symbol recognition~\cite{CordellaVentoSymbolRecognition}, and face identification~\cite{JianzhuangYongFaceIdentification}. \\ |
189 \cite{CordellaVentoSymbolRecognition}, face identification |
|
190 \cite{JianzhuangYongFaceIdentification}. \\ |
|
191 |
189 |
192 Subgraph and induced subgraph matching problems are known to be |
190 Subgraph and induced subgraph matching problems are known to be |
193 NP-Complete\cite{SubgraphNPC}, while the graph isomorphism problem is |
191 NP-Complete~\cite{SubgraphNPC}, while the graph isomorphism problem is |
194 one of the few problems in NP neither known to be in P nor |
192 one of the few problems in NP neither known to be in P nor |
195 NP-Complete. Although polynomial time isomorphism algorithms are known |
193 NP-Complete. Although polynomial-time isomorphism algorithms are known |
196 for various graph classes, like trees and planar |
194 for various graph classes, like trees and planar |
197 graphs\cite{PlanarGraphIso}, bounded valence |
195 graphs~\cite{PlanarGraphIso}, bounded valence |
198 graphs\cite{BondedDegGraphIso}, interval graphs\cite{IntervalGraphIso} |
196 graphs~\cite{BondedDegGraphIso}, interval graphs~\cite{IntervalGraphIso} |
199 or permutation graphs\cite{PermGraphIso}, and recently, an FPT algorithm has been presented for the coloured hypergraph isomorphism problem in \cite{ColoredHiperGraphIso}. |
197 or permutation graphs~\cite{PermGraphIso}. Furthermore, an FPT algorithm has also been presented for the coloured hypergraph isomorphism problem in~\cite{ColoredHiperGraphIso}. |
200 |
198 |
201 In the following, some algorithms based on other approaches are |
199 In the following, some algorithms based on other approaches are |
202 summarized, which do not need any restrictions on the graphs. Even though, |
200 summarized, which do not need any restrictions on the graphs. Even though, |
203 an overall polynomial behaviour is not expectable from such an |
201 an overall polynomial behaviour is not expectable from such an |
204 alternative, it may often have good practical performance, in fact, |
202 alternative, it may often have good practical performance, in fact, |
205 it might be the best choice even on a graph class for which polynomial |
203 it might be the best choice even on a graph class for which polynomial |
206 algorithm is known. |
204 algorithm is known. |
207 |
205 |
208 The first practically usable approach was due to |
206 The first practically usable approach was due to |
209 \emph{Ullmann}\cite{Ullmann} which is a commonly used depth-first |
207 \emph{Ullmann}~\cite{Ullmann}, which is a commonly used algorithm based on depth-first |
210 search based algorithm with a complex heuristic for reducing the |
208 search with a complex heuristic for reducing the |
211 number of visited states. A major problem is its $\Theta(n^3)$ space |
209 number of visited states. A major problem is its $\Theta(n^3)$ space |
212 complexity, which makes it impractical in the case of big sparse |
210 complexity, which makes it impractical in the case of big sparse |
213 graphs. |
211 graphs. |
214 |
212 |
215 In a recent paper, Ullmann\cite{UllmannBit} presents an |
213 In a recent paper, Ullmann~\cite{UllmannBit} presents an |
216 improved version of this algorithm based on a bit-vector solution for |
214 improved version of this algorithm based on a bit-vector solution for |
217 the binary Constraint Satisfaction Problem. |
215 the binary Constraint Satisfaction Problem. |
218 |
216 |
219 The \emph{Nauty} algorithm\cite{Nauty} transforms the two graphs to |
217 The \emph{Nauty} algorithm~\cite{Nauty} transforms the two graphs to |
220 a canonical form before starting to check for the isomorphism. It has |
218 a canonical form before starting to check for isomorphism. It has |
221 been considered as one of the fastest graph isomorphism algorithms, |
219 been considered as one of the fastest graph isomorphism algorithms, |
222 although graph categories were shown in which it takes exponentially |
220 although graph categories were shown in which it takes exponentially |
223 many steps. This algorithm handles only the graph isomorphism problem. |
221 many steps. This algorithm handles only the graph isomorphism problem. |
224 |
222 |
225 The \emph{LAD} algorithm\cite{Lad} uses a depth-first search |
223 The \emph{LAD} algorithm~\cite{Lad} uses a depth-first search |
226 strategy and formulates the matching as a Constraint Satisfaction |
224 strategy and formulates the matching as a Constraint Satisfaction |
227 Problem to prune the search tree. The constraints are that the mapping |
225 Problem to prune the search tree. The constraints are that the mapping |
228 has to be injective and edge-preserving, hence it is possible to |
226 has to be injective and edge-preserving, hence it is possible to |
229 handle new matching types as well. |
227 handle new matching types as well. |
230 |
228 |
231 The \emph{RI} algorithm\cite{RI} and its variations are based on a |
229 The \emph{RI} algorithm~\cite{RI} and its variations are based on a |
232 state space representation. After reordering the nodes of the graphs, |
230 state space representation. After reordering the nodes of the graphs, |
233 it uses some fast executable heuristic checks without using any |
231 it uses some fast executable heuristic checks without using any |
234 complex pruning rules. It seems to run really efficiently on graphs |
232 complex pruning rules. It seems to run really efficiently on graphs |
235 coming from biology, and won the International Contest on Pattern |
233 coming from biology, and won the International Contest on Pattern |
236 Search in Biological Databases\cite{Content}. |
234 Search in Biological Databases~\cite{Content}. |
237 |
235 |
238 The currently most commonly used algorithm is the |
236 Currently, the most commonly used algorithm is the |
239 \emph{VF2}\cite{VF2}, the improved version of \emph{VF}\cite{VF}, which was |
237 \emph{VF2}~\cite{VF2}, the improved version of \emph{VF}~\cite{VF}, which was |
240 designed for solving pattern matching and computer vision problems, |
238 designed for solving pattern matching and computer vision problems, |
241 and has been one of the best overall algorithms for more than a |
239 and has been one of the best overall algorithms for more than a |
242 decade. Although, it can't be up to new specialized algorithms, it is |
240 decade. Although, it is not as fast as some of the new specialized algorithms, it is still widely used due to its simplicity and space efficiency. VF2 uses |
243 still widely used due to its simplicity and space efficiency. VF2 uses |
|
244 a state space representation and checks some conditions in each state |
241 a state space representation and checks some conditions in each state |
245 to prune the search tree. |
242 to prune the search tree. |
246 |
243 |
247 Meanwhile, another variant called \emph{VF2 Plus}\cite{VF2Plus} has |
244 Meanwhile, another variant called \emph{VF2~Plus}~\cite{VF2Plus} has |
248 been published. It is considered to be as efficient as the RI |
245 been published. It is considered to be as efficient as the RI |
249 algorithm and has a strictly better behavior on large graphs. The |
246 algorithm and has a strictly better behaviour on large graphs. The |
250 main idea of VF2 Plus is to precompute a heuristic node order of the |
247 main idea of VF2~Plus is to precompute a heuristic node order of the graph to be embedded, in which VF2 works more efficiently. |
251 small graph, in which the VF2 works more efficiently. |
|
252 |
248 |
253 This paper introduces \emph{VF2++}, a new further improved algorithm |
249 This paper introduces \emph{VF2++}, a new further improved algorithm |
254 for the graph and (induced)subgraph isomorphism problem, which uses |
250 for the graph and (induced) subgraph isomorphism problems, which uses |
255 efficient cutting rules and determines a node order in which VF2 runs |
251 efficient cutting rules and determines a node order in which VF2 runs |
256 significantly faster on practical inputs. |
252 significantly faster on practical inputs. |
257 |
253 |
258 The rest of the paper is structured as |
254 The rest of the paper is structured as |
259 follows. Section~\ref{sec:ProbStat} defines the exact problems to be |
255 follows. Section~\ref{sec:ProbStat} defines the exact problems to be |
260 solved, Section~\ref{sec:VF2Alg} provides a description of VF2. Based |
256 solved, Section~\ref{sec:VF2Alg} provides a description of VF2. Based |
261 on that, Section~\ref{sec:VF2ppAlg} introduces VF2++. Some technical |
257 on that, Section~\ref{sec:VF2ppAlg} introduces VF2++. Some technical |
262 details necessary for an efficient implementation are discussed in |
258 details necessary for an efficient implementation are discussed in |
263 Section~\ref{sec:VF2ppImpl}. Finally, Section~\ref{sec:ExpRes} |
259 Section~\ref{sec:VF2ppImpl}. Finally, Section~\ref{sec:ExpRes} |
264 provide a detailed experimental evaluation of VF2++ and its comparison |
260 provides a detailed experimental evaluation of VF2++ and its comparison |
265 to the state-of-the-art algorithm. |
261 to the state-of-the-art algorithm. |
266 |
262 |
267 It must also be mentioned that the C++ implementations of the |
263 It must also be mentioned that the C++ implementations of the |
268 algorithms have been made available for evaluation and use under an |
264 algorithms have been made available for evaluation and use under an |
269 open source license as a part of LEMON\cite{LEMON} open source graph |
265 open-source license as a part of LEMON~\cite{LEMON} graph |
270 library. |
266 library. |
271 |
267 |
272 \section{Problem Statement}\label{sec:ProbStat} |
268 \section{Problem Statement}\label{sec:ProbStat} |
273 This section provides a formal description of the problems to be |
269 This section provides a formal description of the problems to be |
274 solved. |
270 solved. |
319 \end{definition} |
315 \end{definition} |
320 |
316 |
321 |
317 |
322 \subsection{Common problems}\label{sec:CommProb} |
318 \subsection{Common problems}\label{sec:CommProb} |
323 |
319 |
324 The focus of this paper is on two extensively studied topics, the |
320 The focus of this paper is on the following problems appearing in many applications. |
325 subgraph isomorphism and its variations. However, the following |
321 |
326 problems also appear in many applications. |
322 The \textbf{subgraph isomorphism problem} is the following: is |
327 |
|
328 The \textbf{subgraph matching problem} is the following: is |
|
329 $G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node |
323 $G_{1}$ isomorphic to any subgraph of $G_{2}$ by a given node |
330 label? |
324 label? |
331 |
325 |
332 The \textbf{induced subgraph matching problem} asks the same about the |
326 The \textbf{induced subgraph isomorphism problem} asks the same about the |
333 existence of an induced subgraph. |
327 existence of an induced subgraph. |
334 |
328 |
335 The \textbf{graph isomorphism problem} can be defined as induced |
329 The \textbf{graph isomorphism problem} can be defined as induced |
336 subgraph matching problem where the sizes of the two graphs are equal. |
330 subgraph matching problem where the sizes of the two graphs are equal. |
337 |
331 |
338 In addition, one may want to find a \textbf{single} mapping or \textbf{enumerate} all of them. |
332 In addition, one may want to find a \textbf{single} embedding or \textbf{enumerate} all of them. |
339 |
|
340 Note that some authors refer to the term |
|
341 \emph{subgraph isomorphism problem} as an \emph{induced subgraph |
|
342 isomorphism problem}. |
|
343 |
333 |
344 \section{The VF2 Algorithm}\label{sec:VF2Alg} |
334 \section{The VF2 Algorithm}\label{sec:VF2Alg} |
345 This algorithm is the basis of both the VF2++ and the VF2 Plus. VF2 |
335 This algorithm is the basis of both the VF2++ and the VF2~Plus. VF2 |
346 is able to handle all the variations mentioned in Section |
336 is able to handle all the variations mentioned in Section~\ref{sec:CommProb}. Although it can also handle directed graphs, |
347 \ref{sec:CommProb}. Although it can also handle directed graphs, |
|
348 for the sake of simplicity, only the undirected case will be |
337 for the sake of simplicity, only the undirected case will be |
349 discussed. |
338 discussed. |
350 |
339 |
351 |
340 |
352 \subsection{Common notations} |
341 \subsection{Common notations} |
353 \indent Assume $G_{1}$ is searched in $G_{2}$. The following |
342 \indent Assume $G_{1}$ is searched in $G_{2}$. The following |
354 definitions and notations will be used throughout the whole paper. |
343 definitions and notations will be used throughout this paper. |
355 \begin{definition} |
344 \begin{definition} |
356 An injection $\mathfrak{m} : D \longrightarrow V_2$ is called (partial) \textbf{mapping}, where $D\subseteq V_1$. |
345 An injection $\mathfrak{m} : D \longrightarrow V_2$ is called (partial) \textbf{mapping}, where $D\subseteq V_1$. |
357 \end{definition} |
346 \end{definition} |
358 |
347 |
359 \begin{notation} |
348 \begin{notation} |
368 A mapping $\mathfrak{m}$ is $\mathbf{whole\ mapping}$ if $\mathfrak{m}$ covers all the |
357 A mapping $\mathfrak{m}$ is $\mathbf{whole\ mapping}$ if $\mathfrak{m}$ covers all the |
369 nodes of $V_{1}$, i.e. $\mathfrak{D}(\mathfrak{m})=V_1$. |
358 nodes of $V_{1}$, i.e. $\mathfrak{D}(\mathfrak{m})=V_1$. |
370 \end{definition} |
359 \end{definition} |
371 |
360 |
372 \begin{definition} |
361 \begin{definition} |
373 Let \textbf{extend}$(\mathfrak{m},(u,v))$ denote the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$, for which $\forall w\in \mathfrak{D}(\mathfrak{m}) : \mathfrak{m}(w)=f(w)$ and $f(u)=v$ holds. Where $u\in V_1\setminus\mathfrak{D}(\mathfrak{m})$ and $v\in V_2\setminus\mathfrak{R}(\mathfrak{m})$, otherwise $extend(\mathfrak{m},(u,v))$ is undefined. |
362 Let \textbf{extend}$(\mathfrak{m},(u,v))$ denote the function $f : \mathfrak{D}(\mathfrak{m})\cup\{u\}\longrightarrow\mathfrak{R}(\mathfrak{m})\cup\{v\}$, for which $\forall w\in \mathfrak{D}(\mathfrak{m}) : f(w)=\mathfrak{m}(w)$ and $f(u)=v$ holds, where $u\in V_1\setminus\mathfrak{D}(\mathfrak{m})$ and $v\in V_2\setminus\mathfrak{R}(\mathfrak{m})$; otherwise $extend(\mathfrak{m},(u,v))$ is undefined. |
374 \end{definition} |
363 \end{definition} |
375 |
364 |
376 \begin{notation} |
365 \begin{notation} |
377 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type |
366 Throughout the paper, $\mathbf{PT}$ denotes a generic problem type |
378 which can be substituted by any of the $\mathbf{ISO}$, $\mathbf{SUB}$ |
367 which can be substituted by any of the $\mathbf{SUB}$, $\mathbf{IND}$ |
379 and $\mathbf{IND}$ problems. |
368 and $\mathbf{ISO}$ problems, which stand for the the problems mentioned in Section~\ref{sec:CommProb} respectively. |
380 \end{notation} |
369 \end{notation} |
381 |
370 |
382 \begin{definition} |
371 \begin{definition} |
383 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cons_{PT}}$ is a |
372 Let $\mathfrak{m}$ be a mapping. The \textbf{consistency function for } $\mathbf{PT}$ is a logical function, for which $\mathbf{Cons_{PT}}(\mathfrak{m})$ is true if and only if $\mathfrak{m}$ satisfies the requirements of $\mathbf{PT}$ considering the subgraphs of $G_{1}$ and $G_{2}$ induced by $\mathfrak{D}(\mathfrak{m})$ and $\mathfrak{R}(\mathfrak{m})$, respectively. |
384 \textbf{consistency function by } $\mathbf{PT}$ if the following |
373 |
385 holds. If there exists a whole mapping $w$ satisfying the requirements of $PT$, for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$. |
374 %$\mathbf{Cons_{PT}}(\mathfrak{m})$ is true if there exists a whole mapping $w$ satisfying the requirements of $PT$, for which $\mathfrak{m}$ is exactly $w$ restricted to $\mathfrak{D}(\mathfrak{m})$. |
386 \end{definition} |
375 \end{definition} |
387 |
376 |
388 \begin{definition} |
377 \begin{definition} |
389 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a |
378 Let $\mathfrak{m}$ be a mapping. A logical function $\mathbf{Cut_{PT}}$ is a |
390 \textbf{cutting function by } $\mathbf{PT}$ if the following |
379 \textbf{cutting function for } $\mathbf{PT}$ if the following |
391 holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations, which results in a whole mapping satisfying the requirements of $PT$. |
380 holds. $\mathbf{Cut_{PT}(\mathfrak{m})}$ is false if there exists a sequence of extend operations, which results in a whole mapping satisfying the requirements of $PT$. |
392 \end{definition} |
381 \end{definition} |
393 |
382 |
394 \begin{definition} |
383 \begin{definition} |
395 $\mathfrak{m}$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if |
384 A mapping $\mathfrak{m}$ is said to be \textbf{consistent mapping by} $\mathbf{PT}$ if |
396 $Cons_{PT}(\mathfrak{m})$ is true. |
385 $Cons_{PT}(\mathfrak{m})$ is true. |
397 \end{definition} |
386 \end{definition} |
398 |
387 |
399 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form. |
388 $Cons_{PT}$ and $Cut_{PT}$ will often be used in the following form. |
400 \begin{notation} |
389 \begin{notation} |
428 |
416 |
429 \Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$} |
417 \Procedure{VF2}{Mapping $\mathfrak{m}$, ProblemType $PT$} |
430 \If{$\mathfrak{m}$ covers |
418 \If{$\mathfrak{m}$ covers |
431 $V_{1}$} \State Output($\mathfrak{m}$) |
419 $V_{1}$} \State Output($\mathfrak{m}$) |
432 \Else |
420 \Else |
433 \State Compute the set $P_\mathfrak{m}$ of the pairs candidate for inclusion |
421 \State Compute the set $P_\mathfrak{m}$ of the candidate pairs for extending $\mathfrak{m}$ \ForAll{$p\in{P_\mathfrak{m}}$} \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$ |
434 in $\mathfrak{m}$ \ForAll{$p\in{P_\mathfrak{m}}$} \If{Cons$_{PT}$($p,\mathfrak{m}$) $\wedge$ |
|
435 $\neg$Cut$_{PT}$($p,\mathfrak{m}$)} |
422 $\neg$Cut$_{PT}$($p,\mathfrak{m}$)} |
436 \State \textbf{call} |
423 \State \textbf{call} |
437 VF2($extend(\mathfrak{m},p)$, $PT$) \EndIf \EndFor \EndIf \EndProcedure |
424 VF2($extend(\mathfrak{m},p)$, $PT$) \EndIf \EndFor \EndIf \EndProcedure |
438 \end{algorithmic} |
425 \end{algorithmic} |
439 \end{algorithm} |
426 \end{algorithm} |
440 |
427 |
441 |
428 |
442 For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of |
429 For the current mapping $\mathfrak{m}$, the algorithm computes $P_\mathfrak{m}$, the set of |
443 candidate node pairs for adding to the current mapping $\mathfrak{m}_s$. |
430 candidate node pairs for extending the current mapping $\mathfrak{m}$. |
444 |
431 |
445 For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and |
432 For each pair $p$ in $P_\mathfrak{m}$, $Cons_{PT}(p,\mathfrak{m})$ and |
446 $Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and |
433 $Cut_{PT}(p,\mathfrak{m})$ are evaluated. If the former is true and |
447 the latter is false, the whole process is recursively applied to |
434 the latter is false, the whole process is recursively applied to |
448 $extend(\mathfrak{m},p)$. Otherwise, $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it |
435 $extend(\mathfrak{m},p)$. Otherwise, $extend(\mathfrak{m},p)$ is not consistent by $PT$, or it |
449 can be proved that $\mathfrak{m}$ can not be extended to a whole mapping. |
436 can be proved that $\mathfrak{m}$ can not be extended to a whole mapping. |
450 |
437 |
451 In order to make sure of the correctness, see |
438 In order to make sure of the correctness, see Claim~\ref{claim:consMapps}. |
452 \begin{claim} |
439 \begin{claim}\label{claim:consMapps} |
453 Through consistent mappings, only consistent whole mappings can be |
440 Through consistent mappings, only consistent whole mappings can be |
454 reached, and all the consistent whole mappings are reachable through |
441 reached, and all the consistent whole mappings are reachable through |
455 consistent mappings. |
442 consistent mappings. |
456 \end{claim} |
443 \end{claim} |
457 |
444 |
458 Note that a mapping may be reached in exponentially many different ways, since the |
445 Note that a mapping may be reached in exponentially many different ways, since the |
459 order of extensions does not influence the nascent mapping. |
446 order of extensions does not influence the nascent mapping. |
460 |
447 |
461 However, one may observe |
448 However, one may make the following observations. |
|
449 |
|
450 %\begin{claim} |
|
451 %\label{claim:claimTotOrd} |
|
452 %Let $\prec$ be an arbitrary total ordering relation on $V_{1}$. If |
|
453 %the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$, for which |
|
454 %\begin{center} |
|
455 %$\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$, |
|
456 %\end{center} |
|
457 %then no mapping can be reached more than once, and each whole mapping %remains reachable. |
|
458 %\end{claim} |
|
459 |
|
460 \begin{definition} |
|
461 A total order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of |
|
462 $V_{1}$ is \textbf{matching order} if VF2 can cover $u_{\sigma(d)}$ on the $d$-th level for all $d\in\{1,..,|V_{1}|\}$. |
|
463 \end{definition} |
462 |
464 |
463 \begin{claim} |
465 \begin{claim} |
464 \label{claim:claimTotOrd} |
466 \label{claim:claimTotOrd} |
465 Let $\prec$ be an arbitrary total ordering relation on $V_{1}$. If |
467 If VF2 is prescribed to cover the nodes of $G_1$ according to a matching order, then no mapping can be reached more than once and each whole mapping remains reachable. |
466 the algorithm ignores each $p=(u,v) \in P_\mathfrak{m}$, for which |
|
467 \begin{center} |
|
468 $\exists (\tilde{u},\tilde{v})\in P_\mathfrak{m}: \tilde{u} \prec u$, |
|
469 \end{center} |
|
470 then no mapping can be reached more than once, and each whole mapping remains reachable. |
|
471 \end{claim} |
468 \end{claim} |
472 |
469 |
473 Note that the cornerstone of the improvements to VF2 is a proper |
470 Note that the cornerstone of the improvements to VF2 is to choose a proper |
474 choice of a total ordering. |
471 matching order. |
475 |
472 |
476 \subsection{The candidate set} |
473 \subsection{The candidate set} |
477 \label{candidateComputingVF2} |
474 \label{candidateComputingVF2} |
478 Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$. |
475 Let $P_\mathfrak{m}$ be the set of the candidate pairs for inclusion in $\mathfrak{m}$. |
479 |
476 |
480 \begin{notation} |
477 \begin{notation} |
481 Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\backslash\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and |
478 Let $\mathbf{T_{1}(\mathfrak{m})}:=\{u \in V_{1}\backslash\mathfrak{D}(\mathfrak{m}) : \exists \tilde{u}\in{\mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1}}\}$, and |
482 $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\backslash\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$. |
479 $\mathbf{T_{2}(\mathfrak{m})} := \{v \in V_{2}\backslash\mathfrak{R}(\mathfrak{m}) : \exists\tilde{v}\in{\mathfrak{R}(\mathfrak{m}):(v,\tilde{v})\in E_{2}}\}$. |
483 \end{notation} |
480 \end{notation} |
484 |
481 |
485 The set $P_\mathfrak{m}$ includes the pairs of uncovered neighbours of covered |
482 The set $P_\mathfrak{m}$ contains the pairs of uncovered neighbours of covered |
486 nodes, and if there is not such a node pair, all the pairs containing |
483 nodes, and if there is not such a node pair, all the pairs containing |
487 two uncovered nodes are added. Formally, let |
484 two uncovered nodes are added. Formally, let |
488 \[ |
485 \[ |
489 P_\mathfrak{m}\!=\! |
486 P_\mathfrak{m}\!=\! |
490 \begin{cases} |
487 \begin{cases} |
494 &\hspace{-0.15cm}\text{otherwise}. |
491 &\hspace{-0.15cm}\text{otherwise}. |
495 \end{cases} |
492 \end{cases} |
496 \] |
493 \] |
497 |
494 |
498 \subsection{Consistency} |
495 \subsection{Consistency} |
499 Suppose $p=(u,v)$, where $u\in V_{1}$ and $v\in V_{2}$, $\mathfrak{m}$ is a consistent mapping by |
496 Let $p=(u,v)\in V_{1}\times V_{2}$, and suppose $\mathfrak{m}$ is a consistent mapping by |
500 $PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether |
497 $PT$. $Cons_{PT}(p,\mathfrak{m})$ checks whether |
501 including pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$. |
498 adding pair $p$ into $\mathfrak{m}$ leads to a consistent mapping by $PT$. |
502 |
499 |
503 For example, the consistency function of induced subgraph isomorphism is as follows. |
500 For example, the consistency function of the induced subgraph isomorphism problem is the following. |
504 \begin{notation} |
501 \begin{notation} |
505 Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} : |
502 Let $\mathbf{\Gamma_{1} (u)}:=\{\tilde{u}\in V_{1} : |
506 (u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2} |
503 (u,\tilde{u})\in E_{1}\}$, and $\mathbf{\Gamma_{2} |
507 (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$. |
504 (v)}:=\{\tilde{v}\in V_{2} : (v,\tilde{v})\in E_{2}\}$, where $u\in V_{1}$ and $v\in V_{2}$. That is, $\mathbf{\Gamma_{i} (w)}$ denotes the set of neighbours of node $w$ in $G_i$ $(i=1,2)$. |
508 \end{notation} |
505 \end{notation} |
509 |
506 |
510 $extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ $\Leftrightarrow |
|
511 (\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1} |
|
512 \Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. The |
|
513 following formulation gives an efficient way of calculating |
|
514 $Cons_{IND}$. |
|
515 \begin{claim} |
507 \begin{claim} |
516 $Cons_{IND}((u,v),\mathfrak{m}):=\mathcal{L}(u)\!\!=\!\!\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge |
508 $extend(\mathfrak{m},(u,v))$ is a consistent mapping by $IND$ if and only if $\mathfrak{m}$ is consistent and $(\forall \tilde{u}\in \mathfrak{D}(\mathfrak{m}): (u,\tilde{u})\in E_{1} |
|
509 \Leftrightarrow (v,\mathfrak{m}(\tilde{u}))\in E_{2})$. |
|
510 \end{claim} |
|
511 |
|
512 The following formulation gives an efficient way of calculating $Cons_{IND}$. |
|
513 |
|
514 \begin{claim} |
|
515 $Cons_{IND}((u,v),\mathfrak{m}):=Cons_{IND}(\mathfrak{m})\wedge\mathcal{L}(u)\!\!=\!\!\mathcal{L}(v)\wedge(\forall \tilde{v}\in \Gamma_{2}(v)\cap\mathfrak{R}(\mathfrak{m}):(u,\mathfrak{m}^{-1}(\tilde{v}))\in E_{1})\wedge |
517 (\forall \tilde{u}\in \Gamma_{1}(u) |
516 (\forall \tilde{u}\in \Gamma_{1}(u) |
518 \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is a |
517 \cap \mathfrak{D}(\mathfrak{m}):(v,\mathfrak{m}(\tilde{u}))\in E_{2})$ is the consistency function for $IND$. |
519 consistency function in the case of $IND$. |
|
520 \end{claim} |
518 \end{claim} |
521 |
519 |
522 \subsection{Cutting rules} |
520 \subsection{Cutting rules} |
523 $Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently |
521 $Cut_{PT}(p,\mathfrak{m})$ is defined by a collection of efficiently |
524 verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can |
522 verifiable conditions. The requirement is that $Cut_{PT}(p,\mathfrak{m})$ can |
525 be true only if it is impossible to extend $extend(\mathfrak{m},p)$ to a |
523 be true only if it is impossible to extend $extend(\mathfrak{m},p)$ to a |
526 whole mapping. |
524 whole mapping. |
527 |
525 |
528 As an example, the cutting function of induced subgraph isomorphism is presented. |
526 As an example, a cutting function of induced subgraph isomorphism problem is presented. |
529 \begin{notation} |
527 \begin{notation} |
530 Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\backslash |
528 Let $\mathbf{\tilde{T}_{1}}(\mathfrak{m}):=(V_{1}\backslash |
531 \mathfrak{D}(\mathfrak{m}))\backslash T_{1}(\mathfrak{m})$, and |
529 \mathfrak{D}(\mathfrak{m}))\backslash T_{1}(\mathfrak{m})$, and |
532 \\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\backslash |
530 \\ $\mathbf{\tilde{T}_{2}}(\mathfrak{m}):=(V_{2}\backslash |
533 \mathfrak{R}(\mathfrak{m}))\backslash T_{2}(\mathfrak{m})$. |
531 \mathfrak{R}(\mathfrak{m}))\backslash T_{2}(\mathfrak{m})$. |
535 |
533 |
536 \begin{claim} |
534 \begin{claim} |
537 $Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| < |
535 $Cut_{IND}((u,v),\mathfrak{m}):= |\Gamma_{2} (v)\ \cap\ T_{2}(\mathfrak{m})| < |
538 |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap |
536 |\Gamma_{1} (u)\ \cap\ T_{1}(\mathfrak{m})| \vee |\Gamma_{2}(v)\cap |
539 \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap |
537 \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}(u)\cap |
540 \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function by $IND$. |
538 \tilde{T}_{1}(\mathfrak{m})|$ is a cutting function for $IND$. |
541 \end{claim} |
539 \end{claim} |
542 |
540 |
543 \section{The VF2++ Algorithm}\label{sec:VF2ppAlg} |
541 \section{The VF2++ Algorithm}\label{sec:VF2ppAlg} |
544 Although any total ordering relation makes the search space of VF2 a |
542 Although any matching order makes the search space of VF2 a |
545 tree, its choice turns out to dramatically influence the number of |
543 tree, its choice turns out to dramatically influence the number of |
546 visited states. The goal is to determine an efficient one as quickly |
544 visited states. The goal is to determine an efficient one as quickly |
547 as possible. |
545 as possible. |
548 |
546 |
549 The main reason for VF2++' superiority over VF2 is twofold. Firstly, |
547 The main reason for the superiority of VF2++ over VF2 is twofold. Firstly, |
550 taking into account the structure and the node labeling of the graph, |
548 taking into account the structure and the node labelling of the graph, |
551 VF2++ determines a state order in which most of the unfruitful |
549 VF2++ determines a matching order in which most of the unfruitful |
552 branches of the search space can be pruned immediately. Secondly, |
550 branches of the search space can be pruned immediately. Secondly, |
553 introducing more efficient --- nevertheless still easier to compute |
551 introducing more efficient --- nevertheless still easier to compute |
554 --- cutting rules reduces the chance of going astray even further. |
552 --- cutting rules reduces the chance of going astray even further. |
555 |
553 |
556 In addition to the usual subgraph isomorphism, specialized versions |
554 In addition to the usual subgraph isomorphism problem, specialized versions |
557 for induced subgraph isomorphism and for graph isomorphism have been |
555 for induced subgraph and graph isomorphism problems have been |
558 designed. |
556 designed. |
559 |
557 |
560 Note that a weaker version of the cutting rules and an efficient |
558 Note that a weaker version of the cutting rules of VF2++ and an efficient |
561 candidate set calculating were described in \cite{VF2Plus}. |
559 candidate set calculation method were described in~\cite{VF2Plus}. |
562 |
560 |
563 It should be noted that all the methods described in this section are |
561 It should be noted that all the methods described in this section are |
564 extendable to handle directed graphs and edge labels as well. |
562 extendable to handle directed graphs and edge labels as well. |
565 The basic ideas and the detailed description of VF2++ are provided in |
563 The basic ideas and the detailed description of VF2++ are provided in |
566 the following.\newline |
564 the following.\newline |
580 consistent pair in $G_{2}$? The more covered neighbours a node in |
578 consistent pair in $G_{2}$? The more covered neighbours a node in |
581 $T_{1}(\mathfrak{m})$ has --- i.e. the largest $Conn_{\mathfrak{D}(\mathfrak{m})}$ it has |
579 $T_{1}(\mathfrak{m})$ has --- i.e. the largest $Conn_{\mathfrak{D}(\mathfrak{m})}$ it has |
582 ---, the more rarely satisfiable consistency constraints for its pair |
580 ---, the more rarely satisfiable consistency constraints for its pair |
583 are given. |
581 are given. |
584 |
582 |
585 In biology, most of the graphs are sparse, thus several nodes in |
583 Most of the graphs of biological and chemical structures are sparse, thus several nodes in |
586 $T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes |
584 $T_{1}(\mathfrak{m})$ may have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$, which makes |
587 reasonable to define a secondary and a tertiary order between them. |
585 reasonable to define a secondary and a tertiary order between them. |
588 The observation above proves itself to be as determining, that the |
586 The observation above proves itself to be as determining, that the |
589 secondary ordering prefers nodes with the most uncovered neighbours |
587 secondary ordering prefers nodes with the most uncovered neighbours |
590 among which have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$ to increase |
588 among which have the same $Conn_{\mathfrak{D}(\mathfrak{m})}$ to increase |
591 $Conn_{\mathfrak{D}(\mathfrak{m})}$ of uncovered nodes so much, as possible. The |
589 $Conn_{\mathfrak{D}(\mathfrak{m})}$ of uncovered nodes as much, as possible. The tertiary ordering prefers nodes having the rarest uncovered labels in $G_2$. |
592 tertiary ordering prefers nodes having the rarest uncovered labels. |
590 |
593 |
591 Note that the secondary ordering is the same as ordering by degrees, |
594 Note that the secondary ordering is the same as the ordering by $deg$, |
|
595 which is a static data in front of the above used. |
592 which is a static data in front of the above used. |
596 |
593 |
597 These rules can easily result in a matching order which contains the |
594 These rules can easily result in a matching order which contains the |
598 nodes of a long path successively, whose nodes may have low $Conn$ and |
595 nodes of a long path successively, whose nodes may have low $Conn$ and |
599 is easily matchable into $G_{2}$. To avoid that, a BFS order is |
596 is easily matchable into $G_{2}$. To try to avoid that, a Breadth-first-search order is used, and on each of its levels, the ordering procedure described above is applied. |
600 used, which provides the shortest possible paths. |
|
601 \newline |
597 \newline |
602 |
598 |
603 In the following, some examples on which the VF2 may be slow are |
599 In the following, some examples on which the VF2 may be slow are |
604 described, although they are easily solvable by using a proper |
600 described, although they are easily solvable by using a proper |
605 matching order. |
601 matching order. |
615 $\mathcal{L}(\tilde{u}):=red \ \forall \tilde{u}\in V_{1}\backslash |
611 $\mathcal{L}(\tilde{u}):=red \ \forall \tilde{u}\in V_{1}\backslash |
616 \{u\}$ |
612 \{u\}$ |
617 \newline |
613 \newline |
618 $\mathcal{L}(\tilde{v}):=red \ \forall \tilde{v}\in V_{2}\backslash |
614 $\mathcal{L}(\tilde{v}):=red \ \forall \tilde{v}\in V_{2}\backslash |
619 \{v\}$ |
615 \{v\}$ |
620 \newline |
|
621 |
616 |
622 Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since |
617 Now, any mapping by $\mathcal{L}$ must contain $(u,v)$, since |
623 $u$ is black and no node in $V_{2}$ has a black label except |
618 $u$ is black and no node in $V_{2}$ has a black label except |
624 $v$. If unfortunately $u$ were the last node which will get covered, |
619 $v$. If unfortunately $u$ were the last node which will get covered, |
625 VF2 would check only in the last steps, whether $u$ can be matched to |
620 VF2 would check only in the last steps, whether $u$ can be matched to |
626 $v$. |
621 $v$. |
627 \newline |
622 |
628 However, had $u$ been the first matched node, u would have been |
623 However, had $u$ been the first matched node, u would have been |
629 matched immediately to v, so all the mappings would have been |
624 matched immediately to v, so all the mappings would have been |
630 precluded in which node labels can not correspond. |
625 precluded in which node labels can not correspond. |
631 \end{example} |
626 \end{example} |
632 |
627 |
633 \begin{example} |
628 \begin{example} |
634 Suppose there is no node label given, $G_{1}$ is a small graph and |
629 Suppose there is no node label given, and $G_{1}$ is a small graph that can not be mapped into $G_{2}$ and $u\in V_{1}$. |
635 can not be mapped into $G_{2}$ and $u\in V_{1}$. |
|
636 \newline |
630 \newline |
637 Let $G'_{1}:=(V_{1}\cup |
631 Let $G'_{1}:=(V_{1}\cup |
638 \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup |
632 \{u'_{1},u'_{2},..,u'_{k}\},E_{1}\cup |
639 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is, |
633 \{(u,u'_{1}),(u'_{1},u'_{2}),..,(u'_{k-1},u'_{k})\})$, that is, |
640 $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint |
634 $G'_{1}$ is $G_{1}\cup \{ a\ k$ long path, which is disjoint |
641 from $G_{1}$ and one of its starting points is connected to $u\in |
635 from $G_{1}$ and one of its starting points is connected to $u\in |
642 V_{1}\}$. |
636 V_{1}\}$. |
643 \newline |
637 |
644 Is there a subgraph of $G_{2}$, which is isomorph with |
|
645 $G'_{1}$? |
|
646 \newline |
|
647 If unfortunately the nodes of the path were the first $k$ nodes in the |
638 If unfortunately the nodes of the path were the first $k$ nodes in the |
648 matching order, the algorithm would iterate through all the possible k |
639 matching order, the algorithm would iterate through all the possible k |
649 long paths in $G_{2}$, and it would recognize that no path can be |
640 long paths in $G_{2}$, and it would recognize that no path can be |
650 extended to $G'_{1}$. |
641 extended to $G'_{1}$. |
651 \newline |
642 \newline |
654 \end{example} |
645 \end{example} |
655 |
646 |
656 These examples may look artificial, but the same problems also appear |
647 These examples may look artificial, but the same problems also appear |
657 in real-world instances, even though in a less obvious way. |
648 in real-world instances, even though in a less obvious way. |
658 |
649 |
659 \subsection{Preparations} |
650 %\subsection{Preparations} |
660 \begin{claim} |
651 %\begin{claim} |
661 \label{claim:claimCoverFromLeft} |
652 %\label{claim:claimCoverFromLeft} |
662 The total ordering relation uniquely determines a node order, in which |
653 %The total ordering relation uniquely determines a node order, in which |
663 the nodes of $V_{1}$ will be covered by VF2. From the point of |
654 %the nodes of $V_{1}$ will be covered by VF2. From the point of |
664 view of the matching procedure, this means, that always the same node |
655 %view of the matching procedure, this means, that always the same node |
665 of $G_{1}$ will be covered on the d-th level. |
656 %of $G_{1}$ will be covered on the $d$-th level. |
666 \end{claim} |
657 %\end{claim} |
667 |
658 |
668 \begin{definition} |
659 %\begin{definition} |
669 An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of |
660 %An order $(u_{\sigma(1)},u_{\sigma(2)},..,u_{\sigma(|V_{1}|)})$ of |
670 $V_{1}$ is \textbf{matching order} if exists $\prec$ total |
661 %$V_{1}$ is \textbf{matching order} if there exists $\prec$ total |
671 ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds |
662 %ordering relation, s.t. the VF2 with $\prec$ on the d-th level finds |
672 pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{1}|\}$. |
663 %pair for $u_{\sigma(d)}$ for all $d\in\{1,..,|V_{1}|\}$. |
673 \end{definition} |
664 %\end{definition} |
674 |
665 |
675 \begin{claim}\label{claim:MOclaim} |
666 %\begin{claim}\label{claim:MOclaim} |
676 A total ordering is matching order iff the nodes of every component |
667 %A total ordering is matching order iff the nodes of every component |
677 form an interval in the node sequence, and every node connects to a |
668 %form an interval in the node sequence, and every node connects to a |
678 previous node in its component except the first node of each component. |
669 %previous node in its component except the first node of each component. |
679 \end{claim} |
670 %\end{claim} |
680 |
671 |
681 To summing up, a total ordering always uniquely determines a matching |
672 %In summary, a total ordering always uniquely determines a matching |
682 order, and every matching order can be determined by a total ordering, |
673 %order, and every matching order can be determined by a total ordering, |
683 however, more than one different total orderings may determine the |
674 %however, more than one different total orderings may determine the |
684 same matching order. |
675 %same matching order. |
685 |
676 |
686 \subsection{Total ordering} |
677 \subsection{Matching order} |
687 The matching order will be searched directly. |
|
688 \begin{notation} |
678 \begin{notation} |
689 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} : |
679 Let \textbf{F$_\mathcal{M}$(l)}$:=|\{v\in V_{2} : |
690 l=\mathcal{L}(v)\}|-|\{u\in V_{1}\backslash \mathcal{M} : l=\mathcal{L}(u)\}|$ , |
680 l=\mathcal{L}(v)\}|-|\{u\in \mathcal{M} : l=\mathcal{L}(u)\}|$, |
691 where $l$ is a label and $\mathcal{M}\subseteq V_{1}$. |
681 where $l$ is a label and $\mathcal{M}\subseteq V_{1}$. |
692 \end{notation} |
682 \end{notation} |
693 |
683 |
694 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{-f}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$. |
684 \begin{definition}Let $\mathbf{arg\ max}_{f}(S) :=\{u\in S : f(u)=max_{v\in S}\{f(v)\}\}$ and $\mathbf{arg\ min}_{f}(S) := arg\ max_{(-f)}(S)$, where $S$ is a finite set and $f:S\longrightarrow \mathbb{R}$. |
695 \end{definition} |
685 \end{definition} |
696 |
686 |
697 \begin{algorithm} |
687 \begin{notation} |
|
688 Let $deg(v)$ denote the degree of node $v$. |
|
689 \end{notation} |
|
690 |
|
691 \begin{algorithm}[H] |
698 \algtext*{EndIf} |
692 \algtext*{EndIf} |
699 \algtext*{EndProcedure} |
693 \algtext*{EndProcedure} |
700 \algtext*{EndWhile} |
694 \algtext*{EndWhile} |
701 \algtext*{EndFor} |
695 \algtext*{EndFor} |
702 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu} |
696 \caption{\hspace{0.5cm}$The\ method\ of\ VF2++\ for\ determining\ the\ node\ order$}\label{alg:VF2PPPseu} |
718 \algtext*{EndProcedure}%ne nyomtasson .. |
711 \algtext*{EndProcedure}%ne nyomtasson .. |
719 \algtext*{EndWhile} |
712 \algtext*{EndWhile} |
720 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1} |
713 \caption{\hspace{.5cm}$The\ method\ for\ processing\ a\ level\ of\ the\ BFS\ tree$}\label{alg:VF2PPProcess1} |
721 \begin{algorithmic}[1] |
714 \begin{algorithmic}[1] |
722 \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$} |
715 \Procedure{VF2++ProcessLevel}{$V_{d}$} \While{$V_d\neq\emptyset$} |
723 \State $m\in$ arg min$_{F_{\mathcal{M}\circ\ \mathcal{L}}}($ arg max$_{deg}($arg |
716 \State $m\in$ arg min$_{F_\mathcal{M}\circ\ \mathcal{L}}($ arg max$_{deg}($arg |
724 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$ |
717 max$_{Conn_{\mathcal{M}}}(V_{d})))$ \State $V_d:=V_d\backslash m$ |
725 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh |
718 \State Append node $m$ to the end of $\mathcal{M}$ \State Refresh |
726 $F_\mathcal{M}$ \EndWhile \EndProcedure |
719 $F_\mathcal{M}$ \EndWhile \EndProcedure |
727 \end{algorithmic} |
720 \end{algorithmic} |
728 \end{algorithm} |
721 \end{algorithm} |
729 |
722 |
730 Algorithm~\ref{alg:VF2PPPseu} is a high level description of the |
723 Algorithm~\ref{alg:VF2PPPseu} is a high-level description of the |
731 matching order procedure of VF2++. It computes a BFS tree for each |
724 matching order procedure of VF2++. It computes a BFS tree for each |
732 component in ascending order of their rarest node labels and largest $deg$, |
725 component in ascending order of their rarest node labels and largest $deg$, |
733 whose root vertex is the component's minimal |
726 whose root vertex is the minimal node of its component. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending |
734 node. Algorithm~\ref{alg:VF2PPProcess1} is a method to process a level of the BFS tree, which appends the nodes of the current level in descending |
|
735 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately |
727 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$ separately |
736 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately. |
728 to $\mathcal{M}$, and refreshes $F_\mathcal{M}$ immediately. |
737 |
729 |
738 Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu} |
730 \begin{claim} |
739 provides a matching order. |
731 Algorithm~\ref{alg:VF2PPPseu} provides a matching order. |
|
732 \end{claim} |
740 |
733 |
741 |
734 |
742 \subsection{Cutting rules} |
735 \subsection{Cutting rules} |
743 \label{VF2PPCuttingRules} |
736 \label{VF2PPCuttingRules} |
744 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels. |
737 This section presents the cutting rules of VF2++, which are improved by using extra information coming from the node labels. |
748 $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge |
741 $\mathbf{\Gamma_{2}^{l}(v)}:=\{\tilde{v} : \mathcal{L}(\tilde{v})=l \wedge |
749 \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in |
742 \tilde{v}\in \Gamma_{2} (v)\}$, where $u\in V_{1}$, $v\in |
750 V_{2}$ and $l$ is a label. |
743 V_{2}$ and $l$ is a label. |
751 \end{notation} |
744 \end{notation} |
752 |
745 |
753 \subsubsection{Induced subgraph isomorphism} |
746 \begin{claim}[Cutting function for ISO] |
754 \begin{claim} |
747 \[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function for ISO. |
755 \[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by IND. |
|
756 \end{claim} |
748 \end{claim} |
757 \subsubsection{Graph isomorphism} |
749 |
758 \begin{claim} |
750 \begin{claim}[Cutting function for IND] |
759 \[LabCut_{ISO}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!\neq\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| \neq |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function by ISO. |
751 \[LabCut_{IND}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\ \vee\]\[\bigvee_{l\ is\ label} \newline |\Gamma_{2}^{l}(v)\cap \tilde{T}_{2}(\mathfrak{m})| < |\Gamma_{1}^{l}(u)\cap \tilde{T}_{1}(\mathfrak{m})|\] is a cutting function for IND. |
760 \end{claim} |
752 \end{claim} |
761 |
753 |
762 \subsubsection{Subgraph isomorphism} |
754 \begin{claim}[Cutting function for SUB] |
763 \begin{claim} |
755 \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function for SUB. |
764 \[LabCut_{SU\!B}((u,v),\mathfrak{m}):=\bigvee_{l\ is\ label}|\Gamma_{2}^{l} (v) \cap T_{2}(\mathfrak{m})|\!<\!|\Gamma_{1}^{l}(u)\cap T_{1}(\mathfrak{m})|\] is a cutting function by SUB. |
|
765 \end{claim} |
756 \end{claim} |
766 |
757 |
767 |
758 |
768 |
759 |
769 \section{Implementation details}\label{sec:VF2ppImpl} |
760 \section{Implementation details}\label{sec:VF2ppImpl} |
770 This section provides a detailed summary of an efficient |
761 This section provides a detailed summary of an efficient |
771 implementation of VF2++. |
762 implementation of VF2++. |
|
763 \begin{notation} |
|
764 Let $\Delta_1$ and $\Delta_2$ denote the largest degree in $G_1$ and $G_2$, respectively, and let $\Delta=\max\{\Delta_1,\Delta_2\}$. |
|
765 \end{notation} |
772 \subsection{Storing a mapping} |
766 \subsection{Storing a mapping} |
773 After fixing an arbitrary node order ($u_0, u_1, .., |
767 After fixing an arbitrary node order ($u_0, u_1, .., |
774 u_{|G_{1}|-1}$) of $G_{1}$, an array $M$ is usable to store |
768 u_{|V_{1}|-1}$) of $G_{1}$, an array $M$ can be used to store |
775 the current mapping in the following way. |
769 the current mapping in the following way. |
776 \[ |
770 \[ |
777 M[i] = |
771 M[i] = |
778 \begin{cases} |
772 \begin{cases} |
779 v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INV\!ALI\!D & |
773 v & if\ (u_i,v)\ is\ in\ the\ mapping\\ INV\!ALI\!D & |
780 if\ no\ node\ has\ been\ mapped\ to\ u_i, |
774 if\ no\ node\ has\ been\ mapped\ to\ u_i, |
781 \end{cases} |
775 \end{cases} |
782 \] |
776 \] |
783 where $i\in\{0,1, ..,|G_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$ |
777 where $i\in\{0,1, ..,|V_{1}|-1\}$, $v\in V_{2}$ and $INV\!ALI\!D$ |
784 means "no node". |
778 means "no node". |
785 \subsection{Avoiding the recurrence} |
779 \subsection{Avoiding the recurrence} |
786 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized |
780 The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized |
787 as a \textit{while loop}, which has a loop counter $depth$ denoting the |
781 as a \textit{while loop}, which has a loop counter $depth$ denoting the |
788 all-time depth of the recursion. Fixing a matching order, let $M$ |
782 current depth of the recursion. Fixing a matching order, let $M$ |
789 denote the array storing the all-time mapping. Based on Claim~\ref{claim:claimCoverFromLeft}, |
783 denote the array storing the current mapping. Observe that |
790 $M$ is $INV\!ALI\!D$ from index $depth$+1 and not $INV\!ALI\!D$ before |
784 $M$ is $INV\!ALI\!D$ from index $depth$+1 and not $INV\!ALI\!D$ before |
791 $depth$. $M[depth]$ changes |
785 $depth$. $M[depth]$ changes |
792 while the state is being processed, but the property is held before |
786 while the state is being processed, but the property is held before |
793 both stepping back to a predecessor state and exploring a successor |
787 both stepping back to a predecessor state and exploring a successor |
794 state. |
788 state. |
795 |
789 |
796 The necessary part of the candidate set is easily maintainable or |
790 The necessary part of the candidate set is easily maintainable or |
797 computable by following |
791 computable by following |
798 Section~\ref{candidateComputingVF2}. A much faster method |
792 the steps described in Section~\ref{candidateComputingVF2}. A much faster method |
799 has been designed for biological- and sparse graphs, see the next |
793 has been designed for biological and sparse graphs, see the next |
800 section for details. |
794 section for details. |
801 |
795 |
802 \subsection{Calculating the candidates for a node} |
796 \subsection{Calculating the candidates for a node} |
803 Being aware of Claim~\ref{claim:claimCoverFromLeft}, the |
797 The task is not to maintain the candidate set, but to generate the |
804 task is not to maintain the candidate set, but to generate the |
|
805 candidate nodes in $G_{2}$ for a given node $u\in V_{1}$. In |
798 candidate nodes in $G_{2}$ for a given node $u\in V_{1}$. In |
806 case of any of the three problem types and a mapping $\mathfrak{m}$, if a node $v\in |
799 case of any of the three problem types and a mapping $\mathfrak{m}$, if a node $v\in |
807 V_{2}$ is a potential pair of $u\in V_{1}$, then $\forall |
800 V_{2}$ is a potential pair of $u\in V_{1}$, then $\forall |
808 u'\in \mathfrak{D}(\mathfrak{m}) : (u,u')\in |
801 u'\in \mathfrak{D}(\mathfrak{m}) : (u,u')\in |
809 E_{1}\Rightarrow (v,\mathfrak{m}(u'))\in |
802 E_{1}\Rightarrow (v,\mathfrak{m}(u'))\in |
810 E_{2}$. That is, each covered neighbour of $u$ has to be mapped to |
803 E_{2}$. That is, each covered neighbour of $u$ has to be mapped to |
811 a covered neighbour of $v$. |
804 a covered neighbour of $v$, i.e. selecting arbitrarily a covered neighbour $u'$ of $u$, all of the admissible candidates for $u$ are among the neighbours of $\mathfrak{m}(u')$. |
812 |
805 |
813 Having said that, an algorithm running in $\Theta(deg)$ time is |
806 Having said that, an algorithm running in $\Theta(\Delta_2)$ time is |
814 describable if there exists a covered node in the component containing |
807 describable if there exists a covered node in the component containing |
815 $u$, and a linear one otherwise. |
808 $u$, and a linear one otherwise. |
816 |
809 |
817 |
810 |
818 \subsection{Determining the node order} |
811 \subsection{Determining the node order} |
821 |
814 |
822 For using lookup tables, the node labels are associated with the |
815 For using lookup tables, the node labels are associated with the |
823 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It |
816 numbers $\{0,1,..,|K|-1\}$, where $K$ is the set of the labels. It |
824 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order |
817 enables $F_\mathcal{M}$ to be stored in an array. At first, the node order |
825 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes |
818 $\mathcal{M}=\emptyset$, so $F_\mathcal{M}[i]$ is the number of nodes |
826 in $V_{1}$ having label $i$, which is easy to compute in |
819 in $V_{2}$ having label $i$, which is easy to compute in |
827 $\Theta(|V_{1}|)$ steps. |
820 $\Theta(|V_{2}|)$ steps. |
828 |
821 |
829 Representing $\mathcal{M}\subseteq V_{1}$ as an array of |
822 Representing $\mathcal{M}\subseteq V_{1}$ as an array of |
830 size $|V_{1}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done inplace by swapping nodes. |
823 size $|V_{1}|$, both the computation of the BFS tree, and processing its levels by Algorithm~\ref{alg:VF2PPProcess1} can be done in-place by swapping nodes. |
831 |
824 |
832 \subsection{Cutting rules} |
825 \subsection{Cutting rules} |
833 In Section~\ref{VF2PPCuttingRules}, the cutting rules were |
826 In Section~\ref{VF2PPCuttingRules}, the cutting rules were |
834 described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$ |
827 described using the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$ |
835 and $\tilde T_{2}$, which are dependent on the all-time mapping |
828 and $\tilde T_{2}$, which are dependent on the current mapping. The aim is to check the labelled cutting |
836 (i.e. on the all-time state). The aim is to check the labeled cutting |
829 rules of VF2++ in $\Theta(\Delta)$ time. |
837 rules of VF2++ in $\Theta(deg)$ time. |
|
838 |
830 |
839 Firstly, suppose that these four sets are given in such a way, that |
831 Firstly, suppose that these four sets are given in such a way, that |
840 checking whether a node is in a certain set takes constant time, |
832 checking whether a node is in a certain set takes constant time, |
841 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an |
833 e.g. they are given by their 0-1 characteristic vectors. Let $L$ be an |
842 initially zero integer lookup table of size $|K|$. After incrementing |
834 initially zero integer lookup table of size $|K|$. After incrementing |
843 $L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and |
835 $L[\mathcal{L}(u')]$ for all $u'\in \Gamma_{1}(u) \cap T_{1}(\mathfrak{m})$ and |
844 decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap |
836 decrementing $L[\mathcal{L}(v')]$ for all $v'\in\Gamma_{2} (v) \cap |
845 T_{2}(s)$, the first part of the cutting rules is checkable in |
837 T_{2}(\mathfrak{m})$, the first part of the cutting rules is checkable in |
846 $\Theta(deg)$ time by considering the proper signs of $L$. Setting $L$ |
838 $\Theta(\Delta)$ time by considering the proper signs of $L$. Setting $L$ |
847 to zero takes $\Theta(deg)$ time again, which makes it possible to use |
839 to zero takes $\Theta(\Delta)$ time again, which makes it possible to use |
848 the same table through the whole algorithm. The second part of the |
840 the same table through the whole algorithm. The second part of the |
849 cutting rules can be verified using the same method with $\tilde |
841 cutting rules can be verified using the same method with $\tilde |
850 T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and |
842 T_{1}$ and $\tilde T_{2}$ instead of $T_{1}$ and |
851 $T_{2}$. Thus, the overall complexity is $\Theta(deg)$. |
843 $T_{2}$. Thus, the overall time complexity is $\Theta(\Delta)$. |
852 |
844 |
853 Another integer lookup table storing the number of covered neighbours |
845 To maintain the sets $T_{1}$, $T_{2}$, $\tilde T_{1}$ |
854 of each node in $G_{2}$ gives all the information about the sets |
846 and $\tilde T_{2}$, two other integer lookup tables storing the number of covered neighbours of the nodes of the two graphs can be used. This representation allows constant-time membership checking, furthermore it is maintainable in $\Theta(\Delta)$ time whenever a node pair is added or subtracted by incrementing |
855 $T_{2}$ and $\tilde T_{2}$, which is maintainable in |
|
856 $\Theta(deg)$ time when a pair is added or substracted by incrementing |
|
857 or decrementing the proper indices. A further improvement is that the |
847 or decrementing the proper indices. A further improvement is that the |
858 values of $L[\mathcal{L}(u')]$ in case of checking $u$ are dependent only on |
848 values of $L[\mathcal{L}(u')]$ in case of checking $u$ are dependent only on |
859 $u$, i.e. on the size of the mapping, so for each $u\in V_{1}$ an |
849 $u$, i.e. on the current depth of the recursion, so for each $u\in V_{1}$, an array of pairs \textit{(label, number of such labels)} can store $L$. Note that these arrays are at most of size |
860 array of pairs (label, number of such labels) can be stored to skip |
850 $\Delta_1$ if pairs with non-appearing node labels are discarded. |
861 the maintaining operations. Note that these arrays are at most of size |
|
862 $deg$. |
|
863 |
851 |
864 Using similar techniques, the consistency function can be evaluated in |
852 Using similar techniques, the consistency function can be evaluated in |
865 $\Theta(deg)$ steps, as well. |
853 $\Theta(\Delta)$ steps, as well. |
866 |
854 |
867 \section{Experimental results}\label{sec:ExpRes} |
855 \section{Experimental results}\label{sec:ExpRes} |
868 This section compares the performance of VF2++ and VF2 Plus. According to |
856 This section compares the performance of VF2++ and VF2~Plus. According to |
869 our experience, both algorithms run faster than VF2 with orders of |
857 our experience, both algorithms run faster than VF2 with orders of |
870 magnitude, thus its inclusion was not reasonable. |
858 magnitude, thus its inclusion was not reasonable. |
871 |
859 |
872 The algorithms were implemented in C++ using the open source |
860 The algorithms were implemented in C++ using the open-source |
873 LEMON graph and network optimization library\cite{LEMON}. The test were carried out on a linux based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM. |
861 LEMON graph and network optimization library~\cite{LEMON}. The tests were carried out on a Linux-based system with an Intel i7 X980 3.33 GHz CPU and 6 GB of RAM. |
874 \subsection{Biological graphs} |
862 \subsection{Biological graphs} |
875 The tests have been executed on a recent biological dataset created |
863 The tests have been executed on a recent biological dataset created |
876 for the International Contest on Pattern Search in Biological |
864 for the International Contest on Pattern Search in Biological |
877 Databases\cite{Content}, which has been constructed of molecule, |
865 Databases~\cite{Content}, which has been constructed of molecule, |
878 protein and contact map graphs extracted from the Protein Data |
866 protein and contact map graphs extracted from the Protein Data |
879 Bank\cite{ProteinDataBank}. |
867 Bank~\cite{ProteinDataBank}. |
880 |
868 |
881 The molecule dataset contains small graphs with less than 100 nodes |
869 The molecule dataset contains small graphs with less than 100 nodes |
882 and an average degree of less than 3. The protein dataset contains |
870 and an average degree of less than 3. The protein dataset contains |
883 graphs having 500-10 000 nodes and an average degree of 4, while the |
871 graphs having 500-10 000 nodes and an average degree of 4, while the |
884 contact map dataset contains graphs with 150-800 nodes and an average |
872 contact map dataset contains graphs with 150-800 nodes and an average |
885 degree of 20. \\ |
873 degree of 20. \\ |
886 |
874 |
887 In the following, both the induced subgraph isomorphism and the graph |
875 In the following, both the induced subgraph and the graph |
888 isomorphism will be examined. |
876 isomorphism problems will be examined. |
889 |
|
890 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For runtime results, please see Figure~\ref{fig:bioIND}. |
877 This dataset provides graph pairs, between which all the induced subgraph isomorphisms have to be found. For runtime results, please see Figure~\ref{fig:bioIND}. |
891 |
878 |
892 In an other experiment, the nodes of each graph in the database had been |
879 In an other experiment, the nodes of each graph in the database had been |
893 shuffled, and an isomorphism between the shuffled and the original |
880 shuffled, and an isomorphism between the shuffled and the original |
894 graph was searched. The solution times are shown on Figure~\ref{fig:bioISO}. |
881 graph was searched. The running times are shown on Figure~\ref{fig:bioISO}. |
895 |
882 |
896 |
883 |
897 \begin{figure}[H] |
884 \begin{figure}[H] |
898 \vspace*{-2cm} |
885 \vspace*{-2cm} |
899 \hspace*{-1.5cm} |
886 \hspace*{-1.5cm} |
900 \begin{subfigure}[b]{0.55\textwidth} |
887 \begin{subfigure}[b]{0.55\textwidth} |
901 \begin{figure}[H] |
888 \begin{figure}[H] |
902 \begin{tikzpicture}[trim axis left, trim axis right] |
889 \begin{tikzpicture}[trim axis left, trim axis right] |
903 \begin{axis}[title=Molecules IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
890 \begin{axis}[title=Molecules IND,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
904 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
891 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
905 west},scaled x ticks = false,x tick label style={/pgf/number |
892 west},scaled x ticks = false,x tick label style={/pgf/number |
906 format/1000 sep = \thinspace}] |
893 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
894 format/1000 sep = \kern 0.08em}] |
907 %\addplot+[only marks] table {proteinsOrig.txt}; |
895 %\addplot+[only marks] table {proteinsOrig.txt}; |
908 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark |
896 \addplot table {Orig/Molecules.32.txt}; \addplot[mark=triangle*,mark |
909 size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt}; |
897 size=1.8pt,color=red] table {VF2PPLabel/Molecules.32.txt}; |
910 \end{axis} |
898 \end{axis} |
911 \end{tikzpicture} |
899 \end{tikzpicture} |
916 \end{subfigure} |
904 \end{subfigure} |
917 \hspace*{1.5cm} |
905 \hspace*{1.5cm} |
918 \begin{subfigure}[b]{0.55\textwidth} |
906 \begin{subfigure}[b]{0.55\textwidth} |
919 \begin{figure}[H] |
907 \begin{figure}[H] |
920 \begin{tikzpicture}[trim axis left, trim axis right] |
908 \begin{tikzpicture}[trim axis left, trim axis right] |
921 \begin{axis}[title=Contact maps IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
909 \begin{axis}[title=Contact maps IND,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
922 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
910 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
923 west},scaled x ticks = false,x tick label style={/pgf/number |
911 west},scaled x ticks = false,x tick label style={/pgf/number |
924 format/1000 sep = \thinspace}] |
912 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
913 format/1000 sep = \kern 0.08em}] |
925 %\addplot+[only marks] table {proteinsOrig.txt}; |
914 %\addplot+[only marks] table {proteinsOrig.txt}; |
926 \addplot table {Orig/ContactMaps.128.txt}; |
915 \addplot table {Orig/ContactMaps.128.txt}; |
927 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
916 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
928 {VF2PPLabel/ContactMaps.128.txt}; |
917 {VF2PPLabel/ContactMaps.128.txt}; |
929 \end{axis} |
918 \end{axis} |
930 \end{tikzpicture} |
919 \end{tikzpicture} |
931 \caption{On contact maps, VF2++ runs almost in constant time, while VF2 |
920 \caption{On contact maps, VF2++ runs almost in constant time, while VF2 |
932 Plus has a near linear behaviour.} \label{fig:INDContact} |
921 Plus has a near-linear behaviour.} \label{fig:INDContact} |
933 \end{figure} |
922 \end{figure} |
934 \end{subfigure} |
923 \end{subfigure} |
935 |
924 |
936 \begin{center} |
925 \begin{center} |
937 \vspace*{-0.5cm} |
926 \vspace*{-0.5cm} |
938 \begin{subfigure}[b]{0.55\textwidth} |
927 \begin{subfigure}[b]{0.55\textwidth} |
939 \begin{figure}[H] |
928 \begin{figure}[H] |
940 \begin{tikzpicture}[trim axis left, trim axis right] |
929 \begin{tikzpicture}[trim axis left, trim axis right] |
941 \begin{axis}[title=Proteins IND,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
930 \begin{axis}[title=Proteins IND,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
942 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
931 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
943 west},scaled x ticks = false,x tick label style={/pgf/number |
932 west},scaled x ticks = false,x tick label style={/pgf/number |
944 format/1000 sep = \thinspace}] %\addplot+[only marks] table |
933 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
934 format/1000 sep = \kern 0.08em}] %\addplot+[only marks] table |
945 {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue] |
935 {proteinsOrig.txt}; \addplot[mark=*,mark size=1.2pt,color=blue] |
946 table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark |
936 table {Orig/Proteins.256.txt}; \addplot[mark=triangle*,mark |
947 size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt}; |
937 size=1.8pt,color=red] table {VF2PPLabel/Proteins.256.txt}; |
948 \end{axis} |
938 \end{axis} |
949 \end{tikzpicture} |
939 \end{tikzpicture} |
950 \caption{Both the algorithms have linear behaviour on protein |
940 \caption{Both of the algorithms have linear behaviour on protein |
951 graphs. VF2++ is more than 10 times faster than VF2 |
941 graphs. VF2++ is more than 10 times faster than VF2 |
952 Plus.} \label{fig:INDProt} |
942 Plus.} \label{fig:INDProt} |
953 \end{figure} |
943 \end{figure} |
954 \end{subfigure} |
944 \end{subfigure} |
955 \end{center} |
945 \end{center} |
956 \vspace*{-0.5cm} |
946 \vspace*{-0.5cm} |
957 \caption{\normalsize{Induced subgraph isomorphism on biological graphs}}\label{fig:bioIND} |
947 \caption{\normalsize{Induced subgraph isomorphism problem on biological graphs}}\label{fig:bioIND} |
958 \end{figure} |
948 \end{figure} |
959 |
949 |
960 |
950 |
961 \begin{figure}[H] |
951 \begin{figure}[H] |
962 \vspace*{-2cm} |
952 \vspace*{-2cm} |
963 \hspace*{-1.5cm} |
953 \hspace*{-1.5cm} |
964 \begin{subfigure}[b]{0.55\textwidth} |
954 \begin{subfigure}[b]{0.55\textwidth} |
965 \begin{figure}[H] |
955 \begin{figure}[H] |
966 \begin{tikzpicture}[trim axis left, trim axis right] |
956 \begin{tikzpicture}[trim axis left, trim axis right] |
967 \begin{axis}[title=Molecules ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
957 \begin{axis}[title=Molecules ISO,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
968 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
958 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
969 west},scaled x ticks = false,x tick label style={/pgf/number |
959 west},scaled x ticks = false,x tick label style={/pgf/number |
970 format/1000 sep = \thinspace}] |
960 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
961 format/1000 sep = \kern 0.08em}] |
971 %\addplot+[only marks] table {proteinsOrig.txt}; |
962 %\addplot+[only marks] table {proteinsOrig.txt}; |
972 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark |
963 \addplot table {Orig/moleculesIso.txt}; \addplot[mark=triangle*,mark |
973 size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt}; |
964 size=1.8pt,color=red] table {VF2PPLabel/moleculesIso.txt}; |
974 \end{axis} |
965 \end{axis} |
975 \end{tikzpicture} |
966 \end{tikzpicture} |
976 \caption{In the case of molecules, there is not such a significant |
967 \caption{The results are close to each other on contact maps, but VF2++ seems to be slightly faster as the number of nodes increases. |
977 difference, but VF2++ seems to be faster as the number of nodes |
968 }\label{fig:ISOMolecule} |
978 increases.}\label{fig:ISOMolecule} |
|
979 \end{figure} |
969 \end{figure} |
980 \end{subfigure} |
970 \end{subfigure} |
981 \hspace*{1.5cm} |
971 \hspace*{1.5cm} |
982 \begin{subfigure}[b]{0.55\textwidth} |
972 \begin{subfigure}[b]{0.55\textwidth} |
983 \begin{figure}[H] |
973 \begin{figure}[H] |
984 \begin{tikzpicture}[trim axis left, trim axis right] |
974 \begin{tikzpicture}[trim axis left, trim axis right] |
985 \begin{axis}[title=Contact maps ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
975 \begin{axis}[title=Contact maps ISO,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
986 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
976 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
987 west},scaled x ticks = false,x tick label style={/pgf/number |
977 west},scaled x ticks = false,x tick label style={/pgf/number |
988 format/1000 sep = \thinspace}] |
978 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
979 format/1000 sep = \kern 0.08em}] |
989 %\addplot+[only marks] table {proteinsOrig.txt}; |
980 %\addplot+[only marks] table {proteinsOrig.txt}; |
990 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark |
981 \addplot table {Orig/contactMapsIso.txt}; \addplot[mark=triangle*,mark |
991 size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt}; |
982 size=1.8pt,color=red] table {VF2PPLabel/contactMapsIso.txt}; |
992 \end{axis} |
983 \end{axis} |
993 \end{tikzpicture} |
984 \end{tikzpicture} |
994 \caption{The results are closer to each other on contact maps, but |
985 \caption{In the case of molecules, there is no significant |
995 VF2++ still performs consistently better.}\label{fig:ISOContact} |
986 difference, but VF2++ performs consistently better.}\label{fig:ISOContact} |
996 \end{figure} |
987 \end{figure} |
997 \end{subfigure} |
988 \end{subfigure} |
998 |
989 |
999 \begin{center} |
990 \begin{center} |
1000 \vspace*{-0.5cm} |
991 \vspace*{-0.5cm} |
1001 \begin{subfigure}[b]{0.55\textwidth} |
992 \begin{subfigure}[b]{0.55\textwidth} |
1002 \begin{figure}[H] |
993 \begin{figure}[H] |
1003 \begin{tikzpicture}[trim axis left, trim axis right] |
994 \begin{tikzpicture}[trim axis left, trim axis right] |
1004 \begin{axis}[title=Proteins ISO,xlabel={target size},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
995 \begin{axis}[title=Proteins ISO,xlabel={$|V_2|$},ylabel={time (ms)},legend entries={VF2 Plus,VF2++},grid |
1005 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
996 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1006 west},scaled x ticks = false,x tick label style={/pgf/number |
997 west},scaled x ticks = false,x tick label style={/pgf/number |
1007 format/1000 sep = \thinspace}] |
998 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
999 format/1000 sep = \kern 0.08em}] |
1008 %\addplot+[only marks] table {proteinsOrig.txt}; |
1000 %\addplot+[only marks] table {proteinsOrig.txt}; |
1009 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark |
1001 \addplot table {Orig/proteinsIso.txt}; \addplot[mark=triangle*,mark |
1010 size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt}; |
1002 size=1.8pt,color=red] table {VF2PPLabel/proteinsIso.txt}; |
1011 \end{axis} |
1003 \end{axis} |
1012 \end{tikzpicture} |
1004 \end{tikzpicture} |
1013 \caption{On protein graphs, VF2 Plus has a super linear time |
1005 \caption{On protein graphs, VF2~Plus has a super linear time |
1014 complexity, while VF2++ runs in near constant time. The difference |
1006 complexity, while VF2++ runs in near constant time. The difference |
1015 is about two order of magnitude on large graphs.}\label{fig:ISOProt} |
1007 is about two orders of magnitude on large graphs.}\label{fig:ISOProt} |
1016 \end{figure} |
1008 \end{figure} |
1017 \end{subfigure} |
1009 \end{subfigure} |
1018 \end{center} |
1010 \end{center} |
1019 \vspace*{-0.6cm} |
1011 \vspace*{-0.6cm} |
1020 \caption{\normalsize{Graph isomorphism on biological graphs}}\label{fig:bioISO} |
1012 \caption{\normalsize{Graph isomorphism problem on biological graphs}}\label{fig:bioISO} |
1021 \end{figure} |
1013 \end{figure} |
1022 |
1014 |
1023 |
1015 |
1024 |
1016 |
1025 |
1017 |
1026 \subsection{Random graphs} |
1018 \subsection{Random graphs} |
1027 This section compares VF2++ with VF2 Plus on random graphs of a large |
1019 This section compares VF2++ with VF2~Plus on random graphs of large |
1028 size. The node labels are uniformly distributed. Let $\delta$ denote |
1020 size. The node labels are uniformly distributed. Let $\delta$ denote |
1029 the average degree. For the parameters of problems solved in the |
1021 the average degree. For the parameters of problems solved in the |
1030 experiments, please see the top of each chart. |
1022 experiments, please see the top of each chart. |
1031 \subsubsection{Graph isomorphism} |
1023 \subsubsection{Graph isomorphism problem} |
1032 To evaluate the efficiency of the algorithms in the case of graph |
1024 To evaluate the efficiency of the algorithms in the case of graph |
1033 isomorphism, random connected graphs of less than 20 000 nodes have been |
1025 isomorphism problem, random connected graphs of less than 20 000 nodes have been |
1034 considered. Generating a random graph and shuffling its nodes, an |
1026 considered. Generating a random graph and shuffling its nodes, an |
1035 isomorphism had to be found. Figure \ref{fig:randISO} shows the runtime results |
1027 isomorphism had to be found. Figure~\ref{fig:randISO} shows the runtime results |
1036 on graph sets of various density. |
1028 on graph sets of various density. |
1037 |
1029 |
1038 |
1030 |
1039 |
1031 |
1040 |
1032 |
1090 \end{center} |
1085 \end{center} |
1091 \end{subfigure} |
1086 \end{subfigure} |
1092 \begin{subfigure}[b]{0.55\textwidth} |
1087 \begin{subfigure}[b]{0.55\textwidth} |
1093 \begin{center} |
1088 \begin{center} |
1094 \begin{tikzpicture} |
1089 \begin{tikzpicture} |
1095 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1090 \begin{axis}[title={Random ISO, $\delta = 100$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1096 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1091 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1097 west},scaled x ticks = false,x tick label style={/pgf/number |
1092 west},scaled x ticks = false,x tick label style={/pgf/number |
1098 format/1000 sep = \thinspace}] |
1093 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
1094 format/1000 sep = \kern 0.08em}] |
1099 %\addplot+[only marks] table {proteinsOrig.txt}; |
1095 %\addplot+[only marks] table {proteinsOrig.txt}; |
1100 \addplot table {randGraph/iso/vf2pIso100_1.txt}; |
1096 \addplot table {randGraph/iso/vf2pIso100_1.txt}; |
1101 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1097 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1102 {randGraph/iso/vf2ppIso100_1.txt}; |
1098 {randGraph/iso/vf2ppIso100_1.txt}; |
1103 \end{axis} |
1099 \end{axis} |
1104 \end{tikzpicture} |
1100 \end{tikzpicture} |
1105 \end{center} |
1101 \end{center} |
1106 \end{subfigure} |
1102 \end{subfigure} |
1107 \vspace*{-0.8cm} |
1103 \vspace*{-0.8cm} |
1108 \caption{ISO on random graphs. |
1104 \caption{Graph isomorphism problem on random graphs |
1109 }\label{fig:randISO} |
1105 }\label{fig:randISO} |
1110 \end{figure} |
1106 \end{figure} |
1111 |
1107 |
1112 |
1108 |
1113 \subsubsection{Induced subgraph isomorphism} |
1109 \subsubsection{Induced subgraph isomorphism problem} |
1114 This section presents a comparison of VF2++ and VF2 Plus in the case |
1110 This section presents a comparison of VF2++ and VF2~Plus in the case |
1115 of induced subgraph isomorphism. In addition to the size of the large |
1111 of induced subgraph isomorphism problem. In addition to the size of graph $G_2$, that of $G_1$ dramatically influences the hardness of |
1116 graph, that of the small graph dramatically influences the hardness of |
1112 a given problem too, so the overall picture is provided by examining graphs to be embedded of various size. |
1117 a given problem too, so the overall picture is provided by examining |
|
1118 small graphs of various size. |
|
1119 |
1113 |
1120 For each chart, a number $0<\rho< 1$ has been fixed, and the following |
1114 For each chart, a number $0<\rho< 1$ has been fixed, and the following |
1121 has been executed 150 times. Generating a large graph $G_{2}$ of an average degree of $\delta$, |
1115 has been executed 150 times. Generating a large graph $G_{2}$ of an average degree of $\delta$, |
1122 choose 10 of its induced subgraphs having $\rho\ |V_{2}|$ nodes, |
1116 choose 10 of its induced subgraphs having $\rho|V_{2}|$ nodes, |
1123 and for all the 10 subgraphs find a mapping by using both the graph |
1117 and for all the 10 subgraphs find a mapping by using both graph |
1124 matching algorithms. The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1, |
1118 matching algorithms. The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1, |
1125 0.3, 0.8$ cases have been examined, see |
1119 0.3, 0.8$ cases have been examined, see |
1126 Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and |
1120 Figure~\ref{fig:randIND5},~\ref{fig:randIND10}~and~\ref{fig:randIND35}. |
1127 \ref{fig:randIND35}. |
|
1128 |
1121 |
1129 |
1122 |
1130 |
1123 |
1131 |
1124 |
1132 |
1125 |
1180 \end{center} |
1176 \end{center} |
1181 \end{subfigure} |
1177 \end{subfigure} |
1182 \begin{subfigure}[b]{0.55\textwidth} |
1178 \begin{subfigure}[b]{0.55\textwidth} |
1183 \begin{center} |
1179 \begin{center} |
1184 \begin{tikzpicture} |
1180 \begin{tikzpicture} |
1185 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1181 \begin{axis}[title={Random IND, $\delta = 5$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1186 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1182 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1187 west},scaled x ticks = false,x tick label style={/pgf/number |
1183 west},scaled x ticks = false,x tick label style={/pgf/number |
1188 format/1000 sep = \space}] |
1184 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
1185 format/1000 sep = \kern 0.08em}] |
1189 %\addplot+[only marks] table {proteinsOrig.txt}; |
1186 %\addplot+[only marks] table {proteinsOrig.txt}; |
1190 \addplot table {randGraph/ind/vf2pInd5_0.8.txt}; |
1187 \addplot table {randGraph/ind/vf2pInd5_0.8.txt}; |
1191 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1188 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1192 {randGraph/ind/vf2ppInd5_0.8.txt}; |
1189 {randGraph/ind/vf2ppInd5_0.8.txt}; |
1193 \end{axis} |
1190 \end{axis} |
1194 \end{tikzpicture} |
1191 \end{tikzpicture} |
1195 \end{center} |
1192 \end{center} |
1196 \end{subfigure} |
1193 \end{subfigure} |
1197 \vspace*{-0.8cm} |
1194 \vspace*{-0.8cm} |
1198 \caption{IND on graphs having an average degree of |
1195 \caption{Induced subgraph isomorphism problem on random graphs having an average degree of |
1199 5.}\label{fig:randIND5} |
1196 5}\label{fig:randIND5} |
1200 \end{figure} |
1197 \end{figure} |
1201 |
1198 |
1202 |
1199 |
1203 \begin{figure} |
1200 \begin{figure} |
1204 \vspace*{-1.5cm} |
1201 \vspace*{-1.5cm} |
1205 \hspace*{-1.5cm} |
1202 \hspace*{-1.5cm} |
1206 \begin{subfigure}[b]{0.55\textwidth} |
1203 \begin{subfigure}[b]{0.55\textwidth} |
1207 \begin{center} |
1204 \begin{center} |
1208 \hspace*{-0.5cm} |
1205 \hspace*{-0.5cm} |
1209 \begin{tikzpicture} |
1206 \begin{tikzpicture} |
1210 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1207 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1211 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1208 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1212 west},scaled x ticks = false,x tick label style={/pgf/number |
1209 west},scaled x ticks = false,x tick label style={/pgf/number |
1213 format/1000 sep = \space}] |
1210 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
1211 format/1000 sep = \kern 0.08em}] |
1214 %\addplot+[only marks] table {proteinsOrig.txt}; |
1212 %\addplot+[only marks] table {proteinsOrig.txt}; |
1215 \addplot table {randGraph/ind/vf2pInd10_0.05.txt}; |
1213 \addplot table {randGraph/ind/vf2pInd10_0.05.txt}; |
1216 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1214 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1217 {randGraph/ind/vf2ppInd10_0.05.txt}; |
1215 {randGraph/ind/vf2ppInd10_0.05.txt}; |
1218 \end{axis} |
1216 \end{axis} |
1252 \end{center} |
1252 \end{center} |
1253 \end{subfigure} |
1253 \end{subfigure} |
1254 \begin{subfigure}[b]{0.55\textwidth} |
1254 \begin{subfigure}[b]{0.55\textwidth} |
1255 \begin{center} |
1255 \begin{center} |
1256 \begin{tikzpicture} |
1256 \begin{tikzpicture} |
1257 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1257 \begin{axis}[title={Random IND, $\delta = 10$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1258 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1258 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1259 west},scaled x ticks = false,x tick label style={/pgf/number |
1259 west},scaled x ticks = false,x tick label style={/pgf/number |
1260 format/1000 sep = \space}] |
1260 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
1261 format/1000 sep = \kern 0.08em}] |
1261 %\addplot+[only marks] table {proteinsOrig.txt}; |
1262 %\addplot+[only marks] table {proteinsOrig.txt}; |
1262 \addplot table {randGraph/ind/vf2pInd10_0.8.txt}; |
1263 \addplot table {randGraph/ind/vf2pInd10_0.8.txt}; |
1263 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1264 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1264 {randGraph/ind/vf2ppInd10_0.8.txt}; |
1265 {randGraph/ind/vf2ppInd10_0.8.txt}; |
1265 \end{axis} |
1266 \end{axis} |
1266 \end{tikzpicture} |
1267 \end{tikzpicture} |
1267 \end{center} |
1268 \end{center} |
1268 \end{subfigure} |
1269 \end{subfigure} |
1269 \vspace*{-0.8cm} |
1270 \vspace*{-0.8cm} |
1270 \caption{IND on graphs having an average degree of |
1271 \caption{Induced subgraph isomorphism problem on random graphs having an average degree of |
1271 10.}\label{fig:randIND10} |
1272 10}\label{fig:randIND10} |
1272 \end{figure} |
1273 \end{figure} |
1273 |
1274 |
1274 |
1275 |
1275 |
1276 |
1276 \begin{figure} |
1277 \begin{figure} |
1277 \vspace*{-1.5cm} |
1278 \vspace*{-1.5cm} |
1278 \hspace*{-1.5cm} |
1279 \hspace*{-1.5cm} |
1279 \begin{subfigure}[b]{0.55\textwidth} |
1280 \begin{subfigure}[b]{0.55\textwidth} |
1280 \begin{center} |
1281 \begin{center} |
1281 \begin{tikzpicture} |
1282 \begin{tikzpicture} |
1282 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1283 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.05$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1283 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1284 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1284 west},scaled x ticks = false,x tick label style={/pgf/number |
1285 west},scaled x ticks = false,x tick label style={/pgf/number |
1285 format/1000 sep = \space}] |
1286 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
1287 format/1000 sep = \kern 0.08em}] |
1286 %\addplot+[only marks] table {proteinsOrig.txt}; |
1288 %\addplot+[only marks] table {proteinsOrig.txt}; |
1287 \addplot table {randGraph/ind/vf2pInd35_0.05.txt}; |
1289 \addplot table {randGraph/ind/vf2pInd35_0.05.txt}; |
1288 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1290 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1289 {randGraph/ind/vf2ppInd35_0.05.txt}; |
1291 {randGraph/ind/vf2ppInd35_0.05.txt}; |
1290 \end{axis} |
1292 \end{axis} |
1323 \end{center} |
1327 \end{center} |
1324 \end{subfigure} |
1328 \end{subfigure} |
1325 \begin{subfigure}[b]{0.55\textwidth} |
1329 \begin{subfigure}[b]{0.55\textwidth} |
1326 \begin{center} |
1330 \begin{center} |
1327 \begin{tikzpicture} |
1331 \begin{tikzpicture} |
1328 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={target size},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1332 \begin{axis}[title={Random IND, $\delta = 35$, $\rho = 0.8$},width=7.2cm,height=6cm,xlabel={$|V_2|$},ylabel={time (ms)},ylabel near ticks,legend entries={VF2 Plus,VF2++},grid |
1329 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1333 =major,mark size=1.2pt, legend style={at={(0,1)},anchor=north |
1330 west},scaled x ticks = false,x tick label style={/pgf/number |
1334 west},scaled x ticks = false,x tick label style={/pgf/number |
1331 format/1000 sep = \space}] |
1335 format/1000 sep = \kern 0.08em},y tick label style={/pgf/number |
|
1336 format/1000 sep = \kern 0.08em}] |
1332 %\addplot+[only marks] table {proteinsOrig.txt}; |
1337 %\addplot+[only marks] table {proteinsOrig.txt}; |
1333 \addplot table {randGraph/ind/vf2pInd35_0.8.txt}; |
1338 \addplot table {randGraph/ind/vf2pInd35_0.8.txt}; |
1334 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1339 \addplot[mark=triangle*,mark size=1.8pt,color=red] table |
1335 {randGraph/ind/vf2ppInd35_0.8.txt}; |
1340 {randGraph/ind/vf2ppInd35_0.8.txt}; |
1336 \end{axis} |
1341 \end{axis} |
1337 \end{tikzpicture} |
1342 \end{tikzpicture} |
1338 \end{center} |
1343 \end{center} |
1339 \end{subfigure} |
1344 \end{subfigure} |
1340 \vspace*{-0.8cm} |
1345 \vspace*{-0.8cm} |
1341 \caption{IND on graphs having an average degree of |
1346 \caption{Induced subgraph isomorphism problem on random graphs having an average degree of |
1342 35.}\label{fig:randIND35} |
1347 35}\label{fig:randIND35} |
1343 \end{figure} |
1348 \end{figure} |
1344 |
1349 |
1345 |
1350 |
1346 Based on these experiments, VF2++ is faster than VF2 Plus and able to |
1351 Based on these experiments, VF2++ is faster than VF2~Plus and able to |
1347 handle really large graphs in milliseconds. Note that when $IND$ was |
1352 handle really large graphs in milliseconds. Note that when $IND$ was |
1348 considered and the small graphs had proportionally few nodes ($\rho = |
1353 considered and the graph to be embedded had proportionally few nodes ($\rho = |
1349 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node |
1354 0.05$, or $\rho = 0.1$), then VF2~Plus produced some inefficient node |
1350 orders (e.g. see the $\delta=10$ case on |
1355 orders (e.g. see the $\delta=10$ case on |
1351 Figure~\ref{fig:randIND10}). If these instances had been excluded, the |
1356 Figure~\ref{fig:randIND10}). If these instances had been excluded, the |
1352 charts would have seemed to be similar to the other ones. |
1357 charts would have looked similarly to the other ones. |
1353 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2 |
1358 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2 |
1354 Plus slow slightly down, but remain practically usable even on graphs |
1359 Plus slow slightly down, but remain practically usable even on graphs |
1355 having 10 000 nodes. |
1360 having 10 000 nodes. |
1356 |
1361 |
1357 |
1362 |
1358 |
1363 |
1359 |
1364 |
1360 |
1365 |
1361 \section{Conclusion} |
1366 \section{Conclusion} |
1362 This paper presented VF2++, a new graph matching algorithm based on VF2, called VF2++, and analyzed it from a practical viewpoint. |
1367 This paper presented VF2++, a new graph matching algorithm based on VF2, and analysed it from a practical viewpoint. |
1363 |
1368 |
1364 Recognizing the importance of the node order and determining an |
1369 Recognizing the importance of the node order and determining an |
1365 efficient one, VF2++ is able to match graphs of thousands of nodes in |
1370 efficient one, VF2++ is able to match graphs of thousands of nodes in |
1366 near practically linear time including preprocessing. In addition to |
1371 near practically linear time including preprocessing. In addition to |
1367 the proper order, VF2++ uses more efficient consistency and cutting |
1372 the proper order, VF2++ uses more efficient cutting |
1368 rules which are easy to compute and make the algorithm able to prune |
1373 rules which are easy to compute and make the algorithm able to prune |
1369 most of the unfruitful branches without going astray. |
1374 most of the unfruitful branches without going astray. |
1370 |
1375 |
1371 In order to show the efficiency of the new method, it has been |
1376 In order to show the efficiency of the new method, it has been |
1372 compared to VF2 Plus\cite{VF2Plus}, which is the best contemporary algorithm. |
1377 compared to VF2~Plus~\cite{VF2Plus}, which is the best contemporary algorithm. |
1373 . |
1378 |
1374 |
1379 The experiments show that VF2++ consistently outperforms VF2~Plus on |
1375 The experiments show that VF2++ consistently outperforms VF2 Plus on |
|
1376 biological graphs. It seems to be asymptotically faster on protein and |
1380 biological graphs. It seems to be asymptotically faster on protein and |
1377 on contact map graphs in the case of induced subgraph isomorphism, |
1381 on contact map graphs in the case of induced subgraph isomorphism problem, |
1378 while in the case of graph isomorphism, it has definitely better |
1382 while in the case of graph isomorphism problem, it has definitely better |
1379 asymptotic behaviour on protein graphs. |
1383 asymptotic behaviour on protein graphs. |
1380 |
1384 |
1381 Regarding random sparse graphs, not only has VF2++ proved itself to be |
1385 Regarding random sparse graphs, not only has VF2++ proved itself to be |
1382 faster than VF2 Plus, but it also has a practically linear behaviour both |
1386 faster than VF2~Plus, but it also has a practically linear behaviour both |
1383 in the case of induced subgraph- and graph isomorphism. |
1387 in the case of induced subgraph and graph isomorphism problems. |
1384 |
1388 |
1385 %%%%%%%%%%%%%%%% |
1389 %%%%%%%%%%%%%%%% |
1386 \section*{Acknowledgement} \label{sec:ack} |
1390 \section*{Acknowledgement} \label{sec:ack} |
1387 %%%%%%%%%%%%%%%% |
1391 %%%%%%%%%%%%%%%% |
1388 This research project was initiated and sponsored by QuantumBio |
1392 This research project was initiated and sponsored by QuantumBio |
1389 Inc.\cite{QUANTUMBIO}. |
1393 Inc.~\cite{QUANTUMBIO}. |
1390 |
1394 |
1391 The authors were supported by the Hungarian Scientific Research Fund - |
1395 The authors were supported by the Hungarian Scientific Research Fund - |
1392 OTKA, K109240 and by the J\'anos Bolyai Research Fellowship program of |
1396 OTKA, K109240 and by the J\'anos Bolyai Research Fellowship program of |
1393 the Hungarian Academy of Sciences. |
1397 the Hungarian Academy of Sciences. |
1394 |
1398 |