# HG changeset patch # User Alpar Juttner # Date 1479820309 -3600 # Node ID c682d3237141451c6656e28cb73ed61a75871f83 # Parent 550f81b2f81ce77601f353c6fab8327c253cc516 Reformatting II diff -r 550f81b2f81c -r c682d3237141 damecco.tex --- a/damecco.tex Tue Nov 22 13:58:09 2016 +0100 +++ b/damecco.tex Tue Nov 22 14:11:49 2016 +0100 @@ -229,23 +229,23 @@ does it cover all the recent algorithms. The first practically usable approach was due to -\textbf{Ullmann}\cite{Ullmann} which is a commonly used depth-first +Ullmann\cite{Ullmann} which is a commonly used depth-first search based algorithm with a complex heuristic for reducing the number of visited states. A major problem is its $\Theta(n^3)$ space complexity, which makes it impractical in the case of big sparse graphs. -In a recent paper, \textbf{Ullmann}\cite{UllmannBit} presents an +In a recent paper, Ullmann\cite{UllmannBit} presents an improved version of this algorithm based on a bit-vector solution for the binary Constraint Satisfaction Problem. -The \textbf{Nauty} algorithm\cite{Nauty} transforms the two graphs to +The Nauty algorithm\cite{Nauty} transforms the two graphs to a canonical form before starting to check for the isomorphism. It has been considered as one of the fastest graph isomorphism algorithms, although graph categories were shown in which it takes exponentially many steps. This algorithm handles only the graph isomorphism problem. -The \textbf{LAD} algorithm\cite{Lad} uses a depth-first search +The \emph{LAD} algorithm\cite{Lad} uses a depth-first search strategy and formulates the matching as a Constraint Satisfaction Problem to prune the search tree. The constraints are that the mapping has to be injective and edge-preserving, hence it is possible to @@ -382,8 +382,8 @@ \section{The VF2 Algorithm} This algorithm is the basis of both the VF2++ and the VF2 Plus. VF2 -is able to handle all the variations mentioned in \textbf{Section - \ref{sec:CommProb})}. Although it can also handle directed graphs, +is able to handle all the variations mentioned in Section + \ref{sec:CommProb}. Although it can also handle directed graphs, for the sake of simplicity, only the undirected case will be discussed. @@ -444,10 +444,10 @@ For example the problem type of graph isomorphism problem is the following. A whole mapping $W$ is in $\mathbf{ISO}$, iff the -bijection represented by $W$ satisfies \textbf{Definition - \ref{sec:ismorphic})}. The subgraph- and induced subgraph matching -problems can be formalized in a similar way. Let their problem types -be denoted as $\mathbf{SUB}$ and $\mathbf{IND}$. +bijection represented by $W$ satisfies Definition~\ref{sec:ismorphic}. +The subgraph- and induced subgraph matching problems can be formalized +in a similar way. Let their problem types be denoted as $\mathbf{SUB}$ +and $\mathbf{IND}$. \begin{definition} \label{expPT} @@ -501,7 +501,7 @@ pruning the search tree. Each state $s$ of the matching process can be associated with a mapping $M(s)$. -\textbf{Algorithm~\ref{alg:VF2Pseu})} is a high level description of +Algorithm~\ref{alg:VF2Pseu} is a high level description of the VF2 matching algorithm. @@ -571,7 +571,7 @@ \label{candidateComputingVF2} $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$. Suppose that $PT$ is an expanding problem type, see -\textbf{Definition~\ref{expPT})}. +Definition~\ref{expPT}. 
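As a purely illustrative sketch, the state-space exploration of
Algorithm~\ref{alg:VF2Pseu} can be phrased in a few lines of C++. The
matching order is assumed to be fixed in advance, and \texttt{cons} and
\texttt{cut} are placeholder predicates standing for the consistency
function and the cutting rules of the chosen problem type; none of the
identifiers below come from a reference implementation.

\begin{verbatim}
// Illustrative sketch only: a generic VF2-style recursion over the
// state space. cons() and cut() stand for the consistency function and
// the cutting rules of the chosen problem type (defined in the text).
#include <cstddef>
#include <functional>
#include <vector>

using Mapping = std::vector<int>;   // Mapping[u] = pair of u in V_large,
                                    // or -1 if u is not covered yet
using Pred = std::function<bool(int, int, const Mapping&)>;

void vf2Match(std::size_t depth, Mapping& M, std::size_t nLarge,
              std::vector<bool>& covered,   // covered nodes of V_large
              const Pred& cons, const Pred& cut,
              const std::function<void(const Mapping&)>& report)
{
  if (depth == M.size()) { report(M); return; }  // whole mapping found
  const int u = static_cast<int>(depth);         // fixed matching order
  for (std::size_t v = 0; v < nLarge; ++v) {     // candidate pairs (u,v)
    if (covered[v]) continue;
    if (!cons(u, static_cast<int>(v), M)) continue;  // consistency
    if (cut(u, static_cast<int>(v), M)) continue;    // cutting rules
    M[u] = static_cast<int>(v);
    covered[v] = true;
    vf2Match(depth + 1, M, nLarge, covered, cons, cut, report);
    M[u] = -1;                                   // restore the state
    covered[v] = false;
  }
}
\end{verbatim}

A practical implementation would not scan every uncovered node of
$G_{large}$ as a candidate; the sets $T_{small}(s)$ and $T_{large}(s)$
introduced below are what keep the candidate set $P(s)$ small.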
\begin{notation} Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by @@ -598,7 +598,7 @@ \subsection{Consistency} This section defines the consistency functions for the different -problem types mentioned in \textbf{Section \ref{sec:CommProb})}. +problem types mentioned in Section~\ref{sec:CommProb}. \begin{notation} Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} : (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large} @@ -713,11 +713,11 @@ (\from) -- (\to); % \draw[dashed] (\from) -- (\to); \end{tikzpicture} -\caption{Graphs for the proof of \textbf{Claim - \ref{claimSUB}}} \label{fig:proofSUB} +\caption{Graphs for the proof of Claim~\ref{claimSUB}} +\label{fig:proofSUB} \end{center} \end{figure} -Let the two graphs of \textbf{Figure \ref{fig:proofSUB})} be the input +Let the two graphs of Figure~\ref{fig:proofSUB} be the input graphs. Suppose the total ordering relation is $u_1 \prec u_2 \prec u_3 \prec u_4$, $M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add $(u_2,v_2)\in P(s)$.\newline $Cons_{SUB}((u_2,v_2),M(s))=true$, so @@ -954,7 +954,7 @@ \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with root node $r$. \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the $d$-th level \State Process $V_d$ \Comment{See Algorithm - \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2})} \EndFor + \ref{alg:VF2PPProcess1} and \ref{alg:VF2PPProcess2}} \EndFor \EndWhile \EndProcedure \end{algorithmic} \end{algorithm} @@ -985,25 +985,24 @@ Refresh $F_\mathcal{M}$ \EndProcedure \end{algorithmic} \end{algorithm} -\textbf{Algorithm~\ref{alg:VF2PPPseu})} is a high level description of -the matching order procedure of VF2++. It computes a BFS tree for each +Algorithm~\ref{alg:VF2PPPseu} is a high level description of the +matching order procedure of VF2++. It computes a BFS tree for each component in ascending order of their rarest $lab$ and largest $deg$, -whose root vertex is the component's minimal node. \textbf{Algorithm - \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} are -two different methods to process a level of the BFS tree. +whose root vertex is the component's minimal +node. Algorithms~\ref{alg:VF2PPProcess1} and \ref{alg:VF2PPProcess2} +are two different methods to process a level of the BFS tree. After sorting the nodes of the current level in descending lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$, -\textbf{Algorithm \ref{alg:VF2PPProcess2})} appends them -simultaneously to the matching order $\mathcal{M}$ and refreshes -$F_\mathcal{M}$ just once, whereas \textbf{Algorithm - \ref{alg:VF2PPProcess1})} appends the nodes separately to -$\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it +Algorithm~\ref{alg:VF2PPProcess2} appends them simultaneously to the +matching order $\mathcal{M}$ and refreshes $F_\mathcal{M}$ just once, +whereas Algorithm~\ref{alg:VF2PPProcess1} appends the nodes separately +to $\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it provides up-to-date label information and may result in a more efficient matching order. -\textbf{Claim~\ref{claim:MOclaim})} shows that \textbf{Algorithm - \ref{alg:VF2PPPseu})} provides a matching order. +Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu} +provides a matching order. \subsection{Cutting rules} @@ -1079,20 +1078,20 @@ means "no node".
\subsubsection{Avoiding the recurrence} Exploring the state space was described in a recursive fashion using -sets (see \textbf{Algorithm~\ref{alg:VF2Pseu})}), which makes the +sets (see Algorithm~\ref{alg:VF2Pseu}), which makes the algorithm easy to understand but does not show directly an efficient way of implementation. The following approach avoiding recursion and using lookup tables instead of sets is not only fast but has linear space complexity as well. -The recursion of \textbf{Algorithm~\ref{alg:VF2Pseu})} can be realized +The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized as a while loop, which has a loop counter $depth$ denoting the all-time depth of the recursion. Fixing a matching order, let $M$ denote the array storing the all-time mapping. The initial state is associated with the empty mapping, which means that $\forall i: M[i]=INVALID$ and $depth=0$. In case of a recursive call, $depth$ has to be incremented, while in case of a return, it has to be -decremented. Based on \textbf{Claim~\ref{claim:claimCoverFromLeft})}, +decremented. Based on Claim~\ref{claim:claimCoverFromLeft}, $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes @@ -1102,12 +1101,12 @@ The necessary part of the candidate set is easily maintainable or computable by following -\textbf{Section~\ref{candidateComputingVF2})}. A much faster method +Section~\ref{candidateComputingVF2}. A much faster method has been designed for biological- and sparse graphs, see the next section for details. \subsubsection{Calculating the candidates for a node} -Being aware of \textbf{Claim~\ref{claim:claimCoverFromLeft})}, the +Being aware of Claim~\ref{claim:claimCoverFromLeft}, the task is not to maintain the candidate set, but to generate the candidate nodes in $G_{large}$ for a given node $u\in V_{small}$. In case of an expanding problem type and $M$ mapping, if a node $v\in @@ -1121,7 +1120,7 @@ describable if there exists a covered node in the component containing $u$. In this case choose a covered neighbour $u'$ of $u$ arbitrarily --- such a node exists based on -\textbf{Claim~\ref{claim:MOclaim})}. With all the candidates of $u$ +Claim~\ref{claim:MOclaim}. With all the candidates of $u$ being among the uncovered neighbours of $Pair(M,u')$, there are solely $deg(Pair(M,u'))$ nodes to check. @@ -1155,30 +1154,30 @@ seen before, a pointer shows where the first node of the FIFO is, and another one shows where the next explored node has to be inserted. So the nodes of each level of the BFS tree can be processed by -\textbf{Algorithm \ref{alg:VF2PPProcess1})} and -\textbf{\ref{alg:VF2PPProcess2})} in place by swapping nodes. +Algorithms~\ref{alg:VF2PPProcess1} and +\ref{alg:VF2PPProcess2} in place by swapping nodes. After a node $u$ gets to the next place of the node order, $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is one less covered node in $V_{large}$ with label $lab(u)$, that is why min selection sort is preferred which gives the elements from left to -right in descending order, see \textbf{Algorithm - \ref{alg:VF2PPProcess1})}. +right in descending order, see Algorithm~\ref{alg:VF2PPProcess1}. Note that using a $\Theta(n^2)$ sort absolutely does not slow down the procedure on biological (and on sparse) graphs, since they have few nodes on a level. 
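For illustration only, the immediate-refresh level processing just
described (in the spirit of Algorithm~\ref{alg:VF2PPProcess1}) might
look as follows in C++. The containers \texttt{conn}, \texttt{deg},
\texttt{lab} and \texttt{F} are assumed to hold $Conn_{\mathcal{M}}$,
the degrees, the labels and $F_\mathcal{M}$; all names are placeholders
rather than identifiers of an actual implementation.

\begin{verbatim}
// Illustrative sketch only: processing one BFS level in the spirit of
// the immediate-refresh variant; all container names are assumptions.
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

void processLevel(std::vector<int>& level,      // nodes of the BFS level
                  const std::vector<int>& conn, // Conn_M per node
                  const std::vector<int>& deg,  // degree per node
                  const std::vector<int>& lab,  // label per node
                  std::vector<int>& F,          // F_M, indexed by label
                  std::vector<int>& order)      // matching order so far
{
  // Selection-sort style pass: swap the currently best node to the
  // front, append it to the order and refresh F immediately, so the
  // next selection already sees up-to-date label information.
  for (std::size_t i = 0; i < level.size(); ++i) {
    auto key = [&](int u) {
      return std::make_tuple(conn[u], deg[u], -F[lab[u]]);
    };
    std::size_t best = i;
    for (std::size_t j = i + 1; j < level.size(); ++j)
      if (key(level[j]) > key(level[best])) best = j;  // descending order
    std::swap(level[i], level[best]);
    const int u = level[i];
    order.push_back(u);
    --F[lab[u]];               // one node of this label has been placed
  }
}
\end{verbatim}

The quadratic selection pass is deliberate: decreasing \texttt{F} right
after each node is appended is what provides the up-to-date label
information mentioned above, and on sparse graphs the levels are small
enough for the $\Theta(n^2)$ cost to be negligible.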
If a level had a large number of nodes, -\textbf{Algorithm \ref{alg:VF2PPProcess2})} would seem to be a better +Algorithm~\ref{alg:VF2PPProcess2} would seem to be a better choice with a $\Theta(n \log n)$ or Bucket sort, but it may reduce the efficiency of the matching procedure, since $F_\mathcal{M}(i)$ cannot be immediately refreshed, so it is unable to provide up-to-date label information. -Note that the \textit{while loop} of \textbf{Algorithm - \ref{alg:VF2PPPseu})} takes one iteration per graph component and -the graphs in biology are mostly connected. +Note that the \textit{while loop} of Algorithm~\ref{alg:VF2PPPseu} +takes one iteration per graph component and the graphs in biology are +mostly connected. + \subsubsection{Cutting rules} -In \textbf{Section \ref{VF2PPCuttingRules})}, the cutting rules were +In Section~\ref{VF2PPCuttingRules}, the cutting rules were described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$ and $\tilde T_{large}$, which are dependent on the all-time mapping (i.e. on the all-time state). The aim is to check the labeled cutting @@ -1261,7 +1260,7 @@ \end{algorithmic} \end{algorithm} -Using these notations, \textbf{Algorithm~\ref{alg:VF2PlusPseu})} +Using these notations, Algorithm~\ref{alg:VF2PlusPseu} provides the description of the sorting procedure. Note that $P(u)$ is not the exact probability of finding a consistent @@ -1293,8 +1292,8 @@ \subsubsection{Induced subgraph isomorphism} This dataset contains a set of graph pairs, and \textbf{all} the induced subgraph isomorphisms have to be found between -them. \textbf{Figure \ref{fig:INDProt}), \ref{fig:INDContact}),} and -\textbf{ \ref{fig:INDMolecule})} show the solution time of the +them. Figure~\ref{fig:INDProt}, \ref{fig:INDContact}, and +\ref{fig:INDMolecule} show the solution time of the problems in the problem set. \begin{figure}[H] @@ -1358,9 +1357,10 @@ \subsubsection{Graph isomorphism} In this experiment, the nodes of each graph in the database have been shuffled and an isomorphism between the shuffled and the original -graph has been searched. For runtime results, see \textbf{Figure - \ref{fig:ISOProt}), \ref{fig:ISOContact}),} and -\textbf{\ref{fig:ISOMolecule})}. +graph has been searched. For runtime results, see +Figure~\ref{fig:ISOProt}, \ref{fig:ISOContact}, and +\ref{fig:ISOMolecule}. + \begin{figure}[H] \begin{center} \begin{tikzpicture} @@ -1427,10 +1427,10 @@ To evaluate the efficiency of the algorithms in the case of graph isomorphism, connected graphs of less than 20 000 nodes have been considered. Generating a random graph and shuffling its nodes, an -isomorphism had to be found. \textbf{Figure \ref{fig:randISO5}), - \ref{fig:randISO10}), \ref{fig:randISO15}), \ref{fig:randISO35}), - \ref{fig:randISO45}),} and \textbf{\ref{fig:randISO100}) } show the -runtime results on graph sets of various density. +isomorphism had to be found. Figure~\ref{fig:randISO5}, +\ref{fig:randISO10}, \ref{fig:randISO15}, \ref{fig:randISO35}, +\ref{fig:randISO45}, and \ref{fig:randISO100} show the runtime results +on graph sets of various density. \begin{figure}[H] \begin{center} @@ -1545,7 +1545,7 @@ outperforms its rival especially on sparse graphs. The reason for the slightly super linear behaviour of VF2++ on denser graphs is the larger number of nodes in the BFS tree constructed in -\textbf{Algorithm \ref{alg:VF2PPPseu})}. +Algorithm~\ref{alg:VF2PPPseu}.
\subsubsection{Induced subgraph isomorphism} This section provides a comparison of VF2++ and VF2 Plus in the case @@ -1559,12 +1559,12 @@ choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes, and for all the 10 subgraphs find a mapping by using both the graph matching algorithms. The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1, -0.3, 0.6, 0.8, 0.95$ cases have been examined (see \textbf{Figure - \ref{fig:randIND5}), \ref{fig:randIND10})} and -\textbf{\ref{fig:randIND35})}), and for each $\delta$, a cumulative -chart is given as well, which excludes $\rho = 0.05$ and $0.1$ for the -sake of perspicuity (see \textbf{Figure \ref{fig:randIND5Sum}), - \ref{fig:randIND10Sum})} and \textbf{\ref{fig:randIND35Sum})}). +0.3, 0.6, 0.8, 0.95$ cases have been examined (see +Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and +\ref{fig:randIND35}), and for each $\delta$, a cumulative chart is +given as well, which excludes $\rho = 0.05$ and $0.1$ for the sake of +perspicuity (see Figure~\ref{fig:randIND5Sum}, \ref{fig:randIND10Sum} +and \ref{fig:randIND35Sum}). @@ -1970,8 +1970,8 @@ handle really large graphs in milliseconds. Note that when $IND$ was considered and the small graphs had proportionally few nodes ($\rho = 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node -orders(e.g. see the $\delta=10$ case on \textbf{Figure - \ref{fig:randIND10})}). If these examples had been excluded, the +orders (e.g. see the $\delta=10$ case in +Figure~\ref{fig:randIND10}). If these examples had been excluded, the charts would have seemed to be similar to the other ones. Unsurprisingly, as denser graphs are considered, both VF2++ and VF2 Plus slow down slightly, but remain practically usable even on graphs