1.1 --- a/damecco.tex Tue Nov 22 13:58:09 2016 +0100
1.2 +++ b/damecco.tex Tue Nov 22 14:11:49 2016 +0100
1.3 @@ -229,23 +229,23 @@
1.4 does it cover all the recent algorithms.
1.5
1.6 The first practically usable approach was due to
1.7 -\textbf{Ullmann}\cite{Ullmann} which is a commonly used depth-first
1.8 +Ullmann\cite{Ullmann} which is a commonly used depth-first
1.9 search based algorithm with a complex heuristic for reducing the
1.10 number of visited states. A major problem is its $\Theta(n^3)$ space
1.11 complexity, which makes it impractical in the case of big sparse
1.12 graphs.
1.13
1.14 -In a recent paper, \textbf{Ullmann}\cite{UllmannBit} presents an
1.15 +In a recent paper, Ullmann\cite{UllmannBit} presents an
1.16 improved version of this algorithm based on a bit-vector solution for
1.17 the binary Constraint Satisfaction Problem.
1.18
1.19 -The \textbf{Nauty} algorithm\cite{Nauty} transforms the two graphs to
1.20 +The Nauty algorithm\cite{Nauty} transforms the two graphs to
1.21 a canonical form before starting to check for the isomorphism. It has
1.22 been considered as one of the fastest graph isomorphism algorithms,
1.23 although graph categories were shown in which it takes exponentially
1.24 many steps. This algorithm handles only the graph isomorphism problem.
1.25
1.26 -The \textbf{LAD} algorithm\cite{Lad} uses a depth-first search
1.27 +The \emph{LAD} algorithm\cite{Lad} uses a depth-first search
1.28 strategy and formulates the matching as a Constraint Satisfaction
1.29 Problem to prune the search tree. The constraints are that the mapping
1.30 has to be injective and edge-preserving, hence it is possible to
1.31 @@ -382,8 +382,8 @@
1.32
1.33 \section{The VF2 Algorithm}
1.34 This algorithm is the basis of both the VF2++ and the VF2 Plus. VF2
1.35 -is able to handle all the variations mentioned in \textbf{Section
1.36 - \ref{sec:CommProb})}. Although it can also handle directed graphs,
1.37 +is able to handle all the variations mentioned in Section
1.38 + \ref{sec:CommProb}. Although it can also handle directed graphs,
1.39 for the sake of simplicity, only the undirected case will be
1.40 discussed.
1.41
1.42 @@ -444,10 +444,10 @@
1.43
1.44 For example the problem type of graph isomorphism problem is the
1.45 following. A whole mapping $W$ is in $\mathbf{ISO}$, iff the
1.46 -bijection represented by $W$ satisfies \textbf{Definition
1.47 - \ref{sec:ismorphic})}. The subgraph- and induced subgraph matching
1.48 -problems can be formalized in a similar way. Let their problem types
1.49 -be denoted as $\mathbf{SUB}$ and $\mathbf{IND}$.
1.50 +bijection represented by $W$ satisfies Definition~\ref{sec:ismorphic}.
1.51 +The subgraph- and induced subgraph matching problems can be formalized
1.52 +in a similar way. Let their problem types be denoted as $\mathbf{SUB}$
1.53 +and $\mathbf{IND}$.
1.54
1.55 \begin{definition}
1.56 \label{expPT}
1.57 @@ -501,7 +501,7 @@
1.58 pruning the search tree. Each state $s$ of the matching process can
1.59 be associated with a mapping $M(s)$.
1.60
1.61 -\textbf{Algorithm~\ref{alg:VF2Pseu})} is a high level description of
1.62 +Algorithm~\ref{alg:VF2Pseu} is a high level description of
1.63 the VF2 matching algorithm.
1.64
1.65
1.66 @@ -571,7 +571,7 @@
1.67 \label{candidateComputingVF2}
1.68 $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.
1.69 Suppose that $PT$ is an expanding problem type, see
1.70 -\textbf{Definition~\ref{expPT})}.
1.71 +Definition~\ref{expPT}.
1.72
1.73 \begin{notation}
1.74 Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by
1.75 @@ -598,7 +598,7 @@
1.76
1.77 \subsection{Consistency}
1.78 This section defines the consistency functions for the different
1.79 -problem types mentioned in \textbf{Section \ref{sec:CommProb})}.
1.80 +problem types mentioned in Section~\ref{sec:CommProb}.
1.81 \begin{notation}
1.82 Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} :
1.83 (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large}
1.84 @@ -713,11 +713,11 @@
1.85 (\from) -- (\to);
1.86 % \draw[dashed] (\from) -- (\to);
1.87 \end{tikzpicture}
1.88 -\caption{Graphs for the proof of \textbf{Claim
1.89 - \ref{claimSUB}}} \label{fig:proofSUB}
1.90 +\caption{Graphs for the proof of Claim~\ref{claimSUB}}
1.91 +\label{fig:proofSUB}
1.92 \end{center}
1.93 \end{figure}
1.94 -Let the two graphs of \textbf{Figure \ref{fig:proofSUB})} be the input
1.95 +Let the two graphs of Figure~\ref{fig:proofSUB} be the input
1.96 graphs. Suppose the total ordering relation is $u_1 \prec u_2 \prec
1.97 u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add
1.98 $(u_2,v_2)\in P(s)$.\newline $Cons_{SUB}((u_2,v_2),M(s))=true$, so
1.99 @@ -954,7 +954,7 @@
1.100 \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
1.101 root node $r$. \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
1.102 $d$-th level \State Process $V_d$ \Comment{See Algorithm
1.103 - \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2})} \EndFor
1.104 + \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2}} \EndFor
1.105 \EndWhile \EndProcedure
1.106 \end{algorithmic}
1.107 \end{algorithm}
1.108 @@ -985,25 +985,24 @@
1.109 Refresh $F_\mathcal{M}$ \EndProcedure
1.110 \end{algorithmic}
1.111 \end{algorithm}
1.112 -\textbf{Algorithm~\ref{alg:VF2PPPseu})} is a high level description of
1.113 -the matching order procedure of VF2++. It computes a BFS tree for each
1.114 +Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
1.115 +matching order procedure of VF2++. It computes a BFS tree for each
1.116 component in ascending order of their rarest $lab$ and largest $deg$,
1.117 -whose root vertex is the component's minimal node. \textbf{Algorithm
1.118 - \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} are
1.119 -two different methods to process a level of the BFS tree.
1.120 +whose root vertex is the component's minimal
1.121 +node. Algorithms~\ref{alg:VF2PPProcess1} and \ref{alg:VF2PPProcess2}
1.122 +are two different methods to process a level of the BFS tree.
1.123
1.124 After sorting the nodes of the current level in descending
1.125 lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$,
1.126 -\textbf{Algorithm \ref{alg:VF2PPProcess2})} appends them
1.127 -simultaneously to the matching order $\mathcal{M}$ and refreshes
1.128 -$F_\mathcal{M}$ just once, whereas \textbf{Algorithm
1.129 - \ref{alg:VF2PPProcess1})} appends the nodes separately to
1.130 -$\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it
1.131 +Algorithm~\ref{alg:VF2PPProcess2} appends them simultaneously to the
1.132 +matching order $\mathcal{M}$ and refreshes $F_\mathcal{M}$ just once,
1.133 +whereas Algorithm~\ref{alg:VF2PPProcess1} appends the nodes separately
1.134 +to $\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it
1.135 provides up-to-date label information and may result in a more
1.136 efficient matching order.
1.137
1.138 -\textbf{Claim~\ref{claim:MOclaim})} shows that \textbf{Algorithm
1.139 - \ref{alg:VF2PPPseu})} provides a matching order.
1.140 +Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}
1.141 +provides a matching order.
1.142
1.143
1.144 \subsection{Cutting rules}
1.145 @@ -1079,20 +1078,20 @@
1.146 means "no node".
1.147 \subsubsection{Avoiding the recurrence}
1.148 Exploring the state space was described in a recursive fashion using
1.149 -sets (see \textbf{Algorithm~\ref{alg:VF2Pseu})}), which makes the
1.150 +sets (see Algorithm~\ref{alg:VF2Pseu}), which makes the
1.151 algorithm easy to understand but does not show directly an efficient
1.152 way of implementation. The following approach avoiding recursion and
1.153 using lookup tables instead of sets is not only fast but has linear
1.154 space complexity as well.
1.155
1.156 -The recursion of \textbf{Algorithm~\ref{alg:VF2Pseu})} can be realized
1.157 +The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
1.158 as a while loop, which has a loop counter $depth$ denoting the
1.159 all-time depth of the recursion. Fixing a matching order, let $M$
1.160 denote the array storing the all-time mapping. The initial state is
1.161 associated with the empty mapping, which means that $\forall i:
1.162 M[i]=INVALID$ and $depth=0$. In case of a recursive call, $depth$ has
1.163 to be incremented, while in case of a return, it has to be
1.164 -decremented. Based on \textbf{Claim~\ref{claim:claimCoverFromLeft})},
1.165 +decremented. Based on Claim~\ref{claim:claimCoverFromLeft},
1.166 $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before
1.167 $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and
1.168 $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes
1.169 @@ -1102,12 +1101,12 @@
1.170
1.171 The necessary part of the candidate set is easily maintainable or
1.172 computable by following
1.173 -\textbf{Section~\ref{candidateComputingVF2})}. A much faster method
1.174 +Section~\ref{candidateComputingVF2}. A much faster method
1.175 has been designed for biological- and sparse graphs, see the next
1.176 section for details.
1.177
1.178 \subsubsection{Calculating the candidates for a node}
1.179 -Being aware of \textbf{Claim~\ref{claim:claimCoverFromLeft})}, the
1.180 +Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
1.181 task is not to maintain the candidate set, but to generate the
1.182 candidate nodes in $G_{large}$ for a given node $u\in V_{small}$. In
1.183 case of an expanding problem type and $M$ mapping, if a node $v\in
1.184 @@ -1121,7 +1120,7 @@
1.185 describable if there exists a covered node in the component containing
1.186 $u$. In this case choose a covered neighbour $u'$ of $u$ arbitrarily
1.187 --- such a node exists based on
1.188 -\textbf{Claim~\ref{claim:MOclaim})}. With all the candidates of $u$
1.189 +Claim~\ref{claim:MOclaim}. With all the candidates of $u$
1.190 being among the uncovered neighbours of $Pair(M,u')$, there are solely
1.191 $deg(Pair(M,u'))$ nodes to check.
1.192
1.193 @@ -1155,30 +1154,30 @@
1.194 seen before, a pointer shows where the first node of the FIFO is, and
1.195 another one shows where the next explored node has to be inserted. So
1.196 the nodes of each level of the BFS tree can be processed by
1.197 -\textbf{Algorithm \ref{alg:VF2PPProcess1})} and
1.198 -\textbf{\ref{alg:VF2PPProcess2})} in place by swapping nodes.
1.199 +Algorithms~\ref{alg:VF2PPProcess1} and
1.200 +\ref{alg:VF2PPProcess2} in place by swapping nodes.
1.201
1.202 After a node $u$ gets to the next place of the node order,
1.203 $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is
1.204 one less covered node in $V_{large}$ with label $lab(u)$, that is why
1.205 min selection sort is preferred which gives the elements from left to
1.206 -right in descending order, see \textbf{Algorithm
1.207 - \ref{alg:VF2PPProcess1})}.
1.208 +right in descending order, see Algorithm~\ref{alg:VF2PPProcess1}.
1.209
1.210 Note that using a $\Theta(n^2)$ sort absolutely does not slow down the
1.211 procedure on biological (and on sparse) graphs, since they have few
1.212 nodes on a level. If a level had a large number of nodes,
1.213 -\textbf{Algorithm \ref{alg:VF2PPProcess2})} would seem to be a better
1.214 +Algorithm~\ref{alg:VF2PPProcess2} would seem to be a better
1.215 choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the
1.216 efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not
1.217 be immediately refreshed, so it is unable to provide up-to-date label
1.218 information.
1.219
1.220 -Note that the \textit{while loop} of \textbf{Algorithm
1.221 - \ref{alg:VF2PPPseu})} takes one iteration per graph component and
1.222 -the graphs in biology are mostly connected.
1.223 +Note that the \textit{while loop} of Algorithm~\ref{alg:VF2PPPseu}
1.224 +takes one iteration per graph component and the graphs in biology are
1.225 +mostly connected.
1.226 +
1.227 \subsubsection{Cutting rules}
1.228 -In \textbf{Section \ref{VF2PPCuttingRules})}, the cutting rules were
1.229 +In Section~\ref{VF2PPCuttingRules}, the cutting rules were
1.230 described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$
1.231 and $\tilde T_{large}$, which are dependent on the all-time mapping
1.232 (i.e. on the all-time state). The aim is to check the labeled cutting
1.233 @@ -1261,7 +1260,7 @@
1.234 \end{algorithmic}
1.235 \end{algorithm}
1.236
1.237 -Using these notations, \textbf{Algorithm~\ref{alg:VF2PlusPseu})}
1.238 +Using these notations, Algorithm~\ref{alg:VF2PlusPseu}
1.239 provides the description of the sorting procedure.
1.240
1.241 Note that $P(u)$ is not the exact probability of finding a consistent
1.242 @@ -1293,8 +1292,8 @@
1.243 \subsubsection{Induced subgraph isomorphism}
1.244 This dataset contains a set of graph pairs, and \textbf{all} the
1.245 induced subgraph ismorphisms have to be found between
1.246 -them. \textbf{Figure \ref{fig:INDProt}), \ref{fig:INDContact}),} and
1.247 -\textbf{ \ref{fig:INDMolecule})} show the solution time of the
1.248 +them. Figure~\ref{fig:INDProt}), \ref{fig:INDContact}), and
1.249 +\ref{fig:INDMolecule} show the solution time of the
1.250 problems in the problem set.
1.251
1.252 \begin{figure}[H]
1.253 @@ -1358,9 +1357,10 @@
1.254 \subsubsection{Graph ismorphism}
1.255 In this experiment, the nodes of each graph in the database have been
1.256 shuffled and an isomorphism between the shuffled and the original
1.257 -graph has been searched. For runtime results, see \textbf{Figure
1.258 - \ref{fig:ISOProt}), \ref{fig:ISOContact}),} and
1.259 -\textbf{\ref{fig:ISOMolecule})}.
1.260 +graph has been searched. For runtime results, see
1.261 +Figure~\ref{fig:ISOProt}, \ref{fig:ISOContact}, and
1.262 +\ref{fig:ISOMolecule}.
1.263 +
1.264 \begin{figure}[H]
1.265 \begin{center}
1.266 \begin{tikzpicture}
1.267 @@ -1427,10 +1427,10 @@
1.268 To evaluate the efficiency of the algorithms in the case of graph
1.269 isomorphism, connected graphs of less than 20 000 nodes have been
1.270 considered. Generating a random graph and shuffling its nodes, an
1.271 -isomorphism had to be found. \textbf{Figure \ref{fig:randISO5}),
1.272 - \ref{fig:randISO10}), \ref{fig:randISO15}), \ref{fig:randISO35}),
1.273 - \ref{fig:randISO45}),} and \textbf{\ref{fig:randISO100}) } show the
1.274 -runtime results on graph sets of various density.
1.275 +isomorphism had to be found. Figure \ref{fig:randISO5},
1.276 +\ref{fig:randISO10}, \ref{fig:randISO15}, \ref{fig:randISO35},
1.277 +\ref{fig:randISO45}, and \ref{fig:randISO100} show the runtime results
1.278 +on graph sets of various density.
1.279
1.280 \begin{figure}[H]
1.281 \begin{center}
1.282 @@ -1545,7 +1545,7 @@
1.283 outperforms its rival especially on sparse graphs. The reason for the
1.284 slightly super linear behaviour of VF2++ on denser graphs is the
1.285 larger number of nodes in the BFS tree constructed in
1.286 -\textbf{Algorithm \ref{alg:VF2PPPseu})}.
1.287 +Algorithm~\ref{alg:VF2PPPseu}.
1.288
1.289 \subsubsection{Induced subgraph isomorphism}
1.290 This section provides a comparison of VF2++ and VF2 Plus in the case
1.291 @@ -1559,12 +1559,12 @@
1.292 choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes,
1.293 and for all the 10 subgraphs find a mapping by using both the graph
1.294 matching algorithms. The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
1.295 -0.3, 0.6, 0.8, 0.95$ cases have been examined (see \textbf{Figure
1.296 - \ref{fig:randIND5}), \ref{fig:randIND10})} and
1.297 -\textbf{\ref{fig:randIND35})}), and for each $\delta$, a cumulative
1.298 -chart is given as well, which excludes $\rho = 0.05$ and $0.1$ for the
1.299 -sake of perspicuity (see \textbf{Figure \ref{fig:randIND5Sum}),
1.300 - \ref{fig:randIND10Sum})} and \textbf{\ref{fig:randIND35Sum})}).
1.301 +0.3, 0.6, 0.8, 0.95$ cases have been examined (see
1.302 +Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
1.303 +\ref{fig:randIND35}, and for each $\delta$, a cumulative chart is
1.304 +given as well, which excludes $\rho = 0.05$ and $0.1$ for the sake of
1.305 +perspicuity (see Figure~\ref{fig:randIND5Sum}, \ref{fig:randIND10Sum}
1.306 +and \ref{fig:randIND35Sum}).
1.307
1.308
1.309
1.310 @@ -1970,8 +1970,8 @@
1.311 handle really large graphs in milliseconds. Note that when $IND$ was
1.312 considered and the small graphs had proportionally few nodes ($\rho =
1.313 0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
1.314 -orders(e.g. see the $\delta=10$ case on \textbf{Figure
1.315 - \ref{fig:randIND10})}). If these examples had been excluded, the
1.316 +orders (e.g. see the $\delta=10$ case on
1.317 +Figure~\ref{fig:randIND10}). If these examples had been excluded, the
1.318 charts would have seemed to be similar to the other ones.
1.319 Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
1.320 Plus slow slightly down, but remain practically usable even on graphs