author Alpar Juttner <alpar@cs.elte.hu>

Tue, 22 Nov 2016 14:11:49 +0100

changeset 4 c682d3237141

parent 3 550f81b2f81c

child 5 c04b029e623a

damecco.tex file | annotate | diff | comparison | revisions
     1.1 --- a/damecco.tex	Tue Nov 22 13:58:09 2016 +0100
     1.2 +++ b/damecco.tex	Tue Nov 22 14:11:49 2016 +0100
     1.3 @@ -229,23 +229,23 @@
     1.4  does it cover all the recent algorithms.
     1.5  
     1.6  The first practically usable approach was due to
     1.7 -\textbf{Ullmann}\cite{Ullmann} which is a commonly used depth-first
     1.8 +Ullmann\cite{Ullmann} which is a commonly used depth-first
     1.9  search based algorithm with a complex heuristic for reducing the
    1.10  number of visited states. A major problem is its $\Theta(n^3)$ space
    1.11  complexity, which makes it impractical in the case of big sparse
    1.12  graphs.
    1.13  
    1.14 -In a recent paper, \textbf{Ullmann}\cite{UllmannBit} presents an
    1.15 +In a recent paper, Ullmann\cite{UllmannBit} presents an
    1.16  improved version of this algorithm based on a bit-vector solution for
    1.17  the binary Constraint Satisfaction Problem.
    1.18  
    1.19 -The \textbf{Nauty} algorithm\cite{Nauty} transforms the two graphs to
    1.20 +The Nauty algorithm\cite{Nauty} transforms the two graphs to
    1.21  a canonical form before starting to check for the isomorphism. It has
    1.22  been considered as one of the fastest graph isomorphism algorithms,
    1.23  although graph categories were shown in which it takes exponentially
    1.24  many steps. This algorithm handles only the graph isomorphism problem.
    1.25  
    1.26 -The \textbf{LAD} algorithm\cite{Lad} uses a depth-first search
    1.27 +The \emph{LAD} algorithm\cite{Lad} uses a depth-first search
    1.28  strategy and formulates the matching as a Constraint Satisfaction
    1.29  Problem to prune the search tree. The constraints are that the mapping
    1.30  has to be injective and edge-preserving, hence it is possible to
    1.31 @@ -382,8 +382,8 @@
    1.32  
    1.33  \section{The VF2 Algorithm}
    1.34  This algorithm is the basis of both the VF2++ and the VF2 Plus.  VF2
    1.35 -is able to handle all the variations mentioned in \textbf{Section
    1.36 -  \ref{sec:CommProb})}.  Although it can also handle directed graphs,
    1.37 +is able to handle all the variations mentioned in Section
    1.38 +  \ref{sec:CommProb}.  Although it can also handle directed graphs,
    1.39  for the sake of simplicity, only the undirected case will be
    1.40  discussed.
    1.41  
    1.42 @@ -444,10 +444,10 @@
    1.43  
    1.44  For example the problem type of graph isomorphism problem is the
    1.45  following.  A whole mapping $W$ is in $\mathbf{ISO}$, iff the
    1.46 -bijection represented by $W$ satisfies \textbf{Definition
    1.47 -  \ref{sec:ismorphic})}.  The subgraph- and induced subgraph matching
    1.48 -problems can be formalized in a similar way. Let their problem types
    1.49 -be denoted as $\mathbf{SUB}$ and $\mathbf{IND}$.
    1.50 +bijection represented by $W$ satisfies Definition~\ref{sec:ismorphic}.
    1.51 +The subgraph- and induced subgraph matching problems can be formalized
    1.52 +in a similar way. Let their problem types be denoted as $\mathbf{SUB}$
    1.53 +and $\mathbf{IND}$.
    1.54  
    1.55  \begin{definition}
    1.56  \label{expPT}
    1.57 @@ -501,7 +501,7 @@
    1.58  pruning the search tree.  Each state $s$ of the matching process can
    1.59  be associated with a mapping $M(s)$.
    1.60  
    1.61 -\textbf{Algorithm~\ref{alg:VF2Pseu})} is a high level description of
    1.62 +Algorithm~\ref{alg:VF2Pseu} is a high level description of
    1.63  the VF2 matching algorithm.
    1.64  
    1.65  
    1.66 @@ -571,7 +571,7 @@
    1.67  \label{candidateComputingVF2}
    1.68  $P(s)$ is the set of the candidate pairs for inclusion in $M(s)$.
    1.69  Suppose that $PT$ is an expanding problem type, see
    1.70 -\textbf{Definition~\ref{expPT})}.
    1.71 +Definition~\ref{expPT}.
    1.72  
    1.73  \begin{notation}
    1.74  Let $\mathbf{T_{small}(s)}:=\{u \in V_{small} : u$ is not covered by
    1.75 @@ -598,7 +598,7 @@
    1.76  
    1.77  \subsection{Consistency}
    1.78  This section defines the consistency functions for the different
    1.79 -problem types mentioned in \textbf{Section \ref{sec:CommProb})}.
    1.80 +problem types mentioned in Section~\ref{sec:CommProb}.
    1.81  \begin{notation}
    1.82  Let $\mathbf{\Gamma_{small} (u)}:=\{\tilde{u}\in V_{small} :
    1.83  (u,\tilde{u})\in E_{small}\}$\\ Let $\mathbf{\Gamma_{large}
    1.84 @@ -713,11 +713,11 @@
    1.85    (\from) -- (\to);
    1.86  %    \draw[dashed] (\from) -- (\to);
    1.87  \end{tikzpicture}
    1.88 -\caption{Graphs for the proof of \textbf{Claim
    1.89 -    \ref{claimSUB}}} \label{fig:proofSUB}
    1.90 +\caption{Graphs for the proof of Claim~\ref{claimSUB}}
    1.91 +\label{fig:proofSUB}
    1.92  \end{center}
    1.93  \end{figure}
    1.94 -Let the two graphs of \textbf{Figure \ref{fig:proofSUB})} be the input
    1.95 +Let the two graphs of Figure~\ref{fig:proofSUB} be the input
    1.96  graphs.  Suppose the total ordering relation is $u_1 \prec u_2 \prec
    1.97  u_3 \prec u_4$,$M(s)\!=\{(u_1,v_1)\}$, and VF2 tries to add
    1.98  $(u_2,v_2)\in P(s)$.\newline $Cons_{SUB}((u_2,v_2),M(s))=true$, so
    1.99 @@ -954,7 +954,7 @@
   1.100  \mathcal{M})$)\label{alg:findMin} \State Compute $T$, a BFS tree with
   1.101  root node $r$.  \For{$d=0,1,...,depth(T)$} \State $V_d$:=nodes of the
   1.102  $d$-th level \State Process $V_d$ \Comment{See Algorithm
   1.103 -  \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2})} \EndFor
   1.104 +  \ref{alg:VF2PPProcess1}) and \ref{alg:VF2PPProcess2}} \EndFor
   1.105  \EndWhile \EndProcedure
   1.106  \end{algorithmic}
   1.107  \end{algorithm}
   1.108 @@ -985,25 +985,24 @@
   1.109  Refresh $F_\mathcal{M}$ \EndProcedure
   1.110  \end{algorithmic}
   1.111  \end{algorithm}
   1.112 -\textbf{Algorithm~\ref{alg:VF2PPPseu})} is a high level description of
   1.113 -the matching order procedure of VF2++. It computes a BFS tree for each
   1.114 +Algorithm~\ref{alg:VF2PPPseu} is a high level description of the
   1.115 +matching order procedure of VF2++. It computes a BFS tree for each
   1.116  component in ascending order of their rarest $lab$ and largest $deg$,
   1.117 -whose root vertex is the component's minimal node. \textbf{Algorithm
   1.118 -  \ref{alg:VF2PPProcess1})} and \textbf{\ref{alg:VF2PPProcess2})} are
   1.119 -two different methods to process a level of the BFS tree.
   1.120 +whose root vertex is the component's minimal
   1.121 +node. Algorithms~\ref{alg:VF2PPProcess1} and \ref{alg:VF2PPProcess2}
   1.122 +are two different methods to process a level of the BFS tree.
   1.123  
   1.124  After sorting the nodes of the current level in descending
   1.125  lexicographic order by $(Conn_{\mathcal{M}},deg,-F_\mathcal{M})$,
   1.126 -\textbf{Algorithm \ref{alg:VF2PPProcess2})} appends them
   1.127 -simultaneously to the matching order $\mathcal{M}$ and refreshes
   1.128 -$F_\mathcal{M}$ just once, whereas \textbf{Algorithm
   1.129 -  \ref{alg:VF2PPProcess1})} appends the nodes separately to
   1.130 -$\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it
   1.131 +Algorithm~\ref{alg:VF2PPProcess2} appends them simultaneously to the
   1.132 +matching order $\mathcal{M}$ and refreshes $F_\mathcal{M}$ just once,
   1.133 +whereas Algorithm~\ref{alg:VF2PPProcess1} appends the nodes separately
   1.134 +to $\mathcal{M}$ and refreshes $F_\mathcal{M}$ immediately, so it
   1.135  provides up-to-date label information and may result in a more
   1.136  efficient matching order.
   1.137  
   1.138 -\textbf{Claim~\ref{claim:MOclaim})} shows that \textbf{Algorithm
   1.139 -  \ref{alg:VF2PPPseu})} provides a matching order.
   1.140 +Claim~\ref{claim:MOclaim} shows that Algorithm~\ref{alg:VF2PPPseu}
   1.141 +provides a matching order.
   1.142  
   1.143  
   1.144  \subsection{Cutting rules}
   1.145 @@ -1079,20 +1078,20 @@
   1.146  means "no node".
   1.147  \subsubsection{Avoiding the recurrence}
   1.148  Exploring the state space was described in a recursive fashion using
   1.149 -sets (see \textbf{Algorithm~\ref{alg:VF2Pseu})}), which makes the
   1.150 +sets (see Algorithm~\ref{alg:VF2Pseu}), which makes the
   1.151  algorithm easy to understand but does not show directly an efficient
   1.152  way of implementation. The following approach avoiding recursion and
   1.153  using lookup tables instead of sets is not only fast but has linear
   1.154  space complexity as well.
   1.155  
   1.156 -The recursion of \textbf{Algorithm~\ref{alg:VF2Pseu})} can be realized
   1.157 +The recursion of Algorithm~\ref{alg:VF2Pseu} can be realized
   1.158  as a while loop, which has a loop counter $depth$ denoting the
   1.159  all-time depth of the recursion.  Fixing a matching order, let $M$
   1.160  denote the array storing the all-time mapping.  The initial state is
   1.161  associated with the empty mapping, which means that $\forall i:
   1.162  M[i]=INVALID$ and $depth=0$.  In case of a recursive call, $depth$ has
   1.163  to be incremented, while in case of a return, it has to be
   1.164 -decremented.  Based on \textbf{Claim~\ref{claim:claimCoverFromLeft})},
   1.165 +decremented.  Based on Claim~\ref{claim:claimCoverFromLeft},
   1.166  $M$ is $INVALID$ from index $depth$+1 and not $INVALID$ before
   1.167  $depth$, i.e. $\forall i: i < depth \Rightarrow M[i]\neq INVALID$ and
   1.168  $\forall i: i > depth \Rightarrow M[i]= INVALID$. $M[depth]$ changes
   1.169 @@ -1102,12 +1101,12 @@
   1.170  
   1.171  The necessary part of the candidate set is easily maintainable or
   1.172  computable by following
   1.173 -\textbf{Section~\ref{candidateComputingVF2})}. A much faster method
   1.174 +Section~\ref{candidateComputingVF2}. A much faster method
   1.175  has been designed for biological- and sparse graphs, see the next
   1.176  section for details.
   1.177  
   1.178  \subsubsection{Calculating the candidates for a node}
   1.179 -Being aware of \textbf{Claim~\ref{claim:claimCoverFromLeft})}, the
   1.180 +Being aware of Claim~\ref{claim:claimCoverFromLeft}, the
   1.181  task is not to maintain the candidate set, but to generate the
   1.182  candidate nodes in $G_{large}$ for a given node $u\in V_{small}$.  In
   1.183  case of an expanding problem type and $M$ mapping, if a node $v\in
   1.184 @@ -1121,7 +1120,7 @@
   1.185  describable if there exists a covered node in the component containing
   1.186  $u$. In this case choose a covered neighbour $u'$ of $u$ arbitrarily
   1.187  --- such a node exists based on
   1.188 -\textbf{Claim~\ref{claim:MOclaim})}. With all the candidates of $u$
   1.189 +Claim~\ref{claim:MOclaim}. With all the candidates of $u$
   1.190  being among the uncovered neighbours of $Pair(M,u')$, there are solely
   1.191  $deg(Pair(M,u'))$ nodes to check.
   1.192  
   1.193 @@ -1155,30 +1154,30 @@
   1.194  seen before, a pointer shows where the first node of the FIFO is, and
   1.195  another one shows where the next explored node has to be inserted. So
   1.196  the nodes of each level of the BFS tree can be processed by
   1.197 -\textbf{Algorithm \ref{alg:VF2PPProcess1})} and
   1.198 -\textbf{\ref{alg:VF2PPProcess2})} in place by swapping nodes.
   1.199 +Algorithms~\ref{alg:VF2PPProcess1} and
   1.200 +\ref{alg:VF2PPProcess2} in place by swapping nodes.
   1.201  
   1.202  After a node $u$ gets to the next place of the node order,
   1.203  $F_\mathcal{M}[lab[u]]$ has to be decreased by one, because there is
   1.204  one less covered node in $V_{large}$ with label $lab(u)$, that is why
   1.205  min selection sort is preferred which gives the elements from left to
   1.206 -right in descending order, see \textbf{Algorithm
   1.207 -  \ref{alg:VF2PPProcess1})}.
   1.208 +right in descending order, see Algorithm~\ref{alg:VF2PPProcess1}.
   1.209  
   1.210  Note that using a $\Theta(n^2)$ sort absolutely does not slow down the
   1.211  procedure on biological (and on sparse) graphs, since they have few
   1.212  nodes on a level. If a level had a large number of nodes,
   1.213 -\textbf{Algorithm \ref{alg:VF2PPProcess2})} would seem to be a better
   1.214 +Algorithm~\ref{alg:VF2PPProcess2} would seem to be a better
   1.215  choice with a $\Theta(nlog(n))$ or Bucket sort, but it may reduce the
   1.216  efficiency of the matching procedure, since $F_\mathcal{M}(i)$ can not
   1.217  be immediately refreshed, so it is unable to provide up-to-date label
   1.218  information.
   1.219  
   1.220 -Note that the \textit{while loop} of \textbf{Algorithm
   1.221 -  \ref{alg:VF2PPPseu})} takes one iteration per graph component and
   1.222 -the graphs in biology are mostly connected.
   1.223 +Note that the \textit{while loop} of Algorithm~\ref{alg:VF2PPPseu}
   1.224 +takes one iteration per graph component and the graphs in biology are
   1.225 +mostly connected.
   1.226 +
   1.227  \subsubsection{Cutting rules}
   1.228 -In \textbf{Section \ref{VF2PPCuttingRules})}, the cutting rules were
   1.229 +In Section~\ref{VF2PPCuttingRules}, the cutting rules were
   1.230  described using the sets $T_{small}$, $T_{large}$, $\tilde T_{small}$
   1.231  and $\tilde T_{large}$, which are dependent on the all-time mapping
   1.232  (i.e. on the all-time state). The aim is to check the labeled cutting
   1.233 @@ -1261,7 +1260,7 @@
   1.234  \end{algorithmic}
   1.235  \end{algorithm}
   1.236  
   1.237 -Using these notations, \textbf{Algorithm~\ref{alg:VF2PlusPseu})}
   1.238 +Using these notations, Algorithm~\ref{alg:VF2PlusPseu}
   1.239  provides the description of the sorting procedure.
   1.240  
   1.241  Note that $P(u)$ is not the exact probability of finding a consistent
   1.242 @@ -1293,8 +1292,8 @@
   1.243  \subsubsection{Induced subgraph isomorphism}
   1.244  This dataset contains a set of graph pairs, and \textbf{all} the
   1.245  induced subgraph ismorphisms have to be found between
   1.246 -them. \textbf{Figure \ref{fig:INDProt}), \ref{fig:INDContact}),} and
   1.247 -\textbf{ \ref{fig:INDMolecule})} show the solution time of the
   1.248 +them. Figure~\ref{fig:INDProt}), \ref{fig:INDContact}), and
   1.249 +\ref{fig:INDMolecule} show the solution time of the
   1.250  problems in the problem set.
   1.251  
   1.252  \begin{figure}[H]
   1.253 @@ -1358,9 +1357,10 @@
   1.254  \subsubsection{Graph ismorphism}
   1.255  In this experiment, the nodes of each graph in the database have been
   1.256  shuffled and an isomorphism between the shuffled and the original
   1.257 -graph has been searched. For runtime results, see \textbf{Figure
   1.258 -  \ref{fig:ISOProt}), \ref{fig:ISOContact}),} and
   1.259 -\textbf{\ref{fig:ISOMolecule})}.
   1.260 +graph has been searched. For runtime results, see
   1.261 +Figure~\ref{fig:ISOProt}, \ref{fig:ISOContact}, and
   1.262 +\ref{fig:ISOMolecule}.
   1.263 +  
   1.264  \begin{figure}[H]
   1.265  \begin{center}
   1.266  \begin{tikzpicture}
   1.267 @@ -1427,10 +1427,10 @@
   1.268  To evaluate the efficiency of the algorithms in the case of graph
   1.269  isomorphism, connected graphs of less than 20 000 nodes have been
   1.270  considered. Generating a random graph and shuffling its nodes, an
   1.271 -isomorphism had to be found. \textbf{Figure \ref{fig:randISO5}),
   1.272 -  \ref{fig:randISO10}), \ref{fig:randISO15}), \ref{fig:randISO35}),
   1.273 -  \ref{fig:randISO45}),} and \textbf{\ref{fig:randISO100}) } show the
   1.274 -runtime results on graph sets of various density.
   1.275 +isomorphism had to be found. Figure \ref{fig:randISO5},
   1.276 +\ref{fig:randISO10}, \ref{fig:randISO15}, \ref{fig:randISO35},
   1.277 +\ref{fig:randISO45}, and \ref{fig:randISO100} show the runtime results
   1.278 +on graph sets of various density.
   1.279  
   1.280  \begin{figure}[H]
   1.281  \begin{center}
   1.282 @@ -1545,7 +1545,7 @@
   1.283  outperforms its rival especially on sparse graphs. The reason for the
   1.284  slightly super linear behaviour of VF2++ on denser graphs is the
   1.285  larger number of nodes in the BFS tree constructed in
   1.286 -\textbf{Algorithm \ref{alg:VF2PPPseu})}.
   1.287 +Algorithm~\ref{alg:VF2PPPseu}.
   1.288  
   1.289  \subsubsection{Induced subgraph isomorphism}
   1.290  This section provides a comparison of VF2++ and VF2 Plus in the case
   1.291 @@ -1559,12 +1559,12 @@
   1.292  choose 10 of its induced subgraphs having $\rho\ |V_{large}|$ nodes,
   1.293  and for all the 10 subgraphs find a mapping by using both the graph
   1.294  matching algorithms.  The $\delta = 5, 10, 35$ and $\rho = 0.05, 0.1,
   1.295 -0.3, 0.6, 0.8, 0.95$ cases have been examined (see \textbf{Figure
   1.296 -  \ref{fig:randIND5}), \ref{fig:randIND10})} and
   1.297 -\textbf{\ref{fig:randIND35})}), and for each $\delta$, a cumulative
   1.298 -chart is given as well, which excludes $\rho = 0.05$ and $0.1$ for the
   1.299 -sake of perspicuity (see \textbf{Figure \ref{fig:randIND5Sum}),
   1.300 -  \ref{fig:randIND10Sum})} and \textbf{\ref{fig:randIND35Sum})}).
   1.301 +0.3, 0.6, 0.8, 0.95$ cases have been examined (see
   1.302 +Figure~\ref{fig:randIND5}, \ref{fig:randIND10} and
   1.303 +\ref{fig:randIND35}, and for each $\delta$, a cumulative chart is
   1.304 +given as well, which excludes $\rho = 0.05$ and $0.1$ for the sake of
   1.305 +perspicuity (see Figure~\ref{fig:randIND5Sum}, \ref{fig:randIND10Sum}
   1.306 +and \ref{fig:randIND35Sum}).
   1.307  
   1.308  
   1.309  
   1.310 @@ -1970,8 +1970,8 @@
   1.311  handle really large graphs in milliseconds. Note that when $IND$ was
   1.312  considered and the small graphs had proportionally few nodes ($\rho =
   1.313  0.05$, or $\rho = 0.1$), then VF2 Plus produced some inefficient node
   1.314 -orders(e.g. see the $\delta=10$ case on \textbf{Figure
   1.315 -  \ref{fig:randIND10})}). If these examples had been excluded, the
   1.316 +orders (e.g. see the $\delta=10$ case on
   1.317 +Figure~\ref{fig:randIND10}). If these examples had been excluded, the
   1.318  charts would have seemed to be similar to the other ones.
   1.319  Unsurprisingly, as denser graphs are considered, both VF2++ and VF2
   1.320  Plus slow slightly down, but remain practically usable even on graphs
author	Alpar Juttner <alpar@cs.elte.hu>
	Tue, 22 Nov 2016 14:11:49 +0100
changeset 4	c682d3237141
parent 3	550f81b2f81c
child 5	c04b029e623a