[Lemon-commits] Peter Kovacs: Rework the LGF section

Mon Feb 22 19:46:52 CET 2010

details:   http://lemon.cs.elte.hu/hg/lemon-tutorial/rev/a9f8282eb6b7
changeset: 44:a9f8282eb6b7
user:      Peter Kovacs <kpeter [at] inf.elte.hu>
date:      Mon Feb 22 00:42:19 2010 +0100
description:
	Rework the LGF section

diffstat:

 lgf.dox |  170 ++++++++++++++++++--------------------------------------
 1 files changed, 56 insertions(+), 114 deletions(-)

diffs (220 lines):

diff --git a/lgf.dox b/lgf.dox
--- a/lgf.dox
+++ b/lgf.dox
@@ -18,35 +18,31 @@
 
 namespace lemon {
 /**
-[PAGE]sec_lgf[PAGE] Input-Output for Graphs
+[PAGE]sec_lgf[PAGE] LEMON Graph Format
 
-\todo Clarify this section.
+LEMON provides a versatile file format for storing graphs and related
+node and arc maps. Such a format should be really flexible, it should be
+able to store arbitrary number of maps of arbitrary value types.
+On the other hand, the file size and the ease of processing are also critical
+to support huge graphs, which is a major goal of LEMON.
+These requirements forbid using complicated and deeply structured formats
+like XML. That is why a compact text file format is designed for LEMON
+instead of using hierarchical formats (such as GraphML, GXL or GML).
 
-LEMON provides a versatile file format for storing graphs
-and related node and arc maps.
-Such a format should be really flexible, it should be able to store arbitrary
-number of maps of arbitrary value types.
-On the other hand, the file size and the ease of processing are also critical
-to support storing huge graphs, which is a major goal of LEMON.
-These requirements forbid using complicated and deeply structured formats
-like XML.
-That is why a compact file format is designed for LEMON instead of using
-hierarchical formats, such as GraphML, GXL or GML.
+The LEMON Graph Format (LGF) consists of several sections, for example a
+digraph is stored in a <tt>\@nodes</tt> and an <tt>\@arcs</tt> section.
+These parts use column oriented formats, each column belongs to a map in
+the graph. The first line of the section associates names to these maps,
+which can be used to refer them. Note that this simple idea makes it
+possible to extend the files with new maps (columns) at any position
+without having to modify the codes using these files.
+Other data can also be added to an LGF file as individual properties
+in an <tt>\@attributes</tt> section. This part can be used to specify
+certain nodes or arcs, or store meta data for the file, such as copyright
+notice.
 
-The LEMON Graph Format (LGF) comprises different sections, for
-example a digraph is stored in a \c @nodes and an \c @arcs
-section. These parts use column oriented formats, each
-column belongs to a map in the graph. The first line of the section associate
-names to these maps, which can be used to refer them.
-Note that this simple idea makes it possible to extend the files with
-new maps (columns) at any
-position without having to modify the codes using these files.
-
-The \c label map has special role, it must store unique values, which in turn
-can be used to refer
-to the nodes and arcs in the file. Finally, the first two column of the
-\c @arcs section is anonymous, they indicate the source and target nodes,
-respectively.
+For example, a shortest path problem, which is represented as a directed
+graph with some node and arc maps, can be stored in LGF format as follows.
 
 \code
   @nodes
@@ -55,6 +51,7 @@
   1     (40,120)
   ...
   41    (600,100)
+
   @arcs
             label length
   0    1    0     16
@@ -62,19 +59,21 @@
   2    12   2     20
   ...
   36   41   123   21
+
   @attributes
   source 0
-  target 41
   caption "A shortest path problem"
 \endcode
 
-The \ref DigraphReader and \ref DigraphWriter classes are used
-to read and write a digraph and corresponding maps. By default, a map
-can be used with these classes if its value type has standard I/O
-operators (\c operator<<(ostream&, T) and \c operator>>(istream&, T&)).
-Otherwise, a function object
-can be specified which converts the value type to \c std::string.
-The above LGF file can be scanned as follows.
+In the <tt>\@nodes</tt> section, the \c label map has special role,
+it must store unique values, which in turn can be used to refer to the nodes
+in the file.
+The first two columns of the <tt>\@arcs</tt> section are anonymous, they
+store the source and target nodes, respectively.
+
+The \ref DigraphReader and \ref DigraphWriter classes can used
+to read and write data in LGF format.
+For example, the above file can be read as follows.
 
 \code
   ListDigraph g;
@@ -83,97 +82,40 @@
   ListDigraph::Node src;
   std::string title;
 
-  digraphReader(g, std::cin)
-    .nodeMap("coord", coord)
+  digraphReader(g, "digraph.lgf")
+    .nodeMap("coordinate", coord)
     .arcMap("length", length)
+    .node("source", src)
     .attribute("caption", title)
-    .node("source", src)
     .run();
 \endcode
 
-Apart from LGF, the library can also interpret other graph
-formats, such as the well-known DIMACS format or the NAUTY graph6
-format.
+Note that the \ref DigraphReader class is used through the \ref digraphReader()
+function with several named parameters.
 
-<hr>
+\note By default, a map can be used with these classes if its value type
+has standard I/O operators (\c operator<<(ostream&, T) and \c
+operator>>(istream&, T&)). Otherwise, a function object can be specified
+which converts the value type to \c std::string.
 
-The \e LGF is a <em>column oriented</em>
-file format for storing graphs and associated data like
-node and edge maps.
-
-Each line with \c '#' first non-whitespace
-character is considered as a comment line.
-
-Otherwise the file consists of sections starting with
-a header line. The header lines starts with an \c '@' character followed by the
-type of section. The standard section types are \c \@nodes, \c
-\@arcs and \c \@edges
-and \@attributes. Each header line may also have an optional
-\e name, which can be use to distinguish the sections of the same
-type.
-
-The standard sections are column oriented, each line consists of
-<em>token</em>s separated by whitespaces. A token can be \e plain or
-\e quoted. A plain token is just a sequence of non-whitespace characters,
-while a quoted token is a
-character sequence surrounded by double quotes, and it can also
-contain whitespaces and escape sequences.
-
-The \c \@nodes section describes a set of nodes and associated
-maps. The first is a header line, its columns are the names of the
-maps appearing in the following lines.
-One of the maps must be called \c
-"label", which plays special role in the file.
-The following
-non-empty lines until the next section describes nodes of the
-graph. Each line contains the values of the node maps
-associated to the current node.
+The following code demonstrates how the above digraph with the maps and
+attributes can be written into the standard output in LGF Format.
 
 \code
- @nodes
- label  coordinates  size    title
- 1      (10,20)      10      "First node"
- 2      (80,80)      8       "Second node"
- 3      (40,10)      10      "Third node"
+  digraphWriter(g, std::cout)
+    .nodeMap("coordinate", coord)
+    .arcMap("length", length)
+    .node("source", src)
+    .attribute("caption", title)
+    .run();
 \endcode
+ 
+Apart from LGF, the library can also handle other graph
+formats, such as the well-known DIMACS format.
 
-The \c \@arcs section is very similar to the \c \@nodes section,
-it again starts with a header line describing the names of the maps,
-but the \c "label" map is not obligatory here. The following lines
-describe the arcs. The first two tokens of each line are
-the source and the target node of the arc, respectively, then come the map
-values. The source and target tokens must be node labels.
-
-\code
- @arcs
-         capacity
- 1   2   16
- 1   3   12
- 2   3   18
-\endcode
-
-The \c \@edges is just a synonym of \c \@arcs. The \@arcs section can
-also store the edge set of an undirected graph. In such case there is
-a conventional method for store arc maps in the file, if two columns
-has the same caption with \c '+' and \c '-' prefix, then these columns
-can be regarded as the values of an arc map.
-
-The \c \@attributes section contains key-value pairs, each line
-consists of two tokens, an attribute name, and then an attribute
-value. The value of the attribute could be also a label value of a
-node or an edge, or even an edge label prefixed with \c '+' or \c '-',
-which regards to the forward or backward directed arc of the
-corresponding edge.
-
-\code
- @attributes
- source 1
- target 3
- caption "LEMON test digraph"
-\endcode
-
-The \e LGF can contain extra sections, but there is no restriction on
-the format of such sections.
+For more information, see a more detailed \ref lgf-format
+"description of the LGF format" and the \ref io_group module
+in the reference manual.
 
 [TRAILER]
 */