COIN-OR::LEMON - Graph Library

Opened 10 years ago

Closed 10 years ago

#35 closed task (fixed)

Port .lgf related tools

Reported by: alpar Owned by: deba
Priority: blocker Milestone: LEMON 1.0 release
Component: core Version: hg main
Keywords: Cc:
Revision id:

Description

This involves the .lgf specification and the reader and writer classes (GraphReader, GraphWriter, LemonReader and LemonWriter).

They also need a thorough revision; in particular, the .lgf specification must be clarified and simplified considerably.

Attachments (12)

lgf_io.hg (71.4 KB) - added by deba 10 years ago.
Preliminary version of redesigned IO
lgf_io-single.patch (40.0 KB) - added by alpar 10 years ago.
The changesets of lgf_io.hg merged into a single one.
lgf_io.bundle (13.0 KB) - added by deba 10 years ago.
digraph_reader_doc.patch (12.8 KB) - added by deba 10 years ago.
digraph_reader_doc.2.patch (12.6 KB) - added by deba 10 years ago.
lgf_doc.patch (28.7 KB) - added by deba 10 years ago.
lgf-page.patch (16.9 KB) - added by alpar 10 years ago.
To be applied on the top of lgf_doc.patch
lgf_bug_fix.patch (5.5 KB) - added by deba 10 years ago.
lgf_demo.patch (5.6 KB) - added by deba 10 years ago.
3d9971265677.patch (1.4 KB) - added by deba 10 years ago.
b026e9779b28.patch (9.8 KB) - added by deba 10 years ago.
9159de5e9657.patch (7.2 KB) - added by deba 10 years ago.


Change History (37)

comment:1 Changed 10 years ago by alpar

I am starting a discussion here about what the .lgf format should look like.

First of all, my basic principle is:

There should be a single format specification, i.e. we must avoid extended or restricted formats.

Now, let's consider a single block (the inside of a @nodeset or @edgeset). This part of the file consists of columns. Each column is a string (i.e. a sequence of characters), regardless of how it is interpreted when it is read. My second principle here is:

Columns must be reconstructible from the .lgf file without any additional information.

My rules for the column separation would be as follows.

  • A column can be either
    • A single word without any white-space inside. Each character is interpreted as is.
    • A string enclosed in double quotes ("). Such a string may also contain white-space (except new-line) and escape sequences (like \n, \", etc.), and it is interpreted the same way as a standard C string literal.
  • The columns are simply separated by white-space.

More complex rules could also be worked out, but I think this simple one is convenient enough for all purposes.
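The column-splitting rule above can be sketched as follows. This is a minimal C++ sketch under stated assumptions, not LEMON's actual reader: the names splitColumns and decodeEscape are hypothetical, the input is assumed to arrive one line at a time, \x escapes are limited to two hex digits for simplicity, and errors are reported with std::runtime_error.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not LEMON API): decodes one C-style escape sequence.
// On entry i points just past the backslash; on exit it is past the sequence.
char decodeEscape(const std::string& s, std::size_t& i) {
  if (i >= s.size()) throw std::runtime_error("dangling backslash");
  char c = s[i++];
  switch (c) {
    case 'n': return '\n';
    case 't': return '\t';
    case 'r': return '\r';
    case 'a': return '\a';
    case 'b': return '\b';
    case 'f': return '\f';
    case 'v': return '\v';
    case '\\': return '\\';
    case '"': return '"';
    case '\'': return '\'';
    case '?': return '?';
    case 'x': {  // \xhh, limited to two hex digits in this sketch
      int value = 0, digits = 0;
      while (digits < 2 && i < s.size() &&
             std::isxdigit(static_cast<unsigned char>(s[i]))) {
        char h = s[i++];
        value = value * 16 +
                (std::isdigit(static_cast<unsigned char>(h))
                     ? h - '0'
                     : std::tolower(static_cast<unsigned char>(h)) - 'a' + 10);
        ++digits;
      }
      if (digits == 0) throw std::runtime_error("\\x without hex digits");
      return static_cast<char>(value);
    }
    default:
      if (c >= '0' && c <= '7') {  // \o, \oo or \ooo (octal)
        int value = c - '0';
        for (int k = 0; k < 2 && i < s.size() && s[i] >= '0' && s[i] <= '7'; ++k)
          value = value * 8 + (s[i++] - '0');
        return static_cast<char>(value);
      }
      throw std::runtime_error(std::string("unknown escape \\") + c);
  }
}

// Hypothetical splitter: a column is either a white-space free word (taken
// as is) or a double-quoted string decoded like a C string literal.
std::vector<std::string> splitColumns(const std::string& line) {
  std::vector<std::string> columns;
  std::size_t i = 0;
  auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
  while (true) {
    while (i < line.size() && isSpace(line[i])) ++i;  // skip separators
    if (i == line.size()) break;
    std::string column;
    if (line[i] == '"') {  // quoted column
      ++i;
      while (true) {
        if (i == line.size()) throw std::runtime_error("unterminated string");
        char c = line[i++];
        if (c == '"') break;
        column += (c == '\\') ? decodeEscape(line, i) : c;
      }
      // Columns are white-space separated, so "ab"cd is rejected.
      if (i < line.size() && !isSpace(line[i]))
        throw std::runtime_error("missing white-space after string");
    } else {  // plain word: every character is taken as is
      while (i < line.size() && !isSpace(line[i])) column += line[i++];
    }
    columns.push_back(column);
  }
  return columns;
}
```

For example, the line 1 "hello world" 3.5 yields the three columns 1, hello world and 3.5, while a backslash inside an unquoted word such as a\b is kept literally.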

Once we have agreed on this, we can continue by discussing how to handle

  • the blocks - especially the multiple @edgesets,
  • the directed, undirected and the bipartite graphs,

and also what the read/write API should look like.

comment:2 follow-up: Changed 10 years ago by deba

I agree with Alpar that the new format should be readable without knowing the input method.
But I think the format he suggested is too restrictive.

In my opinion, the format should look like this:

  • It contains properly bracketed expressions built from (), {} and []
  • Inside the brackets, any sequence of characters and strings may appear
  • Outside the brackets, white-space is not allowed except within a string.
  • The string representation is a possibly escaped character sequence surrounded by "" or
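For comparison, the bracket-based rule above can be sketched as below. This is a hypothetical C++ sketch, not a LEMON implementation, under these assumptions: only (), [], {} and double-quoted strings are recognized (the alternative string delimiter in the last point is cut off in the ticket and is not handled), columns are returned verbatim without decoding escapes, and splitBracketedColumns is an illustrative name.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch of the bracket-aware rule: white-space separates
// columns only at bracket depth 0 and outside strings. Column text is
// returned raw (brackets, quotes and escapes are kept verbatim).
std::vector<std::string> splitBracketedColumns(const std::string& line) {
  std::vector<std::string> columns;
  std::string current;
  std::vector<char> open;  // stack of expected closing brackets
  bool inColumn = false;
  for (std::size_t i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (open.empty() && std::isspace(static_cast<unsigned char>(c))) {
      if (inColumn) {  // white-space at depth 0 ends the current column
        columns.push_back(current);
        current.clear();
        inColumn = false;
      }
      continue;
    }
    inColumn = true;
    current += c;
    if (c == '"') {  // copy the string verbatim, skipping escaped characters
      for (++i;; ++i) {
        if (i >= line.size()) throw std::runtime_error("unterminated string");
        current += line[i];
        if (line[i] == '\\') {
          if (++i >= line.size()) throw std::runtime_error("unterminated string");
          current += line[i];
        } else if (line[i] == '"') {
          break;
        }
      }
    } else if (c == '(') {
      open.push_back(')');
    } else if (c == '[') {
      open.push_back(']');
    } else if (c == '{') {
      open.push_back('}');
    } else if (c == ')' || c == ']' || c == '}') {
      if (open.empty() || open.back() != c)
        throw std::runtime_error("mismatched bracket");
      open.pop_back();
    }
  }
  if (!open.empty()) throw std::runtime_error("unclosed bracket");
  if (inColumn) columns.push_back(current);
  return columns;
}
```

With this rule {a b} is a single column. Under the single-word/quoted-string rule of comment:1, the same value would have to be written as "{a b}", which is exactly the pair of double quotes discussed in the reply below.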

comment:3 in reply to: ↑ 2 Changed 10 years ago by alpar

Replying to deba:

I agree with Alpar that the new format should be readable without knowing the input method.
But I think the format he suggested is too restrictive.

Could you show some examples where it is really restrictive?

As far as I understand, your specification is more flexible than mine only if both of the following hold:

  • The value is a properly bracketed expression,
  • It contains white-space, but only inside brackets.

Even in this case the only thing you can save is a pair of double quotes around the value. Is that really worth the increased complexity?
Remember, if we define a more complex rule for the column separation and reconstruction, then all tools working with .lgf files must implement this rule.

comment:4 Changed 10 years ago by alpar

  • Owner changed from alpar to deba

comment:5 Changed 10 years ago by alpar

Peter and Balazs,

Could you summarize here the conclusion of our private discussion on this topic?

comment:6 Changed 10 years ago by alpar

  • Priority changed from major to blocker

Changed 10 years ago by deba

Preliminary version of redesigned IO

comment:7 Changed 10 years ago by deba

I have uploaded a working version of redesigned IO.

Changed 10 years ago by alpar

The changesets of lgf_io.hg merged into a single one.

comment:8 Changed 10 years ago by alpar

For the sake of convenience, I merged the changesets of Balazs into a single commit patch.

Changed 10 years ago by deba

comment:9 Changed 10 years ago by deba

I have uploaded a new bundle for lgf io. In my opinion it is close to the final version, but it still needs to be tested and documented. On the writer side, some runtime checking would also be nice.

comment:10 Changed 10 years ago by alpar

The changesets in attachment:lgf_io.bundle have been merged into a single changeset and it is now in the main branch, see [1c9a9e2f7d4d].

The undirected versions are still unimplemented.

comment:11 Changed 10 years ago by alpar

  • Version set to hg main

Changed 10 years ago by deba

comment:12 Changed 10 years ago by deba

I have just now uploaded a patch with the preliminary documentation of DigraphReader. Please fix it, improve it, extend it. Once this documentation is finalized, the documentation of the DigraphWriter class can be written as well.

Changed 10 years ago by deba

comment:13 Changed 10 years ago by alpar

Is there any reason/advantage to using a different column separation rule for the header line of @nodes/@edges sections and for their contents?

If we used the same rule, then

  • the code would be simpler,
  • the file format would be more homogeneous, and therefore easier to describe and document,
  • it would be more flexible (but still compatible)

than what we have now. For example, it would allow spaces in column names. Currently, the reader must check whether each column name is a single word, and all software working with user-given column names (such as glemon) must do the same.

Changed 10 years ago by deba

comment:14 follow-up: Changed 10 years ago by deba

I uploaded a new patch with a solution for Alpar's issue (although I do not agree with him completely), and with the documentation of DigraphReader and DigraphWriter.

Changed 10 years ago by alpar

To be applied on the top of lgf_doc.patch

comment:15 in reply to: ↑ 14 ; follow-up: Changed 10 years ago by alpar

Replying to deba:

I moved the description of the lgf format into a separate page, and also made several improvements on this text, as well as on the doc of the DigraphReader and DigraphWriter.

See attachment:lgf-page.patch

comment:16 in reply to: ↑ 15 Changed 10 years ago by alpar

I merged attachment:digraph_reader_doc.2.patch, attachment:lgf_doc.patch and attachment:lgf-page.patch into a single changeset [e561aa7675de], and put it into the main branch.

Can we close this ticket?
Note that we have another related issue (#91).

comment:17 follow-up: Changed 10 years ago by deba

I think the undirected graph IO should be implemented, or a new ticket should be opened for this. What do you think about its implementation: could we make extensive use of copy-paste, or do you have any other "good" idea?

comment:18 in reply to: ↑ 17 Changed 10 years ago by alpar

  • Resolution set to fixed
  • Status changed from new to closed

Replying to deba:

the undirected graph IO should be implemented, or a new ticket should be opened for this.

I opened a new ticket for this undirected issue, see ticket:93.

Changed 10 years ago by deba

Changed 10 years ago by deba

comment:19 follow-up: Changed 10 years ago by deba

  • Resolution fixed deleted
  • Status changed from closed to reopened

I have uploaded two patches.
The first mainly solves some bugs:

  • Fixing function interface for lgf readers and writers
  • The characters under 0x20 should be escaped

The second patch contains a better demo file.
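On the writer side, escaping characters below 0x20 can be sketched as follows, assuming the white-space/double-quote column rule from comment:1. This is a hypothetical C++ sketch and the actual patch may differ: escapeColumn is an illustrative name, a value is quoted when it is empty, starts with a double quote, or contains a space or a character below 0x20, and control characters without a short escape are written as three-digit octal escapes (an octal escape stops after three digits, whereas a C \x escape would also swallow any following hex digits).

```cpp
#include <cstdio>
#include <string>

// Hypothetical writer-side helper: emits a column value so that the
// white-space/double-quote column rule reads it back unchanged.
std::string escapeColumn(const std::string& value) {
  bool needsQuotes = value.empty() || value[0] == '"';
  for (unsigned char c : value)
    if (c <= 0x20) needsQuotes = true;  // space or control character
  if (!needsQuotes) return value;       // plain word, written as is

  std::string out = "\"";
  for (unsigned char c : value) {
    switch (c) {
      case '"': out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n"; break;
      case '\t': out += "\\t"; break;
      case '\r': out += "\\r"; break;
      default:
        if (c < 0x20) {  // other control characters: \ooo octal escape
          char buf[5];
          std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
          out += buf;
        } else {
          out += static_cast<char>(c);
        }
    }
  }
  out += '"';
  return out;
}
```

For instance, a value consisting of a, a new-line and b is written as "a\nb", while a word such as abc is written unchanged.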

comment:20 in reply to: ↑ 19 Changed 10 years ago by alpar

  • Resolution set to fixed
  • Status changed from reopened to closed

Replying to deba:

I have uploaded two patches.
The first mainly solves some bugs:

  • Fixing function interface for lgf readers and writers
  • The characters under 0x20 should be escaped

The second patch contains a better demo file.

They went to the main branch, see [c82fd9568d75] and [00d297da491e].

Changed 10 years ago by deba

Changed 10 years ago by deba

comment:21 follow-up: Changed 10 years ago by deba

  • Resolution fixed deleted
  • Status changed from closed to reopened

I have uploaded two changesets:

comment:22 in reply to: ↑ 21 Changed 10 years ago by alpar

  • Resolution set to fixed
  • Status changed from reopened to closed

Replying to deba:

I have uploaded two changesets:

They went to the main branch.

  • [3d9971265677] makes some clarifications in the usage of skipSection()

Probably you meant [c94a80f38d7f] here.

Changed 10 years ago by deba

comment:23 follow-up: Changed 10 years ago by deba

  • Resolution fixed deleted
  • Status changed from closed to reopened

The [9159de5e9657] solves that the label map is not necessary, and the map names line is necessary, if no map is read and there is no item in the item set.

comment:24 in reply to: ↑ 23 Changed 10 years ago by alpar

Replying to deba:

The [9159de5e9657] solves that the label map is not necessary, and the map names line is necessary, if no map is read and there is no item in the item set.

Accepted.

comment:25 Changed 10 years ago by alpar

  • Resolution set to fixed
  • Status changed from reopened to closed