COIN-OR::LEMON - Graph Library

Opened 10 years ago

Closed 10 years ago

#167 closed task (fixed)

Port DIMACS readers

Reported by: Alpar Juttner Owned by: Alpar Juttner
Priority: major Milestone: LEMON 1.1 release
Component: core Version: hg main
Keywords: Cc:
Revision id:

Description

This ticket is a follow up of #55.

Attachments (4)

a6bc90e888ff.patch (4.2 KB) - added by Alpar Juttner 10 years ago.
Port of DIMACS stuff
c052e6718a95.patch (14.0 KB) - added by Alpar Juttner 10 years ago.
Rename the DIMACS reader functions
dimacs-50d96f2166d7-9d1faab5e0f1-24a2c6ee6cb0.bundle (6.8 KB) - added by Alpar Juttner 10 years ago.
Port + refactoring of the DIMACS stuff
dimacs_0a3ec097a76c.patch (3.3 KB) - added by Peter Kovacs 10 years ago.
Doc improvements

Download all attachments as: .zip

Change History (24)

comment:1 Changed 10 years ago by Alpar Juttner

Milestone: LEMON 1.1 release

Changed 10 years ago by Alpar Juttner

Attachment: a6bc90e888ff.patch added

Port of DIMACS stuff

Changed 10 years ago by Alpar Juttner

Attachment: c052e6718a95.patch added

Rename the DIMACS reader functions

comment:2 Changed 10 years ago by Alpar Juttner

Status: newassigned

Uploaded a basic port of the DIMACS reader plus the DIMACS->LGF converter, see [a6bc90e888ff], I found the overloaded readDimacs() function very confusing, thus I included the problem type in the function names (readDimacsMin(),readDimacsSp() etc.). I'm still not very satisfied with the API. The main problem is that we cannot detect the problem type in advance, thus e.g. you must give it as a command line option to the dimacs-to-lgf converter, though it could perfectly be autodetected. The possible solution is either

  • to develop an API for detecting the type. It is not clear how to do it while keep it single pass, or
  • re-implement the DIMACS parser in the converter script. Actually, it is not a big deal, the final code would probably be even shorter compared to the present one.

I'm also wondering if we really need the readDimacs*() stuff. Isn't it enough to provide the converter script only?

comment:3 in reply to:  2 ; Changed 10 years ago by Peter Kovacs

Replying to alpar:

I'm also wondering if we really need the readDimacs*() stuff. Isn't it enough to provide the converter script only?

I like these functions, and I use them a lot. A converter script cannot be used inside another program (without copy-pasting), however there are some natural use cases when it is important. E.g.

  • You would like to write a program that can directly handle DIMACS input.
  • You would like to work with really large DIMACS instances (e.g. created with a graph generator). Then using the converter script as a separate program requires twice as much time and/or twice as much space than using the functions.

comment:4 in reply to:  3 ; Changed 10 years ago by Alpar Juttner

Replying to kpeter:

Replying to alpar:

I'm also wondering if we really need the readDimacs*() stuff. Isn't it enough to provide the converter script only?

I like these functions, and I use them a lot. A converter script cannot be used inside another program (without copy-pasting), however there are some natural use cases when it is important. E.g.

  • You would like to write a program that can directly handle DIMACS input.
  • You would like to work with really large DIMACS instances (e.g. created with a graph generator). Then using the converter script as a separate program requires twice as much time and/or twice as much space than using the functions.

I'm aware that these are the (only) use-cases of the readDimacs*() tools. I'm just a bit concerned if they are real use-cases.

  • Firstly, why do you want to write a tool which reads DIMACS file instead of .lgf when the later one is simply more general and more robust?
  • Secondly, is the space issue is really a problem? Why not just compress those files, then? In fact, in most of the cases you do not even need to store the .lgf file, you can simply compose the tools with the pipe operator like this
    $ dimacs-to-lgf net.dim | ./myalg
    

comment:5 in reply to:  2 ; Changed 10 years ago by Peter Kovacs

Replying to alpar:

Uploaded a basic port of the DIMACS reader plus the DIMACS->LGF converter, see [a6bc90e888ff],

You exchanged the changesets: [c052e6718a95] is the port and [a6bc90e888ff] is the renaming. :)

I found the overloaded readDimacs() function very confusing, thus I included the problem type in the function names (readDimacsMin(),readDimacsSp() etc.).

I find it a good idea.

I'm still not very satisfied with the API. The main problem is that we cannot detect the problem type in advance,

Yes, it is true. But when you read a graph from an LGF file, than you encounter exactly the same problem: you usually have to know what maps and attributes can be found in the file. So I do not find this question really problematic.

thus e.g. you must give it as a command line option to the dimacs-to-lgf converter, though it could perfectly be autodetected.

However the script would be better with auto-detecting.

The possible solution is either

  • to develop an API for detecting the type. It is not clear how to do it while keep it single pass, or

What about introducing a simple class interface, e.g. something like that.

class DimacsReader {
public:
  enum DimacsType {
    MAX, SP, MIN, MAT, UNKNOWN
  }

  DimacsReader(istream &is = std::cin) : _is(is), _type(UNKNOWN) {}
  
  DimacsType readType() {...}
  
  void readMax(...) {...}
  void readSp(...) {...}
  void readMin(...) {...}
  void readMat(...) {...}

private:
  istream &_is;
  DimacsType _type;
  int _n, _m;
};

readType() would read the file until the line describing the problem (starting with p), set _type, _n, and _m, and return _type. After that according to this return value the appropriate readXyz() function can be called with giving the needed maps as parameters to them. (They would check if the _type variable meets their requirements.)

This solution would make the converter script much simpler and could be used separatley, too. Maybe we could also keep the functions but they would also use this class.

  • re-implement the DIMACS parser in the converter script. Actually, it is not a big deal, the final code would probably be even shorter compared to the present one.

It wouldn't be difficult to do, but the script wouldn't be so nice.

comment:6 in reply to:  5 Changed 10 years ago by Alpar Juttner

Replying to kpeter:

You exchanged the changesets: [c052e6718a95] is the port and [a6bc90e888ff] is the renaming. :)

Well, this is just an old trick to check if someone indeed have a look at these patches.

What about introducing a simple class interface, e.g. something like that.

I'm thinking on a simpler solution (in term of the required refactoring), a patch is coming soon.

comment:7 in reply to:  4 ; Changed 10 years ago by Peter Kovacs

Replying to alpar:

I'm aware that these are the (only) use-cases of the readDimacs*() tools. I'm just a bit concerned if they are real use-cases.

  • Firstly, why do you want to write a tool which reads DIMACS file instead of .lgf when the later one is simply more general and more robust?

Because (unfortunatelly) DIMACS format is more widely known and used than LGF. It is supported by various programs, but LGF is used only in connections with LEMON.

  • Secondly, is the space issue is really a problem? Why not just compress those files, then?

E.g. I used DIMACS files up to about 1 GB, which is about 300 MB in .gr.gz format, so space can be an issue, too.

In fact, in most of the cases you do not even need to store the .lgf file, you can simply compose the tools with the pipe operator like this

That's why I wrote: "twice as much time", because the graph have to be read and built twice in this case. (It could be a few minutes in case of really large instances.)

comment:8 in reply to:  7 Changed 10 years ago by Alpar Juttner

Replying to kpeter:

E.g. I used DIMACS files up to about 1 GB, [...] (It could be a few minutes in case of really large instances.)

How long would it exactly take to convert this file to .lgf, not considering the time of writing the output back to the disk? I mean something like this.

time dimacs-to-lgf -dunnothetype <bigfile.dim >/dev/null

or

time (dimacs-to-lgf -dunnothetype <bigfile.dim|cat >/dev/null)

(do these tests twice to make sure, that bigfile.dim is in the cache)

comment:9 Changed 10 years ago by Peter Kovacs

Here are the results for a DIMACS min. cost flow input file of 280 MB (n = 100,000; m = 11,000,000):

kpeter@lime:~> time dimacs-to-lgf -mcf <bigfile.dim >/dev/null

real    1m43.353s
user    1m38.174s
sys     0m5.200s
kpeter@lime:~> time dimacs-to-lgf -mcf <bigfile.dim >/dev/null

real    1m42.408s
user    1m38.134s
sys     0m4.284s
kpeter@lime:~> time (dimacs-to-lgf -mcf <bigfile.dim|cat >/dev/null)

real    2m24.584s
user    1m45.807s
sys     1m19.565s
kpeter@lime:~> time (dimacs-to-lgf -mcf <bigfile.dim|cat >/dev/null)

real    2m24.855s
user    1m48.103s
sys     1m15.125s

Note that there much larger DIMACS files, too. E.g. the USA road graph used in the 9th DIMACS implemenation challange is about 1 GB.

comment:10 in reply to:  9 Changed 10 years ago by Peter Kovacs

Replying to kpeter:

Note that there much larger DIMACS files, too. E.g. the USA road graph used in the 9th DIMACS implemenation challange is about 1 GB.

You can test it, if you would like to. You can find it in /mnt/sdb/testdata/www.dis.uniroma1.it/challenge9/data/USA-road-d/USA-road-d.USA.gr.gz on lime.cs.elte.hu.

Changed 10 years ago by Alpar Juttner

Port + refactoring of the DIMACS stuff

comment:11 Changed 10 years ago by Alpar Juttner

Milestone: LEMON 1.0 release

Please have a look and [attachmend:dimacs-50d96f2166d7-9d1faab5e0f1-24a2c6ee6cb0.bundle the attached bundle file]. Changesets [50d96f2166d7] and [9d1faab5e0f1] ports the tools and does the basic renaming, while [24a2c6ee6cb0] is a major refactoring of both the API and the command line tool.

comment:12 in reply to:  11 ; Changed 10 years ago by Peter Kovacs

Replying to alpar:

Please have a look and the attached bundle file.

It is great! This soulton is better than the one I suggested.

A question: why do you use the lineShift variable? It is set by dimacsType(), but it is not used anywhere. Is it important?

I also made some doc improvements. See [0a3ec097a76c].

Changed 10 years ago by Peter Kovacs

Attachment: dimacs_0a3ec097a76c.patch added

Doc improvements

comment:13 in reply to:  12 ; Changed 10 years ago by Alpar Juttner

Replying to kpeter:

Replying to alpar:

Please have a look and the attached bundle file.

It is great! This soulton is better than the one I suggested.

A question: why do you use the lineShift variable? It is set by dimacsType(), but it is not used anywhere. Is it important?

Not important. I could be used for providing line number info for the FormatError exceptions, if someone took the hassle to implement is.

Btw, the error handling is far from being rigorous currently. I wasn't very motivated to make it fool-proof, as DIMACS is not our primary graph descriptor format.

comment:14 in reply to:  12 ; Changed 10 years ago by Alpar Juttner

Replying to kpeter:

I also made some doc improvements. See [0a3ec097a76c].

When a doc module contains single file only, I prefer to keep the module description in the same file, as it makes the doc maintenance easier. Thus if you do not protest very strongly, I will take this change out of your path.

comment:15 in reply to:  13 Changed 10 years ago by Peter Kovacs

Replying to alpar:

Not important. I could be used for providing line number info for the FormatError exceptions, if someone took the hassle to implement is.

Okay. It can be kept or it can be removed, as you like it.

Btw, the error handling is far from being rigorous currently. I wasn't very motivated to make it fool-proof, as DIMACS is not our primary graph descriptor format.

I agree. It is not so important.

comment:16 in reply to:  14 ; Changed 10 years ago by Peter Kovacs

Replying to alpar:

When a doc module contains single file only, I prefer to keep the module description in the same file, as it makes the doc maintenance easier. Thus if you do not protest very strongly, I will take this change out of your path.

I prefer the groups.dox, since I like if they are kept together, e.g. it is much easier to look over or reorganize them. Moreover, which is much more important: placing the docs into separate files you do not have the ability to specify the order of the groups or subgroups. E.g. in [24a2c6ee6cb0] the DIMACS group follows the Nauty group, but in [0a3ec097a76c] DIMACS is the first.

comment:17 in reply to:  11 ; Changed 10 years ago by Peter Kovacs

Replying to alpar:

Btw. why do you move this ticket to the closed 1.0 milestone?

comment:18 in reply to:  17 Changed 10 years ago by Alpar Juttner

Milestone: LEMON 1.0 releaseLEMON 1.1 release

Replying to kpeter:

Replying to alpar:

Btw. why do you move this ticket to the closed 1.0 milestone?

By a mistake, of course. Did you suspected some another answer to your question?

comment:19 in reply to:  16 Changed 10 years ago by Alpar Juttner

Replying to kpeter:

Replying to alpar:

When a doc module contains single file only, I prefer to keep the module description in the same file, as it makes the doc maintenance easier. Thus if you do not protest very strongly, I will take this change out of your path.

I prefer the groups.dox, since I like if they are kept together, e.g. it is much easier to look over or reorganize them.

Why? We are talking about the leaf of the module tree, a leaf containing a single file.

Moreover, which is much more important: placing the docs into separate files you do not have the ability to specify the order of the groups or subgroups. E.g. in [24a2c6ee6cb0] the DIMACS group follows the Nauty group, but in [0a3ec097a76c] DIMACS is the first.

It is very important, indeed. :)

comment:20 Changed 10 years ago by Alpar Juttner

Resolution: fixed
Status: assignedclosed

The port went to the main branch.

Note: See TracTickets for help on using tickets.