COIN-OR::LEMON - Graph Library

Opened 8 years ago

Closed 2 weeks ago

#377 closed enhancement (done)

Smaller version of StaticDigraph

Reported by: Peter Kovacs Owned by: Alpar Juttner
Priority: major Milestone: LEMON 1.4 release
Component: core Version: hg main
Keywords: Cc:
Revision id:

Description

The current implementation of StaticDigraph stores 2n+4m int values (for n nodes and m arcs). However, in many cases, one needs outgoing arc iteration only. For such cases, a static graph representation could be implemented with n+2m integers by fully omitting the outgoing and incomming arc lists (but arcs would be sorted by their source nodes).

The NodeIt, ArcIt and OutArcIt iterators could be implemented efficiently, but the basic iteration of the ougoing arcs would be somewhat slower (see #68 for antecedents). For ensuring the compatibility with the Digraph interface, an InArcIt iterator should also be implemented, but it would be really slow (as slow as full arc iteration).

What do you think? Would this data structure make sense?

Attachments (3)

smaller_static_graph.patch (3.2 KB) - added by Gabriel Gouvine 21 months ago.
Patch for a slightly smaller StaticDigraph?: 2n + 3m instead of 2n + 4m
forward_graph.patch (15.4 KB) - added by Gabriel Gouvine 20 months ago.
First version of the compact digraph
compact_graph.patch (15.3 KB) - added by Gabriel Gouvine 20 months ago.
Removed the "last" member from Arc

Download all attachments as: .zip

Change History (17)

comment:1 in reply to:  description Changed 8 years ago by Alpar Juttner

Replying to kpeter:

The current implementation of StaticDigraph stores 2n+4m int values (for n nodes and m arcs). However, in many cases, one needs outgoing arc iteration only. For such cases, a static graph representation could be implemented with n+2m integers by fully omitting the outgoing and incoming arc lists (but arcs would be sorted by their source nodes).

It may even be possible to implement n+m, at the cost that Arc is two times bigger.

Namely, the arc's are ordered according to their source node id, then we only store their target nodes (m integers) plus the id of the first outgoing arc for each node (n integer). The Arc structure must store the arc id and the source node id.

For ensuring the compatibility with the Digraph interface, an InArcIt iterator should also be implemented, but it would be really slow (as slow as full arc iteration).

This is the easier (and probably more pragmatic) way. The other one is to introduce a new concept for out-iterable graphs.

Changed 21 months ago by Gabriel Gouvine

Attachment: smaller_static_graph.patch added

Patch for a slightly smaller StaticDigraph?: 2n + 3m instead of 2n + 4m

comment:2 Changed 21 months ago by Gabriel Gouvine

Here is a first improvement that removes the node_first_out array from StaticDigraph?; I'm working on an n+m version as suggested by Alpar as well

comment:3 in reply to:  2 ; Changed 21 months ago by Peter Kovacs

Replying to ggouvin:

Here is a first improvement that removes the node_first_out array from StaticDigraph; ...

This change has been proposed and discussed several years ago. Here you can find that patch: 68-4-a3de05c56e7e.patch

We finally decided not to apply it, because the additional array can make arc iteration faster e.g. when graph adaptors are applied on the top of StaticDigraph. See related comments here: ticket:68#comment:5 ticket:68#comment:8

So I wouldn't apply this patch.

However, the n+m version is still interesting, but I would suggest to implement it as a separate Digraph class. E.g. we could call it CompactDigraph and its API doc should describe the differences compared to StaticDigraph (much more space-efficient, but only OutArcIt is fast, InArcIt is much slower).

comment:4 Changed 20 months ago by Gabriel Gouvine

Here is a first working version of this Digraph implementation. I think it's almost OK, but I could improve a few points:

  • Arc is 3 ints; it could be 2
  • The documentation could be improved
Last edited 20 months ago by Gabriel Gouvine (previous) (diff)

comment:5 in reply to:  3 ; Changed 20 months ago by Gabriel Gouvine

Replying to kpeter:

Replying to ggouvin:

Here is a first improvement that removes the node_first_out array from StaticDigraph; ...

This change has been proposed and discussed several years ago. Here you can find that patch: 68-4-a3de05c56e7e.patch

We finally decided not to apply it, because the additional array can make arc iteration faster e.g. when graph adaptors are applied on the top of StaticDigraph. See related comments here: ticket:68#comment:5 ticket:68#comment:8

So I wouldn't apply this patch.

However, the n+m version is still interesting, but I would suggest to implement it as a separate Digraph class. E.g. we could call it CompactDigraph and its API doc should describe the differences compared to StaticDigraph (much more space-efficient, but only OutArcIt is fast, InArcIt is much slower).

Hi Péter. Thanks for the feedback; would you have comments on the ForwardDigraph? patch below (rather CompactDigraph??)? It still needs polishing.

ExtendedStaticDigraphBase::OutArcIt? is shadowed by StaticDigraph::OutArcIt?, and no code instantiates it directly. So firstOut/nextOut - and the additional array - are never used. But I understand the need to be careful here.

comment:6 in reply to:  5 ; Changed 20 months ago by Peter Kovacs

Replying to ggouvin:

Hi Péter. Thanks for the feedback; would you have comments on the ForwardDigraph patch below (rather CompactDigraph?)? It still needs polishing.

I had a short look at it. Basically, it seems to be OK, but at first it was strange that during out arc iteration of node n, node_first_out[n] is actually the last one, not the first. And the last field of Arc type is actually the "after the last one" according to out arc iteration order (so actually "before the first one" according to IDs). Maybe these aspects could be clarified, but I'm not sure whether ++ or -- direction would be better for out arc iteration.

ExtendedStaticDigraphBase::OutArcIt is shadowed by StaticDigraph::OutArcIt, and no code instantiates it directly. So firstOut/nextOut - and the additional array - are never used. But I understand the need to be careful here.

I don't think so. Could you have a look at the implementation of graph adaptors? E.g. SubDigraph class uses Parent::firstOut() and Parent::nextOut() in its firstOut() and nextOut() methods instead of the Parent graph's iterators: https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/adaptors.h?rev=dceba191c00d#L452 Other adaptors (e.g. ReverseDigraph) and other core utilities also use these methods. E.g., the arc lookup tools rely on this helper class: https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/core.h?rev=c199e9976d93#L1600

comment:7 in reply to:  6 Changed 20 months ago by Gabriel Gouvine

Replying to kpeter:

Replying to ggouvin:

Hi Péter. Thanks for the feedback; would you have comments on the ForwardDigraph patch below (rather CompactDigraph?)? It still needs polishing.

I had a short look at it. Basically, it seems to be OK, but at first it was strange that during out arc iteration of node n, node_first_out[n] is actually the last one, not the first. And the last field of Arc type is actually the "after the last one" according to out arc iteration order (so actually "before the first one" according to IDs). Maybe these aspects could be clarified, but I'm not sure whether ++ or -- direction would be better for out arc iteration.

Right; I based it on ArcIt? rather than on StaticDigraph?'s OutArcIt?. I will change that to a more standard ++.

ExtendedStaticDigraphBase::OutArcIt is shadowed by StaticDigraph::OutArcIt, and no code instantiates it directly. So firstOut/nextOut - and the additional array - are never used. But I understand the need to be careful here.

I don't think so. Could you have a look at the implementation of graph adaptors? E.g. SubDigraph class uses Parent::firstOut() and Parent::nextOut() in its firstOut() and nextOut() methods instead of the Parent graph's iterators: https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/adaptors.h?rev=dceba191c00d#L452 Other adaptors (e.g. ReverseDigraph) and other core utilities also use these methods. E.g., the arc lookup tools rely on this helper class: https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/core.h?rev=c199e9976d93#L1600

Thanks for the insight, I didn't see this. One possible way out would be to have first/nextOut and the like take something other than the Arc class (OutArcItBase??), which would be typedef to Arc in most cases but would contain "last" in Static and CompactDigraph?. But this looks less and less like a minor change.

comment:8 Changed 20 months ago by Peter Kovacs

Yes, indeed. So I wouldn't modify StaticDigraph: it focuses on running time performance (which is the primary preference in most cases).

The other implementation in turn focuses on memory requirements and provides fast iteration support only for the most typical use cases.

That's a clear trade-off, one can select according to her/his preference.

Changed 20 months ago by Gabriel Gouvine

Attachment: forward_graph.patch added

First version of the compact digraph

comment:9 Changed 20 months ago by Gabriel Gouvine

I updated the patch so CompactDigraph::OutArcIt? uses increasing id; ArcIt? is kept consistent with StaticDigraph? as well (decreasing id)

comment:10 Changed 20 months ago by Peter Kovacs

I briefly checked the attached patch. Here are some comments.

Handling of the last field of Arc still seems to be problematic. According to the usage of its value (lines 87, 110, 114), e.last should always be equal to node_first_out[e.source+1], unless e.id==-1. However, some assignments do not correspond to this invariant. The change in iteration order of OutArcIt seems to be inconsistently updated, see lines 89, 98, 138, 216). Please review these parts of the code.

Anyway, this field is not necessary, recalculating its value would always be feasible. However, it would make out arc iteration slower. It would be interesting to compare the performance of an alternative implementation that does not store this field (on various graphs: small/large, sprase/dense).

Apart from that, please use LEMON_COMPACT_GRAPH_H instead of LEMON_FORWARD_GRAPH_H in lines 19 and 20.

Last edited 20 months ago by Peter Kovacs (previous) (diff)

Changed 20 months ago by Gabriel Gouvine

Attachment: compact_graph.patch added

Removed the "last" member from Arc

comment:11 Changed 20 months ago by Gabriel Gouvine

Thank you! It's cleaner. I updated the patch and benchmarked both. I ran Dijkstra on random graphs of various densities - it seemed closer to real use-cases - with 100 to 10000 nodes.

I used both ArcMap?<double> and ArcMap?<int> - the latter to test if aliasing issues would prevent some optimizations. I built the graph, then ran many Dijkstra iterations under oprofile.

comment:12 Changed 19 months ago by Peter Kovacs

Thanks for the update. I agree: if the 'last' field does not cause significant speed-up, then it could be removed.

I would accept this version, the patch could be applied to the main branch.

comment:13 Changed 2 weeks ago by Peter Kovacs

Milestone: LEMON 1.4 release

comment:14 Changed 2 weeks ago by Alpar Juttner

Resolution: done
Status: newclosed

The patch compact_graph.patch has been rebased as [73bd8d5200df] and pushed to the main branch.

Thanks for the contribution.

Note: See TracTickets for help on using tickets.