COIN-OR::LEMON - Graph Library

Opened 7 years ago

Last modified 8 months ago

#377 new enhancement

Smaller version of StaticDigraph

Reported by: kpeter Owned by: alpar
Priority: major Milestone:
Component: core Version: hg main
Keywords: Cc:
Revision id:

Description

The current implementation of StaticDigraph stores 2n+4m int values (for n nodes and m arcs). However, in many cases, one needs outgoing arc iteration only. For such cases, a static graph representation could be implemented with n+2m integers by fully omitting the outgoing and incomming arc lists (but arcs would be sorted by their source nodes).

The NodeIt, ArcIt and OutArcIt iterators could be implemented efficiently, but the basic iteration of the ougoing arcs would be somewhat slower (see #68 for antecedents). For ensuring the compatibility with the Digraph interface, an InArcIt iterator should also be implemented, but it would be really slow (as slow as full arc iteration).

What do you think? Would this data structure make sense?

Attachments (3)

smaller_static_graph.patch (3.2 KB) - added by ggouvin 9 months ago.
Patch for a slightly smaller StaticDigraph?: 2n + 3m instead of 2n + 4m
forward_graph.patch (15.4 KB) - added by ggouvin 8 months ago.
First version of the compact digraph
compact_graph.patch (15.3 KB) - added by ggouvin 8 months ago.
Removed the "last" member from Arc

Download all attachments as: .zip

Change History (15)

comment:1 in reply to: ↑ description Changed 7 years ago by alpar

Replying to kpeter:

The current implementation of StaticDigraph stores 2n+4m int values (for n nodes and m arcs). However, in many cases, one needs outgoing arc iteration only. For such cases, a static graph representation could be implemented with n+2m integers by fully omitting the outgoing and incoming arc lists (but arcs would be sorted by their source nodes).

It may even be possible to implement n+m, at the cost that Arc is two times bigger.

Namely, the arc's are ordered according to their source node id, then we only store their target nodes (m integers) plus the id of the first outgoing arc for each node (n integer). The Arc structure must store the arc id and the source node id.

For ensuring the compatibility with the Digraph interface, an InArcIt iterator should also be implemented, but it would be really slow (as slow as full arc iteration).

This is the easier (and probably more pragmatic) way. The other one is to introduce a new concept for out-iterable graphs.

Changed 9 months ago by ggouvin

Patch for a slightly smaller StaticDigraph?: 2n + 3m instead of 2n + 4m

comment:2 follow-up: Changed 9 months ago by ggouvin

Here is a first improvement that removes the node_first_out array from StaticDigraph?; I'm working on an n+m version as suggested by Alpar as well

comment:3 in reply to: ↑ 2 ; follow-up: Changed 9 months ago by kpeter

Replying to ggouvin:

Here is a first improvement that removes the node_first_out array from StaticDigraph; ...

This change has been proposed and discussed several years ago. Here you can find that patch:
68-4-a3de05c56e7e.patch

We finally decided not to apply it, because the additional array can make arc iteration faster e.g. when graph adaptors are applied on the top of StaticDigraph. See related comments here:
ticket:68#comment:5
ticket:68#comment:8

So I wouldn't apply this patch.

However, the n+m version is still interesting, but I would suggest to implement it as a separate Digraph class. E.g. we could call it CompactDigraph and its API doc should describe the differences compared to StaticDigraph (much more space-efficient, but only OutArcIt is fast, InArcIt is much slower).

comment:4 Changed 8 months ago by ggouvin

Here is a first working version of this Digraph implementation. I think it's almost OK, but I could improve a few points:

  • Arc is 3 ints; it could be 2
  • The documentation could be improved
Last edited 8 months ago by ggouvin (previous) (diff)

comment:5 in reply to: ↑ 3 ; follow-up: Changed 8 months ago by ggouvin

Replying to kpeter:

Replying to ggouvin:

Here is a first improvement that removes the node_first_out array from StaticDigraph; ...

This change has been proposed and discussed several years ago. Here you can find that patch:
68-4-a3de05c56e7e.patch

We finally decided not to apply it, because the additional array can make arc iteration faster e.g. when graph adaptors are applied on the top of StaticDigraph. See related comments here:
ticket:68#comment:5
ticket:68#comment:8

So I wouldn't apply this patch.

However, the n+m version is still interesting, but I would suggest to implement it as a separate Digraph class. E.g. we could call it CompactDigraph and its API doc should describe the differences compared to StaticDigraph (much more space-efficient, but only OutArcIt is fast, InArcIt is much slower).

Hi Péter. Thanks for the feedback; would you have comments on the ForwardDigraph? patch below (rather CompactDigraph??)? It still needs polishing.

ExtendedStaticDigraphBase::OutArcIt? is shadowed by StaticDigraph::OutArcIt?, and no code instantiates it directly. So firstOut/nextOut - and the additional array - are never used. But I understand the need to be careful here.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 8 months ago by kpeter

Replying to ggouvin:

Hi Péter. Thanks for the feedback; would you have comments on the ForwardDigraph patch below (rather CompactDigraph?)? It still needs polishing.

I had a short look at it. Basically, it seems to be OK, but at first it was strange that during out arc iteration of node n, node_first_out[n] is actually the last one, not the first. And the last field of Arc type is actually the "after the last one" according to out arc iteration order (so actually "before the first one" according to IDs). Maybe these aspects could be clarified, but I'm not sure whether ++ or -- direction would be better for out arc iteration.

ExtendedStaticDigraphBase::OutArcIt is shadowed by StaticDigraph::OutArcIt, and no code instantiates it directly. So firstOut/nextOut - and the additional array - are never used. But I understand the need to be careful here.

I don't think so. Could you have a look at the implementation of graph adaptors? E.g. SubDigraph class uses Parent::firstOut() and Parent::nextOut() in its firstOut() and nextOut() methods instead of the Parent graph's iterators:
https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/adaptors.h?rev=dceba191c00d#L452
Other adaptors (e.g. ReverseDigraph) and other core utilities also use these methods. E.g., the arc lookup tools rely on this helper class:
https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/core.h?rev=c199e9976d93#L1600

comment:7 in reply to: ↑ 6 Changed 8 months ago by ggouvin

Replying to kpeter:

Replying to ggouvin:

Hi Péter. Thanks for the feedback; would you have comments on the ForwardDigraph patch below (rather CompactDigraph?)? It still needs polishing.

I had a short look at it. Basically, it seems to be OK, but at first it was strange that during out arc iteration of node n, node_first_out[n] is actually the last one, not the first. And the last field of Arc type is actually the "after the last one" according to out arc iteration order (so actually "before the first one" according to IDs). Maybe these aspects could be clarified, but I'm not sure whether ++ or -- direction would be better for out arc iteration.

Right; I based it on ArcIt? rather than on StaticDigraph?'s OutArcIt?. I will change that to a more standard ++.

ExtendedStaticDigraphBase::OutArcIt is shadowed by StaticDigraph::OutArcIt, and no code instantiates it directly. So firstOut/nextOut - and the additional array - are never used. But I understand the need to be careful here.

I don't think so. Could you have a look at the implementation of graph adaptors? E.g. SubDigraph class uses Parent::firstOut() and Parent::nextOut() in its firstOut() and nextOut() methods instead of the Parent graph's iterators:
https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/adaptors.h?rev=dceba191c00d#L452
Other adaptors (e.g. ReverseDigraph) and other core utilities also use these methods. E.g., the arc lookup tools rely on this helper class:
https://lemon.cs.elte.hu/trac/lemon/browser/lemon/lemon/core.h?rev=c199e9976d93#L1600

Thanks for the insight, I didn't see this.
One possible way out would be to have first/nextOut and the like take something other than the Arc class (OutArcItBase??), which would be typedef to Arc in most cases but would contain "last" in Static and CompactDigraph?. But this looks less and less like a minor change.

comment:8 Changed 8 months ago by kpeter

Yes, indeed. So I wouldn't modify StaticDigraph: it focuses on running time performance (which is the primary preference in most cases).

The other implementation in turn focuses on memory requirements and provides fast iteration support only for the most typical use cases.

That's a clear trade-off, one can select according to her/his preference.

Changed 8 months ago by ggouvin

First version of the compact digraph

comment:9 Changed 8 months ago by ggouvin

I updated the patch so CompactDigraph::OutArcIt? uses increasing id; ArcIt? is kept consistent with StaticDigraph? as well (decreasing id)

comment:10 Changed 8 months ago by kpeter

I briefly checked the attached patch. Here are some comments.

Handling of the last field of Arc still seems to be problematic. According to the usage of its value (lines 87, 110, 114), e.last should always be equal to node_first_out[e.source+1], unless e.id==-1. However, some assignments do not correspond to this invariant. The change in iteration order of OutArcIt seems to be inconsistently updated, see lines 89, 98, 138, 216). Please review these parts of the code.

Anyway, this field is not necessary, recalculating its value would always be feasible. However, it would make out arc iteration slower. It would be interesting to compare the performance of an alternative implementation that does not store this field (on various graphs: small/large, sprase/dense).

Apart from that, please use LEMON_COMPACT_GRAPH_H instead of LEMON_FORWARD_GRAPH_H in lines 19 and 20.

Last edited 8 months ago by kpeter (previous) (diff)

Changed 8 months ago by ggouvin

Removed the "last" member from Arc

comment:11 Changed 8 months ago by ggouvin

Thank you! It's cleaner. I updated the patch and benchmarked both. I ran Dijkstra on random graphs of various densities - it seemed closer to real use-cases - with 100 to 10000 nodes.

I used both ArcMap?<double> and ArcMap?<int> - the latter to test if aliasing issues would prevent some optimizations. I built the graph, then ran many Dijkstra iterations under oprofile.

comment:12 Changed 8 months ago by kpeter

Thanks for the update. I agree: if the 'last' field does not cause significant speed-up, then it could be removed.

I would accept this version, the patch could be applied to the main branch.

Note: See TracTickets for help on using tickets.