COIN-OR::LEMON - Graph Library

Opened 9 years ago

Last modified 15 months ago

#224 new enhancement

Static graph maps

Reported by: alpar Owned by: deba
Priority: minor Milestone: LEMON 1.5 release
Component: core Version: hg main
Keywords: Cc:
Revision id:

Description (last modified by alpar)

Sometimes it would be nice to have a map which is not registered in the alteration notifier of the graph.

One reason might be running time efficiency, but the more important is multi-thread applications (see #223). Currently, it is not safe to run say two Dijkstra algorithms on the same graph in parallel, as they would allocate maps and register them into the alteration notifier simultaneously, which is not safe. Changing these map allocations to static one would solve this problem.

My suggestion is to use the standard maps with a special constructor for this purpose, e.g.

ListGraph::NodeMap<int> map1(g,STATIC);
ListGraph::NodeMap<int> map1(g,15,STATIC);

or

ListGraph::NodeMap<int> map1(STATIC,g);
ListGraph::NodeMap<int> map1(STATIC,g,15);

See also #223.

'

Attachments (2)

d5e4ef884f8f.patch (32.8 KB) - added by deba 8 years ago.
Static maps
765a16dc7c3a.patch (32.2 KB) - added by deba 8 years ago.
Once again static maps

Download all attachments as: .zip

Change History (31)

comment:1 in reply to: ↑ description ; follow-up: Changed 9 years ago by kpeter

Replying to alpar:

One reason might be running time efficiency,

I like this concept. An algorithm typically allocates one or more maps for the underlying graph, but it do not modify the graph, since it is usually a const parameter. In these cases we could use static maps.

My suggestion is to use the standard maps with a special constructor for this purpose

What about StaticNodeMap as an alternative?

comment:2 in reply to: ↑ 1 ; follow-up: Changed 9 years ago by alpar

My suggestion is to use the standard maps with a special constructor for this purpose

What about StaticNodeMap as an alternative?

The dynamic and the static maps are the same data structure, the only difference is that in the static case it is not registered in the alteration notifier. So I prefer using the special constructor.

It is also better because many algorithms set their template parameters to the standard NodeMap/EdgeMap by default. Using these algorithms with {StaticNodeMap would be an extra trouble.

comment:3 in reply to: ↑ 2 ; follow-up: Changed 9 years ago by kpeter

Replying to alpar:

My suggestion is to use the standard maps with a special constructor for this purpose

What about StaticNodeMap as an alternative?

The dynamic and the static maps are the same data structure, the only difference is that in the static case it is not registered in the alteration notifier. So I prefer using the special constructor.

I see. That's clear.

STATIC would be a global constant, such as INVALID?

comment:4 in reply to: ↑ 3 Changed 8 years ago by alpar

  • Priority changed from major to critical

STATIC would be a global constant, such as INVALID?

Yes.

comment:5 follow-up: Changed 8 years ago by alpar

Can we agree in this concept, then?

Could anyone familiar with the map internals estimate how difficult is to implement it?

comment:6 in reply to: ↑ 5 ; follow-up: Changed 8 years ago by deba

Replying to alpar:

Can we agree in this concept, then?

Could anyone familiar with the map internals estimate how difficult is to implement it?

In my opinion, the extra parameter is a good solution. However, the thread-safe map construction must be solved somehow (not just the static ones). If it was solved, then this concept would become useless (If we do not consider the runtime efficiency improvement, which might be very small).

The implementation is not so difficult, but it needs some refactoring. The most important thing, that the map must hold a pointer to the notifier, however it should not be attached.

I see only one problem with the concept, the detection and the chance of memory leaks could become greater. If there is a registered static array map and the graph is altered during its existence, then some items will not constructed or destructed in the static map. I think, if we make possible to use static maps, then we should provide some tools(for example debug macro) to detect such bugs.

comment:7 in reply to: ↑ 6 Changed 8 years ago by alpar

Replying to deba:

Could anyone familiar with the map internals estimate how difficult is to implement it?

In my opinion, the extra parameter is a good solution. However, the thread-safe map construction must be solved somehow (not just the static ones).

In theory, yes, it would be a nice feature.

My feeling however, that this simple static map idea covers virtually all practical use-cases.

I see only one problem with the concept, the detection and the chance of memory leaks could become greater. If there is a registered static array map and the graph is altered during its existence, then some items will not constructed or destructed in the static map.

I don't think it increases the risk significantly. If you alter a graph behind any existing map based data structure, the chance is high that it will be ruined without explicitly updating the maps, even if the maps are dynamic.

I think, if we make possible to use static maps, then we should provide some tools(for example debug macro) to detect such bugs.

comment:8 follow-up: Changed 8 years ago by deba

I have one more problem with this algorithm. Currently, most of the algorithm classes are able to run several times, and between the executions you can modify the graph. I feel, with this modification we lost this opportunity.

comment:9 in reply to: ↑ 8 ; follow-up: Changed 8 years ago by alpar

Replying to deba:

I have one more problem with this algorithm. Currently, most of the algorithm classes are able to run several times, and between the executions you can modify the graph. I feel, with this modification we lost this opportunity.

Could you list some concrete tools where this might be a problem?

comment:10 in reply to: ↑ 9 ; follow-up: Changed 8 years ago by deba

Replying to alpar:

Replying to deba:

I have one more problem with this algorithm. Currently, most of the algorithm classes are able to run several times, and between the executions you can modify the graph. I feel, with this modification we lost this opportunity.

Could you list some concrete tools where this might be a problem?

For example in the price and repair algorithms. They start with a small sparse graph, and if the generated solution is not valid for the whole graph, then it is extended with the violating nodes, arcs or edges.

This idea can be used for shortest path, matching, constrained shortest path algorithms.

comment:11 in reply to: ↑ 10 ; follow-up: Changed 8 years ago by alpar

Replying to deba:

Replying to alpar:

Replying to deba:
Could you list some concrete tools where this might be a problem?

For example in the price and repair algorithms.

I mean which current tools of LEMON tool would loose flexibility because of using static maps?

I.e. do we indeed have tools that enable changing the graph between two executions, but rely only on automatic extension of maps with no manual initialization at the beginning of the execution?

comment:12 in reply to: ↑ 11 ; follow-up: Changed 8 years ago by deba

Replying to alpar:

Replying to deba:

Replying to alpar:

Replying to deba:
Could you list some concrete tools where this might be a problem?

For example in the price and repair algorithms.

I mean which current tools of LEMON tool would loose flexibility because of using static maps?

I.e. do we indeed have tools that enable changing the graph between two executions, but rely only on automatic extension of maps with no manual initialization at the beginning of the execution?

The initialization has two steps in most cases, the first step allocates memory(maps and other data) for the algorithm, the second step assigns initial values to the allocated maps. The memory allocations are made once per algorithm class instance, while the initial data setting is made per execution.

The problem is that the proposed issue would change the algorithms to use the static maps (the initial example of the issue).

Let see the following example:

  ListDigraph digraph;
  ListDigraph::ArcMap<double> length(digraph);
  buildInitialDigraph(digraph, length);

  Dijkstra<ListDigraph, ListDigraph::ArcMap<double> > dijkstra(digraph, length);
  while (true) {
    dijkstra.run(s);
    if (checkPricingAndExtendDigraph(digraph, length)) break;
}

This algorithm works now well, because the maps are extended. If the new approach would be introduced, then some workaround have to be find:

  • We should use separate dijkstra instances for every phases.
  • Every map used by the Dijkstra algorithm have to be specified directly.
  • Parameter for the algorithms, that the maps have to be static or not.

comment:13 in reply to: ↑ 12 ; follow-up: Changed 8 years ago by alpar

Replying to deba:

This algorithm works now well, because the maps are extended. If the new approach would be introduced, then some workaround have to be find:

  • We should use separate dijkstra instances for every phases.
  • Every map used by the Dijkstra algorithm have to be specified directly.
  • Parameter for the algorithms, that the maps have to be static or not.

What about the fourth (and most natural) solution?

  • Dijkstra<>::run() extends the map manually if need be.

comment:14 in reply to: ↑ 13 ; follow-up: Changed 8 years ago by deba

Replying to alpar:

Replying to deba:

This algorithm works now well, because the maps are extended. If the new approach would be introduced, then some workaround have to be find:

  • We should use separate dijkstra instances for every phases.
  • Every map used by the Dijkstra algorithm have to be specified directly.
  • Parameter for the algorithms, that the maps have to be static or not.

What about the fourth (and most natural) solution?

  • Dijkstra<>::run() extends the map manually if need be.

How does the Dijkstra know which items are added to the graph? It is not trivial question. The current map implementations assume that the maps are not changed without they are notified about it. Because the ArrayMaps constructs the map values as the items added to the graph, and it knows about each graph alteration, it can be kept consistent. But if we do not track which items are changed, then we have to find all the changes. I do not know better solution, than an extra bool flag for each value in the map. But it needs extra space and time. Is it possible to do it better?

comment:15 in reply to: ↑ 14 ; follow-up: Changed 8 years ago by alpar

Replying to deba:

How does the Dijkstra know which items are added to the graph? It is not trivial question. The current map implementations assume that the maps are not changed without they are notified about it. Because the ArrayMaps constructs the map values as the items added to the graph, and it knows about each graph alteration, it can be kept consistent. But if we do not track which items are changed, then we have to find all the changes. I do not know better solution, than an extra bool flag for each value in the map. But it needs extra space and time. Is it possible to do it better?

I think it is. It is enough to extend the storage array of the map to the highest ID and construct the elements of the interval of new values. I.e. a static map would always contain a continuous range of initialized elements. The destructors are only called when the whole map is deallocated. This approach can cause problem if

  • the constructor or the destructor of the items have side-effects or
  • the algorithm needs a freshly initialized (by the default constructor) objects for the newly added item.

I don't think either of them would be a real problem in case of the standard algorithms.

comment:16 in reply to: ↑ 15 ; follow-up: Changed 8 years ago by deba

  • Type changed from enhancement to defect
  • Version changed from hg main to other

I think it is. It is enough to extend the storage array of the map to the highest ID and construct the elements of the interval of new values. I.e. a static map would always contain a continuous range of initialized elements. The destructors are only called when the whole map is deallocated. This approach can cause problem if

  • the constructor or the destructor of the items have side-effects or
  • the algorithm needs a freshly initialized (by the default constructor) objects for the newly added item.

I don't think either of them would be a real problem in case of the standard algorithms.

You say that this is not a problem for standard algorithms, but I think the STATIC maps can be used also in user codes. So we cannot check only how they are used in standard algorithms, because the user can use them directly. The different behavior can be misleading.

I think, another approach would solve this problem cleaner and it could be used more generally, because the user should not care about the concurrent modifications.

comment:17 in reply to: ↑ 16 ; follow-up: Changed 8 years ago by alpar

Replying to deba:

I think it is. It is enough to extend the storage array of the map to the highest ID and construct the elements of the interval of new values. I.e. a static map would always contain a continuous range of initialized elements. The destructors are only called when the whole map is deallocated. This approach can cause problem if

  • the constructor or the destructor of the items have side-effects or
  • the algorithm needs a freshly initialized (by the default constructor) objects for the newly added item.

I don't think either of them would be a real problem in case of the standard algorithms.

You say that this is not a problem for standard algorithms, but I think the STATIC maps can be used also in user codes.

Of course yes, assuming they know the limitations.

So we cannot check only how they are used in standard algorithms, because the user can use them directly.

I don't understand what are you complaining about.

The different behavior can be misleading.

Everything can be misleading.

I think, another approach would solve this problem cleaner and it could be used more generally, because the user should not care about the concurrent modifications.

They are independent. A locking mechanism might be necessary sometimes, but in 99.9% of the cases it is a completely unnecessary extra hassle. Static maps cover this 99.9%, while #223 does so with the remaining 0.01%.

Btw. static maps has an additional benefit - using them in the algorithm can cause significant speedup in graph modifying operations if there are more algorithms preallocated to the graph.

comment:18 in reply to: ↑ 17 ; follow-up: Changed 8 years ago by deba

Replying to alpar:

They are independent. A locking mechanism might be necessary sometimes, but in 99.9% of the cases it is a completely unnecessary extra hassle. Static maps cover this 99.9%, while #223 does so with the remaining 0.01%.

Btw. static maps has an additional benefit - using them in the algorithm can cause significant speedup in graph modifying operations if there are more algorithms preallocated to the graph.

I think, the performance improvements of the static maps cannot be significant in any real applications. In addition, one solution have to be enough for the concurrent map addition problem, and I suggest to use only the locking mechanism. It has more advantages:

  • This works always when the static map concept works.
  • The user should not care about which map should be used in a place. In my opinion this is the more important reason to choose the locking approach. I think if a user want to write an algorithm for a singlethreaded program, then he may use normal maps (non static). If he want to reuse the algorithm in a multithreaded program, then he have to intrusively modify his old code.
  • I think, the implementation of lock based solution is easier also.

comment:19 in reply to: ↑ 18 Changed 8 years ago by alpar

  • Description modified (diff)

Replying to deba:

I think, the performance improvements of the static maps cannot be significant in any real applications.

They really can be. Just think of a graph which is used by some algorithms, say by just one Dfs, one Dijkstra, and one MinCostArborescense?`. This means we already have 11 nodemaps and 2 arcmaps attached to the graph, not counting the input of these algs. It means the same amount of virtual function calls at each node/arc addition/deletion.

In addition, one solution have to be enough for the concurrent map addition problem,

I try to argue that it isn't.

and I suggest to use only the locking mechanism. It has more advantages:

  • This works always when the static map concept works.
  • The user should not care about which map should be used in a place.

In fact, he should. If Dijsktra used static map, it was thread-safe out-of-box (because e.g. ListGraph?+static maps were thread-safe). With the locking approach, the default tools were not thread-safe by default, but you have to use a specialized version of the tools.

In my opinion this is the more important reason to choose the locking approach. I think if a user want to write an algorithm for a single threaded program, then he may use normal maps (non static). If he want to reuse the algorithm in a multithreaded program, then he have to intrusively modify his old code.

I consider the following use-case: the user wants to run 8 instances of Dijkstra on the same graph. She don't write any shared data thus don't want to care about locking.

  • I think, the implementation of lock based solution is easier also.

It depends if you count with the development and maintenance of the various lock implementations. If you do, than implementing the lock model is far more complex.

comment:20 follow-up: Changed 8 years ago by kpeter

I do like using static graph maps in standard algorithms, it is a natural requirement to have them thread-safe (using default options).

However I think some locking mechanism is really necessary even if we have static maps. While a thread is running an algorithm on a graph, it should lock the graph, i.e. prevent the changing of it (addig or deleting nodes and edges), or many problems can arise. Thus it would be guaranteed that the graph does not change during the running of the algorithm. Using static maps, they could be reallocated or extended in each execution of the algorithm. It wouldn't be a significant drawback, since they typically have to be initialized. In fact, standard maps (of internal value types) are std::vectors of appropriate size that are indexed by the IDs of the items. So static maps would only be a convenient way instead of doing this manually, e.g. we could write map[n] instead of map[gr.id(n)]. For the users we could write in the doc that static maps are safe only if the graph is not changed during the existance of the static map.

comment:21 in reply to: ↑ 20 Changed 8 years ago by alpar

  • Type changed from defect to enhancement
  • Version changed from other to hg main

Replying to kpeter:

However I think some locking mechanism is really necessary even if we have static maps. While a thread is running an algorithm on a graph, it should lock the graph, i.e. prevent the changing of it (addig or deleting nodes and edges), or many problems can arise.

I don't think we need this. "Thread safety" generally means "it is safe to use in parallel provided the data structure is not modified by other threads during the usage". I guess, your suggestions would imply a high performance penalty. That's the reason for this definition. So these problems are generally handled on a higher level.

The maps are a special as allocating a map is logically not a modification of the graph, but physically it is.

comment:22 follow-up: Changed 8 years ago by deba

Replying to kpeter:

However I think some locking mechanism is really necessary even if we have static maps. While a thread is running an algorithm on a graph, it should lock the graph, i.e. prevent the changing of it (addig or deleting nodes and edges), or many problems can arise.

The current discussion is about another locking mechanism. The approach, what you mention, can be solved easily in any threading library. You create a lock for the graph, or even a read-write lock, and you lock manually, when you want to run algorithm or modify the graph.

The current problem is in the backend, because we store lists for the maps assigned to a graph. Because the graph is not changed in an algorithm, therefore it is possible to create two Dijkstra algorithms or min cost flows and concurrently run them on the graph. But it is not true, because both instances creates new maps and they modify the same lists. The main problem is that we cannot use manual locking, because then we might have to lock the whole algorithms and run them sequentially. So now we would like to defend only this lists, and not the whole graph.

However, I have problem with static graph maps:

  • You have to write each algorithm to use this static map concept, if you want to create reusable code. (And of course we have to modify the existing ones.)
  • If you have to write each algorithm with STATIC maps, why isn't it the default setting. We could introduce rather the DYNAMIC static const variable.
  • In my opinion, it is quite easier to write bad code with this approach, if you write a big application using LEMON, it is error-prone to forget to add STATIC or map updates.
  • In my opinion, it is harder to debugging this approach.

comment:23 in reply to: ↑ 22 Changed 8 years ago by alpar

Replying to deba:

However, I have problem with static graph maps:

  • You have to write each algorithm to use this static map concept, if you want to create reusable code. (And of course we have to modify the existing ones.)

Not every algorithm, only those we want to make thread safe (and even in that case we might have the choice of depending on the future thread safety of dynamic maps).

The vast majority of the user codes does not belong to this category, even in LEMON lib itself it is superfluous or meaningless to ensure thread safety in many cases.

  • If you have to write each algorithm with STATIC maps,

No, you don't have to. See above.

why isn't it the default setting. We could introduce rather the DYNAMIC static const variable.

No we couldn't.

The short explanation is "because of backward compatibility".

The long one is that dynamic maps provide huge advantages when building up graph based datastructures at a very low costs (e.g. in terms of efficiency) in general. Therefore it is the right choice in user codes almost all the time.

  • In my opinion, it is quite easier to write bad code with this approach, if you write a big application using LEMON, it is error-prone to forget to add STATIC or map updates.

As I already indicated in a previous comment, I don't think this concern is valid.

The automatic update almost never ensures the correct update of the datastructures, it only does the memory allocations.

Similarly, we could say std::vector<> is dangerous because it does not extend automatically. No, it isn't, it just needs more attention. (well, in other words, we must live together with this danger in C++.)

Actually, I can go even further - static maps can be safer. Forgetting to update a map is a bug both using static or dynamic maps. If you use static map, valgrind will immediately warn about this error, while it can very easily remain hidden with dynamic maps.

  • In my opinion, it is harder to debugging this approach.

I can't imagine, why.

Changed 8 years ago by deba

Static maps

comment:24 follow-up: Changed 8 years ago by deba

I have uploaded an implementation of this approach, but I dislike the whole idea. If the static maps will be used in standard algorithms, then it will break the backward compatibility of LEMON.

comment:25 in reply to: ↑ 24 Changed 8 years ago by alpar

Replying to deba:

If the static maps will be used in standard algorithms, then it will break the backward compatibility of LEMON.

Could you elaborate this potential problem?

Changed 8 years ago by deba

Once again static maps

comment:26 Changed 8 years ago by kpeter

  • Milestone changed from LEMON 1.2 release to LEMON 1.3 release

comment:27 Changed 8 years ago by kpeter

We decided to postpone this task to be able to make a major release sooner.

comment:28 Changed 5 years ago by alpar

  • Milestone changed from LEMON 1.3 release to LEMON 1.4 release
  • Priority changed from critical to minor

I still think static maps may be useful in some cases, but the graph maps are already thread safe (see #223), thus it is of much less importance.

comment:29 Changed 15 months ago by alpar

  • Milestone changed from LEMON 1.4 release to LEMON 1.5 release
Note: See TracTickets for help on using tickets.