COIN-OR::LEMON - Graph Library

Opened 9 years ago

Last modified 17 months ago

#146 assigned enhancement

Cheap copy of maps (reference counting) PHASE II.

Reported by: alpar Owned by: alpar
Priority: blocker Milestone: LEMON 1.5 release
Component: core Version:
Keywords: Cc:
Revision id:

Description

This is a follow-up of #137.

Change History (24)

comment:1 Changed 9 years ago by alpar

  • Priority changed from major to blocker

comment:2 Changed 9 years ago by alpar

  • Status changed from new to assigned

comment:3 follow-up: Changed 9 years ago by deba

I would like to choose the first option from Alpar's introduction in the ticket #137.

-Declare that this is a basically bad idea and therefore we will never introduce it. I would not be happy with a decision like this.

In my point of view, it does not increase substantially the usability of the library, and it is quite hard to implement it. On other way, the proposed solution is not quite neat, if the reference counting could be implemented similarly as the std::string (seamlessly), I would give more chance to support it.

comment:4 in reply to: ↑ 3 ; follow-up: Changed 9 years ago by alpar

Replying to deba:

In my point of view, it does not increase substantially the usability of the library,

Strange. As the time is going, my opposite opinion is just getting stronger.

and it is quite hard to implement it. On other way, the proposed solution is not quite neat,

It is neat and it is the default behavior of the data structures in many languages. Therefore it should be very natural for many people. e.g. for those can "speak" in Python.

if the reference counting could be implemented similarly as the std::string (seamlessly), I would give more chance to support it.

As you probably know, this is not an option. The implicit copy-on-write approach is suitable when the data is rarely written but frequently read. It is a natural choice in those case, but not in our. I can't recall any languages where this implicit copying would be default behavior when modifying a data structure.

So, let's discuss the 'cons' of the proposed approach.

  • Unless I'm mistaken, there isn't any significant drawback of this approach from the user point of view. Can you confirm this?
  • Therefore (for me) the only question is how difficult the implementation is. I still can't see any fundamental problem with it, but I might overlook something.

comment:5 follow-up: Changed 9 years ago by kpeter

I do not understand the concept exactly. If two maps have exactly the same type, I see how the reference counting should work. But what about assignments of maps of different types? How can you assign an implicit map constructed with one or more map adaptors (based on e.g. a funtor object) to a basic node/arc map with reference counting? There isn't any storage array for the former one. Maybe it is my fault, but I do not understand it. If reference counting would be used for some of the cases, it must be used for all cases. Am I right? Could you please tell me more details about these questions?

Moreover I do not see fundamental advantages. Could you please tell me examples? On the other hand I think it would be more difficult to understand this concept in a language that basically use another concept. I see that there are several languages in which it is natural, but I think it is quite strange in C++. E.g. I would surely be mixed by this.

comment:6 in reply to: ↑ 4 Changed 9 years ago by deba

Replying to alpar:

On the lemon-devel list you have written the following answer(06. 11. 2008. 12:20):

1) If we use cheap copy then in each algorithm we should use value data
members and value parameters.

Not exactly. This should only be an option.

Therefore, if we would like to make a heavy
weight map, then we have to make this class reference counted.

More exactly, if we want to use it in conjunction with an algorithm
which uses this feature and we want to avoid copying the full data of
the map, then yes, it must be reference counted.

What the first answer means, what can be an option? What is your concrete conception? How should an algorithm looks like, what are the parameters, the data members, etc... The heavy weight map can be mixed with light weight maps(both kind of maps can be used in lemon)? And the heavy weight(takes parameters by reference) and the light weight(takes parameters by value) algorithms can be mixed in lemon? What is the detailed idea?

comment:7 in reply to: ↑ 5 ; follow-ups: Changed 9 years ago by alpar

Replying to kpeter:

I do not understand the concept exactly. If two maps have exactly the same type, I see how the reference counting should work. But what about assignments of maps of different types?

Strange question. Are there more than one answer to it?

How can you assign an implicit map constructed with one or more map adaptors (based on e.g. a funtor object) to a basic node/arc map with reference counting? There isn't any storage array for the former one.

What is the problem of copying a data structure like this.
The title of the ticket says "cheap copy of maps". This is the main concept. By adopting this concept, the user can assume that map copying is a cheap operation whenever it makes sense. It is that simple.

  • For copying the graphs' own maps to themselves it means reference counting.
  • Copying a map adaptor to another one of the same type is a simple copying. There are map adaptors that allocate additional storages, then - in theory - they could also be reference counted, but - in practice - it makes no sense.
  • Copying any kind of map to a standard map means element-by-element copy, except for the case when the source map type is the same.

Maybe it is my fault, but I do not understand it. If reference counting would be used for some of the cases, it must be used for all cases. Am I right?

Not at all. Only when it makes sense.

Moreover I do not see fundamental advantages. Could you please tell me examples?

The main advantage is that by adopting this concept

  • a function can allocate a map then return it in an efficient way. More exactly, the callee can allocate a standard map, or even an adaptor, and return it. Then the caller can store it to a standard graph map independently of what the function allocated and returned, but it will as efficient as it can be: the elements will be copied only if the types are different.
  • a function can accept map parameters as values efficiently. The you can call a function like safely and efficiently
    • with a standard graph maps (it will effectively passed "by reference") or
    • with any kind of temporarily allocated maps. Again, the parameter passing will be cheap as far as it makes sense.

On the other hand I think it would be more difficult to understand this concept in a language that basically use another concept.

Come on! First, C++ doesn't "use" any kind of concept. It gives you the freedom to use both explicit and reference counting based memory model. And there are commonly used libraries based on both concepts.

Secondly, when you judge how difficult is to understand this feature, you should remember that we are using a language that forces its users to understand that for example

  Xyz x(12);
  return x;

is not always the same as

  return Xyz(12);

Comparing cheap copying concept to just this single example, I must say that this "understanding barrier" is negligible.

I see that there are several languages in which it is natural, but I think it is quite strange in C++.

Once again, there are much stranger built-in features in C++.
Anyway, the users are not affected by this feature until they read the related tutorial section explaining that it is possible to use the assignment operator to copy maps. At that place a single sentence (and probably an example) is enough to warn the users that maps of the same type will be copied by reference.

E.g. I would surely be mixed by this.

I can't believe that - once you read the above mentioned explanation in the tutorial - you would make even a single bug because of this feature.

comment:8 in reply to: ↑ 7 Changed 9 years ago by kpeter

Replying to alpar:

To tell the truth, I'm not convinced at all.

If reference counting would be used for some of the cases, it must be used for all cases. Am I right?

Not at all. Only when it makes sense.

That is the main problem with it. How can I figure out that an assignment was reference counted or it was performed by copying.
This is a critical question, since it is much more than just an implementation detail for efficiency reasons. E.g. if we called the following function with a digraph and a node map of some numerical value type, what would it print out? It depends on the type NodeMap?
It would print out "20 20" for NodeMap<int>, but what about NodeMap<unsigned>, NodeMap<short>, NodeMap<long>, NodeMap<double>? Would it print out "10 20" for all of them? Does reference counting make sense in these cases? What about the case when sizeof(int)==sizeof(long)?

template <typename Graph, typename NodeMap>
void function(Graph& g, NodeMap& m) {
  TEMPLATE_DIGRAPH_TYPEDEFS(Graph);
  IntNodeMap a;
  a = m;
  Node s = NodeIt(g);
  a[s] = 10;
  m[s] = 20;
  std::cout << a[s] << " " << m[s] << "\n";
}

If I modify a type from NodeMap<int> to NodeMap<long> somewhere in a huge program than it could happen that it will make something definitely different? Even if sizeof(long)==sizeof(int)? I simply cannot accept a concept like this.

Moreover I do not see fundamental advantages. Could you please tell me examples?

The main advantage is that by adopting this concept

  • a function can allocate a map then return it in an efficient way. More exactly, the callee can allocate a standard map, or even an adaptor, and return it. Then the caller can store it to a standard graph map independently of what the function allocated and returned, but it will as efficient as it can be: the elements will be copied only if the types are different.

I see these facts, but could you tell me relevant examples? E.g. an algorithm or another tool for which this concept could make possible to have better interface (better means more efficient and/or more convenient) along with simple practical use-cases where it is clear that this concept would be better.

If you would like to store any map returned somehow by an algorithm or another tool, you could only use the standard graph maps (map adaptors etc. cannot be used).
For this use-case you have an opportunity to create your map and pass it to the tool before you use in the current concept. Then this map will be used directly, which is surely the most efficient solution. Moreover the default map types are usually the ones you would like to use for storing, so you usually don't have to bother with template parameters or named template parameters even if you use a class interface. But if you do have to use another map type, you must specify it (with e.g. a named template parameter) using both concepts in order to be as efficient as you can.
So how could the cheap copy concept make these use-cases easier or more efficient?

Btw. note that copying a map is not an expensive operation compared to running an algorithm except for the few very fast ones, like graph search, Dijkstra or Kruskal. But we have efficient and convenient function-type interfaces for such algorithms, that cannot be made better by this concept, since their run() function could only return only one map by value. (Tell me if I'm wrong here.) In case of other algorithms the time required for copying a map is negligible compared to the running time. E.g. they typically create and use several internal maps.
So it is not a big deal for a user if a map is copied when a class interface is used in a non-optimal way, e.g.

Preflow<ListDigraph> pref(g, cap);
pref.run();
ArcMap<int> x = pref.flowMap();

instead of

Preflow<ListDigraph> pref(g, cap);
ArcMap<int> x;
pref.flowMap(x).run();

But if you read the corresponding part of the tutorial about using class interfaces only once, you will surely write the second code, and you will probably find it convenient. Or not? What's the matter with it?

And note again that for the function-type interfaces (e.g. bfs(), dfs(), dijkstra()) only the second solution could be used even if we accept the cheap copy concept, since the run() function cannot return a dist map and a pred map at once. Am I right?

comment:9 in reply to: ↑ 7 Changed 9 years ago by deba

Replying to alpar:

  • Copying any kind of map to a standard map means element-by-element copy, except for the case when the source map type is the same.

I think, it is a very bad idea. The mixing of the two behavior is quite confusing. With the c++ type structure you may do not know, that two types are the same or not.

The main advantage is that by adopting this concept

  • a function can allocate a map then return it in an efficient way. More exactly, the callee can allocate a standard map, or even an adaptor, and return it. Then the caller can store it to a standard graph map independently of what the function allocated and returned, but it will as efficient as it can be: the elements will be copied only if the types are different.

I think, it is not so easy. It will not have so nice interface or it will not be so efficient. If you think, that the map will be constructed, and after that you would like to assign the returned map, then the first construction and memory allocation are unnecessary. Other way, if you would like to call constructor to assign the returned map, then the graph object should be also specified, because not all the maps contain the graph object.

  • a function can accept map parameters as values efficiently. The you can call a function like safely and efficiently
    • with a standard graph maps (it will effectively passed "by reference") or
    • with any kind of temporarily allocated maps. Again, the parameter passing will be cheap as far as it makes sense.

But can we use heavy weight maps? Is it possible in your concept, isn't it?

On the other hand I think it would be more difficult to understand this concept in a language that basically use another concept.

Come on! First, C++ doesn't "use" any kind of concept. It gives you the freedom to use both explicit and reference counting based memory model. And there are commonly used libraries based on both concepts.

I have seen more reference counting based libraries, but they mostly used explicit notation. For example: glib::RefPtr?<T>, or boost::shared_ptr<T>.

comment:10 follow-up: Changed 9 years ago by deba

Can we re-enable the operator= based map copying in maps?

comment:11 in reply to: ↑ 10 ; follow-up: Changed 9 years ago by alpar

Replying to deba:

Can we re-enable the operator= based map copying in maps?

Yes, as soon as we decide on the operator= should do.

I went through the comments above. The only real issue is this:

I think, it is not so easy. It will not have so nice interface or it will not be so efficient. If you think, that the map will be constructed, and after that you would like to assign the returned map, then the first construction and memory allocation are unnecessary. Other way, if you would like to call constructor to assign the returned map, then the graph object should be also specified, because not all the maps contain the graph object.

All the other comments are invalid or self-answering. Can we do something with this?

comment:12 in reply to: ↑ 11 Changed 9 years ago by kpeter

Replying to alpar:

All the other comments are invalid or self-answering. Can we do something with this?

I definitely not agree. What about the main problem pointed by Balazs (and me)?

I think, it is a very bad idea. The mixing of the two behavior is quite confusing. With the c++ type structure you may do not know, that two types are the same or not.

I explained similar questions in the first part of my previous comment. I find it critical, but you never tried to answer it.

These questions are about "Would this concept be acceptable?" I don't think so. (And it seems Balazs doesn't think so, too.)

Moreover there is another important question: "Suppose that the concept is acceptable, and we could implement it. Why would it be much better?" This question is detailed in the second part of my previous comment. Please, try to answer in a way, that I could understand the real advantages of cheap copying. While there are obvious problems.

comment:13 Changed 9 years ago by alpar

  • Milestone changed from LEMON 1.1 release to LEMON 1.2 release

As far as I see we can't revolve this ticket in either way in the available time-frame until the deadline for the 1.1 release. Thus I postpone it to the next release milestone.

I must admit I didn't invest enough time for this ticket. Shame on me!

comment:14 Changed 9 years ago by kpeter

Although this question was postponed twice, it seems to proceed towards a decision automatically, since we already have some tools in the releases that could/should have different interface if we choose the cheap copy concept.

E.g. consider Preflow::minCut() that gives back a found minimum cut as a bool map.

  template<typename CutMap> 
  void minCutMap(CutMap &cutMap) const 

Accepting cheap copying, it should definitely look like this:

  CutMap minCutMap() const

It also applies to Circulation::barrierMap(), NetworkSimplex::flowMap(), NetworkSimplex::potentialMap() and other functions, like topologicalSort().

And what about similar functions having non-void return value? E.g. connectedComponents(), stronglyConnectedComponents() etc. Using cheap copy they would clearly be expected to return a map by value, however it would make complicated to retrieve the number of components.

So I think the interface of these tools (which have already been fixed) are contradictory to the entirely different approach of the cheap copy concept, let alone my previous problems.

What do you think about it? Am I right?

comment:15 follow-up: Changed 8 years ago by kpeter

Another reason: if cheap copy was accpeted, then it should also be applied to paths, I think. However paths are assigned by actual copying in the releases.

comment:16 in reply to: ↑ 15 ; follow-up: Changed 8 years ago by alpar

Replying to kpeter:

Another reason: if cheap copy was accpeted, then it should also be applied to paths, I think.

Obviously not. I can't imagine how did you come to this conclusion, beyond the fact that instead of understanding and analyzing the pros and cons of this concept, you simply want to collect as much points against it as you can, no matter if they are right or not. This leads nowhere.

I can just reiterate myself, the only real issue in this thread is from deba:

I think, it is not so easy. It will not have so nice interface or it will not be so efficient. If you think, that the map will be constructed, and after that you would like to assign the returned map, then the first construction and memory allocation are unnecessary. Other way, if you would like to call constructor to assign the returned map, then the graph object should be also specified, because not all the maps contain the graph object.

All the other comments are either invalid or self-answering. Can we do anything with this?

comment:17 in reply to: ↑ 16 Changed 8 years ago by kpeter

Replying to alpar:

Replying to kpeter:

Another reason: if cheap copy was accpeted, then it should also be applied to paths, I think.

Obviously not. I can't imagine how did you come to this conclusion,

I think it is a logical idea/consequence, though it is obviously not as important as for maps.

beyond the fact that instead of understanding and analyzing the pros and cons of this concept, you simply want to collect as much points against it as you can, no matter if they are right or not.

Definitely not!

I can just reiterate myself, the only real issue in this thread is from deba:
All the other comments are either invalid or self-answering. Can we do anything with this?

Why? You just stated it, but never have you justified it.

Maybe my last comment is a questionable or bad idea (though it semms to be logical), but consider the previous one. Using cheap copy, Preflow::minCutMap() (and all similar methods) should obviously return the map by value. That is why this concept is imagined! Otherwise it is unnecessary/useless. Could you answer this question? (It is neither invalid nor self-answering for me.)

comment:18 Changed 8 years ago by kpeter

The only thing we can conclude here without doubt is that this question is by far the most controversial and doubtful in the whole development so far. There are many detached and emotional reasons on both sides.

To tell the truth, I find both proposed concepts somewhat strange, especially because of their different meaning for maps of the same type and maps of different types. That's why I suggest another solution without supporting any kind of implicit copy. Let's introduce a mapCopy(from, to) function with actual copying and no assignment operators and copy constructors.

This solution can also be viewed as a compromise. Note that it allows to introduce any kind of copy concept later. Just like we have copyPath() although paths are copied implicitly on assignment and construction.

What do you think?

comment:19 Changed 8 years ago by kpeter

The more I'm thinking about this question, the more I find the above solution the best one. In fact, I prefer to extend it with the "standard" copy mechanism for maps of the same type (maybe in the future). Moreover, cheap copy could also be supported, but for separate map types, not for the standard ones, I think.

Note that we are not in a position any more to be able to exploit the advantages of cheap copy in the main tools of the library, since the tools that will be introduced later must have similar interfaces to the ones that have already been released. (As I stated this problem as a major issue above.)

comment:20 Changed 8 years ago by alpar

  • Milestone changed from LEMON 1.2 release to LEMON 1.3 release

comment:21 follow-up: Changed 4 years ago by deba

I would close this ticket as "Won't fix". There was no real agreement, that we should make this change in LEMON.

comment:22 in reply to: ↑ 21 Changed 4 years ago by alpar

  • Milestone changed from LEMON 1.3 release to LEMON 1.4 release

Replying to deba:

I would close this ticket as "Won't fix". There was no real agreement, that we should make this change in LEMON.

I basically agree. However we may want to deal with the ever-arising problem of map-copying. Thus I keep it open as a reference and postpone the release 1.4 instead.

comment:23 Changed 4 years ago by Aszarsha

I seems to me that what you really need here (interface and performance wise) is C++11 move-semantic.

comment:24 Changed 17 months ago by alpar

  • Milestone changed from LEMON 1.4 release to LEMON 1.5 release
Note: See TracTickets for help on using tickets.