Context Navigation

← Previous Ticket
Next Ticket →

#3 assigned enhancement

ListGraph should store/update the number of edges and nodes

Reported by:	Alpar Juttner	Owned by:	Peter Kovacs
Priority:	major	Milestone:	LEMON 1.4 release
Component:	core	Version:	hg main
Keywords:	ListGraph	Cc:
Revision id:

Description (last modified by Peter Kovacs)

Counting the number of edges and nodes is a frequent graph operation. Now, it takes linear time to get these values. Instead we might store these values and update them at every edge/node addition and deletion.

Of course this change would slow down these operations a bit.

This ticket is a copy of report #8 of the old bug tracking system. It refers to svn trunk -r2460.

Attachments (2)

smart_itemcount_a7e8ad460d66.patch (794 bytes) - added by Peter Kovacs 16 years ago.
3fe3c838a89e.patch (3.2 KB) - added by Peter Kovacs 16 years ago.

Download all attachments as: .zip

Change History (26)

comment:1 Changed 17 years ago by Peter Kovacs

Description:	modified (diff)

comment:2 Changed 17 years ago by Alpar Juttner

Milestone:	→ Post 1.0

comment:3 in reply to: description ; follow-up: 4 Changed 16 years ago by Peter Kovacs

Owner:	changed from Alpar Juttner to Peter Kovacs
Revision id:	→ 3fb8ed1322de
Status:	new → assigned

Replying to alpar:

Counting the number of edges and nodes is a frequent graph operation. Now, it takes linear time to get these values. Instead we might store these values and update them at every edge/node addition and deletion.

It is still relevant in [3fb8ed1322de]. Moreover SmartGraph also needs such improvement, since it doesn't have the needed tags and functions, although I think it would be easy to implement.

comment:4 in reply to: 3 ; follow-up: 5 Changed 16 years ago by Alpar Juttner

Replying to kpeter:

It is still relevant in [3fb8ed1322de]. Moreover SmartGraph also needs such improvement, since it doesn't have the needed tags and functions, although I think it would be easy to implement.

For SmartGraph we definitely want O(1) node/edge counting.

For ListGraph it is a question. It increases a bit the memory footprint (well, the increment is negligible), and makes the edge addition/deletion a bit slower. I cannot tell how big this effect would be. I think we can wait with this until we have a reliable benchmarking system.

comment:5 in reply to: 4 ; follow-ups: 6 10 Changed 16 years ago by Peter Kovacs

Replying to alpar:

For SmartGraph we definitely want O(1) node/edge counting.

I know, and I thought that it is true for both SmartGraph and SmartDigraph, but for the former one it isn't implemented yet.

For ListGraph it is a question.

I think it would be much better. Item counting operations are used really frequently (even by algorithms). And since alterations are (relatively) slow operations (modify the containers, indices, notifying all observers etc.) compared to a single assignment of type int, so I think this change would worth to make.

For example, when I implemented algorithms I queried the number of nodes and edges/arcs only once and stored it just because I would like to be efficient even for List(Di)Graph. It is not convenient.

comment:6 in reply to: 5 ; follow-up: 7 Changed 16 years ago by Alpar Juttner

Replying to kpeter:

For example, when I implemented algorithms I queried the number of nodes and edges/arcs only once and stored it just because I would like to be efficient even for List(Di)Graph. It is not convenient.

Yes, I know this. The question is if we want to require that these queries are O(1) for _all_ graphs?

If yes, then these query functions could/should be the members of the graph classes.
If not, then a generic algorithm cannot depend on this property.

comment:7 in reply to: 6 ; follow-up: 8 Changed 16 years ago by Peter Kovacs

Replying to alpar:

Yes, I know this. The question is if we want to require that these queries are O(1) for _all_ graphs?

Definitely not. Think of the various adaptors implemented in the 0.x series (and we want to have them in the 1.x series as well). E.g. in the case of sub graph adaptors and ResGraphAdaptor I think it could not be implemented (or would be very difficult and/or inefficient). Although these adaptors are very important.

With this observation I aswered my own question: we should not call count functions many times in a generic algorithm, but not because of List(Di)Graph, even bacause of adaptors. :)

However I think it would be important to have O(1) count functions in all graph implementations that are not adaptors. For example, in STL all containers are expected to have an O(1) size() function, even lists, sets etc., since it is an important and frequent operation.

If yes, then these query functions could/should be the members of the graph classes.

Think of adaptors again.

If not, then a generic algorithm cannot depend on this property.

Yes, it is true.

comment:8 in reply to: 7 ; follow-up: 9 Changed 16 years ago by Alpar Juttner

Replying to kpeter:

For example, in STL all containers are expected to have an O(1) size() function, even lists, sets etc., since it is an important and frequent operation.

I don't know what do you mean by "expected", but std::list::size() are recommended but not required to be O(1) by the standard.

comment:9 in reply to: 8 Changed 16 years ago by Peter Kovacs

Replying to alpar:

I don't know what do you mean by "expected", but std::list::size() are recommended but not required to be O(1) by the standard.

I didn't know that, I just expected what is recommended.

However it is exactly the same concept that I would like to follow in LEMON: O(1) item counting is not required for all graph types (e.g. adaptors won't support it), but it is strongly recommended for all basic graph implementations. They are much more heavy-weight structures than e.g. a single std::list, thus the relative overhead would be much smaller.

Changed 16 years ago by Peter Kovacs

Attachment:	smart_itemcount_a7e8ad460d66.patch added

comment:10 in reply to: 5 Changed 16 years ago by Peter Kovacs

Replying to kpeter:

I know, and I thought that it is true for both SmartGraph and SmartDigraph, but for the former one it isn't implemented yet.

[a7e8ad460d66] adds missing tags and functions for item counting in SmartGraph.

comment:11 follow-up: 13 Changed 16 years ago by Peter Kovacs

[3fe3c838a89e] adds support for constant time item counting in ListDigraph and ListGraph.

Changed 16 years ago by Peter Kovacs

Attachment:	3fe3c838a89e.patch added

comment:12 Changed 16 years ago by Peter Kovacs

Version:	svn trunk → hg main

comment:13 in reply to: 11 ; follow-up: 15 Changed 16 years ago by Alpar Juttner

Milestone:	LEMON 1.1 release

Replying to kpeter:

[3fe3c838a89e] adds support for constant time item counting in ListDigraph and ListGraph.

I would like to see a detailed running time evaluation, before putting this patch into the main branch.

The question is important, but in my opinion, it is not of the highest priority. Therefore I remove it from the 1.1 milestone. (Which does not mean that this change cannot get into the 1.1 release. If a satisfactory running time analyses will have been done before the release, we will include this path into the release.)

comment:14 Changed 15 years ago by Alpar Juttner

Additional benchmarking results can be found at #309.

comment:15 in reply to: 13 ; follow-up: 18 Changed 15 years ago by Peter Kovacs

Replying to alpar:

I would like to see a detailed running time evaluation, before putting this patch into the main branch.

What kind of "detailed running time evaluation" would you like to see?

Isn't it clear enough (with virtually no tests) that a simple ++ or -- operation for an int variable cannot make a noticeable difference in the running time of such functions that contain 8-10 assignments and 2-3 ifs, each of which performs at least one operator[] for a vector? And keep in mind, that if there were node/arc maps (that's the typical use case), then the relative difference would be even smaller! I think it is clear. However I made some tests that validated this assumption. Of course, many other tests could be performed, but that could be an endless story. Why don't we make tests in which countNodes() and/or countArcs() is also used?

If constant time item counting is expected for simple data structures like std::list, then it seems to be more than natural to expect this for List(Di)graph as well, since the relative overhead must be much smaller.

I suggest to focus on the huge advantage of this modification rather than this captious discussion about its negligible overhead.

comment:16 follow-up: 17 Changed 15 years ago by Peter Kovacs

Milestone:	→ LEMON 1.2 release
Revision id:	3fb8ed1322de

comment:17 in reply to: 16 Changed 15 years ago by Alpar Juttner

Milestone:	LEMON 1.2 release → LEMON 1.3 release

Replying to kpeter:

comment:18 in reply to: 15 ; follow-up: 19 Changed 14 years ago by Peter Kovacs

I made some tests again and I cannot detect measureable overhead of this proposal (which is quite obvious indeed). Therefore, I see no reason to refuse or postpone it again. I can only repeat my previous reasoning.

comment:19 in reply to: 18 ; follow-up: 20 Changed 14 years ago by Alpar Juttner

Replying to kpeter:

I made some tests again and I cannot detect measureable overhead of this proposal (which is quite obvious indeed). Therefore, I see no reason to refuse or postpone it again. I can only repeat my previous reasoning.

This, and all other tickets possibly affecting the performance of the core tools of LEMON are awaiting to a satisfactory solution to #33.

comment:20 in reply to: 19 ; follow-up: 21 Changed 14 years ago by Peter Kovacs

Replying to alpar:

This, and all other tickets possibly affecting the performance of the core tools of LEMON are awaiting to a satisfactory solution to #33.

That's what I would like to confirm: this modification does not cause a measurable overhead. It is so obvious, but I validated it in various tests. So it is not performance issue! It is much more about natural requirements/expectations.

I see your reasons about pushing #33, but to tell the truth, I don't think that a comprehensive and satisfactory benchmark system will be realized in a reasonable time (e.g. a few years). I do find it a cumbersome policy to strictly bound this simple but important decision to such a vague task.

comment:21 in reply to: 20 ; follow-up: 22 Changed 14 years ago by Alpar Juttner

Replying to kpeter:

I see your reasons about pushing #33, but to tell the truth, I don't think that a comprehensive and satisfactory benchmark system will be realized in a reasonable time (e.g. a few years). I do find it a cumbersome policy to strictly bound this simple but important decision to such a vague task.

Short answer:

Then, we must wait a few years to have this change accepted.

Long answer:

Firstly, it requires no more than a week of work of a single person to develop the core of an appropriate benchmarking tool. Forecasting that it won't be done within a few years is not only stupid, but also counterproductive, demoralizing and gives an overall bad reputation to the project.

Secondly, this kind of decision should not be based on personal trust, but on well documented, reproducible and reliable tests. None of the tests I can see here in the ticket are either well documented or reproducible, therefore none of them are reliable.

I admit in case of this ticket the situation might be pretty clear, therefore the question is matter of principle in some sense. BUT the question is whether or not we want opt for high standards in our developing process. If yes, it means we must insist on them in all cases and can't make random exceptions.

comment:22 in reply to: 21 Changed 14 years ago by Peter Kovacs

Replying to alpar:

Short answer:

Then, we must wait a few years to have this change accepted.

Long answer:

Firstly, it requires no more than a week of work of a single person to develop the core of an appropriate benchmarking tool. Forecasting that it won't be done within a few years is not only stupid, but also counterproductive, demoralizing and gives an overall bad reputation to the project.

I must refuse your rude words (stupid, etc.).

In the last 2-3 years, it was me who made tha majority of benchmark test around LEMON (see e.g. http://lime.cs.elte.hu/~kpeter/hg/hgwebdir.cgi/lemon-benchmark/) So I have some experiances about it. And I must say that it is not an easy and agreeable work. The only sure thing is that however hard you try to make correct benchmarks, they can still be criticized.
Therefore, I find it extremely difficult to build a benchmark system you are talking about in #33, which could be accepted by the LEMON developers as comprehensive and correct in _all_ aspect to rely on about design decisions. Not to mention the actual testing on various computers using different compilers. It is much more than a week.
We have plenty of tasks that are much more important (e.g. porting, multi-thread support, etc.).
LEMON currently has only a few stable developers. That is why all releases were delayed. Unfortunatelly, I'm sure that I won't be able to realize #33, and I will have much less time for core development in the future. Unless you find a student directly for this task, I'm uncertain about who will do this.

Secondly, this kind of decision should not be based on personal trust, but on well documented, reproducible and reliable tests. None of the tests I can see here in the ticket are either well documented or reproducible, therefore none of them are reliable.

The tests are reproductible, they can be found in this repository:
http://lime.cs.elte.hu/~kpeter/hg/hgwebdir.cgi/lemon-benchmark/

However, I regard this ticket as a proposal of fixing a bad design decision. If it is widely expected for a simple linked list structure that it knows its length, then it is more than a natural requirement for Lis(Di)graph. As I mentioned several times, it is NOT a perforamnce issue!

I admit in case of this ticket the situation might be pretty clear, therefore the question is matter of principle in some sense.

That's what I say...

BUT the question is whether or not we want opt for high standards in our developing process. If yes, it means we must insist on them in all cases and can't make random exceptions.

BUT we mustn't make such an inexplicably huge argument about the imaginary performance overhead of a single increment or decrement opertaion vs several vector access operations and a few if statements (not to mention the typical case of also maintaining maps) and totally forgeting the conceptual reasons and the dominant advantage and of this proposal. It's nonsense. I give it up.

comment:23 Changed 14 years ago by Peter Kovacs

I'm sorry for my vehement comments. I hope I was wrong, and a "satisfactory solution to #33" will be realized and the necessary experiments will be conducted in the near future. Especially, if they are not required to be comprehensive. I just thought that they are.

comment:24 Changed 12 years ago by Alpar Juttner

Milestone:	LEMON 1.3 release → LEMON 1.4 release

Note: See TracTickets for help on using tickets.

Download in other formats: