[Lemon-user] bug in ListDigraph::erase(Node &) ?

Wed Mar 30 07:55:15 CEST 2011

Hi Alpar,

On Wed, Mar 30, 2011 at 07:09:56AM +0200, Alp�r J�ttner wrote:
> > 	I think there is a bug in list_graph.h's erase(Node) function, which
> > in my version is this:
> 
> Thank you for the observation. I believe it is not a bug (see below),

You're absolutely right. I did find that my code was attempting to delete
the same node from a graph twice. (And it does look like the only way an
entry in the node list could get -2 is if it is erased once.)

> but the rational behind is certainly not described anywhere.
> 
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >     void erase(const Node& node) {
> >       int n = node.id;
> >       
> >       if(nodes[n].next != -1) {
> >         nodes[nodes[n].next].prev = nodes[n].prev;
> >       }
> >        
> >       if(nodes[n].prev != -1) {
> >         nodes[nodes[n].prev].next = nodes[n].next;
> >       } else {
> >         first_node = nodes[n].next;
> >       }
> >        
> >       nodes[n].next = first_free_node;
> >       first_free_node = n;
> >       nodes[n].prev = -2;
> >     
> >     }
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > 
> > 	Suppose nodes[n].prev is -2. Then the check against 1 will succeed,
> > and nodes[-2].next will be accessed, which will _usually_ not lead to a
> > crash, just silent corruption.
> 
> nodes[n].prev==-1 means n is the first node in the list. 
> nodes[n].prev==-2 means n has been deleted.
> 
> The point is that you are oly allowed to delete _existing_ items. If you
> want to delete a node with prev==-2 then your code is certainly buggy.
> We may want to try to hide the bug in your code for some time, but this
> will not solve the problem.

I agree that not wanting to support multiple calls to erase() is fine,
and that my program is at fault. Actually, I had code which was
effectively doing this:

	if(n != INVALID)
	  G.erase(n)

which "looks right" but is completely wrong. A node remains perfectly
valid after being removed from a graph.

> You may ask, why is prev set to -2 if it is never accessed by a valid
> code. The reason is to make it possible to write language bindings.
> Assume you write a Python interface to LEMON. You cannot prevent your
> user from writing a Python code that double deletes an item, but you
> still have to keep the memory in a consistent state so that the Python
> interpreter itself will not crash. Thus you have to check if an item is
> valid with this function:
> 
>     bool ListDigraph::valid(Node n) const {
>       return n.id >= 0 && n.id < static_cast<int>(nodes.size()) &&
>         nodes[n.id].prev != -2;
>     }
> 
> Nota bene, this will still _not_ safely localize double erase of a node.
> If you erase a node, then add new ones, then the id of an erased node
> may become valid again. If you erase again the old one later, it will be
> unnoticed, but you will delete a different node that you intended.

You're right. One correct (though impractical) workaround is to iterate
over all the graph's nodes, and erase the one desired, if found.
Another workaround would be for list_graph to maintain a set (e.g.
hash) of nodes/arcs it contains, so querying the presence of these
objects becomes rapid.

Another workaround is to just "leave it up to the user", but this kind
of functionality seems quite natural to me, so I imagine a lot of
people would be effectively adding their own "does this graph contain
this object" logic to their programs. I wonder if it would make sense
to extend the Lemon graph types to include support for this kind of
query, which could either be left out, or implemented with no space
overhead (but potentially slow) or with some space overhead (but very
fast.)

> But, the graph data structure will at least remain in a consistent
> state.
> 
> > 	If the intended interpretation of "x != -1" here is "x is valid",
> 
> Not exactly. x==-1 means "x in INVALID". (where INVALID is just a node
> constant, thus n!=INVALID does not mean n is valid). Validity of a node
> is defined a bit differently (see above).

Right, I understand.

> > maybe it would make sense to change all these comparisons to "x >= 0" ?
> 
> I think it would just hide the bugs in the user code even more.

I agree, that suggestion does nothing to help. However:

> For example, when you double delete an item, Valgind (http://valgrind.org/
> , the most useful debugging tool for C/C++) will point out the array
> underindexing with the current code, but not with your modification.

Not always! In fact, most of the time Valgrind won't say anything,
because so long as array[-2] is still inside valid memory (which it
might tend to be, statistically speaking), Valgrind doesn't report
anything. (I've just observed exactly this behaviour in my code.)

> I think it would really make more sense (and more work) to check --- in
> DEBUG mode only --- the validity of the items and ASSERT if they aren't
> valid. In fact, this could be done at each and every graph operation.

I think that's a _great_ idea. I would certainly use such a feature
constantly during development. But as I mentioned above, wanting to be
able to repeatedly call G.erase(node) without corrupting memory is
really just something that becomes trivial to implement (for someone
like me, who does actually want to do that) if you already have a way
to query a graph for the presence of a given node/arc. Of course people
can go and implement this themselves, but I think it might be nice if
LEMON offered different options (e.g. a 'null' checker which always
returns UNKNOWN, a fast checker which has some memory cost, and a slow
checker which does not.) Then the DEBUG mode you suggest could be
implemented using one of these.

	Thanks,

	Duraid

> Regards,
> Alpar
>