[Lemon-user] Infinite loop in LEMON Cost Scaling

Tue Oct 3 08:38:57 CEST 2017

Hi David,

I checked the graph you sent me in private mail. I could reproduce this 
issue, but it turned out to be a numerical problem caused by the large 
capacity values in your input. (I guess that CS2 fails because of the 
same reason.)

Please note that the min-cost flow algorithms in LEMON use int type for 
capacities and costs internally, even if your node and arc maps contain 
values of another type (int64_t in your case). The key point here is the 
line where you declare the algorithm object (which you haven't sent me).

If you declare the CostScaling object like this:
   CostScaling<ListDigraph> cs;
then it uses int type and the infinite loop issue occurs, but if you 
declare it this way:
   CostScaling<ListDigraph, int64_t> cs;
then it will work as expected.

It should yield an equivalent solution to NetworkSimplex's in terms of 
the total flow cost, but (as you noticed) the flows themselves might be 
different (even significantly), because these algorithms use very 
different approaches.

I hope this helps and makes everything clear.

Péter

> Hi Péter,
> 
> Thank you very much for this fast reply.
> 
> I am calling CostScaling from inside a fairly complicated load-balancing program, but I will try to write a simple reproducer for this bug and send it to you privately.  Meanwhile, I will try NetworkSimplex.
> 
> FWIW, the "cs2" program from IG Systems also goes into an infinite loop when presented with this input.  We are running CostScaling with a number of different inputs, and only this one caused a problem.
> 
> --David
> 
>> On Sep 28, 2017, at 12:43 PM, Péter Kovács <kpeter at inf.elte.hu> wrote:
>>
>> Hi David,
>>
>> I didn't experience such bug in CostScaling before.
>>
>> Could you send me the whole input network and a simple code so that I could reproduce the issue (e.g. in a private email)?
>>
>> If the source of the network is non-tirivial, then you can use the DigraphWriter tool to export your graph and the related maps at the point of the code where you use CostScaling:
>> http://lemon.cs.elte.hu/pub/doc/1.3.1/a00132.html
>>
>> Anyway, you can try another min-cost flow algorithm of LEMON. There are four such classes with the same API, so you can easily switch between them:
>> http://lemon.cs.elte.hu/pub/doc/1.3.1/a00612.html
>>
>> I suggest you to check NetworkSimplex, because it is the fastest and most robust algorithm in most cases.
>>
>> Best regards,
>> Péter
>>
>>
>>
>> 2017.09.28. 17:53 keltezéssel, Lobron, David írta:
>>> A bit more info here: I've been stepping with a debugger in the "while true" loop, and I noticed that the last few entries in the _next_out array are changing.  For example:
>>> Breaking at line 1405:
>>> (gdb) p _next_out
>>> $39 = std::vector of length 103, capacity 103 = {0, 15, 30, 45, 60, 88, 90, 105, 120, 135, 163, 165, 193, 208, 210, 225, 244, 255, 270, 285, 300, 315, 330, 345, 360,
>>>    375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855,
>>>    870, 885, 900, 915, 930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080, 1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245, 1260, 1275, 1290,
>>>    1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410, 1425, 1440, 1455, 1470, 1485, 1733, 2801, 2902}
>>> A few steps later, at line 1405:
>>> (gdb) p _next_out
>>> $41 = std::vector of length 103, capacity 103 = {0, 15, 30, 45, 60, 88, 90, 105, 120, 135, 163, 165, 193, 208, 210, 225, 244, 255, 270, 285, 300, 315, 330, 345, 360,
>>>    375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855,
>>>    870, 885, 900, 915, 930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080, 1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245, 1260, 1275, 1290,
>>>    1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410, 1425, 1440, 1455, 1470, 1485, 1759, 2801, 2902}
>>> The third-to-last element changes from 1733 to 1759.  I skipped the breakpoint 100,000 times, and observed again that only the third-to-last item had changed:
>>> $45 = std::vector of length 103, capacity 103 = {0, 15, 30, 45, 60, 88, 90, 105, 120, 135, 163, 165, 193, 208, 210, 225, 244, 255, 270, 285, 300, 315, 330, 345, 360,
>>>    375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855,
>>>    870, 885, 900, 915, 930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080, 1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245, 1260, 1275, 1290,
>>>    1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410, 1425, 1440, 1455, 1470, 1485, 1863, 2801, 2902}
>>> The last_out variable, which needs to match _next_out[tip] for us to exit the loop, is equal to the second-to-last item in the array:
>>> (gdb) p last_out
>>> $46 = 2801
>>> It seems like there might be an off-by-one error here, but I can't quite tell.  It definitely appears that we're not augmenting _tip properly.  The _target array looks like this:
>>> (gdb) p _target
>>> $48 = std::vector of length 3004, capacity 3004 = {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100,
>>>    100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
>>>    100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101,
>>>    102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100,
>>>    100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100,
>>>    100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100,
>>>    100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100...}
>>> I'll continue to debug, but any help you can give would be appreciated, as I'm still not entirely sure what the expected behavior is that should guarantee that we get to the augment step.
>>> Thank you,
>>> David
>>>> On Sep 28, 2017, at 10:50 AM, Lobron, David <dlobron at akamai.com> wrote:
>>>>
>>>> Hello LEMON developers,
>>>>
>>>> I recently encountered an infinite-loop behavior in lemon's Cost Scaling algorithm, in the loop starting on line 1378 of cost_scaling.h (the partial augment and relabel operations).  To exit the loop, the _active_nodes deque has to be empty.  I am finding that the number of elements in the array never changes- it always has 102 elements.  Here's a snapshot of the array in gdb:
>>>>
>>>> (gdb) p _active_nodes
>>>> $3 = std::deque with 102 elements = {16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
>>>>   48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
>>>>   89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 0, 1, 2, 3, 4, 6, 7, 8, 9, 5, 11, 10, 14, 12, 15, 13}
>>>>
>>>> I set breakpoints on the lines where _active_nodes is changed (1382 and 1459), but neither one was triggered.
>>>>
>>>> I printed out all of the arcs in my input graph that have nonzero flow.  Here, the first number is the internal arc number, and the rest of the numbers respectively are the source, destination, l bound cap, u bound cap, arc flow, and arc cost:
>>>>
>>>> 13: src=0, dest=2, l=0, u=75915662, flow=5148395, cost=0.000000
>>>> 42: src=5, dest=1, l=0, u=5939549, flow=5939549, cost=0.000000
>>>> 79: src=7, dest=1, l=0, u=520947, flow=17127440, cost=90.000000
>>>> 91: src=8, dest=1, l=0, u=143010, flow=5148395, cost=70.000000
>>>> 195: src=0, dest=15, l=0, u=75915662, flow=5939549, cost=0.000000
>>>> 377: src=0, dest=28, l=0, u=75915662, flow=1365077, cost=0.000000
>>>> 639: src=47, dest=1, l=0, u=488270, flow=7171507, cost=90.000000
>>>> 643: src=0, dest=47, l=0, u=75915662, flow=7171507, cost=0.000000
>>>> 709: src=52, dest=1, l=0, u=943, flow=33956, cost=90.000000
>>>> 1003: src=73, dest=1, l=0, u=934392, flow=33638135, cost=90.000000
>>>> 1007: src=0, dest=73, l=0, u=75915662, flow=33638135, cost=0.000000
>>>> 1031: src=75, dest=1, l=0, u=37918, flow=1365077, cost=90.000000
>>>> 1129: src=82, dest=1, l=0, u=152544, flow=5491603, cost=90.000000
>>>> 1133: src=0, dest=82, l=0, u=75915662, flow=5491603, cost=0.000000
>>>> 1203: src=0, dest=87, l=0, u=75915662, flow=17127440, cost=0.000000
>>>> 1273: src=0, dest=92, l=0, u=75915662, flow=33956, cost=0.000000
>>>>
>>>> I was curious as to whether you have seen this behavior before.  Do this this is a bug in LEMON, or is my input graph invalid?  I'd very much like to understand this.  I've seen a similar behavior from the cs2 program from IG Systems, so it appears that either my input is invalid in both cases, or there's a shared bug.
>>>>
>>>> Thank you in advance for any help you can give.
>>>>
>>>> Sincerely,
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>
>>
>