[Lemon-user] Infinite loop in LEMON Cost Scaling

Wed Oct 4 05:42:21 CEST 2017

Hi David,

If you have the typedefs LemonCapacityValueType and LemonCostValueType, 
then you should declare the algorithm types this way:

//typedef CostScaling<LemonGraph, LemonCapacityValueType, 
LemonCostValueType> LemonScalingType;
typedef NetworkSimplex<LemonGraph, LemonCapacityValueType, 
LemonCostValueType> LemonScalingType;

If you do not specify the 2nd and 3rd template parameters of the 
classes, they default to int, which can cause numerical issues (wrong 
results, infinite loop etc.), even in NetworkSimplex.

Answering to your question: yes, int64_t is acceptable for capacity and 
cost, but floating-point values are currently not.

Regarding spreading: basically none of the min-cost flow algorithms 
strive for spreading the flow accross different paths. One algorithm 
might be better in this aspect than the others, but it is accidental. 
(It depends on the order of the nodes and arcs.)

I checked your input, but it seems to be strange for me. I do not 
understand how and why you assign the nodes and arcs to these data 
centers. However, if I guess your goal right, then I would suggest to 
run multiple min-cost flow calculations using gradually decreased 
capacity values on the incoming arcs of the destination node (dst), 
which will increase spreading. You can select the setup where an optimal 
flow is still found with exactly the same total cost, but the most arcs 
are used (i.e. where flow value > 0).

The attached file contains results for such an attempt I made with your 
input graph. It shows that when I changed the capacity of each incoming 
arc of dst to 0.04 times its original capacity, then an optimal flow can 
still be found with 0 total cost, but 152 arcs were used instead of only 
8. But for factor 0.03, the total cost is larger.

I hope this helps. If you still have questions, then let's continue in 
private email.

Regards,
Péter

> Hi Péter,
> 
> Thank you very much for this reply!  It absolutely makes sense.  If you don't mind, I had two follow-up questions.
> 
> a) We use the mincost calculation in order to compute the least-expensive, minimum capacity for a series of Internet locations (data centers).  Most of the time, our users expect traffic to be reasonably balanced across those datacenters.  When I observe the CostScaling and NetworkSimplex results, it seems like CostScaling is a bit better at "spreading" the flow across many locations, whereas NetworkSimplex seems more likely send all of the flow to just a few locations (or even all of it to one location).  Is there something about CostScaling that leads it to spread load more evenly?  Is CostScaling the best algorithm to use when even spreading is desired?
> 
> b) I can certainly update my digraph object to use int64_t, as you suggest.  I'm currently using int64_t as the type for both capacity and cost.  My current object declarations look like this:
> 
> typedef ListDigraph LemonGraph;
> typedef int64_t LemonCapacityValueType;
> typedef int64_t LemonCostValueType;
> typedef LemonGraph::ArcMap<LemonCapacityValueType> LemonCapacityMap;
> typedef LemonGraph::NodeMap<LemonCapacityValueType> LemonSupplyMap;
> typedef LemonGraph::ArcMap<LemonCostValueType> LemonCostMap;
> typedef unordered_map<int, LemonGraph::Node> LemonNodeIdToNode;
> typedef unordered_map<int, LemonGraph::Arc> LemonArcIdToArc;
> //typedef CostScaling<LemonGraph> LemonScalingType;
> typedef NetworkSimplex<LemonGraph> LemonScalingType;
> 
> (I've commented out CostScaling here, because I'm currently using NetworkSimplex, but of course that can be reversed).
> 
> My question is: is int64_t acceptable for capacity and cost?  Also, would it be OK to use floating-point values for any of these?  My reading of the docs implies that all these values must be integer, but I just wanted to make sure.
> 
> Thank you again for your help here- I really appreciate it!
> 
> Sincerely,
> 
> David
> 
>> On Oct 3, 2017, at 2:38 AM, Péter Kovács <kpeter at inf.elte.hu> wrote:
>>
>> Hi David,
>>
>> I checked the graph you sent me in private mail. I could reproduce this issue, but it turned out to be a numerical problem caused by the large capacity values in your input. (I guess that CS2 fails because of the same reason.)
>>
>> Please note that the min-cost flow algorithms in LEMON use int type for capacities and costs internally, even if your node and arc maps contain values of another type (int64_t in your case). The key point here is the line where you declare the algorithm object (which you haven't sent me).
>>
>> If you declare the CostScaling object like this:
>>   CostScaling<ListDigraph> cs;
>> then it uses int type and the infinite loop issue occurs, but if you declare it this way:
>>   CostScaling<ListDigraph, int64_t> cs;
>> then it will work as expected.
>>
>> It should yield an equivalent solution to NetworkSimplex's in terms of the total flow cost, but (as you noticed) the flows themselves might be different (even significantly), because these algorithms use very different approaches.
>>
>> I hope this helps and makes everything clear.
>>
>> Péter
>>
>>
>>
>>> Hi Péter,
>>> Thank you very much for this fast reply.
>>> I am calling CostScaling from inside a fairly complicated load-balancing program, but I will try to write a simple reproducer for this bug and send it to you privately.  Meanwhile, I will try NetworkSimplex.
>>> FWIW, the "cs2" program from IG Systems also goes into an infinite loop when presented with this input.  We are running CostScaling with a number of different inputs, and only this one caused a problem.
>>> --David
>>>> On Sep 28, 2017, at 12:43 PM, Péter Kovács <kpeter at inf.elte.hu> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> I didn't experience such bug in CostScaling before.
>>>>
>>>> Could you send me the whole input network and a simple code so that I could reproduce the issue (e.g. in a private email)?
>>>>
>>>> If the source of the network is non-tirivial, then you can use the DigraphWriter tool to export your graph and the related maps at the point of the code where you use CostScaling:
>>>> http://lemon.cs.elte.hu/pub/doc/1.3.1/a00132.html
>>>>
>>>> Anyway, you can try another min-cost flow algorithm of LEMON. There are four such classes with the same API, so you can easily switch between them:
>>>> http://lemon.cs.elte.hu/pub/doc/1.3.1/a00612.html
>>>>
>>>> I suggest you to check NetworkSimplex, because it is the fastest and most robust algorithm in most cases.
>>>>
>>>> Best regards,
>>>> Péter
>>>>
>>>>
>>>>
>>>> 2017.09.28. 17:53 keltezéssel, Lobron, David írta:
>>>>> A bit more info here: I've been stepping with a debugger in the "while true" loop, and I noticed that the last few entries in the _next_out array are changing.  For example:
>>>>> Breaking at line 1405:
>>>>> (gdb) p _next_out
>>>>> $39 = std::vector of length 103, capacity 103 = {0, 15, 30, 45, 60, 88, 90, 105, 120, 135, 163, 165, 193, 208, 210, 225, 244, 255, 270, 285, 300, 315, 330, 345, 360,
>>>>>    375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855,
>>>>>    870, 885, 900, 915, 930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080, 1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245, 1260, 1275, 1290,
>>>>>    1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410, 1425, 1440, 1455, 1470, 1485, 1733, 2801, 2902}
>>>>> A few steps later, at line 1405:
>>>>> (gdb) p _next_out
>>>>> $41 = std::vector of length 103, capacity 103 = {0, 15, 30, 45, 60, 88, 90, 105, 120, 135, 163, 165, 193, 208, 210, 225, 244, 255, 270, 285, 300, 315, 330, 345, 360,
>>>>>    375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855,
>>>>>    870, 885, 900, 915, 930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080, 1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245, 1260, 1275, 1290,
>>>>>    1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410, 1425, 1440, 1455, 1470, 1485, 1759, 2801, 2902}
>>>>> The third-to-last element changes from 1733 to 1759.  I skipped the breakpoint 100,000 times, and observed again that only the third-to-last item had changed:
>>>>> $45 = std::vector of length 103, capacity 103 = {0, 15, 30, 45, 60, 88, 90, 105, 120, 135, 163, 165, 193, 208, 210, 225, 244, 255, 270, 285, 300, 315, 330, 345, 360,
>>>>>    375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855,
>>>>>    870, 885, 900, 915, 930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080, 1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245, 1260, 1275, 1290,
>>>>>    1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410, 1425, 1440, 1455, 1470, 1485, 1863, 2801, 2902}
>>>>> The last_out variable, which needs to match _next_out[tip] for us to exit the loop, is equal to the second-to-last item in the array:
>>>>> (gdb) p last_out
>>>>> $46 = 2801
>>>>> It seems like there might be an off-by-one error here, but I can't quite tell.  It definitely appears that we're not augmenting _tip properly.  The _target array looks like this:
>>>>> (gdb) p _target
>>>>> $48 = std::vector of length 3004, capacity 3004 = {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100,
>>>>>    100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
>>>>>    100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101,
>>>>>    102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100,
>>>>>    100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100,
>>>>>    100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100,
>>>>>    100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100...}
>>>>> I'll continue to debug, but any help you can give would be appreciated, as I'm still not entirely sure what the expected behavior is that should guarantee that we get to the augment step.
>>>>> Thank you,
>>>>> David
>>>>>> On Sep 28, 2017, at 10:50 AM, Lobron, David <dlobron at akamai.com> wrote:
>>>>>>
>>>>>> Hello LEMON developers,
>>>>>>
>>>>>> I recently encountered an infinite-loop behavior in lemon's Cost Scaling algorithm, in the loop starting on line 1378 of cost_scaling.h (the partial augment and relabel operations).  To exit the loop, the _active_nodes deque has to be empty.  I am finding that the number of elements in the array never changes- it always has 102 elements.  Here's a snapshot of the array in gdb:
>>>>>>
>>>>>> (gdb) p _active_nodes
>>>>>> $3 = std::deque with 102 elements = {16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
>>>>>>   48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
>>>>>>   89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 0, 1, 2, 3, 4, 6, 7, 8, 9, 5, 11, 10, 14, 12, 15, 13}
>>>>>>
>>>>>> I set breakpoints on the lines where _active_nodes is changed (1382 and 1459), but neither one was triggered.
>>>>>>
>>>>>> I printed out all of the arcs in my input graph that have nonzero flow.  Here, the first number is the internal arc number, and the rest of the numbers respectively are the source, destination, l bound cap, u bound cap, arc flow, and arc cost:
>>>>>>
>>>>>> 13: src=0, dest=2, l=0, u=75915662, flow=5148395, cost=0.000000
>>>>>> 42: src=5, dest=1, l=0, u=5939549, flow=5939549, cost=0.000000
>>>>>> 79: src=7, dest=1, l=0, u=520947, flow=17127440, cost=90.000000
>>>>>> 91: src=8, dest=1, l=0, u=143010, flow=5148395, cost=70.000000
>>>>>> 195: src=0, dest=15, l=0, u=75915662, flow=5939549, cost=0.000000
>>>>>> 377: src=0, dest=28, l=0, u=75915662, flow=1365077, cost=0.000000
>>>>>> 639: src=47, dest=1, l=0, u=488270, flow=7171507, cost=90.000000
>>>>>> 643: src=0, dest=47, l=0, u=75915662, flow=7171507, cost=0.000000
>>>>>> 709: src=52, dest=1, l=0, u=943, flow=33956, cost=90.000000
>>>>>> 1003: src=73, dest=1, l=0, u=934392, flow=33638135, cost=90.000000
>>>>>> 1007: src=0, dest=73, l=0, u=75915662, flow=33638135, cost=0.000000
>>>>>> 1031: src=75, dest=1, l=0, u=37918, flow=1365077, cost=90.000000
>>>>>> 1129: src=82, dest=1, l=0, u=152544, flow=5491603, cost=90.000000
>>>>>> 1133: src=0, dest=82, l=0, u=75915662, flow=5491603, cost=0.000000
>>>>>> 1203: src=0, dest=87, l=0, u=75915662, flow=17127440, cost=0.000000
>>>>>> 1273: src=0, dest=92, l=0, u=75915662, flow=33956, cost=0.000000
>>>>>>
>>>>>> I was curious as to whether you have seen this behavior before.  Do this this is a bug in LEMON, or is my input graph invalid?  I'd very much like to understand this.  I've seen a similar behavior from the cs2 program from IG Systems, so it appears that either my input is invalid in both cases, or there's a shared bug.
>>>>>>
>>>>>> Thank you in advance for any help you can give.
>>>>>>
>>>>>> Sincerely,
>>>>>>
>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>
>>
> 
-------------- next part --------------
FACTOR: 1
Optimal flow found: yes
Total cost: 0
ARCS USED: 8

FACTOR: 0.5
Optimal flow found: yes
Total cost: 0
ARCS USED: 12

FACTOR: 0.4
Optimal flow found: yes
Total cost: 0
ARCS USED: 20

FACTOR: 0.3
Optimal flow found: yes
Total cost: 0
ARCS USED: 20

FACTOR: 0.2
Optimal flow found: yes
Total cost: 0
ARCS USED: 36

FACTOR: 0.1
Optimal flow found: yes
Total cost: 0
ARCS USED: 70

FACTOR: 0.09
Optimal flow found: yes
Total cost: 0
ARCS USED: 76

FACTOR: 0.08
Optimal flow found: yes
Total cost: 0
ARCS USED: 80

FACTOR: 0.07
Optimal flow found: yes
Total cost: 0
ARCS USED: 86

FACTOR: 0.06
Optimal flow found: yes
Total cost: 0
ARCS USED: 100

FACTOR: 0.05
Optimal flow found: yes
Total cost: 0
ARCS USED: 122

FACTOR: 0.04
Optimal flow found: yes
Total cost: 0
ARCS USED: 152

FACTOR: 0.03
Optimal flow found: yes
Total cost: 30578690450
ARCS USED: 904

FACTOR: 0.02
Optimal flow found: yes
Total cost: 275969494650
ARCS USED: 957

FACTOR: 0.01
Optimal flow found: yes
Total cost: 529551656750
ARCS USED: 1036