[Lemon-user] Infinite loop in LEMON Cost Scaling
Lobron, David
dlobron at akamai.com
Tue Oct 3 15:55:44 CEST 2017
Hi Péter,
Thank you very much for this reply! It absolutely makes sense. If you don't mind, I had two follow-up questions.
a) We use the mincost calculation in order to compute the least-expensive, minimum capacity for a series of Internet locations (data centers). Most of the time, our users expect traffic to be reasonably balanced across those datacenters. When I observe the CostScaling and NetworkSimplex results, it seems like CostScaling is a bit better at "spreading" the flow across many locations, whereas NetworkSimplex seems more likely send all of the flow to just a few locations (or even all of it to one location). Is there something about CostScaling that leads it to spread load more evenly? Is CostScaling the best algorithm to use when even spreading is desired?
b) I can certainly update my digraph object to use int64_t, as you suggest. I'm currently using int64_t as the type for both capacity and cost. My current object declarations look like this:
typedef ListDigraph LemonGraph;
typedef int64_t LemonCapacityValueType;
typedef int64_t LemonCostValueType;
typedef LemonGraph::ArcMap<LemonCapacityValueType> LemonCapacityMap;
typedef LemonGraph::NodeMap<LemonCapacityValueType> LemonSupplyMap;
typedef LemonGraph::ArcMap<LemonCostValueType> LemonCostMap;
typedef unordered_map<int, LemonGraph::Node> LemonNodeIdToNode;
typedef unordered_map<int, LemonGraph::Arc> LemonArcIdToArc;
//typedef CostScaling<LemonGraph> LemonScalingType;
typedef NetworkSimplex<LemonGraph> LemonScalingType;
(I've commented out CostScaling here, because I'm currently using NetworkSimplex, but of course that can be reversed).
My question is: is int64_t acceptable for capacity and cost? Also, would it be OK to use floating-point values for any of these? My reading of the docs implies that all these values must be integer, but I just wanted to make sure.
Thank you again for your help here- I really appreciate it!
Sincerely,
David
> On Oct 3, 2017, at 2:38 AM, Péter Kovács <kpeter at inf.elte.hu> wrote:
>
> Hi David,
>
> I checked the graph you sent me in private mail. I could reproduce this issue, but it turned out to be a numerical problem caused by the large capacity values in your input. (I guess that CS2 fails because of the same reason.)
>
> Please note that the min-cost flow algorithms in LEMON use int type for capacities and costs internally, even if your node and arc maps contain values of another type (int64_t in your case). The key point here is the line where you declare the algorithm object (which you haven't sent me).
>
> If you declare the CostScaling object like this:
> CostScaling<ListDigraph> cs;
> then it uses int type and the infinite loop issue occurs, but if you declare it this way:
> CostScaling<ListDigraph, int64_t> cs;
> then it will work as expected.
>
> It should yield an equivalent solution to NetworkSimplex's in terms of the total flow cost, but (as you noticed) the flows themselves might be different (even significantly), because these algorithms use very different approaches.
>
> I hope this helps and makes everything clear.
>
> Péter
>
>
>
>> Hi Péter,
>> Thank you very much for this fast reply.
>> I am calling CostScaling from inside a fairly complicated load-balancing program, but I will try to write a simple reproducer for this bug and send it to you privately. Meanwhile, I will try NetworkSimplex.
>> FWIW, the "cs2" program from IG Systems also goes into an infinite loop when presented with this input. We are running CostScaling with a number of different inputs, and only this one caused a problem.
>> --David
>>> On Sep 28, 2017, at 12:43 PM, Péter Kovács <kpeter at inf.elte.hu> wrote:
>>>
>>> Hi David,
>>>
>>> I didn't experience such bug in CostScaling before.
>>>
>>> Could you send me the whole input network and a simple code so that I could reproduce the issue (e.g. in a private email)?
>>>
>>> If the source of the network is non-tirivial, then you can use the DigraphWriter tool to export your graph and the related maps at the point of the code where you use CostScaling:
>>> http://lemon.cs.elte.hu/pub/doc/1.3.1/a00132.html
>>>
>>> Anyway, you can try another min-cost flow algorithm of LEMON. There are four such classes with the same API, so you can easily switch between them:
>>> http://lemon.cs.elte.hu/pub/doc/1.3.1/a00612.html
>>>
>>> I suggest you to check NetworkSimplex, because it is the fastest and most robust algorithm in most cases.
>>>
>>> Best regards,
>>> Péter
>>>
>>>
>>>
>>> 2017.09.28. 17:53 keltezéssel, Lobron, David írta:
>>>> A bit more info here: I've been stepping with a debugger in the "while true" loop, and I noticed that the last few entries in the _next_out array are changing. For example:
>>>> Breaking at line 1405:
>>>> (gdb) p _next_out
>>>> $39 = std::vector of length 103, capacity 103 = {0, 15, 30, 45, 60, 88, 90, 105, 120, 135, 163, 165, 193, 208, 210, 225, 244, 255, 270, 285, 300, 315, 330, 345, 360,
>>>> 375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855,
>>>> 870, 885, 900, 915, 930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080, 1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245, 1260, 1275, 1290,
>>>> 1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410, 1425, 1440, 1455, 1470, 1485, 1733, 2801, 2902}
>>>> A few steps later, at line 1405:
>>>> (gdb) p _next_out
>>>> $41 = std::vector of length 103, capacity 103 = {0, 15, 30, 45, 60, 88, 90, 105, 120, 135, 163, 165, 193, 208, 210, 225, 244, 255, 270, 285, 300, 315, 330, 345, 360,
>>>> 375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855,
>>>> 870, 885, 900, 915, 930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080, 1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245, 1260, 1275, 1290,
>>>> 1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410, 1425, 1440, 1455, 1470, 1485, 1759, 2801, 2902}
>>>> The third-to-last element changes from 1733 to 1759. I skipped the breakpoint 100,000 times, and observed again that only the third-to-last item had changed:
>>>> $45 = std::vector of length 103, capacity 103 = {0, 15, 30, 45, 60, 88, 90, 105, 120, 135, 163, 165, 193, 208, 210, 225, 244, 255, 270, 285, 300, 315, 330, 345, 360,
>>>> 375, 390, 405, 420, 435, 450, 465, 480, 495, 510, 525, 540, 555, 570, 585, 600, 615, 630, 645, 660, 675, 690, 705, 720, 735, 750, 765, 780, 795, 810, 825, 840, 855,
>>>> 870, 885, 900, 915, 930, 945, 960, 975, 990, 1005, 1020, 1035, 1050, 1065, 1080, 1095, 1110, 1125, 1140, 1155, 1170, 1185, 1200, 1215, 1230, 1245, 1260, 1275, 1290,
>>>> 1305, 1320, 1335, 1350, 1365, 1380, 1395, 1410, 1425, 1440, 1455, 1470, 1485, 1863, 2801, 2902}
>>>> The last_out variable, which needs to match _next_out[tip] for us to exit the loop, is equal to the second-to-last item in the array:
>>>> (gdb) p last_out
>>>> $46 = 2801
>>>> It seems like there might be an off-by-one error here, but I can't quite tell. It definitely appears that we're not augmenting _tip properly. The _target array looks like this:
>>>> (gdb) p _target
>>>> $48 = std::vector of length 3004, capacity 3004 = {100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100,
>>>> 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
>>>> 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101,
>>>> 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100,
>>>> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100,
>>>> 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100, 100, 100, 100,
>>>> 100, 100, 100, 100, 100, 101, 102, 100, 100, 100, 100, 100...}
>>>> I'll continue to debug, but any help you can give would be appreciated, as I'm still not entirely sure what the expected behavior is that should guarantee that we get to the augment step.
>>>> Thank you,
>>>> David
>>>>> On Sep 28, 2017, at 10:50 AM, Lobron, David <dlobron at akamai.com> wrote:
>>>>>
>>>>> Hello LEMON developers,
>>>>>
>>>>> I recently encountered an infinite-loop behavior in lemon's Cost Scaling algorithm, in the loop starting on line 1378 of cost_scaling.h (the partial augment and relabel operations). To exit the loop, the _active_nodes deque has to be empty. I am finding that the number of elements in the array never changes- it always has 102 elements. Here's a snapshot of the array in gdb:
>>>>>
>>>>> (gdb) p _active_nodes
>>>>> $3 = std::deque with 102 elements = {16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
>>>>> 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88,
>>>>> 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 0, 1, 2, 3, 4, 6, 7, 8, 9, 5, 11, 10, 14, 12, 15, 13}
>>>>>
>>>>> I set breakpoints on the lines where _active_nodes is changed (1382 and 1459), but neither one was triggered.
>>>>>
>>>>> I printed out all of the arcs in my input graph that have nonzero flow. Here, the first number is the internal arc number, and the rest of the numbers respectively are the source, destination, l bound cap, u bound cap, arc flow, and arc cost:
>>>>>
>>>>> 13: src=0, dest=2, l=0, u=75915662, flow=5148395, cost=0.000000
>>>>> 42: src=5, dest=1, l=0, u=5939549, flow=5939549, cost=0.000000
>>>>> 79: src=7, dest=1, l=0, u=520947, flow=17127440, cost=90.000000
>>>>> 91: src=8, dest=1, l=0, u=143010, flow=5148395, cost=70.000000
>>>>> 195: src=0, dest=15, l=0, u=75915662, flow=5939549, cost=0.000000
>>>>> 377: src=0, dest=28, l=0, u=75915662, flow=1365077, cost=0.000000
>>>>> 639: src=47, dest=1, l=0, u=488270, flow=7171507, cost=90.000000
>>>>> 643: src=0, dest=47, l=0, u=75915662, flow=7171507, cost=0.000000
>>>>> 709: src=52, dest=1, l=0, u=943, flow=33956, cost=90.000000
>>>>> 1003: src=73, dest=1, l=0, u=934392, flow=33638135, cost=90.000000
>>>>> 1007: src=0, dest=73, l=0, u=75915662, flow=33638135, cost=0.000000
>>>>> 1031: src=75, dest=1, l=0, u=37918, flow=1365077, cost=90.000000
>>>>> 1129: src=82, dest=1, l=0, u=152544, flow=5491603, cost=90.000000
>>>>> 1133: src=0, dest=82, l=0, u=75915662, flow=5491603, cost=0.000000
>>>>> 1203: src=0, dest=87, l=0, u=75915662, flow=17127440, cost=0.000000
>>>>> 1273: src=0, dest=92, l=0, u=75915662, flow=33956, cost=0.000000
>>>>>
>>>>> I was curious as to whether you have seen this behavior before. Do this this is a bug in LEMON, or is my input graph invalid? I'd very much like to understand this. I've seen a similar behavior from the cs2 program from IG Systems, so it appears that either my input is invalid in both cases, or there's a shared bug.
>>>>>
>>>>> Thank you in advance for any help you can give.
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>
More information about the Lemon-user
mailing list