COIN-OR::LEMON - Graph Library

Opened 15 years ago

Last modified 5 years ago

#168 new task

Port bipartite matching algorithms

Reported by: Alpar Juttner Owned by: Alpar Juttner
Priority: critical Milestone: LEMON 1.4 release
Component: core Version: hg main
Keywords: Cc:
Revision id:

Description

This ticket is a follow-up of #48.

The following tools are affected:

  • lemon/bipartite_matching.h
  • lemon/pr_bipartite_matching.h (I really dislike this filename)
  • test/bipartite_matching_test.cc

It also depends on #69 (the port of the bipartite graphs).

Attachments (20)

bp_matching.h (34.1 KB) - added by Mohammed El-Kebir 13 years ago.
bp_matching_b08198bdd613.patch (31.6 KB) - added by Mohammed El-Kebir 13 years ago.
Maximum Weight Bipartite Matching (b08198bdd613)
bp_matching_test_751a90ada614.patch (4.0 KB) - added by Mohammed El-Kebir 13 years ago.
Test cases for Maximum Weight Bipartite Matching (751a90ada614)
h_k_1a2daf1f6bf7.patch (17.5 KB) - added by Poroszkai Daniel 13 years ago.
Hopcroft-Karp bipartite matching
h_k_45d2e43f52ea.patch (17.3 KB) - added by Poroszkai Daniel 13 years ago.
In the previous patch the algorithm was erroneous, it is corrected.
hopcroft_karp-ec5a4ceb4481.patch (19.7 KB) - added by Poroszkai Daniel 12 years ago.
Correct Hopcroft-Karp
bp_benchmark_7a790fbf8e7d.patch (19.9 KB) - added by Poroszkai Daniel 12 years ago.
Bipartite matching benchmark tools
results.txt (15.5 KB) - added by Poroszkai Daniel 12 years ago.
Benchmark results
d6df00b92912.patch (19.4 KB) - added by Poroszkai Daniel 12 years ago.
Update weighted bp matching for typesafe node sets
hopcroft_karp_55711f25c446.patch (19.8 KB) - added by Poroszkai Daniel 12 years ago.
Hopcroft-Karp
bp_benchmark_ea93dc149092.patch (20.5 KB) - added by Poroszkai Daniel 12 years ago.
Benchmark tools
df198f12f7fa.patch (9.0 KB) - added by Poroszkai Daniel 12 years ago.
Min cost flow algorithms included in the benchmark
res2.txt (17.8 KB) - added by Poroszkai Daniel 12 years ago.
Benchmark result (extended with min cost flow)
bpm_upgrade.patch (81.2 KB) - added by Poroszkai Daniel 12 years ago.
Improved Hopcroft-Karp, an two additional
bpm_upgrade.2.patch (81.2 KB) - added by Poroszkai Daniel 12 years ago.
Improved Hopcroft-Karp; two preflow based matching
res3.txt (20.6 KB) - added by Poroszkai Daniel 12 years ago.
Benchmark results
bpm_pack.patch (124.6 KB) - added by Poroszkai Daniel 12 years ago.
3180fd18651b.patch (105.7 KB) - added by Balazs Dezso 5 years ago.
results.2.txt (134.3 KB) - added by Balazs Dezso 5 years ago.
5b4317f3ce36.patch (48.0 KB) - added by Balazs Dezso 5 years ago.

Download all attachments as: .zip

Change History (56)

comment:1 Changed 15 years ago by Alpar Juttner

Summary: Port bipartite matvhing algorithmsPort bipartite matching algorithms

comment:2 Changed 15 years ago by Alpar Juttner

Milestone: LEMON 1.2 release
Priority: majorcritical
Type: defecttask

comment:3 in reply to:  description ; Changed 15 years ago by Balazs Dezso

Replying to alpar:

This ticket is a follow-up of #48.

The following tools are affected:

  • lemon/bipartite_matching.h
  • lemon/pr_bipartite_matching.h (I really dislike this filename)
  • test/bipartite_matching_test.cc

The current weighted bipartite algorithms should not be ported, because they use a very inefficient approach (successive shortest path algorithm without stopping when an unmatched node reached).

comment:4 in reply to:  3 ; Changed 15 years ago by Alpar Juttner

Replying to deba:

The current weighted bipartite algorithms should not be ported, because they use a very inefficient approach (successive shortest path algorithm without stopping when an unmatched node reached).

I don't mind. Having a slow implementation is better than having nothing. We can mention in the doc that the implementation is inefficient and may significantly change in the future.

On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.

comment:5 in reply to:  4 ; Changed 15 years ago by Peter Kovacs

Replying to alpar:

On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.

You mean "worth keeping", do you?

comment:6 in reply to:  5 ; Changed 15 years ago by Alpar Juttner

Replying to kpeter:

Replying to alpar:

On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.

You mean "worth keeping", do you?

Yes, of course.

comment:7 Changed 14 years ago by Alpar Juttner

Milestone: LEMON 1.2 releaseLEMON 1.3 release

comment:8 in reply to:  6 Changed 13 years ago by Balazs Dezso

Replying to alpar:

Replying to kpeter:

Replying to alpar:

On the other hand, the successive shortest path algorithm is the oldest and the best known approach, thus it may be worst keeping it as a reference solution, anyway.

You mean "worth keeping", do you?

Yes, of course.

No, if a SSP matching is necessary, then a new implementation should be written, because the current ones are extremely slow. I think, the current capacity scaling MCF is pretty fast, it would be necessary to add the same ideas to the matching algorithms (potential shift, stop on first found node...).

The max cardinality algorithms need really good names.

Changed 13 years ago by Mohammed El-Kebir

Attachment: bp_matching.h added

comment:9 Changed 13 years ago by Mohammed El-Kebir

I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found. The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?

comment:10 in reply to:  9 ; Changed 13 years ago by Balazs Dezso

Replying to elkebir:

I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found. The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?

I did not look at the code deeply, but could you make two changes:

  • The ticket:69 contains patch for bipartite graphs, can you port your code to use one of the graph (the class and function names can be changed, but the implementation details probably not).
  • You can also create a patch file

comment:11 in reply to:  10 Changed 13 years ago by Mohammed El-Kebir

Replying to deba:

Replying to elkebir:

I'm not sure what the proper way is to submit a patch. But I attached two implementations for maximum weight bipartite matching. MaxWeightedBipartiteMatching? is a SSP algorithm, but it works differently from the one of the 0.x branch: the invariant that is maintained states that at iteration i the current matching has maximum weight in the induced subgraph G[X_i \cup Y]. This is more efficient than doing a Dijkstra on the whole graph every augmentation phase. Furthermore the search stops as soon as a free node is found. The second implementation, MaxWeightedDenseBipartiteMatching?, is the classical Hungarian algorithm which outperforms the SSP algorithm in dense bipartite graphs. Could you please have a look at it?

I did not look at the code deeply, but could you make two changes:

  • The ticket:69 contains patch for bipartite graphs, can you port your code to use one of the graph (the class and function names can be changed, but the implementation details probably not).
  • You can also create a patch file

Thanks. The BpGraph? classes are exactly what I needed. I will port my code to use them.

Changed 13 years ago by Mohammed El-Kebir

Maximum Weight Bipartite Matching (b08198bdd613)

Changed 13 years ago by Mohammed El-Kebir

Test cases for Maximum Weight Bipartite Matching (751a90ada614)

comment:12 Changed 13 years ago by Mohammed El-Kebir

I attached a patch containing two implementations of Maximum Weight Bipartite Matching using the latest BpGraph? patch by deba. Also a patch containing test cases is attached. I made some improvements in the SSP matching algorithms, now the performance difference in dense graphs between the two methods is negligible.

Changed 13 years ago by Poroszkai Daniel

Attachment: h_k_1a2daf1f6bf7.patch added

Hopcroft-Karp bipartite matching

comment:13 Changed 13 years ago by Poroszkai Daniel

I atteched a patch implementing the Hopcroft-Karp matching algoritm. The patch uses the bipartite graph concept and the lgf_reader for bipartite graphs from ticket #69

Changed 13 years ago by Poroszkai Daniel

Attachment: h_k_45d2e43f52ea.patch added

In the previous patch the algorithm was erroneous, it is corrected.

comment:14 Changed 13 years ago by Poroszkai Daniel

This patch ( [45d2e43f52ea] ) should replace the previous, that was totally incorrect.

comment:15 Changed 12 years ago by Balazs Dezso

I'm not exactly sure, but I think the Hopcroft-Karp matching is wrong in the h_k_1a2daf1f6bf7.patch file.

The problem is that the HK algorithm has to find maximal independent shortest augmenting path set between the free red and blue nodes, but the implementation does not guarantee that it finds the shortest paths. For that, in the forward search, levels should be computed on the nodes, and the backward search should be restricted to the leveled graph.

Changed 12 years ago by Poroszkai Daniel

Correct Hopcroft-Karp

comment:16 in reply to:  15 ; Changed 12 years ago by Poroszkai Daniel

You are totally right, in the above attached patch it should be correct.

Replying to deba:

I'm not exactly sure, but I think the Hopcroft-Karp matching is wrong in the h_k_1a2daf1f6bf7.patch file.

The problem is that the HK algorithm has to find maximal independent shortest augmenting path set between the free red and blue nodes, but the implementation does not guarantee that it finds the shortest paths. For that, in the forward search, levels should be computed on the nodes, and the backward search should be restricted to the leveled graph.

comment:17 in reply to:  16 ; Changed 12 years ago by Alpar Juttner

Replying to poroszd:

You are totally right, in the above attached patch it should be correct.

  • Unfortunately, I cannot find the parent changeset of your patch in any of my repositories.
  • Have you done any running time tests on how does this version performs compared to the previous one?

comment:18 in reply to:  17 Changed 12 years ago by Poroszkai Daniel

Replying to alpar:

Replying to poroszd:

You are totally right, in the above attached patch it should be correct.

  • Unfortunately, I cannot find the parent changeset of your patch in any of my repositories.

I used the development repository, after importing the patches with the bpgraphs and the LGF reader.

  • Have you done any running time tests on how does this version performs compared to the previous one?

Well, this is roughly the version I used in the thesis (I forgot to commit here after noticing my error), so those results are valid.

comment:19 Changed 12 years ago by Balazs Dezso

Could you upload a simple benchmark for the matching algorithms?

  • Random bipartite graphs are fine with random costs
  • For unweighted case, I would like to see comparison with general matching and with the maxflow algorithm
  • For the weighted case, with the general weighted matching and the min cost flow algorithms
  • A text file, with a few test cases would be fine (including sparse and dense graphs)

comment:20 Changed 12 years ago by Peter Kovacs

Notes for benchmarking weighted matchings:

  • You can also use the NETGEN generator to create problem instances easily. It is capable of generating problem instances for the so-called assignment problem, which is indeed a weighted bipartite matching. See e.g. http://en.wikipedia.org/wiki/Assignment_problem.
  • The NETGEN generator can be obtained from here: ftp://dimacs.rutgers.edu/pub/netflow/generators/network/netgen/netgen-bcjl/ See the readme files and netgen.c for more information.
  • I agree with Balazs that comparision with min cost flow algorithms would be important. I suggest you to try CapacityScaling, CostScaling and NetworkSimplex. I guess that CapacityScaling will be the fastest for this special case, but the matching algorithms should be even more efficient. :)

comment:21 in reply to:  20 Changed 12 years ago by Alpar Juttner

Replying to kpeter:

The NETGEN generator is already integrated with benchmark tool. I wish all lemon related benchmark were done using that tool.

Changed 12 years ago by Poroszkai Daniel

Bipartite matching benchmark tools

Changed 12 years ago by Poroszkai Daniel

Attachment: results.txt added

Benchmark results

comment:22 Changed 12 years ago by Poroszkai Daniel

I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket). I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.

comment:23 Changed 12 years ago by Alpar Juttner

I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:

https://bitbucket.org/alpar/lemon-bipartite

Changed 12 years ago by Poroszkai Daniel

Attachment: d6df00b92912.patch added

Update weighted bp matching for typesafe node sets

Changed 12 years ago by Poroszkai Daniel

Hopcroft-Karp

Changed 12 years ago by Poroszkai Daniel

Benchmark tools

comment:24 in reply to:  23 ; Changed 12 years ago by Poroszkai Daniel

Replying to alpar:

I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:

https://bitbucket.org/alpar/lemon-bipartite

Sorry, I missed the changes regarding the typesafe red and blue nodes. The above patches use them.

However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.

comment:25 in reply to:  22 ; Changed 12 years ago by Peter Kovacs

Replying to poroszd:

I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket).

I checked these results. As far as I see, the bipartite matchings are usually faster than general matching and preflow. That's good. However, I suggest you to test larger graphs for which min. a few seconds is taken by the algorithms. Have you made experiments with graphs having millions of nodes and arcs?

I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.

There are more ways to do that. E.g.:

  • create a directed graph that contains a copy of each red and blue node and those arcs whose source node is red
  • add another extra node s to this digraph
  • add an arc (r,s) for each node r that was red in the bipartite graph and an arc (s,b) for each other node b
  • create a node map supply as follows:
    • supply[r] = 1 for each "red" node r
    • supply[b] = -1 for each "blue" node b
    • supply[s] = number_of_blue_nodes - number_of_red_nodes (i.e. the sum of supply[v] fo all nodes v is equal to zero)
  • create an arc map cost as follows:
    • cost[a] = -original_weight[a] for each "old" arc a
    • cost[a] = 0 for each new arc a
  • call a min cost flow algorithm as follows
      NetworkSimplex<Digraph> mcf;
      mcf.supplyMap(supply).costMap(cost).upperMap(constMap<Digraph::Arc, int>(1));
      mcf.run();
    
  • obtain the matching by reading the flow values (mcf.flow(a)).

However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.

That's too bad. IMHO, the reasons must be investigated before relasing this concept of bipartite graphs.

Changed 12 years ago by Poroszkai Daniel

Attachment: df198f12f7fa.patch added

Min cost flow algorithms included in the benchmark

Changed 12 years ago by Poroszkai Daniel

Attachment: res2.txt added

Benchmark result (extended with min cost flow)

comment:26 in reply to:  25 Changed 12 years ago by Poroszkai Daniel

Replying to kpeter:

Replying to poroszd:

I uploaded some benchmark results, and a patch containing the tools i used (it relies on the max. weighted bp. matching and the Hopcroft-Karp from this ticket).

I checked these results. As far as I see, the bipartite matchings are usually faster than general matching and preflow. That's good. However, I suggest you to test larger graphs for which min. a few seconds is taken by the algorithms. Have you made experiments with graphs having millions of nodes and arcs?

I did not included the min. cost flow algorithms in the benchmark, as I don't know how to use them to solve the maximum weighted bipartite matching.

There are more ways to do that. E.g.:

  • create a directed graph that contains a copy of each red and blue node and those arcs whose source node is red
  • add another extra node s to this digraph
  • add an arc (r,s) for each node r that was red in the bipartite graph and an arc (s,b) for each other node b
  • create a node map supply as follows:
    • supply[r] = 1 for each "red" node r
    • supply[b] = -1 for each "blue" node b
    • supply[s] = number_of_blue_nodes - number_of_red_nodes (i.e. the sum of supply[v] fo all nodes v is equal to zero)
  • create an arc map cost as follows:
    • cost[a] = -original_weight[a] for each "old" arc a
    • cost[a] = 0 for each new arc a
  • call a min cost flow algorithm as follows
      NetworkSimplex<Digraph> mcf;
      mcf.supplyMap(supply).costMap(cost).upperMap(constMap<Digraph::Arc, int>(1));
      mcf.run();
    
  • obtain the matching by reading the flow values (mcf.flow(a)).

However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.

That's too bad. IMHO, the reasons must be investigated before relasing this concept of bipartite graphs.

Thank you for the explanation, I added them.

I also uploaded some benchmark results again, but not with larger graphs, as they consume my RAM, and I think using swap will influence the result somehow oddly. Regardless, I will upload them, but it takes some time.

comment:27 Changed 12 years ago by Peter Kovacs

I agree that if you run out of physical RAM, your results won't be relevant.

According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!

On the other hand, the weighted bipartite algorithm is usually slower (by an order of magnitude) than the general matching algorithm as well as the network simplex and cost scaling min cost flow algorithms. (It's also interesting that in several cases, network simplex is the most efficient, especially for dense graphs.)

Do you have any insight about the modest performance of the weighted bp. matching? Do you have any idea to make it faster? Are you aware the bottleneck opperations of the algorithm?

comment:28 in reply to:  24 Changed 12 years ago by Balazs Dezso

Replying to poroszd:

Replying to alpar:

I still have problem with applying your patch to any available repo. Could you please create a version of these changesets applicable on top of the other changesets in #69, or to this (temporal) repo:

https://bitbucket.org/alpar/lemon-bipartite

Sorry, I missed the changes regarding the typesafe red and blue nodes. The above patches use them.

However, the weighted bipartite matching suffered a major slowdown using the typesafe node sets; I'm not sure if I messed up something or the implementation itself should be changed.

Which compiler options did you use for benchmarking? Could you also add the used hardware (CPU + RAM)? It would be also good to add these parameters to the result files.

comment:29 in reply to:  27 ; Changed 12 years ago by Balazs Dezso

Replying to kpeter:

According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!

I have few ideas to improve the HK algorithm:

  • We need a proof, that we found the optimal solution:
  • Using the std::list is usually very slow. I think, using std::vector and indices to the front and back of the list has significantly better performance.
  • I would not separate the red->blue and blue->red steps in the BFS.
  • I think, OutArcIt? and InArcIt? have usually better performance than IncEdgeIt?, you might try whether it improves the running time.
  • I don't think, that you need the current_edge structure. In the DFS part, if you visit a blue node, then you either use it the node a path or you know that there is no valid path backward to an uncovered red node. Therefore, a stack of IncEdgeIt/InArcIt? and a bool flag per node is sufficient. Note, that the bool flag is not really necessary, you can reset the level of the node. You can also avoid to use pointers.
  • I think, it is possible also to store levels only on the blue nodes, which decreases the space (but it might increase running time).

I think, it would be also good to compare this algorithm with the direct implementation of the preflow based matching. As I remember, that is extremly fast, and the augmenting path algorithms could beat it only for really sparse graphs.

Changed 12 years ago by Poroszkai Daniel

Attachment: bpm_upgrade.patch added

Improved Hopcroft-Karp, an two additional

Changed 12 years ago by Poroszkai Daniel

Attachment: bpm_upgrade.2.patch added

Improved Hopcroft-Karp; two preflow based matching

Changed 12 years ago by Poroszkai Daniel

Attachment: res3.txt added

Benchmark results

comment:30 in reply to:  29 ; Changed 12 years ago by Poroszkai Daniel

Thank You, I updated the Hopcroft-Karp following your suggestions, and added two variants of preflow based matching.

I also found some bug when initializing an elevator with non-default size. Could You check if my fix is correct?

Replying to deba:

Replying to kpeter:

According to the current results, the unweighted bipartite matching (Hopcroft–Karp) seems to be efficient, it is usually faster than (or at least competitive with) the general matching and preflow. Congrat.!

I have few ideas to improve the HK algorithm:

  • We need a proof, that we found the optimal solution:
  • Using the std::list is usually very slow. I think, using std::vector and indices to the front and back of the list has significantly better performance.
  • I would not separate the red->blue and blue->red steps in the BFS.
  • I think, OutArcIt? and InArcIt? have usually better performance than IncEdgeIt?, you might try whether it improves the running time.
  • I don't think, that you need the current_edge structure. In the DFS part, if you visit a blue node, then you either use it the node a path or you know that there is no valid path backward to an uncovered red node. Therefore, a stack of IncEdgeIt/InArcIt? and a bool flag per node is sufficient. Note, that the bool flag is not really necessary, you can reset the level of the node. You can also avoid to use pointers.
  • I think, it is possible also to store levels only on the blue nodes, which decreases the space (but it might increase running time).

I think, it would be also good to compare this algorithm with the direct implementation of the preflow based matching. As I remember, that is extremly fast, and the augmenting path algorithms could beat it only for really sparse graphs.

comment:31 in reply to:  30 ; Changed 12 years ago by Poroszkai Daniel

Uh, sorry, somehow I sent the patch twice...

comment:32 in reply to:  31 ; Changed 12 years ago by Alpar Juttner

Replying to poroszd:

Uh, sorry, somehow I sent the patch twice...

As usual, none of then I can apply to my repo:

$ hg -R /home/alpar/projects/LEMON/hg/lemon-bipartite import --exact /tmp/bpm_upgrade.2-1.patch
applying /tmp/bpm_upgrade.2-1.patch
abort: unknown revision '4928e7fa0a4062b96c9fada8abd40d3c7041bcae'!

Changed 12 years ago by Poroszkai Daniel

Attachment: bpm_pack.patch added

comment:33 in reply to:  32 ; Changed 12 years ago by Poroszkai Daniel

Replying to alpar:

This patch can be applied to the repo You linked above. It contains:

comment:34 in reply to:  33 Changed 12 years ago by Alpar Juttner

This patch can be applied to the repo You linked above. It contains:

It's much better now, thank you.

comment:35 Changed 11 years ago by Balazs Dezso

Milestone: LEMON 1.3 releaseLEMON 1.4 release

I would move this to 1.4.

Changed 5 years ago by Balazs Dezso

Attachment: 3180fd18651b.patch added

Changed 5 years ago by Balazs Dezso

Attachment: results.2.txt added

Changed 5 years ago by Balazs Dezso

Attachment: 5b4317f3ce36.patch added

comment:36 Changed 5 years ago by Balazs Dezso

I experimented with the simplified implementation of the current weighted general matching algorithms. The test implementations are in [3180fd18651b] and the running times are in results.2.txt.

Between the test implementations, I have chosen MaxWeightedBpMatching1, and I created a patch from it: [5b4317f3ce36]. This patch contains also perfect matching implementation.

Note: See TracTickets for help on using tickets.